Balancing Velocity vs Long-Term Scalability: Engineering Philosophy from a Startup

This post was written with Claude Code, distilled from my engineering journal and experiences over the years at Hotelzify.

Introduction

Every startup engineering team faces a fundamental tension: ship fast to validate product-market fit, or build robust systems for long-term scale?

The answer isn't binary. At Hotelzify, we developed a philosophy that balances both. This post shares our approach, decision-making frameworks, and lessons from optimizing a hotel booking platform.

For technical implementation details and specific optimizations, see: Critical Engineering Optimizations


The Velocity vs Scalability Dilemma

The Problem: You can't optimize for both simultaneously at every stage.

Common Pitfalls:

  • Ship too fast → technical debt compounds, rewrites needed

  • Optimize too early → slow validation, missed market opportunities

Our Approach: Velocity first, optimization second, with a clear path to scale.


Our Development Philosophy

1. Functional Correctness First

Get business logic right before optimizing performance. A fast feature that doesn't meet requirements is worthless.

Example: When implementing member rate pricing for Google Hotel Center, we reused existing promotion lookup code. It was functionally correct but slow (150 database queries per batch). We shipped it, validated it worked correctly, then optimized.

Lesson: Functional correctness → production validation → optimization based on real data.

2. Incremental Optimization

Once a feature is validated, improve scalability based on actual usage patterns, not assumptions.

Real Case: Promotion lookup worked fine for occasional calls. When we extended it to batch operations (150 calls), performance became a bottleneck. We only discovered this through production metrics.

Lesson: When reusing code in new contexts, verify performance characteristics scale appropriately.

3. Measure Real Impact

Don't optimize prematurely. Ship, measure production performance, then optimize bottlenecks that actually matter.

Tools We Use:

  • EXPLAIN ANALYZE for query plans

  • Application-level timing (console.time)

  • Database profiling

  • Production metrics monitoring

Lesson: All our optimizations came after features were live with real performance data.
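As a rough illustration of the application-level timing mentioned above, here is a minimal sketch. The `timed` helper and the fake lookup are hypothetical names used only for this example; it uses `performance.now()` rather than `console.time` so the elapsed value can be formatted or forwarded to a metrics system.

```javascript
// Hypothetical helper: measures how long an async operation takes.
// `performance` is a global in modern Node.js and in browsers.
async function timed(label, operation) {
  const start = performance.now();
  try {
    return await operation();
  } finally {
    // Runs on success and on failure, so slow errors are measured too.
    const elapsedMs = performance.now() - start;
    console.log(`${label}: ${elapsedMs.toFixed(1)}ms`);
  }
}

// Stand-in for a slow call; a real one would hit the database.
const fakePromotionLookup = () =>
  new Promise((resolve) => setTimeout(() => resolve(['PROMO10']), 20));

timed('promotion lookup', fakePromotionLookup).then((promotions) => {
  console.log(promotions);
});
```

Wrapping the call site rather than the function body keeps the measurement easy to remove once the bottleneck is understood.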


Core Engineering Principles

1. Backward Compatibility is Non-Negotiable

Every change we make maintains full backward compatibility.

How:

  • Optional parameters with safe defaults

  • Graceful fallbacks for missing data

  • No breaking changes to existing APIs

Benefits:

  • Continuous deployment without coordination

  • No migration periods

  • Existing callers continue working

Example Pattern:

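// Hypothetical fallback: derives the value when the caller does not pass it
const extractFromContext = () => 'value-from-context';

const someFunction = async (
  requiredParam,
  optionalNewParam = null  // Safe default: existing callers simply omit it
) => {
  // Only null/undefined trigger the fallback, so valid falsy values (0, '') are respected
  const resolvedParam = optionalNewParam ?? extractFromContext();
  // Function works with or without the new parameter
  return { requiredParam, resolvedParam };
};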

2. Database as Configuration, Not Code

We drive configuration from database fields rather than environment variables or code deployments.

Examples:

  • is_member_rate_ghc → controls strike-through pricing

  • mlos → controls minimum length of stay (MLOS) restrictions

  • chainId → determines API token selection

Benefits:

  • Instant rollback (SQL UPDATE)

  • No code deployment for config changes

  • Per-entity control granularity

  • Clear audit trail

Real Impact: When implementing undocumented Google API features, database flags let us enable the feature for one hotel, test it, then gradually roll it out. Rollback was a single SQL query.
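A minimal sketch of how code might read these fields with safe defaults. Only the three field names come from the list above; the row shape, the `resolveHotelConfig` name, and the chosen defaults are assumptions made for illustration.

```javascript
// Hypothetical row shape; only the three field names come from the post.
function resolveHotelConfig(row) {
  return {
    // Strike-through pricing stays off unless the flag is explicitly true.
    memberRateEnabled: row?.is_member_rate_ghc === true,
    // Assumed default: a missing value means a 1-night minimum stay.
    mlos: row?.mlos ?? 1,
    // null means "no chain", so callers fall back to the default API token.
    chainId: row?.chainId ?? null,
  };
}

console.log(resolveHotelConfig({ is_member_rate_ghc: true, mlos: 3, chainId: 7 }));
console.log(resolveHotelConfig({}));
```

Because every field has a default, a hotel row that predates a feature behaves exactly as it did before the column existed.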

3. Graceful Degradation

All integrations handle failures without breaking core functionality.

Pattern:

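// externalApiCall is whatever integration you are wrapping (passed in by the caller)
const callWithFallback = async (externalApiCall) => {
  try {
    return await externalApiCall();
  } catch (error) {
    console.error('External API failed:', error);
    return null;  // Safe default: core functionality continues
  }
};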

Philosophy: Degraded service > broken service.

Example: Multi-vendor token selection. Any error (missing hotelId, database error, invalid chainId) → default token. System continues working.
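The token selection described above could look roughly like the sketch below. The token table, the `selectApiToken` name, and the `lookupChainId` callback (a stand-in for the database lookup) are all hypothetical; the point is only that every failure path ends at the default token.

```javascript
// Hypothetical token table and default; real values would come from configuration.
const DEFAULT_TOKEN = 'default-token';
const TOKENS_BY_CHAIN = new Map([
  ['chain-a', 'token-a'],
  ['chain-b', 'token-b'],
]);

// `lookupChainId` stands in for the database call that maps a hotel to its chain.
async function selectApiToken(hotelId, lookupChainId) {
  try {
    if (hotelId == null) throw new Error('missing hotelId');
    const chainId = await lookupChainId(hotelId);
    const token = TOKENS_BY_CHAIN.get(chainId);
    if (!token) throw new Error(`invalid chainId: ${chainId}`);
    return token;
  } catch (error) {
    // Any failure degrades to the default token instead of failing the request.
    console.error('Token selection failed, using default:', error.message);
    return DEFAULT_TOKEN;
  }
}

selectApiToken(42, async () => 'chain-a').then((token) => console.log(token));
```

Logging the reason before falling back matters here: silent fallbacks keep the system running but can hide a misconfigured chain for a long time.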

4. Optimize in Layers

Apply optimizations incrementally, not all at once.

Layered Approach:

| Layer | Effort | Impact | When |
| --- | --- | --- | --- |
| 1. ORM raw mode | Low | 30-50% | Always |
| 2. Indexes | Medium | 20-40% | EXPLAIN shows scans |
| 3. NULL optimization | Medium | 10-20% | NULL checks dominate |
| 4. Raw SQL | High | 40-60% | Critical paths (<20ms) |

Decision: Start with Layer 1 (always beneficial). Add further layers only if profiling shows a bottleneck and the impact justifies the maintenance cost.

Benefit: Layers compound. Each one applies to the time left over by the previous layers, so raw mode plus indexes together gave roughly 50-70%.
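To make the compounding concrete, here is a small worked calculation. It assumes the layers are independent, which real workloads only approximate, and it plugs in the low and high ends of the ranges quoted for raw mode and indexes.

```javascript
// Each layer removes a fraction of whatever time the previous layers left behind.
function combinedReduction(layerReductions) {
  const remaining = layerReductions.reduce(
    (timeLeft, reduction) => timeLeft * (1 - reduction),
    1
  );
  return 1 - remaining;
}

// Low and high ends of the ranges for raw mode (30-50%) and indexes (20-40%).
const low = combinedReduction([0.3, 0.2]);
const high = combinedReduction([0.5, 0.4]);
console.log(`${Math.round(low * 100)}% to ${Math.round(high * 100)}%`); // prints "44% to 70%"
```

Under this simple model the two layers combine to roughly 44-70%, close to the 50-70% range quoted above. Note that the reductions do not add: 50% plus 40% is a 70% reduction, not 90%.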

5. Plan Rollback From Day One

Every feature includes a rollback strategy.

Mechanisms:

  • Database flags (toggle instantly)

  • Environment variables (remove and restart)

  • Code changes (revert commits)

Requirement: No permanent side effects that can't be undone.

Result: Confident production deployments. We can enable a feature for one entity, monitor it, and roll back in seconds if needed.


Decision-Making Frameworks

When to Optimize

Red Flags (Don't Optimize):

  • Premature (no measurement showing problem)

  • Wrong bottleneck (optimizing non-critical path)

  • Negligible impact (improvement too small)

  • High complexity (makes code unmaintainable)

  • Temporary problem (will resolve with other changes)

Green Lights (Do Optimize):

  • Measured impact (clear metrics)

  • User-facing (affects response time/throughput)

  • Scalability (problem grows with usage)

  • Maintainable solution (doesn't add significant complexity)

  • Safe rollback (can revert easily)

Trade-off Analysis Pattern

Every optimization involves trade-offs. Document what you're giving up.

Example Trade-offs We Made:

| What We Gave Up | What We Gained | Why Acceptable |
| --- | --- | --- |
| 5-10 MB memory | Up to 75x faster batch processing (150 queries → 1) | Memory is cheap, DB load is expensive |
| ORM type safety | 40-60% performance | Critical paths only; most code still uses the ORM |
| Simple JOINs | 90% less DB CPU | Same datacenter, so the network is not the bottleneck |
| Minimal data transfer (50 rows → 250 rows, 5x) | 97% less CPU | Rows are small (~1 KB), so the extra transfer is worth it |

Framework: Explicitly document trade-offs to make informed decisions.

Batch + Cache vs Individual Queries

Use Batch + Cache When:

  • Dataset <50MB

  • Iterations >50

  • Static data during processing

  • Complex filtering expensive in SQL

Avoid When:

  • Dataset >100MB

  • Data changes mid-loop

  • Memory constrained

  • Few iterations (<10)

Key Insight: Network round-trip (1-5ms per query) compounds at high iteration counts.
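The checklist above can be sketched as a small decision helper. The function names and the `measure-first` result for the gray zone between the two lists are assumptions; the "complex filtering expensive in SQL" criterion is left out because it is hard to express as a number.

```javascript
// Thresholds mirror the two lists above; anything in between is a judgment call.
function batchCacheAdvice({ datasetMB, iterations, dataIsStatic, memoryConstrained }) {
  if (datasetMB > 100 || iterations < 10 || !dataIsStatic || memoryConstrained) {
    return 'individual-queries';
  }
  if (datasetMB < 50 && iterations > 50) {
    return 'batch-and-cache';
  }
  return 'measure-first';
}

// Round-trip cost alone, using the 1-5ms per query range.
function roundTripOverheadMs(iterations, perQueryMs) {
  return iterations * perQueryMs;
}

console.log(
  batchCacheAdvice({ datasetMB: 8, iterations: 150, dataIsStatic: true, memoryConstrained: false })
); // prints "batch-and-cache"
console.log(roundTripOverheadMs(150, 1), roundTripOverheadMs(150, 5)); // prints "150 750"
```

At 150 iterations, round trips alone cost 150-750ms before any query actually executes, which is why the iteration count is one of the first things to check.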

JOIN vs Separate Queries

Use Separate Queries When:

  • Complex aggregations (GROUP BY, JSON_AGG)

  • High concurrency needed

  • Lock duration impacts other queries

  • Same-datacenter deployment

Use JOIN When:

  • Simple joins without aggregation

  • Very small result sets

  • Cross-region database (high network latency)

  • Atomic consistency critical

Key Insight: Lock duration > execution time. A 35ms query holding locks is worse than two 6ms queries releasing locks early.

Concurrency Impact:

  • JOIN (10 concurrent): 10 × 35ms = 350ms (serialized)

  • Separate (10 concurrent): ~12ms total (interleaved)


Phased Rollout Strategy

Complex features benefit from multi-phase rollouts.

Case Study: MLOS Implementation

Phase 1: Database (Day 1)
- Add column, DEFAULT 1, nullable, indexed
- Zero impact on existing code

Phase 2: Service Layer (Day 2-3)
- Update CRUD operations
- Default to 1 if not provided
- Existing callers still work

Phase 3: API Response (Day 4-5)
- Include in responses
- Frontend can display
- No enforcement yet

Phase 4: Google Integration (Day 6-10)
- Push to Google Hotel Center
- Feature fully live

Benefits:

  • Each phase independently testable

  • No breaking changes

  • Incremental value (Phase 3 valuable without Phase 4)

  • Can stop at any phase

Philosophy: Phased rollouts enable continuous deployment without cross-team coordination.
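Phase 2 is the step that keeps existing callers working, so it is worth a small sketch. The `withMlosDefault` name and the payload shape are hypothetical, and rejecting invalid values (rather than silently defaulting them) is a design choice made for this example.

```javascript
const DEFAULT_MLOS = 1;

// Hypothetical service-layer helper: accepts payloads with or without `mlos`.
function withMlosDefault(input) {
  // Missing value: behave exactly as before the column existed.
  const mlos = input.mlos ?? DEFAULT_MLOS;
  // Provided but invalid: fail loudly instead of storing bad data.
  if (!Number.isInteger(mlos) || mlos < 1) {
    throw new RangeError(`mlos must be a positive integer, got ${mlos}`);
  }
  return { ...input, mlos };
}

console.log(withMlosDefault({ name: 'Standard' }));          // existing caller, no mlos
console.log(withMlosDefault({ name: 'Weekend', mlos: 2 }));  // new caller
```

Existing callers get the same default the database column uses, so Phase 2 can ship before any client knows the field exists.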


Lessons from Real Optimizations

1. Hidden Performance Characteristics

What Happened: A promotion lookup function was built for single calls. We reused it in a batch operation (150 calls). The function was hidden in a utility method, so the repeated queries weren't obvious.

Impact: 150 database queries per batch, 7-15 seconds execution time.

Solution: Batch fetch once + in-memory filtering → 1 query, 0.2 seconds.

Lesson: When reusing code, always consider performance characteristics in new contexts.
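The before and after can be sketched with an in-memory stand-in for the database that counts queries. Everything here is hypothetical, including keying promotions by `hotelId`; the post does not say what the 150 calls were keyed on.

```javascript
// In-memory stand-in for the promotions table; it counts queries for comparison.
function createFakeDb(promotions) {
  let queryCount = 0;
  return {
    async findPromotionsForHotel(hotelId) {
      queryCount += 1;
      return promotions.filter((p) => p.hotelId === hotelId);
    },
    async findPromotionsForHotels(hotelIds) {
      queryCount += 1;
      const wanted = new Set(hotelIds);
      return promotions.filter((p) => wanted.has(p.hotelId));
    },
    get queryCount() {
      return queryCount;
    },
  };
}

// Before: one query per hotel, hidden inside a reusable lookup.
async function lookupOneByOne(db, hotelIds) {
  const result = new Map();
  for (const hotelId of hotelIds) {
    result.set(hotelId, await db.findPromotionsForHotel(hotelId));
  }
  return result;
}

// After: one batch query, then group the rows in memory.
async function lookupBatched(db, hotelIds) {
  const result = new Map(hotelIds.map((id) => [id, []]));
  for (const promo of await db.findPromotionsForHotels(hotelIds)) {
    result.get(promo.hotelId).push(promo);
  }
  return result;
}

async function demo() {
  const hotelIds = Array.from({ length: 150 }, (_, i) => i + 1);
  const promotions = hotelIds.map((hotelId) => ({ hotelId, code: `PROMO-${hotelId}` }));
  const slowDb = createFakeDb(promotions);
  await lookupOneByOne(slowDb, hotelIds);
  const fastDb = createFakeDb(promotions);
  await lookupBatched(fastDb, hotelIds);
  console.log(slowDb.queryCount, fastDb.queryCount); // prints "150 1"
}

demo();
```

Both functions return the same `Map`, so the batched version can replace the original without changing its callers.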

2. Lock Duration vs Execution Time

Discovery: A 35ms JOIN query was causing concurrency issues despite being "fast".

Analysis: The query held locks on multiple tables. With 10 concurrent operations, serialization created a 350ms bottleneck.

Solution: Two separate queries (6ms each) with independent lock release → 12ms total, interleaved execution.

Lesson: Lock duration matters more than execution time. Shorter locks improve concurrency.

3. In-Memory Can Beat Database

Conventional Wisdom: Database is optimized for aggregations, use it.

Reality: For small datasets (hundreds of rows), filtering in JavaScript can be faster than pushing the work into SQL.

Why:

  • No network round-trip (1-5ms per query)

  • Modern JavaScript engines highly optimized

  • No lock contention

  • Simpler queries execute faster

Result: 150 queries → 1 query, about 99% fewer database queries.

Lesson: Question conventional wisdom with measurement.

4. Database Flags for Instant Rollback

Challenge: Implementing undocumented Google API (RateModifications).

Risk: Could break anytime, no official support.

Solution: Single database field: is_member_rate_enabled

  • TRUE → send rate modification

  • FALSE → skip

Rollout:

  • Phase 1: 1 hotel (test)

  • Phase 2: 10 hotels (monitor)

  • Phase 3: All hotels (if stable)

Rollback: One SQL UPDATE, instant effect.

Lesson: For undocumented/beta APIs, design for instant rollback without code deployment.
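The rollout and rollback steps reduce to the same parameterized UPDATE with different arguments. In the sketch below, the `hotels` table and `id` column are hypothetical names; only the flag name comes from the text above. The `{ text, values }` shape is what node-postgres accepts, which is an assumption about the stack.

```javascript
// Hypothetical table (`hotels`) and key column (`id`); the flag name is from the post.
function buildFlagUpdate(hotelIds, enabled) {
  if (!Array.isArray(hotelIds) || hotelIds.length === 0) {
    throw new Error('hotelIds must be a non-empty array');
  }
  return {
    // PostgreSQL-style placeholders; values are passed separately, never interpolated.
    text: 'UPDATE hotels SET is_member_rate_enabled = $1 WHERE id = ANY($2)',
    values: [Boolean(enabled), hotelIds],
  };
}

const phase1 = buildFlagUpdate([101], true);    // enable for 1 hotel
const rollback = buildFlagUpdate([101], false); // same statement, flag flipped
console.log(phase1.text, phase1.values);
console.log(rollback.values);
```

Requiring an explicit list of hotel IDs is deliberate: an accidental empty list cannot turn into an UPDATE that touches every row.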

5. Technical Debt in Queries

Discovery: A reporting query had 7 JOINs, only 3 were actually used.

Root Cause: Features were removed/refactored, JOINs remained.

Audit Process:

  1. List all SELECT columns

  2. Trace to source tables

  3. Identify unused tables

  4. Remove safely

Impact: Simpler query, faster execution, easier maintenance.

Lesson: Technical debt accumulates in queries too. Regular audits prevent degradation.
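Steps 1-3 of the audit can be roughed out in code. This sketch assumes every selected column is written as `alias.column`, and the aliases are invented for illustration. It only produces candidates: a JOIN referenced in WHERE or ORDER BY, or an INNER JOIN that filters rows, still needs a manual check before removal.

```javascript
// Assumes every selected column is written as "alias.column".
function findUnusedJoinAliases(selectedColumns, joinedAliases) {
  const usedAliases = new Set(
    selectedColumns
      .filter((column) => column.includes('.'))
      .map((column) => column.split('.')[0].trim())
  );
  return joinedAliases.filter((alias) => !usedAliases.has(alias));
}

// Hypothetical reporting query on a base table `b` with 7 JOINs.
const selected = ['b.id', 'b.total', 'hotels.name', 'guests.email', 'rooms.type'];
const joined = ['hotels', 'guests', 'rooms', 'payments', 'coupons', 'invoices', 'reviews'];
console.log(findUnusedJoinAliases(selected, joined));
// prints [ 'payments', 'coupons', 'invoices', 'reviews' ]
```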


Anti-Patterns We Avoided

1. Premature Abstraction

Didn't create generic "optimization framework" upfront. Each optimization was specific to its problem. Abstractions emerged naturally after multiple similar patterns.

Why: Early abstractions often miss the mark. Let patterns emerge from real use cases.

2. Over-Engineering

Didn't add features "just in case". MLOS implementation added exactly what was needed in each phase, nothing more.

Why: YAGNI (You Aren't Gonna Need It). Build for current needs, not hypothetical futures.

3. Ignoring Trade-offs

Explicitly documented what we gave up for each optimization. No solution is perfect; knowing trade-offs helps make informed decisions.

Why: Honest assessment prevents regret later. "We chose X knowing we gave up Y" is better than "Why is Y broken?"

4. Optimizing Without Measuring

Every optimization started with metrics. No guessing about bottlenecks.

Why: Intuition about performance is often wrong. Measurement reveals truth.


The Compound Effect

Individual optimizations stack:

Database Optimizations:

  • Batch + cache: 150x reduction

  • Separate queries: 90% less DB CPU

  • ORM raw mode + indexes: 50-70% faster

System Impact:

  • Price push: 7-15s → 0.2s (up to 75x faster)

  • Database CPU: 90% reduction

  • Concurrent throughput: Improved via shorter locks

Development Velocity:

  • MLOS: 4 phases in 2 weeks

  • Strike-through pricing: Instant rollout control

  • Multi-vendor: 17 functions, zero breaking changes

Philosophy Enabled This: Velocity first → ship and validate → measure → optimize → maintain backward compatibility → compound improvements.


Practical Takeaways

For Early-Stage Startups

  1. Velocity First: Get to product-market fit before optimizing

  2. Functional Correctness: Validate business logic before performance

  3. Measure Real Usage: Production data reveals actual bottlenecks

  4. Plan Rollback: Database flags enable confident experimentation

For Growing Startups

  1. Optimize Strategically: Fix real bottlenecks, not hypothetical ones

  2. Backward Compatibility: Enables continuous deployment

  3. Phased Rollouts: De-risk complex features

  4. Document Trade-offs: Inform future decisions

For Scale-ups

  1. Lock Duration: Matters more than execution time

  2. In-Memory Processing: Can beat database for small datasets

  3. Technical Debt Audits: Prevent query degradation

  4. Layered Optimization: Incremental improvements stack


Conclusion

Balancing velocity and scalability isn't about choosing one over the other. It's about:

  1. Sequence: Velocity first, optimization second

  2. Measurement: Real data, not assumptions

  3. Compatibility: Never break existing functionality

  4. Rollback: Plan for instant reversal

  5. Trade-offs: Document what you're giving up

  6. Iteration: Small improvements compound over time

This philosophy enabled us to:

  • Ship features quickly (validate product-market fit)

  • Optimize based on real usage (no premature optimization)

  • Maintain system stability (backward compatibility)

  • Scale confidently (measured improvements)

Performance optimization is a journey, not a destination. Start with measurement, make targeted improvements, measure impact, repeat.


Discussion

How does your team balance velocity and scalability? What frameworks do you use for deciding when to optimize? Share your experiences in the comments.

For technical details on specific optimizations, see: Critical Engineering Optimizations

Questions? Happy to discuss in the comments.

Akshay R R | Engineering, Startups
