Shipping Fast & Building in Iterations: A Startup Engineering Playbook


Software Engineer

This post was written with Claude Code, distilled from my engineering journal and experiences over the years at Hotelzify.

Why Velocity Matters More Than Perfection

In startups, time is the most precious resource. Every week spent building the "perfect" feature is a week not learning from real users. Every month spent on a monolithic release is a month of compounding risk.

But speed without strategy is chaos. Ship too fast with breaking changes, and you'll spend weeks firefighting instead of building. Ship too slow with over-engineered solutions, and you'll miss market windows.

The answer? Iterative shipping with backward compatibility.

This isn't just a philosophy—it's a survival skill. Here's how we've shipped major features at Hotelzify without breaking existing systems, learned from production data, and maintained velocity even as our codebase grew.


The Core Principles

1. Ship in Phases, Not Monoliths

Never ship a large feature all at once. Break it into:

  • Phase 1: Infrastructure (database schema, models, basic APIs)

  • Phase 2: Core logic (without affecting existing flows)

  • Phase 3: Integration (connect to existing systems with flags)

  • Phase 4: Rollout (gradual enablement with monitoring)

Each phase ships to production independently. Each phase can be rolled back without touching the others.

2. Backward Compatibility is Non-Negotiable

Every change must answer: "What happens to existing code that doesn't know about this feature?"

Practical rules:

  • New database columns: Always NULL or sensible defaults

  • API parameters: Always optional

  • Function parameters: Always default values

  • Feature toggles: Always OFF by default

  • Error handling: Always fail gracefully

If existing code breaks, you haven't shipped a feature—you've shipped a bug.

3. Test in Production (Yes, Really)

Staging environments lie. They have:

  • Different data distributions

  • Different load patterns

  • Different edge cases

  • Different vendor behaviors

The only way to truly validate a feature is to ship it to production with:

  • Single hotel testing (hotel ID hardcoded)

  • Feature flags (database-level toggles)

  • Extensive logging (track every decision)

  • Graceful failure (return empty, don't throw)

Philosophy: "If in doubt, do nothing." Returning an empty result is always safer than returning a wrong result.
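The four safeguards above can be combined into one small gate. The following is a minimal sketch, not our production code: `runGatedFeature` and all of its parameters are hypothetical names, the flag is passed in as a plain boolean rather than read from the database, and `compute` stands in for whatever the new feature does.

```javascript
// Hypothetical gate: every name here is illustrative.
function runGatedFeature({ hotelId, allowedHotelIds, flagEnabled, compute, log = console.log }) {
  // Gate 1: hardcoded allowlist while the feature is under test.
  if (!allowedHotelIds.includes(hotelId)) {
    log(JSON.stringify({ hotelId, decision: 'skip', reason: 'not_in_allowlist' }));
    return [];
  }

  // Gate 2: database-level flag (assumed to be loaded by the caller).
  if (flagEnabled !== true) {
    log(JSON.stringify({ hotelId, decision: 'skip', reason: 'flag_off' }));
    return [];
  }

  // Gate 3: any failure degrades to "do nothing".
  try {
    const result = compute(hotelId);
    log(JSON.stringify({ hotelId, decision: 'run', count: result.length }));
    return result;
  } catch (error) {
    log(JSON.stringify({ hotelId, decision: 'skip', reason: 'error', message: error.message }));
    return [];
  }
}
```

Every branch logs why it was taken, so a quiet feature can be told apart from a broken one when reading production logs.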


Real Example 1: MLOS (Minimum Length of Stay) - 4 Phases

The Problem: Our v1 booking engine didn't support minimum stay requirements. Hotels need to enforce "book at least 3 nights" policies.

The Wrong Approach: Add MLOS column, update all booking logic, test everything, ship in 3 weeks.

The Right Approach:

Phase 1: Database Schema (Day 1)

-- Add column with safe defaults
ALTER TABLE rooms_pricing_and_inventory
ADD COLUMN mlos TINYINT NULL DEFAULT 1
COMMENT 'Minimum length of stay requirement';

-- History table too (for audit trail)
ALTER TABLE rooms_pricing_and_inventory_history
ADD COLUMN mlos TINYINT NULL DEFAULT 1;

-- Index for queries
CREATE INDEX idx_mlos ON rooms_pricing_and_inventory(mlos);

Backward compatibility:

  • NULL values treated as "no restriction"

  • Default value 1 means "1 night minimum" (no restriction)

  • Existing rows are backfilled with the default 1 → safe

  • Existing code doesn't query this column → safe

Ship it. Nothing breaks. Monitor for 24 hours.
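When code eventually starts reading the column, the "NULL means no restriction" rule has to live somewhere. A minimal sketch of a read-side helper follows; the name `effectiveMlos` is an assumption for illustration and does not exist in the codebase described here.

```javascript
// Hypothetical read-side helper: normalizes whatever is stored into a usable minimum.
function effectiveMlos(row) {
  const value = row == null ? null : row.mlos;
  // NULL, undefined, non-integers, and values below 1 all mean "no restriction".
  if (!Number.isInteger(value) || value < 1) return 1;
  return value;
}
```

Keeping this in one place means later phases do not need to repeat the NULL check at every call site.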

Phase 2: Model & API Layer (Day 2-3)

// Sequelize model (src/models/dbmodels/rooms_pricing_and_inventory.js)
mlos: {
  type: DataTypes.TINYINT,
  allowNull: true,
  defaultValue: 1,
}

// Validation schema (yup)
mlos: yup.number().integer().min(1).max(365).optional().nullable(true)

API endpoints now accept MLOS but don't require it:

  • POST /pricing with mlos: 3 → stores 3

  • POST /pricing without mlos → stores 1 (default)

Backward compatibility:

  • Old API calls (no mlos param) → defaults to 1

  • New API calls (with mlos param) → stores value

  • Frontend doesn't need immediate update

Ship it. Test manually with Postman. Monitor API logs for validation errors.
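To make the validation rule concrete without pulling in a dependency, here is a plain JavaScript stand-in that mirrors the constraints of the yup schema above (integer, 1 to 365, optional, nullable). It is a sketch only: `validateMlos` is a hypothetical name, and unlike a real schema library it performs no type coercion.

```javascript
// Dependency-free stand-in for the schema rule; not the yup API.
function validateMlos(input) {
  // Absent or null is accepted: the field is optional and nullable.
  if (input === undefined || input === null) {
    return { valid: true, value: input };
  }
  if (typeof input !== 'number' || !Number.isInteger(input)) {
    return { valid: false, error: 'mlos must be an integer' };
  }
  if (input < 1 || input > 365) {
    return { valid: false, error: 'mlos must be between 1 and 365' };
  }
  return { valid: true, value: input };
}
```

Old clients that never send `mlos` pass validation untouched, which is the property that lets the frontend update later.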

Phase 3: Service Layer Defaults (Day 4-5)

// In all create/update operations
if (params.mlos == null) {
  params.mlos = 1;  // Default: no restriction
}

// In preservation logic (when updating price but not MLOS)
const shouldPreserveMlos = params.mlos == null;
if (shouldPreserveMlos) {
  // Caller omitted mlos: keep the stored value, or fall back to 1
  params.mlos = lastU?.mlos || 1;
}

Key insight: When updating price, preserve existing MLOS. When creating new records, default to 1.

Backward compatibility:

  • Channel manager imports (HyperGuest, Channex) → default to 1

  • Hourly booking system → default to 1

  • Booking availability reduction → default to 1

Ship it. Monitor all create/update operations. Check channel manager syncs.
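The precedence between "explicit value", "stored value", and "default" is easy to get wrong, so it helps to see it as a single pure function. This is a sketch under the assumption that the caller has already loaded the last stored record; `resolveMlos` is a hypothetical name.

```javascript
// Hypothetical resolver: explicit value wins, then the stored value, then 1.
function resolveMlos(params, lastRecord) {
  // Caller supplied a value: use it as-is.
  if (params.mlos != null) return params.mlos;
  // Caller omitted it: keep what is already stored.
  if (lastRecord && lastRecord.mlos != null) return lastRecord.mlos;
  // Nothing stored yet (new record): no restriction.
  return 1;
}
```

A price-only update from a channel manager therefore never resets a hotel's 3-night minimum back to 1.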

Phase 4: Future (Not Yet Shipped)

Out of scope for initial release:

  • Filtering logic (reject bookings shorter than MLOS)

  • Frontend UI (calendar view, bulk updates)

  • Channel manager push (sync to HyperGuest)

  • Google Hotel Ads integration

Why not ship now?

  • Need production data on how hotels use MLOS

  • Need to validate edge cases (multi-night bookings, rate plans)

  • Need UI/UX design for admin panel

  • Can test with manual SQL updates first

When to ship: After 2-4 weeks of data collection. When hotels request the feature.
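For orientation, the deferred filtering check could look roughly like the sketch below. This is not shipped code: the function names are hypothetical, dates are assumed to be `YYYY-MM-DD` strings, and it applies a single MLOS value to the whole stay, whereas which night's MLOS should apply is exactly the kind of edge case the data collection is meant to settle.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse 'YYYY-MM-DD' as UTC midnight to avoid timezone and DST drift.
function toUtcDay(dateString) {
  const [year, month, day] = dateString.split('-').map(Number);
  return Date.UTC(year, month - 1, day);
}

// Number of nights between check-in and check-out.
function nightsBetween(checkIn, checkOut) {
  return Math.round((toUtcDay(checkOut) - toUtcDay(checkIn)) / MS_PER_DAY);
}

// Hypothetical filter: invalid or missing mlos is treated as "no restriction".
function meetsMinimumStay({ checkIn, checkOut, mlos }) {
  const required = Number.isInteger(mlos) && mlos >= 1 ? mlos : 1;
  return nightsBetween(checkIn, checkOut) >= required;
}
```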


Real Example 2: Strike-Through Pricing - Database Flags First

The Problem: Google Hotel Ads wants to show "member-only pricing" with public rates struck through. Complex XML format, high risk of breaking existing price sync.

The Iterative Approach:

Phase 1: Database Flag (Week 1)

ALTER TABLE hotel
ADD COLUMN is_member_rate_ghc BOOLEAN DEFAULT FALSE;

-- Enable for ONE test hotel only
UPDATE hotel SET is_member_rate_ghc = TRUE WHERE id = 2740;

Ship immediately. Existing price sync unaffected. Flag sits dormant.

Phase 2: XML Generation (Week 1)

// New function: Generate strike-through XML
async function generateStrikeThroughTransactionXml({ hotelId, checkIn, checkOut }) {
  // Safety check #1: Hotel ID whitelist (testing only)
  if (hotelId !== 2740) return [];

  // Safety check #2: Feature flag
  const hotel = await hotelModel.findOne({ where: { id: hotelId } });
  if (!hotel?.isMemberRateGhc) return [];  // Also covers "hotel not found"

  // Safety check #3: Valid data
  const pricings = await _getRoomAriFutureForGoogle(hotelId, checkIn, checkOut);
  if (!pricings?.length) return [];

  // Calculate public + member rates
  const transactions = pricings.map(price => {
    const publicRate = price.priceBeforeTax;
    const memberRate = publicRate * 0.9;  // 10% discount
    const publicTax = getCalculatedTax(publicRate);
    const memberTax = getCalculatedTax(memberRate);

    return generateTransactionXml({
      publicRate,
      publicTax,
      memberRate,
      memberTax,
      rateRuleId: 'member_discount_existence'
    });
  });

  return transactions;
}

Graceful failure strategy: Every guard returns [], and the caller (Phase 3) catches anything that still throws. Existing OTA price sync continues normally.

Ship to production. Call the function, but don't send to Google yet. Just log the XML.
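The "generate but only log" step can be made explicit with a dry-run switch. A minimal sketch, assuming the transport is injected as a `send` function; `deliverTransactions` and its options are hypothetical names, not part of the code above.

```javascript
// Hypothetical delivery step: in dry-run mode nothing leaves the process.
async function deliverTransactions(transactions, { dryRun = true, send, log = console.log }) {
  let sent = 0;
  for (const xml of transactions) {
    if (dryRun) {
      // Log the payload so it can be inspected against the expected format.
      log(`[dry-run] would send ${xml.length} chars: ${xml}`);
      continue;
    }
    await send(xml);
    sent += 1;
  }
  return { sent, skipped: transactions.length - sent };
}
```

Defaulting `dryRun` to `true` follows the same rule as feature flags: the risky behavior has to be switched on deliberately.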

Phase 3: Google Integration (Week 2)

async function sendHotelRateAmount({ hotelId, checkIn, checkOut }) {
  // Step 1: Send normal OTA messages (existing flow)
  const otaXml = getSendHotelRateAmountXml({ hotelId, checkIn, checkOut });
  await pushToGoogle('/hotel_rate_amount_notif', otaXml);

  // Step 2: Send Transaction messages (new flow, fail-safe)
  try {
    const transactions = await generateStrikeThroughTransactionXml({
      hotelId, checkIn, checkOut
    });

    if (transactions.length > 0) {
      for (const txn of transactions) {
        await axios.post(
          'https://www.google.com/travel/hotels/uploads/property_data',
          txn,
          { headers: { 'Content-Type': 'application/xml' } }
        );
      }
    }
  } catch (error) {
    console.error('Strike-through failed, continuing...', error);
    // Don't throw—let OTA sync succeed
  }
}

Critical insight: Send OTA messages first, then Transaction messages. If the Transaction call fails, hotels still get normal pricing. Google uses the newest price, so a successful Transaction message overwrites the OTA price.

Ship it. Watch logs. Check Google Hotel Center feed status.

Phase 4: Gradual Rollout (Week 3-6)

// Week 3: Remove hotel ID check, rely only on flag
if (hotelId !== 2740) return [];  // DELETE THIS

-- Week 3: Enable for 5 hotels
UPDATE hotel SET is_member_rate_ghc = TRUE WHERE id IN (2740, 3001, 3002, 3003, 3004);

-- Week 4: Monitor price accuracy scores in Google Hotel Center

-- Week 5: Enable for 50% of hotels
UPDATE hotel SET is_member_rate_ghc = TRUE WHERE MOD(id, 2) = 0;

-- Week 6: Full rollout
UPDATE hotel SET is_member_rate_ghc = TRUE WHERE ghc_enabled = TRUE;

At every step: Monitor error rates, price accuracy, Google feed status. If anything breaks: Flip the flag to FALSE. Instant rollback.
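The `MOD(id, 2) = 0` trick generalizes to any percentage. The sketch below is an assumption about how one might do it, not what we run: `isInRollout` is a hypothetical name, and it relies on hotel IDs being spread roughly evenly across the last two digits.

```javascript
// Hypothetical bucketing: a hotel is in the rollout if its bucket (0-99) is below the percentage.
function isInRollout(hotelId, percent) {
  // Unknown or malformed IDs stay out: if in doubt, do nothing.
  if (!Number.isInteger(hotelId) || hotelId < 0) return false;
  const clamped = Math.min(100, Math.max(0, percent));
  return hotelId % 100 < clamped;
}
```

Because a hotel's bucket never changes, raising the percentage only adds hotels and never removes one that was already enabled. The same selection in SQL would be a `MOD(id, 100) < 50` condition.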


Real Example 3: Multi-Vendor Support - 17 Functions, Zero Breaking Changes

The Problem: The HyperGuest channel manager has multiple sub-brands (OWNIA, etc.), each with its own API token. Existing code uses a single global token, so we need to select the token based on the hotel's chain.

Complexity:

  • 17 functions call HyperGuest API

  • Functions spread across 1,500+ line service file

  • External callers in booking, search, subscription systems

  • MongoDB and MySQL data sources

  • Cannot break existing integrations

The Challenge: Update 17 functions without breaking ANY existing code.

Phase 1: Foundation (Day 1-2)

// Helper: Fetch chain ID from hotel ID
const _getChainIdFromHotelId = async (hotelId) => {
  if (!hotelId) return null;  // Null-safe

  try {
    const hotel = await hotelModel.findOne({
      where: { id: hotelId },
      attributes: ['chainId'],
      raw: true,
    });
    return hotel?.chainId || null;  // Safe access
  } catch (error) {
    console.error('Error fetching chainId:', error);
    return null;  // Fail gracefully
  }
};

// Helper: Select correct token
const _getHyperguestToken = (chainId) => {
  if (chainId === 60 && HYPERGUEST_API_TOKEN_OWNIA) {
    return HYPERGUEST_API_TOKEN_OWNIA;
  }
  return HYPERGUEST_API_TOKEN;  // Default for all others
};

Safety guarantees:

  • null chainId → default token (existing behavior)

  • Missing environment variable → default token

  • Database error → default token

  • Invalid hotelId → default token

Ship it. No behavior changes yet. Just helper functions.

Phase 2: Update Common Functions (Day 3-4)

// Before (old signature)
async function _commonGetAPI(url, headers) {
  const token = HYPERGUEST_API_TOKEN;
  return axios.get(url, { headers: { Authorization: token } });
}

// After (backward compatible signature)
async function _commonGetAPI(url, headers, chainId = null) {
  const token = _getHyperguestToken(chainId);
  return axios.get(url, { headers: { Authorization: token } });
}

Backward compatibility:

  • Old calls: _commonGetAPI(url, headers) → chainId = null → default token

  • New calls: _commonGetAPI(url, headers, chainId) → chain-specific token

  • Zero breaking changes

Apply to 4 common functions:

  • _commonGetAPI

  • _commonPostAPI

  • _commonPostBookingAPI

  • _commonPostCancelBookingAPI

Ship it. All existing calls work unchanged.

Phase 3: Update High-Level Functions (Day 5-7)

Pattern A: Functions with hotelId parameter

// Booking creation
async function pushBookingToHyperguest(data) {
  const chainId = await _getChainIdFromHotelId(data.hotelId);
  return _commonPostBookingAPI(url, data, chainId);
}

// Search
async function checkAvailabilityInHyperguest(booking) {
  const chainId = await _getChainIdFromHotelId(booking.hotelId);
  return _commonGetAPI(url, null, chainId);
}

Pattern B: Functions with chainId parameter (external callers)

async function getRoomAvailabilityForBookingFromHyperguestRateplansV2({
  vendorHotelId,
  chainId = null,  // Optional, defaults to null
  // ... other params
}) {
  // External caller can pass chainId, or we fetch it
  if (!chainId) {
    const vendorMapping = await getVendorHotelMapping({ vendorHotelId });
    // Mapping may be missing; a null hotelId resolves to the default token
    chainId = await _getChainIdFromHotelId(vendorMapping?.hotelId);
  }

  return _commonGetAPI(url, null, chainId);
}

Pattern C: Subscription functions (MongoDB source)

async function disableSubscriptionToPushMethodolody(subscriptionId) {
  // Fetch subscription from MongoDB
  const subscription = await mongoService.getMyData(
    "hyperguest",
    "hotel_subscriptions",
    { subscriptionId }
  );

  // Get chainId from hotel
  const chainId = await _getChainIdFromHotelId(subscription?.hotelId);

  return _commonGetAPI(url, null, chainId);
}

Critical fix: Vendor mapping

// Wrong: vendorHotelId passed as hotelId
const chainId = await _getChainIdFromHotelId(hotelId);  // WRONG

// Right: Map vendorHotelId to actual hotelId first
const vendorMapping = await getVendorHotelMapping({ vendorHotelId: hotelId });
const actualHotelId = vendorMapping?.hotelId;
const chainId = await _getChainIdFromHotelId(actualHotelId);  // CORRECT

Ship in batches: 5 functions per deploy. Monitor each deploy for 24 hours.
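The vendor-mapping fix is worth isolating so that every caller gets the same null-safe behavior. Here is a minimal sketch with both lookups injected as parameters so it can run on its own; `resolveChainId` is a hypothetical name, and the two lookup functions merely mirror the shapes of `getVendorHotelMapping` and `_getChainIdFromHotelId` shown above.

```javascript
// Hypothetical resolver: vendorHotelId -> hotelId -> chainId, returning null on any gap.
async function resolveChainId({ vendorHotelId, getVendorHotelMapping, getChainIdFromHotelId }) {
  try {
    const mapping = await getVendorHotelMapping({ vendorHotelId });
    const hotelId = mapping?.hotelId ?? null;
    // No mapping: return null so the caller falls back to the default token.
    if (hotelId == null) return null;
    return (await getChainIdFromHotelId(hotelId)) ?? null;
  } catch (error) {
    console.error('chainId resolution failed, using default token:', error.message);
    return null;
  }
}
```

Returning `null` everywhere keeps the contract simple: `null` always means "existing behavior".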

Phase 4: Verification (Day 8)

17 functions updated:

  • 3 booking/cancellation functions

  • 6 search/availability functions

  • 6 subscription management functions

  • 2 static data/ARI sync functions

Verification checklist:

// Test each scenario
- chainId = null → default token ✅
- chainId = 60 → OWNIA token ✅
- chainId = 123 → default token ✅
- hotelId not found → default token ✅
- OWNIA token missing → default token ✅
- MongoDB error → default token ✅
- Database error → default token ✅
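The token-selection rows of this checklist can be turned into a table-driven check. The sketch below uses a pure variant of the selector that takes tokens as arguments instead of reading environment variables; `selectToken`, `runScenarios`, and the token strings are all illustrative. The last scenario is an extra observation: because the original comparison is strict (`===`), a chainId that arrives as the string "60" would get the default token.

```javascript
// Pure variant of the token selector, for testing without environment variables.
function selectToken(chainId, { defaultToken, owniaToken }) {
  if (chainId === 60 && owniaToken) return owniaToken;
  return defaultToken;
}

const withOwnia = { defaultToken: 'default-token', owniaToken: 'ownia-token' };
const withoutOwnia = { defaultToken: 'default-token', owniaToken: undefined };

const scenarios = [
  { name: 'chainId = null', chainId: null, tokens: withOwnia, expected: 'default-token' },
  { name: 'chainId = 60', chainId: 60, tokens: withOwnia, expected: 'ownia-token' },
  { name: 'chainId = 123', chainId: 123, tokens: withOwnia, expected: 'default-token' },
  { name: 'OWNIA token missing', chainId: 60, tokens: withoutOwnia, expected: 'default-token' },
  { name: 'chainId = "60" (string)', chainId: '60', tokens: withOwnia, expected: 'default-token' },
];

// Returns the names of scenarios whose result differs from the expectation.
function runScenarios(list) {
  return list
    .filter((s) => selectToken(s.chainId, s.tokens) !== s.expected)
    .map((s) => s.name);
}
```

An empty array from `runScenarios(scenarios)` means every row behaves as listed.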

Edge cases covered:

  • Missing HYPERGUEST_API_TOKEN_OWNIA env var

  • Database lookup failure

  • MongoDB document missing hotelId

  • Vendor mapping not found

  • Invalid hotelId values

  • Null propagation through call chain

Ship final version. Enable OWNIA token in production .env. Test with chainId = 60 hotels.


The Phased Rollout Pattern

Phase 1: Infrastructure
├─ Database schema
├─ Models
└─ Helper functions
   ↓ DEPLOY & MONITOR (24h)

Phase 2: API Layer
├─ Validation
├─ Defaults
└─ Optional parameters
   ↓ DEPLOY & MONITOR (48h)

Phase 3: Service Logic
├─ Core functionality
├─ Feature flags
└─ Graceful failures
   ↓ DEPLOY & MONITOR (1 week)

Phase 4: Integration
├─ Connect to existing flows
├─ Single hotel testing
└─ Extensive logging
   ↓ DEPLOY & MONITOR (1 week)

Phase 5: Gradual Rollout
├─ 1 hotel → 5 hotels → 50% → 100%
├─ Monitor at each step
└─ Instant rollback capability

Key insight: Each arrow is a production deploy. Each phase can be rolled back independently.


Backward Compatibility Patterns

Pattern 1: Nullable Columns

-- WRONG
ADD COLUMN mlos TINYINT NOT NULL;  -- Existing INSERTs that omit mlos fail in strict mode

-- RIGHT
ADD COLUMN mlos TINYINT NULL DEFAULT 1;  -- Safe

Pattern 2: Optional Parameters

// WRONG
function updatePricing(hotelId, mlos) {  // Breaking change
  // ...
}

// RIGHT
function updatePricing(hotelId, mlos = null) {  // Backward compatible
  if (mlos == null) mlos = 1;
  // ...
}

Pattern 3: Feature Flags

// WRONG
if (hotel.chainId === 60) {  // Always active
  useNewToken();
}

// RIGHT
if (hotel.isMemberRateGhc && hotel.chainId === 60) {  // Opt-in
  useNewToken();
}

Pattern 4: Graceful Failures

// WRONG
async function getStrikeThrough() {
  const data = await fetchData();
  return processData(data);  // Throws on error
}

// RIGHT
async function getStrikeThrough() {
  try {
    const data = await fetchData();
    if (!data) return [];  // Safe empty response
    return processData(data);
  } catch (error) {
    console.error('Strike-through failed:', error);
    return [];  // Don't break caller
  }
}

Pattern 5: Preserving Existing Values

// When updating price, preserve MLOS
if (params.mlos == null) {
  // Don't overwrite with default—keep existing value
  params.mlos = existingRecord?.mlos || 1;
}

Technology Stack Specifics

Node.js + MySQL + Sequelize

Challenge: Careless schema changes can force downtime or break code that is already running.

Solution: Make columns nullable, add defaults, use migrations.

// Migration: Add column
ALTER TABLE hotel ADD COLUMN is_member_rate_ghc BOOLEAN DEFAULT FALSE;

// Model: Define field
isMemberRateGhc: {
  type: DataTypes.BOOLEAN,
  allowNull: true,
  defaultValue: false,
  field: "is_member_rate_ghc",
}

// Application: Safe access
if (hotel?.isMemberRateGhc === true) {
  // New behavior
}

MongoDB + Mongoose

Challenge: Schemaless means no guarantees.

Solution: Always check for field existence.

// WRONG
const hotelId = subscription.hotelId;  // Breaks if missing

// RIGHT
const hotelId = subscription?.hotelId || null;
if (!hotelId) return useDefaultBehavior();

Async/Await Error Handling

Challenge: Unhandled promise rejections crash the process by default since Node.js 15.

Solution: Try-catch every database call.

async function _getChainIdFromHotelId(hotelId) {
  if (!hotelId) return null;

  try {
    const hotel = await hotelModel.findOne({ where: { id: hotelId } });
    return hotel?.chainId || null;
  } catch (error) {
    console.error('Error fetching chainId:', error);
    return null;  // Always return something
  }
}

Key Takeaways

1. Ship Infrastructure First, Logic Later

  • Database changes can go out immediately if nullable with defaults

  • Model updates can ship before API uses them

  • Helper functions can ship before callers need them

2. Feature Flags Are Your Safety Net

  • Database-level flags (not config files)

  • Default to OFF

  • Can be toggled without deploy

  • Per-hotel granularity

3. Test With One Hotel in Production

  • Hardcode hotel ID in code (yes, really)

  • Monitor for 1-2 weeks

  • Remove hardcode only after validation

  • Staging will never catch real edge cases

4. Default to Existing Behavior

  • null chainId → existing behavior

  • Missing flag → existing behavior

  • Error → existing behavior

  • Empty result → existing behavior

5. Make Every Phase Independently Rollback-able

  • Phase 1 rollback: Drop column (easy)

  • Phase 2 rollback: Revert model (easy)

  • Phase 3 rollback: Flip flag to FALSE (instant)

  • Phase 4 rollback: Remove API calls (easy)

6. Monitor Obsessively

  • Log every decision (chainId lookup, token selection)

  • Track error rates per phase

  • Monitor external APIs (Google, HyperGuest)

  • Alert on unexpected behavior

7. Velocity Through Safety

  • Ship in 2-3 day cycles

  • Never batch unrelated changes

  • Always have instant rollback

  • Monitor for 24h between deploys


Anti-Patterns to Avoid

❌ The Big Bang Release

"Let's build MLOS completely (database, API, filtering, UI, channel manager) and ship in 3 weeks."

Why it fails: 3 weeks of compounding risk, no user feedback, hard to debug.

❌ Breaking Changes for Convenience

"Old code should just update to pass the new parameter."

Why it fails: Breaks external integrations, frontend, mobile apps. Debugging hell.

❌ Staging-Only Testing

"It works in staging, ship to prod."

Why it fails: Production has different data, load, edge cases, vendor behaviors.

❌ Feature Complete or Nothing

"We can't ship MLOS without filtering logic."

Why it fails: Waiting for completeness delays feedback. Shipping infrastructure early means faster iteration and earlier learning.

❌ Hardcoded Values Forever

"Let's hardcode 10% discount."

Why it fails: A hardcoded value is fine for Phase 1, but it must become configurable by Phase 4.


Conclusion

Shipping fast in a startup isn't about cutting corners—it's about strategic iteration.

Every feature at Hotelzify follows this playbook:

  1. Ship infrastructure (safe, backward compatible)

  2. Ship core logic (feature-flagged, single hotel)

  3. Monitor and learn (production data, real edge cases)

  4. Gradually roll out (5 hotels → 50% → 100%)

  5. Iterate based on feedback (add filtering, UI, etc.)

The result?

  • Major features shipped in 1-2 weeks (vs. 1-2 months)

  • Zero production incidents from new features

  • Instant rollback capability at every phase

  • Continuous learning from real user data

In startups, you can't afford to be slow. But you also can't afford to break things.

Iterative shipping with backward compatibility is how you ship fast without breaking things.


Appendix: Phased Rollout Visualization

┌────────────────────────────────────────────────────────────┐
│  Week 1: Infrastructure                                    │
├────────────────────────────────────────────────────────────┤
│  ✓ Database schema (nullable, defaults)                   │
│  ✓ Sequelize models                                        │
│  ✓ Validation schemas (optional params)                   │
│  ✓ Helper functions                                        │
│  ✓ Feature flag column                                     │
│                                                             │
│  DEPLOY → Monitor 24h → No errors ✓                        │
└────────────────────────────────────────────────────────────┘
                          ↓
┌────────────────────────────────────────────────────────────┐
│  Week 2: Core Logic (Feature-Flagged)                     │
├────────────────────────────────────────────────────────────┤
│  ✓ Service layer (defaults, preservation logic)           │
│  ✓ API endpoints (accept new params)                      │
│  ✓ Graceful error handling                                │
│  ✓ Extensive logging                                       │
│                                                             │
│  DEPLOY → Monitor 48h → No errors ✓                        │
│  ✓ Test manually via Postman                              │
└────────────────────────────────────────────────────────────┘
                          ↓
┌────────────────────────────────────────────────────────────┐
│  Week 3: Single Hotel Testing (Production)                │
├────────────────────────────────────────────────────────────┤
│  ✓ Enable flag for hotel ID 2740 only                     │
│  ✓ Hardcode hotel ID check in code                        │
│  ✓ Monitor logs, external APIs, error rates               │
│  ✓ Validate calculations, edge cases                      │
│                                                             │
│  1 hotel → Monitor 1 week → Validate ✓                     │
└────────────────────────────────────────────────────────────┘
                          ↓
┌────────────────────────────────────────────────────────────┐
│  Week 4: Gradual Rollout                                   │
├────────────────────────────────────────────────────────────┤
│  Day 1-2:   5 hotels  → Monitor → No issues ✓             │
│  Day 3-5:   20 hotels → Monitor → No issues ✓             │
│  Day 6-10:  50% hotels → Monitor → No issues ✓            │
│  Day 11-14: 100% hotels → Monitor → Success ✓             │
│                                                             │
│  Remove hotel ID hardcode                                  │
│  Rely purely on feature flags                              │
└────────────────────────────────────────────────────────────┘
                          ↓
┌────────────────────────────────────────────────────────────┐
│  Week 5+: Iteration & Enhancement                          │
├────────────────────────────────────────────────────────────┤
│  ✓ Add filtering logic (based on production data)         │
│  ✓ Build admin UI (based on user feedback)                │
│  ✓ Channel manager integration                             │
│  ✓ Google Ads sync                                         │
│                                                             │
│  Each enhancement follows same phased approach             │
└────────────────────────────────────────────────────────────┘

Key insight: Each box is independently deployable and rollback-able.

