BookMyShow's Seat Selection Architecture: Distributed Locks, Payment Orchestration, and Zero Double-Bookings at Scale


1 Architectural Blueprint: Designing for 1 Million Daily Bookings

When you design a seat-selection and booking system that behaves like BookMyShow or Ticketmaster, the core challenge is simple to describe but brutal to implement: thousands of users will attempt to book the same handful of premium seats within the same second, and your system must pick a single winner with zero ambiguity, zero double-bookings, and zero downtime.

In this section, I’ll walk through the architectural decisions that make that possible. We’ll explore the scaling math, the domain boundaries, and the high-throughput .NET techniques that keep the booking engine fast even during the inevitable “Taylor Swift spike.”

1.1 The High-Level Context

1.1.1 Deconstructing the scale: 1M bookings/day vs. “The Taylor Swift Spike” (1M requests/minute)

A platform like BookMyShow typically handles 1 million confirmed bookings per day. Spread evenly, that’s manageable—roughly 12 bookings per second. But production traffic is never even.

During movie releases, cricket matches, or concert ticket drops, traffic explodes into a flash spike. A real observed pattern is:

  • Normal traffic: 3k–7k RPS
  • Major event (instantaneous peak): 150k–300k RPS
  • Extreme “Taylor Swift spike”: 1M+ requests per minute (~17k RPS) sustained for several minutes on end, on top of those instantaneous peaks

The real pressure comes from the seat map endpoint and the lock seat endpoint. Almost every customer:

  1. Loads the seat map
  2. Requests seat availability
  3. Attempts to lock 1–6 seats almost simultaneously

The architectural principle you learn quickly is:

Traffic for viewing seats is 100x higher than traffic for booking seats.

This is why the read model, real-time cache, and distributed locks dominate the architecture instead of the SQL database.

1.1.2 Monolith vs. Microservices: Why we choose Domain-Driven Design

When teams first design booking platforms, many default to microservices prematurely. But a high-performance seat-selection system benefits from domain-driven decomposition, not arbitrary service slicing.

We split the platform into three strategic bounded contexts:

  • Booking Context: seat locks, booking confirmation, reservation expiry, ticket generation.
  • Inventory Context: venues, showtimes, seat layouts, pricing slabs, held/available/booked states.
  • Payments Context: payment initiation, webhooks, failure handling, reconciliation.

The rule I follow is:

  • Cross-context communication must be asynchronous (RabbitMQ, Kafka).
  • Intra-context operations stay strongly consistent (SQL + Redis atomic operations).

This avoids the worst anti-pattern: a “distributed monolith” where booking requires synchronous calls across services.
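To make the cross-context rule concrete, here is a minimal sketch of an integration event crossing the Booking → Inventory boundary. The event name, its fields, and the service shape are illustrative assumptions; only MassTransit's `IPublishEndpoint` (which the article adopts later for the Saga) is a real API.

```csharp
// Illustrative integration event; the field set is an assumption for this sketch.
public record SeatsLockedEvent(
    Guid BookingId, Guid ShowId, IReadOnlyList<string> SeatIds, DateTime LockedAtUtc);

public class BookingService
{
    private readonly IPublishEndpoint _publishEndpoint; // MassTransit abstraction over RabbitMQ

    public BookingService(IPublishEndpoint publishEndpoint)
        => _publishEndpoint = publishEndpoint;

    public async Task OnSeatsLockedAsync(Guid bookingId, Guid showId, IReadOnlyList<string> seatIds)
    {
        // Fire-and-forget across the context boundary: Inventory consumes this
        // asynchronously, so Booking never blocks on another service.
        await _publishEndpoint.Publish(
            new SeatsLockedEvent(bookingId, showId, seatIds, DateTime.UtcNow));
    }
}
```

Because the publish is asynchronous, a slow Inventory consumer degrades only its own queue depth, never the booking request path.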

1.1.3 The Tech Stack: .NET 8, Minimal APIs, and Kestrel Tuning

To push throughput during spikes, the system uses:

  • .NET 8 Minimal APIs for ultra-fast routing and low allocation

  • Kestrel configured for high concurrency:

    {
      "Kestrel": {
        "Limits": {
          "MaxConcurrentConnections": 100000,
          "MaxConcurrentUpgradedConnections": 100000
        },
        "EndpointDefaults": {
          "Protocols": "Http1AndHttp2"
        }
      }
    }
  • Redis (clustered) for lock management

  • PostgreSQL for transactional writes

  • MongoDB for storing denormalized seat maps

Minimal APIs shave roughly 20–40% of per-request overhead compared to the full MVC pipeline. Multiplied across hundreds of thousands of requests, that saving becomes operationally significant.

1.2 Data Modeling for Performance

In practice, the booking domain uses CQRS to break the impossible problem—fast reads + strongly consistent writes—into two manageable halves.

1.2.1 CQRS Pattern Implementation

The CQRS split is strict:

  • Commands (Booking, LockSeat, ReleaseSeat)

    • Only performed against write DB + Redis
    • Require strong consistency
    • Go through validation + domain rules
  • Queries (Seat map, showtime info)

    • Served entirely from read DB (Mongo/Cosmos)
    • Tuned for extremely high throughput

This prevents the read layer from competing for the same resources as the write layer.
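The strict split above can be expressed directly in code. This is a hand-rolled sketch (teams often use a library such as MediatR instead); the command, query, and interface names are assumptions:

```csharp
// Commands mutate state and travel only through the write path (SQL + Redis).
public record LockSeatCommand(Guid ShowId, string SeatId, Guid UserId);
public record ConfirmBookingCommand(Guid BookingId);

// Queries read the denormalized store and never touch the write path.
public record GetSeatMapQuery(Guid ShowId);

public interface ICommandHandler<TCommand>
{
    Task HandleAsync(TCommand command, CancellationToken ct);
}

public interface IQueryHandler<TQuery, TResult>
{
    Task<TResult> HandleAsync(TQuery query, CancellationToken ct);
}
```

Keeping the two interfaces separate makes it impossible to accidentally register a query handler against the write database.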

1.2.2 Write Model: Relational Database for transactional integrity

Seat-level writes must be durable and serialized. PostgreSQL or SQL Server works well because:

  • ACID guarantees
  • Row-level conflict detection
  • Mature transaction isolation levels
  • Indexing optimized for write patterns

A typical Booking table:

CREATE TABLE bookings (
    booking_id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    show_id UUID NOT NULL,
    seat_ids TEXT[] NOT NULL,
    status VARCHAR(20) NOT NULL,
    created_at TIMESTAMP NOT NULL,
    version INT NOT NULL
);

Note the version field—used for optimistic concurrency later.

1.2.3 Read Model: NoSQL for fast seat map rendering

Seat layouts can be large—300 to 1,000 seats per auditorium—and we commonly fetch them 10,000+ times per minute.

MongoDB/CosmosDB works because:

  • Reads rarely require joins
  • The entire seat map is stored as a single JSON document:
{
  "auditoriumId": "123",
  "showId": "456",
  "seats": [
    { "id": "A1", "price": 350, "status": "Available" },
    ...
  ]
}

This keeps seat map rendering fast (< 20 ms) even under load.
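Reading that document from .NET is a single indexed lookup. A minimal sketch using the official MongoDB.Driver package; the collection name `seatMaps` and the POCO shape mirror the JSON above but are assumptions:

```csharp
using MongoDB.Driver;

public class SeatMapReader
{
    private readonly IMongoCollection<SeatMapDocument> _seatMaps;

    public SeatMapReader(IMongoDatabase db)
        => _seatMaps = db.GetCollection<SeatMapDocument>("seatMaps");

    // One indexed lookup returns the whole seat map -- no joins, no fan-out.
    public Task<SeatMapDocument> GetAsync(string showId)
        => _seatMaps.Find(d => d.ShowId == showId).FirstOrDefaultAsync();
}

public class SeatMapDocument
{
    public string AuditoriumId { get; set; }
    public string ShowId { get; set; }
    public List<Seat> Seats { get; set; }
}

public class Seat
{
    public string Id { get; set; }
    public decimal Price { get; set; }
    public string Status { get; set; }
}
```

An index on `ShowId` keeps the lookup O(1) regardless of how many shows the collection holds.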

1.2.4 Real-time State: Redis for ephemeral seat states

Redis represents seat states as:

seat:{showId}:{seatId} -> Available | Locked | Booked

It also stores lock ownership under a TTL:

seat-lock:{showId}:{seatId} -> {userId} [expires in 10 minutes]

Redis offers:

  • Sub-millisecond access
  • Atomic Lua scripting
  • TTL-based auto-expiry
  • High availability in cluster mode

This is the system’s source of truth for seat locking.

1.3 The “Zero Double-Booking” Guarantee

1.3.1 Consistency boundaries

In a distributed architecture:

  • Seat locking requires strong consistency
  • Seat map rendering accepts eventual consistency

The only operations requiring strict guarantees are:

  • Acquiring a seat lock
  • Converting the lock to a reservation

If Redis says “A1 is locked”, no other service is allowed to challenge that.

1.3.2 Identifying race conditions

The nightmare scenario:

  1. User A clicks A1 at timestamp 10:00:00.100
  2. User B clicks A1 at timestamp 10:00:00.100

Without a strong lock, both could read the seat as available.

Race types we guard against:

  • Double read (both see seat as available)
  • Lock overwrite (lock A1 twice)
  • Lock timeout contention
  • Database write conflict

Redis Lua scripts + versioning close off each of these races by making every check-and-set step atomic.


2 The Front Line: Handling Flash Sales and Virtual Queues

During a high-demand ticket drop, traffic behaves like a tidal wave. If you don’t dampen it before it reaches your core services, everything collapses: database, Redis, CPU, even the load-balancer layer.

2.1 The “Thundering Herd” Problem

2.1.1 Why standard auto-scaling fails during flash sales

Kubernetes auto-scaling reacts after load increases. When a million people refresh the page at 8:00 AM:

  • Pods spike from 20 → 200
  • Scale-up takes 1–2 minutes
  • During those 2 minutes, your API is effectively down

This is why you must shape the traffic before it enters the cluster.

2.1.2 Using YARP as the ingress gateway

YARP (Yet Another Reverse Proxy) is a .NET reverse proxy that excels at high-throughput, programmable routing.

We deploy YARP as:

  • Layer-7 load distributor
  • Rate-limiter
  • Token-verifier
  • Queue position distributor

A basic YARP config:

{
  "ReverseProxy": {
    "Routes": [
      {
        "RouteId": "booking",
        "ClusterId": "booking-api",
        "Match": { "Path": "/api/booking/{**catch-all}" }
      }
    ],
    "Clusters": {
      "booking-api": {
        "Destinations": {
          "d1": { "Address": "http://booking-service" }
        }
      }
    }
  }
}

YARP lets you implement pre-routing logic to throttle, reject, or queue users before they hit the booking service.
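One way to wire that pre-routing logic is ordinary ASP.NET Core middleware registered before `MapReverseProxy`. In this sketch, the `/api/booking` prefix matches the route above, but the `X-Queue-Token` header name and the bare presence check are assumptions; a real gateway would validate the token's signature:

```csharp
var app = builder.Build();

// Runs before the proxy: booking traffic without a queue token never
// reaches the booking service.
app.Use(async (context, next) =>
{
    if (context.Request.Path.StartsWithSegments("/api/booking"))
    {
        var token = context.Request.Headers["X-Queue-Token"].ToString();
        if (string.IsNullOrEmpty(token))
        {
            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
            await context.Response.WriteAsync("Join the queue first.");
            return;
        }
    }

    await next();
});

app.MapReverseProxy();
app.Run();
```

Because the check happens in the gateway process, rejected requests cost a few microseconds instead of a database round trip.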

2.2 Building the Virtual Waiting Room

2.2.1 Strategy: Offloading traffic before it hits the database

The waiting room sits in front of everything:

Client -> YARP -> Virtual Queue -> Booking API

Users who exceed rate limits receive a queue token and retry after a delay. Only a controlled number of requests reach the core.

2.2.2 Rate Limiting + Redis Counting Semaphores

ASP.NET Core 8’s rate limiting middleware supports partitioned rate limits.

Example configuration:

builder.Services.AddRateLimiter(options =>
{
    options.AddConcurrencyLimiter("seat-selection", limiter =>
    {
        limiter.PermitLimit = 5000;
        limiter.QueueLimit = 100000;
        limiter.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    });
});

For multi-node consistency, we back the permit limits with Redis counters:

INCR queue:permits
EXPIRE queue:permits 5

This avoids per-node drift.
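A sketch of that Redis-backed gate from .NET, using StackExchange.Redis. The window length (5 seconds), permit budget, and key naming are assumptions for illustration:

```csharp
using StackExchange.Redis;

public class RedisPermitGate
{
    private readonly IDatabase _redis;
    private const int PermitsPerWindow = 5000; // assumed budget shared by ALL nodes

    public RedisPermitGate(IConnectionMultiplexer mux) => _redis = mux.GetDatabase();

    public async Task<bool> TryAcquireAsync()
    {
        // One counter per 5-second window; every node increments the same key,
        // so the limit holds cluster-wide instead of drifting per node.
        var window = DateTimeOffset.UtcNow.ToUnixTimeSeconds() / 5;
        var windowKey = $"queue:permits:{window}";

        var count = await _redis.StringIncrementAsync(windowKey);
        if (count == 1)
            await _redis.KeyExpireAsync(windowKey, TimeSpan.FromSeconds(10)); // outlive the window

        return count <= PermitsPerWindow;
    }
}
```

INCR is atomic, so concurrent nodes can never double-spend a permit; at worst a request lands at the boundary between two windows.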

2.2.3 FIFO Queue with JWT Queue Tokens

Once a user is accepted into the queue, they get a signed token:

{
  "pos": 12345,
  "exp": 1712345678,
  "hash": "sha256(...)"
}

This prevents:

  • Spoofing queue positions
  • Bypassing the queue
  • Bot acceleration attempts

The booking API reads the token and enforces queue ordering.
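A minimal issue/validate sketch for such a token using an HMAC over `position.expiry`. The dot-delimited layout and helper names are assumptions; production systems would typically use a standard JWT library and rotate the signing key:

```csharp
using System.Security.Cryptography;
using System.Text;

public static class QueueToken
{
    public static string Issue(long position, DateTimeOffset expiry, byte[] secret)
    {
        var payload = $"{position}.{expiry.ToUnixTimeSeconds()}";
        return $"{payload}.{Sign(payload, secret)}";
    }

    public static bool TryValidate(string token, byte[] secret, out long position)
    {
        position = 0;
        var parts = token.Split('.');
        if (parts.Length != 3) return false;

        // Constant-time comparison prevents signature-guessing via timing.
        var payload = $"{parts[0]}.{parts[1]}";
        if (!CryptographicOperations.FixedTimeEquals(
                Convert.FromBase64String(parts[2]),
                Convert.FromBase64String(Sign(payload, secret))))
            return false;

        if (DateTimeOffset.FromUnixTimeSeconds(long.Parse(parts[1])) < DateTimeOffset.UtcNow)
            return false; // expired

        return long.TryParse(parts[0], out position);
    }

    private static string Sign(string payload, byte[] secret)
    {
        using var hmac = new HMACSHA256(secret);
        return Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(payload)));
    }
}
```

Because the signature covers both position and expiry, neither field can be edited client-side without invalidating the token.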

2.3 Token Bucket Algorithm for Traffic Shaping

2.3.1 Why Token Bucket

Token Bucket smooths incoming traffic by allowing short bursts but enforcing a long-term steady rate:

  • bucketSize = 10,000
  • refillRate = 500 tokens/second

Even under spikes, only 500 requests/second reach the booking backend.

2.3.2 Custom .NET Middleware Example

public class TokenBucketMiddleware
{
    private readonly RequestDelegate _next;
    private long _tokens;
    private const long MaxTokens = 10000;
    private const long RefillRate = 500; // tokens per second
    private long _lastRefillTicks = DateTime.UtcNow.Ticks;
    private readonly object _refillLock = new();

    public TokenBucketMiddleware(RequestDelegate next)
    {
        _next = next;
        _tokens = MaxTokens;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        Refill();

        // Take a token atomically; a separate "check then decrement" would let
        // two requests spend the same token under concurrency.
        if (Interlocked.Decrement(ref _tokens) >= 0)
        {
            await _next(context);
        }
        else
        {
            Interlocked.Increment(ref _tokens); // undo the overdraft
            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        }
    }

    private void Refill()
    {
        lock (_refillLock)
        {
            var now = DateTime.UtcNow;
            var elapsed = (now - new DateTime(_lastRefillTicks, DateTimeKind.Utc)).TotalSeconds;
            var tokensToAdd = (long)(elapsed * RefillRate);

            if (tokensToAdd > 0)
            {
                _tokens = Math.Min(MaxTokens, Interlocked.Read(ref _tokens) + tokensToAdd);
                _lastRefillTicks = now.Ticks;
            }
        }
    }
}

This runs extremely fast and keeps your downstream system alive.


3 Core Mechanics: Distributed Locking with Redis

Seat locking is the beating heart of a booking platform. If two people lock the same seat, nothing else matters—you’ve failed the core contract.

3.1 The Lifecycle of a Seat Lock

3.1.1 The State Machine

Seats flow through:

  1. Available
  2. Locked (TTL: 10 minutes)
  3. Reserved
  4. Booked
  5. Or fallback to Available if the lock expires

Redis stores the temporary states; PostgreSQL stores only the final booking state.
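Encoding the legal moves explicitly keeps every service honest about the lifecycle. A small sketch (the enum and table names are assumptions, not a prescribed API):

```csharp
public enum SeatState { Available, Locked, Reserved, Booked }

public static class SeatTransitions
{
    // Legal moves in the lifecycle above; everything else is rejected.
    private static readonly Dictionary<SeatState, SeatState[]> Allowed = new()
    {
        [SeatState.Available] = new[] { SeatState.Locked },
        [SeatState.Locked]    = new[] { SeatState.Reserved, SeatState.Available }, // TTL expiry falls back
        [SeatState.Reserved]  = new[] { SeatState.Booked, SeatState.Available },   // payment failure falls back
        [SeatState.Booked]    = Array.Empty<SeatState>(),                          // terminal
    };

    public static bool CanTransition(SeatState from, SeatState to)
        => Allowed[from].Contains(to);
}
```

Any code path that tries, say, `Available → Booked` directly fails this check, which tends to surface workflow bugs long before they corrupt inventory.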

3.1.2 Why database row-level locking kills performance

If you try:

SELECT * FROM seats WHERE seat_id = 'A1' FOR UPDATE;

…during a traffic spike, you will:

  • Serialize concurrent seat locks
  • Cause long transactions
  • Drag down throughput
  • Risk deadlocks
  • Turn the DB into the bottleneck

This is why locking must occur in Redis, not SQL.

3.2 Implementing the RedLock Algorithm

3.2.1 Using RedLock.net / StackExchange.Redis

RedLock ensures:

  • Majority quorum
  • Atomicity
  • Auto-expiry
  • Protection against split-brain

Example:

var redlockFactory = RedLockFactory.Create(new List<RedLockMultiplexer>
{
    new RedLockMultiplexer(connection1),
    new RedLockMultiplexer(connection2),
    new RedLockMultiplexer(connection3)
});

using var redLock = await redlockFactory.CreateLockAsync(
    resource: $"seat-lock:{showId}:{seatId}",
    expiryTime: TimeSpan.FromMinutes(10)
);

if (!redLock.IsAcquired)
{
    return SeatLockResult.Failed();
}

3.2.2 Atomic Lua Script for Locking

Redis Lua scripts ensure read-modify-write atomicity:

local seatKey = KEYS[1]
local user = ARGV[1]
local ttl = ARGV[2]

if redis.call("GET", seatKey) == false then
    redis.call("SET", seatKey, user, "PX", ttl)
    return 1
else
    return 0
end

Run it from .NET:

var result = (int)await db.ScriptEvaluateAsync(lua, keys, values);

3.2.3 Handling “Zombie Locks”

Zombie locks occur when:

  • Client disconnected mid-payment
  • App crashed
  • Network partition

Redis TTLs prevent indefinite blocking. For active payments, we extend locks:

await db.KeyExpireAsync(key, TimeSpan.FromMinutes(5));
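A bare `KeyExpireAsync` extends whichever lock currently lives at that key, even if ours expired and someone else re-acquired it. A safer extension, sketched here against the article's key scheme, verifies ownership inside a Lua script first:

```csharp
using StackExchange.Redis;

public async Task<bool> TryExtendLockAsync(
    IDatabase db, string lockKey, string userId, TimeSpan extension)
{
    // Only the owner may extend: GET and PEXPIRE run atomically in one script,
    // so a competing lock holder can never have their TTL bumped by us.
    const string lua = @"if redis.call('GET', KEYS[1]) == ARGV[1] then
                             return redis.call('PEXPIRE', KEYS[1], ARGV[2])
                         else
                             return 0
                         end";

    var result = (int)await db.ScriptEvaluateAsync(lua,
        new RedisKey[] { lockKey },
        new RedisValue[] { userId, (long)extension.TotalMilliseconds });

    return result == 1;
}
```

A `false` return means the lock was lost mid-payment, which the caller should treat as a failed booking attempt rather than silently continuing.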

3.3 Optimistic Concurrency Control

3.3.1 Using Version Columns

When converting Redis lock → SQL reservation:

UPDATE bookings
SET status = 'Reserved', version = version + 1
WHERE booking_id = @id AND version = @version;

If the row was modified elsewhere, the update fails.
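From .NET, the same statement can be issued and verified in one round trip. This sketch assumes Npgsql as the PostgreSQL driver; zero affected rows signals a concurrent modification:

```csharp
using Npgsql;

public async Task<bool> TryReserveAsync(
    NpgsqlConnection conn, Guid bookingId, int expectedVersion)
{
    const string sql = @"UPDATE bookings
                         SET status = 'Reserved', version = version + 1
                         WHERE booking_id = @id AND version = @version";

    await using var cmd = new NpgsqlCommand(sql, conn);
    cmd.Parameters.AddWithValue("id", bookingId);
    cmd.Parameters.AddWithValue("version", expectedVersion);

    // 1 row: we won the race. 0 rows: someone else changed the booking first,
    // so the caller should re-read and decide whether to retry or abort.
    var rows = await cmd.ExecuteNonQueryAsync();
    return rows == 1;
}
```

This pattern costs nothing in the happy path and only pays a retry when a genuine conflict occurred.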

3.3.2 “TryLock” Pattern in .NET

public async Task<bool> TryLockSeat(string showId, string seatId, string userId)
{
    // Atomic check-and-set: succeeds only when no lock exists for this seat.
    var lua = @"if redis.call('GET', KEYS[1]) == false then
                    redis.call('SET', KEYS[1], ARGV[1], 'PX', ARGV[2])
                    return 1
                else
                    return 0
                end";

    var result = (int)await _redis.ScriptEvaluateAsync(lua,
        new RedisKey[] { $"seat-lock:{showId}:{seatId}" },
        new RedisValue[] { userId, 600_000 }); // 10-minute TTL in milliseconds

    return result == 1;
}

This gives us a clean, atomic locking API.


4 Payment Orchestration: The Saga Pattern

When a seat lock transitions into a real booking, the system must coordinate multiple services—Inventory, Payments, Booking Confirmation, Notifications—without relying on a central transaction coordinator. We already have Redis ensuring strong seat-level consistency, but payment introduces uncertainty: timeouts, bank failures, third-party issues, or delayed webhooks. The orchestration logic must tolerate all of these while ensuring the seat never ends up in a ghosted, half-reserved state. This is where a Saga becomes the backbone of the workflow.

4.1 Distributed Transactions in a Microservices World

4.1.1 Why Two-Phase Commit (2PC) is an anti-pattern here

Two-Phase Commit seems attractive on paper. It promises atomic updates across services, and legacy enterprise systems still rely on it. But in a modern booking system, 2PC introduces problems that hit quickly at scale:

  1. Slow coordinator = slow bookings. 2PC requires all participating services to pause until a global coordinator issues decisions. During a peak event, these pauses compound and throttle throughput.

  2. Failure handling is painful. A coordinator failure puts participants into limbo. Recovery is slow and often manual, which is unacceptable when thousands of seats are being locked every minute.

  3. Heterogeneous systems hate 2PC. Payments use third-party APIs. Inventory uses Redis + SQL. Booking uses PostgreSQL. Most participants cannot (and should not) support XA transactions.

  4. Locking cost is prohibitive. 2PC requires resources to be locked until the commit phase completes. This destroys our latency goals.

In practice, a high-traffic booking platform can’t afford global locks or blocking behavior. The booking pipeline needs asynchronous, compensating flows rather than a transactional monolith.

4.1.2 The Saga Pattern (Orchestration over Choreography)

A Saga approaches distributed consistency by breaking the booking journey into discrete steps. Each step emits events or commands, and the orchestrator manages compensation when something goes wrong.

We choose orchestration because:

  • The booking workflow is linear and business-driven
  • Payments have strict rules and well-defined fallback paths
  • Auditing and visibility matter
  • Operations teams need a single place to inspect failures

The orchestrator acts as the conductor:

  1. OrderCreated
  2. Reserve Inventory
  3. Initiate Payment
  4. Confirm Payment
  5. Complete Order

If any step fails, the Saga triggers reversal logic—releasing seats, rolling back partial reservations, and closing the order gracefully.

4.2 Implementing Sagas with MassTransit

MassTransit gives us a declarative way to build Sagas using state machines, correlation IDs, and RabbitMQ for messaging. Each transition aligns with a domain event.

4.2.1 Setting up the State Machine

A typical Saga state machine for our booking flow might look like this:

public class BookingState : SagaStateMachineInstance
{
    public Guid CorrelationId { get; set; }
    public string CurrentState { get; set; }
    public Guid UserId { get; set; }
    public Guid ShowId { get; set; }
    public List<string> SeatIds { get; set; }
    public DateTime CreatedAt { get; set; }
}

The state machine definition:

public class BookingStateMachine : MassTransitStateMachine<BookingState>
{
    public State InventoryReserved { get; private set; }
    public State PaymentInitiated { get; private set; }
    public State PaymentConfirmed { get; private set; }

    public Event<OrderCreated> OrderCreated { get; private set; }
    public Event<InventoryReservedEvent> InventoryReservedEvent { get; private set; }
    public Event<PaymentSuccess> PaymentSuccess { get; private set; }
    public Event<PaymentFailed> PaymentFailed { get; private set; }

    public BookingStateMachine()
    {
        InstanceState(x => x.CurrentState);

        Event(() => OrderCreated, e => e.CorrelateById(context => context.Message.OrderId));

        Initially(
            When(OrderCreated)
                .Then(context =>
                {
                    context.Instance.ShowId = context.Data.ShowId;
                    context.Instance.SeatIds = context.Data.SeatIds;
                })
                .Publish(context => new ReserveInventory(context.Instance.CorrelationId, context.Instance.ShowId, context.Instance.SeatIds))
                .TransitionTo(InventoryReserved)
        );

        During(InventoryReserved,
            When(InventoryReservedEvent)
                .Publish(ctx => new InitiatePayment(ctx.Instance.CorrelationId))
                .TransitionTo(PaymentInitiated)
        );

        During(PaymentInitiated,
            When(PaymentSuccess)
                .Publish(ctx => new ConfirmOrder(ctx.Instance.CorrelationId))
                .TransitionTo(PaymentConfirmed),

            When(PaymentFailed)
                .Publish(ctx => new ReleaseInventory(ctx.Instance.CorrelationId))
                .Finalize()
        );

        SetCompletedWhenFinalized();
    }
}

Each step is asynchronous and durable because RabbitMQ persists messages.

4.2.2 Integrating RabbitMQ as the durable message bus

MassTransit handles the connection layer, so setup is simple:

builder.Services.AddMassTransit(x =>
{
    x.AddSagaStateMachine<BookingStateMachine, BookingState>()
        .EntityFrameworkRepository(r =>
        {
            r.ConcurrencyMode = ConcurrencyMode.Optimistic;
            r.ExistingDbContext<BookingDbContext>();
        });

    x.UsingRabbitMq((context, cfg) =>
    {
        cfg.Host("rabbitmq", "/", h =>
        {
            h.Username("guest");
            h.Password("guest");
        });
        cfg.ConfigureEndpoints(context);
    });
});

RabbitMQ gives:

  • Guaranteed message delivery
  • Replay in case of transient failure
  • Isolation between steps
  • Observability through queues and DLQs

This reliability layer is essential for handling flaky payment gateways or service outages gracefully.

4.3 Compensating Transactions (Rollbacks)

Sagas shine when things go wrong, which is common in payments. The system must recover without corrupting seat inventory or risking ghost reservations.

4.3.1 Scenario: Payment Gateway timeout or Insufficient Funds

Two common failure paths:

  1. Timeout. The payment gateway doesn’t respond. The user might retry, abandon the flow, or wait for a webhook that will never come.

  2. Hard failure. Insufficient funds, card declined, or risk scoring rejection.

Both cases must:

  • Close the order
  • Release seat locks
  • Notify the user
  • Log the attempt for possible fraud detection

4.3.2 Triggering ReleaseSeatCommand

When a payment fails, the Saga fires compensation commands:

public record ReleaseInventory(Guid OrderId);

The handler releases the Redis locks and updates SQL:

public class ReleaseInventoryHandler : IConsumer<ReleaseInventory>
{
    private readonly ISeatLockService _seatLockService;
    private readonly BookingDbContext _db;

    public ReleaseInventoryHandler(ISeatLockService seatLockService, BookingDbContext db)
    {
        _seatLockService = seatLockService;
        _db = db;
    }

    public async Task Consume(ConsumeContext<ReleaseInventory> context)
    {
        var order = await _db.Bookings.FindAsync(context.Message.OrderId);
        if (order is null || order.Status == "Cancelled")
            return; // already compensated -- keep the handler idempotent on redelivery

        foreach (var seatId in order.SeatIds)
            await _seatLockService.ReleaseAsync(order.ShowId, seatId);

        order.Status = "Cancelled";
        await _db.SaveChangesAsync();
    }
}

This ensures seats become available instantly for other customers.

4.3.3 Handling Idempotency for Payment Webhooks

Payment providers are notorious for sending duplicate webhooks. Without idempotency:

  • You might confirm an order twice
  • You might release inventory twice
  • Competing events might corrupt state

The standard pattern is:

public async Task<bool> HandlePaymentWebhook(string eventId, PaymentData data)
{
    // Atomic set-if-absent: a separate exists-check followed by a set would let
    // two concurrent deliveries of the same event both slip past the guard.
    var firstDelivery = await _cache.SetIfNotExistsAsync(
        $"webhook:{eventId}", "1", TimeSpan.FromHours(3));

    if (!firstDelivery)
        return false; // duplicate

    await _mediator.Send(new ProcessPayment(data));
    return true;
}

This small guard prevents massive operational incidents.


5 Real-Time Synchronization: SignalR and Inventory

After the locking and payment workflows are stable, the next challenge is presenting real-time seat updates to thousands of concurrent viewers. High-demand events often have 20k–50k users looking at the same seat map simultaneously. If updates lag, users feel like seats “vanish” randomly.

5.1 The “Disappearing Seat” Experience

5.1.1 Real-time updates without page refreshes

Seat states change constantly—every lock, release, expiration, or booking triggers a broadcast. If the front end relies on polling, the UI lags 2–10 seconds behind reality. That gap is enough to frustrate customers, especially when they repeatedly choose seats that are unavailable.

SignalR solves this by pushing incremental updates:

  • “Seat A12 → Locked”
  • “Seat C9 → Available”
  • “Row B is now sold out”

Clients update the UI almost instantly. The server only sends the changed items, not the full seat map.

5.1.2 SignalR with Redis Backplane for horizontal scaling

SignalR supports scale-out by using Redis as a message backplane so that:

  • User A connected to Pod #1
  • User B connected to Pod #9

…both receive the same updates.

Configuration example:

builder.Services.AddSignalR()
    .AddStackExchangeRedis("redis:6379", options =>
    {
        options.Configuration.ChannelPrefix = "seat-updates";
    });

Publishing updates is straightforward:

await _hubContext.Clients
    .Group($"auditorium:{showId}")
    .SendAsync("SeatUpdated", seatId, newStatus);

The Redis channel handles fan-out across all nodes.

5.2 Optimization Strategies

Large fan-out systems require careful tuning to avoid redundant broadcasts or bloated payloads.

5.2.1 Group Management for Auditorium Scoping

Each auditorium maintains its own group:

public override async Task OnConnectedAsync()
{
    var showId = Context.GetHttpContext().Request.Query["showId"];
    await Groups.AddToGroupAsync(Context.ConnectionId, $"auditorium:{showId}");
}

This prevents unnecessary cross-auditorium traffic and reduces message volume dramatically.

5.2.2 MessagePack over JSON

MessagePack shrinks payloads by 60–80%, allowing more updates to fit within WebSocket frames:

builder.Services.AddSignalR()
    .AddMessagePackProtocol();

Clients deserialize automatically with the correct library.

5.2.3 Debouncing updates

Seat updates often occur in bursts—locking 4 adjacent seats, releasing several seats after timeouts, or applying dynamic pricing. Broadcasting each change immediately creates noise. Debouncing batches updates:

public class SeatUpdateBuffer
{
    private readonly IHubContext<SeatHub> _hubContext; // SeatHub: the app's seat-map hub
    private readonly ConcurrentDictionary<string, SeatUpdate> _latest = new();

    public SeatUpdateBuffer(IHubContext<SeatHub> hubContext)
    {
        _hubContext = hubContext;
        _ = ProcessAsync();
    }

    // Later updates for the same seat overwrite earlier ones -- only the
    // final state in each window is broadcast.
    public void AddUpdate(SeatUpdate update)
        => _latest[update.SeatId] = update;

    private async Task ProcessAsync()
    {
        while (true)
        {
            await Task.Delay(500);

            if (_latest.IsEmpty)
                continue;

            var updates = _latest.Values.ToList();
            _latest.Clear();

            // broadcast one batch per show
            foreach (var group in updates.GroupBy(u => u.ShowId))
            {
                await _hubContext.Clients
                    .Group($"auditorium:{group.Key}")
                    .SendAsync("SeatBatchUpdate", group.ToList());
            }
        }
    }
}

This reduces network chatter dramatically.


6 Intelligence Layer: Dynamic Pricing and Fraud Detection

Once the real-time flow is stable, the next layer involves making smarter decisions on pricing and user behavior. Dynamic pricing aligns with demand spikes, and fraud detection protects inventory from scalpers, bots, and automated scripts that attempt to pressure the system.

6.1 Dynamic Pricing Engine

6.1.1 Demand signals and velocity metrics

Dynamic pricing depends on real-time telemetry:

  • Look-to-book ratio
  • Frequency of seat map refreshes
  • Velocity of seat locking
  • Abandon rate on payments
  • Section-level heat maps

Redis is ideal for storing live counters:

INCR show:{showId}:views
INCR show:{showId}:locks
INCR show:{showId}:payments:success

Price adjustments are usually applied per slab or per seating category rather than per individual seat.

6.1.2 Background worker for pricing calculation

A HostedService periodically recalculates pricing multipliers:

public class PricingWorker : BackgroundService
{
    private readonly IRedisClient _redis;
    private readonly IPricingService _pricing;

    public PricingWorker(IRedisClient redis, IPricingService pricing)
    {
        _redis = redis;
        _pricing = pricing;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var shows = await _pricing.GetActiveShowsAsync();

            foreach (var show in shows)
            {
                var metrics = await _pricing.GetMetricsAsync(show.Id);
                var multiplier = _pricing.CalculateMultiplier(metrics);

                await _redis.SetAsync($"pricing:{show.Id}", multiplier, TimeSpan.FromMinutes(5));
            }

            await Task.Delay(TimeSpan.FromSeconds(15), stoppingToken);
        }
    }
}

Multipliers are cached client-side and applied instantly when rendering seat prices.
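The worker above calls `CalculateMultiplier` without showing it. A hedged sketch of what such a heuristic might look like; every threshold, weight, and the metrics record are illustrative assumptions, not a tuned model:

```csharp
public record PricingMetrics(
    decimal LockVelocity,       // seat locks per minute
    decimal LookToBook,         // seat-map views per confirmed booking
    decimal PaymentAbandonRate  // 0..1
);

public static class PricingHeuristics
{
    public static decimal CalculateMultiplier(PricingMetrics m)
    {
        var multiplier = 1.0m;

        if (m.LockVelocity > 100) multiplier += 0.15m;        // seats going fast
        if (m.LookToBook < 20)    multiplier += 0.10m;        // unusually serious buyers
        if (m.PaymentAbandonRate > 0.5m) multiplier -= 0.05m; // price resistance

        // Bounded adjustments avoid wild swings between recalculation cycles.
        return Math.Clamp(multiplier, 0.8m, 1.5m);
    }
}
```

Clamping the output matters operationally: a runaway metric should never multiply prices unboundedly within a single 15-second cycle.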

6.2 The Anti-Scalper Shield

Scalpers and bots degrade the booking experience and distort demand. Detecting them requires behavioral analysis and device-level fingerprinting.

6.2.1 Identifying bot patterns

Bots exhibit recognizable patterns:

  • Extremely high request frequency
  • Linear seat scans (A1 → A2 → A3 → …)
  • Repeated lock attempts with no payment initiation
  • No UI events such as scroll or pointer movement

Tracking these requires combining:

  • Request logs
  • Rate limiting interruptions
  • Failed seat locks
  • Suspicious payment failures

6.2.2 Fingerprinting with device hashes and IP reputation

When the user loads the seat map, the system assigns a fingerprint:

  • Browser entropy
  • Canvas/Font fingerprinting
  • Timezone + language patterns
  • IP + ASN reputation (VPN detection)

Stored in Redis:

fp:{hash} -> riskScore

High-risk scores trigger rate penalties or additional verifications.

6.2.3 CAPTCHA on Checkout

Rather than show everyone a challenge, we only trigger CAPTCHAs for users with abnormal behavior. The backend signals the front end:

{ "captchaRequired": true }

Once solved, the backend stores a proof token:

SET captcha:{fingerprint} 1 EX 1800

Only then does the user proceed to payment.

6.2.4 Orleans for stateful tracking (optional)

Microsoft Orleans offers virtual actors that maintain user/session state efficiently:

  • Track lock attempts
  • Track failed payments
  • Track seat exploration patterns

Example actor interface:

public interface IUserBehaviorGrain : IGrainWithStringKey
{
    Task RecordSeatAttempt(string seatId);
    Task<int> GetSuspicionLevel();
}

This model scales horizontally without managing Redis keys manually.
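A minimal grain implementation for that interface might look as follows. The suspicion thresholds are illustrative assumptions; only the Orleans `Grain` base class and the interface above come from the article's stack:

```csharp
using Orleans;

public class UserBehaviorGrain : Grain, IUserBehaviorGrain
{
    private readonly HashSet<string> _attemptedSeats = new();
    private int _attempts;

    public Task RecordSeatAttempt(string seatId)
    {
        _attempts++;
        _attemptedSeats.Add(seatId);
        return Task.CompletedTask;
    }

    public Task<int> GetSuspicionLevel()
    {
        // Many attempts across many distinct seats with no completed booking
        // resembles a scalper's linear scan; thresholds here are placeholders.
        var level = _attempts > 50 ? 2
                  : _attempts > 20 ? 1
                  : 0;
        return Task.FromResult(level);
    }
}
```

Because Orleans activates one grain per key (here, per user/session), the state is single-threaded by construction and never needs explicit locking.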


7 Partner Integrations: Inventory Sync at Scale

When the core booking flow works smoothly, the next challenge is exposing inventory to partners without weakening the guarantees we’ve built. Large ticketing platforms rarely operate in isolation. Aggregators like Google Pay, Paytm, PhonePe, and telecom portals all expect live availability. Each integration introduces another potential pressure point. If one partner overloads their integration, the whole ecosystem feels the impact unless the architecture isolates and protects the booking engine.

7.1 The Omnichannel Challenge

7.1.1 Managing inventory across the native app, website, and third-party aggregators

Every channel sees different traffic patterns. The website might receive high browsing volume, while a partner app might generate lock-heavy traffic during a promotion. The system must ensure all channels access the same seat availability source while preventing partners from causing unfair advantages or starvation for the main booking app.

To maintain fairness, partners never call the core inventory database directly. Everything flows through strongly controlled APIs backed by Redis seat state. Partners receive a filtered, rate-limited version of the seat map—the same data the native app reads but optimized for their request patterns. This avoids situations where partner systems bypass caching layers or cause localized hot spots on the Redis cluster.

A typical partner request flows through a dedicated API gateway segment with:

  • Per-partner rate limits
  • Concurrency controls
  • Traffic signatures for anomaly detection
  • Separate tracing tags for observability

This ensures that even if a partner calls availability endpoints too aggressively, internal customers remain unaffected.

7.2 Designing Partner APIs

Partner APIs must balance performance with consistency. The rules differ by partner size, SLA, and traffic expectations.

7.2.1 Webhooks for “Push” updates vs. Polling

Polling for seat availability rarely scales well. Partners often default to calling availability endpoints every few seconds, which magnifies into millions of requests during peak events. Instead, the platform uses webhook-based push updates to notify partners when inventory changes.

A typical webhook payload looks like:

{
  "showId": "abcd1234",
  "timestamp": "2024-09-05T12:30:51Z",
  "changes": [
    { "seatId": "A12", "status": "Locked" },
    { "seatId": "A13", "status": "Available" }
  ]
}

Partners subscribe by registering their callback URLs. The booking engine posts changes in near real time, and partners acknowledge delivery. If delivery fails, the system retries with exponential backoff. Storing webhook events in a durable queue (RabbitMQ or Kafka) ensures no loss of updates if a partner endpoint goes down.

However, some partners cannot support webhooks, so polling endpoints exist but are heavily throttled. Typical rate limits might allow:

  • 10 requests per second for large partners
  • 2 requests per second for smaller ones

These constraints push partners to use webhooks whenever possible.
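The retry-with-backoff delivery described above can be sketched as a small dispatcher. Attempt count, delays, and class naming are assumptions; the durable queue hand-off on final failure is left to the caller:

```csharp
using System.Text;

public class WebhookDispatcher
{
    private readonly HttpClient _http;

    public WebhookDispatcher(HttpClient http) => _http = http;

    public async Task<bool> DeliverAsync(
        string callbackUrl, string payloadJson, CancellationToken ct)
    {
        for (var attempt = 0; attempt < 5; attempt++)
        {
            try
            {
                using var content = new StringContent(payloadJson, Encoding.UTF8, "application/json");
                var response = await _http.PostAsync(callbackUrl, content, ct);
                if (response.IsSuccessStatusCode)
                    return true; // partner acknowledged
            }
            catch (HttpRequestException)
            {
                // transient network failure -- fall through to backoff
            }

            // exponential backoff: 1s, 2s, 4s, 8s, 16s
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)), ct);
        }

        return false; // caller re-queues the event or parks it in a DLQ
    }
}
```

Returning `false` instead of throwing keeps the decision about dead-lettering with the queue consumer, where the durable-messaging guarantees live.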

7.2.2 The “Allocated Block” Strategy

Some partners require guaranteed inventory—especially telecom and loyalty platforms. Instead of letting them lock seats in real time, the system can pre-allocate seat blocks.

For example:

  • Allocate 50 premium seats to Partner A
  • Allocate 100 regular seats to Partner B

The partner treats their allocation as a virtual inventory. When a user books through that partner, the partner API only interacts with the booking system at the final booking step. The platform maps partner seat identifiers back to internal seat codes during confirmation.
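The confirmation-time mapping can be as simple as a Redis hash lookup. This is a sketch, not the production schema: the key layout, the token format, and the class name are assumptions.

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

// Hypothetical lookup: a partner-facing seat token (e.g. "P-0042") is
// resolved back to the internal seat code (e.g. "A3") at confirmation.
public class PartnerSeatMap
{
    private readonly IDatabase _redis;

    public PartnerSeatMap(IDatabase redis) => _redis = redis;

    public async Task<string?> ResolveInternalSeatAsync(
        string partnerId, string showId, string partnerSeatToken)
    {
        var key = $"partner-seatmap:{partnerId}:{showId}";
        var seatCode = await _redis.HashGetAsync(key, partnerSeatToken);

        return seatCode.HasValue ? (string?)seatCode : null; // null => unknown token
    }
}
```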

A simplified allocation model:

partner-block:{partnerId}:{showId} -> [A1, A2, A3, ...]

When a partner consumes a seat:

public async Task<string?> ConsumeAllocatedSeat(string partnerId, string showId)
{
    var key = $"partner-block:{partnerId}:{showId}";

    // ListLeftPopAsync returns a RedisValue, which is "null" when the list is empty
    var seatId = await _redis.ListLeftPopAsync(key);

    return seatId.HasValue ? (string?)seatId : null; // null if allocation exhausted
}

This approach eliminates lock contention and exposes a fixed seat pool to partners. It also protects the booking engine against overselling, because allocation caps are enforced before a partner ever reaches the confirmation step.

7.3 Resilience with Polly

Partner integrations add volatility. You must treat partner calls as untrusted, slow, or unpredictable. Polly provides essential resilience patterns to prevent these issues from cascading.

7.3.1 Circuit Breakers and Retry Policies

A retry strategy smooths over transient partner failures. A circuit breaker prevents endless retries against a failing partner. Combined, they isolate flaky integrations.

var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(200 * attempt));

var circuitPolicy = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .CircuitBreakerAsync(
        handledEventsAllowedBeforeBreaking: 5,
        durationOfBreak: TimeSpan.FromSeconds(30)
    );

// Both policies target HttpResponseMessage, so they compose into one wrap
var combined = Policy.WrapAsync(retryPolicy, circuitPolicy);

var response = await combined.ExecuteAsync(() =>
    _httpClient.PostAsync(partnerUrl, content));

Once the circuit for a partner API enters the “open” state, calls fail fast with a BrokenCircuitException until the break duration passes.

7.3.2 Bulkhead Isolation to protect the booking engine

Bulkheading prevents one downstream partner from consuming all HTTP client threads or blocking the core booking API.

var bulkhead = Policy.BulkheadAsync(
    maxParallelization: 20,
    maxQueuingActions: 100,
    onBulkheadRejectedAsync: _ => Task.CompletedTask
);

Each partner receives its own bulkhead. If Partner A misbehaves, its calls are throttled independently. The booking process continues unaffected for other channels.
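Per-partner isolation can be implemented by caching one bulkhead policy per partner ID. A minimal sketch; the class name, limits, and string-keyed lookup are assumptions:

```csharp
using System;
using System.Collections.Concurrent;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.Bulkhead;

// Sketch: one bulkhead per partner, created lazily and cached, so a
// misbehaving partner can only exhaust its own concurrency slots.
public class PartnerBulkheads
{
    private readonly ConcurrentDictionary<string, AsyncBulkheadPolicy> _bulkheads = new();

    private AsyncBulkheadPolicy For(string partnerId) =>
        _bulkheads.GetOrAdd(partnerId, _ => Policy.BulkheadAsync(
            maxParallelization: 20,
            maxQueuingActions: 100));

    public Task<HttpResponseMessage> CallAsync(
        string partnerId, Func<Task<HttpResponseMessage>> call) =>
        For(partnerId).ExecuteAsync(call);
}
```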


8 Operational Excellence and Conclusion

By the time a system reaches this stage—robust locking, real-time synchronization, dynamic pricing, partner connectivity—the remaining work is to ensure operability. This means strong observability, clean failure recovery, simple scaling paths, and predictable runtime behavior. Operational excellence differentiates a system that works during normal days from one that survives a viral ticket sale.

8.1 Observability and Tracing

8.1.1 Implementing OpenTelemetry end-to-end

Tracing must cover every step across the booking lifecycle. Distributed tracing helps diagnose bottlenecks such as a slow seat-lock Lua script or a delayed payment webhook. OpenTelemetry integrates cleanly with ASP.NET Core, MassTransit, RabbitMQ, and Redis.

Configuration example:

builder.Services.AddOpenTelemetry()
    .WithTracing(t =>
    {
        t.AddAspNetCoreInstrumentation()
         .AddHttpClientInstrumentation()
         .AddSource("MassTransit")   // MassTransit v8 emits spans via this ActivitySource
         .AddRedisInstrumentation()
         .AddSource("BookingService")
         .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("booking-api"))
         .AddOtlpExporter(o =>
         {
             o.Endpoint = new Uri("http://otel-collector:4317");
         });
    });

Each trace includes:

  • SeatLockAttempt
  • InventoryReserved event publish
  • PaymentInitiated command
  • Payment provider webhook
  • OrderConfirmed

This provides a searchable history for any booking ID. During peak loads, tracing is invaluable for identifying hotspots, such as slow partner endpoints or overloaded Redis nodes.
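The custom spans above are emitted through an `ActivitySource` whose name matches the `.AddSource("BookingService")` registration. A sketch of the seat-lock step; `showId`, `seatIds`, and `lockAcquired` are assumed locals from the surrounding handler:

```csharp
using System.Diagnostics;

// One shared ActivitySource per service; the name must match the
// .AddSource("BookingService") call in the tracing configuration.
public static class BookingTelemetry
{
    public static readonly ActivitySource Source = new("BookingService");
}

// Inside the seat-lock path:
using var activity = BookingTelemetry.Source.StartActivity("SeatLockAttempt");
activity?.SetTag("booking.showId", showId);
activity?.SetTag("booking.seatIds", string.Join(",", seatIds));

// ... attempt to acquire the Redis lock ...

activity?.SetStatus(lockAcquired ? ActivityStatusCode.Ok : ActivityStatusCode.Error);
```

Tagging spans with the show and seat identifiers is what makes traces searchable by booking later on.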

8.1.2 Key Metrics to Monitor

Three metrics repeatedly predict booking system failures:

  1. LockContentionRate: A high rate means too many users are competing for the same seat cluster. It indicates insufficient inventory or aggressive load from partners.

  2. SoldOutTime: Measures how long it takes for an event to sell out after going live. Spikes or unusual patterns often reveal delays in the Saga or in real-time updates.

  3. PaymentFailureRate: Detects payment gateway outages. If the failure rate crosses a threshold, the system automatically:

    • Switches to a secondary payment provider
    • Increases queue delays
    • Temporarily pauses payment initiation
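The failover decision from item 3 can be sketched as a simple threshold check. Everything here is illustrative: the 25% threshold, the `IPaymentMetrics` interface, and the provider names are assumptions, not the platform's actual contract.

```csharp
using System;

// Assumed rolling-window metrics source (not a real library interface)
public interface IPaymentMetrics
{
    double FailureRate(TimeSpan window); // 0.0 .. 1.0 over the given window
}

// Hypothetical selector: once the rolling failure rate crosses the
// threshold, new payments are routed to the secondary provider.
public class PaymentProviderSelector
{
    private const double FailureRateThreshold = 0.25; // 25%, illustrative
    private readonly IPaymentMetrics _metrics;
    private volatile bool _useSecondary;

    public PaymentProviderSelector(IPaymentMetrics metrics) => _metrics = metrics;

    public string ChooseProvider()
    {
        if (!_useSecondary &&
            _metrics.FailureRate(TimeSpan.FromMinutes(1)) > FailureRateThreshold)
        {
            _useSecondary = true; // reset manually or via a health probe
        }

        return _useSecondary ? "secondary-gateway" : "primary-gateway";
    }
}
```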

These metrics feed dashboards and alert policies. They’re typically exported to Prometheus and visualized in Grafana for long-term trend analysis.

8.2 Scaling Infrastructure

Modern booking systems run in Kubernetes. The challenge is dynamically matching capacity—especially consumer workers—with the incoming traffic and queue backlogs.

8.2.1 Using KEDA for event-driven autoscaling

KEDA listens to external event sources (like RabbitMQ queues) and adjusts pod counts in real time. This is crucial for Saga consumers, webhook processors, and inventory update handlers.

A typical KEDA scale object:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: booking-consumer
spec:
  scaleTargetRef:
    name: booking-consumer-deployment
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
  - type: rabbitmq
    metadata:
      hostFromEnv: RabbitMqConnection   # AMQP URI read from an env var on the target
      queueName: booking-events
      queueLength: "200"

This means:

  • queueLength is the target backlog per replica: KEDA scales toward roughly one consumer per 200 pending messages
  • The replica count is always clamped between the floor of 2 and the ceiling of 50

KEDA ensures the system reacts to booking surges in seconds without relying on CPU-based autoscaling, which reacts too slowly.

8.3 Summary & Next Steps

8.3.1 Recap of the “Lock-Pay-Confirm” architecture

The flow we’ve built through these sections follows a disciplined sequence:

  1. Lock: Redis-based distributed locks ensure zero double-bookings.
  2. Pay: A MassTransit Saga orchestrates payment and inventory.
  3. Confirm: SQL finalizes orders with optimistic concurrency.
  4. Sync: SignalR keeps all customers up to date.
  5. Scale: Virtual queues, dynamic pricing, and partner isolation enable predictable behavior.
  6. Observe: OpenTelemetry provides deep visibility into every step.

This pattern has proven resilient at massive scale, especially under unpredictable traffic spikes.

8.3.2 Final thoughts on system cost-efficiency and scale

The goal is not just to handle peak load but to do so without overspending. The architecture keeps costs low by:

  • Using Redis for hot data instead of oversized SQL servers
  • Scaling consumers only when queues grow
  • Offloading static and semi-static data to NoSQL
  • Using webhooks to limit partner polling
  • Deploying minimal APIs for high throughput

With these strategies, a booking platform can confidently manage millions of seat selections and hundreds of thousands of bookings per day while maintaining the strict guarantees customers expect.
