Peak Season Delivery Optimization: Dynamic Route Planning, Gig Worker Management, and 10X Scale Handling

1 The Peak Season Challenge: Domain Analysis & Architectural Drivers

Peak season delivery—especially the Black Friday to Christmas window—pushes logistics platforms into stress conditions that rarely show up during the rest of the year. Order volume can increase by 10× in a matter of hours, gig worker availability becomes volatile, and small inefficiencies quickly snowball into missed SLAs. Decisions that are “good enough” at normal scale often fail catastrophically under peak load.

The core question is straightforward: How do we design a delivery and routing platform that behaves predictably when demand, traffic, and workforce availability all become unstable at the same time?

This section breaks down what actually changes during peak season, why those changes break traditional systems, and which architectural drivers matter most when designing for extreme scale.

1.1 Deconstructing the “Black Friday” Load

Peak load is not a smooth curve. It arrives as sharp, uneven spikes across different parts of the system. Order ingestion, routing, driver assignment, tracking, and customer-facing queries all scale differently—and often at different times of the day. Treating peak season as “normal traffic but bigger” is one of the most common mistakes teams make.

Each workload stresses a different bottleneck, which is why understanding how traffic changes matters more than simply knowing how much it grows.

1.1.1 The 10X spike: Analyzing read vs. write heavy operations

In steady-state conditions, most delivery platforms operate within predictable ranges. A marketplace might ingest tens of thousands of orders per hour, with tracking and driver updates following consistent patterns. During Black Friday, those assumptions stop holding. Order volume can jump from tens of thousands per hour to hundreds of thousands—or more—almost instantly.

More importantly, the balance between reads and writes shifts.

Write-heavy operations grow first and fastest:

  • Order creation spikes when promotions go live.
  • Package status transitions accelerate as fulfillment waves start.
  • Driver GPS pings increase with the number of active couriers.
  • Real-time events such as ETA recalculations, geofence triggers, and reassignments multiply.

Read-heavy operations also grow, but in a more predictable way:

  • Customers refresh tracking pages more frequently.
  • Drivers sync routes and assignments.
  • Operations teams monitor dashboards and heatmaps.
  • Analysts run near-real-time queries to manage supply and demand.

The critical insight is that write traffic becomes the dominant constraint. Reads can often be cached, delayed, or degraded. Writes—especially order ingestion and status transitions—must be accepted and processed immediately, or the system falls behind.

A simplified illustration of peak season load looks like this:

Normal:
Orders: 5,000/hour
Driver Pings: 50,000/hour
Package Status Events: 100,000/hour

Black Friday:
Orders: 50,000/hour
Driver Pings: 600,000/hour
Package Status Events: 1,200,000/hour

These spikes are not synchronized. Orders surge when deals launch. Driver pings peak during delivery windows. Status events spike when routes go out. The architecture must absorb uncorrelated bursts across multiple domains, not just handle a higher average load.

1.1.2 The “Last Mile” constraint: Why standard logistics models fail when volume density explodes

Most failures during peak season happen in the last mile. Higher order density means more stops per driver, tighter delivery windows, and far more routing permutations. Problems that are manageable at low volume become nonlinear as density increases.

Traditional approaches—fixed delivery zones, static route planning, and wave-based optimization—break down for a few consistent reasons:

  1. Clustering delivery stops no longer works linearly. Thousands of orders arriving in the same area within minutes force constant rebalancing of zones and batches.
  2. Driver supply fluctuates constantly. Gig workers sign on and off throughout the day, invalidating routing decisions that assumed stable capacity.
  3. Traffic conditions change faster than routes can be recomputed. Congestion near malls and hubs quickly invalidates precomputed ETAs.
  4. Delivery promises tighten. Same-day or same-evening SLAs reduce the time available for optimization and error recovery.

Because of this, last-mile systems must assume that routes will change. Dynamic recalculation, streaming inputs, and cost-aware re-routing are no longer optimizations—they are requirements.

1.2 Architectural Pillars for High Scale

Peak-season architecture is driven less by features and more by non-functional requirements. Three pillars dominate design decisions: elasticity, latency, and fault tolerance. If any one of these is weak, the entire system becomes fragile under load.

1.2.1 Elasticity vs. Scalability: Designing for KEDA rather than static scaling

Static scaling strategies fail because peak-season traffic does not grow gradually. Load changes minute by minute, sometimes second by second. Even traditional horizontal autoscaling based on CPU or memory is often too slow, because most delivery workloads are event-driven, not CPU-bound.

Kubernetes Event-Driven Autoscaling (KEDA) addresses this by scaling based on demand signals such as queue depth or event lag. Instead of guessing how much capacity is needed, the system reacts directly to backlog.

When order events pile up, ingestion workers scale out. When GPS pings surge, tracking processors expand. When routing requests queue up, solver pods spin up in parallel.

This model has several practical advantages:

  • Scaling reacts directly to demand, not resource exhaustion.
  • Each microservice scales independently based on its own workload.
  • Backpressure is explicit and measurable through queue lag.

Example: Scaling Kafka consumers with KEDA

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-ingestion-scaler
spec:
  scaleTargetRef:
    name: order-ingestion-worker
  triggers:
    - type: kafka
      metadata:
        topic: orders
        bootstrapServers: kafka:9092
        consumerGroup: order-ingestion   # required by the Kafka scaler; group name is illustrative
        lagThreshold: "1000"

When lag exceeds 1,000 messages, additional workers are created automatically. Static scaling simply cannot react fast enough to handle spikes of this magnitude.

1.2.2 Latency requirements

During peak season, latency budgets shrink. Every extra millisecond compounds across thousands of deliveries. Clear SLAs are required to keep both customer and driver experiences usable:

Operation                          SLA Target
Route calculation                  < 200 ms (small batches), < 1 s (medium batches)
Order ingestion acknowledgment     < 50 ms
Driver location processing         < 300 ms end-to-end
Tracking UI refresh                500–800 ms
Assignment scoring and dispatch    < 100 ms

The key point is that these budgets leave little room for inefficiency. A routing solver that takes several seconds may work during low volume but becomes a bottleneck during peak hours, delaying thousands of downstream actions.

Meeting these targets typically requires:

  • Aggressive in-memory caching
  • Precomputed spatial groupings
  • Low-latency gRPC calls between services
  • Parallel solver execution
  • Avoiding blocking I/O in request paths
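The last point can be made concrete with System.Threading.Channels: the request path hands work to a bounded in-memory channel and returns immediately, while a background worker drains it at its own pace. This is a minimal sketch; the payload type and capacity are illustrative.

```csharp
using System.Threading.Channels;

// Bounded channel between the request path and background workers.
// When the channel is full, producers wait instead of flooding the solver.
var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(10_000)
{
    FullMode = BoundedChannelFullMode.Wait
});

// Request path: hand off and return; no downstream I/O on the hot path.
await channel.Writer.WriteAsync("routing-request-1");
channel.Writer.Complete();

// Background worker: drains independently of request latency.
await foreach (var request in channel.Reader.ReadAllAsync())
{
    Console.WriteLine($"processing {request}");
}
```

The bounded capacity is what makes backpressure explicit: when consumers fall behind, producers slow down instead of exhausting memory.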

1.2.3 Fault Tolerance: Implementing “Degraded Mode”

Under extreme load, systems must make deliberate trade-offs. Degraded Mode allows the platform to shed non-essential features while protecting core flows such as order ingestion, assignment, and delivery execution.

Common degradations include:

  • Disabling live map animations.
  • Reducing ETA recalculation frequency.
  • Pausing historical analytics.
  • Limiting geofence processing to high-priority events.

Example fallback logic:

if (systemHealth.IsDegraded)
{
    return CachedEtaManager.GetLastKnownEta(orderId);
}
return EtaService.Calculate(orderId);

In peak season, available and slightly stale data is far better than perfect but unavailable data.

1.3 Domain-Driven Design (DDD) Boundaries

Peak season amplifies coupling problems. Clear domain boundaries isolate load, prevent cascading failures, and allow teams to scale services independently.

1.3.1 Defining Bounded Contexts

A typical large-scale delivery platform separates into these contexts:

  1. Ordering
     • Order creation, validation, SLA assignment
  2. Routing (Core Domain)
     • VRP solving
     • Distance matrix caching
     • Batching and dynamic re-routing
  3. Fleet Management
     • Driver identity and availability
     • Vehicle capacity
     • Assignment acceptance
  4. Billing / Surge Pricing
     • Incentives
     • Surge rules
     • Financial ledgers
  5. Tracking
     • Location ingestion
     • Geofence events
     • Proof-of-delivery milestones

Each context scales differently. Tracking often requires orders of magnitude more throughput than Billing. Routing needs CPU-heavy nodes, while Tracking favors memory and network bandwidth.

1.3.2 Strategic Patterns: Core Domain vs. Generic Subdomains

Routing is the Core Domain. It contains the hardest problems and delivers the most competitive advantage. It deserves the most engineering investment.

Core Domain

  • VRPTW solvers
  • Dynamic re-routing logic
  • ETA prediction

Generic Subdomains

  • Notifications
  • Identity
  • Payments
  • Reporting

These supporting domains should rely on proven patterns and third-party tooling rather than custom innovation.


2 High-Level Architecture: The Event-Driven Backbone

Once peak-season load patterns are understood, the architectural implications become obvious. A logistics platform operating under 10× load cannot depend on synchronous request/response flows as its primary coordination mechanism. Those patterns assume stable traffic and predictable execution times—conditions that simply do not exist during peak delivery windows.

Predictability at scale comes from event-driven architecture. Producers emit facts about what happened. Consumers react when they are ready. This decoupling allows each part of the system—order ingestion, routing, tracking, billing, and gig worker assignment—to scale independently without blocking the others.

At peak scale, the backbone of the system is not a web API. It is the event pipeline.

2.1 The Tech Stack: .NET 9 & Cloud-Native

Modern .NET (version 9 and newer) provides the performance characteristics needed for high-throughput logistics workloads while preserving memory safety and developer productivity. When paired with Kubernetes, it enables a platform that can scale aggressively without sacrificing operational stability.

The goal of the stack is not novelty. It is fast startup, predictable performance, and operational consistency from local development through production.

2.1.1 Core: ASP.NET Core Minimal APIs

Minimal APIs are a practical choice for peak-season systems because they remove unnecessary abstraction and overhead. They start faster, allocate less memory, and are easier to reason about under load. This matters when dozens or hundreds of instances may be created in response to traffic spikes.

Order ingestion endpoints should do as little work as possible. Their responsibility is to validate input, accept the request, and emit an event—nothing more.

Example: Order ingestion endpoint

app.MapPost("/orders", async (OrderDto dto, IOrderService svc) =>
{
    var orderId = await svc.IngestAsync(dto);
    return Results.Accepted($"/orders/{orderId}");
});

The key detail is what does not happen here. The endpoint does not perform routing, does not calculate pricing, and does not block on downstream systems. It records intent and moves on. This keeps latency low and protects the API layer when traffic spikes sharply.
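A minimal IOrderService behind that endpoint might look like the sketch below. The IEventPublisher abstraction and event shape are assumptions, not a prescribed API; the point is that ingestion validates cheaply, emits a fact, and returns.

```csharp
public record OrderDto(string CustomerId, double Lat, double Lng);
public record OrderCreatedEvent(Guid OrderId, string CustomerId, double Lat, double Lng);

public interface IEventPublisher { Task PublishAsync<T>(T evt); }
public interface IOrderService { Task<Guid> IngestAsync(OrderDto dto); }

public class OrderService : IOrderService
{
    private readonly IEventPublisher _publisher;
    public OrderService(IEventPublisher publisher) => _publisher = publisher;

    public async Task<Guid> IngestAsync(OrderDto dto)
    {
        // Cheap validation only; anything expensive happens downstream.
        if (string.IsNullOrEmpty(dto.CustomerId))
            throw new ArgumentException("customer required");

        var orderId = Guid.NewGuid();

        // Record the fact and return; routing and pricing react to the event.
        await _publisher.PublishAsync(
            new OrderCreatedEvent(orderId, dto.CustomerId, dto.Lat, dto.Lng));

        return orderId;
    }
}
```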

2.1.2 Orchestration: Kubernetes with .NET Aspire

Local development environments often diverge significantly from production, which makes peak-season bugs harder to reproduce. .NET Aspire reduces this gap by letting teams run the full system locally—APIs, workers, message brokers, caches—with consistent configuration.

In production, Kubernetes (AKS or EKS) provides the operational backbone:

  • Horizontal Pod Autoscaling for baseline elasticity
  • KEDA for event-driven scaling
  • Rolling, blue/green, and canary deployments
  • Sidecars for caching, observability, or network acceleration

Aspire does not replace Kubernetes. It ensures that what developers run locally behaves like what operators run in production, reducing surprises when peak load arrives.
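An Aspire AppHost wires those resources together so the same topology runs locally and feeds deployment manifests. The project and resource names below are illustrative; this is a configuration sketch, not a prescribed layout.

```csharp
// AppHost entry point (sketch; Projects.OrderIngestion is an assumed project reference).
var builder = DistributedApplication.CreateBuilder(args);

var redis = builder.AddRedis("redis");          // cache/spatial index resource
var rabbit = builder.AddRabbitMQ("messaging");  // message broker resource

builder.AddProject<Projects.OrderIngestion>("order-ingestion")
       .WithReference(redis)
       .WithReference(rabbit);

builder.Build().Run();
```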

2.1.3 Inter-service Communication: gRPC vs. MassTransit

Not all communication patterns are equal at peak scale. Some interactions must be fast and synchronous. Others must be durable and asynchronous. Treating them the same leads to either excessive latency or unnecessary coupling.

Use gRPC when:

  • A response is required immediately to proceed.
  • Latency must stay well below 200 ms.
  • Calls are internal and well-controlled.

Examples include:

  • Distance matrix queries
  • ETA lookups
  • Driver acceptance acknowledgments

Use MassTransit (over RabbitMQ or Service Bus) when:

  • Work can be done asynchronously.
  • Retries and durability matter.
  • Throughput is more important than immediacy.

Examples include:

  • OrderCreated → RoutingRequested
  • DriverLocationUpdated → TrackingUpdated
  • RouteOptimized → DriverAssigned
  • PhotoUploaded → ImageProcessingTriggered
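A MassTransit consumer for the first of those flows might look like this sketch. The event records are illustrative; the mapping is kept in a separate static method so the logic can be tested without a broker.

```csharp
using MassTransit;

public record OrderCreated(Guid OrderId);
public record RoutingRequested(Guid OrderId);

// Consumes OrderCreated and emits RoutingRequested. Retries and
// durability come from the transport configuration, not this class.
public class RoutingRequestConsumer : IConsumer<OrderCreated>
{
    public Task Consume(ConsumeContext<OrderCreated> context) =>
        context.Publish(Map(context.Message));

    // Pure mapping, testable in isolation.
    public static RoutingRequested Map(OrderCreated evt) => new(evt.OrderId);
}
```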

Example gRPC service definition

service DistanceMatrix {
  rpc GetDistance (DistanceRequest) returns (DistanceResponse);
}

This separation ensures that latency-sensitive paths remain fast, while high-volume workflows are buffered and resilient under load.

2.2 Ingestion Strategy: The “Shock Absorber” Pattern

Peak season traffic does not arrive smoothly. It arrives in bursts—sometimes extreme ones. If the API layer tries to process every request immediately, it will fail. Instead, ingestion must absorb shocks and release work at a controlled rate.

The ingestion layer acts like a shock absorber: it accepts bursts, queues work, and lets downstream systems process at their own pace.

2.2.1 Competing Consumers with Kafka or Event Hubs

Competing Consumers are the backbone of this pattern. Instead of one worker processing all events, multiple stateless consumers share the load. Kafka partitions or Event Hub partitions make this parallelism explicit.

A typical peak-season setup might look like:

  • 12 partitions for the orders topic
  • 50 partitions for the driver-location topic
  • Stateless consumers that scale automatically via KEDA

Example consumer loop (the async-enumerable ConsumeAsync shown here is a sketch; Confluent.Kafka's concrete API is a blocking Consume loop)

await foreach (var msg in consumer.ConsumeAsync(stoppingToken))
{
    var evt = JsonSerializer.Deserialize<OrderCreatedEvent>(msg.Value);
    await routerChannel.Writer.WriteAsync(evt, stoppingToken);
}

The consumer acknowledges messages only after they are safely handed off to internal processing pipelines. This protects against message loss and allows backpressure to build naturally when the system is under stress.

2.2.2 Library Spotlight: YARP for load balancing and rate limiting

YARP sits at the edge of the system and protects internal services from abuse—intentional or accidental. During peak season, rate limiting is not about blocking users. It is about preventing cascading failures.

YARP can:

  • Throttle abusive or misbehaving clients
  • Route requests cleanly to internal services
  • Terminate TLS efficiently
  • Apply retries and circuit breakers consistently

Example rate limit configuration (an illustrative shape; in practice, YARP routes reference rate limiter policies defined through ASP.NET Core's rate limiting middleware)

"RateLimits": {
  "DefaultPolicy": {
    "PermitLimit": 1000,
    "Window": "00:00:01"
  }
}

This ensures that sudden spikes do not overwhelm downstream services that are already operating near capacity.

2.3 Data Architecture Strategy

No single database can serve all workloads efficiently at peak scale. A delivery platform must store financial data, operational events, spatial data, and real-time state—each with different access patterns and consistency requirements.

Polyglot persistence is not optional. It is required.

2.3.1 Relational for financial data, NoSQL for operational streams

Relational databases (PostgreSQL / SQL Server) are used where correctness is non-negotiable:

  • Billing ledgers
  • Driver payouts
  • Financial reconciliation events

These systems rely on strong consistency, transactions, and well-defined schemas.

NoSQL databases (Cosmos DB / MongoDB) handle high-volume, fast-changing data:

  • Tracking events
  • GPS histories
  • Route recalculation outputs
  • Proof-of-delivery metadata

These workloads favor horizontal scale, flexible schemas, and write throughput over strict relational constraints.

2.3.2 Redis Stack for geospatial indexing and caching

Redis Stack plays a critical role in peak-season performance. It acts as both a cache and a real-time spatial index.

Capabilities used heavily include:

  • GEOADD and GEORADIUS queries (GEOSEARCH in Redis 6.2+)
  • TTL-based caching for ETAs and route snapshots
  • Lua scripts for lightweight geofence logic

Example: Find unassigned orders within 500 meters

GEORADIUS "orders:pending" -73.935242 40.730610 500 m WITHDIST

Redis allows routing and assignment services to answer spatial questions in milliseconds, which is essential when thousands of decisions must be made every second.

At peak scale, Redis is not an optimization layer. It is a core dependency that keeps routing, assignment, and tracking responsive under extreme load.
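The same spatial query from .NET via StackExchange.Redis looks like this sketch (key names and coordinates are illustrative, and a running Redis instance is assumed):

```csharp
using StackExchange.Redis;

var mux = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
var db = mux.GetDatabase();

// Index a pending order at its drop-off coordinates (longitude first).
await db.GeoAddAsync("orders:pending", -73.935242, 40.730610, "order-123");

// Find pending orders within 500 m of a driver's position.
GeoRadiusResult[] nearby = await db.GeoRadiusAsync(
    "orders:pending", -73.935242, 40.730610, 500, GeoUnit.Meters);
```

Keeping this index in Redis rather than the primary database is what lets assignment loops run thousands of times per second without touching durable storage.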


3 The Core Engine: Dynamic Route Optimization & Batching

If the event-driven backbone keeps the system standing under peak load, the routing engine is what determines whether deliveries actually succeed. This engine decides which driver goes where, in what order, and under which constraints. During peak season, those decisions must be made quickly, revised often, and executed reliably—sometimes thousands of times per minute.

The challenge is not finding the mathematically perfect route. The challenge is finding good enough routes fast, and knowing when to change them without destabilizing drivers or downstream systems.

3.1 Algorithmic Approach to the Last Mile

The last mile is dominated by routing problems that are computationally expensive by nature. As order density increases, naive approaches collapse under their own complexity. The goal is to apply the right algorithms, at the right scope, with clear limits on how much work the system is allowed to do per decision.

3.1.1 VRPTW in practical terms

Most last-mile routing problems can be modeled as a variant of the Vehicle Routing Problem with Time Windows (VRPTW). In practice, this means the system must consider several constraints at the same time:

  • Each delivery has a valid time window (for example, 3–5 pm).
  • Each driver has limited capacity (weight, volume, or stop count).
  • Each stop requires service time, not just travel time.
  • Drivers have shift boundaries and break rules.

A simplified version of the problem looks like this:

  • Given N delivery locations, each with an earliest and latest arrival time.
  • Given K drivers, each with capacity and availability constraints.
  • Find routes that minimize total cost while respecting all deadlines.

During peak season, this is not a one-time calculation. The solver runs continuously. New orders arrive. Drivers log off. Traffic conditions change. Each event potentially invalidates previous assumptions, forcing partial or full recomputation.

This is why exact optimization is rarely feasible at scale. Heuristics and approximations are not shortcuts—they are the only viable option.
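To make "good enough, fast" concrete, here is a toy construction heuristic of the kind solvers use as a starting point: greedy nearest-neighbor over a distance matrix. Production solvers refine such a route with local search; this is illustrative only.

```csharp
// Greedy nearest-neighbor: from the current stop, always visit the
// closest unvisited stop next. O(n^2), deterministic, and fast.
static List<int> NearestNeighborRoute(double[,] dist, int start)
{
    int n = dist.GetLength(0);
    var visited = new bool[n];
    var route = new List<int> { start };
    visited[start] = true;
    int current = start;

    for (int step = 1; step < n; step++)
    {
        int next = -1;
        double best = double.MaxValue;
        for (int j = 0; j < n; j++)
        {
            if (!visited[j] && dist[current, j] < best)
            {
                best = dist[current, j];
                next = j;
            }
        }
        route.Add(next);
        visited[next] = true;
        current = next;
    }
    return route;
}
```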

3.1.2 Dynamic batching using geospatial proximity and SLAs

One of the most effective ways to keep routing tractable is to reduce the problem size before solving it. Instead of routing thousands of orders at once, the system groups them into smaller, meaningful batches.

Batching typically considers:

  • Geographic proximity (via H3 or geohashes)
  • Promised delivery windows
  • Fulfillment center or pickup origin
  • Current driver availability in the area

The intent is simple: only compare orders that realistically belong together. This dramatically reduces the number of permutations the solver must explore.

Example batching logic (H3 API names are illustrative and vary by .NET binding):

var candidates = orders
    .Where(o => o.SlaTime <= batchDeadline)
    .GroupBy(o => H3Index.FromGeo(o.Lat, o.Lng, 9).Parent(7)); // index finely, cluster coarsely

By grouping orders into spatially coherent clusters with similar SLAs, the routing engine works on smaller, denser problems. This reduces distance matrix size, speeds up solver execution, and keeps latency within acceptable bounds during peak load.

3.2 Implementation with Open Source Tools

Building a production-grade routing engine does not mean building everything from scratch. Mature open-source tools provide strong foundations, as long as they are used with clear boundaries and realistic expectations.

3.2.1 OR-Tools for VRP and VRPTW

Google OR-Tools is a practical choice for solving constrained routing problems. It provides:

  • Constraint programming primitives
  • Local search heuristics
  • Native support for time windows and capacities
  • Reasonable performance for medium-sized batches

A minimal VRPTW setup in C# looks like this:

// locationCount, vehicleCount, and depotIndex come from the batch being solved.
RoutingIndexManager manager = new(locationCount, vehicleCount, depotIndex);
RoutingModel routing = new(manager);

RoutingSearchParameters searchParams =
    operations_research_constraint_solver
        .DefaultRoutingSearchParameters();

searchParams.FirstSolutionStrategy =
    FirstSolutionStrategy.Types.Value.PathCheapestArc;

var solution = routing.SolveWithParameters(searchParams);

In production, OR-Tools should not run inside request handlers. Instead, it belongs in a dedicated routing service invoked asynchronously or via gRPC. This keeps solver execution isolated, scalable, and easier to monitor under peak load.

3.2.2 Distance calculation with Itinero or OSRM

Routing solvers depend heavily on accurate distance and travel-time estimates. Computing these on the fly is expensive. Itinero offers an in-process .NET option, while OSRM provides a fast, production-tested way to generate distance matrices.

A common pattern is:

  1. Load the road network into OSRM.
  2. Query the /table endpoint for batch distance calculations.
  3. Cache results in Redis with short TTLs.

Example OSRM table request:

/table/v1/driving/13.388860,52.517037;13.397634,52.529407

Caching these matrices avoids repeated computation when batches overlap geographically, which is common during dense peak-season delivery waves.
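A simple way to make those cache hits reliable is to key cached matrices by a hash of the coordinate list, so overlapping batches reuse each other's results. The key format and rounding precision below are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Deterministic cache key for an OSRM /table result: round coordinates,
// join them, and hash. A short Redis TTL keeps traffic data fresh.
static string MatrixCacheKey(IEnumerable<(double Lng, double Lat)> coords)
{
    var raw = string.Join(";", coords.Select(c => $"{c.Lng:F5},{c.Lat:F5}"));
    var hash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(raw)));
    return $"osrm:table:{hash}";
}
```

Usage is the standard cache-aside pattern: check Redis under this key, and on a miss call the /table endpoint and store the result with a TTL of a few minutes.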

3.2.3 H3 clustering for zone-based assignment

H3’s hexagonal indexing model works well for last-mile logistics. Each coordinate maps to a stable cell, and neighboring cells share clear adjacency relationships. This makes it easier to reason about zones and nearby drivers under load.

Example: assigning an order to a resolution-8 H3 cell (API name is illustrative)

var h3 = H3Index.FromGeo(lat, lng, 8);

Using H3 enables:

  • Consistent spatial grouping
  • Predictable cluster sizes
  • Fast neighborhood queries for assignment and rebalancing

During peak season, these properties help keep routing and assignment decisions deterministic and fast.

3.3 Handling Truly Dynamic Re-routing

Dynamic routing is necessary, but it must be applied carefully. Constant changes frustrate drivers and increase operational risk. The system must distinguish between changes that matter and noise that should be ignored.

3.3.1 Mid-route cancellations and urgent pickups

When a delivery is canceled mid-route, the system evaluates whether rerouting is worth the disruption.

Two strategies are commonly used:

  1. Soft re-routing: apply only if the cancellation significantly affects downstream stops or SLA risk.

  2. Hard re-routing: triggered when route efficiency improves meaningfully or when urgent pickups must be inserted.

Urgent pickups are handled using insertion heuristics that minimize incremental cost:

route.InsertAtBestPosition(newStop);

The insertion point is chosen based on the smallest increase in travel time or SLA risk, not global optimality.
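A minimal version of that insertion heuristic, using straight-line distance as a stand-in for real travel times, looks like this sketch:

```csharp
// Cheapest insertion: place the new stop where it adds the least extra
// travel. Coordinates are abstract (x, y) pairs for illustration.
static int BestInsertionIndex(List<(double X, double Y)> route, (double X, double Y) stop)
{
    static double D((double X, double Y) a, (double X, double Y) b) =>
        Math.Sqrt((a.X - b.X) * (a.X - b.X) + (a.Y - b.Y) * (a.Y - b.Y));

    int bestIdx = 1;
    double bestDelta = double.MaxValue;

    for (int i = 1; i < route.Count; i++)
    {
        // Extra cost of visiting `stop` between route[i-1] and route[i].
        double delta = D(route[i - 1], stop) + D(stop, route[i]) - D(route[i - 1], route[i]);
        if (delta < bestDelta)
        {
            bestDelta = delta;
            bestIdx = i;
        }
    }
    return bestIdx;
}
```

In production the delta would be computed from the distance matrix and weighted by SLA risk, but the shape of the decision is the same.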

3.3.2 Route locking and execution stability

Once a driver starts executing a route, excessive changes become counterproductive. Locking mechanisms prevent constant churn and provide a stable experience for drivers.

A typical route lifecycle looks like this:

  • Planned – Fully mutable
  • SentToDriver – Soft-locked, limited changes allowed
  • AcceptedByDriver – Hard-locked
  • InProgress – No structural changes
  • Completed – Archived

Enforcement example:

if (route.State >= RouteState.AcceptedByDriver)
{
    return; // Route is locked
}

Soft locks allow last-minute adjustments before acceptance. Hard locks protect execution once the driver commits. This balance keeps the system responsive without overwhelming drivers during peak chaos.


4 Gig Worker Ecosystem: Onboarding & Assignment

Peak season delivery only works if the workforce can scale as fast as demand. In practice, that workforce is mostly gig-based. Drivers sign up late, log in and out frequently, switch regions, or disappear mid-shift. Systems built around stable, long-lived employees tend to break under this level of churn.

To operate reliably, the platform must make onboarding nearly frictionless while still enforcing compliance. Once workers are active, assignment decisions must be fast, fair, and resilient to race conditions. This section focuses on the mechanics that allow a volatile gig workforce to operate predictably at 10× scale.

4.1 Frictionless Onboarding & Identity

Many drivers onboard minutes before they want to start delivering. If onboarding takes too long, supply never catches up with demand. At the same time, identity and compliance cannot be skipped. The onboarding pipeline must be fast, asynchronous, and tolerant of partial completion.

A typical onboarding flow includes:

  • Account creation
  • Identity verification
  • Background check initiation
  • Driver app provisioning

Each step should move forward independently so the system never blocks on a slow external dependency.

4.1.1 OAuth flows and Duende for short-lived worker identities

OAuth 2.0 is well suited for gig worker apps because it supports short-lived sessions and frequent reauthentication. Drivers often switch devices, reinstall apps, or return after long breaks. Identity tokens must reflect that reality.

Duende IdentityServer issues access and refresh tokens using Authorization Code + PKCE. Token lifetimes are intentionally short, and refresh tokens rotate to reduce risk when devices are lost or compromised.

A typical client configuration looks like this:

new Client
{
    ClientId = "driver-app",
    AllowedGrantTypes = GrantTypes.Code,
    RequirePkce = true,
    RequireClientSecret = false,
    RedirectUris = { "driverapp://signin-callback" },
    AllowedScopes = { "openid", "profile", "worker.api" },
    AllowOfflineAccess = true,
    RefreshTokenUsage = TokenUsage.OneTimeOnly
};

Tokens carry only what the app needs to operate immediately: worker ID, region, and current eligibility state. Anything else is fetched from Fleet Management after authentication. Keeping tokens small matters during peak season when mobile networks are unreliable and retries are common.

4.1.2 Automating background checks with integration events

Background checks are slow and unpredictable. Some complete in minutes, others take hours. Treating them as synchronous steps creates bottlenecks exactly when onboarding needs to scale.

Instead, background checks are fully event-driven. The provider posts results to a webhook. That webhook publishes an event. Worker services react when they are ready.

Webhook ingestion example:

app.MapPost("/webhooks/background-check", async (
    BackgroundCheckResult payload,
    IEventPublisher publisher) =>
{
    await publisher.PublishAsync(new BackgroundCheckCompletedEvent
    {
        WorkerId = payload.WorkerId,
        Status = payload.Status,
        Score = payload.Score
    });

    return Results.Ok();
});

Once the event is processed, the worker’s status changes to Eligible. From that point on, the assignment engine can consider them. Onboarding never blocks delivery operations, even when thousands of background checks complete at the same time.
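The status transition itself can be modeled as a small, idempotent domain operation. The "clear" status string is an assumption about the provider's payload, not a documented value.

```csharp
public enum WorkerStatus { Pending, Eligible, Rejected }

public class Worker
{
    public Guid Id { get; init; } = Guid.NewGuid();
    public WorkerStatus Status { get; private set; } = WorkerStatus.Pending;

    // Applying the same result twice leaves the worker in the same state,
    // so redelivered webhook events are harmless.
    public void ApplyBackgroundCheck(string providerStatus) =>
        Status = providerStatus == "clear"
            ? WorkerStatus.Eligible
            : WorkerStatus.Rejected;
}
```

Idempotency matters here because webhook providers commonly redeliver events, and during peak onboarding waves duplicates are the norm rather than the exception.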

4.2 Intelligent Assignment (The Matching Engine)

Once workers are active, the platform must decide who gets which delivery. These decisions happen continuously and at high volume. During peak hours, assignment logic can execute tens of thousands of times per minute.

The key constraint is speed. The system cannot afford to recompute global state for every decision. Instead, it evaluates a small set of nearby candidates and chooses the best fit based on current conditions.

4.2.1 Broadcast vs. unicast assignment

Most gig platforms use two assignment patterns, often together.

Broadcast assignment sends a job to all nearby eligible workers. The first to accept gets it. This works well when:

  • Worker supply is high
  • Jobs are interchangeable
  • Acceptance speed matters more than optimization

The downside is noise. During peak season, broadcast pushes can overwhelm devices and networks.

Unicast assignment selects one worker and sends the job directly. This is preferred when:

  • SLAs are tight
  • Routes are carefully optimized
  • Capacity and fairness matter

In practice, platforms use both: broadcast for low-risk jobs, unicast for time-critical ones.

Unicast notification example:

await driverNotificationService.NotifyAsync(workerId, new AssignmentPayload
{
    BatchId = batch.Id,
    Stops = batch.Stops
});

This hybrid approach balances speed and control under heavy load.

4.2.2 Score-based assignment under peak load

Assignment decisions are rarely binary. Multiple workers may be eligible, and the “best” choice depends on several factors. Score-based assignment turns this into a simple ranking problem.

A common scoring model combines distance, capacity, quality, and risk:

score = distanceWeight * distance
      + ratingWeight   * (maxRating - driverRating)
      + capacityWeight * capacityUtilization
      + slaRiskWeight  * slaRisk

Lower scores are better, so attractive attributes (high rating, spare capacity) enter with inverted sign.

Weights can change dynamically. During peak congestion, meeting SLAs may matter more than minimizing distance.

Example scoring function:

decimal ScoreCandidate(Worker w, Order o)
{
    // Lower score = better candidate.
    return
        0.5m * w.DistanceTo(o) +
        0.2m * (5 - w.Rating) +           // low ratings raise the score
        0.2m * w.CapacityUtilization +    // nearly full drivers raise the score
        0.1m * o.SlaRisk;
}

The system typically scores only a small candidate set—often 10 to 30 workers—pulled from Redis geospatial queries. This keeps decisions fast and predictable.

4.2.3 Preventing double assignment with optimistic concurrency

High concurrency creates race conditions. Two workers may accept the same job, or two assignment workers may select the same candidate simultaneously. Locks do not scale well here.

Optimistic concurrency solves this cleanly. Each assignment update checks a version or row token. If another process already claimed the job, the update fails and the system moves on.

Example with EF Core:

// Version must be configured as a concurrency token (e.g. [ConcurrencyCheck]
// or a rowversion column) for EF Core to detect competing updates.
var job = await db.Jobs.FirstAsync(j => j.Id == jobId);

if (job.WorkerId is not null)
{
    return AssignmentResult.FailedAlreadyTaken;
}

job.WorkerId = workerId;
job.Version++;

try
{
    await db.SaveChangesAsync();
    return AssignmentResult.Success;
}
catch (DbUpdateConcurrencyException)
{
    return AssignmentResult.FailedAlreadyTaken;
}

This approach works naturally with competing consumers and keeps throughput high during peak assignment waves.

4.3 The Driver App Backend

The driver app is where all of this becomes real. If the app fails, deliveries stop. During peak season, connectivity is unreliable and load is high, so the backend must assume the app will go offline at inconvenient times.

4.3.1 Offline-first synchronization and conflict resolution

Drivers move through tunnels, basements, and rural areas. The app must continue working without a connection. Routes, stops, and PoD data are stored locally in SQLite and synced when connectivity returns.

The backend processes sync batches and resolves conflicts deterministically:

  • Server wins for route changes
  • Client wins for completed deliveries

Sync endpoint example:

app.MapPost("/sync", async (SyncPayload payload, IDriverSyncService svc) =>
{
    var result = await svc.SyncAsync(payload);
    return Results.Ok(result);
});

The response includes server-side corrections and any updated instructions. This keeps the driver experience consistent even during intermittent connectivity.
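The two rules above can be captured in a small deterministic merge applied per stop during sync (the StopState record is a simplification of the real sync model):

```csharp
using System;

public record StopState(
    string StopId, string RouteVersion, bool Completed, DateTimeOffset? CompletedAt);

public static class SyncMerger
{
    // Server wins for route data; client wins for completion facts.
    public static StopState Merge(StopState server, StopState client) =>
        server with
        {
            Completed = server.Completed || client.Completed,
            CompletedAt = server.CompletedAt ?? client.CompletedAt
        };
}
```

Because the merge is a pure function, the same payload can be replayed safely after a failed sync attempt.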

4.3.2 Hardening the mobile gateway with Polly

During peak season, backend services may throttle or degrade temporarily. The mobile gateway must absorb this without causing retry storms or user-visible failures.

Polly provides controlled retries, backoff, and circuit breaking:

var retry = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
    .WaitAndRetryAsync(3, attempt =>
        TimeSpan.FromMilliseconds(200 * Math.Pow(2, attempt)));

var breaker = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));

var response = await retry.WrapAsync(breaker).ExecuteAsync(() =>
    httpClient.PostAsJsonAsync("/sync", payload));

Retries are bounded and predictable. When combined with degraded modes upstream, this keeps the driver app responsive even when parts of the system are under stress.


5 Economic Orchestration: Surge Pricing & Incentives

During peak season, demand does not fail gracefully. Orders arrive faster than drivers can absorb them, especially in predictable windows like late afternoons and early evenings. If the system treats pricing as static, the outcome is also predictable: unassigned orders pile up, delivery promises slip, and driver frustration grows.

Economic orchestration exists to correct this imbalance in real time. By adjusting payouts dynamically, the platform can influence where drivers go, when they log in, and which jobs they accept. The challenge is not just computing higher prices—it is doing so quickly, consistently, and in a way drivers trust.

5.1 Real-time Demand Modeling

Surge pricing only works if it reflects reality. That means the system must continuously observe demand and supply across geography and time. Static snapshots are not enough; the model must detect trends and react before SLAs are breached.

At a minimum, demand modeling considers:

  • Number of unassigned orders
  • Number of available drivers
  • Delivery time commitments
  • Spatial density of both supply and demand

These signals change rapidly during peak hours and must be recalculated frequently.

5.1.1 Building heatmaps of unassigned orders with Redis Geo

Redis Geo is well suited for tracking demand density in real time. Each unassigned order is stored as a geospatial point. By querying counts within a given radius, the system builds a heatmap that highlights where demand is outpacing supply.

Example: count unassigned orders within a 1 km radius (on Redis 6.2+, GEOSEARCH supersedes the older GEORADIUS):

GEOSEARCH "orders:pending" FROMLONLAT -73.935242 40.730610 BYRADIUS 1000 m COUNT 200

When counts exceed a configured threshold, the area is flagged as a surge zone. The heatmap service publishes surge updates through the event bus, allowing other services—especially the driver app—to react immediately.

During peak periods, these calculations typically run every 10–20 seconds. To keep this affordable, the service reads directly from Redis, computes only deltas, and avoids full recomputation unless boundaries change significantly.
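Once per-zone counts are read from Redis, the surge-zone decision itself is a cheap in-memory pass (zone keys and the threshold here are illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SurgeDetector
{
    // Flag zones whose pending-order count crosses the surge threshold.
    public static List<string> FindSurgeZones(
        IReadOnlyDictionary<string, int> pendingByZone, int threshold) =>
        pendingByZone
            .Where(kv => kv.Value >= threshold)
            .Select(kv => kv.Key)
            .ToList();
}
```

The resulting zone list is what the heatmap service publishes on the event bus as surge updates.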

5.1.2 Designing the Surge Multiplier Service

A surge multiplier represents how imbalanced a zone is. At its simplest, it compares pending orders to available drivers:

multiplier = baseRate * (1 + (pendingOrders / availableDrivers))

In practice, this needs refinement. SLA pressure matters. A zone with fewer orders but imminent deadlines may require stronger incentives than a zone with higher volume but flexible delivery windows.

A more realistic model incorporates SLA risk:

multiplier = baseRate * (
    1
    + α * (pendingOrders / max(availableDrivers, 1))
    + β * slaBreachProbability
)

Multipliers are always capped to avoid runaway pricing. Drivers entering high-demand zones see increased payouts in near real time, encouraging them to reposition or accept additional work.

Example multiplier computation:

public decimal ComputeMultiplier(int pending, int available, decimal slaRisk)
{
    if (available == 0)
        return 3.0m; // hard cap to prevent runaway pricing

    var ratio = (decimal)pending / available;
    return Math.Min(3.0m, 1.0m + 0.5m * ratio + 0.3m * slaRisk);
}

Once calculated, multipliers are published to the Billing context so they are consistently applied to all relevant assignments.

5.2 Dynamic Pricing Implementation

Pricing decisions must be deterministic. Drivers need to know what they will earn before accepting a job, not after completing it. For that reason, surge multipliers are applied at assignment time and carried through to payout.

Beyond surge, the pricing engine may also apply bonuses, guarantees, or time-based incentives. During peak season, these rules change frequently, often daily.

5.2.1 Using a Rules Engine for pricing logic

Hard-coding pricing logic quickly becomes a liability. Operations teams need the ability to adjust incentives without redeploying services or risking regressions.

Rules engines such as Microsoft RulesEngine or NRules allow pricing behavior to be defined declaratively. Rules are loaded at runtime and evaluated whenever assignment conditions change.

Example pricing rule using RulesEngine:

{
  "WorkflowName": "Pricing",
  "Rules": [
    {
      "RuleName": "SurgePricing",
      "SuccessEvent": "ApplySurge",
      "ErrorMessage": "Surge rule failed",
      "Expressions": [
        {
          "Name": "checkSurge",
          "Expression": "input.surgeMultiplier > 1"
        }
      ]
    }
  ]
}

At runtime, the pricing engine evaluates these rules against the current assignment context:

var results = await rulesEngine.ExecuteAllRulesAsync(
    "Pricing",
    new RuleParameter("input", new { surgeMultiplier = 1.5m })
);

The output is not a final payout, but a set of pricing events that the billing system uses to calculate earnings consistently across services.

5.2.2 Protecting ledger integrity with the Outbox Pattern

Nothing undermines driver trust faster than incorrect payouts. During peak season, thousands of pricing events may be generated simultaneously. Network failures, retries, or partial outages can easily lead to duplicate or missing ledger entries if the system is not careful.

The Outbox Pattern ensures pricing events are never lost or double-applied. Each ledger update and its corresponding outbound event are written in the same transaction. A background process later publishes the event to the message bus.

Outbox write example:

using var tx = await db.Database.BeginTransactionAsync();

db.LedgerEntries.Add(entry);
db.Outbox.Add(new OutboxEvent
{
    Type = "LedgerEntryCreated",
    Payload = JsonSerializer.Serialize(entry)
});

await db.SaveChangesAsync();
await tx.CommitAsync();

A separate processor handles delivery:

var pending = await db.Outbox
    .Where(e => !e.Processed)
    .ToListAsync();

foreach (var evt in pending)
{
    await bus.Publish(evt.ToMessage());
    evt.MarkProcessed();
}
await db.SaveChangesAsync();

If the system crashes or restarts, unprocessed outbox entries remain and are retried safely. Because publishing is retried, delivery is at-least-once; paired with idempotent consumers on the billing side, this yields consistent, auditable payouts even under extreme load.


6 Real-Time Visibility: IoT & Tracking at Scale

Real-time visibility is one of the first things customers notice—and one of the fastest systems to break under peak load. As delivery volume increases, so does the number of active drivers, and each of those drivers sends frequent location updates. A platform that comfortably handles hundreds of thousands of GPS pings per hour on a normal day may need to process several million during peak season.

The challenge is not just throughput. Location data arrives from mobile devices with unreliable connectivity, noisy sensors, and inconsistent timing. The tracking pipeline must absorb this data efficiently, reduce noise early, and surface only meaningful updates to downstream systems and users. This section walks through how location data is ingested, processed, and turned into useful signals without overwhelming the platform.

6.1 The Ingestion Funnel (MQTT vs. WebSockets)

The ingestion funnel sits at the edge of the system. Its job is simple but critical: accept raw GPS updates from driver devices and move them off the hot path as quickly as possible. If ingestion blocks or slows down, everything downstream—from routing to customer tracking—starts to lag.

At peak scale, this funnel must handle sharp spikes in message volume without putting pressure on the API layer or databases.

6.1.1 Why MQTT fits battery-constrained driver devices

Driver devices are mobile phones operating under imperfect conditions. Battery life matters. Network quality varies constantly. Requiring each device to maintain frequent HTTP or WebSocket connections does not scale well under those constraints.

MQTT is designed for exactly this scenario. It minimizes overhead and handles intermittent connectivity gracefully. Key reasons it works well for driver telemetry include:

  • Very small packet sizes
  • Persistent sessions across reconnects
  • Built-in quality-of-service levels
  • Automatic retry without application-level logic

Drivers publish location updates to topics like:

drivers/{driverId}/location

The broker handles fan-out to downstream consumers. The backend never talks directly to the device over HTTP for telemetry, which significantly reduces connection churn and resource usage.

Example MQTT client setup in the gateway:

var client = new MqttFactory().CreateMqttClient();

// Hand each ping to the event pipeline; never block the broker callback.
client.ApplicationMessageReceivedAsync += e =>
    ingestionPipeline.EnqueueAsync(e.ApplicationMessage);

await client.ConnectAsync(new MqttClientOptionsBuilder()
    .WithTcpServer("mqtt-broker", 1883)
    .WithCleanSession(false)
    .Build());

await client.SubscribeAsync("drivers/+/location");

The gateway’s responsibility ends at ingestion. It forwards messages into the event pipeline and moves on. This isolation is what keeps driver ping spikes from cascading into core services.

6.1.2 Using SignalR for customer-facing live updates

Customer tracking has very different requirements than driver telemetry. Customers do not need second-by-second updates, and polling at scale would be wasteful. What they need is timely, smooth updates that reflect progress without unnecessary noise.

SignalR provides a WebSocket-based channel optimized for pushing updates from server to client. Customers subscribe to updates for a specific order, and the backend pushes location or ETA changes as they happen.

Example SignalR hub:

public class TrackingHub : Hub
{
    public async Task SubscribeToOrder(string orderId)
    {
        await Groups.AddToGroupAsync(Context.ConnectionId, orderId);
    }
}

When meaningful updates arrive, the tracking service pushes them to subscribed clients:

await hubContext.Clients.Group(orderId)
    .SendAsync("locationUpdate", new { lat, lng, timestamp });

SignalR is deliberately used only for outbound communication. Devices never send raw telemetry over WebSockets. This separation keeps the ingestion path efficient and the customer experience responsive.

6.2 Processing Location Streams

Once location messages are in the event bus, the next challenge is volume. Writing every raw GPS ping to storage would be prohibitively expensive and mostly useless. The system must filter, normalize, and enrich data before it becomes part of the operational state.

6.2.1 Filtering GPS noise with serverless stream processing

GPS data is inherently noisy. A stationary driver may report slightly different coordinates every few seconds. If these updates are treated as meaningful movement, downstream systems waste cycles recalculating ETAs and triggering geofences.

Lightweight serverless functions are well suited for this stage. They scale automatically and keep processing logic small and focused.

Example jitter-filtering function:

public static class LocationProcessor
{
    [Function("ProcessLocation")]
    public static async Task Run(
        [KafkaTrigger("tracker", "driver-locations")] LocationPing ping,
        FunctionContext context)
    {
        if (IsJitter(ping))
            return;

        await PublishNormalizedAsync(ping);
    }

    private static bool IsJitter(LocationPing ping)
    {
        return ping.Accuracy > 30 ||
               (ping.Speed < 1 && ping.DistanceFromLast < 5);
    }
}

By dropping low-quality or redundant updates early, the system reduces storage costs and protects routing and tracking services from unnecessary churn.

6.2.2 Map matching with OSRM

Raw GPS coordinates often place drivers off the actual road network. This leads to incorrect ETAs and confusing map visuals. Map matching corrects this by snapping points to the most likely road segment.

OSRM provides a fast and reliable map-matching API that integrates well with stream processing.

Example map-matching request over a short window of recent pings (the match service requires at least two coordinates; a single point can be snapped with OSRM's nearest service instead):

/match/v1/driving/{lng1},{lat1};{lng2},{lat2}?tidy=true

Enrichment step in code:

var coords = string.Join(";",
    recentPings.Select(p => $"{p.Lng},{p.Lat}"));

var response = await httpClient.GetFromJsonAsync<OsrmMatchResponse>(
    $"/match/v1/driving/{coords}?tidy=true");

var snapped = response.Tracepoints.Last();
From this point on, all downstream systems operate on the snapped coordinate, not the raw GPS input. This keeps ETAs and geofence checks consistent and reliable.

6.3 Geofencing & ETA Calculations

After normalization and enrichment, location events become actionable. They drive state transitions, customer notifications, and operational decisions.

6.3.1 Detecting proximity events with Redis geofencing

Geofencing answers simple but important questions: Has the driver arrived? Are they close? Should we notify the customer? Redis geospatial indexes make these checks extremely fast.

Example: check if a driver is within 100 meters of the destination

GEOSEARCH "destinations:active" FROMLONLAT -73.935242 40.730610 BYRADIUS 100 m ASC WITHDIST

When a driver crosses the threshold, the system emits an event:

if (distance <= 100)
{
    await eventBus.PublishAsync(
        new DriverApproaching(orderId, driverId));
}

This event may trigger customer notifications, PoD preparation, or final routing adjustments. Using Redis avoids constant polling and keeps proximity checks inexpensive.

6.3.2 Updating ETAs incrementally under peak load

Recomputing full routes on every location update would overwhelm the system during peak season. Instead, ETAs are updated incrementally using a combination of signals:

  • Snapped driver position
  • Historical speed data
  • Live traffic indicators
  • Percentage of route completed

Simplified ETA adjustment:

var eta = baseEta
    .Add(TimeSpan.FromSeconds(trafficIndex * 30))
    .Add(TimeSpan.FromSeconds(delayFromStops));

Only meaningful changes are pushed to customers via SignalR. This results in smoother ETA updates, lower computational cost, and fewer unnecessary recalculations when traffic is volatile.
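A slightly fuller sketch of the incremental approach scales the unfinished portion of the planned ETA by a live traffic factor, rather than re-solving the route (all names here are illustrative):

```csharp
using System;

public static class EtaEstimator
{
    // Scale the remaining portion of the planned ETA by a traffic factor
    // instead of recomputing the whole route.
    public static TimeSpan Remaining(
        TimeSpan plannedTotal, double fractionCompleted, double trafficFactor)
    {
        var remainingPlanned = plannedTotal * (1 - fractionCompleted);
        return remainingPlanned * trafficFactor;
    }
}
```

For a 60-minute route that is half complete under 20% slower traffic, this yields roughly 36 minutes remaining, and the cost is a single multiplication per update.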


7 The Final Mile: Proof of Delivery (PoD) & Image Handling

The final mile is where the delivery officially ends—and where trust is either reinforced or lost. Proof of Delivery (PoD) is the system’s confirmation that the package reached the correct destination, at the right time, by the right driver. This data feeds customer notifications, billing, and dispute resolution.

During peak season, PoD volume increases dramatically. Millions of deliveries may complete within narrow time windows, and each one generates images, location data, and audit records. A well-designed system captures this evidence without slowing down drivers or overwhelming backend services.

7.1 Secure Proof of Delivery Architecture

PoD is not just a photo upload. It is a short workflow with strict validation requirements. Each step must be fast, reliable, and resistant to misuse, especially when drivers are under pressure to move quickly.

7.1.1 The PoD flow: barcode scan, location check, photo, signature

A typical PoD workflow follows a consistent sequence:

  • Scan the package barcode to confirm the correct item.
  • Capture the driver’s current location.
  • Take one or more photos of the delivered package.
  • Optionally capture a customer signature.

From the driver’s perspective, this should feel like a single action. On the backend, it is handled as a structured submission that can be validated and processed independently of other system load.

Example PoD submission endpoint:

app.MapPost("/pod/submit", async (
    PodSubmission submission,
    IPodService svc) =>
{
    var result = await svc.ProcessAsync(submission);
    return Results.Ok(result);
});

The service validates the barcode, associates the submission with the active delivery, links uploaded assets, and records the outcome. The goal is to acknowledge the driver quickly so they can move on to the next stop.

7.1.2 Preventing fraud with location validation

Peak season increases pressure on drivers, and that pressure increases the risk of incorrect or fraudulent PoD submissions. To protect the platform, each submission is validated against the expected drop-off location.

Because the tracking system already maintains snapped, high-quality coordinates, this check is fast and reliable. The system simply ensures the driver was close enough to the destination when the PoD was submitted.

Example validation:

var distance = GeoCalculator.DistanceInMeters(
    submission.Lat, submission.Lng,
    order.DestinationLat, order.DestinationLng);

if (distance > 50)
    throw new InvalidOperationException("Invalid PoD location");

If validation succeeds, the service emits a ProofOfDeliveryCompleted event. That event triggers billing, customer notifications, and downstream analytics. If it fails, the delivery is flagged for review without blocking the rest of the system.
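The GeoCalculator helper used in the validation above is assumed; a minimal haversine implementation is sufficient for a radius check like this:

```csharp
using System;

public static class GeoCalculator
{
    // Great-circle distance between two lat/lng points, in meters.
    public static double DistanceInMeters(
        double lat1, double lng1, double lat2, double lng2)
    {
        const double R = 6_371_000; // mean Earth radius in meters
        double ToRad(double deg) => deg * Math.PI / 180.0;

        var dLat = ToRad(lat2 - lat1);
        var dLng = ToRad(lng2 - lng1);
        var a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
              + Math.Cos(ToRad(lat1)) * Math.Cos(ToRad(lat2))
              * Math.Sin(dLng / 2) * Math.Sin(dLng / 2);
        return 2 * R * Math.Asin(Math.Sqrt(a));
    }
}
```

At a 50-meter threshold, the small error introduced by the spherical-Earth assumption is negligible.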

7.2 High-Scale Image Processing

Images are the heaviest part of PoD. During peak season, millions of photos may be uploaded within a few hours. If the backend tries to handle image uploads synchronously, it quickly becomes a bottleneck.

The guiding principle is simple: the driver should never wait on image processing.

7.2.1 Direct-to-blob uploads with presigned URLs

To avoid unnecessary load on the API layer, images are uploaded directly from the driver app to object storage using presigned URLs. The backend generates a short-lived URL and returns it to the app. The app then uploads the image directly to storage.

Example presigned URL generation:

var sas = blobClient.GenerateSasUri(
    BlobSasPermissions.Create | BlobSasPermissions.Write,
    DateTimeOffset.UtcNow.AddMinutes(5));

From the backend’s perspective, the image is just metadata: a blob name, size, and association with a delivery. This keeps API memory usage low and eliminates large request payloads during peak traffic.

7.2.2 Asynchronous image processing via serverless functions

Once an image lands in storage, processing happens asynchronously. A serverless function reacts to the blob creation event and performs any required transformations:

  • Resizing for consistent display
  • Compression to reduce storage and bandwidth
  • Watermarking with timestamp and location
  • Format normalization

Example blob-triggered function:

[Function("ProcessPodImage")]
[BlobOutput("pod-processed/{name}")]
public static async Task<byte[]> Run(
    [BlobTrigger("pod-images/{name}")] byte[] image,
    string name)
{
    // Returned bytes are written to the output container by the binding.
    return await PoDImageProcessor.ProcessAsync(image);
}

This approach decouples image handling from delivery completion. The original image is available immediately, and the processed version appears shortly after. If processing is delayed or retried, it does not affect the driver or the delivery state.

7.2.3 ImageSharp for safe, efficient image manipulation

ImageSharp is a good fit for this workload. It is fully managed, performs well in containerized and serverless environments, and avoids unsafe native dependencies.

A typical processing step includes resizing and watermarking:

public static byte[] Process(byte[] input, string watermark)
{
    using var img = Image.Load(input);

    img.Mutate(x =>
    {
        x.Resize(new ResizeOptions
        {
            Mode = ResizeMode.Max,
            Size = new Size(1024, 1024)
        });
        // Position relative to the post-resize dimensions.
        x.DrawText(
            watermark,
            SystemFonts.CreateFont("Arial", 24),
            Color.White,
            new PointF(20, x.GetCurrentSize().Height - 40));
    });

    using var ms = new MemoryStream();
    img.SaveAsJpeg(ms);
    return ms.ToArray();
}

Because ImageSharp runs entirely in managed code, it scales predictably and avoids memory-safety issues under heavy load.


8 Reliability & Observability: Keeping the Lights On

Peak season does not fail gently. At 10× load, small issues that go unnoticed during normal operations can quickly cascade into widespread outages. A slow dependency, an overloaded queue, or a noisy retry loop can take down large parts of the system in minutes.

Reliability at this scale comes down to two things: spotting problems early and recovering from them quickly. Observability provides the visibility needed to understand what is happening across services. Chaos testing and load testing validate whether the system actually behaves the way we think it does under stress. This final section focuses on the practical tools and patterns that keep the platform stable during the busiest days of the year.

8.1 Observability Stack

In a peak-season logistics system, observability is a core feature, not a debugging aid. Millions of events may pass through the platform in a short time window. Without consistent telemetry, diagnosing issues becomes guesswork, and response times stretch unacceptably long.

A good observability stack provides three things:

  • End-to-end correlation across services
  • High-signal logs that are easy to query
  • Metrics that show trends, not just snapshots

The goal is to let operators follow a single order through the system and immediately see where time or capacity is being lost.

8.1.1 Distributed tracing with OpenTelemetry

Distributed tracing makes it possible to follow a delivery from “order placed” all the way to “delivered.” OpenTelemetry (OTel) provides a standard way to collect and export traces, metrics, and logs without locking the system into a specific vendor.

During peak traffic, latency rarely fails in one place. A few extra milliseconds in routing, combined with slower distance lookups and queue backlogs, can push the entire flow past SLA limits. Traces make these compounding delays visible.

A typical setup includes:

  • OpenTelemetry SDKs in every microservice
  • An OTel Collector for batching and export
  • A trace backend such as Jaeger or Tempo
  • Correlation IDs (OrderId, RouteId, DriverId) propagated across service boundaries

Example trace instrumentation in a routing service:

using var activity = MyTelemetry.ActivitySource
    .StartActivity("RecalculateRoute");

activity?.SetTag("order.id", orderId);
activity?.SetTag("route.batchSize", stops.Count);

var matrix = await distanceMatrixClient.GetAsync(stops);
var optimized = await solver.SolveAsync(stops, matrix);

activity?.SetTag("route.newLength", optimized.TotalDistance);

With this in place, operators can quickly answer questions like:

  • Are routing recalculations slowing down?
  • Is the distance service the bottleneck?
  • Are messages backing up in a specific queue?

Because all services emit traces in the same format, correlation across the system happens automatically.

8.1.2 Structured logging with Serilog

Text logs alone do not scale. When thousands of services emit millions of log lines, searching by keywords becomes ineffective. Structured logging solves this by treating logs as data, not strings.

Serilog supports structured logging by emitting JSON with typed properties. In logistics systems, those properties usually include high-cardinality identifiers such as OrderId, DriverId, and RouteId.

Example Serilog configuration:

Log.Logger = new LoggerConfiguration()
    .Enrich.WithProperty("service", "tracking")
    .WriteTo.Console(new RenderedCompactJsonFormatter())
    .CreateLogger();

Example structured log entry:

Log.ForContext("orderId", order.Id)
   .ForContext("driverId", driver.Id)
   .Information("Driver location updated");

This allows operators to run precise queries such as:

orderId:"12345" AND level:"Error"

During peak season, this drastically reduces the time needed to diagnose incidents. Structured logs also feed anomaly detection systems, which can flag unusual patterns—such as sudden error spikes—before customers notice.

8.2 Chaos Engineering & Load Testing

Reliable systems are not built by accident. They are tested under failure conditions before those failures occur in production. Chaos engineering and load testing are how teams validate that their assumptions hold up under real stress.

8.2.1 Using NBomber to simulate peak traffic

NBomber is well suited for testing .NET-based delivery platforms because it allows developers to model realistic traffic patterns in code. Instead of sending constant load, scenarios can simulate spikes, waves, and sudden drops—patterns that closely resemble peak-season behavior.

NBomber is commonly used to test:

  • Order ingestion bursts
  • Driver location update floods
  • Routing recalculation storms
  • PoD submission spikes

Example NBomber scenario generating GPS bursts:

var sendPing = Step.Create("gps-ping", async ctx =>
{
    var ping = new { lat = 40.73, lng = -73.93, driverId = Guid.NewGuid() };
    var content = JsonContent.Create(ping);
    var result = await httpClient.PostAsync("/tracking/ping", content);
    return result.IsSuccessStatusCode
        ? Response.Ok()
        : Response.Fail();
});

var scenario = ScenarioBuilder
    .CreateScenario("gps-load", sendPing)
    .WithWarmUpDuration(TimeSpan.FromSeconds(10))
    .WithLoadSimulations(
        Simulation.InjectPerSec(5000, TimeSpan.FromSeconds(60))
    );

NBomberRunner.RegisterScenarios(scenario).Run();

This test pushes 5,000 location updates per second through the ingestion pipeline. Engineers can then inspect traces, logs, queue depth, and CPU usage to verify that SLAs hold and backpressure behaves as expected.

8.2.2 Feature flags for controlled degradation

Even with careful design, peak load can exceed expectations. When that happens, the system must shed nonessential features to protect core flows. Feature flags provide a safe way to do this without redeploying code.

Feature flags are commonly used to disable:

  • Live map animations
  • Historical exports
  • Noncritical analytics
  • Expensive recomputation paths

Example configuration:

{
  "FeatureManagement": {
    "EnableLiveMap": true,
    "EnableHistoryExport": false
  }
}

Usage in code:

public class MapController : ControllerBase
{
    private readonly IFeatureManager _features;
    private readonly IMapService _mapService;

    public MapController(IFeatureManager features, IMapService mapService)
    {
        _features = features;
        _mapService = mapService;
    }

    public async Task<IActionResult> GetLiveMap(string orderId)
    {
        if (!await _features.IsEnabledAsync("EnableLiveMap"))
            return StatusCode(503);

        return Ok(await _mapService.GetAsync(orderId));
    }
}

During peak season, operators can toggle flags manually or automatically based on metrics such as:

  • Kafka lag
  • Redis CPU usage
  • Pending routing jobs
  • API response times

When thresholds are exceeded, the system enters degraded mode. Core workflows—order creation, assignment, routing, PoD—continue to function, while nonessential features step aside. This controlled degradation is often the difference between partial service and a full outage.
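The automatic path can be a simple threshold check over a metrics snapshot; a sketch, with illustrative limits:

```csharp
using System.Collections.Generic;

public record MetricsSnapshot(long KafkaLag, double RedisCpuPercent, int PendingRoutingJobs);

public static class DegradationPolicy
{
    // Decide which feature flags to switch off for the current snapshot.
    public static List<string> FlagsToDisable(MetricsSnapshot m)
    {
        var disable = new List<string>();
        if (m.KafkaLag > 100_000)
            disable.Add("EnableLiveMap");
        if (m.RedisCpuPercent > 85 || m.PendingRoutingJobs > 5_000)
            disable.Add("EnableHistoryExport");
        return disable;
    }
}
```

A small background job can evaluate this policy every few seconds and apply the result through the feature-management configuration, so degraded mode engages without human intervention.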
