1 The High-Concurrency Challenge: Defining the “World Cup” Scale
An ESPN-scale sports platform that supports 10 million concurrent users, delivers live scores in real time, and sends over a million notifications within seconds of a goal operates at the outer edge of what modern systems can handle. The problem sounds straightforward—get updates from the stadium to the fan—but the constraints make it hard. Every update must reach users in under 500 milliseconds, even when traffic spikes instantly, networks behave inconsistently, or parts of the platform are degraded.
At this scale, familiar patterns stop working. You cannot design for “average traffic” or assume smooth load curves. Every layer—data ingestion, processing, fan-out, delivery, and storage—runs under constant pressure and must absorb sudden bursts without slowing down. Small architectural choices, such as how connections are managed or how autoscaling reacts, directly affect user-visible latency.
1.1 Throughput vs. Latency: The physics of 10M concurrent WebSocket connections
Most real-time systems struggle long before they reach one million active connections. At ten million, the problem looks less like a web application and more like a telecommunications system. Each WebSocket connection consumes memory for buffers, CPU for keep-alives and message handling, and network bandwidth for frames. The challenge is not opening connections—it is keeping p99 latency low while all of them remain active.
In practice, a well-tuned WebSocket server typically supports 5,000 to 30,000 concurrent connections per node. Even with aggressive optimization, self-managed clusters need hundreds of machines to approach a million connections. Pushing beyond that reliably requires managed, hyperscale services such as Azure Web PubSub, AWS AppSync, or Cloudflare Pub/Sub. These systems rely on specialized networking stacks, event-driven I/O, and low-level optimizations that are difficult to reproduce in application code.
Concurrency alone does not define the problem, though. The harder part is handling sudden throughput spikes during critical moments in a match. A goal creates a sharp, immediate surge:
- Data providers publish updated match state within roughly 20 milliseconds.
- Backend services transform that data into scores, timelines, commentary, and stats.
- Millions of subscribed clients expect the update almost instantly.
The real enemy here is tail latency. It is not enough for most users to see the update quickly. If even a small percentage of fans receive the goal notification several seconds late during peak moments, the experience is perceived as broken.
The mental model to keep in mind is simple but unforgiving: Throughput mainly affects cost. Latency determines whether the platform survives. A system at this scale must be designed to control both.
1.2 The “Thundering Herd” problem: Managing 1M+ notifications during a goal event
A goal does not just generate a single update; it triggers a chain reaction. Millions of connected clients—some on WebSockets, others waiting for push notifications—react at almost the same time. Without careful design, this creates cascading failures across the platform:
- WebSocket clients receive an update and immediately request refreshed data.
- Push notification pipelines spike CPU, memory, and outbound network usage.
- Shared caches or databases suddenly receive millions of near-identical reads.
This pattern is known as the Thundering Herd problem. Even systems with significant headroom can fail if too many components react synchronously to the same event.
A scalable architecture avoids this by breaking the broadcast flow into clear, controlled stages:
- Provider event (5–15ms): The raw update is ingested into a durable streaming layer such as Event Hubs.
- State computation (15–50ms): Match state, scores, and derived data are computed and stored in fast, in-memory systems like Redis.
- Fan-out (50–200ms):
  - Web PubSub broadcasts updates to connected clients.
  - Notification Hubs and FCM deliver mobile push notifications.
  - Edge caches update lightweight score endpoints.
- Client-side rendering (50–200ms): Final display depends largely on device performance and network conditions.
Each stage is isolated so that load spikes in one area do not cascade into the next. The system must also expect duplicate events from upstream providers and handle them safely. Rate limits, per-device throttling, and connection backoff strategies smooth out client behavior and prevent synchronized refresh storms.
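The per-device throttling mentioned above can be sketched as a token bucket: each device gets a small budget of refreshes that refills over time, so a goal cannot trigger a synchronized refresh storm. This is a minimal, illustrative sketch; class and parameter names are hypothetical, not part of the platform described here.

```csharp
using System;
using System.Collections.Concurrent;

// Minimal per-device token bucket (illustrative): each device may trigger
// at most `capacity` refreshes at once, refilled at `refillPerSecond`.
public sealed class DeviceThrottle
{
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private readonly ConcurrentDictionary<string, (double Tokens, DateTime Last)> _buckets = new();

    public DeviceThrottle(double capacity, double refillPerSecond)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
    }

    public bool TryAcquire(string deviceId, DateTime nowUtc)
    {
        var allowed = false;
        _buckets.AddOrUpdate(
            deviceId,
            _ => { allowed = true; return (_capacity - 1, nowUtc); },
            (_, state) =>
            {
                // Refill proportionally to elapsed time, capped at capacity.
                var tokens = Math.Min(
                    _capacity,
                    state.Tokens + (nowUtc - state.Last).TotalSeconds * _refillPerSecond);
                if (tokens >= 1) { allowed = true; tokens -= 1; }
                return (tokens, nowUtc);
            });
        return allowed;
    }
}
```

In the real system the same check would sit in front of the refresh endpoint, turning a synchronized million-client burst into a bounded trickle per device.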
1.3 Service Level Objectives (SLO): Defining sub-500ms end-to-end latency from stadium to screen
At World Cup scale, performance targets must be explicit. Vague goals like “fast updates” are not useful. Clear Service Level Objectives force architectural discipline and make trade-offs visible.
A realistic latency budget for staying under 500 milliseconds end-to-end looks like this:
| Stage | Target Latency |
|---|---|
| Provider → Ingestion | 20–40ms |
| Ingestion → Event Processing | 50–100ms |
| Processing → Fan-out Broadcast | 40–120ms |
| Client Network + Rendering | 100–200ms |
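Summing the table confirms the budget closes: the worst case across all four stages is 460ms, leaving roughly 40ms of headroom under the 500ms SLO. A quick sanity check of that arithmetic:

```csharp
using System;

// Sanity-check the latency budget from the table above: worst-case
// stage latencies must still sum to under the 500 ms end-to-end SLO.
public static class LatencyBudget
{
    // (stage, min ms, max ms) as given in the table
    public static readonly (string Stage, int MinMs, int MaxMs)[] Stages =
    {
        ("Provider -> Ingestion", 20, 40),
        ("Ingestion -> Event Processing", 50, 100),
        ("Processing -> Fan-out Broadcast", 40, 120),
        ("Client Network + Rendering", 100, 200),
    };

    public static int WorstCaseMs()
    {
        var total = 0;
        foreach (var s in Stages) total += s.MaxMs;
        return total;
    }
}
```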
Meeting these targets consistently requires deliberate design choices. Ingest pipelines must scale horizontally without manual intervention. Fan-out mechanisms must handle millions of recipients efficiently. Caches must be the primary source of truth for hot data, not databases. Observability must focus on p99 and p999 latencies, not averages.
Autoscaling systems such as KEDA and Event Hubs throughput units need to react before queues build up, not after. Regional redundancy must be active-active so traffic can shift instantly during failures.
Taken together, these constraints lead to a clear architectural rule: Latency is not a non-functional concern. At this scale, it is the primary functional requirement of every service.
2 Ingestion Tier: Multi-Provider Data Orchestration
Once you accept the latency and concurrency constraints defined in the previous section, the ingestion tier becomes the first real stress point. Every live update—goals, cards, substitutions—enters the system here. If ingestion slows or becomes inconsistent, everything downstream backs up: fan-out stalls, caches serve stale data, and users see delays.
Sports data providers such as Opta, Sportradar, and Genius Sports deliver high-frequency updates, but they do not agree on schemas, naming conventions, or even time formats. The ingestion tier has one job: absorb this variability, normalize it into a stable internal model, and do so without losing events or processing the same event twice.
This layer must also evolve over time. Providers change payload formats mid-season, add new event types, or temporarily degrade under load. A production-grade ingestion architecture assumes these changes are normal and designs for them explicitly.
2.1 Ingesting Opta, Sportradar, and Genius Sports: Handling varied schemas with the Adapter Pattern
Each provider describes the same real-world action in a different way. Opta may represent a goal with "eventType": "G", Sportradar may use "type": "score" nested several levels deep, and Genius Sports might include additional contextual fields. Timestamps can arrive as epoch milliseconds, ISO strings, or provider-local timezones.
Trying to normalize these formats inline quickly turns ingestion code into an unmaintainable set of conditionals. A cleaner approach is to treat each provider as a pluggable adapter. The ingestion service receives raw payloads, identifies the provider, and delegates transformation to a dedicated adapter that maps the payload into a canonical internal event.
public interface IProviderAdapter
{
    ProviderType Type { get; }
    MatchEvent MapToCanonical(string rawPayload);
}

public class OptaAdapter : IProviderAdapter
{
    public ProviderType Type => ProviderType.Opta;

    public MatchEvent MapToCanonical(string rawPayload)
    {
        var data = JsonSerializer.Deserialize<OptaEvent>(rawPayload)
            ?? throw new InvalidDataException("Empty or malformed Opta payload");

        return new MatchEvent
        {
            MatchId = data.MatchId,
            // Provider-specific codes ("G", "YC", ...) map to the canonical enum
            EventType = NormalizeEventType(data.EventCode),
            Timestamp = DateTimeOffset.FromUnixTimeMilliseconds(data.Timestamp),
            Payload = data
        };
    }
}
This pattern does two important things. First, it isolates provider-specific logic so changes do not ripple through the system. Second, it ensures that everything downstream—streaming, caching, fan-out—operates on a stable, provider-agnostic schema.
To keep this tier manageable at scale, three rules matter:
- The canonical model must be stable, even if provider payloads change.
- Adapters are responsible for validation and recovery, not downstream services.
- Schema versions must be preserved so historical data can be replayed safely.
With these constraints in place, adding or modifying a provider becomes a bounded change rather than a platform-wide risk.
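The "bounded change" property comes from how adapters are resolved: the ingestion service looks up the right adapter by provider and never branches on provider details itself. A minimal sketch of that dispatch, using plain mapping functions in place of the IProviderAdapter instances shown above (names are illustrative):

```csharp
using System;
using System.Collections.Generic;

// Illustrative registry: each provider registers a function mapping its
// raw payload to the canonical form. Adding a provider means adding one
// registration, not editing ingestion logic.
public sealed class AdapterRegistry
{
    private readonly Dictionary<string, Func<string, string>> _adapters = new();

    public void Register(string provider, Func<string, string> mapToCanonical)
        => _adapters[provider] = mapToCanonical;

    public string Normalize(string provider, string rawPayload)
    {
        if (!_adapters.TryGetValue(provider, out var map))
            throw new InvalidOperationException($"No adapter registered for {provider}");
        return map(rawPayload);
    }
}
```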
2.2 Deduplication Logic: Implementing a CRDT or an Idempotency layer using Redis
High-frequency sports feeds are noisy by nature. Providers often send the same event multiple times, sometimes within milliseconds. If the ingestion tier processes every duplicate, downstream systems amplify the problem—more writes, more broadcasts, and more load at exactly the worst moment.
Deduplication must happen early and cheaply. At this scale, two approaches are commonly used.
CRDT-inspired approach
A simple Last-Write-Wins (LWW) strategy works when events include reliable timestamps. For each (MatchId, EventId), the ingestion layer keeps the most recent version and discards older ones. This approach fits well in distributed ingestion scenarios where reconciliation may occur across regions or consumers.
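The LWW idea fits in a few lines: per (MatchId, EventId), keep the newest timestamp and reject anything older or equal. This is an in-memory sketch of the reconciliation rule, not a full distributed CRDT.

```csharp
using System;
using System.Collections.Generic;

// Last-Write-Wins sketch for deduplicating provider events: for each
// (MatchId, EventId) key, only a strictly newer timestamp is accepted;
// duplicates and stale replays are discarded.
public sealed class LwwEventStore
{
    private readonly Dictionary<(int MatchId, string EventId), DateTimeOffset> _latest = new();

    // Returns true if the event is new (or newer) and should be processed.
    public bool Accept(int matchId, string eventId, DateTimeOffset timestamp)
    {
        var key = (matchId, eventId);
        if (_latest.TryGetValue(key, out var seen) && seen >= timestamp)
            return false; // stale or exact duplicate
        _latest[key] = timestamp;
        return true;
    }
}
```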
Redis-based idempotency layer
For simpler pipelines, Redis provides a fast and effective idempotency check. Each processed event ID is written once with a short TTL aligned to match duration.
public class IdempotencyStore
{
    private readonly IDatabase _redis;

    public IdempotencyStore(IDatabase redis) => _redis = redis;

    public async Task<bool> ShouldProcessAsync(string eventId)
    {
        // SET ... NX with a TTL spanning the expected match window:
        // true only for the first writer of this event ID.
        return await _redis.StringSetAsync(
            $"evt:{eventId}",
            "1",
            TimeSpan.FromHours(2),
            When.NotExists);
    }
}
If the key already exists, the event is ignored. The check is constant time and avoids hitting databases or downstream services.
The trade-off is straightforward:
- CRDT-style reconciliation works well for complex, distributed ingestion paths.
- Redis idempotency is simpler and faster for short-lived, high-volume events.
Both approaches dramatically reduce unnecessary fan-out and help protect latency budgets during peak moments.
2.3 Event Streaming with Azure Event Hubs and KEDA: Auto-scaling consumers based on ingestion lag
After normalization and deduplication, events are published into Azure Event Hubs. This creates a durable buffer between ingestion and processing, allowing the system to absorb sudden spikes without dropping data. Event Hubs provides partitioned, ordered streams that scale to hundreds of thousands of events per second during busy matches.
Partitioning strategy matters. Assigning partitions by MatchId or provider stream ensures ordering where it matters, while allowing parallel processing across matches. Consumers can then scale horizontally without coordination overhead.
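The partition assignment itself is a deterministic mapping: all events for one match land on the same partition, so per-match ordering is preserved while different matches spread across the stream. A minimal sketch (the real Event Hubs SDK hashes the partition key for you; the modulo here is illustrative):

```csharp
using System;

// Stable partition assignment sketch: one MatchId always maps to the
// same partition, preserving per-match ordering. Partition count is
// illustrative.
public static class Partitioner
{
    public static int PartitionFor(int matchId, int partitionCount)
    {
        if (partitionCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(partitionCount));
        // Deterministic mapping; real SDKs hash the partition key similarly.
        return (int)((uint)matchId % (uint)partitionCount);
    }
}
```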
The real strength appears when Event Hubs is paired with KEDA. KEDA monitors ingestion lag—how far consumers are behind—and scales workers automatically. When a match heats up and event volume spikes, new consumers start within seconds.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: eventhub-consumer
spec:
  scaleTargetRef:
    name: eventhub-worker
  triggers:
    - type: azure-eventhub
      metadata:
        consumerGroup: "$Default"
        unprocessedEventThreshold: "10000"
        connectionFromEnv: "EVENTHUB_CONN"
This setup ensures that:
- Idle periods cost almost nothing.
- Peak moments trigger immediate scale-out.
- Ingestion lag becomes a measurable signal, not an assumption.
Just as importantly, ingestion and processing remain loosely coupled. One can scale without forcing changes in the other.
2.4 Using Polly for advanced resilience and circuit breaking during provider outages
Data providers are external dependencies, and they fail in unpredictable ways. They slow down, return partial data, or drop connections entirely—often during the biggest matches. The ingestion tier cannot block or retry blindly when this happens.
Polly provides structured resilience patterns that keep ingestion responsive under failure. Retries handle transient errors, timeouts prevent thread starvation, and circuit breakers stop repeated calls to failing providers.
var circuitBreaker = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(
        exceptionsAllowedBeforeBreaking: 3,
        durationOfBreak: TimeSpan.FromSeconds(30));

var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(new[]
    {
        TimeSpan.FromMilliseconds(50),
        TimeSpan.FromMilliseconds(100),
        TimeSpan.FromMilliseconds(200)
    });

// Retries execute through the breaker, so repeated failures trip it open
var combined = Policy.WrapAsync(retryPolicy, circuitBreaker);

var response = await combined.ExecuteAsync(
    () => httpClient.GetStringAsync(url));
This approach enforces clear boundaries:
- Provider failures do not stall ingestion pipelines.
- Workers fail fast and move on instead of waiting indefinitely.
- Circuit breaker events generate alerts so teams can react.
At ESPN scale, resilience is not defensive coding—it is expected behavior. Providers will fail, and the ingestion tier must continue operating predictably when they do.
3 Real-Time Delivery: Scaling WebSockets to Millions
Real-time delivery represents the most visible part of the architecture. Fans notice delays instantly. They compare the app to broadcast TV, social media, and messaging apps. If the platform lags even a few seconds behind, trust erodes.
This section focuses on how to deliver live scores and commentary to 10 million concurrent WebSocket connections using Azure Web PubSub and GraphQL subscriptions.
3.1 Beyond SignalR: Why Azure Web PubSub is required for 10M connections
SignalR is excellent for mid-scale scenarios, but it was not designed to sustain millions of concurrent WebSocket connections or global multi-region routing. Bottlenecks occur in:
- Connection negotiation.
- Keep-alive overhead.
- Fan-out throughput.
- Memory consumption per connection.
- Server affinity (sticky sessions).
At 10M connections, self-hosted SignalR clusters require hundreds of nodes and still risk overload. Azure Web PubSub shifts the complexity to a managed control plane with optimized fan-out, distributed connection storage, and multi-region scaling.
Key advantages:
- Elastic connection scaling without application code changes.
- Event-driven serverless handlers for filtering and authorization.
- Guaranteed message ordering per connection.
- TTL-based message buffers for clients reconnecting after brief drops.
The architecture pattern:
Event Hubs → Broadcast Worker → Web PubSub → Client
The Broadcast Worker handles filtering and grouping logic; Web PubSub handles the network layer.
3.2 GraphQL Subscriptions with HotChocolate: Implementing efficient schema-stitching for live match commentary
GraphQL is increasingly used for sports platforms because it allows clients to retrieve only the fields they need. For real-time updates, GraphQL Subscriptions with HotChocolate integrate cleanly with Web PubSub.
Example subscription schema:
type Subscription {
  commentary(matchId: ID!): CommentaryEvent!
}
A HotChocolate resolver can bridge Event Hubs to Web PubSub:
public class CommentarySubscription
{
    [Subscribe]
    public CommentaryEvent OnCommentary([EventMessage] CommentaryEvent evt) => evt;
}
To connect Web PubSub, use HotChocolate’s integration:
services
    .AddGraphQLServer()
    .AddSubscriptionType<CommentarySubscription>()
    .AddInMemorySubscriptions();
But for 10M clients, in-memory subscriptions are not viable. Instead, merge GraphQL with Web PubSub:
- Client negotiates GraphQL subscription.
- Server registers the subscription in Web PubSub groups (e.g., match-{id}).
- Broadcast Worker sends canonical events to the group.
- Client receives events via WebSocket.
This avoids GraphQL servers holding millions of stateful connections and reduces compute pressure.
3.3 Connection Management: Handling “Sticky Sessions” and massive reconnection storms (The “Backoff” strategy)
When a network outage occurs—regional carrier issue, local Wi-Fi drop—millions of clients can simultaneously reconnect. This creates a dangerous connection storm.
Problems during reconnection spikes:
- Authentication endpoints overload.
- Web PubSub negotiation endpoints slow.
- Clients repeatedly retry without delay.
To prevent cascading failure, enforce exponential backoff:
Client strategy:
- First retry: 100ms + jitter
- Second retry: 250ms + jitter
- Next retries: exponential up to 10 seconds
Server strategy:
- Cache negotiation tokens for 60–120 seconds.
- Offload connection negotiation from the core API to a dedicated lightweight endpoint.
- Use CDN caching for static negotiation metadata.
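The client strategy above can be sketched as a jittered exponential schedule: fixed early delays, doubling afterwards, capped at 10 seconds. The jitter fraction is illustrative; any randomization that de-synchronizes clients works.

```csharp
using System;

// Reconnection backoff matching the client strategy above: 100ms, then
// 250ms, then exponential doubling capped at 10s, with random jitter so
// millions of clients never retry in lockstep.
public static class ReconnectBackoff
{
    private static readonly TimeSpan Cap = TimeSpan.FromSeconds(10);

    public static TimeSpan DelayForAttempt(int attempt, Random jitterSource)
    {
        double baseMs = attempt switch
        {
            0 => 100,
            1 => 250,
            _ => Math.Min(Cap.TotalMilliseconds, 250 * Math.Pow(2, attempt - 1)),
        };
        // Up to 20% jitter (illustrative) spreads retries over time.
        double jitter = baseMs * 0.2 * jitterSource.NextDouble();
        return TimeSpan.FromMilliseconds(Math.Min(Cap.TotalMilliseconds, baseMs + jitter));
    }
}
```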
A server example generating negotiation tokens:
[ApiController]
public class NegotiationController : ControllerBase
{
    private readonly WebPubSubServiceClient _client;

    public NegotiationController(WebPubSubServiceClient client)
    {
        _client = client;
    }

    [HttpGet("negotiate")]
    public async Task<IActionResult> Negotiate()
    {
        // Short-lived client access URI scoped to group join/send roles
        var response = await _client.GetClientAccessUriAsync(
            roles: new[] { "webpubsub.joinLeaveGroup", "webpubsub.sendToGroup" });

        return Ok(new { url = response.ToString() });
    }
}
The goal is to prevent synchronized retry patterns. A controlled reconnection storm is survivable; an uncontrolled one is catastrophic.
3.4 Code Example: A high-performance IHostedService for broadcasting updates from Event Hubs to Web PubSub
The Broadcast Worker is the core component responsible for receiving normalized events from Event Hubs and distributing them to Web PubSub groups.
A simplified production-ready version:
public class BroadcastService : BackgroundService
{
    private readonly EventHubConsumerClient _consumer;
    private readonly WebPubSubServiceClient _pubSub;
    private readonly ILogger<BroadcastService> _logger;

    public BroadcastService(
        EventHubConsumerClient consumer,
        WebPubSubServiceClient pubSub,
        ILogger<BroadcastService> logger)
    {
        _consumer = consumer;
        _pubSub = pubSub;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken token)
    {
        await foreach (PartitionEvent evt in _consumer.ReadEventsAsync(token))
        {
            try
            {
                var matchEvent = JsonSerializer.Deserialize<MatchEvent>(evt.Data.Body.ToArray());
                var group = $"match-{matchEvent.MatchId}";

                await _pubSub.SendToGroupAsync(
                    group,
                    BinaryData.FromObjectAsJson(matchEvent),
                    contentType: "application/json",
                    cancellationToken: token);
            }
            catch (Exception ex)
            {
                // Log and keep consuming: one bad event must not stall the stream
                _logger.LogError(ex, "Broadcast failure");
            }
        }
    }
}
Key performance considerations:
- Avoid JSON serialization inside critical paths; pre-serialize where possible.
- Use pooled byte buffers to avoid excessive GC pressure.
- Use partition-aware consumers to prevent out-of-order broadcasts.
- Batch sends for non-critical events to reduce network overhead.
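The batching point deserves a concrete shape: buffer non-critical events and flush when either a size or a time threshold is hit, so one network send carries many updates. A minimal sketch (class name and thresholds are illustrative; the flush callback would wrap the group send in the real worker):

```csharp
using System;
using System.Collections.Generic;

// Batching sketch for non-critical broadcasts: buffer events, flush when
// the batch size or the max delay is reached.
public sealed class EventBatcher<T>
{
    private readonly int _maxSize;
    private readonly TimeSpan _maxDelay;
    private readonly Action<IReadOnlyList<T>> _flush;
    private readonly List<T> _buffer = new();
    private DateTime _firstUtc;

    public EventBatcher(int maxSize, TimeSpan maxDelay, Action<IReadOnlyList<T>> flush)
    {
        _maxSize = maxSize;
        _maxDelay = maxDelay;
        _flush = flush;
    }

    public void Add(T item, DateTime nowUtc)
    {
        if (_buffer.Count == 0) _firstUtc = nowUtc;
        _buffer.Add(item);
        if (_buffer.Count >= _maxSize || nowUtc - _firstUtc >= _maxDelay)
            Flush();
    }

    public void Flush()
    {
        if (_buffer.Count == 0) return;
        _flush(new List<T>(_buffer)); // hand off a copy, then reset
        _buffer.Clear();
    }
}
```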
In practice, this service runs as a Kubernetes workload scaled by KEDA, ensuring it always matches ingest throughput.
4 Data Strategy: Hybrid Storage for History and Speed
Real-time delivery gets updates onto screens, but data strategy determines whether the platform can keep doing that under sustained pressure. Live state must be read and updated in milliseconds, while historical data needs to support heavy analytical queries without interfering with live traffic. Trying to solve both problems with a single storage system usually leads to compromises that fail at scale.
At ESPN scale, the solution is to separate concerns deliberately. Redis Stack handles live, mutable state that changes second by second. ClickHouse stores the full, immutable event history for replay, analytics, and reporting. Each system does what it is good at, and neither is forced into a role it cannot handle efficiently.
4.1 ClickHouse for Time-Series: Storing every ball-by-ball event for instant historical analytics
A single football match generates thousands of discrete events: passes, tackles, fouls, substitutions, shots, and goals. Across a tournament, this becomes billions of records that analysts, product features, and downstream systems rely on. This data is append-only, rarely updated, and queried mostly by time range or event type—an ideal fit for ClickHouse.
ClickHouse’s columnar storage model means queries only read the columns they need. When computing metrics like possession trends, shot maps, or xG over time, the engine scans timestamps and numeric fields without touching large JSON payloads. This is why even queries across billions of rows can complete in milliseconds.
A typical schema reflects this append-only nature:
CREATE TABLE match_events
(
    MatchId        UInt32,
    EventTimestamp DateTime64(3),
    EventType      String,
    PlayerId       UInt32,
    Payload        JSON
)
ENGINE = MergeTree()
PARTITION BY MatchId
ORDER BY (MatchId, EventTimestamp);
Partitioning by MatchId ensures queries stay scoped to a single match unless explicitly widened. Ordering by timestamp keeps disk reads sequential, which is critical for performance. In real deployments, partitions may also include tournament or season identifiers, depending on query patterns.
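A representative analytical query under this schema shows the columnar benefit: only the timestamp and event-type columns are scanned, and the JSON payload is never touched. The 5-minute bucket size is illustrative.

```sql
-- Shot and goal timeline for one match, bucketed per 5 minutes.
SELECT
    toStartOfInterval(EventTimestamp, INTERVAL 5 MINUTE) AS bucket,
    countIf(EventType = 'Shot') AS shots,
    countIf(EventType = 'Goal') AS goals
FROM match_events
WHERE MatchId = 876
GROUP BY bucket
ORDER BY bucket;
```

Because the table is ordered by (MatchId, EventTimestamp), the WHERE clause prunes to a single partition and the scan stays sequential on disk.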
To keep ingestion fast, writes should be batched. Sending one insert per event creates unnecessary overhead. Instead, ingestion workers collect events and flush them in groups:
await using var conn = new ClickHouseConnection(connectionString);
await conn.OpenAsync();

// ClickHouse.Client batches via ClickHouseBulkCopy, which streams rows
// in a single round trip rather than building a parameterized INSERT.
using var bulkCopy = new ClickHouseBulkCopy(conn)
{
    DestinationTableName = "match_events",
    BatchSize = 10_000
};

await bulkCopy.InitAsync();                   // reads the target schema
await bulkCopy.WriteToServerAsync(batchRows); // batchRows: IEnumerable<object[]>
This approach keeps write latency low while allowing ClickHouse to optimize storage in the background. The main operational concern is monitoring merge pressure and partition sizes so compaction does not lag behind peak ingestion periods.
4.2 Redis Stack (RedisJSON & RediSearch): Managing live match state and “hot” player stats
While ClickHouse stores history, Redis Stack powers the live experience. Scores, timers, possession percentages, and player stats change constantly and must be read thousands of times per second. These workloads require sub-millisecond access and atomic updates, which is where Redis excels.
A common pattern is to store the entire live match state as a JSON document under a predictable key like match:{id}:state. Individual fields can then be updated without rewriting the entire document:
// NRedisStack: update individual JSON paths without rewriting the document
var json = redis.JSON();

await json.SetAsync(
    $"match:{matchId}:state",
    "$.score.home",
    homeScore);

await json.SetAsync(
    $"match:{matchId}:state",
    "$.score.away",
    awayScore);
This keeps network payloads small and update logic simple. RediSearch adds the ability to query across many live matches without scanning keys. For example, listing all matches currently in the second half becomes a fast index lookup:
FT.SEARCH match_idx "@period:{2}"
This is especially useful for APIs that render match lists or dashboards during busy periods. Player-level statistics follow the same pattern. Atomic updates are handled with small Lua scripts to avoid race conditions:
var script = @"
    local key = KEYS[1]
    local field = ARGV[1]
    local inc = tonumber(ARGV[2])
    redis.call('JSON.NUMINCRBY', key, field, inc)
    return 1";

await redis.ScriptEvaluateAsync(
    script,
    new RedisKey[] { $"player:{playerId}:stats" },
    new RedisValue[] { "$.touches", "1" });
By pushing this logic into Redis, application services remain stateless and fast. Redis serves as the authoritative source for live state, while ClickHouse captures the long-term record.
4.3 Entity Framework Core Pitfalls: When to bypass the ORM for raw performance
Entity Framework Core is productive and safe for many workloads, but it is not a good default for latency-sensitive paths in a real-time sports platform. Change tracking, query translation, and object materialization introduce overhead that becomes visible when endpoints must respond in a few milliseconds.
For hot read paths—such as fetching the current match summary—using Dapper provides more predictable performance. Dapper maps query results directly to lightweight DTOs without tracking or proxies:
var sql = @"SELECT MatchId, HomeScore, AwayScore, Period
            FROM MatchSummary
            WHERE MatchId = @matchId";

return await connection.QuerySingleAsync<MatchSummaryDto>(
    sql,
    new { matchId });
This avoids allocating full entity graphs and keeps memory pressure low. The same principle applies to analytics queries against ClickHouse. EF Core is not designed for columnar databases, so using ClickHouse.Client directly aligns better with how the database executes queries.
EF Core still has a place—for admin tools, configuration data, and low-frequency workflows. The rule of thumb is straightforward: Use EF Core where developer efficiency matters; bypass it where latency and throughput are non-negotiable.
4.4 Partitioning Strategies: Preventing cross-partition scans at scale
Partitioning decisions have long-term consequences. Poor choices lead to queries that scan far more data than intended, which becomes expensive during peak traffic. For sports data, MatchId is usually the natural boundary because most queries focus on a single match or a small set of matches.
However, some analytics span multiple matches—such as tournament-wide averages or team trends. In these cases, incorporating TournamentId into the partition strategy can help:
Partition: TournamentId
Order: TournamentId, MatchId, EventTimestamp
ClickHouse handles large partitions efficiently as long as the sort order matches query patterns. Smaller partitions can reduce merge pressure when many matches run simultaneously, but too many tiny partitions increase metadata overhead. The balance depends on how the data is queried, not on theoretical limits.
Redis follows a simpler rule: keep all keys related to a match under a shared prefix. This makes cache invalidation and cleanup predictable and fast when matches end.
Ultimately, partitioning should be driven by real query behavior. The goal is to minimize accidental cross-partition scans, especially during live events when every unnecessary read competes with real-time traffic.
5 The Push Notification Engine: 10M Alerts Per Minute
WebSockets cover fans who are actively using the app, but push notifications handle everyone else. Phones locked, apps backgrounded, devices on poor networks—all of these still need to react within seconds when a goal is scored. During major matches, that can mean delivering millions of notifications per minute without slowing down the rest of the platform.
At this scale, push notifications are not a side feature. They are a parallel delivery system with their own throughput limits, failure modes, and cost profile. The architecture must fan out events safely, route messages across multiple providers, and prioritize what truly matters when queues start filling up.
5.1 Fan-out Architecture: Decoupling the “Goal Scored” event from individual user delivery
Push delivery should never happen directly from the core event stream. A goal event is time-critical, but the act of delivering notifications to millions of devices is slow, provider-dependent, and unpredictable. Tight coupling here would block ingestion and real-time delivery during peak moments.
Instead, the system publishes a compact notification intent to a queue. This message describes what happened and who should care, but not how delivery happens.
{
  "EventId": "abc123",
  "MatchId": 876,
  "Type": "Goal",
  "TeamId": 44,
  "Segments": ["team:44:followers", "match:876:watching"]
}
A fan-out service consumes these messages and expands segments into concrete delivery batches. Because this work happens asynchronously, the system absorbs sudden bursts without blocking upstream pipelines. Queues buffer load, and workers scale horizontally to keep up.
A simplified handler shows the intent:
public async Task HandleAsync(NotificationEvent evt)
{
    foreach (var segment in evt.Segments)
    {
        // Expand the segment into users, then hand off to delivery queues
        var users = await segmentResolver.ResolveAsync(segment);
        await dispatcher.EnqueueAsync(users, evt);
    }
}
Each step is isolated. If one segment expands to millions of users, it does not stall ingestion or WebSocket broadcasts. This separation is what keeps the system stable when traffic spikes instantly.
5.2 Segmented Broadcasting: Using Azure Notification Hubs and Firebase (FCM) in parallel
No single push provider performs best everywhere. Notification Hubs simplifies multi-platform delivery, but direct integration with Firebase Cloud Messaging often performs better for Android-heavy regions. At ESPN scale, using both paths in parallel provides flexibility and resilience.
A common strategy is to route high-priority notifications—goals, red cards, match-ending events—through multiple delivery paths. Less urgent messages, such as reminders or summaries, go through a single provider to reduce cost.
Sending through Notification Hubs:
await notificationHubClient.SendFcmNativeNotificationAsync(
    jsonPayload,
    tags: new[] { $"match_{matchId}" });
Sending directly through FCM:
var message = new Message
{
    Topic = $"match_{matchId}",
    Data = new Dictionary<string, string>
    {
        ["event"] = evt.Type,
        ["score"] = evt.Payload
    }
};

await firebaseMessaging.SendAsync(message);
This approach allows routing decisions to change at runtime. If one provider slows down or fails, traffic can shift without code changes. It also avoids long-term vendor lock-in, which matters when notification volume directly affects operating cost.
5.3 Priority Queuing: Making sure goal alerts always go first
During peak moments, notification queues fill quickly. Without prioritization, critical alerts get stuck behind low-value messages like half-time summaries or promotional content. That delay is immediately visible to users.
RabbitMQ priority queues solve this problem by allowing messages to carry explicit priority values. Consumers always process higher-priority messages first, regardless of when they arrived.
Queue configuration:
channel.QueueDeclare(
    queue: "notifications",
    durable: true,
    exclusive: false,
    autoDelete: false,
    arguments: new Dictionary<string, object>
    {
        { "x-max-priority", 10 }
    });
Publishing a high-priority goal alert:
var props = channel.CreateBasicProperties();
props.Priority = 9; // near the top of the declared 10-level range

channel.BasicPublish(
    exchange: "",
    routingKey: "notifications",
    basicProperties: props,
    body: payload);
This ensures that goal alerts are never delayed by background notifications. The consumer logic remains simple, and the queue enforces ordering automatically. Under heavy load, this difference is the line between “late but delivered” and “delivered when it matters.”
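The broker-side behavior is easy to state in code: among queued messages, the highest priority value dequeues first regardless of arrival order. A logic sketch using .NET's PriorityQueue (a min-heap, so the priority is negated; names are illustrative):

```csharp
using System;
using System.Collections.Generic;

// Logic sketch of what x-max-priority buys: higher-priority messages
// dequeue first no matter when they were published.
public static class NotificationQueue
{
    public static List<string> Drain(IEnumerable<(string Message, byte Priority)> published)
    {
        var pq = new PriorityQueue<string, int>();
        foreach (var (msg, prio) in published)
            pq.Enqueue(msg, -prio); // negate: highest priority comes out first
        var order = new List<string>();
        while (pq.TryDequeue(out var msg, out _)) order.Add(msg);
        return order;
    }
}
```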
5.4 Personalization at Scale: Applying user preferences without database lookups
Push notifications are only valuable if they are relevant. Fans expect alerts for their teams, not every match. But checking preferences against a relational database for millions of users is not viable during live events.
The solution is to push preference data closer to the fan-out layer. Redis works well here, using simple set-based structures:
- team:{teamId}:followers
- match:{matchId}:subscribers
Checking eligibility becomes a constant-time operation:
bool shouldSend = await redis.SetContainsAsync(
    $"team:{teamId}:followers",
    userId);
For extremely high-volume scenarios, fan-out workers can preload Bloom filters or compressed bitsets into memory at startup. This reduces Redis traffic further while keeping false positives within acceptable limits.
The goal is to make personalization effectively free from a latency perspective. When preference checks are O(1), the system can scale to millions of notifications per minute without personalization becoming a bottleneck.
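The Bloom filter option mentioned above trades a small false-positive rate for zero network calls: a membership test can say "definitely not subscribed" (skip the user) or "probably subscribed" (fall back to Redis). A minimal self-contained sketch; production systems would use an established library and tuned sizing.

```csharp
using System;
using System.Collections;

// Minimal Bloom filter sketch: no false negatives, tunable false-positive
// rate. Sizing and hashing here are illustrative only.
public sealed class BloomFilter
{
    private readonly BitArray _bits;
    private readonly int _hashes;

    public BloomFilter(int bitCount, int hashes)
    {
        _bits = new BitArray(bitCount);
        _hashes = hashes;
    }

    public void Add(string item)
    {
        foreach (var i in Indexes(item)) _bits[i] = true;
    }

    public bool MightContain(string item)
    {
        foreach (var i in Indexes(item))
            if (!_bits[i]) return false; // definitely not present
        return true; // present, or a false positive
    }

    private int[] Indexes(string item)
    {
        // Double hashing: derive k indexes from two base hashes.
        int h1 = item.GetHashCode();
        int h2 = 0;
        foreach (var c in item) h2 = h2 * 31 + c;
        var idx = new int[_hashes];
        for (int k = 0; k < _hashes; k++)
            idx[k] = (int)(((uint)(h1 + k * h2)) % (uint)_bits.Length);
        return idx;
    }
}
```

Because a "no" answer is always correct, the filter safely prunes the vast majority of non-followers before any Redis lookup happens.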
6 Edge Intelligence and Predictive Caching
By the time data reaches the delivery layer, most of the work should already be done. The majority of fan traffic during a match is read-heavy and highly repetitive: current score, match clock, cards, substitutions, and basic player stats. Serving these reads from a central region adds latency and puts unnecessary pressure on core services.
At ESPN scale, the edge becomes part of the application architecture. Edge caching and lightweight compute move the “read path” closer to users, while predictive caching ensures the hottest data is already in memory before traffic spikes begin.
6.1 Azure Front Door and Cloudflare Workers: Moving the read path to the edge
Modern CDNs are no longer limited to static files. Azure Front Door and Cloudflare Workers can execute small pieces of logic at the edge, fetch data from nearby caches, and return responses without touching the core application stack. This reduces round trips and keeps latency consistent across regions.
A common pattern is to expose a lightweight edge endpoint for live match state. The worker retrieves the latest state from a regional Redis replica and returns it directly to the client:
export default {
  async fetch(req, env) {
    const url = new URL(req.url);
    const matchId = url.searchParams.get("match");
    if (!matchId) {
      return new Response("missing match id", { status: 400 });
    }
    // env.REDIS is a Redis client binding (e.g., an Upstash REST client)
    // configured against the nearest regional replica.
    const state = await env.REDIS.get(`match:${matchId}:state`);
    if (state === null) {
      return new Response("match not found", { status: 404 });
    }
    return new Response(state, {
      headers: { "Content-Type": "application/json" }
    });
  }
};
Azure Front Door complements this by caching short-lived JSON responses—often for just a few seconds. That window is enough to absorb massive read bursts during goals or penalties without making the data feel stale. Origin shields ensure that even cache misses are aggregated, preventing thousands of edge nodes from hitting the backend simultaneously.
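The few-seconds caching window can also be expressed directly in the headers the edge worker returns, so downstream caches apply the same policy. A small sketch; the specific TTL values and the `stale-while-revalidate` window are illustrative assumptions:

```typescript
// Build cache headers for live-match JSON: a short shared-cache TTL
// (s-maxage) absorbs read bursts, while stale-while-revalidate lets the
// edge keep serving during the refresh window instead of blocking on origin.
function liveCacheHeaders(
  ttlSeconds: number,
  staleSeconds: number
): Record<string, string> {
  return {
    "Content-Type": "application/json",
    "Cache-Control": `public, s-maxage=${ttlSeconds}, stale-while-revalidate=${staleSeconds}`,
  };
}

const headers = liveCacheHeaders(3, 10);
// headers["Cache-Control"] === "public, s-maxage=3, stale-while-revalidate=10"
```

Keeping the policy in one helper avoids the common failure mode where different endpoints drift to different TTLs for the same underlying data.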
6.2 Predictive pre-warming: Filling caches before fans arrive
Traffic does not start at kickoff. Fans open the app well in advance to check lineups, formations, and pre-match commentary. If caches are cold at that moment, the platform pays the price when load ramps up quickly.
Predictive pre-warming solves this by hydrating Redis with known hot keys ahead of time. A scheduled worker reads the match schedule and prepares the expected data structures 20–30 minutes before kickoff.
public async Task PrewarmAsync(IEnumerable<int> upcomingMatchIds)
{
    foreach (var id in upcomingMatchIds)
    {
        // Hydrate the hot keys fans request first: lineup and summary.
        // redis.Json here is a RedisJSON client wrapper (e.g., NRedisStack).
        var lineup = await api.GetLineupAsync(id);
        await redis.Json.SetAsync($"match:{id}:lineup", "$", lineup);

        var summary = await api.GetSummaryAsync(id);
        await redis.Json.SetAsync($"match:{id}:summary", "$", summary);
    }
}
This approach smooths the load curve and avoids a thundering herd of cache misses right before the match starts. When tied into tournament schedules, these workers can scale automatically using KEDA, ensuring pre-warming keeps pace with the number of simultaneous matches.
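Deciding *which* matches to hydrate can be driven by the schedule itself. The sketch below filters upcoming matches into the 20–30-minute pre-warm window described above; the `ScheduledMatch` shape is an assumption for illustration:

```typescript
interface ScheduledMatch {
  id: number;
  kickoff: Date;
}

// Return matches whose kickoff falls inside the pre-warm window —
// e.g., between 20 and 30 minutes from now. These IDs are what the
// scheduled worker would hand to PrewarmAsync.
function matchesToPrewarm(
  schedule: ScheduledMatch[],
  now: Date,
  minLeadMinutes = 20,
  maxLeadMinutes = 30
): number[] {
  return schedule
    .filter((m) => {
      const leadMs = m.kickoff.getTime() - now.getTime();
      return leadMs >= minLeadMinutes * 60_000 && leadMs <= maxLeadMinutes * 60_000;
    })
    .map((m) => m.id);
}

const now = new Date("2026-07-01T12:00:00Z");
const schedule: ScheduledMatch[] = [
  { id: 1, kickoff: new Date("2026-07-01T12:25:00Z") }, // 25 min out → pre-warm
  { id: 2, kickoff: new Date("2026-07-01T14:00:00Z") }, // too far out
  { id: 3, kickoff: new Date("2026-07-01T12:05:00Z") }, // window already passed
];
// matchesToPrewarm(schedule, now) → [1]
```

Running this check every few minutes keeps the window rolling, so matches with staggered kickoffs are warmed on their own timelines rather than in one batch.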
6.3 Cache invalidation: Preventing stampedes with a lease pattern
Cache expiration is dangerous during live events. If a hot key expires at the wrong moment, thousands of clients can hit the backend simultaneously, overwhelming services that were otherwise stable. This is known as a cache stampede.
The lease pattern prevents this by allowing only one worker to refresh expired data while others continue serving slightly stale values. Redis supports this using SET with NX semantics:
var leaseKey = $"lease:{key}";

// Only one caller wins the lease; everyone else serves the stale value.
bool acquired = await redis.StringSetAsync(
    leaseKey,
    "1",
    expiry: TimeSpan.FromSeconds(5),
    when: When.NotExists);

if (acquired)
{
    var fresh = await fetchFromOrigin();
    await redis.StringSetAsync(key, fresh, TimeSpan.FromSeconds(5));
    await redis.KeyDeleteAsync(leaseKey); // release the lease early
    return fresh;
}
else
{
    return await redis.StringGetAsync(key); // serve stale until the lease holder refreshes
}
This pattern trades a small amount of staleness for system stability. During match-critical moments, that trade-off is almost always worth it. Fans prefer data that is a second old over an app that fails to load.
6.4 Handling dynamic content: Choosing the right TTL for the right data
Not all data needs the same freshness. Live scores and match clocks change constantly, while player biographies or venue details may not change for months. Treating all content the same wastes resources and increases risk.
A practical approach is to group data by volatility and assign TTLs accordingly. Live fields get very short TTLs—often one to three seconds—and are updated proactively through push mechanisms. Static fields get long TTLs and rely on cache expiry rather than frequent refreshes.
await redis.StringSetAsync(
"match:88:score",
payload,
TimeSpan.FromSeconds(1));
await redis.StringSetAsync(
"player:101:bio",
payload,
TimeSpan.FromHours(24));
This keeps the cache efficient and predictable. Hot paths stay fast under pressure, and cold paths avoid unnecessary backend traffic. When TTLs reflect how the domain actually changes, caching becomes a reliability feature, not just a performance optimization.
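Centralizing the volatility grouping keeps TTLs consistent across every service that writes to the cache. A small sketch of that mapping; the tiers, key patterns, and values are illustrative assumptions:

```typescript
// Map data classes to TTLs so every service applies the same policy.
type Volatility = "live" | "session" | "static";

const TTL_SECONDS: Record<Volatility, number> = {
  live: 2,        // scores, match clocks — refreshed by push anyway
  session: 300,   // lineups, summaries — stable within a match
  static: 86_400, // bios, venue details — change rarely
};

// Classify a cache key by its suffix. Real systems might tag writes
// explicitly instead of inferring from key names.
function ttlFor(key: string): number {
  if (/:score$|:clock$/.test(key)) return TTL_SECONDS.live;
  if (/:lineup$|:summary$/.test(key)) return TTL_SECONDS.session;
  return TTL_SECONDS.static;
}

// ttlFor("match:88:score") → 2
// ttlFor("player:101:bio") → 86400
```

When a new data type appears, it gets classified once here rather than receiving an arbitrary TTL at each call site.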
7 Reliability Engineering: Testing for the “Final”
Large-scale sports platforms rarely fail because of a single bad design decision. They fail because assumptions made during development don’t hold under real match-day pressure. Reliability engineering exists to close that gap. It answers a simple question: how does the system behave when everything goes wrong at once?
At ESPN scale, reliability testing must reflect reality. That means production-like traffic patterns, real data volumes, and deliberate failure scenarios. Anything less gives a false sense of confidence and leaves teams unprepared for the final.
7.1 Chaos Engineering with Azure Chaos Studio: Simulating a regional data center failure during a match
Chaos engineering is about validating assumptions, not creating outages for their own sake. Azure Chaos Studio lets teams inject targeted faults—network delays, CPU stress, service interruptions—directly into live-like environments. This makes it possible to test failure modes that are otherwise hard to reproduce.
One of the most important scenarios is a regional failure during a live match. If a region hosting Web PubSub or Event Hubs becomes unavailable, traffic must shift cleanly to other regions without noticeable impact. This only works if routing, state replication, and autoscaling are already configured correctly.
A simplified chaos experiment might inject latency into a region for several minutes:
{
"properties": {
"steps": [
{
"name": "region-outage",
"branches": [
{
"name": "fault",
"actions": [
{
"type": "Microsoft-Chaos/Network/Latency",
"parameters": {
"duration": "PT5M",
"delay": "2000"
}
}
]
}
]
}
]
}
}
During the experiment, teams watch how traffic reroutes, how quickly clients reconnect, and whether ingestion or fan-out backlogs grow. If Redis latency spikes or Event Hub lag increases beyond acceptable limits, the system is not ready. Fixing these gaps during controlled chaos is far safer than discovering them during a live final.
7.2 Load Testing at Scale: Using Azure Load Testing to simulate match-day traffic
Load testing only matters if it resembles real usage. Sports platforms don’t see steady traffic; they see sudden spikes tied to kickoff, goals, and halftime. Azure Load Testing, powered by JMeter, makes it possible to simulate these patterns across regions with synchronized timing.
A realistic test suite usually includes several scenarios running in parallel:
- WebSocket negotiation bursts as fans open the app before kickoff.
- Cache-heavy reads during goals and penalties.
- API spikes for lineups, odds, and live commentary.
- Push registration and delivery during pre-match and post-goal surges.
JMeter thread groups allow these patterns to ramp up aggressively:
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup">
<stringProp name="ThreadGroup.num_threads">200000</stringProp>
<stringProp name="ThreadGroup.ramp_time">60</stringProp>
<stringProp name="ThreadGroup.scheduler">true</stringProp>
</ThreadGroup>
Azure Load Testing distributes this traffic across regions so behavior mirrors real-world usage. As load increases, autoscaling systems—KEDA, VM scale sets, App Service Plans—must react quickly and predictably. Engineers watch p99 latency, queue depth, Redis eviction rates, and Web PubSub connection stability.
The goal is not to reach a headline number like “1M RPS.” The goal is to ensure the platform degrades gracefully and recovers cleanly under stress. Any unexplained metric spike is a signal to revisit architecture, not just add capacity.
7.3 Observability: Seeing bottlenecks before users feel them
Without strong observability, reliability engineering turns into guesswork. OpenTelemetry provides a unified way to instrument services with metrics, traces, and logs. Combined with Prometheus and Grafana, it gives teams real-time visibility into how the system behaves during critical moments.
In .NET services, instrumentation can be added centrally:
builder.Services.AddOpenTelemetry()
.WithMetrics(m => m
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddRuntimeInstrumentation()
.AddPrometheusExporter())
.WithTracing(t => t
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddSource("MatchEngine")
.AddOtlpExporter());
During matches, certain signals matter more than others:
- Web PubSub connection churn and reconnect rates
- Event Hub consumer lag
- Redis command latency and timeouts
- p99 latency for GraphQL subscriptions
- Push notification failures by provider
Grafana dashboards should correlate these metrics with match timelines. When a goal is scored, engineers should immediately see how every layer responds. If Redis latency jumps at the same moment a cache expires, the cause becomes obvious. With proper tracing, these insights appear in minutes instead of hours.
7.4 The Kill Switch Pattern: Preserving core features under extreme load
When load exceeds safe limits, the platform must protect what matters most. Live scores, commentary, and notifications take priority. Optional features—social comments, advanced stats, personalization widgets—should be the first to go.
Kill switches make this possible without redeploying code. Feature flags stored in a fast configuration system allow services to disable non-essential paths instantly:
public async Task<IActionResult> GetComments(int matchId)
{
if (await featureFlags.IsEnabledAsync("COMMENTS_DISABLED"))
return StatusCode(503);
return Ok(await commentsService.GetLatestAsync(matchId));
}
Health-monitoring services watch indicators like CPU saturation, Event Hub backlog, or p999 latency. When thresholds are crossed, kill switches activate automatically. This prevents cascading failures and buys engineers time to investigate.
Once load stabilizes, features can be re-enabled gradually or kept disabled until the match ends. In practice, users rarely notice missing secondary features—but they always notice when core updates stop working.
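The automatic activation described above reduces to threshold checks over a health snapshot. A minimal sketch — the metric names, thresholds, and flag names are hypothetical, and a real service would write the resulting flags to the shared flag store:

```typescript
// Flip kill switches when health indicators cross their thresholds.
interface HealthSnapshot {
  cpuSaturation: number;   // 0..1
  eventHubBacklog: number; // unprocessed events
  p999LatencyMs: number;
}

const KILL_RULES: Array<{ flag: string; breached: (h: HealthSnapshot) => boolean }> = [
  { flag: "COMMENTS_DISABLED", breached: (h) => h.cpuSaturation > 0.85 },
  { flag: "ADVANCED_STATS_DISABLED", breached: (h) => h.eventHubBacklog > 500_000 },
  { flag: "PERSONALIZATION_DISABLED", breached: (h) => h.p999LatencyMs > 2_000 },
];

// Returns the set of flags that should be active for this snapshot.
function evaluateKillSwitches(h: HealthSnapshot): string[] {
  return KILL_RULES.filter((r) => r.breached(h)).map((r) => r.flag);
}

const calm: HealthSnapshot = { cpuSaturation: 0.4, eventHubBacklog: 10_000, p999LatencyMs: 300 };
const overloaded: HealthSnapshot = { cpuSaturation: 0.92, eventHubBacklog: 800_000, p999LatencyMs: 2_500 };
// evaluateKillSwitches(calm) → []
// evaluateKillSwitches(overloaded) → all three flags
```

Keeping the rules declarative makes them easy to review before a match and easy to audit afterward: the war room can see exactly which threshold tripped which feature.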
8 Implementation Blueprint: A Production-Ready Reference
By this point, the architecture is clear. What remains is execution. At ESPN scale, reliability does not come from diagrams alone—it comes from making sure every decision is repeatable, auditable, and reversible under pressure. The implementation blueprint exists to remove guesswork when traffic peaks and time is limited.
This section focuses on how the system is actually built, deployed, scaled, and operated so that match day feels routine rather than risky.
8.1 Infrastructure as Code (IaC): Managing the surge with Terraform or Bicep
At global scale, manual configuration is not just inefficient—it is dangerous. Infrastructure as Code ensures that every environment is created the same way, with the same defaults, limits, and failover behavior. When traffic spikes or regions fail, there should be no surprises.
In Azure-heavy environments, Bicep and Terraform are the most common choices. Bicep integrates tightly with Azure-native tooling, while Terraform is often preferred for its mature state management and module ecosystem. Both work as long as the infrastructure is fully declarative.
A simple Bicep example for provisioning Event Hubs capacity:
resource eventHubNs 'Microsoft.EventHub/namespaces@2022-10-01-preview' = {
name: 'sports-prod-eh'
location: location
sku: {
name: 'Standard'
tier: 'Standard'
capacity: 20
}
}
In practice, this definition is wrapped in modules that parameterize region count, throughput units, and redundancy. Terraform deployments follow a similar pattern, using variables to control scale without duplicating code. This guarantees that secondary regions are true peers, not under-provisioned backups.
IaC also enables temporary scale-ups. During tournaments, teams can spin up additional Web PubSub units or compute pools and tear them down afterward with confidence. Everything is tracked, versioned, and reversible.
8.2 Deployment Strategy: Blue-green deployments with zero tolerance for downtime
Live sports platforms cannot afford deployment risk during matches. Any change that disrupts active connections or introduces latency spikes is unacceptable. Blue-green deployments solve this by separating deployment from traffic.
One environment (blue) serves all production traffic. The other (green) receives the new version. Once health checks pass, traffic shifts gradually using Front Door or Application Gateway rules.
A simplified deployment step for a .NET service:
jobs:
- job: DeployGreen
steps:
- task: AzureWebApp@1
inputs:
appName: 'match-api-green'
package: '$(Build.ArtifactStagingDirectory)/api.zip'
Connection-heavy components require extra care. WebSocket-based systems cannot tolerate mass reconnects caused by abrupt traffic switches. Traffic must move in small increments—1%, then 5%, then 20%—while monitoring latency, error rates, and reconnect behavior.
If anything degrades, rollback is immediate. Traffic snaps back to blue without redeploying. For users, the change is invisible. That is the standard expected during finals and knockout rounds.
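The incremental cut-over is essentially a loop: advance the green weight one step, check health, and snap back to blue on any degradation. A sketch with hypothetical hooks — `setGreenWeight` and `isHealthy` stand in for Front Door / Application Gateway weight updates and the team's real health criteria (latency, error rates, reconnect behavior); a production version would await those calls:

```typescript
// Shift traffic from blue to green in small steps, rolling back on failure.
function shiftTraffic(
  steps: number[],
  setGreenWeight: (pct: number) => void,
  isHealthy: () => boolean
): "promoted" | "rolled-back" {
  for (const pct of steps) {
    setGreenWeight(pct);
    if (!isHealthy()) {
      setGreenWeight(0); // snap all traffic back to blue
      return "rolled-back";
    }
  }
  return "promoted";
}

// Example: health degrades once green carries 20% of traffic.
const applied: number[] = [];
const result = shiftTraffic(
  [1, 5, 20, 50, 100],
  (pct) => { applied.push(pct); },
  () => applied[applied.length - 1] < 20
);
// result === "rolled-back"; applied === [1, 5, 20, 0]
```

The key property is that rollback is just another weight update: no redeploy, no connection churn on blue, and the failed green build stays available for diagnosis.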
8.3 Cost Optimization: Scaling down after the crowd leaves
The architecture that supports ten million concurrent users is expensive to run continuously. Fortunately, sports traffic is seasonal and predictable. After a tournament ends, usage drops sharply, and the platform must contract just as smoothly as it expanded.
Scale-down plans should be built into the same IaC modules used for scaling up. For compute-heavy components, node pools or VM scale sets are reduced:
az aks nodepool scale \
--resource-group sports-rg \
--cluster-name sports-aks \
--name systempool \
--node-count 5
Other components follow the same pattern. Redis clusters shrink to fewer shards. Web PubSub units scale down. Event Hubs reduce throughput units or move off premium tiers. These changes are controlled by environment-level flags such as traffic_level = off-season.
Not everything shrinks equally. ClickHouse clusters often remain steady because analytics workloads continue. Hot caches shrink dramatically. Background jobs may move to spot instances. When planned properly, teams routinely cut infrastructure costs by 60–70% after major events without compromising baseline reliability.
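The environment-level flag typically drives a sizing table rather than ad-hoc commands, so scale-down is versioned and reversible just like scale-up. A sketch of that mapping; the component names and capacity numbers are illustrative assumptions, not recommendations:

```typescript
// Map a traffic level to target capacity per component. IaC modules
// read this table, so changing the environment flag changes every
// component's size in one reviewed, revertible commit.
type TrafficLevel = "tournament" | "regular-season" | "off-season";

interface CapacityPlan {
  aksNodes: number;
  redisShards: number;
  webPubSubUnits: number;
  eventHubThroughputUnits: number;
}

const CAPACITY: Record<TrafficLevel, CapacityPlan> = {
  tournament:       { aksNodes: 120, redisShards: 16, webPubSubUnits: 100, eventHubThroughputUnits: 40 },
  "regular-season": { aksNodes: 30,  redisShards: 8,  webPubSubUnits: 20,  eventHubThroughputUnits: 10 },
  "off-season":     { aksNodes: 5,   redisShards: 2,  webPubSubUnits: 2,   eventHubThroughputUnits: 2 },
};

function planFor(level: TrafficLevel): CapacityPlan {
  return CAPACITY[level];
}
```

Components that should not shrink — such as the steady analytics cluster — simply keep the same value across all three levels, which makes that decision explicit rather than accidental.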
8.4 Final Checklist: The Architect’s war room on match day
No matter how good the architecture is, match day success depends on preparation. A shared war room—virtual or physical—keeps everyone aligned when traffic spikes and decisions must be made quickly.
Before kickoff, teams walk through a checklist so nothing relies on memory or assumption.
- Capacity verification
  - Event Hub partitions and throughput units confirmed
  - Web PubSub connection limits validated
  - Redis memory headroom and eviction policies checked
- Failover readiness
  - Front Door routing rules tested
  - Secondary regions warm and synchronized
- Autoscaling validation
  - KEDA triggers active and thresholds reviewed
  - AKS or VMSS node pools pre-scaled for expected peaks
- Notification pipeline health
  - FCM, APNs, and Notification Hubs reachable
  - Segment and preference caches up to date
- Observability and alerts
  - Grafana dashboards preloaded
  - Alerts tuned to avoid noise during spikes
- Feature flags and kill switches
  - Non-essential features in low-impact mode
  - Automatic kill switches armed and tested
- Communication protocols
  - Clear ownership for ingestion, delivery, and infrastructure
  - Escalation paths confirmed and staffed
This checklist turns a high-stress event into a controlled operation. When millions of fans are watching and reacting in real time, disciplined execution matters as much as the architecture itself.