1 Problem Framing & Goals
Designing a real-time multiplayer card game platform in .NET requires balancing fairness, responsiveness, and cost at global scale. Unlike action-heavy genres, card games emphasize deterministic logic, rule enforcement, and replayability. This allows us to architect with deterministic lockstep and actor-based simulation rather than full physics synchronization. In this section, we’ll frame the technical problem, establish success criteria, and align on architecture drivers for a production-grade implementation.
1.1 Scope & Success Criteria
Let’s start with clear boundaries. We’re targeting games such as Poker, Blackjack, or Magic: The Gathering (MTG)-style turn-based titles. These games share common traits: discrete state transitions, hidden information, predictable actions, and deterministic outcomes. Our goal is to support:
- 100,000 concurrent matches, each isolated and independently simulated.
- Latency budget under 150 ms round-trip for command acknowledgments.
- Fairness enforced by deterministic execution and verifiable event logs.
- Cross-platform reach, including mobile and web clients using SignalR.
- Replayability and auditability, with event-sourced match histories.
The architectural baseline relies on the modern .NET 9 ecosystem, optimized for concurrency and observability:
| Layer | Technology | Responsibility |
|---|---|---|
| Frontend | Blazor Hybrid or Unity WebGL (via C# client) | Player UI, animations, local prediction |
| Transport | ASP.NET Core SignalR | Real-time messaging (WebSocket-first) |
| Compute | Microsoft Orleans 9.2.x | Actor-based game simulations (grains per match) |
| Caching/Queues | Redis | Matchmaking, session cache, rate limits |
| Storage | EventStoreDB or Marten on PostgreSQL | Durable event sourcing, snapshots |
| Telemetry | OpenTelemetry | Distributed tracing, metrics, structured logs |
The architecture emphasizes authoritative simulation (no peer trust), idempotent event processing, and reproducibility. Every move, shuffle, and draw is recorded as an event, and the state at any point can be reconstructed by replaying these events with a known random seed.
1.1.1 Fairness and Replayability
For competitive multiplayer, fairness means identical outcomes for identical inputs. A given match seed (MatchSeed) initializes deterministic RNG and card order, ensuring identical replays. Every input from clients is timestamped, validated, and appended to the event stream. This allows post-game verification — if any discrepancy occurs, we can replay the event log to confirm state hashes.
Example of seeding RNG deterministically:
```csharp
public class MatchRandom
{
    private readonly Random _rng;

    public MatchRandom(Guid matchId)
    {
        // Derive the seed from the Guid's raw bytes. Guid.GetHashCode() is not
        // guaranteed stable across runtimes, which would break cross-server replays.
        int seed = BitConverter.ToInt32(matchId.ToByteArray(), 0);
        _rng = new Random(seed);
    }

    public int Next(int min, int max) => _rng.Next(min, max);
}
```
When every match uses the same MatchRandom, two servers replaying the same event sequence will produce identical results — the foundation of fairness.
1.1.2 Performance Targets
We design around Orleans virtual actors, one per active match. Each grain is lightweight (tens of kilobytes in memory), meaning a single silo on a 16-core VM can host 20–30k active grains comfortably. Horizontal scaling adds capacity linearly.
For 100k concurrent matches:
- 4 Orleans silos (16 cores, 64 GB RAM each)
- Redis cluster for ephemeral data
- Azure SignalR Service for WebSocket fan-out to clients
- EventStoreDB cluster for durable match logs and snapshots
Latency is bounded by three main contributors:
- Network RTT (client ↔ gateway): typically 50–120 ms.
- Gateway dispatch (SignalR hub to grain): <5 ms median.
- Grain processing + broadcast: <10 ms median.
These ensure round-trip visibility under 150 ms, enough to make animations feel responsive.
1.1.3 Platform Observability
Everything — from match actions to connection churn — flows through OpenTelemetry. Using unified traces, we correlate per-match logs, Redis queries, and SignalR hub invocations:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddRedisInstrumentation()
        .AddSource("Game.MatchGrain"))
    .WithMetrics(metrics => metrics
        .AddMeter("Game.Metrics")
        .AddAspNetCoreInstrumentation());
```
Operators can visualize per-match latency, dropped connections, and load patterns in Grafana or Azure Monitor, forming a robust operational feedback loop.
1.2 Network Model Choices
Real-time multiplayer games fall into three families of synchronization models: state synchronization, deterministic lockstep, and server-authoritative prediction. Choosing the right one defines both fairness guarantees and scalability.
1.2.1 State Synchronization
In a state-sync model, the server continuously transmits authoritative state snapshots to clients — e.g., player hands, board state, and timers. Clients are passive viewers that simply render the latest snapshot.
- Pros: Simplest to reason about, handles client divergence automatically.
- Cons: Bandwidth-heavy, difficult to scale to 100k+ concurrent matches, especially when each frame transmits large serialized states.
This model suits fast-moving games like shooters but is excessive for deterministic, turn-based domains.
1.2.2 Deterministic Lockstep
Deterministic lockstep was popularized by early RTS games (e.g., Age of Empires). Instead of syncing full state, clients send inputs only (e.g., “play card X”, “end turn”). The server queues and broadcasts those inputs, which every client applies locally using the same deterministic simulation.
Card games map perfectly to this pattern:
- Minimal input bandwidth (tens of bytes per action).
- No continuous simulation loop — state only advances when a move occurs.
- Server can validate moves deterministically and reject invalid commands.
The cost is that every client and the server must maintain identical simulation code paths. Even subtle differences (e.g., floating-point rounding, RNG seeds) can cause divergence.
Example client command payload:
```json
{
  "matchId": "b5a9...",
  "playerId": "p1",
  "inputTick": 123,
  "command": {
    "type": "PlayCard",
    "cardId": "C-452",
    "targetId": "T-14"
  }
}
```
Each command is timestamped and appended to the server’s authoritative queue for that tick.
1.2.3 Hybrid Server-Authoritative Prediction
In modern practice, a hybrid model — deterministic lockstep with server authority — gives the best balance. Clients predict local outcomes instantly (for responsiveness), but the server confirms or corrects the final state. This minimizes perceived latency while preventing cheating.
For example:
- Client predicts “Card X played successfully.”
- Server validates legality; if valid, it rebroadcasts `MoveCommitted`.
- If invalid (e.g., wrong mana cost), the server sends a correction; the client rolls back and replays pending inputs.
1.2.4 Why Lockstep Wins for Card Games
For card-based games:
- Actions are discrete and deterministic.
- There’s little continuous simulation.
- Bandwidth efficiency matters when scaling horizontally.
The lockstep model transmits intent, not state. Instead of sending full board layouts, we send compact actions that reproduce the same outcome on every node. With event sourcing, this doubles as a replay log — meaning replays and audits are free byproducts.
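The "intent, not state" idea can be sketched as a tiny replay loop. The types below are illustrative stand-ins, not the platform's real schema; the point is that folding the same ordered intents always rebuilds the same board, which is why the intent log doubles as the replay log.

```csharp
using System.Collections.Generic;

// Hypothetical intent and state types for illustration only.
public record PlayCardIntent(string PlayerId, string CardId);

public class BoardState
{
    public List<string> PlayedCards { get; } = new();

    // Applying the same ordered intents always yields the same state.
    public void Apply(PlayCardIntent intent) =>
        PlayedCards.Add($"{intent.PlayerId}:{intent.CardId}");
}

public static class Replayer
{
    public static BoardState Replay(IEnumerable<PlayCardIntent> log)
    {
        var state = new BoardState();
        foreach (var intent in log) state.Apply(intent);
        return state;
    }
}
```

Two independent replays of the same log produce byte-identical state, so an auditor never needs the original server's memory — only its event stream.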
1.3 Non-Functional Requirements
Architecting for scale means thinking beyond gameplay logic. Non-functional attributes determine how sustainable the system is under real-world stress.
1.3.1 Scalability
Each Orleans grain represents a match — activated on demand and persisted when idle. Orleans’ built-in placement and activation management lets us horizontally scale to hundreds of thousands of concurrent grains. Key practices:
- Stateless frontends: The SignalR gateway can scale independently.
- Partitioned caches: Redis shards hold matchmaking queues and presence data.
- Eventual consistency: Read models for lobbies and leaderboards refresh asynchronously via projections.
Because card matches are mostly I/O-bound (small messages, small state), the bottleneck shifts to SignalR fan-out. Azure SignalR Service scales beyond 100k concurrent WebSocket connections by adding units, each rated for 1,000 concurrent connections.
1.3.2 Availability and Fault Tolerance
- Silo recovery: Orleans reminders restart matches gracefully after node failure.
- SignalR reconnect: Clients resume sessions using stateful reconnect tokens.
- Event sourcing: Matches can replay from their event streams after restart, ensuring continuity.
```csharp
public override async Task OnActivateAsync(CancellationToken ct)
{
    var snapshot = await _repo.LoadSnapshotAsync(this.GetPrimaryKeyString());
    _state = snapshot?.State ?? MatchState.New();
}
```
This activation model ensures a crashed match grain can restore the last known state deterministically.
1.3.3 Observability
We embed distributed tracing and metrics at all key boundaries: transport, actor, and persistence. Example metrics:
- `signalr_connections_active`
- `match_latency_ms`
- `grain_activation_count`
- `redis_queue_depth`
Traces link these components, helping us diagnose end-to-end latency or lock contention under load.
1.3.4 Cost and Efficiency
With deterministic lockstep, bandwidth per match is roughly <2 KB/s for active games — meaning a 100k-match cluster consumes under 200 MB/s aggregate outbound traffic, feasible within mid-tier Azure or AWS footprints.
The most expensive components at scale are:
- SignalR Service Units
- Redis Premium Tiers
- EventStoreDB IOPS for append-heavy workloads
Cost mitigation strategies include:
- Snapshotting to reduce replay overhead.
- Cold storage for historical events.
- Rate limits per player to prevent denial-of-wallet attacks.
1.3.5 Compliance and Data Governance
To comply with privacy regulations (GDPR, CCPA):
- PII is excluded from domain events; only IDs are persisted.
- Metadata tables map player IDs to erased/anonymized records.
- Redaction jobs can remove or rewrite sensitive attributes post-retention.
Example redacted event shape:
```json
{
  "eventType": "CardDrawn",
  "playerId": "anon-4839",
  "matchId": "b5a9...",
  "timestamp": "2025-11-03T14:21:00Z"
}
```
Only external systems (profiles, payments) retain identifiable data; game event logs remain compliant and reproducible.
2 Reference Architecture (High-Level)
With goals and constraints defined, we can outline a modular reference architecture built for determinism, scale, and observability. Every layer serves a single purpose but integrates through consistent event-driven contracts.
2.1 Components and Responsibilities
2.1.1 Clients (Web/Mobile)
Clients are responsible for:
- Sending player inputs via SignalR (`HubConnection.InvokeAsync`).
- Predicting local outcomes for responsiveness.
- Replaying corrections when the server sends authoritative state.
- Handling temporary disconnects via offline queue.
Typical SignalR client setup:
```csharp
var connection = new HubConnectionBuilder()
    .WithUrl("https://game.example.com/realtime")
    .WithAutomaticReconnect()
    .AddMessagePackProtocol()
    .Build();

await connection.StartAsync();
await connection.InvokeAsync("JoinMatch", matchId);
```
To reduce bandwidth, use MessagePack serialization and delta updates rather than full board re-sends.
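A delta update can be as simple as diffing the last acknowledged view against the current one and sending only what changed. The helper below is a hypothetical sketch over generic field maps, not a MessagePack-specific API:

```csharp
using System.Collections.Generic;

// Hypothetical delta-update helper: emit only fields whose values changed
// since the last acknowledged snapshot, instead of resending the full board.
public static class DeltaUpdate
{
    public static Dictionary<string, object> Diff(
        IReadOnlyDictionary<string, object> previous,
        IReadOnlyDictionary<string, object> current)
    {
        var delta = new Dictionary<string, object>();
        foreach (var (key, value) in current)
            if (!previous.TryGetValue(key, out var old) || !Equals(old, value))
                delta[key] = value; // only changed or new fields cross the wire
        return delta;
    }
}
```

On a typical turn only a handful of fields change (hand count, mana, one board slot), so the delta is a small fraction of the full state.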
Clients queue outgoing commands when offline and reconcile when reconnected:
```csharp
_offlineQueue.Enqueue(new PlayCardCommand(cardId));

if (connection.State == HubConnectionState.Connected)
{
    while (_offlineQueue.TryDequeue(out var cmd))
        await connection.InvokeAsync("SubmitMove", cmd);
}
```
2.1.2 Real-Time Gateway
The gateway layer, implemented with ASP.NET Core SignalR, handles authentication, connection state, and fan-out to matches. It’s stateless, scaling horizontally behind a load balancer.
Key features:
- Stateful reconnects (ASP.NET Core 8+).
- Connection groups per match.
- Backpressure metrics for overload protection.
Example hub skeleton:
```csharp
public class MatchHub : Hub
{
    private readonly IClusterClient _orleans;

    public MatchHub(IClusterClient orleans) => _orleans = orleans;

    public async Task SubmitMove(MoveCommand cmd)
    {
        var grain = _orleans.GetGrain<IMatchGrain>(cmd.MatchId);
        await grain.SubmitMove(cmd.PlayerId, cmd);
    }
}
```
2.1.3 Game Domain (Orleans Cluster)
Each match is a dedicated Orleans grain. Orleans virtual actors provide:
- Single-threaded isolation (no explicit locks).
- Automatic persistence and reminders.
- Seamless scale-out and fault recovery.
Example MatchGrain structure:
```csharp
public interface IMatchGrain : IGrainWithStringKey
{
    Task SubmitMove(string playerId, MoveCommand command);
}

public class MatchGrain : Grain, IMatchGrain
{
    private MatchState _state = new();
    private readonly IEventStore _eventStore;
    private readonly IHubContext<MatchHub> _hub; // available when the silo co-hosts ASP.NET Core

    public MatchGrain(IEventStore eventStore, IHubContext<MatchHub> hub) =>
        (_eventStore, _hub) = (eventStore, hub);

    public async Task SubmitMove(string playerId, MoveCommand command)
    {
        if (!_state.CanPlay(playerId, command))
            throw new InvalidOperationException("Illegal move");

        var evt = _state.Apply(command);
        await _eventStore.AppendAsync(this.GetPrimaryKeyString(), evt);

        // Grains have no SignalR Clients property; fan-out goes through the
        // injected hub context (or an Orleans stream) in co-hosted deployments.
        await _hub.Clients.Group(_state.MatchId).SendAsync("MoveCommitted", evt);
    }
}
```
This actor encapsulates game rules, ensuring only valid transitions occur.
2.1.4 Persistence
Event sourcing provides full historical traceability. Two practical implementations exist:
- EventStoreDB: purpose-built, high-throughput event store with append-only semantics.
- Marten (on PostgreSQL): document + event store hybrid for easier analytics.
Both support per-stream event sequences and snapshots.
Snapshot example:
```csharp
public record MatchSnapshot(Guid MatchId, MatchState State, int Version);
```
Every N events (say, 50), a snapshot is saved to reduce replay time.
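The load path that snapshots enable can be sketched as "restore the latest snapshot, then fold only the events appended after it." A toy integer state stands in for `MatchState` here; the types are illustrative:

```csharp
using System.Collections.Generic;

// Hypothetical stream types: state is a plain int counter for illustration;
// a real MatchState reducer would take its place.
public record Snapshot(int Version, int State);
public record Evt(int Version, int Delta);

public static class StreamLoader
{
    public static (int State, int Version) Load(Snapshot? snapshot, IReadOnlyList<Evt> events)
    {
        var state = snapshot?.State ?? 0;
        var version = snapshot?.Version ?? 0;

        // Replay only events newer than the snapshot version.
        foreach (var e in events)
            if (e.Version > version) { state += e.Delta; version = e.Version; }

        return (state, version);
    }
}
```

With a snapshot every 50 events, activation replays at most 49 events instead of the whole stream, keeping grain rehydration bounded regardless of match length.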
2.1.5 Matchmaking (Redis)
Redis excels at real-time ranking and queue management. Players enter matchmaking via sorted sets keyed by ELO:
```text
ZADD "matchmaking:region:eu" 1520 player:123
ZRANGEBYSCORE "matchmaking:region:eu" 1500 1550 LIMIT 0 10
```
When a match is found, both players are popped and assigned a new Orleans grain for the match session.
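The pairing rule behind those Redis commands — take the closest waiting player inside an ELO window — can be sketched in-memory. `Matchmaker` is a hypothetical helper standing in for the sorted-set query:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Matchmaker
{
    // In-memory equivalent of ZRANGEBYSCORE (elo - window) (elo + window) LIMIT 0 1:
    // SortedDictionary enumerates ascending by rating, so the first hit is the
    // lowest-rated player inside the window.
    public static string? FindOpponent(
        SortedDictionary<int, string> waitingByElo, int elo, int window = 50)
    {
        return waitingByElo
            .Where(kv => kv.Key >= elo - window && kv.Key <= elo + window)
            .Select(kv => kv.Value)
            .FirstOrDefault();
    }
}
```

In production the lookup and the removal of both players must be atomic (a Lua script or `ZPOPMIN`-style transaction) so two gateways cannot claim the same opponent.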
2.1.6 Observability and Operations
OpenTelemetry instruments all layers automatically, and you can export to Prometheus or Azure Monitor.
Common telemetry exports:
- Traces: `MatchGrain.SubmitMove`, `SignalR.Send`
- Metrics: connection count, move latency, queue depth
- Logs: structured JSON with correlation IDs
Example configuration:
```csharp
services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService("CardGamePlatform"))
    .WithTracing(b => b
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSource("Orleans")
        .AddRedisInstrumentation())
    .WithMetrics(b => b
        .AddMeter("CardGame.Matches")
        .AddPrometheusExporter());
```
2.2 Deployment & Scale Patterns
2.2.1 Scaling the Real-Time Gateway
The SignalR gateway is stateless, so scaling is horizontal:
- Each instance handles ~5,000–10,000 WebSocket connections.
- Azure SignalR Service provides elastic fan-out beyond a single VM.
Example capacity plan:
| Component | Scale Unit | Estimated Limit |
|---|---|---|
| SignalR Gateway | 4× D4s_v5 (4vCPU) | 40k connections |
| Azure SignalR Service | 4 units | +80k connections |
| Orleans Cluster | 4× D16s_v5 | 100k active matches |
| Redis Cluster | 3-node Premium | 250k ops/sec |
| EventStoreDB | 3-node HA | 50k appends/sec |
The service mode governs how connections are managed rather than the per-unit connection limit:

```json
"Azure:SignalR:ServiceMode": "Serverless"
```

Serverless mode offloads connection management to the service entirely, freeing your compute for domain logic.
2.2.2 Orleans Cluster Scaling
Orleans scales nearly linearly with CPU cores. Each silo can manage 20k+ active grains. We use Kubernetes or Azure Container Apps for orchestration, setting grainDirectory to a distributed store like Azure Table or Redis.
Sample Kubernetes deployment snippet:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orleans-silo
spec:
  replicas: 4
  template:
    spec:
      containers:
        - name: silo
          image: cardgame/orleans-silo:latest
          env:
            - name: ASPNETCORE_ENVIRONMENT
              value: "Production"
            - name: ORLEANS_CLUSTER_ID
              value: "CardGameCluster"
```
2.2.3 Backplane vs Azure SignalR Service
Two options for SignalR scale-out:
- Redis backplane: Simple, self-managed, cheaper at small scale.
- Limitation: message fan-out bottleneck beyond ~20k connections.
- Azure SignalR Service: Managed elasticity, built-in reconnects, automatic scaling.
- Trade-off: slightly higher latency (≈10–20 ms overhead).
For 100k concurrent connections, Azure SignalR is more predictable and operationally simpler.
2.2.4 Reliability under Load
Production systems must handle disconnect storms, regional outages, and match migration. Strategies include:
- Buffered send queues in hubs.
- Client-side exponential backoff reconnects.
- Grain rehydration from snapshots after crash.
2.2.5 Operational Limits
Empirical load tests show:
- 1 Orleans grain uses ~50 KB memory (state + runtime).
- 16-core, 64 GB VM → 30k concurrent matches.
- 4 silos → 120k matches sustainable under 60% CPU.
With event sourcing, storage growth averages 0.5–1 MB per completed match, so a 1 TB Postgres instance can retain ~1 million historical games before archiving.
3 Deterministic Simulation for Card Games
Creating deterministic simulations is the backbone of fairness and reproducibility in multiplayer card games. Every player and server must arrive at the exact same game state when given the same ordered inputs. This section dives into the strategies, implementation practices, and Orleans grain design patterns that make deterministic lockstep reliable in production.
3.1 Determinism Strategies
Determinism begins with the principle that the same inputs produce the same outputs — no hidden side effects, time-dependent logic, or random variation between runs. In card games, this means card draws, shuffles, and rule resolutions must be reproducible byte-for-byte.
3.1.1 Fixed-Step Simulation
Although most card games don’t run a physics-style loop, it’s still important to simulate in fixed “ticks.” Each move is processed in discrete steps, with the tick derived from the pair (turnNumber, actionIndex) so later actions always receive strictly higher ticks. This ensures events are ordered and predictable even if players act at variable speeds.
Every command is associated with a tick index and validated before being applied to state:
```csharp
public record InputFrame(int Tick, string PlayerId, PlayerCommand Command);

public void ProcessInputFrame(InputFrame frame)
{
    if (frame.Tick != _state.ExpectedTick)
        throw new InvalidOperationException("Out-of-sequence input");

    _state = _reducer.Apply(_state, frame.Command);
    _state.ExpectedTick++;
}
```
With this fixed-step model, you can pause, replay, or resimulate a match deterministically across machines.
3.1.2 Integer Math and Stable RNG
Floating-point operations can diverge slightly across CPUs or frameworks. Even minor rounding differences will desynchronize clients in a lockstep model. The safest approach is integer-only arithmetic for all scoring and probability decisions.
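For example, a “1.5× damage” modifier can be stored as a per-mille integer so every machine computes the identical result. `IntMath` is an illustrative helper:

```csharp
public static class IntMath
{
    // "1.5x damage" stored as 1500 per-mille; integer multiplication and
    // division produce the same result on every CPU, unlike double math.
    public static int ScaleDamage(int baseDamage, int multiplierPerMille)
        => baseDamage * multiplierPerMille / 1000;
}
```

Note that integer division truncates (`7 × 1.5 → 10`, not 10.5); whichever rounding rule you pick, it is applied identically everywhere, which is exactly the property lockstep needs.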
For randomization, every match uses a seeded RNG derived from its match ID and optional salt. The same seed must yield the same deck shuffle and card draws across replays.
```csharp
public sealed class DeterministicRng
{
    private readonly Random _rng;

    public DeterministicRng(Guid matchId, int salt)
    {
        // HashCode.Combine is randomized per process, so it must NOT be used here.
        // Deriving the seed from the Guid's raw bytes keeps it stable across machines.
        var seed = BitConverter.ToInt32(matchId.ToByteArray(), 0) ^ salt;
        _rng = new Random(seed);
    }

    public int Next(int min, int max) => _rng.Next(min, max);
}
```
This RNG must be used consistently in both server simulation and any client prediction layer. Never call DateTime.Now or new Random() inside rules — those cause divergence.
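Given such a seeded RNG, a standard Fisher–Yates shuffle deals the same deck on every replica. This is a sketch; note that `System.Random`'s sequence for a given seed is stable within a runtime version, so pin your runtime or substitute your own PRNG for long-lived replay archives:

```csharp
using System;
using System.Collections.Generic;

public static class DeckShuffler
{
    // Deterministic Fisher–Yates shuffle: with the same RNG seed, every
    // replica produces the identical card order.
    public static void Shuffle<T>(IList<T> deck, Random rng)
    {
        for (int i = deck.Count - 1; i > 0; i--)
        {
            int j = rng.Next(0, i + 1);          // inclusive range 0..i
            (deck[i], deck[j]) = (deck[j], deck[i]);
        }
    }
}
```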
3.1.3 Canonical Ordering and Serialization
In card games, ordering is everything. The order of cards in decks, graveyards, and hands must be canonicalized before serialization or hashing. Even iteration over dictionary keys must be deterministic (use sorted keys or indexed lists).
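A minimal canonicalization helper might sort card IDs with an ordinal comparer before building any hashable key, so set or dictionary iteration order can never leak into a state hash. `Canonical` is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Canonical
{
    // Ordinal sort gives a culture-independent, platform-stable ordering,
    // which is what deterministic hashing requires.
    public static string HandKey(IEnumerable<string> cardIds) =>
        string.Join(",", cardIds.OrderBy(id => id, StringComparer.Ordinal));
}
```

Avoid culture-sensitive comparers here: `StringComparer.CurrentCulture` can sort differently across OS locales, which is exactly the kind of silent divergence lockstep cannot tolerate.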
When serializing game state or side effects, keep a consistent property order and type schema. MessagePack and System.Text.Json both support deterministic serialization if configured properly:
```csharp
var options = new JsonSerializerOptions
{
    WriteIndented = false,
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
};

var json = JsonSerializer.Serialize(state, options);
```
Always version events explicitly to prevent schema drift breaking replays.
3.1.4 Event-Based Side Effects
A deterministic lockstep engine should never mutate global state directly. Instead, every state transition is represented as an event — the only legal output of a reducer. These events are idempotent and serializable. Example event sequence for a turn:
```json
[
  { "eventType": "CardDrawn", "playerId": "p1", "cardId": "C13" },
  { "eventType": "CardPlayed", "playerId": "p1", "cardId": "C13" },
  { "eventType": "EffectResolved", "effect": "DealDamage", "targetId": "p2" }
]
```
Because every action results in an event, you can replay the same sequence offline and confirm the same outcome hash.
3.2 Practical Implementation
Determinism isn’t a single class — it’s an ecosystem of validation, idempotency, and consistent event handling. Here’s how to build it incrementally in a .NET game server.
3.2.1 Validation Before Commit
The golden rule is: never mutate state before validating the command. Each client action must be checked against the current turn, mana pool, and card ownership before the reducer applies it.
```csharp
public bool CanPlayCard(Player player, Card card)
{
    return player.Mana >= card.Cost && player.Hand.Contains(card.Id);
}
```
Only if CanPlayCard passes do we emit a CardPlayed event. This guarantees that replaying events yields the same valid state every time.
3.2.2 Idempotent Reducers
A reducer should be pure — applying the same event twice must yield the same state. Here’s a simplified reducer pattern:
```csharp
public MatchState Apply(MatchState state, IMatchEvent evt)
{
    return evt switch
    {
        CardPlayed e => state.WithCardPlayed(e.PlayerId, e.CardId),
        CardDrawn e => state.WithCardDrawn(e.PlayerId, e.CardId),
        EffectResolved e => state.WithEffectApplied(e.TargetId, e.Effect),
        _ => state
    };
}
```
Idempotency is key for recovery and retries. If a grain replays persisted events, the end state will always be identical.
3.2.3 Hash-of-State for Sanity Checks
Each tick produces a state hash — a short digest of the serialized state. The server can include this hash in TickAck messages, allowing clients to verify that their local prediction matches the authoritative state.
```csharp
public static string ComputeStateHash(MatchState state)
{
    var json = JsonSerializer.Serialize(state);
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(json));
    return Convert.ToHexString(bytes);
}
```
When clients detect mismatch, they trigger reconciliation (rolling back and replaying buffered inputs). This technique is a practical guardrail for silent divergence.
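The rollback-and-replay step can be sketched with integer inputs standing in for commands. `Reconciler` is hypothetical; a real client would re-apply `PlayerCommand`s through its local simulator rather than summing integers:

```csharp
using System.Collections.Generic;

public class Reconciler
{
    private readonly List<int> _pendingInputs = new(); // ints stand in for commands
    public int State { get; private set; }

    // Optimistic local prediction: apply immediately, remember the input.
    public void PredictLocally(int input)
    {
        _pendingInputs.Add(input);
        State += input;
    }

    // Authoritative update: drop confirmed inputs, reset to server truth,
    // then replay whatever is still in flight.
    public void OnAuthoritative(int serverState, int ackedCount)
    {
        _pendingInputs.RemoveRange(0, ackedCount);
        State = serverState;
        foreach (var input in _pendingInputs)
            State += input;
    }
}
```

Because the pending inputs are replayed on top of the corrected state, the player only ever sees a brief visual snap, never a frozen UI waiting on the round trip.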
3.2.4 Open Source Building Blocks
For Poker-like games, hand evaluation is the most CPU-intensive step. Instead of reinventing evaluation algorithms, leverage existing C# libraries:
- SnapCall or HoldemPoker.Evaluator for O(1) hand ranking using precomputed lookup tables.
- Forge (Java-based) as inspiration for rule modeling in more complex TCGs.
The call shape below is illustrative only — card construction and rank types differ between libraries, so check the actual API of whichever evaluator you adopt:

```csharp
// Hypothetical evaluator API, not the real surface of any specific library.
var evaluator = new HandEvaluator();
var rank = evaluator.Evaluate(new[] { Card.Parse("Th"), Card.Parse("As"), Card.Parse("Kc") });
Console.WriteLine($"Hand rank: {rank}");
```
For MTG-style games, where the rules are deeply nested (stack, triggers, replacement effects), borrow concepts from Forge such as continuous effects layers and priority passing but express them via Orleans messages for concurrency safety.
3.2.5 Testing Determinism
To confirm deterministic behavior, build property-based tests comparing multiple runs:
```csharp
[Fact]
public void Simulation_Should_ProduceSameHash_AfterReplay()
{
    var events = SimulateGame(seed: 42);
    var state1 = Replay(events, seed: 42);
    var state2 = Replay(events, seed: 42);

    Assert.Equal(ComputeStateHash(state1), ComputeStateHash(state2));
}
```
Tests like this catch hidden non-determinism early — often from mutable static fields or RNG leaks.
3.3 Orleans Grain Design
The Orleans actor model simplifies concurrency: every grain runs single-threaded, avoiding shared-state race conditions that break determinism. Each match is encapsulated in a MatchGrain.
3.3.1 Grain Lifecycle
A MatchGrain activates when the first player joins and deactivates when persisted and idle. Orleans automatically manages memory, serializing the grain state to the configured persistence provider.
```csharp
public override async Task OnActivateAsync(CancellationToken ct)
{
    _state = await _repository.LoadAsync(this.GetPrimaryKeyString()) ?? new MatchState();
    _logger.LogInformation("Activated MatchGrain {MatchId}", this.GetPrimaryKeyString());
}
```
When a match ends, it persists its final snapshot and deactivates to free resources:
```csharp
public async Task EndMatch()
{
    await _repository.SaveAsync(_state);
    DeactivateOnIdle();
}
```
3.3.2 Membership and Turn Management
Membership is stored in the grain state. A deterministic queue enforces turn order:
```csharp
private readonly Queue<string> _turnOrder = new();

public void AddPlayer(string playerId)
{
    if (_state.Players.Contains(playerId))
        return; // already seated; avoid duplicate turn-order entries

    _state.Players.Add(playerId);
    _turnOrder.Enqueue(playerId);
}

public string CurrentPlayer => _turnOrder.Peek();
```
Turn transitions generate explicit events to preserve ordering across replays:
```csharp
public async Task EndTurn(string playerId)
{
    if (playerId != CurrentPlayer)
        throw new InvalidOperationException("Not your turn");

    _turnOrder.Enqueue(_turnOrder.Dequeue());
    await EmitAsync(new TurnEnded(playerId));
}
```
3.3.3 Timers and Reminders
To handle inactivity, Orleans reminders trigger turn timeouts or match expiry:
```csharp
public override async Task OnActivateAsync(CancellationToken ct)
{
    // Orleans reminders have roughly a one-minute minimum period;
    // for shorter turn clocks, use grain timers instead.
    await this.RegisterOrUpdateReminder("TurnTimeout",
        TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));
}

public async Task ReceiveReminder(string name, TickStatus status)
{
    if (name == "TurnTimeout" && _state.LastActionAt < DateTime.UtcNow.AddMinutes(-1))
        await ForceEndTurn();
}
```
Reminders persist across silo restarts, ensuring fairness even if nodes crash.
3.3.4 Persistence Providers and Serialization
Use Orleans’ built-in persistence providers with custom serializers optimized for deterministic replay. MessagePack is a strong candidate due to its compact binary form and predictable ordering:
```csharp
// Illustrative registration (Orleans 7+): storage and serializer are configured
// separately; exact serializer extension APIs vary by Orleans version.
siloBuilder.AddMemoryGrainStorageAsDefault();
siloBuilder.Services.AddSerializer(serializerBuilder =>
    serializerBuilder.AddJsonSerializer(
        isSupported: type => type.Namespace?.StartsWith("CardGame") == true));
```
Serialization should avoid non-deterministic types (e.g., Dictionary without ordering, DateTime.Now). Instead, use logical timestamps (tick, turn) derived from simulation.
4 Real-Time Networking: SignalR & Reconnects
Even with deterministic simulation, networking can undermine fairness if inputs arrive late or connections drop. The SignalR layer must guarantee ordered delivery, reconnection safety, and efficient fan-out. This section details the message contracts, reconnect logic, and mobile QoS strategies that make the real-time fabric resilient.
4.1 Wire Protocol & Message Contracts
Every client-server exchange follows a structured command envelope carrying metadata needed for replay, acknowledgment, and validation.
Example envelope:
```json
{
  "matchId": "b5a9...",
  "tick": 1024,
  "playerId": "p1",
  "authToken": "jwt-abc",
  "reconnectToken": "r-xyz",
  "command": {
    "type": "PlayCard",
    "cardId": "C17"
  }
}
```
4.1.1 Sequence Numbers and ACKs
Each command has a sequence number (seq) incremented per client. The server responds with an acknowledgment message confirming commit and authoritative tick.
```csharp
public record ClientCommand(int Seq, int Tick, string MatchId, string PlayerId, MoveCommand Command);
public record Ack(int Seq, int Tick, string Hash);
```

SignalR hub example:

```csharp
public async Task SubmitMove(ClientCommand cmd)
{
    var grain = _orleans.GetGrain<IMatchGrain>(cmd.MatchId);
    var hash = await grain.SubmitMove(cmd.PlayerId, cmd.Command);
    await Clients.Caller.SendAsync("Ack", new Ack(cmd.Seq, cmd.Tick, hash));
}
```
Clients maintain a rolling buffer of unacknowledged commands. When an Ack arrives, they drop all confirmed entries, keeping only future predictions.
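That pruning rule — an Ack for sequence N confirms everything up to and including N — might look like this (hypothetical `PendingBuffer` helper):

```csharp
using System.Collections.Generic;
using System.Linq;

public class PendingBuffer
{
    private readonly Queue<(int Seq, string Payload)> _pending = new();

    public void Track(int seq, string payload) => _pending.Enqueue((seq, payload));

    // An Ack for seq N confirms all commands with seq <= N,
    // since the client assigns sequence numbers monotonically.
    public void OnAck(int ackedSeq)
    {
        while (_pending.Count > 0 && _pending.Peek().Seq <= ackedSeq)
            _pending.Dequeue();
    }

    public IReadOnlyList<(int Seq, string Payload)> Unacked => _pending.ToList();
}
```

On reconnect, everything still in `Unacked` is exactly what must be resent, which is why the same buffer drives both normal operation and recovery.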
4.1.2 Server Timestamps
All messages include a server timestamp (ts) for latency measurement and conflict resolution. Clients can compute round-trip latency and adapt prediction windows dynamically.
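A common way to adapt the prediction window is a smoothed RTT estimate, e.g. the exponentially weighted moving average TCP uses for SRTT. `RttEstimator` is a hypothetical helper; the sample fed in would be derived from `now - ts`, adjusted for clock offset:

```csharp
public class RttEstimator
{
    private double _avg = -1;
    private const double Alpha = 0.125; // classic SRTT smoothing factor

    // EWMA: new average = (1 - alpha) * old + alpha * sample.
    // The first sample initializes the average directly.
    public double Observe(double sampleMs)
        => _avg = _avg < 0 ? sampleMs : (1 - Alpha) * _avg + Alpha * sampleMs;

    public double AverageMs => _avg;
}
```

Smoothing matters on mobile: a single jittery sample should widen the prediction window slightly, not swing it wildly.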
4.2 Stateful Reconnect & Replay
ASP.NET Core 8 introduced stateful reconnect for SignalR, allowing clients to resume interrupted sessions without losing in-flight messages. This feature is critical for mobile players switching networks or momentarily backgrounding the app.
4.2.1 Enabling Stateful Reconnects
On the server, stateful reconnect is enabled per endpoint when mapping the hub (the option lives on the connection dispatcher options, not on hub options):

```csharp
app.MapHub<MatchHub>("/realtime",
    options => options.AllowStatefulReconnects = true);
```

On the client:

```csharp
var connection = new HubConnectionBuilder()
    .WithUrl("/realtime")
    .WithAutomaticReconnect()
    .WithStatefulReconnect()
    .AddMessagePackProtocol()
    .Build();
```
When the connection drops, SignalR automatically retries using the last session token. Messages sent during brief disconnects are buffered client-side, then replayed once reconnected.
4.2.2 Buffered Message Replay
Clients maintain an outbound queue:
```csharp
private readonly ConcurrentQueue<ClientCommand> _pending = new();

private async Task SendAsync(ClientCommand cmd)
{
    _pending.Enqueue(cmd);

    if (connection.State == HubConnectionState.Connected)
    {
        while (_pending.TryDequeue(out var next))
            await connection.InvokeAsync("SubmitMove", next);
    }
}
```
On reconnect, any commands with sequence numbers higher than the last acknowledged Ack are replayed automatically. This ensures the server receives a complete input sequence.
4.2.3 Handling Partial Disconnects
For fairness, the server must freeze the match if an active player disconnects during their turn. A DisconnectTimeout timer inside MatchGrain handles this gracefully:
```csharp
public async Task PlayerDisconnected(string playerId)
{
    _state.MarkDisconnected(playerId);
    await this.RegisterOrUpdateReminder($"dc:{playerId}",
        TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(60));
}
```
If the reminder triggers before reconnection, the server forces a forfeit or auto-pass.
4.3 Scale-Out Options
Scaling SignalR to 100,000+ clients requires either a Redis backplane or Azure SignalR Service. Each has different operational trade-offs.
4.3.1 Redis Backplane
Ideal for self-hosted clusters or private deployments. It relays messages across multiple hub servers through Redis pub/sub channels.
```csharp
services.AddSignalR().AddStackExchangeRedis("redis:6379", options =>
{
    // Newer StackExchange.Redis versions require RedisChannel.Literal;
    // older versions accepted a plain string here.
    options.Configuration.ChannelPrefix = RedisChannel.Literal("cardgame");
});
```
Pros: low latency, full control, simple debugging. Cons: message loss risk during Redis failover, manual scaling, limited throughput (~20k concurrent connections per cluster node).
4.3.2 Azure SignalR Service
Fully managed and horizontally elastic. Each “unit” handles thousands of WebSocket clients. It abstracts away connection fan-out and provides automatic failover.
Configuration:
```json
{
  "Azure": {
    "SignalR": {
      "ConnectionString": "...",
      "ServiceMode": "Default"
    }
  }
}
```
Pros: built-in telemetry, automatic scaling, global regions. Cons: slightly higher cost and 10–20 ms additional latency per hop.
4.3.3 Avoiding Message Loss
Regardless of scale-out model, messages must be acknowledged. The client should request replays when a gap in sequence numbers is detected. The server maintains a short rolling buffer of recent broadcasts:
```csharp
private readonly RingBuffer<GameEvent> _recent = new(64); // app-provided fixed-size buffer

public async Task ReplayMissing(int fromSeq)
{
    var missing = _recent.Where(e => e.Seq > fromSeq);
    await Clients.Caller.SendAsync("Replay", missing);
}
```
This prevents desynchronization during transient outages or load spikes.
4.4 Mobile QoS
Mobile networks are inherently unstable — players switch from Wi-Fi to LTE, encounter high jitter, and occasionally lose connectivity. Designing for these conditions requires adaptive networking.
4.4.1 Adaptive Send Rates
SignalR doesn’t expose raw UDP-like control, but you can implement adaptive batching on the client. If round-trip latency exceeds a threshold, queue commands locally and send them in small bursts instead of per-frame.
```csharp
if (_avgRtt > 200)
    _sendBatchSize = 3;
else
    _sendBatchSize = 1;
```
This reduces overhead on unreliable connections.
4.4.2 Payload Shaping
Use MessagePack with compression to minimize payload size. Avoid serializing redundant state or unchanged fields. For frequently sent commands, define compact DTOs:
[MessagePackObject]
public class PlayCardCommand
{
    [Key(0)] public string CardId { get; init; }
    [Key(1)] public string TargetId { get; init; }
}
A full card play command can be under 40 bytes — crucial for mobile bandwidth.
4.4.3 Backoff and Retry Policy
SignalR handles automatic reconnects, but you can customize backoff curves:
.WithAutomaticReconnect(new[] {
    TimeSpan.FromSeconds(2),
    TimeSpan.FromSeconds(5),
    TimeSpan.FromSeconds(10)
})
This avoids connection thrashing in poor coverage areas.
4.4.4 Group Rejoin Semantics
When clients reconnect, they must rejoin their previous match group to resume broadcasts. Orleans stores player membership in grain state, and the hub restores groups automatically:
public override async Task OnConnectedAsync()
{
    var matches = await _profile.GetActiveMatches(Context.UserIdentifier);
    foreach (var m in matches)
        await Groups.AddToGroupAsync(Context.ConnectionId, m.MatchId);
    await base.OnConnectedAsync(); // preserve the hub's default connection bookkeeping
}
Players rejoin seamlessly without duplicate subscriptions or missing updates.
5 Authoritative Server + Client-Side Prediction
Even with deterministic lockstep, immediate feedback is critical for user experience. Client-side prediction bridges the gap between latency and responsiveness while maintaining server authority.
5.1 Hybrid Model
The hybrid model works like this:
- Player submits a move to the server.
- The client predicts the outcome locally — animating the card play immediately.
- The server validates and sends back an authoritative confirmation or correction.
This achieves low-latency visuals while preserving fairness. The server always has the final say, and any mismatches trigger reconciliation.
Example prediction flow:
_localSimulator.Apply(cmd);                      // optimistic render first
await connection.InvokeAsync("SubmitMove", cmd); // then submit for validation
When the authoritative event arrives:
connection.On<MoveCommitted>("MoveCommitted", evt =>
{
    if (_localSimulator.StateHash != evt.StateHash)
        _localSimulator.Reconcile(evt);
});
5.2 Reconciliation Workflow
5.2.1 Input Buffering
Clients maintain a circular buffer of inputs since the last confirmed tick:
var unconfirmed = _inputHistory.Where(i => i.Tick > lastAckTick);
When a correction arrives, the client rolls back to the last known good state, reapplies unconfirmed inputs, and continues forward.
5.2.2 Partial Rollforward for Turn-Based Systems
In card games, full rollback is rarely necessary because actions are discrete. Instead, clients can use a delta replay approach:
public void Reconcile(MatchSnapshot snapshot)
{
    _state = snapshot.State;
    foreach (var pending in _pendingInputs)
        _state = _reducer.Apply(_state, pending.Command);
}
This lets the UI continue smoothly without visible rewinds. The player’s intent remains visible while the server ensures correctness.
5.2.3 Corrected State Hashes
Each authoritative event includes a StateHash. If the client’s computed hash diverges, it triggers reconciliation. This makes drift detection instant and computationally light.
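A hash like this only works if client and server serialize state canonically. A minimal sketch follows; StateHasher is our illustrative name, and a real implementation would walk zones and cards in a fixed, documented order rather than taking caller-supplied fields:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Sketch of a StateHash: SHA-256 over a canonical, field-ordered rendering
// of the state so both sides produce identical digests.
public static class StateHasher
{
    public static string Hash(params (string Key, string Value)[] fields)
    {
        var sb = new StringBuilder();
        foreach (var (key, value) in fields)   // caller supplies fields in a fixed order
            sb.Append(key).Append('=').Append(value).Append(';');
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(sb.ToString()));
        return Convert.ToHexString(digest);
    }
}
```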
5.3 UX Patterns
A good prediction system feels seamless even when the network isn’t. UI design plays a huge role in hiding latency and reconciling errors gracefully.
5.3.1 Ghost Animations
When the player plays a card, render a “ghost” animation immediately — the card moves visually to the board, but interactions remain disabled until confirmed. If the move is invalid, the ghost snaps back naturally.
AnimateCardPlay(cardId, predicted: true);
Once the server confirms:
AnimateCardCommit(cardId);
This technique avoids jarring jumps or pop-ins when reconciliation happens.
5.3.2 Latency-Tolerant Affordances
Use timers and progress rings to communicate network delays transparently. For example, show a subtle “waiting for opponent” overlay when the server hasn’t acknowledged within 200 ms. Avoid abrupt freezes.
5.3.3 Conflict Resolution UI
If the client’s prediction conflicts with the authoritative state (e.g., card can’t be played), provide contextual feedback:
- Fade out invalid animations.
- Display a short tooltip (“Not enough mana”).
- Re-enable the card smoothly.
Small UI touches preserve trust — players feel the system is responsive and fair even when corrections occur behind the scenes.
6 Event Sourcing, Snapshots & Replay
Event sourcing turns every game into a verifiable timeline of intent and outcome. Each match is a series of domain events stored immutably, from shuffles to draws and final resolution. The design must support replay, auditing, and evolution of schema without breaking older matches.
6.1 Event Modeling
6.1.1 Defining Domain Events
Domain events capture meaningful game transitions. Each event records only the intent and outcome, not derived values. Common examples include:
public record DeckShuffled(Guid MatchId, int Seed, DateTimeOffset Timestamp);
public record CardDrawn(Guid MatchId, string PlayerId, string CardId, DateTimeOffset Timestamp);
public record MoveProposed(Guid MatchId, string PlayerId, string MoveType, string CardId);
public record MoveCommitted(Guid MatchId, string PlayerId, string MoveType, string CardId, string Result);
public record MatchEnded(Guid MatchId, string WinnerId, string Reason);
These events are small, immutable, and serialized deterministically. Avoid storing entire deck lists or hand contents unless necessary — derive them from replay.
6.1.2 Versioning and Upcasters
As the game evolves, rules and data contracts change. To preserve replay compatibility, use event versioning with upcasters. Each version introduces a new schema while older events are transformed on load.
Example upcaster pattern:
public interface IEventUpcaster<T>
{
    object Upcast(T oldEvent);
}

public class CardDrawnV1ToV2 : IEventUpcaster<CardDrawnV1>
{
    public object Upcast(CardDrawnV1 oldEvent)
        => new CardDrawnV2(oldEvent.MatchId, oldEvent.PlayerId, oldEvent.CardId, "defaultZone");
}
The system reads all events through a version-aware deserializer that chains upcasters automatically.
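The chaining itself can be sketched as a registry keyed by event type. UpcasterChain is an illustrative name, and the V1/V2 records below are simplified stand-ins for the fuller CardDrawn events above:

```csharp
using System;
using System.Collections.Generic;

// Sketch of an upcaster chain: upcasters are registered per source type and
// applied repeatedly until the event reaches its current schema. Registering
// a cycle would loop forever, so versions must only move forward.
public class UpcasterChain
{
    private readonly Dictionary<Type, Func<object, object>> _upcasters = new();

    public void Register<TOld>(Func<TOld, object> upcast) =>
        _upcasters[typeof(TOld)] = e => upcast((TOld)e);

    public object UpcastToLatest(object evt)
    {
        while (_upcasters.TryGetValue(evt.GetType(), out var upcast))
            evt = upcast(evt); // hop one schema version forward
        return evt;
    }
}

public record CardDrawnV1(string CardId);
public record CardDrawnV2(string CardId, string Zone);
```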
6.1.3 Event Identity and Metadata
Each event includes:
- Id (GUID)
- Sequence number (monotonic per match)
- CorrelationId (for tracing moves)
- Timestamp (server time)
- Hash (state checksum after application)
public record GameEventMetadata(Guid EventId, int Sequence, string CorrelationId, DateTimeOffset Timestamp, string Hash);
Metadata helps verify the order and integrity of the stream.
6.2 Storage Options & Patterns
Choosing the right event store affects durability, performance, and query flexibility. The two most practical options in .NET ecosystems are EventStoreDB and Marten.
6.2.1 EventStoreDB
EventStoreDB uses append-only streams with optimistic concurrency. Each match maps to a single stream named match-{MatchId}.
Appending new events:
await _eventStore.AppendToStreamAsync(
    $"match-{matchId}",
    expectedRevision, // revision of the last event read; rejects concurrent writers
    events.Select(e => new EventData(
        Uuid.NewUuid(),
        e.GetType().Name,
        JsonSerializer.SerializeToUtf8Bytes(e))));
Reading events for replay:
var result = _eventStore.ReadStreamAsync(Direction.Forwards, $"match-{matchId}", StreamPosition.Start);
await foreach (var resolved in result)
{
// Resolve the stored type name via an explicit registry (e.g., a
// Dictionary<string, Type>); Type.GetType cannot resolve types from
// other assemblies by their simple name.
var evt = JsonSerializer.Deserialize(resolved.Event.Data.Span, _eventTypeRegistry[resolved.Event.EventType]);
Apply(evt);
}
EventStoreDB supports projections and subscriptions for near-real-time dashboards or leaderboards.
6.2.2 Marten on PostgreSQL
Marten treats PostgreSQL as both an event store and document store. It’s ideal when you also need queryable read models in the same database.
Stream definition:
public record MatchAggregate(Guid Id)
{
    public List<object> Events { get; } = new();
}
Appending:
_session.Events.Append(matchId, new CardDrawn(matchId, playerId, cardId, DateTimeOffset.UtcNow)); // Append is synchronous in Marten
await _session.SaveChangesAsync();
Marten automatically maintains per-stream version numbers and supports inline projections.
6.2.3 Snapshot Cadence
Replaying every event from the beginning can be expensive in long matches. Snapshotting saves the serialized state every N events (commonly 50–100).
public record MatchSnapshot(Guid MatchId, int Version, MatchState State);
if (_eventCountSinceLastSnapshot >= 50)
{
    await _snapshotRepo.SaveAsync(new MatchSnapshot(matchId, currentVersion, _state));
}
Snapshots are just performance optimizations — they must be reproducible from the event stream alone.
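The load path that makes snapshots pay off can be sketched with a toy reducer, where an integer stands in for MatchState: restore the latest snapshot, then replay only the events recorded after its version (MatchLoader and Snapshot are illustrative names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Snapshot(int Version, int State);

public static class MatchLoader
{
    // Toy reducer: the "state" here is just a running sum of event payloads.
    public static int Apply(int state, int evt) => state + evt;

    public static int Load(Snapshot? snapshot, IReadOnlyList<int> allEvents)
    {
        var state = snapshot?.State ?? 0;
        var startAt = snapshot?.Version ?? 0; // snapshot version counts events applied
        foreach (var evt in allEvents.Skip(startAt))
            state = Apply(state, evt);
        return state;
    }
}
```

The invariant worth testing in CI: loading from any snapshot must yield exactly the same state as a full replay from event zero.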
6.2.4 Asynchronous Projections
For lobbies and leaderboards, use async projections that read from event streams and update denormalized views.
Example: projecting MatchEnded to leaderboard.
public async Task Project(MatchEnded evt)
{
    // StackExchange.Redis sorted-set increment (ZINCRBY).
    await _redis.SortedSetIncrementAsync("leaderboard", evt.WinnerId, 1);
}
These projections run in background workers and can be rebuilt anytime by replaying events.
6.3 Replay & Dispute Resolution
Deterministic re-simulation is how you prove fairness. Every match can be re-executed from its event log, ensuring outcomes are verifiable.
6.3.1 Deterministic Re-simulation
Re-simulation begins by seeding RNG with the stored match seed and replaying all events sequentially:
var rng = new DeterministicRng(matchId, seed);
var state = new MatchState();
foreach (var evt in eventStream)
{
state = reducer.Apply(state, evt);
}
The recomputed state hash is compared to the final event’s recorded hash. A mismatch indicates corruption or tampering.
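Determinism hinges on the RNG. Below is a sketch of an explicitly specified generator (xorshift64*, a single-seed variant of the DeterministicRng used above) driving a Fisher-Yates shuffle; pinning the algorithm, rather than relying on a framework Random, keeps replays stable across runtime versions:

```csharp
using System;
using System.Linq;

// Deterministic PRNG (xorshift64*) plus a Fisher-Yates shuffle.
public class DeterministicRng
{
    private ulong _state;

    public DeterministicRng(ulong seed) =>
        _state = seed == 0 ? 0x9E3779B97F4A7C15 : seed; // state must be non-zero

    public ulong NextULong()
    {
        _state ^= _state >> 12;
        _state ^= _state << 25;
        _state ^= _state >> 27;
        return _state * 0x2545F4914F6CDD1D;
    }

    // Modulo bias is negligible for deck-sized ranges; a production version
    // could use rejection sampling for exactness.
    public int Next(int maxExclusive) => (int)(NextULong() % (ulong)maxExclusive);

    public void Shuffle<T>(T[] deck)
    {
        for (var i = deck.Length - 1; i > 0; i--)
        {
            var j = Next(i + 1);
            (deck[i], deck[j]) = (deck[j], deck[i]);
        }
    }
}
```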
6.3.2 Signed Transcripts
Each match produces a signed transcript — a compact JSON of events and hashes, signed using the server’s private key.
var transcriptJson = JsonSerializer.Serialize(eventStream);
var signature = Sign(transcriptJson, privateKey);
await File.WriteAllTextAsync("match-transcript.json", transcriptJson);
await File.WriteAllTextAsync("match-transcript.sig", signature);
Auditors can verify authenticity using the public key. This guarantees a match record cannot be falsified.
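The Sign call above is abstract. A sketch using .NET's built-in ECDsa (P-256 with SHA-256 is our choice; any asymmetric scheme works) shows both the server-side signing and the auditor-side verification:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Sketch of transcript signing with ECDSA: the server signs the serialized
// transcript; auditors verify with the exported public key.
public static class TranscriptSigner
{
    public static (byte[] Signature, byte[] PublicKey) Sign(string transcriptJson, ECDsa privateKey)
    {
        var data = Encoding.UTF8.GetBytes(transcriptJson);
        var signature = privateKey.SignData(data, HashAlgorithmName.SHA256);
        return (signature, privateKey.ExportSubjectPublicKeyInfo());
    }

    public static bool Verify(string transcriptJson, byte[] signature, byte[] publicKey)
    {
        using var verifier = ECDsa.Create();
        verifier.ImportSubjectPublicKeyInfo(publicKey, out _);
        return verifier.VerifyData(
            Encoding.UTF8.GetBytes(transcriptJson), signature, HashAlgorithmName.SHA256);
    }
}
```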
6.3.3 Per-Turn Hashes
During play, the grain emits a hash for each turn. These hashes can be published to an external ledger or simply stored for comparison:
public async Task CommitTurn(int turnNumber)
{
    var hash = ComputeStateHash(_state);
    await _eventStore.AppendAsync(matchId, new TurnCommitted(turnNumber, hash));
}
If any player disputes a result, the server recomputes all hashes — mismatches point directly to the divergent turn.
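Pinpointing the divergent turn is then a first-mismatch scan over the recorded and recomputed hash lists (DisputeResolver is an illustrative name):

```csharp
using System;
using System.Collections.Generic;

// Dispute helper: compare recorded per-turn hashes against hashes recomputed
// from a fresh replay; the first mismatch is the divergent turn.
public static class DisputeResolver
{
    // Returns the 0-based index of the first divergent turn, or -1 if the
    // transcripts agree in full.
    public static int FirstDivergentTurn(IReadOnlyList<string> recorded, IReadOnlyList<string> recomputed)
    {
        var turns = Math.Min(recorded.Count, recomputed.Count);
        for (var i = 0; i < turns; i++)
            if (recorded[i] != recomputed[i])
                return i;
        // Equal prefixes but unequal lengths mean one log was truncated.
        return recorded.Count == recomputed.Count ? -1 : turns;
    }
}
```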
6.4 Data Governance
Strong data governance ensures compliance while maintaining analytical value.
6.4.1 PII Minimization
Game events should never contain personal data. Instead of usernames, store internal IDs. Player profiles and emails live in separate systems protected by stricter access policies.
{
  "eventType": "MoveCommitted",
  "playerId": "anon-23841",
  "matchId": "m-19ff3"
}
6.4.2 Redactable Metadata
To support data-erasure requests, associate each player with redactable metadata. A scheduled job replaces identifying information in historical logs without altering semantics; use a single pseudonym per deleted user so that player's events stay correlated for replay:
-- One pseudonym per deleted user preserves event correlation.
UPDATE events SET playerId = @pseudonym WHERE playerId = @deletedUserId;
6.4.3 Retention Policies
Define retention tiers:
- Hot: 30 days in primary store for replays and audits.
- Warm: 6 months compressed in cheaper storage.
- Cold: Archived JSON + signature files for legal retention.
Data governance ensures the event store remains lightweight and compliant without losing integrity.
7 Anti-Cheat & Fair-Play Architecture
Preventing cheating is as critical as gameplay itself. Every trust boundary — client inputs, server timing, deck construction — must be validated. Deterministic lockstep helps, but layered defenses are essential for production fairness.
7.1 Server-Side Validation
7.1.1 Move Legality
Each MatchGrain validates every move using authoritative rules before committing.
public async Task SubmitMove(string playerId, MoveCommand cmd)
{
    if (!_rules.CanExecute(_state, playerId, cmd))
        throw new InvalidOperationException("Illegal move");

    var evt = _state.Apply(cmd);
    await _eventStore.AppendAsync(matchId, evt);
}
The rules engine checks phase, turn ownership, and resource availability. Illegal or out-of-turn actions are rejected immediately.
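A minimal sketch of that legality gate, with toy MatchState and MoveCommand types standing in for the real rules engine (the phase and move names are illustrative):

```csharp
using System;

public record MatchState(string ActivePlayer, string Phase);
public record MoveCommand(string MoveType);

// Minimal legality gate: a move must come from the active player, during a
// phase in which that move type is allowed.
public static class Rules
{
    public static bool CanExecute(MatchState state, string playerId, MoveCommand cmd)
    {
        if (playerId != state.ActivePlayer) return false; // out-of-turn action
        return cmd.MoveType switch
        {
            "PlayCard" => state.Phase == "Main",   // cards only in the main phase
            "Attack"   => state.Phase == "Combat",
            _          => false,                   // unknown move types are rejected
        };
    }
}
```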
7.1.2 Timing Windows and Priority Passes
For MTG-style games with complex timing, enforce priority windows. Only one player may act during each window, and passes must be explicit.
public void PassPriority(string playerId)
{
    if (playerId != _state.ActivePriorityHolder)
        throw new InvalidOperationException("Cannot pass priority now");

    _state.AdvancePriority();
}
7.1.3 Flood and Automation Detection
Track input rate per player and drop excessive messages:
if (_inputRateLimiter.IsLimitExceeded(playerId))
    await DisconnectAsync(playerId, "Flooding detected");
Repeated automation patterns trigger account review.
7.1.4 Deck Integrity Proofs
Before a match begins, players submit deck hashes verified by the server:
// Serialize the deck canonically (e.g., sorted card IDs) so hashes are stable.
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(JsonSerializer.Serialize(deck)));
if (!CryptographicOperations.FixedTimeEquals(hash, submittedHash)) // byte[] == compares references
    throw new InvalidOperationException("Deck mismatch");
This prevents tampering after matchmaking.
7.2 Network-Level Defenses
Lockstep’s input synchronization also provides a strong security baseline.
7.2.1 Relay/Server-Authoritative Lockstep
Clients never communicate directly — all inputs go through the server relay. Each frame is time-bounded; if an input arrives after the cutoff, it’s ignored.
const int InputWindowMs = 150;
if (receivedAt - frame.Timestamp > TimeSpan.FromMilliseconds(InputWindowMs))
    return; // discard late packet
This prevents timing-based attacks where a player delays input after seeing an opponent’s action.
7.2.2 Timing Manipulation Detection
Store per-player latency averages. Sudden anomalies signal intentional lag switching:
if (Math.Abs(currentRtt - avgRtt) > 200)
    _suspicionScore[playerId] += 1;
If suspicion exceeds threshold, flag the match for review.
7.3 Statistical Anomaly Detection
Cheating often surfaces statistically before it’s visible in gameplay. Detecting improbable patterns helps identify bad actors early.
7.3.1 Suspicious Win Rates
Compare player win rates to their expected ELO outcome. Outliers beyond 4 standard deviations are suspect.
var zScore = (playerWinRate - expectedWinRate) / stdDev;
if (Math.Abs(zScore) > 4)
    FlagPlayer(playerId);
7.3.2 Move Sequence Analysis
Analyze uncommon action sequences using frequency models. If a player repeats rare combos at impossible frequency, investigate.
7.3.3 ML.NET Spike Detection
ML.NET’s IidSpikeDetector identifies sudden performance changes in telemetry streams.
// DetectIidSpike returns an estimator to fit on the telemetry stream.
var spikeEstimator = mlContext.Transforms.DetectIidSpike(
    outputColumnName: "Prediction",
    inputColumnName: "WinRate",
    confidence: 95,
    pvalueHistoryLength: 20);
The output triggers alerts when patterns deviate sharply.
7.3.4 Analytical Storage
Store aggregated features in a separate warehouse (e.g., PostgreSQL or Kusto). Include match duration, draw variance, and move entropy. Statistical tools can query these aggregates to detect evolving cheat patterns.
7.4 Abuse Control Primitives
Beyond cheating, operational abuse — spam, multi-accounting, botting — must be rate-limited.
7.4.1 Redis Bloom and Cuckoo Filters
Redis modules support probabilistic filters for quick lookups. Example: prevent duplicate registrations.
BF.ADD known_devices device:abc123
BF.EXISTS known_devices device:abc123
7.4.2 Session and Device Reputation
Assign scores to sessions based on disconnect frequency, input anomalies, or flagged IPs:
_redis.HashIncrement($"device:{deviceId}", "suspicionScore");
Threshold breaches trigger captcha or cooldown.
7.4.3 Rate Limiting
Rate limit per IP and per user:
if (!await _rateLimiter.TryConsumeAsync(playerId))
    throw new TooManyRequestsException();
This defends against denial-of-service and automation floods.
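Under the hood this is a token bucket: each player holds up to a fixed number of tokens that refill at a steady rate, and a request is admitted only when a token is available. A deterministic, hand-rolled sketch follows (in production, TokenBucketRateLimiter from System.Threading.RateLimiting, available since .NET 7, covers the same ground):

```csharp
using System;

// Hand-rolled token bucket. Time is passed in explicitly so the bucket is
// deterministic and unit-testable.
public class TokenBucket
{
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private double _tokens;
    private double _lastRefillSeconds;

    public TokenBucket(double capacity, double refillPerSecond)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _tokens = capacity; // start full
    }

    public bool TryConsume(double nowSeconds)
    {
        // Refill proportionally to elapsed time, capped at capacity.
        _tokens = Math.Min(_capacity, _tokens + (nowSeconds - _lastRefillSeconds) * _refillPerSecond);
        _lastRefillSeconds = nowSeconds;

        if (_tokens < 1) return false; // request rejected
        _tokens -= 1;
        return true;
    }
}
```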
7.5 Auditability
Every verdict — ban, rollback, or dispute resolution — must be reproducible.
Combine the event stream and signed snapshots to recreate exact match state:
var replay = await _replayer.Run(matchId);
Assert.Equal(replay.FinalHash, officialSnapshot.Hash);
Auditors verify signatures before acting. This transparent chain of evidence builds player trust and simplifies appeals.
8 Matchmaking, Scaling & Operations
Once gameplay and fairness are solid, the next challenge is scaling matchmaking and operations to production volumes.
8.1 ELO/MMR with Redis Sorted Sets
Redis sorted sets are a simple, fast foundation for matchmaking queues.
8.1.1 Rating-Based Search
Each player’s rating is stored as the member’s score in the sorted set:
ZADD matchmaking:global 1500 player:123
To find suitable opponents within ±50 rating points:
ZRANGEBYSCORE matchmaking:global 1450 1550 LIMIT 0 1
If no match found, widen the range gradually (band widening).
for (var delta = 50; delta <= 300; delta += 50)
{
    // StackExchange.Redis range query over the sorted set, one candidate at a time.
    var candidates = await _redis.SortedSetRangeByScoreAsync(
        "matchmaking:global", rating - delta, rating + delta, skip: 0, take: 1);
    if (candidates.Any()) return candidates.First();
}
8.1.2 Cross-Region Latency
Store region tags as secondary keys. Filter candidates within acceptable ping bounds before confirming pairing.
8.2 Throughput & Capacity Planning
Scaling to 100k matches requires predictable resource budgeting.
8.2.1 Orleans Cluster Sizing
Heuristics:
- 1 grain ≈ 50 KB RAM
- 16-core VM handles ~25k active matches
- Add silos linearly for capacity
Monitor grain_activation_count and CPU utilization. When >70% sustained, add nodes.
8.2.2 SignalR Fan-Out
Azure SignalR Service units scale horizontally; each unit handles 1,000 concurrent connections. For 100k connections, provision ~100 units spread across regions.
Backpressure protection limits message fan-out rates; large broadcasts should use group sends instead of all-clients broadcasts.
8.3 Testing at Scale
Testing validates not only throughput but correctness under failure.
8.3.1 Synthetic Load
Use k6 or Artillery to stress SignalR hubs:
k6 run signalr-loadtest.js --vus 1000 --duration 5m
Simulate realistic latency and disconnection patterns.
8.3.2 Chaos Testing
Inject failures into silos and Redis nodes. Confirm automatic grain reactivation and reconnect. Example chaos scenario:
- Kill one Redis shard.
- Observe reconnect latency.
- Verify no message loss through replay buffer.
8.3.3 Replay Load
Use recorded event streams to simulate historical load. Replay matches concurrently to benchmark persistence throughput.
8.4 Observability & SLOs
Production success depends on measurable reliability.
8.4.1 Instrumentation
OpenTelemetry auto-instrumentation tracks:
- SignalR message latency
- Redis operations
- Orleans grain activations
Add manual spans for game logic:
using var span = _tracer.StartActiveSpan("Match.SubmitMove");
span.SetAttribute("player.id", playerId); // tag before the work so it survives exceptions
await _rules.ValidateAsync(cmd);
8.4.2 RED/USE Dashboards
Follow RED (Rate, Errors, Duration) for services and USE (Utilization, Saturation, Errors) for infrastructure. Key charts:
- signalr_messages_per_sec
- match_latency_ms
- grain_activations
- redis_ops_latency
Alerts trigger when p95 latency > 200 ms or activation churn spikes unexpectedly.
8.5 Cost & Runbooks
8.5.1 Unit Economics
Estimate costs per 10k concurrent matches:
- Azure SignalR: 10 units ≈ $300/month
- Redis Premium: 3-node cluster ≈ $500/month
- PostgreSQL/Marten: $250/month
- EventStoreDB cluster: $400/month
Overall, $0.01–$0.02 per match-hour depending on retention and throughput.
8.5.2 Hot/Cold Retention
Keep 30 days of events in the primary database; older data is offloaded to Azure Blob cold storage with signatures. Snapshots remain available for disputes.
8.5.3 Runbooks
Operational runbooks include:
- Player stuck in match → deactivate grain and replay state.
- Redis latency spike → reroute to replica, flush backplane.
- SignalR reconnect storms → enable backoff policies, scale out units.
- High grain churn → check matchmaker fairness window.
Consistent operational patterns ensure stability even at massive scale.