Modern C# Collections in Practice: List, Dictionary, Immutable Collections, and Custom Structs

1 Why Collection Choice Is an Architectural Decision

Modern .NET systems don’t fail because a for loop was slow. They fail because a hot path allocates too much, because a dictionary lookup creates transient strings, because a cache grows without bounds, or because trimming removed something reflection depended on.

Collection choice directly affects:

  • GC pressure and allocation rate
  • CPU cache utilization
  • Contention and scalability
  • Serialization and trimming behavior
  • Native AOT compatibility

In a high-throughput API gateway handling 50k+ requests per second, every request performs multiple lookups. If those lookups allocate strings, trigger dictionary resizes, contend on locks, or scan linear lists, you pay that cost on every request. Under load, this translates into higher Gen0 frequency, increased tail latency, and reduced throughput.

This article looks at modern C# collections the way an architect should: not as APIs, but as runtime behaviors. We move from fundamentals to concrete .NET 9 and C# 13 capabilities, keeping everything grounded in production scenarios.

1.1 The Performance vs. Readability Trade-off

Most teams default to expressive LINQ. And in many places, that’s perfectly fine:

var emails = users
    .Where(u => u.IsActive)
    .Select(u => u.Email)
    .ToList();

This is readable and idiomatic. But it allocates iterator state machines, allocates a new list, and may perform multiple passes. In a hot path, that matters.

Modern .NET gives us a middle ground. If you already have a List<T>, you can use CollectionsMarshal.AsSpan() to operate directly on the internal array (covered in Section 2). The rule is simple: optimize hot paths deliberately using the right primitive for the workload, and keep LINQ where readability matters more than allocation.

1.2 The Modern .NET Stack: Memory Efficiency, Cache Locality, and AOT Readiness

A modern .NET system optimizes not just for algorithmic complexity but for runtime behavior.

Memory efficiency means minimizing temporary arrays, hidden iterator allocations, boxing, and LOH-triggering buffers. Prefer Span<T>, ArrayPool<T>, contiguous storage, and frozen or immutable snapshots for shared state. The goal is stabilizing latency under load.

Cache locality matters because contiguous data structures outperform pointer-heavy ones in tight loops. An array-backed List<T> enables prefetching and reduces pointer chasing, while tree-based structures increase cache misses. When designing large in-memory indexes, layout matters as much as complexity.

AOT readiness introduces constraints many systems ignore until deployment. Consider this common pattern:

public static T Deserialize<T>(byte[] data) =>
    System.Text.Json.JsonSerializer.Deserialize<T>(data)!;

If T has properties accessed via reflection and trimming removes unused members, deserialization fails in AOT builds. Collections amplify this risk when they store types requiring reflection. You may need to annotate:

using System.Diagnostics.CodeAnalysis;

public static T Deserialize<
    [DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicProperties)]
    T>(byte[] data)
{
    return System.Text.Json.JsonSerializer.Deserialize<T>(data)!;
}

Frozen and immutable collections are structurally simple and don’t depend on runtime code generation, making them safer in AOT scenarios. But the types they contain still matter. Test with trimming and AOT early, not as a final step.

1.3 What’s New in .NET 9 and C# 13

.NET 9 continues the shift toward allocation-aware, AOT-friendly, and span-first APIs. Here is a summary of the key additions—each explored in depth in later sections:

  • OrderedDictionary<TKey, TValue> — generic ordered dictionary with deterministic insertion order, O(1) key lookup, and index-based access (Section 3).
  • Frozen collections — FrozenDictionary and FrozenSet reorganize data at creation time for faster lookup and tighter memory layout (Section 3).
  • Alternate lookups — search a Dictionary<string, TValue> using ReadOnlySpan<char> without allocating a string (Section 3).
  • allows ref struct (C# 13) — generic abstractions can now safely work with stack-only types like Span<T> (Section 6).
  • HybridCache — combines in-memory and distributed caching with stampede protection (Section 7).

2 Linear Data Structures: Optimization Beyond List<T>

List<T> is still the default choice for linear storage in C#. And in many systems, it’s the right one. But once you’re operating in high-throughput services or memory-sensitive environments, you need to understand exactly how it behaves—resizing behavior, Large Object Heap thresholds, zero-copy access via CollectionsMarshal, span-based scanning with SearchValues<T>, pooling strategies, and realistic priority queue patterns.

2.1 Internal Mechanics: Resize Costs and LOH Awareness

At its core, List<T> is a dynamically resized array. When capacity is exceeded, a new array (usually 2x) is allocated, existing elements are copied, and the old array becomes eligible for GC. In hot paths, this creates allocation spikes and increased GC activity.
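
The growth policy is easy to observe directly. A minimal sketch (exact capacities are an implementation detail and may differ between runtimes):

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();
int lastCapacity = -1;

for (int i = 0; i < 200; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        // Each jump below is a new array allocation plus a full element copy
        Console.WriteLine($"Count={list.Count}, Capacity={list.Capacity}");
        lastCapacity = list.Capacity;
    }
}
```

On current runtimes this prints capacities 4, 8, 16, 32, 64, 128, 256: seven allocations and six full copies for 200 elements, all avoidable with a presized list.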

The Large Object Heap threshold is ~85,000 bytes. On x64, a reference is 8 bytes, so List<Order> crosses the LOH at roughly 10,600 elements. LOH objects aren’t compacted by default, increasing fragmentation risk.

If you expect a known size, allocate up front:

var list = new List<Order>(15000);

foreach (var order in source)
{
    list.Add(order);
}

Or call list.EnsureCapacity(expectedCount) before filling. For architects, this isn’t micro-optimization—it’s controlling memory topology.

2.2 Zero-Copy Access with CollectionsMarshal

If you have a List<T> and need fast iteration without enumerator overhead, CollectionsMarshal.AsSpan gives you a span over its internal array without allocating or copying:

using System.Runtime.InteropServices;

var span = CollectionsMarshal.AsSpan(users);
for (int i = 0; i < span.Length; i++)
{
    ref var user = ref span[i];
    if (user.IsActive)
        result.Add(user.Email);
}

This avoids enumerator allocations, avoids bounds checks per iteration (after JIT optimizations), and preserves contiguous memory access. Important constraint: do not add or remove elements while holding the span.

Another modern pattern is CollectionsMarshal.GetValueRefOrAddDefault for dictionaries, which returns a ref directly into the dictionary’s value slot, avoiding double lookup:

ref int value = ref CollectionsMarshal.GetValueRefOrAddDefault(
    dict, "key", out bool exists);

if (!exists) value = 1;
else value++;

This is significantly faster in tight update loops such as counters or aggregations.
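
Put together as a counting loop, the pattern looks like this (a sketch; CountTokens and the token values are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

static Dictionary<string, int> CountTokens(IEnumerable<string> tokens)
{
    var counts = new Dictionary<string, int>();
    foreach (var token in tokens)
    {
        // One hash lookup per token instead of a TryGetValue + indexer pair
        ref int count = ref CollectionsMarshal.GetValueRefOrAddDefault(
            counts, token, out _);
        count++; // a freshly added slot starts at default(int) == 0
    }
    return counts;
}

var result = CountTokens(new[] { "GET", "POST", "GET" });
Console.WriteLine(result["GET"]); // 2
```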

2.3 Span<T> and ReadOnlySpan<T> for Zero-Allocation Slicing

Most string-processing bottlenecks are allocation problems, not CPU problems. Naive line.Split(',') allocates a string array and a string per segment.

The modern alternative uses span slicing:

ReadOnlySpan<char> span = line.AsSpan();

while (!span.IsEmpty)
{
    int index = span.IndexOf(',');
    ReadOnlySpan<char> segment;

    if (index < 0)
    {
        segment = span;
        span = default;
    }
    else
    {
        segment = span[..index];
        span = span[(index + 1)..];
    }

    Process(segment);
}

No new strings. No array. No GC pressure.

Span-based design is foundational for:

  • Protocol parsers
  • HTTP header processing
  • Log ingestion
  • Real-time data pipelines

This becomes especially powerful when combined with alternate dictionary lookups (Section 3), where ReadOnlySpan<char> can be used directly as a key.

2.4 SearchValues<T> for High-Throughput Buffer Scanning

SearchValues<T> precomputes a search set and uses SIMD instructions where available:

private static readonly SearchValues<char> Delimiters =
    SearchValues.Create(",;\t ");

Then scan efficiently:

ReadOnlySpan<char> input = line.AsSpan();

while (!input.IsEmpty)
{
    int idx = input.IndexOfAny(Delimiters);

    if (idx < 0)
    {
        Process(input);
        break;
    }

    Process(input[..idx]);
    input = input[(idx + 1)..];
}

Compared to chained IndexOf calls or manual comparisons, this is both cleaner and faster. Ideal for tokenizers, log scanners, command parsers, and CSV/TSV processors.

When scanning buffers repeatedly with a fixed search set, precompute SearchValues<T> once and reuse it.

2.5 Array Pooling with ArrayPool<T>

In a high-throughput server, allocating new byte[8192] on every I/O operation creates thousands of arrays per second. ArrayPool<T> reduces this churn:

var pool = ArrayPool<byte>.Shared;
byte[] buffer = pool.Rent(8192);

try
{
    int read = stream.Read(buffer, 0, buffer.Length);
    Process(buffer.AsSpan(0, read));
}
finally
{
    pool.Return(buffer, clearArray: true);
}

Key rules:

  • Always return rented arrays
  • Clear if holding sensitive data
  • Avoid pooling for long-lived storage

Pooling is most effective for temporary serialization buffers, compression buffers, and network read/write loops. Combined with Span<T>, you get safe slicing over rented arrays without copying.

2.6 PriorityQueue<TElement, TPriority>: Practical Update Patterns

PriorityQueue<TElement, TPriority> is a binary heap. A common misconception is that you must rebuild the heap to support updates.

In .NET 9, PriorityQueue includes a Remove method that allows removal of a specific element (O(n) worst-case scan but no full rebuild), making occasional updates reasonable.
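
A sketch of that occasional-update flow (job names and priorities are illustrative):

```csharp
using System;
using System.Collections.Generic;

var pq = new PriorityQueue<string, int>();
pq.Enqueue("job-a", 5);
pq.Enqueue("job-b", 1);

// .NET 9: remove one specific element (linear scan, then heap repair)
if (pq.Remove("job-a", out var element, out var oldPriority))
{
    // Re-enqueue under the new priority to complete the "update"
    pq.Enqueue(element, 3);
}

Console.WriteLine(pq.Dequeue()); // job-b (lowest priority dequeues first)
```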

For high-frequency re-weighting (e.g., leaderboards), rebuilding on every update is not acceptable. A practical pattern is lazy invalidation:

var pq = new PriorityQueue<string, int>();
var scores = new Dictionary<string, int>();

void Update(string user, int score)
{
    scores[user] = score;
    pq.Enqueue(user, -score); // min-heap: negate so the highest score dequeues first
}

string? PopMax()
{
    while (pq.TryDequeue(out var user, out var negatedScore))
    {
        // Skip stale entries: return only if this entry still matches the current score
        if (scores.TryGetValue(user, out var current) && current == -negatedScore)
            return user;
    }
    return null;
}

The heap may contain stale entries temporarily, but removal cost is amortized.

Architectural takeaway: use built-in PriorityQueue for most scenarios, use Remove sparingly for infrequent updates, and use lazy invalidation for high-frequency re-weighting. Rebuilding the entire heap on every update is almost never the right solution.

2.7 Fixed-Size Stack Collections with [InlineArray]

C# 12 introduced [InlineArray], enabling fixed-size inline storage with type safety:

using System.Runtime.CompilerServices;

[InlineArray(8)]
public struct SmallBuffer
{
    private int _element0;
}

SmallBuffer buffer = default;
for (int i = 0; i < 8; i++) // the length is the [InlineArray] attribute argument
    buffer[i] = i * 2;

No heap allocation, no manual stackalloc per call, type-safe fixed-size storage, and better composability inside other structs. Ideal for small fixed-capacity collections, parsing state machines, and value-type aggregators.


3 Dictionary and Hashing: High-Performance Key-Value Architectures

Key-value collections sit at the center of most .NET systems. Routing tables, caches, session stores, feature flags, in-memory indexes—they all depend on hashing.

At small scale, Dictionary<TKey, TValue> feels trivial. At large scale, hashing strategy, comparer choice, memory layout, and trimming behavior directly affect latency, throughput, and even security. This section clarifies the differences between modern dictionary variants, shows how alternate lookups work, and addresses collision resilience in public-facing systems.

3.1 Choosing Between Dictionary Variants

Three types often get conflated, but they solve different problems:

Dictionary<TKey, TValue> — Average O(1) lookup, hash-based, optimized for lookup and update. Preserves insertion order in modern .NET as an implementation detail, not a guarantee. This is your default when order doesn’t matter.

OrderedDictionary<TKey, TValue> — Preserves insertion order by contract and allows index-based access alongside key-based lookup:

var dict = new OrderedDictionary<string, int>();
dict.Add("A", 1);
dict.Add("B", 2);

KeyValuePair<string, int> first = dict.GetAt(0); // by index
int value = dict["A"];                           // by key

Use it when order is part of the contract—not just incidental. A practical example: deterministic middleware execution order.

var pipeline = new OrderedDictionary<string, Func<Task>>();

pipeline.Add("Authentication", async () => { /* ... */ });
pipeline.Add("Authorization", async () => { /* ... */ });
pipeline.Add("Logging", async () => { /* ... */ });

// Execute in guaranteed insertion order
foreach (var (name, step) in pipeline)
    await step();

// Also accessible by key
if (pipeline.TryGetValue("Logging", out var logStep))
    await logStep();

Compared to combining Dictionary with a separate List<TKey>, this is simpler and less error-prone.

SortedDictionary<TKey, TValue> — Tree-based, O(log n) lookup, ordered by key using a comparer. No index-based access. Use when entries must be sorted by key.
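
A minimal sketch where key ordering is the contract (the day-bucket keys are illustrative):

```csharp
using System;
using System.Collections.Generic;

var buckets = new SortedDictionary<int, int>();
buckets[20250102] = 10; // inserted out of order on purpose
buckets[20250101] = 5;

foreach (var (day, count) in buckets)
    Console.WriteLine($"{day}: {count}"); // 20250101 first, then 20250102
```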

Long-lived dictionaries: If a dictionary grows and later shrinks (e.g., a session store after burst traffic), capacity remains high unless explicitly trimmed with TrimExcess().
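
A sketch of reclaiming capacity after a burst (the sizes are illustrative):

```csharp
using System;
using System.Collections.Generic;

var sessions = new Dictionary<string, byte[]>(capacity: 100_000);
// ... burst traffic fills the store; entries later expire and are removed ...
sessions.Clear();

// The bucket array is still sized for the burst until explicitly trimmed
sessions.TrimExcess(); // shrink internal storage to fit the current Count
```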

3.2 Alternate Lookups: Span-Based Dictionary Access

Traditionally, looking up a Dictionary<string, TValue> with a parsed ReadOnlySpan<char> required span.ToString(), creating a new string per lookup. In .NET 9, alternate lookups eliminate this:

var dict = new Dictionary<string, int>(StringComparer.Ordinal)
{
    ["content-type"] = 1
};

var lookup = dict.GetAlternateLookup<ReadOnlySpan<char>>();

ReadOnlySpan<char> header = "content-type".AsSpan();
if (lookup.TryGetValue(header, out var value))
{
    // No string allocation occurred
}

The dictionary must use a compatible comparer (e.g., StringComparer.Ordinal). The hash code is computed directly from the span. This is critical in HTTP servers, protocol parsers, and high-throughput routing layers.

3.3 Custom IAlternateEqualityComparer

The built-in StringComparer implements IAlternateEqualityComparer<ReadOnlySpan<char>, string> internally. For custom key types or full control, implement it yourself:

public sealed class OrdinalSpanComparer :
    IAlternateEqualityComparer<ReadOnlySpan<char>, string>,
    IEqualityComparer<string>
{
    public bool Equals(string? x, string? y) =>
        StringComparer.Ordinal.Equals(x, y);

    public int GetHashCode(string obj) =>
        StringComparer.Ordinal.GetHashCode(obj);

    public bool Equals(ReadOnlySpan<char> alternate, string other) =>
        alternate.Equals(other.AsSpan(), StringComparison.Ordinal);

    public int GetHashCode(ReadOnlySpan<char> alternate) =>
        string.GetHashCode(alternate, StringComparison.Ordinal);

    // Required by IAlternateEqualityComparer: materialize the string key
    // when an entry is added through the alternate (span) view
    public string Create(ReadOnlySpan<char> alternate) =>
        alternate.ToString();
}

This bridges two representations—storage as string, lookup as ReadOnlySpan<char>—with no intermediate allocations. The parsing layer remains span-based while the storage layer remains string-based, and the boundary is allocation-free.
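
Wiring the two halves together might look like this (a sketch; the header name is illustrative, and the comparer is an abbreviated version of the one above so the block stands alone):

```csharp
using System;
using System.Collections.Generic;

var headers = new Dictionary<string, int>(new OrdinalSpanComparer())
{
    ["content-length"] = 42
};

var lookup = headers.GetAlternateLookup<ReadOnlySpan<char>>();

// A slice of a larger buffer: no intermediate string for the lookup
ReadOnlySpan<char> name = "content-length: 42".AsSpan(0, "content-length".Length);
bool found = lookup.TryGetValue(name, out var size); // found == true, size == 42

public sealed class OrdinalSpanComparer :
    IAlternateEqualityComparer<ReadOnlySpan<char>, string>,
    IEqualityComparer<string>
{
    public bool Equals(string? x, string? y) =>
        string.Equals(x, y, StringComparison.Ordinal);
    public int GetHashCode(string obj) =>
        obj.GetHashCode(StringComparison.Ordinal);
    public bool Equals(ReadOnlySpan<char> alternate, string other) =>
        alternate.Equals(other.AsSpan(), StringComparison.Ordinal);
    public int GetHashCode(ReadOnlySpan<char> alternate) =>
        string.GetHashCode(alternate, StringComparison.Ordinal);
    public string Create(ReadOnlySpan<char> alternate) => alternate.ToString();
}
```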

3.4 FrozenDictionary: Build Once, Read Forever

FrozenDictionary<TKey, TValue> is not just an immutable Dictionary. It performs analysis at creation time and selects internal strategies based on key characteristics—length-based partitioning, ordinal specialization, optimized bucket layout for dense key sets.

Creation cost is higher, but lookup is typically faster and more predictable for read-mostly workloads:

using System.Collections.Frozen;

public static class MimeTypes
{
    public static readonly FrozenDictionary<string, string> ByExtension =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            [".json"] = "application/json",
            [".xml"]  = "application/xml",
            [".html"] = "text/html"
        }
        .ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);

    public static string Resolve(string ext) =>
        ByExtension.TryGetValue(ext, out var mime)
            ? mime
            : "application/octet-stream";
}

When to use: dataset is stable after initialization, lookup dominates writes, startup cost is acceptable.

When not to: keys change frequently, creation cost is significant relative to workload, or iteration dominates lookup.

The key distinction across the three dictionary families: Dictionary is optimized for mutation. ImmutableDictionary is optimized for concurrency. FrozenDictionary is optimized for lookup stability. Pick based on lifecycle.

3.5 Hash Collisions and HashDoS

Hash-based collections are susceptible to collision attacks. If an attacker sends many keys hashing to the same bucket, lookup degrades from O(1) to O(n).

.NET mitigates this with randomized string hashing (per process, not per dictionary) and improved bucket distribution. Best practices for public APIs:

  1. Use StringComparer.Ordinal for protocol-level keys.
  2. Avoid culture-sensitive comparers for externally controlled inputs.
  3. Apply input limits (max header count, max key length).
  4. Consider frozen or immutable maps for static routing.
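
A minimal enforcement sketch combining points 1 and 3 (the limits and the TryAddHeader helper are illustrative, not a prescribed API):

```csharp
using System;
using System.Collections.Generic;

const int MaxHeaders = 100;
const int MaxKeyLength = 256;

var headers = new Dictionary<string, string>(StringComparer.Ordinal);

bool TryAddHeader(string name, string value)
{
    // Bound both table size and key length before hashing attacker-controlled input
    if (headers.Count >= MaxHeaders || name.Length > MaxKeyLength)
        return false;
    headers[name] = value;
    return true;
}

Console.WriteLine(TryAddHeader("host", "example.com")); // True
```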

4 The Immutability Spectrum: Thread Safety Without Locks

Immutability is not a stylistic preference. It’s a concurrency strategy.

Instead of protecting shared mutable state with locks, you replace the entire state with a new snapshot. Readers continue using the old snapshot safely. Writers build a new one and publish it atomically. No lock contention. No partially mutated objects.

This pattern works best when reads dominate writes, state transitions are discrete, and consistency matters more than micro-optimizing mutation. In modern .NET services, immutable collections sit at key architectural boundaries: routing tables, configuration snapshots, feature flags, authorization policies, and lookup maps. The write path is rare and controlled. The read path is hot and must be stable.

The important part isn’t just choosing an immutable type. It’s choosing the right immutable type and publishing it correctly.

4.1 Choosing the Right Immutable Collection

The first decision is about layout and lookup characteristics:

  • ImmutableArray<T> — contiguous memory, fast indexing, great for tight iteration. Ideal when you rebuild everything at once and then read frequently.
  • ImmutableList<T> — tree-based with structural sharing. Better for incremental updates where you apply many small changes and want to avoid copying a large array each time. Index access is not O(1), and iteration is slightly more expensive than ImmutableArray<T>.
  • ImmutableHashSet<T> — hash-based membership checks, O(1) Contains.
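
The structural-sharing point is worth seeing concretely, a short sketch:

```csharp
using System;
using System.Collections.Immutable;

var v1 = ImmutableList.Create(1, 2, 3);
var v2 = v1.Add(4); // v1 is untouched; the two versions share interior tree nodes

Console.WriteLine(v1.Count); // 3
Console.WriteLine(v2.Count); // 4
```

Both versions remain valid snapshots: readers holding v1 are unaffected by the update that produced v2.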

These are not interchangeable. Choosing the wrong immutable collection can silently cost you performance. Using ImmutableList<T>.Contains() for session membership would be O(n)—use ImmutableHashSet<T> instead:

public sealed class ActiveSessions
{
    private ImmutableHashSet<string> _sessions =
        ImmutableHashSet<string>.Empty;

    // ImmutableInterlocked retries the swap, so concurrent adds are never lost
    public void Add(string sessionId) =>
        ImmutableInterlocked.Update(ref _sessions, static (s, id) => s.Add(id), sessionId);

    public bool Contains(string sessionId) =>
        _sessions.Contains(sessionId);
}

4.2 Safe Publication with ImmutableArray<T>

One subtle but critical detail: ImmutableArray<T> is a struct wrapping a single array reference. Reads and writes of the field are atomic, but the field cannot be marked volatile, and Volatile.Read and Volatile.Write have no overloads for arbitrary structs. Publish new snapshots through ImmutableInterlocked:

public sealed class EndpointRegistry
{
    private ImmutableArray<Endpoint> _snapshot =
        ImmutableArray<Endpoint>.Empty;

    public void Reload(IEnumerable<Endpoint> endpoints)
    {
        var updated = endpoints.ToImmutableArray();
        ImmutableInterlocked.InterlockedExchange(ref _snapshot, updated);
    }

    public Endpoint? Find(string route)
    {
        var local = _snapshot; // atomic read of the single wrapped reference
        for (int i = 0; i < local.Length; i++)
        {
            if (local[i].Route == route)
                return local[i];
        }
        return null;
    }
}

That small Interlocked call is the difference between “works most of the time” and “architecturally correct.”

4.3 The Builder Pattern for Large Immutable Dictionaries

A common performance trap: building large immutable dictionaries by repeated .Add() calls, which performs repeated structural updates approaching O(n log n). Use a builder:

var builder = ImmutableDictionary.CreateBuilder<string, int>();

foreach (var item in source)
{
    builder[item.Key] = item.Value;
}

var dict = builder.ToImmutable();

Builders are mutable during construction but produce a fully immutable result.

4.4 The “Create-Once, Read-Millions” Pattern

The snapshot pattern is simple: build a new structure, atomically replace the reference, and readers use the current snapshot without locks.

public sealed class RoutingTable
{
    private FrozenDictionary<string, string> _routes =
        new Dictionary<string, string>()
            .ToFrozenDictionary(StringComparer.Ordinal);

    public void Update(IDictionary<string, string> newRoutes)
    {
        var frozen = newRoutes
            .ToFrozenDictionary(StringComparer.Ordinal);
        Interlocked.Exchange(ref _routes, frozen);
    }

    public string? Resolve(string path)
    {
        var snapshot = Volatile.Read(ref _routes);
        return snapshot.TryGetValue(path, out var target)
            ? target : null;
    }
}

If you’re in ASP.NET Core, this pattern already exists via IOptionsMonitor<T>:

  • Builds a new options snapshot on reload
  • Publishes it safely
  • Allows consumers to access current values without synchronization

Before implementing your own snapshot system for configuration, ask: can this be modeled as options? Use custom snapshots when the data is domain-specific, not configuration-bound, or you need explicit lifecycle control.

4.5 Functional State Management

Immutability pairs naturally with functional state transitions. Instead of mutating a shared dictionary, represent state as a value object and transform it:

public sealed record FeatureState(
    ImmutableDictionary<string, bool> Flags);

public static class FeatureEngine
{
    public static FeatureState Enable(FeatureState current, string feature) =>
        current with { Flags = current.Flags.SetItem(feature, true) };
}

Publish changes atomically with compare-and-swap:

private FeatureState _state = new(ImmutableDictionary<string, bool>.Empty);

public void Apply(Func<FeatureState, FeatureState> mutation)
{
    FeatureState original, updated;
    do
    {
        original = Volatile.Read(ref _state);
        updated = mutation(original);
    }
    // Records overload ==, so compare the CAS result by reference explicitly
    while (!ReferenceEquals(
        Interlocked.CompareExchange(ref _state, updated, original),
        original));
}

This avoids locks, guarantees consistency, and makes transitions testable. The trade-off is allocation per update—acceptable in configuration-heavy or policy-driven systems, but not for high-frequency mutation.


5 Thread-Safety and Concurrency: Building Resilient Shared State

Concurrency in .NET has three main strategies: eliminate shared mutation (immutability), use concurrent collections with fine-grained locking, or use explicit synchronization when invariants span multiple operations.

Choosing the right tool depends on read/write ratios, contention level, and the cost of coordination. This section focuses on the practical trade-offs architects care about: allocation patterns, contention behavior, and correctness under load.

5.1 ConcurrentDictionary<TKey, TValue>

ConcurrentDictionary partitions internal storage and uses fine-grained locking. It scales well under mixed read/write workloads.

A common misunderstanding: GetOrAdd does not guarantee the factory runs once. Under contention, the factory can execute multiple times—but only one value is stored.

That means:

  • Factories must be idempotent
  • Avoid side effects inside the delegate
  • Avoid long blocking operations inside the factory

Closure-free hot path optimization: Many developers overlook the overload that avoids closure allocation:

string GetValue(string key, IService service)
{
    return cache.GetOrAdd(
        key,
        static (k, svc) => svc.Lookup(k),
        service);
}

The static lambda captures nothing. No closure allocation in hot paths.

Ensuring single execution with Lazy<T>: If construction is expensive and must run once:

var cache = new ConcurrentDictionary<string, Lazy<string>>();

string Get(string key)
{
    var lazy = cache.GetOrAdd(
        key,
        static k => new Lazy<string>(() => ExpensiveLookup(k)));
    return lazy.Value;
}

This ensures at most one expensive execution per key with thread-safe lazy initialization.

When ReaderWriterLockSlim is better: If reads heavily dominate, writes are rare, and operations span multiple steps, a ReaderWriterLockSlim around a regular Dictionary can outperform ConcurrentDictionary:

private readonly ReaderWriterLockSlim _lock = new();
private readonly Dictionary<string, Node> _nodes = new();

public Node? Get(string id)
{
    _lock.EnterReadLock();
    try
    {
        return _nodes.TryGetValue(id, out var node) ? node : null;
    }
    finally
    {
        _lock.ExitReadLock();
    }
}

Concurrent collections are not always faster. Measure under your actual contention model.

5.2 Non-Blocking Alternatives: Channels and Concurrent Queues

Sometimes the right answer is not a shared dictionary at all.

Channels are often superior to ConcurrentQueue<T> when you need async consumers, bounded capacity, and backpressure:

var channel = Channel.CreateBounded<string>(100);

async Task Producer() =>
    await channel.Writer.WriteAsync("event");

async Task Consumer()
{
    await foreach (var item in channel.Reader.ReadAllAsync())
        Process(item);
}

ConcurrentQueue<T> is FIFO and appropriate when order matters. ConcurrentBag<T> uses thread-local storage and is optimized for scenarios where each thread mostly consumes what it produces—useful for task-local object pools.
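
A sketch of that task-local pooling use (StringBuilder is just an example pooled object; a production pool would also cap its size):

```csharp
using System.Collections.Concurrent;
using System.Text;

var pool = new ConcurrentBag<StringBuilder>();

StringBuilder Rent() =>
    pool.TryTake(out var sb) ? sb : new StringBuilder();

void Return(StringBuilder sb)
{
    sb.Clear();
    pool.Add(sb); // lands on the adding thread's local list, cheap to re-take
}

var builder = Rent();
builder.Append("work");
Return(builder);
```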

BlockingCollection<T> wraps an IProducerConsumerCollection<T> and adds blocking semantics. It’s useful for classic producer-consumer models when async is not required:

var collection = new BlockingCollection<string>(boundedCapacity: 100);

Task.Run(() =>
{
    foreach (var item in collection.GetConsumingEnumerable())
        Process(item);
});

collection.Add("work item");

Today, Channel<T> is often preferred for new async code. But BlockingCollection<T> remains valid in synchronous or legacy environments.

5.3 Thread-Local Collections and Lock-Free Counters

When aggregation is independent per thread, sharing is unnecessary. ThreadLocal<T> lets each thread maintain its own collection:

using var threadLocalCounts =
    new ThreadLocal<Dictionary<string, int>>(
        () => new Dictionary<string, int>(),
        trackAllValues: true);

Parallel.ForEach(data, item =>
{
    var local = threadLocalCounts.Value!;
    if (!local.TryAdd(item, 1))
        local[item]++;
});

// Merge after parallel work completes
var merged = new Dictionary<string, int>();
foreach (var local in threadLocalCounts.Values)
    foreach (var kv in local)
        merged[kv.Key] = merged.TryGetValue(kv.Key, out var v)
            ? v + kv.Value : kv.Value;

Important: trackAllValues: true is required to access .Values, and always dispose ThreadLocal<T>. In long-lived services, avoid retaining per-thread dictionaries indefinitely.

For request-scoped aggregation in ASP.NET Core, often a local dictionary inside the request scope is sufficient. ThreadLocal<T> is most useful in batch and parallel workloads.

For simple counters, Interlocked is the fastest option—no locks, no allocations:

private long _total;
public void Increment() => Interlocked.Increment(ref _total);
public long Read() => Volatile.Read(ref _total);

6 Memory-Oriented Design: Custom Struct-Based Collections

At high scale, performance bottlenecks are often about memory, not CPU. Allocation rate, cache-line alignment, and layout density start to dominate. Struct-based design and low-level memory APIs can remove GC pressure and improve cache locality—but they require careful use.

6.1 ref struct and allows ref struct (C# 13)

ref struct types (like Span<T> and ReadOnlySpan<T>) are restricted to the stack—they cannot be boxed, stored in reference type fields, or captured by lambdas.

Before C# 13, you couldn’t write generic abstractions that accepted ref struct types. Now you can:

public interface ITokenParser<TToken>
    where TToken : allows ref struct
{
    bool TryParse(ReadOnlySpan<char> input, out TToken token);
}

public readonly ref struct WordToken
{
    public readonly ReadOnlySpan<char> Value;
    public WordToken(ReadOnlySpan<char> value) => Value = value;
}

public sealed class WordParser : ITokenParser<WordToken>
{
    public bool TryParse(ReadOnlySpan<char> input, out WordToken token)
    {
        int idx = input.IndexOf(' ');
        if (idx <= 0) { token = default; return false; }
        token = new WordToken(input.Slice(0, idx));
        return true;
    }
}

This enables generic parsing abstractions with zero-allocation token representation and compile-time enforcement of stack semantics.

Important limitations for architects:

  • ref struct values cannot be stored in fields of reference types.
  • You cannot use them in unconstrained generic contexts.
  • You cannot assign them to object or interface variables without matching constraints.

The compiler enforces these rules—that’s the safety guarantee. This pattern is powerful for high-performance parsers, streaming protocol readers, and zero-allocation pipelines. Generic APIs no longer force heap allocation.

6.2 MemoryMarshal: Reinterpreting Memory Without Copying

MemoryMarshal allows you to reinterpret memory safely. Cast a Span<byte> from a network buffer to int values without copying:

Span<byte> bytes = stackalloc byte[16];
Span<int> ints = MemoryMarshal.Cast<byte, int>(bytes);

Or expose raw bytes of a struct for serialization or checksum calculation:

Span<Header> headerSpan = stackalloc Header[1];
headerSpan[0] = header;
Span<byte> rawBytes = MemoryMarshal.AsBytes(headerSpan);

You can also create spans over arbitrary memory:

int value = 123;
Span<int> span = MemoryMarshal.CreateSpan(ref value, 1);

This exposes stack or heap data as a span without allocation. These APIs are safe compared to raw pointers but require understanding layout and alignment. Constraints: source span length must be divisible by the size of the target type, and endianness must be handled explicitly when reading network data.
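
For the endianness point, BinaryPrimitives makes byte order explicit where MemoryMarshal.Cast would silently use host order (a sketch; the payload bytes are illustrative):

```csharp
using System;
using System.Buffers.Binary;

// Four network-order (big-endian) bytes encoding the value 300
ReadOnlySpan<byte> payload = stackalloc byte[] { 0x00, 0x00, 0x01, 0x2C };

int length = BinaryPrimitives.ReadInt32BigEndian(payload);
Console.WriteLine(length); // 300
```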

6.3 Stack-Only Collections with Safe Fallback

stackalloc is fast and allocation-free but bounded. A naive implementation throws if bounds are exceeded. The production-safe pattern is stack-first, pool-fallback:

Span<int> stackBuffer = stackalloc int[16];
int[]? pooled = null;
Span<int> buffer = stackBuffer;
int count = 0;

foreach (var c in input)
{
    if (!char.IsDigit(c)) continue;

    if (count >= buffer.Length)
    {
        var grown = ArrayPool<int>.Shared.Rent(buffer.Length * 2);
        buffer[..count].CopyTo(grown); // preserve what was already collected
        if (pooled is not null)
            ArrayPool<int>.Shared.Return(pooled);
        pooled = grown;
        buffer = pooled;
    }

    buffer[count++] = c - '0';
}

// Process buffer[..count]

if (pooled is not null)
    ArrayPool<int>.Shared.Return(pooled);

Fast path in the common case, safe scaling when bounds are exceeded.

6.4 Struct Layout and Alignment

Memory layout directly affects cache efficiency. Consider:

public struct Record
{
    public int Id;      // 4 bytes
    public short Code;  // 2 bytes
    public byte Flags;  // 1 byte
}

The runtime may insert padding for alignment, increasing size but ensuring aligned access. You can force tight packing:

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct CompactRecord
{
    public int Id;
    public short Code;
    public byte Flags;
}

This removes padding and the struct becomes smaller, improving density.
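
The size difference is directly observable with Unsafe.SizeOf<T>() (a sketch mirroring the two structs above; exact padding is runtime- and platform-dependent):

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

Console.WriteLine(Unsafe.SizeOf<Padded>()); // typically 8: padded to int alignment
Console.WriteLine(Unsafe.SizeOf<Packed>()); // 7: no padding

public struct Padded { public int Id; public short Code; public byte Flags; }

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Packed { public int Id; public short Code; public byte Flags; }
```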

However, on ARM architectures (Apple Silicon, ARM cloud servers), unaligned memory access can trigger additional instructions and reduce SIMD efficiency. Prefer natural alignment for in-memory collections unless profiling proves otherwise. Use Pack = 1 primarily for interop or wire formats.

If you store millions of structs in arrays, alignment often matters more than raw byte size. Measure with real workloads on your target architecture.
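The size difference is easy to verify with Unsafe.SizeOf. A small sketch (the type names here are illustrative; exact sizes are runtime- and platform-dependent, so treat the padded value as typical rather than guaranteed):

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

Console.WriteLine(Unsafe.SizeOf<PaddedRecord>()); // typically 8: one trailing padding byte
Console.WriteLine(Unsafe.SizeOf<PackedRecord>()); // 7: padding removed

public struct PaddedRecord   // natural alignment: the int field forces 4-byte alignment
{
    public int Id;
    public short Code;
    public byte Flags;
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedRecord   // tightly packed: fields may sit at unaligned offsets
{
    public int Id;
    public short Code;
    public byte Flags;
}
```

In an array of one million elements, that one byte per element is a megabyte of padding, which is why density arguments only settle once you measure both layouts under the real access pattern.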

6.5 NativeMemory for Extreme Alignment Control

For specialized scenarios—game engines, DSP pipelines, high-frequency trading—you may need explicit cache-line alignment. .NET provides NativeMemory:

unsafe
{
    nint size = 1024;
    nint alignment = 64; // cache line size

    nint ptr = NativeMemory.AlignedAlloc(size, alignment);

    try
    {
        Span<byte> span = new Span<byte>((void*)ptr, (int)size);
        // Operate on aligned memory
    }
    finally
    {
        NativeMemory.AlignedFree(ptr);
    }
}

This gives cache-line-aligned memory with no GC involvement, but requires unsafe code and manual lifetime management. This is not for typical web APIs—it’s for workloads where you have already proven that GC or alignment is a bottleneck.


7 Architectural Case Studies

The previous sections covered individual tools: List<T>, Dictionary<TKey,TValue>, frozen collections, span-based parsing, concurrent collections, and memory-oriented design.

Now we put them together. These case studies reflect real architectural trade-offs: read-heavy systems, hot-path parsing, ranking under load, multi-layer caching, and network protocol handling. Each example calls out concurrency, memory, and API choices explicitly.

7.1 In-Memory Multi-Key Index with Immutable Snapshots

Consider a product catalog requiring O(1) lookup by ID, filtering by (TenantId, Category), sorted enumeration by price, high read concurrency, and infrequent bulk updates.

The correct approach is immutable snapshot rebuild + atomic swap:

public sealed record Product(
    int Id, int TenantId, string Category, decimal Price);

public sealed class ProductIndex
{
    private ImmutableDictionary<int, Product> _byId =
        ImmutableDictionary<int, Product>.Empty;

    private ImmutableDictionary<(int TenantId, string Category),
        ImmutableSortedSet<Product>> _byCategory =
        ImmutableDictionary<(int, string),
            ImmutableSortedSet<Product>>.Empty;

    private static readonly IComparer<Product> PriceComparer =
        Comparer<Product>.Create((a, b) =>
            a.Price != b.Price
                ? a.Price.CompareTo(b.Price)
                : a.Id.CompareTo(b.Id));

    public void Rebuild(IEnumerable<Product> products)
    {
        var idBuilder =
            ImmutableDictionary.CreateBuilder<int, Product>();
        var categoryBuilder =
            ImmutableDictionary.CreateBuilder<
                (int, string), ImmutableSortedSet<Product>>();

        foreach (var p in products)
        {
            idBuilder[p.Id] = p;
            var key = (p.TenantId, p.Category);

            if (!categoryBuilder.TryGetValue(key, out var set))
                set = ImmutableSortedSet.Create(PriceComparer);

            categoryBuilder[key] = set.Add(p);
        }

        Volatile.Write(ref _byId, idBuilder.ToImmutable());
        Volatile.Write(ref _byCategory, categoryBuilder.ToImmutable());
    }

    public Product? GetById(int id)
    {
        var snapshot = Volatile.Read(ref _byId);
        return snapshot.TryGetValue(id, out var p) ? p : null;
    }

    public IEnumerable<Product> GetByCategorySorted(
        int tenantId, string category)
    {
        var snapshot = Volatile.Read(ref _byCategory);
        return snapshot.TryGetValue((tenantId, category), out var set)
            ? set : Enumerable.Empty<Product>();
    }
}

This design guarantees:

  • Lock-free reads
  • Atomic publication of new state
  • Clear separation between rebuild and read

Writers build new dictionaries using builders (avoiding repeated structural churn), then publish them. Readers never observe partial updates. This is the same snapshot pattern from Section 4, now applied to a multi-index scenario.
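One subtlety: Rebuild publishes _byId and _byCategory with two separate writes, so a reader could momentarily pair the new _byId with the old _byCategory. Each dictionary is internally consistent, but if cross-index consistency matters, wrap both maps in a single state object and swap it with one write. A minimal sketch (IndexState is a hypothetical name, reusing the Product record above):

```csharp
// Both indexes travel together; a single Volatile.Write then publishes a
// fully consistent snapshot of the pair.
public sealed record IndexState(
    ImmutableDictionary<int, Product> ById,
    ImmutableDictionary<(int TenantId, string Category),
        ImmutableSortedSet<Product>> ByCategory);

// In ProductIndex:
//   private IndexState _state = ...;
//
//   Rebuild:  Volatile.Write(ref _state,
//                 new IndexState(idBuilder.ToImmutable(),
//                                categoryBuilder.ToImmutable()));
//
//   Readers:  var s = Volatile.Read(ref _state);
//             // s.ById and s.ByCategory come from the same rebuild
```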

7.2 Global Leaderboard Ranking

Problem: millions of score updates per minute, occasional reads for Top N, writes must remain cheap.

The key insight is to separate accumulation from ranking:

public sealed class Leaderboard
{
    private readonly ConcurrentDictionary<string, int> _scores = new();

    public void AddScore(string user, int delta)
    {
        // A static lambda cannot capture the local delta, so pass it as
        // the factory argument via the TArg overload of AddOrUpdate.
        _scores.AddOrUpdate(user,
            static (_, d) => d,
            static (_, old, d) => old + d,
            delta);
    }

    public IReadOnlyList<(string User, int Score)> Top(int n)
    {
        // Guard: with n == 0 the Peek below would throw on an empty queue.
        if (n <= 0) return Array.Empty<(string, int)>();

        var snapshot = _scores.ToArray();
        var pq = new PriorityQueue<(string User, int Score), int>();

        foreach (var (user, score) in snapshot)
        {
            if (pq.Count < n)
                pq.Enqueue((user, score), score);
            else if (score > pq.Peek().Score)
            {
                pq.Dequeue();
                pq.Enqueue((user, score), score);
            }
        }

        var result = new List<(string, int)>(pq.Count);
        while (pq.Count > 0) result.Add(pq.Dequeue());
        result.Sort((a, b) => b.Item2.CompareTo(a.Item2));
        return result;
    }
}

Writes are lock-free and cheap. Sorting is isolated to the read path. If ranking is extremely frequent, you may maintain a background task that rebuilds a bounded heap periodically and publishes it as a snapshot.

The architectural lesson: separate accumulation from ranking. Don’t try to keep an ordered structure in sync on every write.
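The background-rebuild variant mentioned above can be sketched as follows. The type and member names are hypothetical; the ranking function is injected as a delegate so the sketch stands alone (in practice, wire it to () => leaderboard.Top(100)):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// A timer periodically recomputes Top-N and publishes the result as an
// immutable snapshot. Readers pay only a volatile read; sorting cost is
// amortized across the refresh interval instead of paid per request.
public sealed class RankedSnapshotPublisher : IDisposable
{
    private readonly Timer _timer;
    private IReadOnlyList<(string User, int Score)> _top =
        Array.Empty<(string, int)>();

    public RankedSnapshotPublisher(
        Func<IReadOnlyList<(string User, int Score)>> computeTop,
        TimeSpan interval)
    {
        // Fire once immediately, then on the given interval.
        _timer = new Timer(_ => Volatile.Write(ref _top, computeTop()),
                           null, TimeSpan.Zero, interval);
    }

    // Lock-free read of the most recently published ranking.
    public IReadOnlyList<(string User, int Score)> Current =>
        Volatile.Read(ref _top);

    public void Dispose() => _timer.Dispose();
}
```

The trade-off is staleness bounded by the interval, which is usually acceptable for leaderboards and dashboards.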

7.3 High-Throughput Caching with HybridCache

.NET 9 introduces Microsoft.Extensions.Caching.Hybrid.HybridCache, combining in-memory and distributed caching:

public sealed class ProductService
{
    private readonly HybridCache _cache;

    public ProductService(HybridCache cache) => _cache = cache;

    public Task<Product> GetProductAsync(string id, CancellationToken ct)
    {
        // GetOrCreateAsync returns ValueTask<Product>; AsTask() bridges to
        // the Task-returning signature. The factory receives its own token.
        return _cache.GetOrCreateAsync(
            $"product:{id}",
            async token => await LoadFromDatabaseAsync(id, token),
            cancellationToken: ct).AsTask();
    }
}

Internally, HybridCache:

  • Uses an in-memory cache for fast reads
  • Delegates to distributed cache when needed
  • Deduplicates concurrent factory executions
  • Avoids unnecessary allocations

When should you build a custom hybrid cache? Only when you need domain-specific eviction rules, custom serialization formats, or precise control over memory limits. Otherwise, prefer the framework implementation—it already handles concurrency, stampede protection, and layering correctly.

Architecturally, the key decision is not “how to cache,” but where to store state (memory vs distributed), how to manage eviction, and how to prevent duplicate expensive work. Collections underpin all of these.
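For completeness, registration looks roughly like this. The sketch assumes the Microsoft.Extensions.Caching.Hybrid package; the option names reflect the .NET 9 API surface, but verify them against your package version:

```csharp
// Registration sketch for an ASP.NET Core app (builder is the usual
// WebApplicationBuilder).
builder.Services.AddHybridCache(options =>
{
    options.MaximumPayloadBytes = 1024 * 1024; // refuse oversized entries
    options.DefaultEntryOptions = new HybridCacheEntryOptions
    {
        Expiration = TimeSpan.FromMinutes(5),          // distributed-tier lifetime
        LocalCacheExpiration = TimeSpan.FromMinutes(1) // in-memory tier lifetime
    };
});

// If a distributed cache is registered, HybridCache uses it as the second
// tier automatically, e.g.:
// builder.Services.AddStackExchangeRedisCache(
//     o => o.Configuration = "localhost:6379");
```

Keeping LocalCacheExpiration shorter than Expiration bounds how stale the in-memory tier can drift from the distributed tier.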

7.4 Packet Processing with ReadOnlySequence<byte> and Pipelines

For network protocols, collections are memory segments and sequences. System.IO.Pipelines integrates with ReadOnlySequence<byte> for zero-copy packet processing:

using System.Buffers;
using System.IO.Pipelines;

public static async Task ProcessAsync(
    PipeReader reader, CancellationToken ct)
{
    while (true)
    {
        var result = await reader.ReadAsync(ct);
        var buffer = result.Buffer;

        while (TryParse(ref buffer, out var packet))
            HandlePacket(packet);

        reader.AdvanceTo(buffer.Start, buffer.End);

        if (result.IsCompleted) break;
    }

    await reader.CompleteAsync();
}

public static bool TryParse(
    ref ReadOnlySequence<byte> buffer,
    out ReadOnlySequence<byte> packet)
{
    if (buffer.Length < 4)
    {
        packet = default;
        return false;
    }

    var reader = new SequenceReader<byte>(buffer);

    if (!reader.TryReadBigEndian(out int length) ||
        reader.Remaining < length)
    {
        packet = default;
        return false;
    }

    packet = buffer.Slice(reader.Position, length);
    buffer = buffer.Slice(reader.Position).Slice(length);
    return true;
}

This avoids copying into intermediate arrays, works with segmented buffers, and scales under high network throughput. It’s still collection design—just at a lower abstraction level.

7.5 End-to-End Hot Path: Span + Alternate Lookup + FrozenDictionary

This ties everything together: an HTTP header router that parses without allocation, routes via frozen configuration, and avoids string creation.

Step 1: Frozen dictionary at startup.

static readonly FrozenDictionary<string, int> HeaderMap =
    new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
    {
        ["content-type"] = 1,
        ["authorization"] = 2,
        ["x-tenant-id"] = 3
    }
    .ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);

Step 2: Span-based parsing.

ReadOnlySpan<char> line = headerLine.AsSpan();
int colonIndex = line.IndexOf(':');
if (colonIndex < 0) return;
ReadOnlySpan<char> name = line[..colonIndex];

Step 3: Alternate lookup without allocation.

var lookup = HeaderMap.GetAlternateLookup<ReadOnlySpan<char>>();
if (lookup.TryGetValue(name, out var headerId))
    HandleHeader(headerId);

This hot path allocates no strings, uses FrozenDictionary for optimized lookup, and uses span slicing for parsing. This is the signature modern pattern: parse with Span<T>, lookup with alternate comparers, store configuration in frozen collections.

It demonstrates the core promise of modern C# collections: predictable performance, minimal allocation, and clean architectural separation.


8 The Ecosystem: Benchmarking, Diagnostics, and Decision Matrix

If you’re choosing collections based on “this feels faster,” you’ll be wrong often enough to ship regressions. The good news is the .NET ecosystem makes it straightforward to measure, confirm, and operationalize collection decisions.

This section gives you practical benchmarking patterns, shows what modern LINQ already provides, and ends with a decision matrix meant to be referenced during design reviews.

8.1 Micro-benchmarks with BenchmarkDotNet

A single TryGetValue benchmark rarely tells the story. The real differences show up when you benchmark hit vs miss rates, string-keyed lookups (where hashing and comparer choice matter), and allocations (especially in span-to-string scenarios).

Use [MemoryDiagnoser] so you see allocation and GC data alongside throughput:

[MemoryDiagnoser]
public class FrozenVsDictionaryBenchmarks
{
    private Dictionary<string, int> _dict = null!;
    private FrozenDictionary<string, int> _frozen = null!;
    private string[] _hitKeys = null!;

    [GlobalSetup]
    public void Setup()
    {
        _dict = new Dictionary<string, int>(
            10_000, StringComparer.Ordinal);
        for (int i = 0; i < 10_000; i++)
            _dict.Add("k:" + i, i);

        _frozen = _dict.ToFrozenDictionary(StringComparer.Ordinal);
        _hitKeys = Enumerable.Range(0, 9_000)
            .Select(i => "k:" + i).ToArray();
    }

    [Benchmark]
    public int Dictionary_HitHeavy()
    {
        int sum = 0;
        foreach (var key in _hitKeys)
            if (_dict.TryGetValue(key, out var v)) sum += v;
        return sum;
    }

    [Benchmark]
    public int Frozen_HitHeavy()
    {
        int sum = 0;
        foreach (var key in _hitKeys)
            if (_frozen.TryGetValue(key, out var v)) sum += v;
        return sum;
    }
}

Illustrative Results

Benchmark results depend on CPU, runtime version, and key distribution. But the output shape is what matters—you get time + allocation data in one place:

Method                | Mean (ns) | Allocated
----------------------|-----------|----------
Dictionary_HitHeavy   | 1,200,000 | 0 B
Frozen_HitHeavy       | 1,020,000 | 0 B
Dictionary_MissHeavy  |   190,000 | 0 B
Frozen_MissHeavy      |   165,000 | 0 B
Dictionary_IterateAll |   420,000 | 0 B
Frozen_IterateAll     |   520,000 | 0 B

Frozen often wins on lookup-heavy workloads, especially with ordinal comparers. Iteration can be slower depending on the internal layout chosen at freeze time. Allocation should stay at 0 B; if it doesn’t, you likely introduced accidental string creation or captured closures.

Micro-benchmarks are necessary but not sufficient. Use them to validate a hypothesis, then confirm with an integration benchmark (real request path, real payload shape).

8.2 Modern LINQ: Built-in First

For .NET 6+, many patterns that previously required third-party libraries are now built-in:

// Batching with Chunk (replaces MoreLINQ Batch)
var batches = Enumerable.Range(1, 100).Chunk(10);

// Distinct by key
var uniqueUsers = users.DistinctBy(u => u.Id);

In .NET 9, LINQ expands with operators that reduce boilerplate in aggregation-heavy code:

  • CountBy(...) for frequency maps
  • AggregateBy(...) for grouped accumulation
  • Index() to carry index values cleanly in ordered sequences

These are especially relevant when building in-memory summaries and wanting to avoid repeated dictionary plumbing. The takeaway: check the current BCL first. It’s usually faster, better supported, and more AOT-friendly than pulling in a dependency.
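The three .NET 9 operators above can be sketched on illustrative data:

```csharp
using System;
using System.Linq;

string[] words = ["apple", "avocado", "banana", "blueberry", "cherry"];

// CountBy: a frequency map without manual dictionary plumbing.
foreach (var (letter, count) in words.CountBy(w => w[0]))
    Console.WriteLine($"{letter}: {count}");            // a: 2, b: 2, c: 1

// AggregateBy: grouped accumulation in a single pass.
foreach (var (letter, totalLength) in
         words.AggregateBy(w => w[0], seed: 0, (acc, w) => acc + w.Length))
    Console.WriteLine($"{letter}: {totalLength}");      // a: 12, b: 15, c: 6

// Index: (index, item) pairs without Select((x, i) => ...).
foreach (var (i, word) in words.Index())
    Console.WriteLine($"{i}: {word}");                  // 0: apple ... 4: cherry
```

Each operator replaces a hand-rolled Dictionary accumulation loop with a single streaming pass.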

8.3 Collection Serialization

If you serialize large graphs of lists and dictionaries frequently (distributed cache, RPC payloads), JSON’s allocation and CPU cost becomes visible. Libraries like MemoryPack or MessagePack-CSharp can reduce payload size, CPU time, and allocations:

using MemoryPack;

[MemoryPackable]
public partial record Order(int Id, decimal Total);

var orders = new List<Order> { new(1, 10.5m), new(2, 20m) };
byte[] bytes = MemoryPackSerializer.Serialize(orders);
var roundTrip = MemoryPackSerializer.Deserialize<List<Order>>(bytes);

Serialization is a collection decision too. If your cache stores Dictionary<string, object>, your serializer is forced into reflection-heavy work and trimming becomes harder. Strongly typed collections make serialization faster and AOT-safer.

8.4 Runtime Diagnostics

Monitor GC metrics with dotnet-counters:

dotnet-counters monitor --process-id <PID> \
  --counters "System.Runtime[gc-heap-size,alloc-rate,gen-0-gc-count,gen-2-gc-count,loh-size]"

What to look for: alloc-rate spiking during bursts (hidden allocations), loh-size growing steadily (large buffers crossing LOH thresholds), frequent Gen2 collections (long-lived retention).

Capture GC events with dotnet-trace to identify allocation hotspots and LOH sources:

dotnet-trace collect --process-id <PID> \
  --providers Microsoft-Windows-DotNETRuntime:0x1C000080000:5 \
  --duration 00:00:30 \
  --output gc.trace.nettrace

Open the trace in Visual Studio or PerfView. You’re looking for:

  • Allocation hotspots (where allocations originate)
  • LOH allocations and who created them
  • GC pause distribution (tail latency impact)

This is where you confirm whether “FrozenDictionary reduced allocations” actually affected the real request path.

8.5 Collection Decision Matrix

Collection                        | Lookup         | Insert/Update  | Thread-Safety        | Best For
----------------------------------|----------------|----------------|----------------------|-----------------------------------
List<T>                           | O(n)           | Amortized O(1) | Reads only           | Ordered sequences, tight iteration
T[]                               | O(n)           | Fixed size     | Reads only           | Hot loops, fixed buffers
Dictionary<TKey,TValue>           | O(1) avg       | O(1) avg       | Reads if no mutation | Mutable maps, request-scoped lookup
OrderedDictionary<TKey,TValue>    | O(1) avg       | O(1) avg       | Same as Dictionary   | Insertion order + keyed lookup
SortedDictionary<TKey,TValue>     | O(log n)       | O(log n)       | Same as Dictionary   | Key-ordered maps, range queries
ImmutableArray<T>                 | O(n)           | Rebuild        | Safe snapshots       | Read-heavy snapshots, config views
ImmutableDictionary<TKey,TValue>  | O(log n)       | O(log n)       | Safe snapshots       | Functional state, safe publication
FrozenDictionary<TKey,TValue>     | O(1) optimized | Rebuild        | Safe snapshots       | Create-once read-mostly maps
ConcurrentDictionary<TKey,TValue> | O(1) avg       | O(1) avg       | Yes                  | Shared mutable maps under contention
ConcurrentQueue<T>                | N/A            | O(1)           | Yes                  | FIFO producer/consumer
ConcurrentBag<T>                  | N/A            | O(1) avg       | Yes                  | Thread-local workloads, work-stealing
Channel<T>                        | N/A            | N/A            | Yes                  | Async pipelines with backpressure
PriorityQueue<TElement,TPriority> | Peek O(1)      | O(log n)       | No                   | Scheduling, top-N selection
ReadOnlySequence<T>               | N/A            | N/A            | Safe by design       | Network parsing, segmented buffers

How to use this:

  • Start with the dominant operation (lookup, iterate, update, enqueue/dequeue).
  • Decide whether data is shared mutable, shared immutable, or request-local.
  • Consider the data lifecycle: built once and read forever? Updated frequently? Rebuilt periodically?
  • Confirm with a benchmark and runtime counters, not intuition.

Modern C# collection design is not about picking List<T> over Dictionary<TKey,TValue>. It’s about aligning data structures with workload characteristics, memory behavior, concurrency requirements, and deployment constraints.
