Container-Optimized .NET: Native AOT, Trimming & GC Tuning on AKS and Azure Container Apps

1 Why container-optimized .NET now? (Context, goals, and trade-offs)

Containerized .NET applications have matured from “it works in Docker” to “it scales efficiently across thousands of pods.” The conversation has shifted from portability to performance, cold-start latency, and memory economics. In distributed environments like AKS (Azure Kubernetes Service) and Azure Container Apps (ACA), the ability to start fast, consume less RAM, and coexist with other workloads defines both reliability and cost efficiency. Modern .NET—from .NET 7 onward—gives us new tools for this: Native AOT, link-time trimming, and GC tuning designed for containers.

This guide focuses on how to apply these tools in production, not just what they do. You’ll see why startup speed matters more than ever, what each compilation model optimizes, and how to make code truly lean without losing diagnostics or maintainability.

1.1 Problem framing: cold starts, noisy neighbors, and memory bills in K8s/ACA

In a static VM or App Service, your .NET service starts once and lives for weeks. In Kubernetes or ACA, it might start hundreds of times per day—scaled by request rate, KEDA rules, or scheduled jobs. Each cold start burns CPU to JIT-compile IL, initializes GC heaps, and loads assemblies. When scaling from zero, this startup time directly adds to user latency.

A few recurring pain points:

  • Cold starts – Even lightweight ASP.NET Core APIs can take 400–800 ms to warm up under JIT. Multiply that by dozens of replicas and you’re burning precious CPU cycles before doing real work. In ACA jobs or event-driven workers, that’s often longer than the job itself.
  • Noisy neighbors – Containers share nodes. If one service over-allocates memory, it can trigger GC pressure or OOMs in another. The .NET GC by default sizes heaps to ~75 % of container limits, so multiple processes can unknowingly overcommit.
  • Memory bills – Memory is the silent cost driver in AKS and ACA. A 512 MiB service scaled to 50 replicas across three environments adds up quickly. Smaller resident sets (RSS) from trimming and AOT compilation reduce both cloud cost and bin-packing friction.

Optimizing .NET for containers isn’t about chasing micro-benchmarks—it’s about controlling startup latency, memory footprint, and CPU efficiency under elastic scaling.

1.2 Startup vs. throughput vs. memory: how serverless-ish scaling changes priorities

Traditional tuning balanced throughput vs. latency for steady-state workloads. In container platforms, we add a third axis: startup cost.

| Concern | What matters | Why it matters in containers |
| --- | --- | --- |
| Startup | Build size, IL→native overhead, JIT warm-up | Impacts scale-out speed, cold-start latency |
| Throughput | GC tuning, thread pool, async efficiency | Sustained request handling under load |
| Memory | Trimmed assemblies, GC heap size, native footprint | Determines pod density and cost efficiency |

ACA, KEDA, and Kubernetes Horizontal Pod Autoscalers all assume new replicas appear quickly. A 1-second startup vs. 300 ms can mean missing your P95 latency SLOs under burst load. Similarly, once a service has scaled to zero, large images delay the next cold start because they take longer to pull from the registry.

Modern .NET gives you compilation strategies (JIT/R2R/AOT) and linker optimizations (trimming) to balance these axes. The key is understanding what each mode optimizes—and its costs.

1.3 What’s new in modern .NET relevant to containers

Five evolutions since .NET 6 have fundamentally changed container efficiency:

  1. Native AOT (Ahead-of-Time compilation) – Shipped in .NET 7 for console apps and extended to ASP.NET Core in .NET 8, it compiles IL directly to native code, removing the JIT entirely. The result is faster startup and a smaller memory footprint, which suits microservices and background jobs.
  2. Trimming and link-time analysis – The IL linker now aggressively removes unused code, even across assemblies, when PublishTrimmed=true. Combined with AOT, this can cut output sizes by 50–80 %.
  3. Container diagnostics – dotnet-monitor, dotnet-counters, and OpenTelemetry integrations are now container-friendly. You can capture runtime metrics without volume mounts or privileged access.
  4. ASP.NET Core minimal APIs and source generators – Reduced reflection, leaner DI, and compile-time endpoint generation make ASP.NET Core more compatible with AOT and trimming.
  5. Smarter GC for containers – The runtime now detects cgroup memory limits (especially under v2) and sizes heaps proportionally, respecting DOTNET_GCHeapHardLimit.

Together, these features shift .NET from “fast after warm-up” to “fast from the first request,” enabling parity with Go and Rust in container environments while keeping the productivity of C#.

1.4 Baseline mental model: JIT vs. ReadyToRun (R2R) vs. Native AOT

Think of these as points on a spectrum balancing flexibility, startup speed, and build complexity.

| Mode | How it runs | Benefits | Trade-offs |
| --- | --- | --- | --- |
| JIT (default) | IL compiled at runtime | Portable, simple builds, full reflection support | Slow startup, more memory overhead |
| ReadyToRun (R2R) | IL pre-compiled to native ahead of time, still uses JIT for some code | Faster startup, retains full .NET feature set | Larger binaries, architecture-specific |
| Native AOT | Fully compiled to native, no JIT present | Smallest, fastest startup, low RSS | Limited reflection, static linking, reduced flexibility |

1.4.1 JIT: dynamic and flexible

JIT (Just-In-Time) compilation turns IL into native machine code as methods execute. It enables dynamic features like reflection, runtime code generation, and cross-platform portability. The downside: startup tax—each process must JIT its own methods—and increased memory use for JIT caches.

1.4.2 ReadyToRun: hybrid

ReadyToRun pre-compiles most IL at publish time using crossgen2. The resulting assemblies embed native code sections, reducing JIT time. However, they remain partly managed—methods that rely on runtime generics or dynamic code may still be JIT-compiled. You gain startup speed but pay with larger images (often 20–50 % bigger).

1.4.3 Native AOT: static and lean

Native AOT eliminates the JIT entirely. You get single-file executables with no dependency on the .NET runtime (other than a few native libraries). The trade-off is stricter feature support: limited dynamic code, reflection via source generation, and constrained libraries. For microservices or background jobs that don’t use dynamic loading, it’s a clear win.

1.5 Scope of this guide

This guide focuses on three workload types running on AKS and Azure Container Apps:

  1. Microservices – REST or gRPC APIs with predictable workloads, where startup and memory dominate cost.
  2. Background jobs – Short-lived or bursty workloads triggered by queues, events, or schedules.
  3. Sidecars – Lightweight agents (e.g., Dapr, telemetry exporters) co-deployed with main apps, where small memory footprints prevent OOMs.

Each section builds from fundamentals (compilation and trimming) toward container-specific tuning and deployment strategies. You’ll see side-by-side builds—JIT, R2R, and Native AOT—deployed to AKS and ACA with performance measurements.


2 Compilation strategies for containers: JIT, R2R, and Native AOT

In containers, compilation strategy is not an academic choice—it directly impacts cold-start latency, image size, and resource usage. Let’s dissect how each works, when to use it, and how to integrate it into a CI/CD pipeline targeting AKS and ACA.

2.1 JIT inside containers: warm-up cost, profile data, and image size implications

A default dotnet publish creates IL assemblies that depend on JIT compilation at runtime. On first execution, methods are compiled on demand.

2.1.1 Warm-up overhead

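When a pod starts, JIT compilation can add hundreds of milliseconds before the first request is handled. Dynamic profile-guided optimization (PGO), available since .NET 6 and enabled by default in .NET 8, lowers the steady-state cost by re-optimizing hot methods while the process runs.

Example build with dynamic PGO enabled explicitly:

dotnet publish -c Release -p:TieredPGO=true

Dynamic PGO profiles are collected in-process and are not persisted across restarts, so every new pod starts profiling from scratch. To carry profile data into an image, collect a static profile (a .mibc file, for example with the dotnet-pgo tool) from a staging workload and feed it to the ReadyToRun compiler at publish time. The exact MSBuild property for passing .mibc files depends on the SDK version, so check it against your SDK before relying on this command:

dotnet publish -c Release -p:ReadyToRunProfilePath=profiledata.mibc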

2.1.2 Image size

JIT builds are smallest—only IL assemblies and the runtime—but every container carries the full .NET runtime image (~200 MB). Smaller base images like mcr.microsoft.com/dotnet/aspnet:8.0-alpine reduce pull time but not JIT cost.

2.1.3 When JIT makes sense

Use JIT when:

  • You need dynamic features (reflection, plugins, dynamic loading)
  • Build times must be minimal
  • Startup latency isn’t critical (long-running services)

For everything else, consider R2R or AOT.

2.2 ReadyToRun (R2R)

ReadyToRun is a middle ground: pre-compile IL into native code at publish time using crossgen2. It speeds up cold starts while keeping the full runtime feature set.

2.2.1 How it works

During publishing:

dotnet publish -c Release -p:PublishReadyToRun=true

The compiler produces PE files that embed both IL and native sections. The runtime uses the native version directly, skipping JIT for most methods.

2.2.2 Show warnings and verify

Crossgen2 can emit diagnostics about unverifiable methods:

dotnet publish -c Release -p:PublishReadyToRun=true -p:PublishReadyToRunShowWarnings=true

Warnings typically indicate dynamic code that couldn’t be pre-compiled—these methods still JIT at runtime.

2.2.3 Performance and image impact

Expect:

  • Startup 20–40 % faster than JIT
  • Image 20–60 % larger, due to native sections
  • Memory usage slightly lower once warmed up (less JIT cache)

Example Dockerfile for R2R:

FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -p:PublishReadyToRun=true -o /app/publish

FROM mcr.microsoft.com/dotnet/aspnet:8.0
WORKDIR /app
COPY --from=build /app/publish .
ENTRYPOINT ["dotnet", "MyService.dll"]

2.2.4 Known caveats

  • Platform-specific binaries—need per-architecture builds (linux-x64, linux-arm64)
  • Larger deployment packages
  • Slightly longer build times
  • Still requires the .NET runtime in the container

R2R fits well for long-running APIs or gRPC services where startup latency matters but full runtime features are needed.
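Because R2R output is architecture-specific, a pipeline usually publishes once per runtime identifier. The following is a minimal sketch of that loop; the RID list and the out/<rid> output layout are assumptions chosen for illustration, and the function only prints the commands so they can be reviewed before running.

```shell
#!/bin/sh
# Print one ReadyToRun publish command per target architecture.
# The RID list and output layout (out/<rid>) are illustrative assumptions.
r2r_publish_commands() {
  for rid in linux-x64 linux-arm64; do
    echo "dotnet publish -c Release -r $rid -p:PublishReadyToRun=true -o out/$rid"
  done
}

# Dry run: prints the two commands without executing them.
r2r_publish_commands
```

Piping the output to sh (r2r_publish_commands | sh) would execute the builds; each out/<rid> folder can then feed a per-architecture image build.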

2.3 Native AOT

Native AOT produces standalone executables with no JIT and no runtime dependencies. It’s the most transformative change for containerized .NET.

2.3.1 Publishing a Native AOT binary

dotnet publish -r linux-x64 -c Release -p:PublishAot=true

Output: a single native ELF binary.

2.3.2 Benefits

  • Startup <100 ms even for ASP.NET Core apps
  • Memory use 30–60 % lower (no JIT or metadata tables)
  • Smaller images when paired with distroless or alpine bases
  • No runtime dependency—just a native binary and libc

2.3.3 Limitations

  • Limited reflection and dynamic code
  • No runtime code generation (e.g., System.Reflection.Emit)
  • Some libraries not yet compatible (dynamic proxies, some serializers)
  • Longer build times and platform-specific output

2.3.4 Compatibility checks

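Run trimming analysis before attempting AOT:

dotnet publish -c Release -p:PublishTrimmed=true -p:TrimMode=full

If the build passes without trimming warnings, it’s likely AOT-safe, though Native AOT adds its own checks (for example IL3050 for code that needs runtime code generation). The compiler flags unsupported APIs when you publish with PublishAot=true.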

2.4 ASP.NET Core with Native AOT

ASP.NET Core 8+ officially supports AOT for minimal APIs and lightweight web services. This subset avoids heavy DI or runtime code generation.

2.4.1 Suitable patterns

  • Minimal APIs
  • gRPC services with static contracts
  • Background workers (Queue or Event processing)
  • CLI tools and sidecars

2.4.2 Example minimal API build

Program.cs:

var app = WebApplication.CreateSlimBuilder(args).Build();
app.MapGet("/healthz", () => "OK");
app.Run();

csproj:

<PropertyGroup>
  <OutputType>Exe</OutputType>
  <TargetFramework>net8.0</TargetFramework>
  <PublishAot>true</PublishAot>
  <InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>

Dockerfile:

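FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
# Native AOT needs a native toolchain that the SDK image does not include
RUN apt-get update && apt-get install -y --no-install-recommends clang zlib1g-dev
WORKDIR /src
COPY . .
RUN dotnet publish -r linux-x64 -c Release -p:PublishAot=true -o /out

FROM mcr.microsoft.com/dotnet/runtime-deps:8.0
WORKDIR /app
COPY --from=build /out .
ENTRYPOINT ["./MyService"]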

2.4.3 Current gaps and workarounds

  • Reflection in DI frameworks – The built-in Microsoft.Extensions.DependencyInjection container works with Native AOT; register services explicitly and avoid containers that scan assemblies or emit code at runtime
  • JSON serialization – Use System.Text.Json source generation
  • OpenTelemetry exporters – Work with trimming mode link and explicit attributes
  • Middleware discovery – Avoid dynamic assembly scans; register explicitly

2.4.4 Trimming and AOT synergy

AOT implicitly trims unused code. Still, specify:

dotnet publish -c Release -r linux-x64 -p:PublishAot=true -p:TrimMode=full

This produces minimal native executables under 20 MB for small APIs.

2.5 Decision tree: choosing JIT + R2R vs. Native AOT

CriterionPrefer JITPrefer R2RPrefer Native AOT
Needs heavy reflection / dynamic loading
Startup latency critical (<300 ms)⚠️
Memory constrained (<512 MiB per pod)⚠️
Build simplicity⚠️
CI/CD build time tolerance⚠️
Portability across OS/arch⚠️❌ (arch-specific)
Diagnostics & observability⚠️ (limited runtime hooks)

Rule of thumb:

  • For APIs and gRPC services, start with R2R and dynamic PGO; consider AOT when warm-up dominates latency.
  • For background jobs and event handlers, default to Native AOT—they benefit most from short startup and low memory.
  • For sidecars or agents, use AOT for minimal footprint and static linking.


3 Trimming for containers

Trimming eliminates unused IL and metadata at build time, reducing image size and memory footprint. It’s essential whether you target JIT, R2R, or AOT, though AOT benefits the most.

3.1 What trimming is and why it matters in containers

Trimming analyzes code and dependencies to remove unreferenced members, similar to link-time optimization in C/C++. In containers, trimming helps by:

  • Reducing image size – Less code copied into images → faster pulls and deployments
  • Lowering RSS – Fewer assemblies and metadata tables loaded
  • Improving cold starts – Less code to load at startup

Typical size reductions:

  • Minimal API: from 90 MB → 35 MB trimmed
  • Native AOT: down to 15–20 MB total binary

3.2 Trimming modes and pitfalls

Set trimming with:

dotnet publish -c Release -p:PublishTrimmed=true

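Key options:

  • TrimMode=partial – Conservative: trims only assemblies that have opted in to trimming
  • TrimMode=full – Aggressive: trims every assembly, including those that have not opted in (older SDKs used the values copyused and link)
  • SuppressTrimAnalysisWarnings=false – Show potential breakages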

3.2.1 Common pitfalls

  • Reflection – If code uses Type.GetType("Foo"), linker can’t detect it. Annotate with [DynamicallyAccessedMembers].
  • Serializers – Libraries like Newtonsoft.Json rely on reflection; prefer System.Text.Json with source generation.
  • DI frameworks – Reflection-based injection may remove needed constructors. Switch to compile-time DI generators.
  • Plug-ins or MEF – Dynamic loading via Assembly.Load can’t be analyzed safely; root the affected assemblies with a TrimmerRootAssembly item so they are kept whole.

Example attribute usage:

[DynamicDependency(DynamicallyAccessedMemberTypes.PublicConstructors, typeof(MyType))]

3.3 Making libraries trim-friendly

3.3.1 Patterns

  • Avoid string-based reflection (Activator.CreateInstance("TypeName"))
  • Replace typeof(T).Assembly.GetTypes() with explicit registrations
  • Use source generators for DI, JSON, and gRPC code
  • Apply [RequiresUnreferencedCode] to methods that use reflection internally

3.3.2 Choosing NuGet packages

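Check whether the library is built with <IsTrimmable>true</IsTrimmable> (and, for Native AOT, <IsAotCompatible>true</IsAotCompatible>) in its project file; the setting is recorded as assembly metadata rather than in the .nuspec. Libraries that declare trimming compatibility and publish without analysis warnings are the safer choice for AOT pipelines.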

Examples:

  • System.Text.Json (with source gen)
  • prometheus-net
  • ⚠️ Newtonsoft.Json
  • ⚠️ Autofac (reflection-heavy)
  • Microsoft.Extensions.DependencyInjection.Generators

3.4 Verification: enforcing trimming correctness in CI

Add this step in your pipeline:

dotnet publish -c Release -p:PublishTrimmed=true -warnaserror:IL2026,IL3050

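Fail builds if these warnings appear. IL2026 is the trim analysis warning for calls into code marked RequiresUnreferencedCode, and IL3050 is the AOT analysis warning for code marked RequiresDynamicCode; both usually point to missing annotations or unsafe reflection.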

You can integrate this into GitHub Actions:

- name: Build trimmed
  run: dotnet publish -c Release -p:PublishTrimmed=true -warnaserror
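When promoting every warning to an error is too blunt, a pipeline can instead save the publish output and gate on IL-prefixed warnings only. The sketch below assumes the log contains MSBuild-style text such as "warning IL2026: ..."; the function names and the log file are hypothetical helpers, not part of the .NET SDK.

```shell
#!/bin/sh
# Count distinct trim/AOT warning codes (ILxxxx) in a saved build log.
# Assumes MSBuild-style lines containing text like "warning IL2026: ...".
count_il_warnings() {
  grep -Eo 'warning IL[0-9]{4}' "$1" | sort -u | wc -l | tr -d ' '
}

# Return non-zero when the log contains any IL warning, so the CI step fails.
fail_on_il_warnings() {
  n="$(count_il_warnings "$1")"
  if [ "$n" -gt 0 ]; then
    echo "found $n distinct IL warning code(s) in $1" >&2
    return 1
  fi
}
```

A CI step might capture the log with dotnet publish ... 2>&1 | tee publish.log and then call fail_on_il_warnings publish.log.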

3.5 Example: trimming a minimal API and a background job

3.5.1 Minimal API

Untrimmed:

dotnet publish -c Release
# Output: 92 MB
# Startup: ~450 ms

Trimmed:

dotnet publish -c Release -p:PublishTrimmed=true -p:TrimMode=full
# Output: 38 MB
# Startup: ~260 ms

3.5.2 Background job (AOT)

Untrimmed AOT:

dotnet publish -r linux-x64 -p:PublishAot=true
# 28 MB binary

Trimmed AOT:

dotnet publish -r linux-x64 -p:PublishAot=true -p:TrimMode=full
# 16 MB binary

Cold start dropped from 300 ms → 90 ms; RSS reduced by ~35 %.

3.6 Open-source libraries that play nicely with trimming/AOT

When building AOT-ready containerized services, library choice matters.

| Library | Compatible | Notes |
| --- | --- | --- |
| System.Text.Json | ✅ | Use source generators for reflection-free serialization |
| YARP | ⚠️ | Works with trimming if middleware registration explicit |
| prometheus-net | ✅ | No reflection usage |
| OpenTelemetry SDK | ✅ | Use 1.7+; exporter trimming safe |
| MassTransit / NServiceBus | ⚠️ | Reflection-heavy; test thoroughly |
| Dapr SDK | ⚠️ | Some reflection; isolate in sidecar for AOT safety |

If your dependency stack includes reflection-heavy libraries, prefer R2R builds and partial trimming to maintain reliability.


4 GC & memory tuning in containers (with cgroup v2 awareness)

The garbage collector (GC) is one of the most important moving parts of .NET performance inside containers. Unlike VMs, containers run under explicit memory constraints—set by resources.limits.memory in Kubernetes or by the environment in Azure Container Apps (ACA). The .NET runtime automatically adapts heap sizing based on those limits, but the defaults can surprise you. Understanding how the GC reacts to container limits and how to override that behavior is essential for keeping services stable and cost-efficient.

4.1 How .NET GC sizes heaps in containers by default (75% of limit or 20 MB min) and why that matters for pod OOMs and bin-packing

When running inside a container, .NET uses cgroup data to estimate available memory. The GC sets its heap budget to roughly 75% of the container’s memory limit or a minimum of 20 MB—whichever is higher. That 75% is a heuristic meant to prevent aggressive collection but still leave headroom for native allocations, thread stacks, and runtime overhead.

In Kubernetes, this means a pod with:

resources:
  limits:
    memory: 512Mi

will let the .NET GC allocate up to about 384 MiB for managed heaps before it starts collecting aggressively.
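The default described above reduces to a one-line calculation. This sketch works in whole MiB with integer arithmetic and treats the 20 MB floor as 20 MiB, so it approximates the rule for quick capacity planning rather than reproducing the runtime's exact byte-level logic.

```shell
#!/bin/sh
# Approximate the default GC heap budget for a container memory limit:
# max(75% of the limit, 20 MiB). Input and output are in MiB; integer math.
default_heap_limit_mib() {
  budget=$(( $1 * 75 / 100 ))
  if [ "$budget" -lt 20 ]; then budget=20; fi
  echo "$budget"
}

default_heap_limit_mib 512   # prints 384
default_heap_limit_mib 16    # prints 20 (the floor applies)
```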

Why it matters

This works well for single-process containers, but if you have multiple processes (e.g., app + sidecar), each assumes it owns 75% of the limit. The combined footprint often exceeds the cgroup limit, leading to OOMKills even though each process individually looks fine. Similarly, bin-packing on AKS relies on pods respecting their limits—so any overuse disrupts scheduling efficiency.

If you’ve ever seen random OOMs despite “plenty” of memory, or inconsistent behavior between local Docker runs and AKS, this default is usually why.

4.2 cgroup v2 on AKS and Azure Linux nodes: what changed in accounting, how it can shift observed usage

Recent AKS and Azure Linux node pools now use cgroup v2 by default. The change isn’t visible in YAML, but it directly affects how memory usage is measured.

Under cgroup v1, .NET often misread available memory because the GC saw only node-level metrics or partial container quotas. cgroup v2 exposes unified, hierarchical memory accounting that lets .NET measure exactly the memory assigned to the container.

Implications

  1. Tighter enforcement – GC now correctly stops before crossing memory limits, meaning apps that previously worked near the edge may now hit OOMs sooner.
  2. Different RSS numbers – Because v2 counts page cache and shared memory differently, metrics from tools like kubectl top pod or dotnet-counters may shift by 5–15%.
  3. Behavioral change after cluster upgrades – Upgrading to AKS 1.26+ or moving to Azure Linux node pools can subtly increase GC frequency, as the runtime perceives less available memory.

Practical check

You can verify cgroup mode from inside the container:

cat /sys/fs/cgroup/cgroup.controllers

If the file exists and lists controllers (for example cpu, io, memory, pids), you’re on cgroup v2; on cgroup v1 the file is absent.
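The same check can be scripted so a startup hook or debug session reports both the cgroup version and the memory limit the runtime will see. This is a sketch under stated assumptions: the function name is hypothetical, the optional path argument exists only so it can be pointed at a test directory, and the v1 branch assumes the usual memory controller layout.

```shell
#!/bin/sh
# Report the cgroup version and memory limit visible at a cgroup mount point.
# The optional argument (default /sys/fs/cgroup) allows testing against a
# scratch directory; the v1 path assumes the common memory controller layout.
cgroup_memory_info() {
  root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/cgroup.controllers" ]; then
    # cgroup v2: memory.max holds a byte count or the literal "max" (no limit).
    echo "v2 $(cat "$root/memory.max" 2>/dev/null || echo unknown)"
  elif [ -f "$root/memory/memory.limit_in_bytes" ]; then
    echo "v1 $(cat "$root/memory/memory.limit_in_bytes")"
  else
    echo "unknown"
  fi
}
```

Called with no arguments inside a container, it prints a line such as "v2 536870912" for a 512 MiB limit under cgroup v2.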

When moving to v2 clusters, retest your GC tuning and review DOTNET_GCHeapHardLimit settings. What used to fit comfortably under v1 may now need explicit limits or reduced concurrency.

4.3 Key knobs: Server GC vs. Workstation, GCLatencyMode, heap limits, and NUMA considerations

4.3.1 GC modes

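ASP.NET Core apps default to Server GC, which is optimized for throughput on multi-core environments (console apps default to Workstation GC, and the runtime falls back to Workstation GC when only one CPU is visible). Server GC creates one heap and one dedicated GC thread per logical core and uses larger heaps. That’s ideal for gRPC backends or high-QPS APIs but can be heavy for small microservices.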

You can switch to Workstation GC by setting:

DOTNET_GCServer=0

or in runtimeconfig.json:

{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": false
    }
  }
}

Server GC scales well above 2 cores. Below that, Workstation GC often gives lower latency and smaller footprints.

4.3.2 GCLatencyMode

GCLatencyMode controls how aggressively GC blocks for collections. For bursty jobs or background tasks, SustainedLowLatency avoids long pauses:

System.Runtime.GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

For microservices that serve short requests, this can reduce p95 spikes. For CPU-bound backends, prefer the throughput-oriented Batch mode, which is what you get when concurrent GC is disabled.

4.3.3 Heap hard limits

You can enforce a hard cap on GC heap size with environment variables:

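DOTNET_GCHeapHardLimit=0x11E1A300    # 300,000,000 bytes, written in hex
DOTNET_GCHeapHardLimitPercent=0x46   # 70% of total container memory, written in hex

Either setting overrides the 75% heuristic and ensures the GC heap never exceeds that value, which is especially useful when multiple containers share a pod. The runtime parses GC environment variables as hexadecimal, so a decimal-looking 70 would be read as 112; the same settings in runtimeconfig.json take plain decimal values.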
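The .NET GC configuration documentation specifies that these environment variables take hexadecimal values, while runtimeconfig.json takes decimals. A small helper (a hypothetical name, shown as a sketch) makes the conversion explicit when generating deployment manifests:

```shell
#!/bin/sh
# Convert a decimal value to the hexadecimal form the runtime reads from
# DOTNET_GC* environment variables (runtimeconfig.json takes plain decimals).
to_gc_env_hex() {
  printf '0x%X\n' "$1"
}

to_gc_env_hex 300000000   # prints 0x11E1A300 (about 286 MiB)
to_gc_env_hex 70          # prints 0x46
```

For example, export DOTNET_GCHeapHardLimitPercent="$(to_gc_env_hex 70)" sets a 70% cap.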

4.3.4 NUMA awareness

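Inside containers, NUMA awareness rarely helps unless the container spans cores from multiple NUMA nodes (uncommon in small nodes). If you see unpredictable GC performance in large-node clusters, you can stop Server GC from pinning its threads to specific cores:

DOTNET_GCNoAffinitize=1

This lets the OS scheduler place GC threads freely, at some cost in cache locality. DOTNET_GCCpuGroup, by contrast, controls Windows CPU groups and does not affect Linux containers.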

4.4 Multiple processes per pod (app + sidecars): coordinating memory targets across processes

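Modern .NET pods often include sidecars: Dapr, Envoy, an OpenTelemetry collector, or metrics agents. Kubernetes applies memory limits per container, and the GC’s 75% default is computed from the limit of the cgroup the .NET process runs in. Problems arise when several processes share one container (each .NET process budgets ~75% of the same limit, so together they can target 150% or more) or when a sidecar is left without a limit and grows against the node’s memory, triggering OOM kills.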

For example:

containers:
- name: api
  image: myservice
  resources:
    limits:
      memory: 512Mi
- name: dapr
  image: daprio/daprd

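Here the api container’s GC budgets ~384 MiB of its own 512 MiB limit, while the dapr container has no limit at all, so the pod’s total footprint is unbounded. The fix is to give each container an explicit limit: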

resources:
  limits:
    memory: 384Mi  # app
---
resources:
  limits:
    memory: 128Mi  # dapr

Or set DOTNET_GCHeapHardLimit manually in the .NET container to coordinate budgets. Always validate with kubectl top pod—if RSS for all containers exceeds the total limit, reduce heap caps.

When sidecars are essential, using Native AOT apps can reclaim enough headroom to stay within the combined limit.

4.5 Practical recipes

4.5.1 Microservice with tight 256–512 MiB limits (latency-biased)

For APIs that prioritize responsiveness and quick scaling:

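DOTNET_GCServer=1
DOTNET_GCHeapHardLimitPercent=0x41   # 65%; GC environment values are hexadecimal
DOTNET_ReadyToRun=1                  # default; uses precompiled R2R code when present
DOTNET_TieredPGO=1                   # default in .NET 8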

Keep GC small enough to avoid background compaction pauses. In code, you can tune latency mode:

GCSettings.LatencyMode = GCLatencyMode.Interactive;

Publish trimmed or AOT builds to reduce JIT and metadata memory. Measure with dotnet-counters under load:

dotnet-counters monitor --process-id <pid> System.Runtime

4.5.2 gRPC service with higher throughput (server GC tuning)

gRPC workloads benefit from large object heap stability and throughput optimization. Configure:

DOTNET_GCServer=1
DOTNET_GCHeapHardLimitPercent=0x4B   # 75%, written in hex
DOTNET_gcConcurrent=0                # disable background GC, which yields Batch latency mode

Ensure requests and limits match CPU expectations:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "768Mi"

Benchmark under load with ghz to confirm sustained throughput:

ghz --insecure --proto ./proto/service.proto --call MyService.Echo -d '{"msg":"hi"}' -n 10000 0.0.0.0:5000

Server GC with two or more cores typically yields smoother p95 latencies.

4.5.3 Short-lived job (SustainedLowLatency, no LOH thrash)

Event-driven jobs or ACA Jobs that start frequently should avoid GC stalls:

DOTNET_GCServer=0
DOTNET_GCHeapHardLimitPercent=0x3C   # 60%, written in hex

In code:

GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

Use Native AOT where possible to eliminate JIT and minimize footprint. To prevent large-object heap churn:

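byte[] buffer = ArrayPool<byte>.Shared.Rent(128 * 1024); // arrays of 85,000 bytes or more would otherwise go to the LOH
ArrayPool<byte>.Shared.Return(buffer);                   // hand the buffer back once processing finishes

Reusing buffers keeps LOH allocation under control.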

4.6 Measuring impact: RSS vs. GC heap vs. native allocations

When tuning GC, watch three distinct metrics:

  • RSS (Resident Set Size) – Total physical memory, includes managed + native.
  • GC heap size – Managed memory only; view via dotnet-counters (the gc-heap-size counter in System.Runtime).
  • Native allocations – From thread stacks, runtime, buffers, JIT code (less in AOT).

Example command:

dotnet-counters collect --process-id <pid> --counters System.Runtime

Native AOT changes the shape of this curve: RSS is dominated by managed heap and native runtime segments, not JIT caches. Typically you’ll see 25–40% less total memory compared to equivalent R2R builds.

The key is correlating heap limits with RSS plateaus—when GC heap stabilizes but RSS keeps growing, you likely have unmanaged allocations (e.g., gRPC buffers or pinned arrays). Tune heap limits and buffer pools accordingly.
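To make that correlation concrete, RSS can be read straight from the kernel and compared with the GC heap size reported by dotnet-counters. The sketch below assumes a Linux /proc status file; the function names are hypothetical, and the subtraction is only a rough estimate of memory held outside the managed heap.

```shell
#!/bin/sh
# Print the resident set size (VmRSS) in KiB from a Linux /proc status file.
# Pass /proc/<pid>/status for a live process; any file in that format works.
rss_kib() {
  awk '/^VmRSS:/ { print $2 }' "$1"
}

# Rough estimate of memory outside the managed heap:
# RSS in KiB minus the GC heap size in bytes converted to KiB.
non_gc_kib() {
  echo $(( $1 - $2 / 1024 ))
}
```

For example, non_gc_kib "$(rss_kib /proc/1/status)" 41943040 estimates non-heap memory for PID 1 when the GC heap is 40 MiB. If that number keeps climbing while the heap stays flat, look at native buffers first.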


5 Real-world implementation on AKS and Azure Container Apps

Now that we’ve tuned build and runtime settings, let’s look at real-world deployment. The goal is to pair the right base image, pod spec, and scaling configuration to make container-optimized .NET deliver consistent performance across AKS and ACA.

5.1 Container base choices: Azure Linux/Mariner vs. Debian/Ubuntu; glibc vs. musl considerations for AOT; scanning and SBOM in CI

5.1.1 Azure Linux (Mariner)

Azure Linux (CBL-Mariner) is Microsoft’s container-optimized base. It uses glibc, supports musl images via compatibility shims, and offers smaller footprints than Debian-based images. Ideal for AKS workloads needing compliance scanning and long-term support.

5.1.2 Alpine (musl)

mcr.microsoft.com/dotnet/runtime-deps:8.0-alpine uses musl libc, which tends to give smaller images and tighter memory usage. A Native AOT binary must be built for the same libc, so publish with -r linux-musl-x64 (or linux-musl-arm64) when targeting Alpine. Debugging tools (like dotnet-trace) can be limited, and some libraries assume glibc. For AOT microservices and ACA jobs, Alpine or Distroless is preferred.

5.1.3 Security and SBOM

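Use dotnet publish --os linux --arch x64 --self-contained to avoid unnecessary dependencies. Then generate SBOMs in CI, for example with Microsoft’s sbom-tool (the paths, package name, version, supplier, and namespace below are placeholders):

sbom-tool generate -b ./publish -bc . -pn MyService -pv 1.0.0 -ps MyCompany -nsb https://example.com/sbom

Integrate with container scanners (e.g., Trivy, Microsoft Defender for Containers) to confirm the image has no known vulnerable packages before pushing to ACR.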

5.2 AKS: Pod specs that matter for .NET

5.2.1 Requests/limits for CPU/memory

Set CPU/memory requests based on queue math rather than guesswork. For example, if your service handles 100 req/s with 25 ms CPU time per request:

(100 * 0.025) = 2.5 CPU cores

With a 20% buffer (2.5 × 1.2 = 3 cores), set:

resources:
  requests:
    cpu: "3"
    memory: "512Mi"
  limits:
    cpu: "3"
    memory: "768Mi"
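The queue math above is easy to script in millicores, where requests per second multiplied by CPU milliseconds per request gives steady-state demand directly. This is a minimal sketch with a hypothetical function name; the headroom percentage is a parameter because the right buffer depends on how bursty the traffic is.

```shell
#!/bin/sh
# Estimate a CPU request in millicores from request rate and CPU time per request.
# req/s * CPU-ms per request = millicores of steady demand; headroom is a percentage.
cpu_request_millicores() {
  rps=$1; cpu_ms=$2; headroom_pct=$3
  echo $(( rps * cpu_ms * (100 + headroom_pct) / 100 ))
}

cpu_request_millicores 100 25 0    # prints 2500 (2.5 cores of raw demand)
cpu_request_millicores 100 25 20   # prints 3000 (a 3-core request)
cpu_request_millicores 100 25 50   # prints 3750
```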

Minimum replicas are derived from expected p95 latency; always allocate one replica per vCPU for latency-critical APIs.

5.2.2 Startup probes vs. liveness/readiness for R2R/AOT apps

AOT apps start fast, often under 100 ms, but their dependencies may not be ready that quickly. To prevent premature restarts while networking and downstream connections come up, use a dedicated startup probe:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080   # .NET 8 images listen on 8080 by default
  periodSeconds: 3
  failureThreshold: 20

Once startup is complete, readiness probes gate traffic routing:

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2

This distinction avoids false restarts when AOT images start before dependencies (e.g., databases) are ready.

5.2.3 Topology examples

  • Single-container pod – simplest, best for APIs or workers.
  • Sidecar (Dapr, Envoy) – add inter-container communication; budget memory explicitly.
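  • Init container warm-up – prime shared state (an external cache or a mounted volume) before the main container runs. Init containers finish before the app container starts, so they cannot call the app itself on localhost:
initContainers:
- name: warmup
  image: curlimages/curl
  # cache-service is a hypothetical in-cluster endpoint that primes the shared cache
  command: ["sh", "-c", "curl -s http://cache-service/prime-cache"]

This pattern is valuable when using R2R apps that benefit from warm data caches before traffic hits.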

5.3 Azure Container Apps (ACA)

5.3.1 Cold-start reducers

ACA automatically scales from zero, so startup time directly affects latency. You can mitigate this with:

  • minReplicas to keep warm instances:
scale:
  minReplicas: 1
  maxReplicas: 10
  • Use regional ACR and enable pre-pull for images.
  • Keep AOT or trimmed images under 100 MB to minimize cold-start pull time.
  • Open the port early in the app to signal readiness:
app.Urls.Add("http://*:8080");
  • Define shorter custom probes for readiness.

5.3.2 Scaling rules (KEDA under the hood)

ACA uses KEDA for event-driven scaling. For queue-based jobs:

scale:
  rules:
  - name: queue-rule
    custom:
      type: azure-queue
      metadata:
        queueName: myqueue
        queueLength: "5"

For HTTP-based scaling, ACA watches concurrent requests per replica. Native AOT binaries help scale faster because they initialize in milliseconds.

Scheduled jobs use the cron trigger—ideal for lightweight AOT background tasks that run briefly and exit cleanly.

5.3.3 ACA vs. AKS decision points

Use ACA when:

  • You need automatic scale-to-zero
  • You want managed KEDA triggers
  • You prioritize simplicity over customization

Use AKS when:

  • You run service meshes or sidecars
  • You require custom networking (private clusters, VNETs)
  • You need advanced GC tuning and multi-container coordination

For most event-driven .NET services, ACA is simpler and cheaper; for tightly coupled microservices or mixed workloads, AKS remains the right tool.

5.4 Concrete walk-throughs (code + YAML snippets)

5.4.1 Minimal API service in three builds

Baseline JIT
dotnet publish -c Release

Image size: ~180 MB Startup: ~600 ms Memory: ~150 MB RSS

ReadyToRun
dotnet publish -c Release -p:PublishReadyToRun=true

Image size: ~220 MB Startup: ~380 ms Memory: ~130 MB RSS

Native AOT
dotnet publish -r linux-x64 -p:PublishAot=true

Image size: ~70 MB Startup: ~80 ms Memory: ~85 MB RSS

Deploy YAML excerpt:

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: api
        image: myapi:aot
        ports:
        - containerPort: 8080

5.4.2 Background job (queue triggered) built as Native AOT

Program.cs:

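using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;
var queue = new QueueClient("<conn>", "jobs");
QueueMessage[] messages = await queue.ReceiveMessagesAsync(maxMessages: 16); // one batch, not an async stream
foreach (var msg in messages)
{
    await ProcessAsync(msg);
    await queue.DeleteMessageAsync(msg.MessageId, msg.PopReceipt); // delete once handled
}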

Publish:

dotnet publish -r linux-x64 -p:PublishAot=true -p:TrimMode=full

For ACA job configuration:

scale:
  triggers:
  - type: azure-queue
    metadata:
      queueName: jobs
template:
  containers:
  - image: myjob:aot
    env:
    - name: DOTNET_GCHeapHardLimitPercent
      value: "0x3C"   # 60%; GC environment values are parsed as hex

In AKS, equivalent as a CronJob:

schedule: "*/10 * * * *"
jobTemplate:
  spec:
    template:
      spec:
        containers:
        - name: job
          image: myjob:aot

5.4.3 Sidecar pattern: API + Dapr sidecar

For APIs using Dapr:

containers:
- name: api
  image: myapi:aot
  resources:
    limits:
      memory: 384Mi
- name: dapr
  image: daprio/daprd:1.13.2   # pin a specific version rather than :latest
  resources:
    limits:
      memory: 128Mi

If Dapr tracing is redundant, disable it in the Dapr Configuration resource rather than in application code by setting spec.tracing.samplingRate to zero:

samplingRate: "0"

Monitor resource usage:

kubectl top pod myapi-pod

Expect combined RSS under 480 MiB with trimmed AOT builds, compared to 700+ MiB with baseline JIT. On the same node pool, that difference can raise pod density by roughly 40–50%.
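
The density effect is simple arithmetic, sketched below. The 480 MiB and 700 MiB figures come from the text; the allocatable memory per node is an assumption chosen for illustration, and real scheduling is driven by memory requests rather than observed RSS.

```csharp
using System;

// How many pods of a given memory footprint fit on one node.
int PodsPerNode(int allocatableMiB, int podMiB) => allocatableMiB / podMiB;

// Assumption: about 12,000 MiB allocatable, roughly a 16 GiB node after
// system and kubelet reservations.
const int allocatableMiB = 12_000;

int aot = PodsPerNode(allocatableMiB, 480);   // AOT app + sidecar
int jit = PodsPerNode(allocatableMiB, 700);   // JIT app + sidecar
double gain = (double)aot / jit - 1;

Console.WriteLine($"AOT: {aot} pods/node, JIT: {jit} pods/node, gain: {gain:P0}");
```

With these inputs the sketch gives 25 versus 17 pods per node, a gain of about 47%.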


6 Sidecars & service mesh: performance and cost

Sidecars and meshes bring powerful cross-cutting features to microservices—security, observability, and resilience—but they come at a measurable cost in startup time, memory, and CPU. For container-optimized .NET workloads, the goal is to decide when the benefits outweigh the overhead and how to integrate these components without undoing the gains from Native AOT or trimming.

6.1 When a sidecar is worth it (observability, retries, mTLS, state) vs. when embedded libraries are leaner

The sidecar pattern offloads networking and platform responsibilities to a separate process. Common use cases include mTLS enforcement, automatic retries, service discovery, and distributed tracing. In practice, this means adding a container such as Dapr, Envoy, or an OpenTelemetry collector alongside your .NET app.

Sidecars are worth it when:

  • Security or compliance requires mTLS between services, and you can’t embed certificate rotation logic into each service.
  • Multi-language polyglot systems need consistent retry/backoff or tracing without duplicating code.
  • Stateful or event-driven integration is required (e.g., Dapr bindings or pub/sub).

However, each sidecar consumes 50–200 MiB of memory and adds inter-container latency (often 1–2 ms per hop). For lightweight APIs or short-lived jobs, embedded libraries are leaner and more predictable.

In-process options that replace sidecars effectively:

  • Resilience – Polly for retries, circuit breakers, and fallback.
  • Observability – OpenTelemetry SDK exporting directly via OTLP.
  • Configuration/Secrets – Azure SDKs instead of external injectors.

For example, replacing a Dapr pub/sub call with direct Azure Service Bus SDK access:

await using var client = new ServiceBusClient(conn);
var sender = client.CreateSender("orders");
await sender.SendMessageAsync(new ServiceBusMessage(JsonSerializer.Serialize(order)));

This single in-proc call avoids network serialization overhead through the sidecar, making it ideal for latency-sensitive endpoints.
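
Once the sidecar is gone, retries have to live in the process. The sketch below shows the idea with a hand-rolled exponential backoff helper. It is a dependency-free stand-in for a Polly retry policy, not Polly's API; the helper name and parameters are assumptions.

```csharp
using System;
using System.Threading.Tasks;

// Retries `action` up to maxAttempts times with exponential backoff.
// A hand-rolled stand-in for a Polly retry policy, kept dependency-free.
async Task<T> RetryAsync<T>(Func<Task<T>> action, int maxAttempts, TimeSpan baseDelay)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await action();
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Delay doubles on each attempt: baseDelay, 2x, 4x, ...
            await Task.Delay(baseDelay * Math.Pow(2, attempt - 1));
        }
    }
}

// Usage: an operation that fails twice, then succeeds.
int calls = 0;
string result = await RetryAsync(async () =>
{
    await Task.Yield();
    if (++calls < 3) throw new InvalidOperationException("transient");
    return "sent";
}, maxAttempts: 5, baseDelay: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"{result} after {calls} calls");   // prints "sent after 3 calls"
```

A production policy would retry only on transient failures and add jitter; catching every exception is a simplification for the sketch.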

6.2 Dapr today: capabilities, performance considerations, and production configuration

Dapr has matured considerably, with focus shifting from developer convenience to production-grade performance and component predictability. Modern versions (v1.13+) support direct HTTP/gRPC integration, actor runtime optimization, and configurable connection pooling. Still, each Dapr sidecar adds measurable cost.

Typical footprint on AKS:

  • Memory: 80–150 MiB RSS per Dapr sidecar
  • CPU: ~50–100 millicores idle, more under load
  • Startup delay: 200–400 ms for component initialization

6.2.1 Configuration to reduce impact

Limit Dapr’s scope to only what you use:

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: pubsub
spec:
  type: pubsub.azure.servicebus
  version: v1
  metadata:
  - name: connectionString
    secretRef:
      name: servicebus-secret
scopes:
- orderservice

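Avoid loading all default components; each one adds initialization cost. For high-QPS APIs, configure Dapr’s HTTP connection pool through pod annotations (confirm the exact annotation names against the annotation reference for the Dapr version you run):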

dapr.io/http-max-conns-per-host: "20"
dapr.io/http-max-idle-conns: "10"

Use dapr run --app-protocol grpc in development to match production wiring. For production builds, always pin to specific versions to avoid upgrade drift:

image: "daprio/daprd:1.13.2"

6.2.2 Performance testing

You can benchmark the Dapr sidecar’s added latency:

bombardier -c 50 -n 5000 http://localhost:3500/v1.0/invoke/orderapi/method/order

Expect roughly 1–1.5 ms per hop overhead compared to direct service invocation. With Native AOT apps, that extra hop may represent 10–15% of total request time, so confirm the trade-off aligns with business goals.
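
To turn two benchmark runs into an overhead figure, compare percentiles rather than averages. The sketch below uses a nearest-rank percentile; the sample arrays are hypothetical placeholders for the latencies reported by your load tool.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile over a set of latency samples (milliseconds).
double Percentile(double[] samples, double p)
{
    var sorted = samples.OrderBy(x => x).ToArray();
    int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
    return sorted[Math.Max(rank, 1) - 1];
}

// Hypothetical samples; in practice these come from the load-test output.
double[] directMs  = { 8.1, 8.4, 8.9, 9.2, 9.8, 10.1, 10.4, 11.0, 12.5, 14.0 };
double[] sidecarMs = { 9.3, 9.6, 10.2, 10.5, 11.0, 11.4, 11.9, 12.3, 13.9, 15.6 };

foreach (var p in new[] { 50.0, 95.0 })
{
    double direct = Percentile(directMs, p);
    double viaSidecar = Percentile(sidecarMs, p);
    Console.WriteLine($"p{p}: direct={direct} ms, sidecar={viaSidecar} ms, overhead={viaSidecar - direct:F1} ms");
}
```

Comparing p95 as well as p50 shows whether the sidecar mainly shifts the median or also widens the tail.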

6.3 Envoy/Istio sidecars vs. gateway-only patterns for low-latency .NET APIs

Full meshes like Istio insert Envoy sidecars into every pod for traffic routing, telemetry, and mTLS. The result is powerful observability—but at the cost of per-pod overhead. Each Envoy typically consumes:

  • 100–150 MiB memory
  • 100–300 millicores CPU baseline
  • Additional 0.5–2 ms per hop latency

For APIs optimized via AOT and trimming, that overhead can double end-to-end latency.

A practical alternative is the gateway-only pattern. In this model, only edge or shared ingress pods run Envoy/Istio, while backend services communicate directly over standard Kubernetes DNS. You still get centralized ingress routing and TLS termination at the edge, but no per-pod sidecar. The trade-off is that traffic between backend services is no longer covered by mesh-managed mTLS.

Example configuration using Istio’s Gateway resource (a matching VirtualService, not shown, binds routes to it):

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: api-example-com-tls   # hypothetical Secret holding the certificate
    hosts:
    - "api.example.com"

This pattern scales better for small pods or ACA-style ephemeral instances. If you use a gateway-only mesh, make sure internal traffic still benefits from retry/backoff logic—handled in-process using Polly or the built-in HttpClientFactory.

6.4 Cost modeling: per-pod overhead multiplied by replicas

Sidecars introduce a fixed per-pod cost. For large clusters, that cost scales linearly with replica count. Suppose you deploy 50 microservices with 5 replicas each (250 pods) and each sidecar uses 100 MiB. That’s 25,000 MiB (roughly 24 GiB) of RAM consumed just for sidecars, not business logic.

If memory costs an illustrative $0.00003 per MiB-hour, that’s $18/day in idle cost, or about $540/month of pure overhead. Trimming or AOT builds that reduce app RSS by 30–40% can reclaim that headroom, letting you run both app and sidecar under the same budget.

You can visualize this trade-off with a quick model in C#:

double sidecarMemMiB = 100;
int replicas = 250;
double costPerMiBHour = 0.00003;   // illustrative rate, not a quoted Azure price
double monthlyCostPerMiB = costPerMiBHour * 24 * 30;
var total = sidecarMemMiB * replicas * monthlyCostPerMiB;
Console.WriteLine($"Sidecar monthly cost: ${total:F2}");

If your workloads depend on Dapr or Envoy, factor this baseline into pod sizing and choose node pools with higher pod density to amortize per-node idle cost.


7 Measuring what matters: counters, traces, and on-call dashboards

Optimization is only real if you can measure it. In containerized .NET, the right metrics are those that connect runtime behavior with container limits and user experience. You want numbers that help you answer, “Is this pod healthy under load?” and “Why did latency spike?” rather than hundreds of unrelated charts.

7.1 The short list of production counters to watch for .NET services in containers

7.1.1 GC counters

Key metrics:

  • % Time in GC
  • Gen0/1/2 Collection Count
  • GC Heap Size (Bytes)
  • LOH Size (Bytes)

These tell you if the heap is balanced. If % Time in GC exceeds 10–15% during normal load, you’re over-allocating or under-tuning heap size.
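
The same figures can be read from inside the process, which is handy for a startup log line or a health endpoint. This is a minimal sketch using BCL calls only; the helper name and the 10% default threshold are assumptions taken from the rule of thumb above.

```csharp
using System;

// Flags GC overhead above the rule-of-thumb band (threshold in percent).
bool GcTimeTooHigh(double pauseTimePercent, double thresholdPercent = 10.0) =>
    pauseTimePercent > thresholdPercent;

// Figures reflect the most recent collection; early in the process
// lifetime they may still be zero.
var info = GC.GetGCMemoryInfo();

Console.WriteLine($"gen0={GC.CollectionCount(0)} gen1={GC.CollectionCount(1)} gen2={GC.CollectionCount(2)}");
// Index 3 of GenerationInfo is the large object heap (LOH).
Console.WriteLine($"heap_bytes={info.HeapSizeBytes} loh_bytes={info.GenerationInfo[3].SizeAfterBytes}");
Console.WriteLine($"pause_time_percent={info.PauseTimePercentage:F2} too_high={GcTimeTooHigh(info.PauseTimePercentage)}");
```

PauseTimePercentage is cumulative over the process lifetime, so a dashboard should still derive a windowed rate from exported counters.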

7.1.2 Thread pool metrics

  • ThreadPool Completed Work Items/sec
  • ThreadPool Queue Length

Sudden queue buildup usually precedes latency spikes. Track how these behave under burst traffic to decide when to increase replicas.
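
A quick way to see these values without external tooling is to read them from the ThreadPool class. The sketch below prints a snapshot and includes a small helper for the "sustained backlog" condition; the helper and its sampling window are assumptions, not a runtime feature.

```csharp
using System;
using System.Threading;

// True when every sample in the window shows a non-empty queue, which is a
// stronger signal than a single spike.
bool SustainedBacklog(long[] queueSamples) =>
    queueSamples.Length > 0 && Array.TrueForAll(queueSamples, q => q > 0);

// Point-in-time snapshot of the thread pool.
Console.WriteLine(
    $"threads={ThreadPool.ThreadCount} " +
    $"queued={ThreadPool.PendingWorkItemCount} " +
    $"completed={ThreadPool.CompletedWorkItemCount}");

// Hypothetical samples taken once per second.
Console.WriteLine(SustainedBacklog(new long[] { 3, 1, 4 }));   // prints "True"
Console.WriteLine(SustainedBacklog(new long[] { 3, 0, 4 }));   // prints "False"
```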

7.1.3 Exception metrics

  • Exception Count
  • First-Chance Exceptions/sec

Frequent first-chance exceptions are costly—even if caught. They often show up as minor latency drift before logs flag errors.

7.1.4 HTTP/gRPC counters

For ASP.NET Core:

  • requests-per-second
  • current-requests
  • request-duration
  • active-connections

Expose these via Prometheus or OpenTelemetry. P95 and P99 latencies under load should guide your autoscaling thresholds.

7.1.5 Memory counters

  • working-set
  • private-bytes
  • gc-heap-size
  • container memory usage (via cgroup)
  • OOM kill count

These tie directly to AKS and ACA cost and stability. A rising working set with flat heap usually means native allocations leaking (e.g., pinned buffers).

7.2 Tooling in containers: dotnet-counters, dotnet-trace, dotnet-gcdump, dotnet-dump

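The standard .NET diagnostics tools run inside containers, provided the tool runs as the same user as the app and can reach the runtime’s diagnostic socket under /tmp. Native AOT builds support only a subset of these tools, so check which ones apply to each build mode.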

  • dotnet-counters – real-time performance counter stream

    dotnet-counters monitor --refresh-interval 1 --process-id 1 System.Runtime
  • dotnet-trace – lightweight event tracing for performance analysis

    dotnet-trace collect --process-id 1 --duration 00:00:00:30
  • dotnet-gcdump – GC heap snapshots

    dotnet-gcdump collect --process-id 1
  • dotnet-dump – full process dumps for postmortem analysis

    dotnet-dump collect --process-id 1

All these tools can be side-loaded into running pods using kubectl exec. For production safety, restrict tracing to short durations and redirect output to Azure Blob or persistent volumes.

7.3 OpenTelemetry for .NET: tracing and metrics pipelines

OpenTelemetry has become the default observability stack for containerized .NET. For minimal overhead, use OTLP exporters over gRPC.

Example setup

builder.Services.AddOpenTelemetry()
    .WithMetrics(m => m.AddAspNetCoreInstrumentation()
                       .AddRuntimeInstrumentation()
                       .AddOtlpExporter())
    .WithTracing(t => t.AddAspNetCoreInstrumentation()
                       .AddHttpClientInstrumentation()
                       .AddOtlpExporter());

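For Azure Monitor, use the Azure Monitor OpenTelemetry distro (the Azure.Monitor.OpenTelemetry.AspNetCore package), which reads its connection string from APPLICATIONINSIGHTS_CONNECTION_STRING. Verify its trimming and AOT support before using it in a Native AOT build:

builder.Services.AddOpenTelemetry().UseAzureMonitor();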

In AOT builds, ensure you reference explicit providers instead of using reflection-based discovery. The OpenTelemetry 1.7+ SDK is fully trimming- and AOT-compatible.
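
Custom metrics follow the same explicit-registration idea: you create a named Meter and register that name with the metrics pipeline (in OpenTelemetry, through AddMeter). The sketch below uses only System.Diagnostics.Metrics and a MeterListener in place of an exporter so that it runs on its own; the meter and counter names are hypothetical.

```csharp
using System;
using System.Diagnostics.Metrics;

// "MyApi.Orders" and "orders.processed" are hypothetical names.
using var meter = new Meter("MyApi.Orders");
var processed = meter.CreateCounter<long>("orders.processed");

// MeterListener stands in for an exporter so the sketch runs on its own.
long total = 0;
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Subscribe by meter name: explicit registration, no reflection.
    if (instrument.Meter.Name == "MyApi.Orders") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => total += value);
listener.Start();

processed.Add(1);
processed.Add(2);
Console.WriteLine($"orders.processed total = {total}");   // prints "orders.processed total = 3"
```

Because subscription is by name, nothing here depends on assembly scanning, which is what keeps the pattern friendly to trimming.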

7.4 Example dashboards (Prometheus/Grafana)

Effective dashboards answer operational questions, not just show data. For containerized .NET, the core panels should include:

| Panel | Metric | What to look for |
|---|---|---|
| GC Heap | dotnet_gc_heap_size_bytes | Growth over time → memory pressure |
| GC Time % | dotnet_gc_time_ratio | >0.1 indicates GC contention |
| ThreadPool Queue | dotnet_threadpool_queue_length | Sustained >0 means CPU saturation |
| HTTP P95 latency | http_request_duration_seconds | SLO violations |
| Container Memory | container_memory_working_set_bytes | Rising → leaks or heap limits |
| CPU usage | container_cpu_usage_seconds_total | Correlate with thread pool saturation |

Example PromQL for GC time:

avg by (pod) (avg_over_time(dotnet_gc_time_ratio[1m]))

When alerts are too sensitive, alert on trends rather than single spikes—e.g., GC heap growth over 10 minutes.

7.5 Load testing harnesses (Bombardier, k6) and experiment templates

To evaluate JIT vs. R2R vs. AOT builds, run controlled load tests that measure cold-start and steady-state throughput. Tools:

  • Bombardier for quick HTTP benchmarks

    bombardier -c 100 -n 10000 http://api/load
  • k6 for scripted scenarios with thresholds

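    import http from 'k6/http';
    import { check } from 'k6';
    // Fail the run if p95 latency exceeds 300 ms (example threshold).
    export const options = {
        thresholds: { http_req_duration: ['p(95)<300'] },
    };
    export default function () {
        const res = http.get('http://api/load');
        check(res, { 'status was 200': (r) => r.status === 200 });
    }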
  • ghz for gRPC load testing

Use these in CI/CD to confirm startup and p95 latency improvements translate to production behavior.
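
Steady-state tools do not capture cold start, so it helps to time how long a fresh instance takes to answer its first successful request. The sketch below polls a readiness probe and reports the elapsed time. The helper name is an assumption, and the probe is a delegate so the example runs without a live endpoint; a real probe would issue an HTTP GET against the health route.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Polls `probe` until it reports ready or the timeout expires.
// Returns the elapsed time, or null on timeout.
async Task<TimeSpan?> TimeToReadyAsync(
    Func<Task<bool>> probe, TimeSpan timeout, TimeSpan pollInterval)
{
    var sw = Stopwatch.StartNew();
    while (sw.Elapsed < timeout)
    {
        try
        {
            if (await probe()) return sw.Elapsed;
        }
        catch (Exception)
        {
            // Typically "connection refused" while the container starts: keep polling.
        }
        await Task.Delay(pollInterval);
    }
    return null;
}

// Demo probe that becomes ready on the third call. A real probe could be:
//   async () => (await httpClient.GetAsync(healthUrl)).IsSuccessStatusCode
int probeCalls = 0;
var elapsed = await TimeToReadyAsync(
    () => Task.FromResult(++probeCalls >= 3),
    timeout: TimeSpan.FromSeconds(5),
    pollInterval: TimeSpan.FromMilliseconds(20));

Console.WriteLine(elapsed is TimeSpan t ? $"ready after {t.TotalMilliseconds:F0} ms" : "timed out");
```

Keep the poll interval well below the startup time you expect to measure, otherwise the interval dominates the result for AOT builds.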


8 Putting it together: reference architectures, rollout, and a decision playbook

The last step is combining everything—build modes, trimming, GC tuning, observability—into coherent architectures that you can deploy and evolve safely.

8.1 Reference architecture A: low-latency API on ACA

Target: sub-200 ms cold start, 300 ms P95 under load.

  • Build: Native AOT, trimmed, InvariantGlobalization=true
  • Runtime: Server GC, 65% heap limit
  • Deployment: ACA with minReplicas: 1

Program.cs:

var app = WebApplication.CreateSlimBuilder(args).Build();
app.MapGet("/health", () => "OK");
app.Run();

containerapp.yaml:

scale:
  minReplicas: 1
  maxReplicas: 10
template:
  containers:
  - image: myapi:aot
    env:
    # Parsed as hexadecimal: 0x41 is 65 percent.
    - name: DOTNET_GCHeapHardLimitPercent
      value: "0x41"

8.2 Reference architecture B: high-throughput gRPC on AKS

Target: sustained 10k RPS, multi-core scaling.

  • Build: R2R with dynamic PGO
  • Runtime: Server GC, heap 75%
  • Mesh: Gateway-only Envoy ingress

publish command:

dotnet publish -c Release -p:PublishReadyToRun=true -p:TieredPGO=true

The Kubernetes deployment sets CPU and memory requests equal to limits, which places the pod in the Guaranteed QoS class:

resources:
  requests:
    cpu: "2"
    memory: "1Gi"
  limits:
    cpu: "2"
    memory: "1Gi"

Gateway handles TLS termination, avoiding per-pod sidecars.

8.3 Reference architecture C: event-driven job workers as ACA Jobs

  • Build: Native AOT
  • Runtime: Workstation GC, SustainedLowLatency
  • Trigger: Azure Queue or Cron

job.yaml:

scale:
  triggers:
  - type: cron
    metadata:
      schedule: "*/15 * * * *"
template:
  containers:
  - image: jobworker:aot
    env:
    # Parsed as hexadecimal: 0x3C is 60 percent.
    - name: DOTNET_GCHeapHardLimitPercent
      value: "0x3C"

Each job instance starts in under 100 ms and exits quickly after completion, minimizing compute cost.

8.4 Rollout steps: canaries and blue/green

  1. ACA – use revisions: deploy new image as a new revision, direct 10% traffic for 30 minutes, then promote.
  2. AKS – apply blue/green deployments with a temporary service routing.
  3. Observability bake-off – collect GC time %, startup latency, and memory for both versions.
  4. Fallback – pin old revision or rollback deployment if metrics regress.

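Example ACA rollout command (enables multiple-revision mode so traffic can be split between revisions; the resource group is a placeholder):

az containerapp revision set-mode --name myapi --resource-group <resource-group> --mode multiple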
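
The bake-off in step 3 needs an explicit promotion rule, otherwise the decision drifts toward "looks fine". The sketch below encodes one possible rule: promote only if the canary stays within a tolerance of the baseline on every metric. The function names, the 10% tolerance, and the sample readings are all assumptions.

```csharp
using System;

// True when the canary is no more than `tolerance` worse than the baseline
// (lower is better for all three metrics used here).
bool WithinBudget(double baseline, double canary, double tolerance) =>
    canary <= baseline * (1 + tolerance);

bool ShouldPromote(
    (double GcTimePct, double StartupMs, double RssMiB) baseline,
    (double GcTimePct, double StartupMs, double RssMiB) canary,
    double tolerance = 0.10) =>
    WithinBudget(baseline.GcTimePct, canary.GcTimePct, tolerance) &&
    WithinBudget(baseline.StartupMs, canary.StartupMs, tolerance) &&
    WithinBudget(baseline.RssMiB, canary.RssMiB, tolerance);

// Hypothetical readings collected during the canary window.
var oldRev = (GcTimePct: 4.0, StartupMs: 600.0, RssMiB: 150.0);
var newRev = (GcTimePct: 3.1, StartupMs: 85.0, RssMiB: 92.0);

Console.WriteLine(ShouldPromote(oldRev, newRev) ? "promote" : "roll back");   // prints "promote"
```

Requiring every metric to pass is deliberately conservative; a regression in memory should block a rollout even when startup improved.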

8.5 Cost & performance worksheet

Estimate savings by comparing build modes:

| Metric | JIT | R2R | AOT |
|---|---|---|---|
| Image size | 180 MB | 220 MB | 70 MB |
| Startup | 600 ms | 380 ms | 80 ms |
| RSS | 150 MB | 130 MB | 85 MB |
| Cold start cost (ACA) | High | Medium | Low |

A 100-service cluster converting 50% of workloads to AOT can free dozens of cores and tens of GiB of memory. Always validate with real metrics under load.
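
The memory side of that estimate can be worked through directly from the RSS row of the worksheet. In the sketch below, the per-pod figures come from the table, while the replica count per service is an assumption.

```csharp
using System;

// Memory reclaimed when `pods` pods move from JIT to AOT, given per-pod RSS in MB.
double SavedGiB(int pods, double jitRssMb, double aotRssMb) =>
    pods * (jitRssMb - aotRssMb) / 1024.0;

int services = 100;
int replicasPerService = 5;                           // assumption, not from the worksheet
int convertedPods = services / 2 * replicasPerService; // 50% of services converted: 250 pods

double saved = SavedGiB(convertedPods, jitRssMb: 150, aotRssMb: 85);
Console.WriteLine($"{convertedPods} pods converted, about {saved:F1} GiB reclaimed");
```

With these inputs, 250 pods at 65 MB each come to roughly 15.9 GiB; higher replica counts scale the figure linearly.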

8.6 The decision checklist

8.6.1 Feature compatibility

Do you depend on reflection, dynamic proxies, or runtime codegen?

  • Yes → R2R
  • No → Native AOT

8.6.2 Latency sensitivity

If cold starts or P95 latency drive user experience, start with AOT + trimming, and use minReplicas for safety.

8.6.3 Sidecar requirements

If you require mTLS or centralized policies, use gateway-only or selective sidecars. Budget memory before GC tuning.

8.6.4 cgroup version awareness

On cgroup v2 clusters, revalidate heap limits. Set DOTNET_GCHeapHardLimitPercent explicitly to avoid unexpected OOMs, and remember that the environment variable is parsed as hexadecimal (60% is written 0x3C).
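
A short startup check makes it easier to confirm what the runtime actually sees inside the container. The sketch below converts a percentage to the hexadecimal form the environment variable expects and prints the effective GC mode, CPU count, and memory available to the GC. The helper name is an assumption.

```csharp
using System;
using System.Runtime;

// GC settings passed through DOTNET_* environment variables are parsed as
// hexadecimal, so a decimal percentage has to be converted first.
string ToGcEnvValue(int percent) => "0x" + percent.ToString("X");

Console.WriteLine($"DOTNET_GCHeapHardLimitPercent={ToGcEnvValue(60)}");   // prints "...=0x3C"

// Values reflect the most recent collection and the limits the runtime detected.
var info = GC.GetGCMemoryInfo();
Console.WriteLine($"server_gc={GCSettings.IsServerGC} cpus={Environment.ProcessorCount}");
Console.WriteLine($"gc_total_available_mib={info.TotalAvailableMemoryBytes / (1024 * 1024)}");
```

Logging these lines once at startup in staging shows quickly whether a node image or cgroup change altered the limits the GC works with.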

8.6.5 Diagnostics readiness

Ensure metrics and tracing pipelines are configured before rollout. Test with dotnet-counters in staging.
