1 Executive summary and reading guide
1.1 What this article covers
This article explores the gap between classical design principles and the realities of cloud-native production — where pods scale up, APIs retry across regions, and observability systems chase p95 latency spikes.
We examine this through two lenses: SOLID (the classical OO design compass) and CUPID (Dan North’s properties for “joyful” code, read here through a service lens). Our focus: how these principles apply when building resilient microservices with .NET 8/9, ASP.NET Core Minimal APIs, and Microsoft.Extensions.DependencyInjection.
We walk from principle to practice — grounding each concept with real-world examples, refactoring patterns, and code smells.
1.2 Who should read this
This guide targets practitioners comfortable with ASP.NET Core, async programming, and distributed systems basics — Senior Developers refactoring toward microservices, Tech Leads defining cross-service patterns, or Architects translating principles into service boundaries that survive real-world load.
1.3 Tech baseline
All examples assume .NET 8+ with:
- Minimal APIs for lean endpoints and composable pipelines.
- Microsoft.Extensions.DependencyInjection (built-in DI container).
- IHttpClientFactory with Polly v8 resilience pipelines.
- System.Threading.Channels and TPL Dataflow for backpressure.
- OpenTelemetry for distributed tracing and metrics.
Supporting libraries: Scrutor (scanning/decorators), FluentValidation, Asp.Versioning, and Serilog.
1.4 How to use this piece
Read linearly once, then use selectively:
- Section 2 gives the mental model shift from SOLID to CUPID.
- Sections 3–7 apply these to production topics: DI, backpressure, idempotency, observability, and versioning.
- Section 8 is an end-to-end “Payments” walkthrough that ties it together.
- Section 9 provides a practical scorecard for code review and go/no-go readiness.
Each section includes “Smell → Refactor” patterns — quick wins that take a service from “works in dev” to “survives in prod.”
2 From SOLID to CUPID: principles through today’s lens
2.1 SOLID refresher — where it still shines
SOLID remains foundational for maintainable software. Here’s a concise recap oriented toward cloud-native concerns.
2.1.1 Single Responsibility Principle (SRP)
Each class should have one reason to change. In services, SRP extends to feature boundaries.
Incorrect — responsibilities crammed:
```csharp
public class OrderService {
    public void CreateOrder(OrderDto dto) { /* ... */ }
    public void SendOrderEmail(Order order) { /* ... */ }
    public void LogOrderMetrics(Order order) { /* ... */ }
}
```
Correct — responsibilities composed:
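```csharp
// Collaborators arrive through the constructor instead of being implemented here.
public class OrderService(IEmailSender emails, IMetrics metrics)
{
    public async Task<Order> CreateOrder(OrderDto dto)
    {
        var order = await SaveOrder(dto);      // persistence (SaveOrder elided)
        await emails.SendOrderCreated(order);   // notification delegated
        metrics.TrackOrderCreated(order.Id);    // telemetry delegated
        return order;
    }
}
```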
2.1.2 Open/Closed Principle (OCP)
Open for extension, closed for modification — achieved in .NET through interfaces, DI, and decorators:
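```csharp
services.Decorate<IOrderProcessor, LoggingOrderProcessor>(); // Scrutor
```
OCP shines in reusable libraries, but excessive abstraction works against CUPID’s Composable property: when every seam is an interface, the pieces that actually combine become hard to see.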
2.1.3 Liskov Substitution (LSP) and Interface Segregation (ISP)
LSP ensures subtypes are substitutable without breaking behavior — essential in polymorphic messages and EF Core entities.
ISP keeps interfaces granular — clients shouldn’t depend on methods they don’t use.
Incorrect:
```csharp
public interface ICustomerService {
    Task<Customer> GetCustomerAsync(Guid id);
    Task<List<Order>> GetOrdersAsync(Guid customerId);
    Task<bool> SendMarketingEmailAsync(Guid customerId);
}
```
Correct:
```csharp
public interface ICustomerQueries {
    Task<Customer> GetAsync(Guid id);
}

public interface ICustomerNotifications {
    Task SendMarketingEmailAsync(Guid customerId);
}
```
Simpler testing, clearer domain language.
2.1.4 Dependency Inversion Principle (DIP)
High-level modules depend on abstractions, not details. Realized via constructor injection — but DIP overuse produces “abstraction forests” where no one can trace the real logic.
Takeaway: SOLID shines in library design — domain models, internal utilities, shared packages. At the service boundary, we need principles that value runtime characteristics — latency, throughput, observability, deployability — as much as code structure.
2.2 CUPID — why it’s pragmatic for services
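Dan North proposed CUPID as a set of properties rather than rules, describing code that is “joyful” to work with. Because those properties describe how code behaves rather than how it is structured, they translate naturally to services, where operational resilience matters as much as static purity.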
2.2.1 Composable
Systems built from small, reusable blocks that combine cleanly:
```csharp
app.MapPost("/orders", CreateOrder)
    .AddEndpointFilter<ValidationFilter>()
    .AddEndpointFilter<AuthorizationFilter>();
```
Each behavior can be added, removed, or reordered — the system composes.
2.2.2 Unix-like
“Do one thing well.” A service should have a narrow purpose and expose clear contracts (HTTP, gRPC, JSON). It discourages over-coupled services sharing databases or performing multiple unrelated concerns. Easy to reason about, test in isolation, and replace.
2.2.3 Predictable
A production property — how software behaves under stress. A predictable system:
- Handles retries idempotently.
- Propagates cancellation.
- Fails fast when dependencies stall.
- Emits metrics and traces that explain its state.
In .NET: CancellationToken everywhere, timeouts via Polly, structured logs tied to traces.
2.2.4 Idiomatic
Use the language’s natural style. For .NET: async all the way, proper DI, Minimal APIs for simple endpoints.
Incorrect:
```csharp
public async Task<ActionResult> GetUser() {
    return new JsonResult(await _repo.GetUserAsync());
}
```
Correct:
```csharp
app.MapGet("/user/{id:guid}", async (Guid id, IUserRepo repo, CancellationToken ct) =>
    await repo.GetUserAsync(id, ct) is { } user
        ? Results.Ok(user)
        : Results.NotFound());
```
Shorter, clearer, and consistent with the platform.
2.2.5 Domain-based
Design grounded in domain language, not technical layers. Instead of “InfrastructureService” or “HelperManager,” use PaymentAuthorizer, InventoryChecker, OrderPolicyEvaluator. This improves communication and makes boundaries explicit.
Summary:
| CUPID Principle | Core Idea | .NET Application |
|---|---|---|
| Composable | Small, reusable pieces | DI, middleware, minimal APIs |
| Unix-like | Do one thing well | Single-purpose services |
| Predictable | Fail fast, be observable | Polly, timeouts, OTel |
| Idiomatic | Fit native patterns | async/await, DI, Options |
| Domain-based | Speak business language | DDD-style naming, bounded contexts |
2.3 Complementary, not adversarial: a blended mental model
SOLID and CUPID operate at different levels:
- SOLID: structure and change within code units.
- CUPID: behavior and resilience of running services.
A SOLID library can live inside a CUPID service. Your domain layer uses SRP and DIP to stay modular; your API surface uses CUPID for runtime robustness.
Think of SOLID as “how code evolves safely” and CUPID as “how systems behave predictably.” Together, they define design integrity across scales.
2.4 Mapping principles to modern concerns
| Concern | SOLID View | CUPID View | .NET Example |
|---|---|---|---|
| Idempotency | Not addressed | Predictable | POST /payments with Idempotency-Key middleware |
| Backpressure | Not addressed | Composable + Predictable | Channels with bounded capacity |
| Observability | Not addressed | Predictable | OpenTelemetry spans, structured logs |
| Versioning | OCP (extend, don’t modify) | Domain-based | Asp.Versioning with additive endpoints |
| Resilience | DIP via abstractions | Predictable | Polly v8 pipelines (timeouts → retries → circuit breakers) |
2.4.1 Predictability via Idempotency
A payment service retrying on transient 500s needs more than clean abstractions — it must prevent duplicate charges:
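```csharp
// Check-then-act: two concurrent retries with the same key can both miss the
// store, so the store must also claim the key atomically (e.g. a unique index).
if (await _store.ExistsAsync(key))
    return Results.Ok(await _store.GetAsync(key));
var result = await _processor.ProcessAsync(request);
await _store.SaveAsync(key, result);
return Results.Ok(result);
```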
Predictability manifests in production safety, not just code quality.
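The check above is check-then-act: two retries that arrive together can both miss the store and both charge. A minimal single-process sketch of the fix, assuming one instance; across instances the same “claim the key atomically” step has to happen in the shared store, for example an insert guarded by a unique index. `IdempotentExecutor` and `IdempotencyDemo` are hypothetical names:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-process guard: the first caller for a key runs the operation;
// concurrent and later callers with the same key share its result.
public sealed class IdempotentExecutor<TResult>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<TResult>>> _results = new();

    public Task<TResult> ExecuteAsync(string key, Func<Task<TResult>> operation)
    {
        // GetOrAdd may build more than one Lazy under contention, but only the stored
        // instance is returned, and Lazy runs its factory (the side effect) at most once.
        var lazy = _results.GetOrAdd(key,
            _ => new Lazy<Task<TResult>>(operation, LazyThreadSafetyMode.ExecutionAndPublication));
        return lazy.Value;
    }
}

public static class IdempotencyDemo
{
    public static async Task<(int Charges, string First, string Second)> RunAsync()
    {
        var executor = new IdempotentExecutor<string>();
        var charges = 0;

        async Task<string> Charge()
        {
            Interlocked.Increment(ref charges); // the side effect that must not repeat
            await Task.Delay(50);                // simulated slow payment gateway
            return "auth-123";
        }

        // Two "retries" with the same key arrive concurrently.
        var first = executor.ExecuteAsync("key-1", Charge);
        var second = executor.ExecuteAsync("key-1", Charge);
        var results = await Task.WhenAll(first, second);
        return (charges, results[0], results[1]);
    }
}
```

A production version would also evict faulted entries, so that a failed attempt stays retryable, and expire old keys.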
2.4.2 Composable Backpressure
Instead of ad hoc thread pools, CUPID favors composable primitives:
```csharp
var channel = Channel.CreateBounded<Order>(
    new BoundedChannelOptions(100) { FullMode = BoundedChannelFullMode.DropOldest });
await channel.Writer.WriteAsync(order, token);
```
2.5 Code smells → refactor (principles level)
2.5.1 Smell: Over-abstracted DI graphs
When every class gets its own interface and DI registration hits 1,000 lines, you’ve overfit SOLID.
Refactor — group by feature module with Scrutor scanning:
```csharp
public static class OrdersModule {
    public static IServiceCollection AddOrders(this IServiceCollection services) {
        services.Scan(s => s
            .FromAssemblyOf<OrderProcessor>()
            .AddClasses()
            .AsImplementedInterfaces());
        return services;
    }
}
```
Then register once: builder.Services.AddOrders();
2.5.2 Smell: Non-idiomatic code
Incorrect:
```csharp
public class HttpService {
    private static readonly HttpClient _client = new HttpClient();
    public string GetData(string url) => _client.GetStringAsync(url).Result; // sync-over-async
}
```
Correct:
```csharp
public class HttpService(HttpClient client) {
    public async Task<string> GetDataAsync(string url, CancellationToken ct)
        => await client.GetStringAsync(url, ct);
}
```
Registered via builder.Services.AddHttpClient<HttpService>(); — Idiomatic (async all the way), Predictable (respects cancellation), and Composable (via DI).
CUPID isn’t about replacing SOLID — it’s about tempering it for production reliability.
3 Dependency injection and composition without contortions
3.1 Composition root patterns
Your composition root — typically Program.cs — should register what the app needs, not how it uses it:
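```csharp
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOrders();
builder.Services.AddPayments();
builder.Services.AddHttpClient<OrderClient>()
    // AddResilienceHandler(name, configure) needs a configure callback;
    // the standard handler ships sensible defaults.
    .AddStandardResilienceHandler();
var app = builder.Build();
app.MapOrderEndpoints();
app.Run();
```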
Each feature exposes its own extension methods. The root stays shallow and each module self-contained.
Root responsibilities:
- Register services, not logic.
- Wire cross-boundary dependencies (external clients, config).
- Configure cross-cutting concerns (resilience, logging, telemetry).
- Avoid runtime I/O during registration; startup work belongs in an IHostedService.
3.2 Keeping registration lean: Scrutor and decorators
3.2.1 Assembly scanning with Scrutor
Scrutor eliminates manual registration repetition:
```csharp
services.Scan(scan => scan
    .FromAssemblyOf<OrderService>()
    .AddClasses(classes => classes.InNamespaces("MyApp.Orders"))
    .AsImplementedInterfaces()
    .WithScopedLifetime());
```
Apply scanning within known namespaces to avoid pulling in unintended helpers.
3.2.2 Decorators
Decorators add behavior without changing core logic — a direct OCP realization ideal for logging, caching, or metrics:
```csharp
public class LoggingOrderProcessor : IOrderProcessor {
    private readonly IOrderProcessor _inner;
    private readonly ILogger<LoggingOrderProcessor> _logger;

    public LoggingOrderProcessor(IOrderProcessor inner, ILogger<LoggingOrderProcessor> logger) {
        _inner = inner; _logger = logger;
    }

    public async Task ProcessAsync(Order order) {
        _logger.LogInformation("Processing order {Id}", order.Id);
        await _inner.ProcessAsync(order);
    }
}
```
Registered using Scrutor:
```csharp
services.Decorate<IOrderProcessor, LoggingOrderProcessor>();
```
3.2.3 Bounded lifetimes
Match service lifetimes to context:
- Transient: stateless, fast-constructing objects.
- Scoped: per-request data (repositories, use cases).
- Singleton: configuration, caches, thread-safe clients.
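Misaligned lifetimes create captive dependencies: a singleton that holds a scoped service (such as a DbContext) shares it across concurrent requests, which leads to thread-safety bugs, stale data, and memory growth. Scope validation, enabled by default in the Development environment, catches this when the service is resolved.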
3.3 Options binding, validation, and anti-patterns
3.3.1 Options pattern with validation
The Options pattern encourages explicit, type-safe config binding with startup validation:
```csharp
builder.Services.AddOptions<PaymentOptions>()
    .BindConfiguration("Payments")
    .ValidateDataAnnotations()
    .Validate(o => !string.IsNullOrEmpty(o.ApiKey),
        "API Key must be provided.")
    .ValidateOnStart();
```
.ValidateOnStart() catches misconfigurations early — critical in auto-scaled clusters where silent failures cascade.
3.3.2 Avoiding the Service Locator anti-pattern
Injecting IServiceProvider to manually resolve dependencies hides the graph and breaks lifetime control.
Incorrect:
```csharp
public class OrderHandler(IServiceProvider provider) {
    public void Handle() {
        var repo = provider.GetRequiredService<IOrderRepository>();
        repo.Save();
    }
}
```
Correct:
```csharp
public class OrderHandler(IOrderRepository repo) {
    public void Handle() => repo.Save();
}
```
3.3.3 Async work in constructors
Constructors cannot await, so I/O placed there turns into sync-over-async (.Result or .Wait()) during DI resolution: it blocks threads and fails at unpredictable points. Move warm-up work into a lifecycle hook instead:
```csharp
// Registered with builder.Services.AddHostedService<WarmupService>();
public class WarmupService(ICache cache) : IHostedService
{
    // Runs once at startup; the token is cancelled if startup is aborted.
    public Task StartAsync(CancellationToken ct) => cache.PreloadAsync(ct);
    public Task StopAsync(CancellationToken ct) => Task.CompletedTask;
}
```
3.4 Minimal APIs vs MVC: when each fits
3.4.1 Minimal APIs: composable and concise
Minimal APIs favor function-level composition, aligning with CUPID’s Composable and Unix-like principles:
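```csharp
app.MapPost("/orders", async (OrderDto dto, IOrderProcessor proc, CancellationToken ct) =>
{
    var order = await proc.CreateAsync(dto, ct);
    return Results.Created($"/orders/{order.Id}", order); // location of the created resource
})
.AddEndpointFilter<ValidationFilter>();
```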
Ideal for microservices, gateways, and focused endpoints.
3.4.2 MVC: structure for complex domains
MVC controllers suit domains with deep validation pipelines, rich routing conventions, and many actions sharing filters. Choose MVC when the domain warrants the ceremony.
3.4.3 Problem Details
For both paradigms, return RFC 9457 Problem Details for consistent, machine-readable error responses:
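```csharp
app.UseExceptionHandler(errorApp =>
{
    errorApp.Run(async context =>
    {
        var problem = new ProblemDetails
        {
            Status = StatusCodes.Status500InternalServerError,
            Title = "An unexpected error occurred",
            Detail = "Please contact support with the trace ID."
        };
        // Include the trace ID the message refers to.
        problem.Extensions["traceId"] = Activity.Current?.Id ?? context.TraceIdentifier;

        context.Response.StatusCode = StatusCodes.Status500InternalServerError;
        // Pass the content type explicitly; WriteAsJsonAsync would otherwise set application/json.
        await context.Response.WriteAsJsonAsync(problem, options: null,
            contentType: "application/problem+json");
    });
});
```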
3.5 Code smells → refactor
3.5.1 Smell: 1,500-line AddServices
Refactor: Split into feature modules with Scrutor scanning:
```csharp
builder.Services.AddOrders();
builder.Services.AddPayments();
builder.Services.AddReporting();
```
3.5.2 Smell: Sync-over-async during startup
Symptom: .Result or .Wait() in constructors, blocking thread pool threads.
Refactor: Move initialization to an IHostedService or startup initializer.
4 Backpressure and streaming pipelines that don’t melt down
4.1 Why backpressure matters
Cloud-native services don’t fail because of slow code — they fail because they accept more work than they can handle. Without limits: memory spikes from unbounded queues, thread starvation, and latency outliers that kill tail performance.
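```csharp
// Dangerous: unbounded fire-and-forget
await foreach (var msg in consumer.ReadAllAsync(ct)) {
    _ = ProcessAsync(msg); // not awaited: no concurrency cap, exceptions unobserved
}
```
Each message starts an untracked task: nothing caps concurrency, failures go unobserved, and the loop reads as fast as the broker delivers. A bounded channel, or a bounded degree of parallelism, fixes the problem.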
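When the goal is only to stop the fire-and-forget loop from outrunning the machine, bounding the degree of parallelism is often enough. A minimal sketch with Parallel.ForEachAsync (.NET 6+); the message source, the 20 ms of simulated work, and the class name are hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedProcessing
{
    // Stand-in for a consumer's ReadAllAsync: yields message IDs 0..count-1.
    private static async IAsyncEnumerable<int> Messages(int count)
    {
        for (var i = 0; i < count; i++)
        {
            await Task.Yield();
            yield return i;
        }
    }

    public static async Task<(int Processed, int MaxInFlight)> RunAsync(int maxParallelism)
    {
        int inFlight = 0, maxInFlight = 0, processed = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        // Unlike `_ = ProcessAsync(msg)`, every body is awaited, exceptions surface,
        // and at most maxParallelism messages are in flight at once.
        await Parallel.ForEachAsync(Messages(50), options, async (msg, ct) =>
        {
            var now = Interlocked.Increment(ref inFlight);
            InterlockedMax(ref maxInFlight, now);
            await Task.Delay(20, ct); // simulated work
            Interlocked.Decrement(ref inFlight);
            Interlocked.Increment(ref processed);
        });

        return (processed, maxInFlight);
    }

    // Atomically raise target to value if value is larger.
    private static void InterlockedMax(ref int target, int value)
    {
        int current;
        while (value > (current = Volatile.Read(ref target)) &&
               Interlocked.CompareExchange(ref target, value, current) != current) { }
    }
}
```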
4.2 Choosing primitives: Channels vs TPL Dataflow
4.2.1 System.Threading.Channels
High-performance producer/consumer pipelines with built-in backpressure:
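```csharp
var channel = Channel.CreateBounded<Job>(
    new BoundedChannelOptions(100) { FullMode = BoundedChannelFullMode.Wait });

async Task Producer(CancellationToken ct) {
    while (await GetNextJobAsync(ct) is { } job)
        await channel.Writer.WriteAsync(job, ct); // waits while 100 jobs are queued
    channel.Writer.Complete(); // lets ReadAllAsync finish once the queue drains
}

async Task Consumer(CancellationToken ct) {
    await foreach (var job in channel.Reader.ReadAllAsync(ct))
        await ProcessJobAsync(job, ct);
}
```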
When capacity (100) is reached, WriteAsync naturally applies backpressure.
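The backpressure can be observed directly. This sketch (hypothetical class name; capacity 2 instead of 100 to keep it small) shows a third WriteAsync staying pending until a read frees a slot:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class BackpressureDemo
{
    // WriteAsync waits, rather than failing or growing the queue, when a
    // Wait-mode bounded channel is full.
    public static async Task<(bool CompletedWhileFull, bool CompletedAfterRead)> RunAsync()
    {
        var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(2)
        {
            FullMode = BoundedChannelFullMode.Wait
        });

        await channel.Writer.WriteAsync(1);
        await channel.Writer.WriteAsync(2); // channel is now at capacity

        // The third write cannot complete until a consumer makes room.
        Task pending = channel.Writer.WriteAsync(3).AsTask();
        bool completedWhileFull = pending.IsCompleted;

        // Reading one item frees a slot and releases the waiting producer.
        channel.Reader.TryRead(out _);
        await pending.WaitAsync(TimeSpan.FromSeconds(5));
        return (completedWhileFull, pending.IsCompleted);
    }
}
```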
4.2.2 TPL Dataflow
Higher-level blocks with built-in linking and buffering for multi-stage pipelines:
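```csharp
var block = new TransformBlock<Order, Payment>(
    async order => await AuthorizePayment(order),
    new ExecutionDataflowBlockOptions {
        BoundedCapacity = 200,
        MaxDegreeOfParallelism = 8
    });
var publish = new ActionBlock<Payment>(p => PublishAsync(p),
    new ExecutionDataflowBlockOptions { BoundedCapacity = 200 }); // bound every stage
block.LinkTo(publish, new DataflowLinkOptions { PropagateCompletion = true });
// Feed with SendAsync, which waits for space; Post returns false when the block is full.
await block.SendAsync(order);
```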
| Scenario | Recommended Primitive |
|---|---|
| Single-producer / single-consumer | System.Threading.Channels |
| Complex multi-stage pipelines | TPL Dataflow |
Channels are faster; Dataflow gives richer coordination.
4.3 HTTP/gRPC streaming with IAsyncEnumerable
4.3.1 Server-side streaming with NDJSON
```csharp
app.MapGet("/orders/stream", async (HttpResponse response, IOrderFeed feed, CancellationToken ct) => {
    response.ContentType = "application/x-ndjson";
    await foreach (var evt in feed.StreamAsync(ct)) {
        await response.WriteAsync(JsonSerializer.Serialize(evt) + "\n", ct);
        await response.Body.FlushAsync(ct);
    }
});
```
The same pattern applies to gRPC server streaming via IServerStreamWriter<T>.
4.3.2 Cancellation as control flow
Cancellation is not a failure — it’s control flow. Always propagate CancellationToken:
```csharp
public async IAsyncEnumerable<OrderEvent> StreamAsync([EnumeratorCancellation] CancellationToken ct) {
    while (!ct.IsCancellationRequested) {
        yield return await _queue.DequeueAsync(ct);
    }
}
```
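A runnable sketch of the same shape, assuming a Channel stands in for `_queue` (`FeedDemo` is a hypothetical name). WithCancellation routes the caller’s token into the `[EnumeratorCancellation]` parameter, and cancelling ends the stream as ordinary control flow:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class FeedDemo
{
    public static async IAsyncEnumerable<int> StreamAsync(
        ChannelReader<int> queue,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        while (!ct.IsCancellationRequested)
        {
            // ReadAsync also observes ct, so a consumer blocked on an empty queue is released.
            yield return await queue.ReadAsync(ct);
        }
    }

    public static async Task<int> ConsumeUntilCancelledAsync(int stopAfter)
    {
        var queue = Channel.CreateUnbounded<int>();
        for (var i = 0; i < 10; i++) queue.Writer.TryWrite(i);

        using var cts = new CancellationTokenSource();
        var received = 0;
        try
        {
            await foreach (var item in StreamAsync(queue.Reader).WithCancellation(cts.Token))
            {
                if (++received == stopAfter) cts.Cancel(); // the consumer decides it is done
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation while waiting inside ReadAsync lands here: expected, not an error.
        }
        return received;
    }
}
```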
4.4 Load shedding and fairness
4.4.1 Channel drop strategies
BoundedChannelFullMode provides four options: Wait, DropNewest, DropOldest, and DropWrite. Choose based on data freshness — DropOldest works well for telemetry where latest data matters most.
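The modes differ only in what happens when a write meets a full channel. A small sketch (hypothetical class name) that writes 1 to 5 into a capacity-3 channel and drains whatever is left:

```csharp
using System.Collections.Generic;
using System.Threading.Channels;

public static class DropModeDemo
{
    // Writes 1..5 into a capacity-3 channel with the given full mode, then drains it.
    public static List<int> Fill(BoundedChannelFullMode mode)
    {
        var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(3) { FullMode = mode });
        for (var i = 1; i <= 5; i++)
        {
            // Drop* modes discard something instead of waiting; Wait mode makes TryWrite return false.
            channel.Writer.TryWrite(i);
        }

        var remaining = new List<int>();
        while (channel.Reader.TryRead(out var item)) remaining.Add(item);
        return remaining;
    }
}
```

DropOldest leaves 3, 4, 5; DropNewest evicts the most recent queued item each time and leaves 1, 2, 5; DropWrite and Wait both leave 1, 2, 3.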
4.4.2 Rate limiting for fairness
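```csharp
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests; // default is 503
    options.AddFixedWindowLimiter("api", o =>
    {
        o.PermitLimit = 10;
        o.Window = TimeSpan.FromSeconds(1);
    });
});
app.UseRateLimiter();
app.MapGroup("/api").RequireRateLimiting("api"); // named policies apply only where attached
```
A single named policy caps the endpoints it is attached to as a whole. To keep one noisy tenant from starving the rest, partition the limiter per tenant, for example with RateLimitPartition.GetFixedWindowLimiter keyed on a tenant header or claim.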
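To make per-tenant partitioning concrete, here is a hand-rolled fixed-window counter keyed by tenant. It is a hypothetical illustration of the semantics, not how System.Threading.RateLimiting works internally; the clock is injected so the behavior is deterministic, and each window starts at the tenant’s first request:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-tenant fixed-window limiter: each tenant gets permitLimit requests
// per window, so one tenant exhausting its budget does not affect the others.
public sealed class TenantFixedWindowLimiter
{
    private readonly int _permitLimit;
    private readonly TimeSpan _window;
    private readonly Func<DateTimeOffset> _clock;
    private readonly Dictionary<string, (DateTimeOffset WindowStart, int Count)> _state = new();
    private readonly object _gate = new();

    public TenantFixedWindowLimiter(int permitLimit, TimeSpan window, Func<DateTimeOffset> clock)
    {
        _permitLimit = permitLimit;
        _window = window;
        _clock = clock;
    }

    // True if the request may proceed; false means respond with 429.
    public bool TryAcquire(string tenant)
    {
        lock (_gate)
        {
            var now = _clock();
            if (!_state.TryGetValue(tenant, out var s) || now - s.WindowStart >= _window)
            {
                s = (now, 0); // start a fresh window for this tenant
            }
            if (s.Count >= _permitLimit)
            {
                _state[tenant] = s;
                return false;
            }
            _state[tenant] = (s.WindowStart, s.Count + 1);
            return true;
        }
    }
}
```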
4.4.3 Async producers respecting backpressure
Never spin. Always use async APIs that yield when capacity is reached:
```csharp
while (await channel.Writer.WaitToWriteAsync(ct)) {
    var item = await GetNextJobAsync(ct);
    await channel.Writer.WriteAsync(item, ct);
}
```
4.5 Code smells → refactor
4.5.1 Smell: Unbounded queues
Symptom: ConcurrentQueue<T> or BlockingCollection<T> without limits.
Refactor: Replace with Channel.CreateBounded<T> or Dataflow with explicit capacity.
4.5.2 Smell: Spin-writing producers
Symptom: Producers loop endlessly while consumers lag.
Refactor: Use async flow control (WriteAsync, SendAsync).
5 Idempotency and resilience where money moves
5.1 HTTP semantics: when POST must be idempotent
Idempotency means executing the same operation multiple times produces the same result as once. While PUT and DELETE are naturally idempotent, production APIs often need idempotent POST — especially under retries. A retry after a timeout shouldn’t double-charge a customer.
Natural idempotent PUT:
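```csharp
app.MapPut("/accounts/{id}/email", async (Guid id, string email, IAccountRepo repo) => {
    var account = await repo.GetAsync(id);
    if (account is null) return Results.NotFound();
    account.Email = email; // sets state; repeating the call sets the same state
    await repo.UpdateAsync(account);
    return Results.NoContent();
});
```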
Re-running the same call yields the same state. But a POST /payments that charges a card is not idempotent by default — that’s where the Idempotency-Key pattern comes in.
5.2 Idempotency-Key pattern
The client provides a unique key per logical operation; the server persists the first successful result and replays it on retries.
Flow: Client sends Idempotency-Key header → server checks store → if found, replay response → if not, execute and persist outcome.
```csharp
public class IdempotencyMiddleware(RequestDelegate next)
{
    // The store is an InvokeAsync parameter, so it is resolved per request and may be scoped.
    public async Task InvokeAsync(HttpContext ctx, IIdempotencyStore store)
    {
        var key = ctx.Request.Headers["Idempotency-Key"].ToString();
        if (string.IsNullOrWhiteSpace(key))
        {
            await next(ctx);
            return;
        }

        // Two concurrent requests with one key can both miss here; a production
        // store should claim the key atomically first and answer 409 to the loser.
        var existing = await store.GetAsync(key);
        if (existing is not null)
        {
            ctx.Response.StatusCode = existing.StatusCode; // replay, don't re-execute
            ctx.Response.ContentType = existing.ContentType;
            await ctx.Response.WriteAsync(existing.Body);
            return;
        }

        var original = ctx.Response.Body;
        using var buffer = new MemoryStream();
        ctx.Response.Body = buffer;
        try
        {
            await next(ctx);
            buffer.Position = 0;
            using var reader = new StreamReader(buffer, leaveOpen: true);
            var body = await reader.ReadToEndAsync();
            // Persist only non-5xx outcomes so transient failures remain retryable.
            if (ctx.Response.StatusCode < 500)
                await store.SaveAsync(key, ctx.Response.StatusCode, ctx.Response.ContentType, body);
            buffer.Position = 0;
            await buffer.CopyToAsync(original);
        }
        finally
        {
            ctx.Response.Body = original; // restore even if the pipeline throws
        }
    }
}
```
To prevent key misuse, hash the request body and store { key, hash, response } with TTLs for natural expiration.
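A minimal sketch of the { key, hash, response } record, assuming an in-memory dictionary and an injected clock (a real store would be shared and expire entries itself). A reused key with a different body is reported as a mismatch instead of being replayed; the IETF httpapi Idempotency-Key draft suggests answering 422 in that case. All names are hypothetical:

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public enum IdempotencyLookup { Miss, Replay, Mismatch }

// Hypothetical in-memory { key, hash, response } store with a TTL; not thread-safe.
public sealed class FingerprintedIdempotencyStore
{
    private readonly Dictionary<string, (string Hash, string Response, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _clock;

    public FingerprintedIdempotencyStore(TimeSpan ttl, Func<DateTimeOffset> clock)
    {
        _ttl = ttl;
        _clock = clock;
    }

    // SHA-256 of the request body, hex-encoded.
    public static string Fingerprint(string requestBody) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(requestBody)));

    public IdempotencyLookup TryGet(string key, string requestBody, out string? response)
    {
        response = null;
        if (!_entries.TryGetValue(key, out var entry))
            return IdempotencyLookup.Miss;
        if (entry.ExpiresAt <= _clock())
        {
            _entries.Remove(key); // expired: behave like a fresh key
            return IdempotencyLookup.Miss;
        }
        if (entry.Hash != Fingerprint(requestBody))
            return IdempotencyLookup.Mismatch; // same key, different payload: reject, don't replay
        response = entry.Response;
        return IdempotencyLookup.Replay;
    }

    public void Save(string key, string requestBody, string response) =>
        _entries[key] = (Fingerprint(requestBody), response, _clock() + _ttl);
}
```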
5.3 Outbox/inbox for once-effect semantics
When APIs publish messages after writing to a database, the Outbox pattern ensures both succeed atomically by persisting outbound messages in the same transaction.
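```csharp
public async Task AuthorizePayment(Payment payment, CancellationToken ct) {
    _db.Payments.Add(payment);
    _db.OutboxMessages.Add(new OutboxMessage {
        Type = "PaymentAuthorized",
        Payload = JsonSerializer.Serialize(payment),
        Created = DateTime.UtcNow // the relay publishes in this order
    });
    // One SaveChanges is one transaction: both rows commit or neither does.
    await _db.SaveChangesAsync(ct);
}
```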
A background worker reads the outbox and publishes reliably:
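```csharp
// AppDbContext and IMessageBus are placeholder names.
public class OutboxPublisher(IServiceScopeFactory scopes, IMessageBus bus) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // BackgroundService is a singleton: take a fresh scope (and DbContext) per batch.
            await using var scope = scopes.CreateAsyncScope();
            var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
            var messages = await db.OutboxMessages
                .OrderBy(m => m.Created).Take(50).ToListAsync(ct);
            foreach (var msg in messages)
            {
                await bus.PublishAsync(msg.Type, msg.Payload, ct);
                db.OutboxMessages.Remove(msg);
            }
            // A crash before this save re-publishes the batch: delivery is at-least-once.
            await db.SaveChangesAsync(ct);
            await Task.Delay(TimeSpan.FromSeconds(1), ct);
        }
    }
}
```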
For inbound messages, an Inbox table of processed message IDs prevents double-handling:
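```csharp
// The inbox row and the business effect should commit together, ideally with a
// unique index on MessageId so a concurrent duplicate fails instead of re-running.
if (await _db.Inbox.AnyAsync(x => x.MessageId == msg.Id, ct))
    return; // already processed

await HandleAsync(msg, ct); // writes through the same _db
_db.Inbox.Add(new InboxMessage { MessageId = msg.Id });
await _db.SaveChangesAsync(ct);
```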
Together, outbox and inbox form a reliable bridge across distributed boundaries.
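Why the inbox is needed is easiest to see in a simulation: the relay publishes, crashes before deleting the outbox row, and publishes again after restart. This in-memory sketch (hypothetical names; no database or broker) counts deliveries and business effects:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of an outbox relay plus an inbox.
public static class OutboxInboxDemo
{
    public static (int Deliveries, int Effects) Run()
    {
        var outbox = new List<(Guid Id, string Type)> { (Guid.NewGuid(), "PaymentAuthorized") };
        var inbox = new HashSet<Guid>(); // processed message IDs; a uniquely keyed table in practice
        int deliveries = 0, effects = 0;

        void Consume((Guid Id, string Type) msg)
        {
            deliveries++;
            if (!inbox.Add(msg.Id)) return; // seen before: acknowledge, skip the effect
            effects++;                      // the business effect, e.g. mark the order paid
        }

        // Relay pass 1: publish, then "crash" before the outbox row is deleted.
        foreach (var message in outbox) Consume(message);

        // Relay pass 2 after restart: the row is still there, so it is published again.
        foreach (var message in outbox) Consume(message);
        outbox.Clear(); // this time the delete commits

        return (deliveries, effects);
    }
}
```

In a real system the inbox row and the business write commit in one transaction; otherwise a crash between them reopens the gap.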
5.4 Resilience essentials with Polly v8
Resilience isn’t about retrying everything — it’s about failing fast, retrying safely, and isolating damage.
5.4.1 Timeouts first
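Always start with explicit timeouts before retries. In Polly v8, strategies run in the order they are added, with the first one outermost: a timeout added first bounds the whole call, including any retries added after it.
```csharp
builder.Services.AddHttpClient("Payments", client => client.BaseAddress = new("https://bank"))
    .AddResilienceHandler("default", builder => {
        builder.AddTimeout(TimeSpan.FromSeconds(3));
    });
```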
5.4.2 Retry with jitter
Retries should be limited, jittered, and only for transient errors:
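```csharp
builder.AddRetry(new RetryStrategyOptions<HttpResponseMessage> {
    // ShouldHandle expects an async predicate; PredicateBuilder builds one.
    // Retry only transient failures: network errors and 5xx responses.
    ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
        .Handle<HttpRequestException>()
        .HandleResult(r => (int)r.StatusCode >= 500),
    Delay = TimeSpan.FromSeconds(0.5),
    MaxRetryAttempts = 3,
    BackoffType = DelayBackoffType.Exponential,
    UseJitter = true
});
```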
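To see the shape of exponential backoff with jitter, the sketch below computes the doubling base delays (0.5 s, 1 s, 2 s for a 0.5 s starting delay) and applies “full jitter”, a uniform random delay between zero and the base. Full jitter is an illustration only; Polly applies its own jitter formula, so treat these numbers as the shape of the curve rather than Polly’s exact delays. `Backoff` is a hypothetical name:

```csharp
using System;

public static class Backoff
{
    // Base exponential schedule: baseDelay * 2^attempt, attempt starting at 0.
    public static TimeSpan Exponential(TimeSpan baseDelay, int attempt) =>
        TimeSpan.FromTicks(baseDelay.Ticks * (1L << attempt));

    // "Full jitter": pick uniformly in [0, exponential delay] so that clients
    // which failed together do not retry together.
    public static TimeSpan FullJitter(TimeSpan baseDelay, int attempt, Random random) =>
        TimeSpan.FromTicks((long)(Exponential(baseDelay, attempt).Ticks * random.NextDouble()));
}
```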
5.4.3 Circuit breakers
Prevent cascading failures by tripping after repeated errors:
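```csharp
builder.AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage> {
    // Without ShouldHandle only exceptions count as failures; 5xx responses would not.
    ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
        .Handle<HttpRequestException>()
        .HandleResult(r => (int)r.StatusCode >= 500),
    FailureRatio = 0.5,
    SamplingDuration = TimeSpan.FromSeconds(30),
    MinimumThroughput = 10,
    BreakDuration = TimeSpan.FromSeconds(15)
});
```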
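These options drive a small state machine. A simplified, hypothetical sketch (one sampling window that resets, instead of Polly’s rolling window; in half-open the next recorded result decides) makes the transitions explicit, with an injected clock for determinism:

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

// Hypothetical, simplified breaker mirroring the options above; not Polly's implementation.
public sealed class SimpleCircuitBreaker
{
    private readonly double _failureRatio;
    private readonly int _minimumThroughput;
    private readonly TimeSpan _samplingDuration;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTimeOffset> _clock;
    private DateTimeOffset _windowStart;
    private DateTimeOffset _openedAt;
    private int _calls;
    private int _failures;

    public BreakerState State { get; private set; } = BreakerState.Closed;

    public SimpleCircuitBreaker(double failureRatio, int minimumThroughput,
        TimeSpan samplingDuration, TimeSpan breakDuration, Func<DateTimeOffset> clock)
    {
        _failureRatio = failureRatio;
        _minimumThroughput = minimumThroughput;
        _samplingDuration = samplingDuration;
        _breakDuration = breakDuration;
        _clock = clock;
        _windowStart = clock();
    }

    // Ask before each call; false means fail fast without touching the dependency.
    public bool AllowRequest()
    {
        if (State == BreakerState.Open && _clock() - _openedAt >= _breakDuration)
            State = BreakerState.HalfOpen; // break is over: let a trial call through
        return State != BreakerState.Open;
    }

    // Report each call's outcome.
    public void Record(bool success)
    {
        var now = _clock();
        if (State == BreakerState.HalfOpen)
        {
            // The trial result decides: close and start fresh, or open for another break.
            if (success) { State = BreakerState.Closed; ResetWindow(now); }
            else { State = BreakerState.Open; _openedAt = now; }
            return;
        }

        if (now - _windowStart >= _samplingDuration) ResetWindow(now);
        _calls++;
        if (!success) _failures++;

        // Trip only with enough traffic to judge and a high enough failure share.
        if (_calls >= _minimumThroughput && (double)_failures / _calls >= _failureRatio)
        {
            State = BreakerState.Open;
            _openedAt = now;
        }
    }

    private void ResetWindow(DateTimeOffset now)
    {
        _windowStart = now;
        _calls = 0;
        _failures = 0;
    }
}
```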
5.4.4 Bulkheads, retry budgets, and hedging
Bulkheads limit concurrency per dependency. Retry budgets cap total retries per time window to prevent storms. Hedging creates parallel requests to mitigate tail latency — use sparingly, only on idempotent read endpoints.
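Order matters because the first strategy added is the outermost. The order used in this article (timeout → retry → circuit breaker → bulkhead) puts one overall timeout around all attempts and makes every attempt pass through the breaker and take a concurrency slot. Microsoft’s standard HTTP handler (AddStandardResilienceHandler) uses rate limiter → total timeout → retry → circuit breaker → per-attempt timeout.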
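A bulkhead is a per-dependency concurrency budget. A minimal sketch using one SemaphoreSlim per dependency name; the class and its parameters are hypothetical, and in the pipelines above Polly’s concurrency limiter plays this role:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical bulkhead: a separate concurrency budget per dependency, so a slow
// bank API cannot consume the slots another dependency needs.
public sealed class Bulkheads
{
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _slots = new();
    private readonly int _maxConcurrency;

    public Bulkheads(int maxConcurrency) => _maxConcurrency = maxConcurrency;

    public async Task<T> RunAsync<T>(string dependency, Func<CancellationToken, Task<T>> call,
        TimeSpan queueTimeout, CancellationToken ct = default)
    {
        var slots = _slots.GetOrAdd(dependency, _ => new SemaphoreSlim(_maxConcurrency, _maxConcurrency));

        // Reject instead of queueing forever: waiting callers are load too.
        if (!await slots.WaitAsync(queueTimeout, ct))
            throw new TimeoutException($"Bulkhead for '{dependency}' is full.");
        try
        {
            return await call(ct);
        }
        finally
        {
            slots.Release();
        }
    }
}
```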
5.5 Code smells → refactor
5.5.1 Smell: Duplicate payments on transient errors
Symptom: Retries of POST requests trigger repeated side effects. Refactor: Idempotency-Key middleware + persistent outcome store + per-client Polly pipelines.
5.5.2 Smell: Global catch-all retry
Symptom: One global retry handler wraps all HTTP calls, including non-idempotent endpoints. Refactor: Per-client pipelines; guard POST retries unless the endpoint is idempotent.
6 Observability that shortens MTTR
6.1 Traces, metrics, logs — golden signals and SLOs
Each pillar answers a distinct question:
- Traces: Where is latency introduced?
- Metrics: How is the system performing over time?
- Logs: What specifically happened?
Instrument around golden signals — latency, traffic, errors, saturation — and pair with SLOs:
- 99.9% of POST /payments complete < 500ms
- Queue backlog < 1000 messages for >95% of intervals
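Each SLO implies an error budget. For the latency SLO above, 0.1% of POST /payments requests may exceed 500 ms; read as time over an assumed 30-day window, the same 99.9% target allows about 43 minutes of violation:

```latex
\begin{aligned}
B &= 1 - 0.999 = 0.001 \\
B_{30\,\text{d}} &= 0.001 \times 30 \times 24 \times 60\ \text{min} = 43.2\ \text{min}
\end{aligned}
```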
6.2 OpenTelemetry in .NET
6.2.1 Auto-instrumentation
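ASP.NET Core, HttpClient, and EF Core operations are instrumented with one call each; custom ActivitySources and Meters must be registered by name, or their data is dropped:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(t => t.AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation()
        .AddSource("MyCompany.Payments"))  // hypothetical name of our own ActivitySource
    .WithMetrics(m => m.AddRuntimeInstrumentation()
        .AddHttpClientInstrumentation()
        .AddMeter("MyCompany.Payments"))   // hypothetical name of our own Meter
    .UseOtlpExporter();
```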
6.2.2 Manual spans and context propagation
For domain logic, create spans manually:
```csharp
using var activity = MyActivitySource.StartActivity("AuthorizePayment");
// Tag before the call so the attribute is recorded even if ChargeAsync throws.
activity?.SetTag("payment.amount", request.Amount);
await _gateway.ChargeAsync(request, cancellationToken);
```
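The `?.` in `activity?.SetTag` matters: StartActivity returns null when nothing listens to the source, which is also why custom sources must be registered with AddSource. A BCL-only sketch with a hypothetical source name, where an ActivityListener stands in for the OpenTelemetry SDK:

```csharp
#nullable enable
using System.Diagnostics;

public static class SpanDemo
{
    // Hypothetical source name; in production it must match an AddSource(...) registration.
    private static readonly ActivitySource Source = new("MyCompany.Payments");

    public static (bool CreatedWithoutListener, object? RecordedAmount) Run()
    {
        // With nobody listening, StartActivity returns null.
        bool createdWithoutListener;
        using (var unobserved = Source.StartActivity("AuthorizePayment"))
        {
            createdWithoutListener = unobserved is not null;
        }

        object? recordedAmount = null;
        // Plays the role the OpenTelemetry SDK plays for each registered source.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "MyCompany.Payments",
            Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
            ActivityStopped = activity => recordedAmount = activity.GetTagItem("payment.amount")
        };
        ActivitySource.AddActivityListener(listener);

        using (var activity = Source.StartActivity("AuthorizePayment"))
        {
            activity?.SetTag("payment.amount", 4250); // e.g. minor units (cents)
        }
        return (createdWithoutListener, recordedAmount);
    }
}
```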
6.2.3 Exemplars
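Exemplars link sample traces to metric points for fast root cause analysis. Avoid recording the trace ID as a metric tag: every request would then create a new time series. In OpenTelemetry .NET (1.9 and later), the SDK attaches exemplars from the current span once an exemplar filter is enabled on the MeterProvider, for example SetExemplarFilter(ExemplarFilterType.TraceBased), and an exemplar-capable exporter such as OTLP carries them to the backend:
```csharp
// Recorded inside an active span, so the exemplar can carry that span's trace ID.
counter.Add(1);
```
Exemplars let you jump from dashboards directly into detailed traces.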
6.3 Local and production observability
.NET Aspire simplifies local observability with an OTel-powered dashboard for inner-loop development. In production, forward OTLP data to:
- Tempo / Jaeger for traces
- Prometheus for metrics
- Honeycomb / New Relic for unified observability
6.4 Dashboards and alerting
Dashboards should mirror user tasks, not infrastructure layers. Use the RED method (Rate, Errors, Duration) for APIs and USE (Utilization, Saturation, Errors) for infrastructure.
Alert on SLO burn rates, not single-instance spikes:
```text
if error_rate > 0.5% for 5m AND > 0.1% for 1h → alert
```
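With a 99.9% SLO the error budget is 0.1%, so the rule above is a multiwindow burn-rate alert: 0.5% errors is a burn rate of 5 (budget spent five times faster than sustainable) and 0.1% is a burn rate of 1. A minimal sketch with hypothetical names; the thresholds are parameters because the right values depend on your windows and paging policy:

```csharp
public static class BurnRate
{
    // Burn rate = observed error ratio / error budget, where budget = 1 - SLO target.
    public static double Compute(double errorRatio, double sloTarget) =>
        errorRatio / (1.0 - sloTarget);

    // Multiwindow check: the long window shows the problem is sustained,
    // the short window shows it is still happening (so alerts clear quickly).
    public static bool ShouldAlert(double shortWindowErrorRatio, double longWindowErrorRatio,
        double sloTarget, double shortThreshold, double longThreshold) =>
        Compute(shortWindowErrorRatio, sloTarget) > shortThreshold &&
        Compute(longWindowErrorRatio, sloTarget) > longThreshold;
}
```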
6.5 Code smells → refactor
6.5.1 Smell: Logs with no correlation
Symptom: Each service logs independently with no trace context.
Refactor: Use Activity and structured logging:
```csharp
using var activity = _activitySource.StartActivity("ProcessOrder");
_logger.LogInformation("Processing order {OrderId} trace={TraceId}",
    order.Id, activity?.TraceId);
```
6.5.2 Smell: Per-service silos
Symptom: Traces stop at service boundaries.
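Refactor: Propagate W3C trace context (the traceparent header, which .NET’s HttpClient injects automatically when a span is active) across HTTP calls and message headers, and name spans consistently (payments.process, orders.create).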
7 Versioning and evolvable contracts
7.1 Strategy palette
APIs are contracts, and contracts evolve. The rule: add fields, don’t rename or remove them.
| Strategy | Example | Use case |
|---|---|---|
| URI | /v2/orders | Simple REST APIs |
| Header | x-api-version: 2.0 | Gateways, multiple clients |
| Media-type | application/vnd.app.orders.v2+json | Fine-grained control |
Define compatibility envelopes — clients should tolerate unknown fields and default values.
7.2 Minimal APIs with Asp.Versioning
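```csharp
var versionSet = app.NewApiVersionSet()
    .HasApiVersion(1.0)
    .HasApiVersion(2.0)
    .ReportApiVersions()
    .Build();

// MapGet returns the endpoint's builder, not the group, so map each endpoint on the group.
var orders = app.MapGroup("v{version:apiVersion}/orders")
    .WithApiVersionSet(versionSet);
orders.MapGet("/", GetOrders);
orders.MapPost("/", CreateOrder);
```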
Mark version-neutral endpoints:
```csharp
app.MapGet("/health", () => Results.Ok("ok"))
    .IsApiVersionNeutral();
```
7.3 Protocol evolution
gRPC/Protobuf rules:
- Never reuse field numbers; mark removed numbers (and names) as reserved.
- Treat new fields as optional: older senders omit them, so receivers see the field’s default value.
- Don’t change types of existing fields.
JSON APIs: Ignore unknown properties during deserialization. Make new fields nullable to avoid breaking older clients.
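System.Text.Json already skips unknown properties by default, so tolerance mostly comes down to the contract types. A sketch with hypothetical OrderV1/OrderV2 types showing both directions:

```csharp
#nullable enable
using System;
using System.Text.Json;

// v1 contract as an older client sees it (hypothetical type).
public class OrderV1
{
    public Guid Id { get; set; }
    public decimal Total { get; set; }
}

// v2 adds an optional field; nullable so payloads from v1 producers still bind.
public class OrderV2
{
    public Guid Id { get; set; }
    public decimal Total { get; set; }
    public string? Currency { get; set; }
}

public static class TolerantReaderDemo
{
    // Web defaults: camelCase names, case-insensitive matching.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    // Newer payload into the older type: the unknown "currency" property is skipped.
    public static OrderV1 ReadNewWithOldType(string json) =>
        JsonSerializer.Deserialize<OrderV1>(json, Options)!;

    // Older payload into the newer type: the missing field simply stays null.
    public static OrderV2 ReadOldWithNewType(string json) =>
        JsonSerializer.Deserialize<OrderV2>(json, Options)!;
}
```

.NET 8 can opt into strictness with JsonSerializerOptions.UnmappedMemberHandling = JsonUnmappedMemberHandling.Disallow; leave it off for contracts that must tolerate newer producers.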
7.4 API gateways: YARP and APIM
YARP routes per version and enables canary releases:
```json
{
  "ReverseProxy": {
    "Routes": {
      "v1": { "ClusterId": "OrdersV1", "Match": { "Path": "/v1/orders/{**catch-all}" } },
      "v2": { "ClusterId": "OrdersV2", "Match": { "Path": "/v2/orders/{**catch-all}" } }
    }
  }
}
```
Azure APIM enforces version-specific quotas and transforms at the edge.
7.5 Contract tests
Use contract tests to validate compatibility across versions:
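```csharp
// PactNet-style provider verification; the verifier API differs between
// PactNet major versions, so check the exact calls for the version you use.
[Fact]
public void EnsureProviderHonoursConsumerPact() {
    new PactVerifier("orders")
        .ServiceProvider("Orders API", "http://localhost:5000")
        .PactUri("pacts/consumer-orders.json")
        .Verify();
}
```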
Replay harnesses storing real requests and responses complement contract tests for regression coverage.
7.6 Code smells → refactor
7.6.1 Smell: In-place field rename
Symptom: A renamed JSON field breaks existing clients. Refactor: Add the new field, dual-write temporarily, deprecate the old one.
7.6.2 Smell: “Latest only” clients
Symptom: Clients only work with the newest API version. Refactor: Maintain a version matrix and return sunset headers:
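```csharp
// Indexer assignment: Headers.Add throws if the header is already present.
context.Response.Headers["Sunset"] = "Thu, 01 Jan 2026 00:00:00 GMT"; // 1 Jan 2026 is a Thursday
context.Response.Headers["Link"] = "</v3>; rel=\"successor-version\"";
```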
8 End-to-end slice: Orders & Payments
8.1 Requirements and constraints
Every design principle becomes real when applied across boundaries. This section walks through an Orders & Payments subsystem that unifies the patterns from Sections 3–7 into a cohesive, production-grade design.
- Throughput: 2,000 payment authorizations/sec sustained, burst to 5,000.
- Latency: p95 < 300 ms under nominal load.
- Quotas: 100 req/s and 10 concurrent authorizations per tenant.
- SLOs: 99.9% of payments complete within 500 ms; 99.99% processed exactly once.
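A quick sizing check with Little’s law (L = λW: average work in flight equals arrival rate times time in system), using the 300 ms p95 as a pessimistic stand-in for mean latency. Roughly 600 authorizations are in flight at sustained load and 1,500 during bursts; and with 10 concurrent authorizations per tenant, one tenant can complete at most about 33 authorizations per second at that latency, below its 100 req/s request quota:

```latex
\begin{aligned}
L_{\text{sustained}} &= 2000\ \text{s}^{-1} \times 0.3\ \text{s} = 600 \\
L_{\text{burst}} &= 5000\ \text{s}^{-1} \times 0.3\ \text{s} = 1500 \\
\lambda_{\text{tenant}} &\le \frac{10}{0.3\ \text{s}} \approx 33\ \text{s}^{-1}
\end{aligned}
```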
8.2 Public API
The API layer applies CUPID’s Composable and Predictable principles: simple routes, strong validation, and structured errors.
8.2.1 Idempotent POST /payments
```csharp
app.MapPost("/payments", async (
    PaymentRequest request,
    HttpContext ctx,
    IPaymentService service,
    CancellationToken ct) =>
{
    var key = ctx.Request.Headers["Idempotency-Key"].ToString();
    if (string.IsNullOrEmpty(key))
        return Results.Problem("Missing Idempotency-Key", statusCode: 400);
    return Results.Ok(await service.AuthorizeAsync(key, request, ct));
})
.AddEndpointFilter<ValidationFilter<PaymentRequest>>()
.AddEndpointFilter<AuthorizationFilter>();
```
The AuthorizeAsync method checks the idempotency store, executes on cache miss, and persists the result (applying the pattern from Section 5.2).
8.2.2 Streaming GET /orders/{id}/events
```csharp
app.MapGet("/orders/{id}/events", async (Guid id, IOrderFeed feed, HttpResponse resp, CancellationToken ct) =>
{
    resp.ContentType = "application/x-ndjson";
    await foreach (var evt in feed.StreamAsync(id, ct))
    {
        await resp.WriteAsync(JsonSerializer.Serialize(evt) + "\n", ct);
        await resp.Body.FlushAsync(ct);
    }
});
```
8.3 Internal pipeline
After validation, payment requests are enqueued into a bounded channel (Section 4.2) that decouples ingestion from processing:
```csharp
var paymentChannel = Channel.CreateBounded<PaymentRequest>(
    new BoundedChannelOptions(500) { FullMode = BoundedChannelFullMode.Wait });
```
A worker consumes with proper DI scoping:
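```csharp
// Inside a BackgroundService: stoppingToken ends the loop on shutdown.
await foreach (var req in paymentChannel.Reader.ReadAllAsync(stoppingToken))
{
    // PaymentHandler is scoped (it uses a DbContext), so create one scope per message.
    await using var scope = scopeFactory.CreateAsyncScope();
    var handler = scope.ServiceProvider.GetRequiredService<PaymentHandler>();
    await handler.HandleAsync(req, stoppingToken);
}
```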
For multi-stage flows (validation → gateway → persistence → publish), TPL Dataflow provides richer topologies with per-block bounded buffers.
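The worker above processes one message at a time. A sketch of scaling it to several consumers on one bounded channel, with hypothetical names and Task.Delay standing in for the scoped PaymentHandler call; completing the writer is what lets every consumer exit:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class WorkerPoolDemo
{
    // Hypothetical worker pool: `workers` consumers drain one bounded channel.
    public static async Task<int[]> RunAsync(int items, int workers, CancellationToken ct = default)
    {
        var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(16)
        {
            FullMode = BoundedChannelFullMode.Wait,
            SingleWriter = true // one producer below
        });
        var handled = new ConcurrentBag<int>();

        var consumers = Enumerable.Range(0, workers).Select(async _ =>
        {
            // ReadAllAsync ends when the writer completes and the channel is drained.
            await foreach (var item in channel.Reader.ReadAllAsync(ct))
            {
                await Task.Delay(1, ct); // stand-in for the scoped handler call
                handled.Add(item);
            }
        }).ToArray();

        for (var i = 0; i < items; i++)
            await channel.Writer.WriteAsync(i, ct); // waits whenever 16 items are queued

        channel.Writer.Complete(); // "no more work": lets the consumers finish
        await Task.WhenAll(consumers);
        return handled.OrderBy(x => x).ToArray();
    }
}
```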
8.4 Outbound resilience
External dependencies use IHttpClientFactory with layered Polly v8 policies:
```csharp
builder.Services.AddHttpClient<BankClient>(client =>
    client.BaseAddress = new Uri("https://bank-api"))
    .AddResilienceHandler("bank", builder =>
    {
        // Transient = network failure or 5xx; shared by retry and circuit breaker.
        var transient = new PredicateBuilder<HttpResponseMessage>()
            .Handle<HttpRequestException>()
            .HandleResult(r => (int)r.StatusCode >= 500);

        builder.AddTimeout(TimeSpan.FromSeconds(3))
            .AddRetry(new() {
                MaxRetryAttempts = 3,
                Delay = TimeSpan.FromMilliseconds(200),
                BackoffType = DelayBackoffType.Exponential,
                UseJitter = true,
                ShouldHandle = transient
            })
            .AddCircuitBreaker(new() {
                FailureRatio = 0.5,
                BreakDuration = TimeSpan.FromSeconds(15),
                MinimumThroughput = 20,
                ShouldHandle = transient
            })
            .AddConcurrencyLimiter(10);
    });
```
Each named client gets its own pipeline, so retry, circuit-breaker, and concurrency limits stay isolated per dependency. Calls propagate cancellation tokens from incoming requests for end-to-end timeouts.
8.5 Messaging and delivery
Payment authorization uses the Outbox pattern (Section 5.3) so that PaymentAuthorized events take effect exactly once in the Orders service: the relay delivers at least once, and the Inbox deduplicates. The flow: business write + outbox message in one transaction → background relay publishes to broker → consumers deduplicate via Inbox.
Failed messages land in a Dead-Letter Queue (DLQ), where a periodic reconciliation job replays them after transient issues resolve. Predictability improves because message loss is measurable and recoverable.
8.6 Observability across the stack
Every layer emits correlated telemetry via OpenTelemetry:
```csharp
using var activity = _activitySource.StartActivity("Payment.Authorize");
activity?.SetTag("payment.amount", req.Amount);
var result = await _bankClient.AuthorizeAsync(req, ct);
```
Activity context flows automatically through IHttpClientFactory and MassTransit. Consumers in the Orders service attach to the parent trace, producing a contiguous view across HTTP → processing → messaging.
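What the instrumentation libraries do on each hop can be reproduced with the BCL alone. In this sketch the source name `Payments` and the amount are illustrative; the `ActivityListener` stands in for the OpenTelemetry SDK (without some listener, `StartActivity` returns `null`), and the W3C `traceparent` string stands in for the header an HTTP or message hop would carry:

```csharp
using System;
using System.Diagnostics;

// Stand-in for the OpenTelemetry SDK: without a listener, StartActivity returns null.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Payments",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded
};
ActivitySource.AddActivityListener(listener);

using var source = new ActivitySource("Payments");   // illustrative source name

// Payments service: the span around the authorization.
using var parent = source.StartActivity("Payment.Authorize")!;
parent.SetTag("payment.amount", 125.00m);

// The hop: with the default W3C format, Activity.Id is the traceparent value
// ("00-<trace-id>-<span-id>-<flags>") that instrumentation writes into headers.
string traceparent = parent.Id!;

// Orders service: rebuild the context from the header and continue the same trace.
var remote = ActivityContext.Parse(traceparent, traceState: null);
using var child = source.StartActivity("Orders.Consume", ActivityKind.Consumer, remote)!;

Console.WriteLine(child.TraceId == parent.TraceId); // prints "True"
```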
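Latency histograms with exemplars let dashboards drill down from a slow bucket straight into the offending trace. Record the measurement while the span is active and let the telemetry SDK attach the trace ID as an exemplar, rather than passing it as a label, which would explode metric cardinality:
```csharp
// PaymentLatency: a System.Diagnostics.Metrics Histogram<double> exported via OpenTelemetry;
// the exemplar's trace and span IDs come from Activity.Current at record time.
PaymentLatency.Record(sw.Elapsed.TotalMilliseconds);
```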
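Exemplars work because the measurement is recorded while the span is still current, so the SDK can read the trace ID from `Activity.Current` instead of receiving it as a label. The BCL-only sketch below shows that mechanism with a `MeterListener` acting as a stand-in for an exemplar reservoir; the meter and instrument names are hypothetical. In the OpenTelemetry .NET SDK the equivalent behavior is enabled through its exemplar filter (trace-based filtering in recent versions), and the exporter must support exemplars:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

using var meter = new Meter("Payments");   // hypothetical meter name
var paymentLatency = meter.CreateHistogram<double>("payment.latency");

// Stand-in for an exemplar reservoir: keep the trace that was current at record time.
var exemplars = new List<(double Value, ActivityTraceId TraceId)>();
using var meterListener = new MeterListener
{
    InstrumentPublished = (instrument, l) =>
    {
        if (instrument.Meter.Name == "Payments") l.EnableMeasurementEvents(instrument);
    }
};
meterListener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
{
    if (Activity.Current is { } current) exemplars.Add((value, current.TraceId));
});
meterListener.Start();

// Inside the request: record while the span is current, with no traceId tag.
var activity = new Activity("Payment.Authorize").Start();
var sw = Stopwatch.StartNew();
// ... authorize the payment ...
paymentLatency.Record(sw.Elapsed.TotalMilliseconds);
activity.Stop();

Console.WriteLine(exemplars.Count == 1 && exemplars[0].TraceId == activity.TraceId); // prints "True"
```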
8.7 Edge layer
YARP manages per-version traffic and canary rollouts. Azure APIM enforces tenant quotas and idempotency at the edge — the first layer of backpressure and predictability (Section 7.4).
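APIM quota policies are configured rather than coded, but their semantics are easy to state in code. The fixed-window counter below is a hypothetical illustration (a limit of 3 calls per minute per tenant), not APIM's implementation; it shows why quota enforcement at the edge returns `429` before a request ever reaches the service's channels:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-window quota: each tenant gets `limit` calls per `window`.
var limit = 3;
var window = TimeSpan.FromMinutes(1);
var counters = new Dictionary<string, (DateTimeOffset WindowStart, int Count)>();

int Admit(string tenant, DateTimeOffset now)
{
    if (!counters.TryGetValue(tenant, out var c) || now - c.WindowStart >= window)
        c = (now, 0);                        // first call or expired window: start a new one
    if (c.Count >= limit) return 429;        // over quota: reject at the edge
    counters[tenant] = (c.WindowStart, c.Count + 1);
    return 200;
}

var t0 = DateTimeOffset.UnixEpoch;
var statuses = new List<int>();
for (var i = 0; i < 5; i++) statuses.Add(Admit("tenant-a", t0.AddSeconds(i)));
statuses.Add(Admit("tenant-b", t0));                 // other tenants are unaffected
statuses.Add(Admit("tenant-a", t0.AddMinutes(1)));   // next window for tenant-a

Console.WriteLine(string.Join(",", statuses)); // prints "200,200,200,429,429,200,200"
```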
8.8 Failure drills
No architecture is complete until tested under failure. These drills validate that CUPID’s Predictable trait holds:
- Latency spikes: Inject delay into `BankClient`. Verify that circuit breakers open and the API returns `503` quickly instead of stalling threads.
- Quota exceeded: Drive synthetic load beyond tenant limits. Confirm `429` responses, and that channel backpressure prevents CPU runaway.
- Partial outages: Kill a broker partition or database node. The Outbox queues messages until recovery without data loss.
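The first drill can be rehearsed in miniature. This sketch uses a deliberately simplified, hand-rolled breaker (the function name is hypothetical) that opens after three consecutive timeouts; Polly's breaker configured in 8.4 instead uses a failure ratio over a minimum throughput and adds a half-open probe state. Injected 50 ms latency makes every call time out; once the breaker opens, the remaining calls return `503` immediately without touching the bank:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Simplified breaker (illustration only): opens after 3 consecutive timeouts, stays open 15 s.
const int Threshold = 3;
var breakDuration = TimeSpan.FromSeconds(15);
var consecutiveFailures = 0;
DateTimeOffset? openedAt = null;

async Task<int> AuthorizeViaBreakerAsync(Func<Task> bank, DateTimeOffset now)
{
    if (openedAt is { } opened && now - opened < breakDuration)
        return 503;                                    // open: fail fast, bank not called

    try
    {
        await bank();
        consecutiveFailures = 0;
        openedAt = null;
        return 200;
    }
    catch (TimeoutException)
    {
        if (++consecutiveFailures >= Threshold) openedAt = now;
        return 503;
    }
}

// Drill: injected latency makes every bank call time out after ~50 ms.
var bankCalls = 0;
Func<Task> slowBank = async () =>
{
    bankCalls++;
    await Task.Delay(50);
    throw new TimeoutException();
};

var statuses = new List<int>();
var elapsedMs = new List<long>();
var now = DateTimeOffset.UtcNow;
for (var i = 0; i < 5; i++)
{
    var sw = Stopwatch.StartNew();
    statuses.Add(await AuthorizeViaBreakerAsync(slowBank, now));
    elapsedMs.Add(sw.ElapsedMilliseconds);
}

Console.WriteLine($"statuses={string.Join(",", statuses)} bankCalls={bankCalls}");
// prints "statuses=503,503,503,503,503 bankCalls=3"
```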
Success is measured by graceful degradation — the service slows down but never lies or corrupts data.
9 Architect’s review scorecard and quick-reference catalog
9.1 CUPID × production rubric (0–3)
| Principle | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Composable | Hard-coded dependencies | Some modularization | Clear feature modules | Reusable pipelines & modules |
| Unix-like | Multi-purpose service | Mixed concerns | Single domain, coarse-grained | Does one thing extremely well |
| Predictable | Unbounded retries, no timeouts | Basic resilience | Full timeout/retry/circuit breaker | Proven under chaos drills |
| Idiomatic | Framework abuse | Partial patterns | Async & DI correct | Minimal, native constructs |
| Domain-based | Technical naming | Some domain nouns | Consistent domain terms | Bounded contexts, ubiquitous language |
9.2 Production concerns rubric (0–3)
| Concern | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Idempotency | None | Manual dedupe | Key-based store | Verified end-to-end exactly-once effect |
| Backpressure | Unbounded | Thread limits | Channel/Dataflow limits | Proven stability under load |
| Observability | Logs only | Partial metrics | Full OTel traces | Dashboards + exemplars + SLOs |
| Versioning | Breaking changes | Manual branches | Route/header versioning | Automated compatibility testing |
| Resilience | No policies | Basic retry | Timeout + retry + circuit | Policy budget + chaos verified |
| Delivery | Fire-and-forget | Partial outbox | Full outbox/inbox | Reconciliation + DLQ automation |
A mature service consistently scores 2–3 across both rubrics.
9.3 Go/no-go guidance
- CUPID average ≥ 2.5 AND production average ≥ 2.5 → ready for scale.
- Lower of the two averages from 1.5 up to (but not including) 2.5 → limited release / canary.
- Lower of the two averages below 1.5 → refactor before rollout.
Each dimension must be evidenced (code, metrics, test results), not asserted.
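The gate is mechanical once scores exist. A sketch that applies it to the weaker of the two averages, assuming the lower bands are read that way so a strong design cannot mask weak production readiness; the scores below are illustrative inputs, not measurements:

```csharp
using System;
using System.Linq;

// Gate on the weaker of the two averages (assumed reading of the bands above).
string GoNoGo(int[] cupid, int[] production)
{
    var weakest = Math.Min(cupid.Average(), production.Average());
    return weakest >= 2.5 ? "ready for scale"
         : weakest >= 1.5 ? "limited release / canary"
         : "refactor before rollout";
}

// Illustrative scores (0 to 3), in rubric row order.
int[] cupidScores = { 3, 3, 2, 3, 3 };         // Composable, Unix-like, Predictable, Idiomatic, Domain-based
int[] productionScores = { 2, 2, 3, 2, 2, 2 }; // Idempotency, Backpressure, Observability, Versioning, Resilience, Delivery

// CUPID averages 2.8, production 2.17: good design, but operations still lag.
Console.WriteLine(GoNoGo(cupidScores, productionScores)); // prints "limited release / canary"
```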
9.4 Code smells → refactor crib sheet
| Smell | Refactor |
|---|---|
| 1,500-line AddServices | Feature modules + Scrutor scanning |
| Sync-over-async startup | IHostedService initializer |
| Unbounded queues | Channel.CreateBounded or Dataflow |
| Duplicate POST effects | Idempotency middleware |
| Catch-all retries | Polly per-client pipeline |
| Uncorrelated logs | OTel Activity + structured logging |
| Field renames breaking clients | Additive change + deprecation headers |
9.5 Library cheat-sheet
| Category | Library | Defaults |
|---|---|---|
| Resilience | Polly v8 | timeout → retry → breaker → bulkhead |
| Observability | OpenTelemetry .NET | Auto-instrument ASP.NET, HTTP, EF |
| Local dev | .NET Aspire | Inner-loop tracing, metrics, logs |
| DI | Scrutor | Assembly scanning + decorators |
| Validation | FluentValidation | Request + options validation |
| Versioning | Asp.Versioning | Minimal + MVC support |
| Gateway | YARP | Canary routes, per-version routing |
| Messaging | MassTransit / Dapr | Outbox, sagas, integration patterns |
| Streaming | Channels, TPL Dataflow | In-process backpressure |
| Errors | ProblemDetails middleware | RFC 9457 compliant |