The Distributed Transaction Masterclass: 2PC, Saga, and TCC Patterns with Spring Cloud, Temporal, and Apache Seata

1 Introduction: The Inevitable Transaction Problem in Microservices

Every architect who has moved from monoliths to microservices faces the same moment of reckoning: the first time a distributed transaction fails mid-flight. An order gets placed, the payment succeeds, but inventory never updates. The system now contains a ghost order that can’t be fulfilled or refunded. It’s not a bug in a single service — it’s a systemic failure of coordination.

Distributed transactions are where elegant service boundaries meet harsh reality. They are also the best test of whether a team truly understands distributed systems. In this masterclass, we’ll take an end-to-end look at how modern Java-based systems handle this challenge — from the theoretical roots in ACID and 2PC to practical, production-grade implementations using Spring Cloud, Temporal.io, and Apache Seata.

1.1 The “All-or-Nothing” Promise: Why Monolithic ACID Transactions Are a “Paradise Lost”

In a monolithic architecture, ACID transactions make life easy. Within a single relational database, the transaction boundary is clear, the commit is atomic, and consistency is guaranteed. Developers write code like this and sleep soundly:

@Transactional
public void placeOrder(Order order) {
    orderRepository.save(order);
    paymentService.charge(order.getPaymentDetails());
    inventoryService.reserve(order.getItems());
}

As long as everything runs in the same JVM and writes to the same database, the @Transactional annotation wraps all of it in a single, atomic unit of work. If any call throws an unchecked exception (Spring's default rollback rule), everything rolls back automatically.

That’s the “all-or-nothing” paradise — Atomicity, Consistency, Isolation, Durability — perfectly enforced by the database engine.

But as systems evolve, databases split, services specialize, and synchronous calls turn into HTTP or gRPC requests. Suddenly, one business operation touches multiple systems: an order service, a payment gateway, an inventory database, and a shipping microservice. Each has its own schema and transaction boundary.

Now, the simple ACID contract doesn’t hold anymore. The @Transactional annotation no longer protects you across services. If a failure happens halfway through, you’re left with a partial, inconsistent state.

That is the paradise lost — and it’s where distributed transaction patterns begin.

1.2 The New Reality: When a Single Business Process Spans Multiple Services, Databases, and Queues

Let’s look at a real-world workflow: placing an online order.

  1. The customer places an order (Order Service → Database A).
  2. Payment is processed (Payment Service → Database B or external API).
  3. Inventory is updated (Inventory Service → Database C).
  4. A shipping request is created (Shipping Service → Database D + message queue).

Each of these operations is a local transaction within its own context. But together, they must form a single logical business transaction: either all succeed, or none do.

Unfortunately, each hop introduces failure domains:

  • A payment API might be temporarily unavailable.
  • A message might be dropped or duplicated.
  • A downstream service might commit while another fails.

Networks fail, messages are delayed, and retries can cause duplicate events. The naive approach — retry until success — often leads to worse problems like double charging or overselling inventory.
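One common way to make retries safe is to attach an idempotency key to each request and have the receiver remember which keys it has already processed. The following is a minimal in-memory sketch of that idea, not a production design: `PaymentLedger`, `charge`, and the key format are hypothetical names, and a real service would persist the key in the same local transaction as the charge.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory ledger; a real service would store keys durably.
class PaymentLedger {
    private final Map<String, BigDecimal> processed = new HashMap<>();
    private BigDecimal totalCharged = BigDecimal.ZERO;

    /** Returns true if the charge was applied, false if the key was seen before. */
    synchronized boolean charge(String idempotencyKey, BigDecimal amount) {
        if (processed.containsKey(idempotencyKey)) {
            return false; // duplicate delivery or retry: do nothing
        }
        processed.put(idempotencyKey, amount);
        totalCharged = totalCharged.add(amount);
        return true;
    }

    synchronized BigDecimal totalCharged() {
        return totalCharged;
    }
}

class IdempotentChargeDemo {
    public static void main(String[] args) {
        PaymentLedger ledger = new PaymentLedger();
        ledger.charge("order-42:charge", new BigDecimal("100.00"));
        ledger.charge("order-42:charge", new BigDecimal("100.00")); // retried message
        System.out.println(ledger.totalCharged()); // prints 100.00
    }
}
```

With this guard in place, "retry until success" no longer risks a double charge, because the second attempt is recognized and ignored.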

This is why distributed transactions are among the hardest problems in system design. You must guarantee consistency without tight coupling or blocking behavior. The database can’t save you anymore; the logic must move to the application layer.

1.3 The Core Challenge: Consistency Without Coupling

At the architectural level, we want three things simultaneously:

  1. Consistency — business invariants must hold (no double charge, no lost inventory).
  2. Availability — services must remain responsive even if some parts are down.
  3. Scalability and Independence — teams can deploy and evolve services independently.

Unfortunately, you can't maximize all three at once. The CAP theorem makes the sharpest version of this point: under network partitions (which always happen), a distributed system must choose between consistency and availability.

The art of distributed transactions is in balancing these trade-offs.

In traditional ACID systems, consistency is enforced synchronously — the database ensures correctness before returning control. In distributed systems, we often shift to asynchronous consistency — allowing temporary divergence with guarantees that the system will converge to a consistent state later.

This means your transaction logic must not just commit data but also include compensation, retries, and reconciliation — mechanisms that make eventual consistency predictable.

1.4 The Spectrum of Solutions: 2PC, Saga, and Try-Confirm-Cancel (TCC)

Over the decades, three main strategies have emerged for handling distributed transactions:

  1. Two-Phase Commit (2PC): A classic protocol where a central coordinator ensures that all participants either commit or roll back together. It provides strong consistency but introduces blocking behavior and a single point of failure.

  2. Saga Pattern: A long-lived transaction composed of multiple local transactions, each with a corresponding compensating action. Instead of blocking, the system moves forward optimistically and undoes previous steps if something fails. This pattern can be implemented via:

    • Choreography: Event-driven, decentralized control.
    • Orchestration: Centralized coordination through a workflow engine.
  3. Try-Confirm-Cancel (TCC): A more granular, service-level protocol that resembles a two-phase commit implemented in application code. Each participant exposes three methods (try, confirm, and cancel), allowing fine-grained control over resource reservation and compensation.

Each of these patterns represents a trade-off between consistency guarantees, performance, and complexity. The key is knowing which one fits your domain and operational model.

1.5 Our Toolkit: Spring Cloud, Temporal.io, and Apache Seata

To make these concepts tangible, we’ll use modern Java tooling that supports distributed transaction coordination out of the box:

  • Spring Cloud Stream & Spring Boot: Ideal for implementing event-driven Sagas and integrating with message brokers like Kafka or RabbitMQ.

  • Temporal.io: A workflow orchestration platform built for reliability. It transparently persists state, retries failed operations, and offers durable execution semantics — perfect for orchestrated Sagas.

  • Apache Seata: An open-source distributed transaction framework that integrates with Spring Boot and Spring Cloud and supports AT, XA (2PC), TCC, and Saga modes. It brings explicit transactional control to Java microservices through a dedicated transaction coordinator server rather than a general-purpose workflow engine.

By the end of this masterclass, we’ll not only understand these patterns conceptually but implement them in code, showing when and how each approach shines.


2 The Foundation: From Strict ACID to Eventual BASE

Distributed transactions don’t start with code — they start with understanding the shift from ACID to BASE. Before diving into Sagas or TCC, you need a mental model of what’s possible (and what’s not) in a distributed world.

2.1 Why 2PC (and XA) Isn’t the Default

2.1.1 How 2PC Works: The “Prepare” and “Commit” Phases

Two-Phase Commit (2PC) is the canonical solution for ensuring atomicity across multiple resources (like databases or message queues). It introduces a Transaction Coordinator that manages all participants.

The flow looks like this:

  1. Prepare Phase:

    • The coordinator sends a prepare request to each participant.
    • Each participant executes the local transaction but doesn’t commit yet. Instead, it locks resources and writes a “ready to commit” record.
    • If all participants respond “OK,” the coordinator moves to phase two.
  2. Commit Phase:

    • The coordinator sends a commit command.
    • Each participant commits the transaction and releases locks.

If any participant responds with a “NO” or times out, the coordinator sends a rollback to all, and they discard their changes.

This protocol guarantees atomicity — either all commit or none do — but it comes with heavy costs.
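The two phases described above can be sketched in plain Java. This is an illustrative in-memory model only: `Participant`, `TwoPhaseCoordinator`, and `InMemoryParticipant` are hypothetical names, not the XA or JTA API, and the sketch leaves out the durable decision log and timeout handling that a real coordinator needs.

```java
import java.util.List;

// Hypothetical participant contract for illustration; not the XA or JTA API.
interface Participant {
    boolean prepare(); // vote: true = "OK", false = "NO"
    void commit();
    void rollback();
}

class TwoPhaseCoordinator {
    /** Returns true if the transaction committed on every participant. */
    static boolean execute(List<? extends Participant> participants) {
        // Phase 1: collect votes and stop at the first "NO".
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;
                break;
            }
        }
        // Phase 2: commit everywhere, or roll back everywhere.
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }
}

// Toy participant that records its state so the outcome can be inspected.
class InMemoryParticipant implements Participant {
    private final boolean canCommit;
    String state = "INIT";

    InMemoryParticipant(boolean canCommit) {
        this.canCommit = canCommit;
    }

    public boolean prepare() {
        if (canCommit) {
            state = "PREPARED"; // resources would be locked here
        }
        return canCommit;
    }

    public void commit() {
        state = "COMMITTED";
    }

    public void rollback() {
        state = "ROLLED_BACK";
    }
}
```

Note that every participant sits in the PREPARED state, holding its locks, for the whole time between its own vote and the coordinator's decision.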

2.1.2 The Coordinator Bottleneck: Single Point of Failure

In 2PC, the Transaction Coordinator (TC) becomes a critical dependency. If the coordinator crashes after the prepare phase but before commit, all participants remain in limbo, holding locks indefinitely. This “uncertain” state requires manual recovery or heuristic decisions that can break consistency.

Furthermore, because the coordinator must track every participant and their prepared state, it introduces performance and availability risks. Any network partition can freeze the entire transaction.

In distributed environments with many services and databases, the TC can quickly become a scalability bottleneck and operational nightmare.

2.1.3 The Problem with Blocking: Locks Across the Network

The fatal flaw of 2PC in microservices isn’t just coordination — it’s blocking.

When a service “prepares” a transaction, it holds database locks (row or table level) until the commit phase completes. In a system where each phase involves multiple remote calls across the network, these locks can remain open for hundreds of milliseconds or even seconds.

That means:

  • Other transactions can’t modify those rows.
  • Deadlocks and contention skyrocket.
  • Throughput collapses as concurrency increases.

Microservices, which rely on autonomy and speed, can rarely afford to hold locks while waiting for cross-service confirmations. That's why XA and JTA, while correct in theory, are seldom a good fit for high-scale distributed systems.

2.1.4 When to (Rarely) Use It

2PC still has niche use cases where it shines:

  • Homogeneous environments — all participants share the same database technology (e.g., multiple schemas in Oracle).
  • Low-latency networks — within a single data center with strong consistency requirements.
  • Short-lived transactions — where locking cost is negligible.
  • Legacy systems — where XA/JTA integration already exists and the transaction scope is small.

But in modern microservices with polyglot persistence, async messaging, and network variability, 2PC is the exception, not the rule.

2.2 Embracing the BASE Model

If 2PC enforces strict ACID, modern distributed systems operate under a more flexible model known as BASE.

2.2.1 Basically Available, Soft State, Eventually Consistent

BASE is not the opposite of ACID but a pragmatic adaptation for large-scale distributed systems.

  • Basically Available: The system guarantees availability — it responds even under partial failures.

  • Soft State: The system’s state may be temporarily inconsistent or “in flux.”

  • Eventually Consistent: Given enough time and communication, all replicas (or services) will converge to a consistent state.

Instead of enforcing immediate consistency, we design systems that heal themselves — automatically or via compensating actions.

This approach trades strong consistency for higher availability and scalability. The key is that business-level correctness (not data-level atomicity) drives design decisions.

2.2.2 Shifting the Architect Mindset: From Preventing Inconsistency to Managing It

This shift is mostly psychological. As architects, we must accept that inconsistency will happen — delayed messages, retries, partial updates. The goal is not to prevent it but to manage it predictably.

That means:

  • Designing idempotent operations that can safely repeat.
  • Adding compensating transactions that can undo previous actions.
  • Building reconciliation jobs that periodically verify and fix divergence.

For example, in an order system:

  • The Payment service might confirm a charge but fail to notify Order.
  • Later, a reconciliation job detects an order marked “pending” but with a successful charge and triggers fulfillment.

By embracing eventual consistency, we move complexity out of the database and into explicit, controllable workflows.
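The reconciliation example above might look roughly like the sketch below. It is a simplified illustration under stated assumptions: `ReconciliationJob`, the status strings "PENDING" and "PAID", and the in-memory map and set are all hypothetical stand-ins for queries against the Order and Payment services.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReconciliationJob {
    /**
     * Finds orders that are still PENDING although a successful charge exists,
     * marks them PAID, and returns their ids so fulfillment can be triggered.
     *
     * @param orderStatus   orderId -> status as seen by the Order service
     * @param chargedOrders orderIds the Payment service charged successfully
     */
    static List<String> reconcile(Map<String, String> orderStatus, Set<String> chargedOrders) {
        List<String> repaired = new ArrayList<>();
        for (Map.Entry<String, String> entry : orderStatus.entrySet()) {
            boolean pending = "PENDING".equals(entry.getValue());
            if (pending && chargedOrders.contains(entry.getKey())) {
                entry.setValue("PAID"); // heal the divergence
                repaired.add(entry.getKey());
            }
        }
        return repaired;
    }
}
```

Running the job twice is harmless: the second pass finds no PENDING orders with a charge, so it repairs nothing.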

2.2.3 Introducing Compensating Transactions

A compensating transaction is the distributed equivalent of a rollback. But unlike database rollbacks, it’s explicit and asynchronous.

If you reserve inventory and later fail payment, you don’t “rollback” the inventory row — you issue a compensating command like releaseInventory(orderId).

Compensation must be:

  • Idempotent — safe to run multiple times.
  • Logically inverse — semantically undoes the business effect.
  • Observable — visible in logs and monitoring to support auditing.

This idea underpins both Saga and TCC patterns. You can’t rely on the database to undo changes; you must code the undo logic yourself.
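As a rough illustration of an idempotent, logically inverse compensation, the sketch below pairs a reservation with its release. `InventoryStock` and its in-memory bookkeeping are assumptions made for the example; only the `releaseInventory(orderId)` name comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stock for a single product.
class InventoryStock {
    private int available;
    private final Map<String, Integer> reservations = new HashMap<>();

    InventoryStock(int available) {
        this.available = available;
    }

    /** Forward action: reserve stock for an order (safe to retry). */
    synchronized void reserveInventory(String orderId, int quantity) {
        if (reservations.containsKey(orderId)) {
            return; // already reserved for this order
        }
        if (quantity > available) {
            throw new IllegalStateException("Insufficient stock");
        }
        available -= quantity;
        reservations.put(orderId, quantity);
    }

    /** Compensating action: give the reserved stock back (safe to run repeatedly). */
    synchronized void releaseInventory(String orderId) {
        Integer quantity = reservations.remove(orderId);
        if (quantity == null) {
            return; // nothing reserved, or already released
        }
        available += quantity;
    }

    synchronized int available() {
        return available;
    }
}
```

Because the release is keyed by order id, a late or duplicated compensation message cannot return more stock than was originally reserved.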


3 Pattern Deep Dive 1: The Saga Pattern

The Saga pattern is the workhorse of distributed consistency in microservices. It’s elegant, battle-tested, and works well for business workflows that can tolerate eventual consistency.

3.1 The Saga Defined

A Saga is a sequence of local transactions. Each local transaction updates its own database and triggers the next one through an event or orchestrator call.

If one transaction fails, previously completed ones are undone by running their compensating transactions.

Let’s illustrate this with an example:

  1. OrderService creates an order → emits OrderCreated.
  2. PaymentService charges the customer → emits PaymentCompleted.
  3. InventoryService reserves items → emits InventoryReserved.
  4. ShippingService schedules delivery → emits ShippingScheduled.

If step 3 fails (inventory shortage), the saga engine or choreography triggers compensation:

  • PaymentService issues a refund.
  • OrderService marks the order as “Failed.”

Each service owns its data and logic; the saga provides the coordination.

3.2 The Achilles’ Heel: Handling Failure and Compensation

Failure handling is where most naive saga implementations collapse. Imagine this sequence:

  1. Order created successfully.
  2. Payment charged successfully.
  3. Inventory reservation fails.

Without compensation, you’ve charged the customer without shipping the product.

The Saga pattern solves this by defining compensating actions for every forward step:

Step      | Forward Action      | Compensating Action
Order     | Create order record | Cancel order
Payment   | Charge customer     | Refund customer
Inventory | Reserve stock       | Release stock
Shipping  | Schedule shipment   | Cancel shipment

These compensating actions are independent transactions — not true rollbacks — and must be explicitly invoked by the Saga coordinator or through event-driven choreography.
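A minimal way to picture this pairing of forward and compensating actions is a runner that remembers which steps completed and undoes them in reverse order on failure. The sketch below is a simplified in-process model; `SagaStep` and `SagaRunner` are hypothetical names, and a real saga would persist its progress and retry failed compensations.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// One saga step: a forward action plus the action that semantically undoes it.
class SagaStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

class SagaRunner {
    /**
     * Runs the steps in order. If one fails, the compensations of all
     * previously completed steps run in reverse order.
     *
     * @return true if every step succeeded, false if the saga was compensated
     */
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step); // most recent step ends up on top
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

The failed step itself is not compensated here, on the assumption that its local transaction rolled back and left nothing to undo.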

3.3 Approach 1: Choreography (The “Event-Driven” Dance)

3.3.1 How It Works

In choreographed sagas, there’s no central brain. Each service listens for specific events and reacts by performing its local action and emitting the next event.

For example:

// OrderService publishes event
eventPublisher.publish("order.created", new OrderCreatedEvent(orderId));

// PaymentService listens
@StreamListener("order.created")
public void handleOrderCreated(OrderCreatedEvent event) {
    paymentService.charge(event.getOrderId());
    eventPublisher.publish("payment.completed", new PaymentCompletedEvent(event.getOrderId()));
}

The flow emerges organically through event subscriptions — much like microservices dancing in rhythm.

3.3.2 Implementation with Spring Cloud Stream

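Spring Cloud Stream simplifies this model by abstracting message brokers (Kafka, RabbitMQ, etc.) behind bindings. The example below uses the annotation-based model (@EnableBinding and @StreamListener), which was deprecated in Spring Cloud Stream 3.1 and removed in 4.0 in favor of functional bindings:

@EnableBinding(Processor.class) // Processor binds both an input and an output channel
public class PaymentEventHandler {

    private final Processor processor;

    public PaymentEventHandler(Processor processor) {
        this.processor = processor;
    }

    @StreamListener(target = Processor.INPUT, condition = "headers['eventType']=='OrderCreated'")
    public void onOrderCreated(String message) {
        // Perform local transaction
        chargeCustomer(message);
        // Emit next event on the output channel
        processor.output().send(MessageBuilder
            .withPayload(new PaymentCompletedEvent(...))
            .setHeader("eventType", "PaymentCompleted")
            .build());
    }
}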

Each service becomes both a publisher and consumer of domain events. The transaction boundaries are local, but the business workflow spans multiple asynchronous hops.
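To see how a choreographed flow, including its compensation path, emerges purely from subscriptions, here is a small broker-free sketch. `EventBus` is a hypothetical in-memory stand-in for Kafka or RabbitMQ, and every topic name other than order.created and payment.completed is invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical synchronous, in-memory stand-in for a message broker.
class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    final List<String> published = new ArrayList<>(); // trace of every event

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String orderId) {
        published.add(topic);
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(orderId);
        }
    }
}

class ChoreographyDemo {
    /** Wires up the services; no component knows the whole flow. */
    static EventBus wire(boolean inventoryAvailable) {
        EventBus bus = new EventBus();
        // PaymentService: charge the customer, then announce it.
        bus.subscribe("order.created", id -> bus.publish("payment.completed", id));
        // InventoryService: reserve stock or report the failure.
        bus.subscribe("payment.completed", id ->
            bus.publish(inventoryAvailable ? "inventory.reserved" : "inventory.failed", id));
        // PaymentService: compensate by refunding.
        bus.subscribe("inventory.failed", id -> bus.publish("payment.refunded", id));
        // OrderService: mark the order as failed once the refund is out.
        bus.subscribe("payment.refunded", id -> bus.publish("order.failed", id));
        return bus;
    }
}
```

Calling `ChoreographyDemo.wire(false).publish("order.created", "o1")` walks the failure path; the `published` trace is the only place where the full sequence is visible, which hints at the observability problem of this style.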

3.3.3 Pros: Loose Coupling and Scalability

  • Autonomy: Each service owns its state and logic.
  • Scalability: Event queues handle spikes naturally.
  • Resilience: If one service is down, events queue until it recovers.
  • Evolution: Adding new consumers doesn’t break existing flows.

For simple, linear workflows (two or three steps), this model works beautifully.

3.3.4 Cons: The “Distributed Monolith” and Visibility Problem

However, as the number of services grows, choreography becomes chaotic:

  • Hidden dependencies emerge — one event triggers multiple side effects.
  • State tracking is hard — there’s no single view of transaction progress.
  • Failure handling is scattered — compensations must be coordinated across many listeners.
  • Event cycles can appear accidentally (e.g., Payment → Inventory → Order → Payment).

In large systems, choreographed sagas often evolve into tangled event webs — difficult to reason about, debug, and monitor.

That’s where orchestration comes in.

3.4 Approach 2: Orchestration (The “Conductor” Model)

3.4.1 How It Works

An orchestrated saga introduces a central orchestrator — a dedicated service or workflow engine that controls the execution sequence.

Instead of services reacting to events, the orchestrator explicitly invokes each step:

orderService.create(order);
paymentService.charge(orderId);
inventoryService.reserve(orderId);
shippingService.schedule(orderId);

If any call fails, the orchestrator invokes compensating actions in reverse order.

This centralization provides:

  • Explicit control flow.
  • Consistent error handling.
  • Centralized state management.

In modern Java systems, tools like Temporal.io, Camunda, or Seata Saga implement this orchestration elegantly.

3.4.2 Pros: Centralized Logic, Easier Observability

  • Single Source of Truth: The orchestrator maintains state and progress.
  • Simplified Compensation: One place defines how to undo actions.
  • Better Observability: Monitoring, retries, and metrics are centralized.
  • Debuggability: Easier to trace and replay workflows.

Orchestration is particularly effective for long-running or multi-step business processes, like order fulfillment or loan processing.

3.4.3 Cons: The Orchestrator Bottleneck

The main trade-off is introducing another component — the orchestrator — which must be reliable and scalable.

If poorly designed, it becomes a single point of control that every transaction depends on. In practice, modern workflow engines like Temporal mitigate this risk through distributed state persistence and fault-tolerant execution.

Still, orchestration centralizes workflow logic, which means the orchestrator must evolve carefully to avoid tight coupling with business services.


4 Pattern Deep Dive 2: Try-Confirm-Cancel (TCC)

The Try-Confirm-Cancel (TCC) pattern takes a more synchronous approach than Sagas. While Sagas rely on compensation after failure, TCC focuses on reserving resources before confirming a business operation. This pattern is common in financial systems, booking platforms, and inventory management where guarantees around resource availability are non-negotiable.

Think of it as a distributed version of a hotel booking system — you first hold a room (Try), then confirm it when payment succeeds, or cancel it if payment fails. Every participant in the distributed transaction follows this same lifecycle.

Unlike Sagas, TCC doesn’t depend on an event-driven process or workflow engine for compensation; instead, it exposes explicit operations that define how to reserve, confirm, and cancel resources in a controlled manner.

4.1 The “Reservation” Pattern: Guarantee Before Commit

At the heart of TCC is the principle of reservation — don’t commit unless you can guarantee all resources are available. In other words, you don’t transfer money, allocate inventory, or issue a ticket unless you’ve first locked those resources for a short time window.

Imagine a distributed transaction where you transfer $100 from one account to another across two services:

  1. Account A must freeze $100 before the transfer starts.
  2. Account B must verify it can receive funds.
  3. Once both services are ready, the system confirms the transfer.
  4. If anything fails before confirmation, both sides cancel their reservations.

This reservation model reduces uncertainty. You never reach the confirm phase unless all participants have successfully prepared for it. The cost is extra code and stricter service contracts, but in return, you gain near-atomic behavior across services.

In implementation terms, every TCC participant defines three methods (usually in a single service interface):

  • tryXXX()
  • confirmXXX()
  • cancelXXX()

Each method must be idempotent, and together they form a transactional lifecycle.

Here’s a simple structure:

public interface TransferService {
    void tryTransfer(Long transactionId, Long fromAccount, Long toAccount, BigDecimal amount);
    void confirmTransfer(Long transactionId);
    void cancelTransfer(Long transactionId);
}

Each operation explicitly corresponds to one stage of the distributed transaction. The orchestrator (or coordinator, such as Apache Seata) drives these calls across participants.

4.2 The Three-Phase Flow

TCC transactions unfold in three distinct phases that closely mirror the “prepare–commit–rollback” logic from 2PC but implemented explicitly in the application layer.

4.2.1 Try: Reserve the Resource

The Try phase prepares each participant for the transaction. It checks business constraints and locks or reserves the resource so that no other concurrent transaction can interfere.

Key properties of the Try phase:

  • It must not perform the final operation (no actual debit or commit).
  • It must be idempotent (retrying should not cause issues).
  • It should fail only for business-rule violations or unrecoverable system errors; transient problems such as a brief network hiccup should be retried rather than treated as a rejection.

Example: Account service freezing funds during a money transfer.

@Transactional
public void tryDecreaseBalance(Long transactionId, Long accountId, BigDecimal amount) {
    // Idempotency check: a retried Try must not freeze the funds twice
    if (transactionLogRepository.findByTransactionId(transactionId) != null) return;

    Account account = accountRepository.findById(accountId)
        .orElseThrow(() -> new IllegalArgumentException("Account not found"));

    if (account.getAvailableBalance().compareTo(amount) < 0) {
        throw new IllegalStateException("Insufficient funds");
    }

    // Move funds to frozen balance
    account.setAvailableBalance(account.getAvailableBalance().subtract(amount));
    account.setFrozenBalance(account.getFrozenBalance().add(amount));
    accountRepository.save(account);

    transactionLogRepository.save(new TransactionLog(transactionId, "TRY", "SUCCESS"));
}

At this point, $100 is frozen — not yet debited. Other transactions cannot use those funds until either confirm or cancel completes.

4.2.2 Confirm: The Point of No Return

The Confirm phase finalizes the transaction. If all participants have successfully completed their Try phase, the coordinator invokes confirm() on each one.

Each confirm call performs the irreversible action — committing the reserved resource. The confirm operation must also be idempotent; if called twice due to network retries, it should not cause duplicate side effects.

Example: Deducting the frozen balance after all participants succeed.

@Transactional
public void confirmDecreaseBalance(Long transactionId, Long accountId, BigDecimal amount) {
    TransactionLog log = transactionLogRepository.findByTransactionId(transactionId);
    if (log != null && "CONFIRMED".equals(log.getStatus())) return; // idempotent check

    Account account = accountRepository.findById(accountId)
        .orElseThrow(() -> new IllegalArgumentException("Account not found"));

    account.setFrozenBalance(account.getFrozenBalance().subtract(amount));
    accountRepository.save(account);

    transactionLogRepository.updateStatus(transactionId, "CONFIRMED");
}

Once confirmed, the business operation is complete. All participants commit their local changes and release locks.

4.2.3 Cancel: Undoing the Try Phase

If any participant’s Try phase fails, or if the global transaction times out before confirmation, the Cancel phase reverses the reservation.

The Cancel phase releases any frozen resources, ensuring that partial operations don’t linger indefinitely. As with other phases, it must be idempotent — in distributed systems, cancellation messages can arrive late or multiple times.

Example: Unfreezing the funds if a downstream service fails.

@Transactional
public void cancelDecreaseBalance(Long transactionId, Long accountId, BigDecimal amount) {
    TransactionLog log = transactionLogRepository.findByTransactionId(transactionId);
    if (log == null || "CANCELLED".equals(log.getStatus())) return; // idempotent check

    Account account = accountRepository.findById(accountId)
        .orElseThrow(() -> new IllegalArgumentException("Account not found"));

    account.setAvailableBalance(account.getAvailableBalance().add(amount));
    account.setFrozenBalance(account.getFrozenBalance().subtract(amount));
    accountRepository.save(account);

    transactionLogRepository.updateStatus(transactionId, "CANCELLED");
}

Together, Try–Confirm–Cancel form a tight lifecycle that mimics ACID-like safety without central locks or blocking. Every participant owns its own data integrity, and the coordinator simply ensures that all participants either confirm or cancel successfully.

4.3 TCC vs. Saga

TCC and Saga both solve the same class of problems — maintaining consistency across distributed systems — but they approach it from different directions.

In a Saga, compensation happens after an operation commits. For example, after charging a payment, you might later refund it if the order fails. This means some real business side effects occur before failure is detected.

In TCC, compensation happens before commit. You first reserve the resource, ensuring that either all confirms succeed, or all reservations are released.

Let’s illustrate with a side-by-side comparison:

Feature             | Saga                           | TCC
Compensation Target | Completed action               | Reserved action
Isolation           | Eventual (BASE)                | Stronger (resource locked)
Complexity          | Easier to implement            | More invasive per service
Performance         | Async and non-blocking         | Synchronous and chatty
Failure Timing      | Detected after partial commits | Detected before confirmation

For example, in an airline booking system:

  • Saga: The ticket is issued immediately after payment, and if later the seat assignment fails, a refund is processed.
  • TCC: The seat is held first (tryHoldSeat), then payment is confirmed; if anything fails, the seat is released before being officially assigned.

TCC thus provides stronger guarantees at the cost of higher implementation effort. Each service must implement explicit reserve and cancel logic and persist intermediate state. But for domains like banking, trading, or reservations, this overhead is well worth it.

4.4 Ideal Use Case: Financial and Reservation Workflows

TCC fits scenarios where resource reservation or locking is essential before committing the final transaction. Typical examples include:

  • Financial transfers: Debiting and crediting accounts across multiple banks.
  • Booking systems: Reserving hotel rooms, seats, or car rentals.
  • Inventory management: Holding stock before order confirmation.
  • Payment gateways: Ensuring funds availability before authorization.

Consider this simple flow for transferring $100 from Account A to Account B:

  1. Account A → tryDecreaseBalance(100) (freeze $100).
  2. Account B → tryIncreaseBalance(100) (verify account).
  3. Both Try calls succeed → coordinator triggers confirm on both.
  4. If any Try fails → coordinator triggers cancel on both.

The explicit reservation model ensures that once confirmation begins, both sides have already guaranteed success, minimizing rollback complexity.
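The four-step transfer flow above can be sketched with a tiny coordinator and two participants. Everything here is a simplified, in-memory assumption: `TccParticipant`, `TccCoordinator`, `DebitAccount`, and `CreditAccount` are hypothetical names (frameworks such as Seata define their own contracts), amounts are whole dollars, and retrying a failed confirm is left out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical participant contract mirroring try/confirm/cancel.
interface TccParticipant {
    void tryReserve(String txId); // throws to reject the reservation
    void confirm(String txId);
    void cancel(String txId);
}

class TccCoordinator {
    /** Returns true if every participant confirmed, false if the transaction was cancelled. */
    static boolean execute(String txId, List<? extends TccParticipant> participants) {
        try {
            for (TccParticipant p : participants) {
                p.tryReserve(txId);
            }
        } catch (RuntimeException rejected) {
            // Cancel on all sides; cancel is a no-op where Try never took effect.
            for (TccParticipant p : participants) {
                p.cancel(txId);
            }
            return false;
        }
        for (TccParticipant p : participants) {
            p.confirm(txId);
        }
        return true;
    }
}

// Debit side: Try freezes funds, Confirm spends them, Cancel unfreezes them.
class DebitAccount implements TccParticipant {
    long available;
    long frozen;
    private final long amount;
    private final Set<String> open = new HashSet<>();

    DebitAccount(long available, long amount) {
        this.available = available;
        this.amount = amount;
    }

    public void tryReserve(String txId) {
        if (open.contains(txId)) return; // idempotent retry
        if (available < amount) throw new IllegalStateException("Insufficient funds");
        available -= amount;
        frozen += amount;
        open.add(txId);
    }

    public void confirm(String txId) {
        if (open.remove(txId)) frozen -= amount;
    }

    public void cancel(String txId) {
        if (open.remove(txId)) {
            frozen -= amount;
            available += amount;
        }
    }
}

// Credit side: Try only verifies the account, Confirm applies the credit.
class CreditAccount implements TccParticipant {
    long balance;
    private final long amount;
    private final boolean active;
    private final Set<String> open = new HashSet<>();

    CreditAccount(long balance, long amount, boolean active) {
        this.balance = balance;
        this.amount = amount;
        this.active = active;
    }

    public void tryReserve(String txId) {
        if (!active) throw new IllegalStateException("Account cannot receive funds");
        open.add(txId);
    }

    public void confirm(String txId) {
        if (open.remove(txId)) balance += amount;
    }

    public void cancel(String txId) {
        open.remove(txId);
    }
}
```

For example, `TccCoordinator.execute("tx-1", List.of(new DebitAccount(500, 100), new CreditAccount(0, 100, true)))` confirms both sides, while an inactive credit account causes the frozen funds on the debit side to be released again.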


5 Masterclass 1: Orchestrated Sagas with Temporal.io

While TCC handles synchronous, tightly coupled transactions, most real-world workflows are asynchronous, long-running, and need resilient orchestration. Temporal.io brings the concept of durable execution to the Saga pattern — meaning your workflows survive crashes, restarts, and retries without manual recovery logic.

5.1 Why Temporal? Durable Execution Without Plumbing

Traditional orchestrators or BPM engines persist state to a database and rehydrate on resume. Temporal takes a different approach — it persists workflow execution history and replays it to reconstruct state deterministically.

This means you can write workflows in plain Java code as if everything were synchronous and in-memory, while Temporal handles durability and retries under the hood. Because activities can be retried, the activity implementations themselves still need to be idempotent.

Imagine a workflow that spans several services — Order, Payment, Inventory, and Shipping — each of which might take seconds or minutes to respond. In Temporal, you don’t need to manage timers, retry policies, or state persistence manually. The Temporal server ensures that even if your worker crashes mid-execution, the workflow resumes exactly where it left off.
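The idea of resuming "where it left off" by replaying recorded history can be illustrated with a toy model. This is emphatically not Temporal's implementation or API: `ReplayContext`, `step`, and `ReplayDemo` are hypothetical names, and the history is a plain in-memory list standing in for persisted events.

```java
import java.util.List;
import java.util.function.Supplier;

// Toy model of history-based replay; not Temporal's actual implementation.
class ReplayContext {
    private final List<String> history; // recorded results of completed steps
    private int cursor = 0;
    int executions = 0; // how many steps really ran in this attempt

    ReplayContext(List<String> history) {
        this.history = history;
    }

    /** Returns the recorded result if the step already ran; otherwise runs and records it. */
    String step(Supplier<String> activity) {
        if (cursor < history.size()) {
            return history.get(cursor++); // replay: skip the side effect
        }
        String result = activity.get();
        executions++;
        history.add(result);
        cursor++;
        return result;
    }
}

class ReplayDemo {
    // The workflow must issue the same steps in the same order on every run,
    // otherwise recorded results would be matched to the wrong step.
    static String workflow(ReplayContext ctx) {
        String payment = ctx.step(() -> "charged");
        String stock = ctx.step(() -> "reserved");
        return payment + "," + stock;
    }
}
```

If a worker crashed after the first step, a new attempt started from the history ["charged"] would replay the payment result without charging again and would execute only the inventory step. This is also why workflow code has to be deterministic.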

Developers interact with Temporal through:

  • Workflows: Long-lived orchestration logic written in code.
  • Activities: Short-lived service calls or operations (e.g., calling APIs, updating databases).
  • Workers: Processes that execute workflow and activity code.

5.2 The Temporal Model: Workflows, Activities, and Workers

Temporal divides logic into two distinct layers:

  1. Workflow code: Defines the orchestration — the sequence, branching, retries, and compensation logic. Workflow code must be deterministic because Temporal may replay it.

  2. Activity code: Performs the actual work — network calls, database operations, external service calls. Activities can fail, timeout, or be retried automatically.

Example workflow structure in Java:

@WorkflowInterface
public interface OrderWorkflow {
    @WorkflowMethod
    void processOrder(String orderId);
}

Workflow implementation:

public class OrderWorkflowImpl implements OrderWorkflow {

    private final PaymentActivities payment = Workflow.newActivityStub(
        PaymentActivities.class,
        ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofMinutes(2)).build()
    );

    private final InventoryActivities inventory = Workflow.newActivityStub(
        InventoryActivities.class,
        ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofMinutes(2)).build()
    );

    private final ShippingActivities shipping = Workflow.newActivityStub(
        ShippingActivities.class,
        ActivityOptions.newBuilder().setStartToCloseTimeout(Duration.ofMinutes(5)).build()
    );

    @Override
    public void processOrder(String orderId) {
        boolean paymentCharged = false;
        boolean inventoryReserved = false;
        try {
            payment.charge(orderId);
            paymentCharged = true;

            inventory.reserve(orderId);
            inventoryReserved = true;

            shipping.schedule(orderId);
        } catch (Exception e) {
            // Compensate in reverse order of the forward steps
            if (inventoryReserved) inventory.release(orderId);
            if (paymentCharged) payment.refund(orderId);
            throw Workflow.wrap(e);
        }
    }
}

Each ActivityStub represents a remote call that Temporal executes through registered worker processes. The workflow looks synchronous but is actually resilient — if a worker crashes during inventory.reserve(), Temporal retries automatically when it comes back online.

5.3 Real-World Example: E-Commerce Order Fulfillment

Let’s walk through a practical, production-grade workflow.

5.3.1 Workflow (OrderWorkflow)

The orchestrator defines the overall sequence: charge payment → reserve inventory → schedule shipping. It also defines what happens on failure — the compensation logic.

@WorkflowInterface
public interface OrderWorkflow {
    @WorkflowMethod
    void fulfillOrder(String orderId, BigDecimal total);
}

5.3.2 Activities (PaymentService, InventoryService, ShippingService)

Each service exposes activities that perform side effects, implemented with retry-safe, idempotent semantics.

@ActivityInterface
public interface PaymentActivities {
    void chargePayment(String orderId, BigDecimal amount);
    void refundPayment(String orderId);
}

@ActivityInterface
public interface InventoryActivities {
    void reserveInventory(String orderId, List<String> items);
    void releaseInventory(String orderId, List<String> items);
}

@ActivityInterface
public interface ShippingActivities {
    void scheduleShipping(String orderId, String address);
    void cancelShipping(String orderId);
}

These activities are executed by workers, which are regular Java processes registered with the Temporal server.

5.3.3 Code Example

public class OrderWorkflowImpl implements OrderWorkflow {

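    // Activity stubs need a timeout; the two-minute value is illustrative
    private final ActivityOptions options = ActivityOptions.newBuilder()
        .setStartToCloseTimeout(Duration.ofMinutes(2))
        .build();

    private final PaymentActivities payment = Workflow.newActivityStub(PaymentActivities.class, options);
    private final InventoryActivities inventory = Workflow.newActivityStub(InventoryActivities.class, options);
    private final ShippingActivities shipping = Workflow.newActivityStub(ShippingActivities.class, options);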

    @Override
    public void fulfillOrder(String orderId, BigDecimal total) {
        boolean paymentDone = false;
        boolean inventoryDone = false;

        try {
            payment.chargePayment(orderId, total);
            paymentDone = true;

            inventory.reserveInventory(orderId, List.of("item-123", "item-456"));
            inventoryDone = true;

            shipping.scheduleShipping(orderId, "123 Main Street");
        } catch (ActivityFailure e) {
            if (inventoryDone) inventory.releaseInventory(orderId, List.of("item-123", "item-456"));
            if (paymentDone) payment.refundPayment(orderId);
            throw e;
        }
    }
}

Temporal automatically records the full execution history, including retries, results, and compensation steps. There’s no need to persist custom workflow states — Temporal does it transparently.
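The boolean flags in the workflow above work for three steps but grow awkward as steps are added. A common alternative is to register each undo action as soon as its forward step succeeds and then run the registered actions newest-first. The following is a minimal, framework-free sketch of that idea; CompensationStack and the simulated steps are hypothetical names, not Temporal APIs. The Temporal Java SDK ships its own Saga helper with a similar register-then-compensate shape, so check its documentation before rolling your own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: collects undo actions and runs them newest-first. */
class CompensationStack {
    private final Deque<Runnable> undoActions = new ArrayDeque<>();

    /** Register the undo action right after its forward step succeeds. */
    void push(Runnable undo) {
        undoActions.push(undo);
    }

    /** Run every registered undo action in reverse order of registration. */
    void compensate() {
        while (!undoActions.isEmpty()) {
            undoActions.pop().run();
        }
    }
}

class CompensationStackDemo {
    /** Simulates charge, then reserve, then a shipping step that fails. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        CompensationStack saga = new CompensationStack();
        try {
            log.add("charge");
            saga.push(() -> log.add("refund"));

            log.add("reserve");
            saga.push(() -> log.add("release"));

            // Simulated failure of the shipping step
            throw new IllegalStateException("shipping unavailable");
        } catch (IllegalStateException e) {
            saga.compensate();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [charge, reserve, release, refund]
    }
}
```

Because the stack only ever contains undo actions for steps that actually completed, the catch block needs no per-step bookkeeping.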

5.3.4 The “Magic”: Reliability by Design

Temporal ensures “durable execution.” If the worker crashes during scheduleShipping, another worker (or the same one after a restart) replays the recorded event history to rebuild the workflow’s state, skips the activities that already completed, and resumes at scheduleShipping.
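To see why replay does not repeat side effects, it helps to model the idea in a few lines. The sketch below is a toy, not Temporal’s implementation: ReplayContext, its history list, and the step results are all hypothetical. Completed step results are kept in a list that outlives the crashed run, and a second run reads from that list before executing anything new.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Toy model of durable execution: results of completed steps live in a history list. */
class ReplayContext {
    private final List<String> history; // survives a simulated crash
    private int position = 0;           // next history slot this run will read
    int executions = 0;                 // how many steps really ran in this run

    ReplayContext(List<String> history) {
        this.history = history;
    }

    /** Returns the recorded result if the step already completed, else runs and records it. */
    String activity(Supplier<String> work) {
        if (position < history.size()) {
            return history.get(position++); // replay: the side effect is not repeated
        }
        String result = work.get();
        executions++;
        history.add(result);
        position++;
        return result;
    }
}

class ReplayDemo {
    /** The "workflow": deterministic code that calls the same steps in the same order. */
    static String workflow(ReplayContext ctx, boolean crashBeforeShipping) {
        String payment = ctx.activity(() -> "charged");
        String stock = ctx.activity(() -> "reserved");
        if (crashBeforeShipping) {
            throw new IllegalStateException("worker crashed");
        }
        String shipping = ctx.activity(() -> "scheduled");
        return payment + "," + stock + "," + shipping;
    }

    /** Runs once with a crash, then again on the same history; returns {result, stepsRunOnResume}. */
    static Object[] crashAndResume() {
        List<String> history = new ArrayList<>();
        try {
            workflow(new ReplayContext(history), true);
        } catch (IllegalStateException expected) {
            // the two recorded results outlive the crashed run
        }
        ReplayContext second = new ReplayContext(history);
        String result = workflow(second, false);
        return new Object[] {result, second.executions};
    }
}
```

On the second run only the shipping step executes; payment and reservation are answered from history. This is also why workflow code must be deterministic: if the second run called steps in a different order, the recorded results would no longer line up.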

Retry policies, backoff intervals, and timeouts are declared as configuration rather than hand-written loops:

ActivityOptions options = ActivityOptions.newBuilder()
    .setStartToCloseTimeout(Duration.ofMinutes(2)) // a timeout is required; the value is illustrative
    .setRetryOptions(RetryOptions.newBuilder()
        .setMaximumAttempts(5)
        .setInitialInterval(Duration.ofSeconds(2))
        .setMaximumInterval(Duration.ofMinutes(1))
        .build())
    .build();

This eliminates boilerplate retry loops or manual state persistence code.
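It can help to work out what such a policy means in wall-clock terms. The sketch below computes the waits between attempts for the values in the policy above, assuming exponential backoff with a coefficient of 2.0 (the coefficient is an assumption of this sketch; confirm the value your SDK version applies). BackoffSchedule is a hypothetical helper, not part of Temporal.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the waits implied by an exponential retry policy. */
class BackoffSchedule {
    /** Waits between attempts: initial * coefficient^n, capped at max; maxAttempts - 1 waits in total. */
    static List<Duration> waits(Duration initial, double coefficient, Duration max, int maxAttempts) {
        List<Duration> result = new ArrayList<>();
        double nextMillis = initial.toMillis();
        for (int retry = 1; retry < maxAttempts; retry++) {
            long capped = (long) Math.min(nextMillis, max.toMillis());
            result.add(Duration.ofMillis(capped));
            nextMillis *= coefficient;
        }
        return result;
    }

    public static void main(String[] args) {
        // 5 attempts, 2 s initial interval, 1 min cap; coefficient 2.0 is assumed
        System.out.println(waits(Duration.ofSeconds(2), 2.0, Duration.ofMinutes(1), 5));
        // prints [PT2S, PT4S, PT8S, PT16S]
    }
}
```

Under these assumptions the activity gives up about 30 seconds after the first failure, which is worth comparing against the activity timeout and against how long the downstream service typically takes to recover.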

5.3.5 State Management Simplified

In legacy orchestration systems, you’d maintain a workflow state table and resume manually. In Temporal, your Java variables are the state: Temporal rebuilds them deterministically by replaying the event history.

That means:

  • No application-managed table for workflow progress (the Temporal server persists the event history in its own store).
  • Automatic handling of worker restarts.
  • Seamless compensation and retry without lost progress.

With Temporal, orchestrated Sagas become readable, debuggable, and fault-tolerant without the traditional complexity of BPM engines or message-driven state tracking.


6 Masterclass 2: TCC & Sagas with Apache Seata

Apache Seata brings distributed transaction management directly into the Java ecosystem. While frameworks like Temporal focus on orchestration at the workflow level, Seata sits lower in the stack — coordinating resource-level consistency across microservices and databases. It’s especially useful in Spring Cloud architectures, where you already have microservices communicating over HTTP or RPC but need transaction integrity across them.

With built-in support for TCC, Saga, and AT (Automatic Transaction) modes, Seata provides the flexibility to match transaction strategy to the problem. You can treat it as a “distributed transaction manager for microservices,” seamlessly integrated into your existing Spring Boot applications.

6.1 What is Apache Seata?

Apache Seata (originally from Alibaba’s Fescar project) is a distributed transaction framework designed for cloud-native, microservice-based applications. It abstracts away the complexity of coordinating local transactions across services by managing them under a global transaction context.

Seata works by intercepting data source operations (for AT mode) or by explicitly marking business actions (for TCC and Saga modes). It ensures all participants of a distributed transaction either confirm successfully or cancel consistently — similar to the logic we discussed earlier in the TCC pattern.

A typical Seata setup includes:

  • Spring Boot microservices that act as participants.
  • Seata Server that functions as the global coordinator.
  • Database resource managers integrated with Seata’s data source proxy.

Its design aligns naturally with the Spring Cloud ecosystem, allowing you to start a global transaction using simple annotations and let Seata handle the heavy lifting of coordination, logging, and rollback.

6.2 Seata’s Architecture: TC, TM, and RM

Seata uses a well-defined three-component architecture to separate responsibilities and maintain scalability. These components interact through RPC within the same logical transaction.

  1. Transaction Coordinator (TC): The TC is a standalone Seata server process. It keeps track of global transaction states, coordinates commit and rollback, and ensures all participants reach a consistent outcome. It’s effectively the “brain” of Seata.

  2. Transaction Manager (TM): The TM lives inside the application that starts a global transaction. It begins the transaction by requesting a Global Transaction ID (XID) from the TC, and it tells the TC when to commit or roll back.

  3. Resource Manager (RM): Each service that participates in the transaction hosts an RM. The RM manages local database resources and communicates with the TC to register branches, report status, and perform commits or rollbacks.

In short:

  • TM starts and ends the transaction.
  • RM executes local operations.
  • TC coordinates everyone.

When an application starts a global transaction, Seata injects an XID into downstream service calls. Each participant uses this ID to join the global transaction. At completion, the TM reports back to the TC, which issues a global commit or rollback across all RMs.
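The division of labor between the three roles can be sketched without any Seata dependency. Everything in the following block is hypothetical (ToyCoordinator, Branch, and the demo names are not Seata classes), and it leaves out persistence, timeouts, and networking; it only shows who calls whom: the TM begins and ends the transaction, RMs register branches under the XID, and the TC fans out the final decision.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** What an RM exposes to the coordinator for one branch (hypothetical interface). */
interface Branch {
    void commit();
    void rollback();
}

/** Toy Transaction Coordinator: tracks the branches registered under each XID. */
class ToyCoordinator {
    private final Map<String, List<Branch>> branchesByXid = new HashMap<>();

    /** Called by the TM to open a global transaction. */
    String begin() {
        String xid = UUID.randomUUID().toString();
        branchesByXid.put(xid, new ArrayList<>());
        return xid;
    }

    /** Called by an RM when it joins the global transaction identified by the XID. */
    void registerBranch(String xid, Branch branch) {
        branchesByXid.get(xid).add(branch);
    }

    /** Called by the TM: drive every registered branch to commit. */
    void commit(String xid) {
        branchesByXid.remove(xid).forEach(Branch::commit);
    }

    /** Called by the TM: drive every registered branch to roll back. */
    void rollback(String xid) {
        branchesByXid.remove(xid).forEach(Branch::rollback);
    }
}

class ToyGlobalTransactionDemo {
    /** Plays the TM role; returns the phase-two calls the coordinator issued. */
    static List<String> run(boolean businessFails) {
        List<String> calls = new ArrayList<>();
        ToyCoordinator tc = new ToyCoordinator();
        String xid = tc.begin();
        try {
            tc.registerBranch(xid, branch("order", calls));
            tc.registerBranch(xid, branch("account", calls));
            if (businessFails) {
                throw new IllegalStateException("insufficient funds");
            }
            tc.commit(xid);
        } catch (IllegalStateException e) {
            tc.rollback(xid);
        }
        return calls;
    }

    private static Branch branch(String name, List<String> calls) {
        return new Branch() {
            public void commit() { calls.add(name + ":commit"); }
            public void rollback() { calls.add(name + ":rollback"); }
        };
    }
}
```

Calling run(false) yields a commit for both branches and run(true) yields a rollback for both, which mirrors the TM reporting the outcome and the TC applying it to every RM.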

6.3 Real-World Example: Financial Fund Transfer (The TCC Sweet-Spot)

The financial transfer use case is the perfect example of Seata’s TCC mode — explicit reservation and confirmation at the resource level.

6.3.1 Goal: Transfer $100 from AccountA to AccountB

We’ll implement two services:

  • AccountServiceA: Manages withdrawals from Account A.
  • AccountServiceB: Manages deposits into Account B.

Both participate in the same global transaction managed by Seata’s TC. The TM initiates the global transaction; Seata propagates it across the participating RMs through the transaction context.

6.3.2 Defining the TCC Interface with @TwoPhaseBusinessAction

In Seata, the TCC pattern is implemented through annotated business interfaces. The @TwoPhaseBusinessAction annotation marks the try method and names the confirm and cancel methods that complete the three TCC phases.

Here’s what the interface might look like for the transfer operation:

@LocalTCC
public interface AccountTccAction {

    // Parameters annotated with @BusinessActionContextParameter are stored in the
    // BusinessActionContext so the confirm and cancel methods can read them back.
    @TwoPhaseBusinessAction(name = "transferAction", commitMethod = "confirmTransfer", rollbackMethod = "cancelTransfer")
    void tryTransfer(BusinessActionContext context,
                     @BusinessActionContextParameter(paramName = "accountId") String accountId,
                     @BusinessActionContextParameter(paramName = "amount") BigDecimal amount);

    void confirmTransfer(BusinessActionContext context);

    void cancelTransfer(BusinessActionContext context);
}

  • tryTransfer: Performs the resource reservation logic.
  • confirmTransfer: Commits the reserved operation.
  • cancelTransfer: Reverses the reservation if anything fails.

BusinessActionContext carries the transaction context (including the XID). Parameters annotated with @BusinessActionContextParameter on the try method are stored in it, which is how confirmTransfer and cancelTransfer read accountId and amount through getActionContext.

6.3.3 Service A (AccountA) Implementation

Account A is responsible for freezing $100 in the tryTransfer phase, debiting it in confirmTransfer, and releasing it in cancelTransfer.

@Component
public class AccountATccActionImpl implements AccountTccAction {

    @Autowired
    private AccountRepository accountRepository;

    @Override
    public void tryTransfer(BusinessActionContext context, String accountId, BigDecimal amount) {
        Account account = accountRepository.findById(accountId)
            .orElseThrow(() -> new IllegalArgumentException("Account not found"));
        if (account.getAvailableBalance().compareTo(amount) < 0) {
            throw new IllegalStateException("Insufficient funds");
        }

        account.setAvailableBalance(account.getAvailableBalance().subtract(amount));
        account.setFrozenBalance(account.getFrozenBalance().add(amount));
        accountRepository.save(account);
    }

    @Override
    public void confirmTransfer(BusinessActionContext context) {
        String accountId = (String) context.getActionContext("accountId");
        BigDecimal amount = new BigDecimal(context.getActionContext("amount").toString());

        Account account = accountRepository.findById(accountId)
            .orElseThrow(() -> new IllegalArgumentException("Account not found"));
        account.setFrozenBalance(account.getFrozenBalance().subtract(amount));
        accountRepository.save(account);
    }

    @Override
    public void cancelTransfer(BusinessActionContext context) {
        String accountId = (String) context.getActionContext("accountId");
        BigDecimal amount = new BigDecimal(context.getActionContext("amount").toString());

        Account account = accountRepository.findById(accountId)
            .orElseThrow(() -> new IllegalArgumentException("Account not found"));
        account.setAvailableBalance(account.getAvailableBalance().add(amount));
        account.setFrozenBalance(account.getFrozenBalance().subtract(amount));
        accountRepository.save(account);
    }
}

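Here, each operation corresponds directly to one of the TCC phases we covered earlier. Seata invokes the confirm or cancel method depending on the global transaction outcome. Because the TC can retry those calls, production implementations also need the idempotency checks described in section 7.1.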
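The implementation above assumes each phase runs exactly once and in order. In practice a TCC participant also has to tolerate three awkward cases: a repeated confirm or cancel, a cancel that arrives although try never ran (an empty rollback), and a try that arrives after its cancel (a hanging try). The following in-memory sketch shows one way to guard against all three by recording a status per XID. GuardedAccount and its status map are hypothetical; a real service would keep this record in the same database transaction as the balance change. Recent Seata releases document a useTCCFence option on @TwoPhaseBusinessAction aimed at the same cases, so check the documentation for your version before hand-rolling this.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for one account row plus a per-XID status log (both hypothetical). */
class GuardedAccount {
    BigDecimal available;
    BigDecimal frozen = BigDecimal.ZERO;
    private final Map<String, String> statusByXid = new HashMap<>(); // TRIED, CONFIRMED, CANCELLED
    private final Map<String, BigDecimal> amountByXid = new HashMap<>();

    GuardedAccount(BigDecimal openingBalance) {
        this.available = openingBalance;
    }

    /** Try: freeze funds once per XID; refuse if a cancel already arrived for it. */
    synchronized boolean tryDebit(String xid, BigDecimal amount) {
        String status = statusByXid.get(xid);
        if ("CANCELLED".equals(status)) return false; // hanging try: cancel came first
        if (status != null) return true;              // duplicate try: already reserved
        if (available.compareTo(amount) < 0) return false;
        available = available.subtract(amount);
        frozen = frozen.add(amount);
        statusByXid.put(xid, "TRIED");
        amountByXid.put(xid, amount);
        return true;
    }

    /** Confirm: spend the frozen funds; a repeated confirm is a no-op. */
    synchronized void confirm(String xid) {
        if (!"TRIED".equals(statusByXid.get(xid))) return;
        frozen = frozen.subtract(amountByXid.get(xid));
        statusByXid.put(xid, "CONFIRMED");
    }

    /** Cancel: release the frozen funds; tolerate repeats and cancels with no prior try. */
    synchronized void cancel(String xid) {
        String status = statusByXid.get(xid);
        if (status == null) {
            statusByXid.put(xid, "CANCELLED"); // empty rollback: remember it so a late try is rejected
            return;
        }
        if (!"TRIED".equals(status)) return;   // already confirmed or cancelled
        BigDecimal amount = amountByXid.get(xid);
        frozen = frozen.subtract(amount);
        available = available.add(amount);
        statusByXid.put(xid, "CANCELLED");
    }
}
```

With an opening balance of 1000, two tryDebit calls for the same XID freeze 100 only once, two confirm calls debit it only once, and a cancel for an unknown XID changes no balance but blocks a later try with that XID.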

6.3.4 Service B (AccountB) Implementation

On the receiving side, the logic is simpler — Account B primarily confirms credit after the transaction succeeds.

@Component
public class AccountBTccActionImpl implements AccountTccAction {

    @Autowired
    private AccountRepository accountRepository;

    @Override
    public void tryTransfer(BusinessActionContext context, String accountId, BigDecimal amount) {
        // Optionally verify account exists
        if (!accountRepository.existsById(accountId)) {
            throw new IllegalArgumentException("Account not found");
        }
    }

    @Override
    public void confirmTransfer(BusinessActionContext context) {
        String accountId = (String) context.getActionContext("accountId");
        BigDecimal amount = new BigDecimal(context.getActionContext("amount").toString());

        Account account = accountRepository.findById(accountId)
            .orElseThrow(() -> new IllegalArgumentException("Account not found"));
        account.setAvailableBalance(account.getAvailableBalance().add(amount));
        accountRepository.save(account);
    }

    @Override
    public void cancelTransfer(BusinessActionContext context) {
        // No action needed since tryTransfer didn’t change any data
    }
}

6.3.5 The Coordinator’s Role

When the TM starts a global transaction, Seata assigns an XID and tracks every @TwoPhaseBusinessAction participant. Each participant’s Try phase registers a branch transaction with the TC.

Once all Try operations succeed:

  • The TM notifies the TC to commit.
  • The TC instructs each RM to execute the corresponding confirm method.
  • If any Try fails or a timeout occurs, the TC instead instructs cancel.

This process is entirely automatic. Developers only define the local logic for each phase — Seata handles registration, context propagation, and distributed commit or rollback.

6.4 Seata’s Other Modes

TCC is explicit and fine-grained, but Seata also supports alternative transaction models that fit other consistency needs.

6.4.1 AT Mode: Automatic SQL-Level Compensation

In AT mode, Seata operates transparently by intercepting SQL statements at the JDBC level through its data source proxy. It captures the before and after state of the modified rows, logs it in a local undo_log table, and reverts the change automatically if a rollback occurs.

You don’t need to define Try/Confirm/Cancel logic — Seata does it by analyzing SQL.

Example: when this method runs inside a Seata global transaction (typically opened by a caller annotated with @GlobalTransactional):

@Transactional
public void deduct(String accountId, BigDecimal amount) {
    jdbcTemplate.update("UPDATE account SET balance = balance - ? WHERE id = ?", amount, accountId);
}

Seata’s data source proxy automatically records the before-image (old balance) and after-image (new balance). If the transaction fails globally, it uses the before-image to restore consistency.
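The before-image and after-image mechanism can be illustrated with an in-memory table. This is a toy model under stated assumptions, not Seata’s code: ToyUndoLog, its map-backed table, and the one-record-per-row simplification are all hypothetical. It shows the three moves: record both images when the update is applied, drop the record on commit, and on rollback restore the before-image only if the row still matches the after-image.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

/** Toy model of AT-style undo logging over an in-memory table (account id -> balance). */
class ToyUndoLog {
    /** One undo record: the row value before and after the branch's update. */
    private static final class Images {
        final BigDecimal before;
        final BigDecimal after;

        Images(BigDecimal before, BigDecimal after) {
            this.before = before;
            this.after = after;
        }
    }

    final Map<String, BigDecimal> accountTable = new HashMap<>();
    private final Map<String, Images> undoByRow = new HashMap<>();

    /** Phase one: apply the update and store the before/after images alongside it. */
    void deduct(String accountId, BigDecimal amount) {
        BigDecimal before = accountTable.get(accountId);
        BigDecimal after = before.subtract(amount);
        accountTable.put(accountId, after);
        undoByRow.put(accountId, new Images(before, after));
    }

    /** Phase two, commit: the data is already in place, so only the undo record is dropped. */
    void commit(String accountId) {
        undoByRow.remove(accountId);
    }

    /** Phase two, rollback: restore the before-image unless the row changed in the meantime. */
    boolean rollback(String accountId) {
        Images images = undoByRow.get(accountId);
        if (images == null) return true; // nothing to undo
        if (accountTable.get(accountId).compareTo(images.after) != 0) {
            return false;                // row no longer matches the after-image; do not overwrite
        }
        accountTable.put(accountId, images.before);
        undoByRow.remove(accountId);
        return true;
    }
}
```

The comparison against the after-image is the important design choice: blindly writing back the before-image would silently erase any change another writer made after phase one.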

Pros:

  • Extremely simple integration — minimal code changes.
  • Great for straightforward CRUD operations.

Cons:

  • Doesn’t handle complex SQL or non-database resources.
  • Not suitable for long-running or business-level workflows.

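In essence, AT mode is Seata’s “XA-lite” solution: atomic commit or rollback across databases with minimal developer effort. Each branch commits locally in phase one, so the result is eventual rather than strict consistency, and other readers may see intermediate state until the global transaction finishes.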

6.4.2 Saga Mode: State Machine Orchestration

Seata’s Saga mode is conceptually similar to Temporal but lighter weight. It uses a state machine definition (a JSON document) to describe each step of a business process and its corresponding compensating action.

Simplified outline of a Saga definition (illustrative shorthand, not Seata’s exact JSON schema):

Name: orderSaga
Steps:
  - Name: createOrder
    ServiceName: orderService
    Method: create
    CompensationMethod: cancel
  - Name: reserveInventory
    ServiceName: inventoryService
    Method: reserve
    CompensationMethod: release
  - Name: chargePayment
    ServiceName: paymentService
    Method: charge
    CompensationMethod: refund

The Seata engine executes these steps sequentially, managing retries and compensations automatically.

This mode is ideal for asynchronous or long-running workflows that require compensation but don’t need Temporal’s durability model. You define your workflow declaratively and let Seata manage sequencing and rollback.
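What “declarative” buys you is that the engine, not your business code, owns sequencing and rollback. The sketch below interprets a step list shaped like the outline above: each step names an action and a compensation, and the engine resolves those names against a handler registry. It is a toy under stated assumptions; StepDefinition, ToySagaEngine, and the handler keys are hypothetical and do not reflect Seata’s state machine engine or its schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One declarative step: names only, resolved against a handler registry at run time. */
final class StepDefinition {
    final String name;
    final String action;
    final String compensation;

    StepDefinition(String name, String action, String compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

class ToySagaEngine {
    private final Map<String, Runnable> handlers;

    ToySagaEngine(Map<String, Runnable> handlers) {
        this.handlers = handlers;
    }

    /** Runs steps in order; on failure, compensates completed steps in reverse. Returns success. */
    boolean execute(List<StepDefinition> steps) {
        List<StepDefinition> completed = new ArrayList<>();
        for (StepDefinition step : steps) {
            try {
                handlers.get(step.action).run();
                completed.add(step);
            } catch (RuntimeException e) {
                for (int i = completed.size() - 1; i >= 0; i--) {
                    handlers.get(completed.get(i).compensation).run();
                }
                return false;
            }
        }
        return true;
    }
}

class ToySagaDemo {
    /** Runs the three-step order saga with a payment step that fails. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Map<String, Runnable> handlers = new HashMap<>();
        handlers.put("orderService.create", () -> log.add("create"));
        handlers.put("orderService.cancel", () -> log.add("cancel"));
        handlers.put("inventoryService.reserve", () -> log.add("reserve"));
        handlers.put("inventoryService.release", () -> log.add("release"));
        handlers.put("paymentService.charge", () -> { throw new IllegalStateException("card declined"); });
        handlers.put("paymentService.refund", () -> log.add("refund"));

        List<StepDefinition> orderSaga = List.of(
            new StepDefinition("createOrder", "orderService.create", "orderService.cancel"),
            new StepDefinition("reserveInventory", "inventoryService.reserve", "inventoryService.release"),
            new StepDefinition("chargePayment", "paymentService.charge", "paymentService.refund"));

        boolean committed = new ToySagaEngine(handlers).execute(orderSaga);
        log.add(committed ? "COMMITTED" : "COMPENSATED");
        return log;
    }
}
```

Here the failed charge triggers release and then cancel, and refund never runs because the charge itself never completed.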


7 Handling the “Impossible”: Idempotency and Observability

Even the best transaction coordinator can’t save you if your services aren’t idempotent or observable. Distributed systems fail in unpredictable ways — retries happen, messages duplicate, and compensations trigger twice. The key to surviving this chaos is to design every step of a distributed transaction so it can safely repeat and always tell you where it is.

7.1 The Golden Rule: Idempotency

7.1.1 Why Every Step Must Be Idempotent

In distributed transactions, retries aren’t exceptions — they’re the norm. When a coordinator or network times out, it can’t distinguish between “failure” and “slow response.” To stay safe, it retries the operation.

If your service doesn’t handle duplicates gracefully, you’ll end up with double charges, extra reservations, or inconsistent states. Idempotency ensures that repeating the same request produces the same final result, no matter how many times it’s retried.

In Seata’s TCC or Temporal Sagas, every confirm and cancel action must therefore check whether it has already executed. A simple pattern is to record transaction status in a local table or cache keyed by the transaction ID (XID or workflow ID).

Example:

public void confirmPayment(String orderId) {
    if (paymentLog.exists(orderId, "CONFIRMED")) return; // Already confirmed
    // ...perform confirmation logic
    paymentLog.save(orderId, "CONFIRMED");
}

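This small guard prevents catastrophic duplication under retry storms, provided the check and the write are atomic (for example, inside the same local transaction or backed by a unique constraint on the order ID and status).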
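One caveat with the guard above: the exists check and the later save are two separate steps, so two concurrent retries can both pass the check. A minimal sketch of closing that gap is to make “check and claim” a single atomic operation. The version below uses an in-memory ConcurrentHashMap as a stand-in for a status table with a unique key; IdempotentConfirmer and its counter are hypothetical names for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical confirm handler whose "already done?" check and claim are one atomic step. */
class IdempotentConfirmer {
    private final ConcurrentMap<String, String> statusByOrder = new ConcurrentHashMap<>();
    final AtomicInteger sideEffects = new AtomicInteger(); // stands in for the real confirmation work

    /** Returns true only for the call that actually performed the confirmation. */
    boolean confirmPayment(String orderId) {
        // putIfAbsent returns null only for the first caller, so exactly one caller proceeds
        if (statusByOrder.putIfAbsent(orderId, "CONFIRMED") != null) {
            return false;
        }
        sideEffects.incrementAndGet();
        return true;
    }
}

class IdempotentConfirmerDemo {
    /** Fires the same confirmation from several threads and reports how often the work ran. */
    static int confirmConcurrently(int threads) {
        IdempotentConfirmer confirmer = new IdempotentConfirmer();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> confirmer.confirmPayment("order-42"));
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return confirmer.sideEffects.get();
    }
}
```

In a database-backed service the equivalent is inserting the status row under a unique constraint in the same local transaction as the business update, so a failed update also rolls back the claim.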

7.1.2 Techniques for Ensuring Idempotency

Common techniques include:

  1. Natural Business Keys: Use stable identifiers like orderId or paymentId to detect duplicates. Instead of inserting new records, use upserts or idempotent SQL statements such as this MySQL example:

    INSERT INTO payment(order_id, status) VALUES (?, ?)
    ON DUPLICATE KEY UPDATE status = VALUES(status);

  2. Request ID Tracking: Generate a unique requestId per business operation and persist it in a log table. Any repeated requests with the same ID are ignored.

  3. Database Constraints: Use unique keys or conditional updates (UPDATE ... WHERE status = 'PENDING') to enforce single execution at the database level.

  4. Message Deduplication: In event-driven systems, maintain a message log table keyed by message ID to prevent reprocessing already-consumed messages.

Idempotency isn’t optional — it’s foundational. Without it, retries and compensations become data corruption events.

7.2 “Where Did My Transaction Go?”

7.2.1 The Observability Nightmare of Choreographed Sagas

In event-driven Sagas, each service only knows its own state. When something fails midway, tracing what happened across services can be excruciating. You might have hundreds of logs with timestamps and correlation IDs but no single view of the end-to-end flow.

Without observability, distributed transactions feel like black boxes — you only notice failures when users complain.

7.2.2 The Solution: Distributed Tracing with OpenTelemetry

Distributed tracing solves this visibility gap. Tools like OpenTelemetry, Jaeger, and Zipkin let you visualize every step of a transaction across services.

By instrumenting your code with trace spans, you can see:

  • Which services participated.
  • How long each operation took.
  • Where failures or retries occurred.

In a Spring Boot 3 microservice with Micrometer Tracing and an OTLP exporter on the classpath, you can enable tracing with a few lines of configuration:

management:
  tracing:
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://jaeger:4318/v1/traces

Once enabled, each incoming HTTP or gRPC request carries a trace ID (in the W3C traceparent header by default), and instrumented clients propagate it through outgoing service calls and message headers.

7.2.3 Propagating Correlation IDs Across Services

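To make tracing effective, every service must pass along a shared identifier (like a Correlation-ID or Trace-ID). With Spring’s RestTemplate, this can be done using a request interceptor registered on the client:

import java.io.IOException;
import org.slf4j.MDC;
import org.springframework.http.HttpRequest;
import org.springframework.http.client.ClientHttpRequestExecution;
import org.springframework.http.client.ClientHttpRequestInterceptor;
import org.springframework.http.client.ClientHttpResponse;
import org.springframework.stereotype.Component;

@Component
public class CorrelationInterceptor implements ClientHttpRequestInterceptor {
    @Override
    public ClientHttpResponse intercept(HttpRequest request, byte[] body, ClientHttpRequestExecution execution)
            throws IOException {
        // Copy the current trace ID from the logging context into an outgoing header
        String traceId = MDC.get("traceId");
        if (traceId != null) {
            request.getHeaders().set("X-Correlation-ID", traceId);
        }
        return execution.execute(request, body);
    }
}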

Downstream services log the same ID, allowing full trace reconstruction.

7.2.4 Visualizing a Complete Saga

Once tracing is in place, tools like Jaeger or Zipkin provide visual timelines showing how a transaction flows through each microservice.

You can spot bottlenecks, failed retries, and compensation cascades immediately. In Temporal-based orchestrations, you can even view workflow progress directly in the Temporal Web UI, which shows each activity’s status, retries, and completion times.

With tracing and idempotency combined, distributed transactions go from opaque and fragile to transparent and controllable — an essential step for building resilient microservices at scale.


8 The Architect’s Decision Matrix: Choosing Your Pattern

By now, we’ve explored the mechanics, trade-offs, and tooling behind every major distributed transaction strategy. Choosing the right one isn’t about personal preference or framework familiarity—it’s about aligning system guarantees with business reality. In production, the right decision balances consistency, complexity, and cost. This final section distills those trade-offs into a practical decision matrix and guiding framework for architects designing distributed systems in Java and Spring-based environments.

8.1 Comparison Table: 2PC vs. Saga (Choreography) vs. Saga (Orchestration) vs. TCC

Each pattern we’ve covered sits on a spectrum from tight coordination (2PC) to flexible compensation (Saga) to strong reservation control (TCC). Their differences go beyond theory—they dictate how your system behaves under failure, scales under load, and evolves over time.

| Feature     | 2PC (XA)           | Saga (Choreography)     | Saga (Orchestration)        | TCC                   |
|-------------|--------------------|-------------------------|-----------------------------|-----------------------|
| Consistency | Strong (ACID)      | Eventual (BASE)         | Eventual (BASE)             | Eventual (BASE)       |
| Coupling    | Very Tight         | Very Loose              | Medium (to Orchestrator)    | Medium (TCC contract) |
| Performance | Low (Blocking)     | High (Async)            | High (Async)                | Medium (Chatty)       |
| Complexity  | High (Coordinator) | High (Tracing)          | Medium (Centralized)        | High (Invasive Logic) |
| Use Case    | Monolithic DBs     | Simple, high-throughput | Complex, stateful workflows | Resource reservation  |
| Tooling     | JTA / Narayana     | Spring Cloud Stream     | Temporal.io                 | Apache Seata          |

In real-world systems, patterns often coexist. A financial platform, for example, might use TCC for fund transfers, Saga orchestration for loan approvals, and local transactions (ACID) for internal updates. The decision isn’t binary—it’s contextual.

For instance:

  • A retail checkout flow with inventory, payment, and shipping steps benefits from Saga orchestration with Temporal.
  • A ticket booking system needs TCC to reserve limited resources without overselling.
  • A legacy ERP system still bound to a shared Oracle DB might rely on 2PC for short-lived, cross-schema updates.

Understanding these boundaries keeps systems both reliable and evolvable.

8.2 Key Questions to Ask

Before locking in a pattern, every architect should walk through a short diagnostic. These questions expose the operational and domain trade-offs early—before code or infrastructure commitments harden design decisions.

Q1: Can the business tolerate eventual consistency? If not, step back. Distributed systems that require strict serial consistency across services usually mask deeper design issues. Often, the correct answer is consolidating ownership of critical invariants rather than enforcing ACID across microservices.

Q2: Is the workflow simple (2–3 steps)? When a process involves a few asynchronous steps, such as “place order → charge card → send notification,” Saga Choreography via Spring Cloud Stream is sufficient. Its event-driven model scales easily and avoids central coordination.

Q3: Is the workflow long-running, stateful, or complex? If steps can span hours or days—say, in a shipping or subscription lifecycle—Saga Orchestration is the right fit. Temporal.io shines here, giving you retry logic, persistence, and observability out of the box. It transforms workflow code into fault-tolerant, durable state machines without extra infrastructure.

Q4: Do I need to reserve resources that can’t easily be undone? Use TCC. When your business operation involves freezing or pre-allocating finite assets (money, seats, stock), you must control both confirmation and cancellation precisely. Seata’s TCC mode allows services to define try, confirm, and cancel semantics explicitly, offering stronger guarantees without blocking.

Q5: Do I control all the databases, and are they XA-compliant? If yes, 2PC might work, but this scenario is increasingly rare. Even within a single enterprise, systems use mixed storage technologies (MySQL, PostgreSQL, NoSQL, etc.), making XA coordination impractical. Modern architectures prefer application-level patterns (Saga/TCC) over database-level locks.

To make this concrete, consider an online payment flow:

// Payment orchestration as a Temporal Saga (conceptual fragment)
try {
    payment.chargePayment(orderId, amount);
    inventory.reserveInventory(orderId, items);
    shipping.scheduleShipping(orderId, address);
} catch (ActivityFailure e) {
    compensate(orderId);  // refunds and releases happen here
}

If, instead, the same system handled bank-to-bank transfers, you’d need explicit reservations:

// TCC-style fund transfer (conceptual; with Seata, the TC issues the confirm and cancel calls)
try {
    accountA.tryDebit(xid, 100);
    accountB.tryCredit(xid, 100);
    // all try calls succeeded
    accountA.confirmDebit(xid);
    accountB.confirmCredit(xid);
} catch (Exception e) {
    accountA.cancelDebit(xid);
    accountB.cancelCredit(xid);
}

Different patterns, same intent: achieve atomic business consistency without breaking service independence.
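The five diagnostic questions can also be condensed into a small decision function, which is sometimes a useful artifact in design reviews. The sketch below is one possible reading of the questions, not a rule: PatternAdvisor, its parameter names, and in particular the order in which the checks are applied are assumptions made for illustration.

```java
/** Hypothetical encoding of the five diagnostic questions; the ordering is one reading of them. */
class PatternAdvisor {
    enum Pattern { CONSOLIDATE_OWNERSHIP, TWO_PHASE_COMMIT, TCC, SAGA_ORCHESTRATION, SAGA_CHOREOGRAPHY }

    static Pattern recommend(boolean toleratesEventualConsistency,
                             boolean allStoresXaCompliant,
                             boolean reservesScarceResources,
                             boolean longRunningOrComplex) {
        if (!toleratesEventualConsistency) {
            // Q1 and Q5: strict consistency either uses XA where available or signals a boundary problem
            return allStoresXaCompliant ? Pattern.TWO_PHASE_COMMIT : Pattern.CONSOLIDATE_OWNERSHIP;
        }
        if (reservesScarceResources) {
            return Pattern.TCC;                 // Q4: money, seats, stock
        }
        if (longRunningOrComplex) {
            return Pattern.SAGA_ORCHESTRATION;  // Q3: hours or days, many branches
        }
        return Pattern.SAGA_CHOREOGRAPHY;       // Q2: a few asynchronous steps
    }

    public static void main(String[] args) {
        // A fund transfer: eventual consistency is acceptable, but funds must be reserved
        System.out.println(recommend(true, false, true, false)); // prints TCC
    }
}
```

Real systems rarely answer each question with a clean yes or no, so treat the output as a starting point for discussion rather than a verdict.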

8.3 Final Verdict

There is no single “best” pattern—only the one that best fits your constraints.

For enterprise-scale systems, architects need both Saga orchestration and TCC in their toolbox:

  • Use Temporal (Saga Orchestration) for workflows that are asynchronous, multi-step, and long-lived. It offers clarity, retry safety, and durability for high-level business processes.
  • Use Seata (TCC) for operations that are synchronous and resource-sensitive, such as financial transfers, where a partially applied change is unacceptable.

The right approach often blends patterns. A logistics system, for example, might use TCC between warehouse and inventory services (to reserve stock) while a Temporal workflow coordinates customer-facing steps (like payment and shipping).

If your system relies on messaging between loosely coupled services, event-driven Sagas provide simplicity—but as complexity grows, orchestration becomes essential for control and visibility.

Avoid defaulting to 2PC unless every participant is XA-compliant, latency is minimal, and coordination cost is trivial. In nearly all modern microservice deployments, application-level transaction management outperforms database-level coordination in both scalability and maintainability.

8.4 Article Summary

Distributed transactions define the line between reliable systems and unpredictable ones. What began as the rigid guarantees of 2PC evolved into flexible, resilient patterns like Saga and TCC—each representing a different trade-off in the balance between consistency and autonomy.

Spring Cloud Stream makes Sagas approachable through event choreography. Temporal.io elevates orchestration to code-level workflows that are durable and debuggable. Apache Seata brings fine-grained control through TCC and automated SQL compensation via AT mode. Together, these tools form a modern distributed transaction stack for Java and Spring Cloud ecosystems.

From ACID to BASE, from blocking locks to eventual convergence, the core principle remains the same: consistency is not about perfection—it’s about predictability under failure. The best systems don’t avoid failure; they expect, isolate, and compensate for it.

When you can explain, implement, and observe your distributed transaction flows across multiple services and databases, you move from chasing correctness to designing it. That’s the mark of a modern architect—building systems that don’t just work when everything goes right, but recover gracefully when everything doesn’t.
