What is event-driven architecture?

Event-driven architecture (EDA) is a software design paradigm where the flow of the program is determined by events — significant changes in state that are produced, detected, and consumed by loosely coupled components. Instead of direct request-response calls between services, a publisher emits an event to a channel (message broker, event bus, or stream), and one or more subscribers react to it asynchronously. This decoupling enables independent scaling, fault isolation, and temporal flexibility.

What is the difference between choreography and orchestration?

In choreography, each service independently listens for events and decides how to react — there is no central coordinator. An order service emits OrderCreated, and the payment, inventory, and shipping services each react independently. In orchestration, a central orchestrator (saga coordinator) explicitly directs each step by sending commands and waiting for responses. Choreography is simpler for small flows but harder to debug. Orchestration provides clear visibility into the workflow state but introduces a single point of coordination.

When should I use event-driven architecture vs request-response?

Use event-driven architecture when you need asynchronous processing, loose coupling between services, event replay capability, or when multiple consumers need to react to the same event. Use request-response (synchronous) when the caller needs an immediate result, the operation is simple and fast, or strong consistency is required. Many production systems use a hybrid approach — synchronous for queries and reads, event-driven for commands and state changes.

What are the main event-driven topology patterns?

The four main patterns are: Fan-out (one publisher, multiple subscribers receiving the same event), Fan-in (multiple publishers sending events to one subscriber for aggregation), Choreography (decentralized peer-to-peer event reactions with no coordinator), and Orchestration (centralized saga coordinator directing the workflow). Each pattern has different trade-offs in complexity, coupling, observability, and fault tolerance.

How do I handle failures in event-driven systems?

Handle failures through dead letter queues (DLQ) for messages that fail processing after retry attempts, compensating transactions (sagas) to undo partial work, idempotent consumers that safely handle duplicate events, and circuit breakers to prevent cascading failures. Event sourcing provides natural auditability by storing every state change as an immutable event, enabling replay and recovery. Monitoring event lag and consumer group offsets helps detect processing bottlenecks before they cause visible failures.

Event-Driven Architecture Patterns — Event Flow Designer with Pub/Sub Topology

May 25, 2026 · 16 min read · By Michael Lip

Event-driven architecture (EDA) has become the default design paradigm for distributed systems that need to scale independently, tolerate partial failures, and process data in real time. But designing event flows on a whiteboard or in documentation rarely captures the complexity of real-world topologies. The Event Flow Designer below provides an interactive canvas where you can create publishers, subscribers, and event channels, connect them to form topologies, and generate architecture diagrams — all in your browser.

Load a preset topology pattern to explore fan-out, fan-in, choreography, or orchestration architectures. Add custom nodes, draw connections between them, and compare synchronous request-response patterns against asynchronous event-driven alternatives. The generated diagram exports a text-based architecture description that you can include in design documents, architecture decision records, or pull request descriptions.

Sync vs Async Comparison

Aspect	Synchronous (Request-Response)	Asynchronous (Event-Driven)
Coupling	Tight — caller knows callee	Loose — publisher unaware of subscribers
Latency	Blocking; end-to-end latency is the sum of sequential calls	Non-blocking; consumers process in parallel
Failure Impact	Cascading — one failure blocks chain	Isolated — failures contained per consumer
Scalability	Limited by slowest service	Independent scaling per consumer group
Ordering	Implicit via call sequence	Explicit via partition keys or sequence numbers
Debugging	Simple stack traces	Requires distributed tracing (correlation IDs)
Data Consistency	Strong when the work fits in one transaction	Eventual (compensating transactions / sagas)
Replay	Not built in (requires separate request logging)	Retained event log enables replay

The Foundations of Event-Driven Architecture

Event-driven architecture is built on a simple premise: when something happens in one part of a system, other parts that care about it should be notified asynchronously rather than being called directly. This inversion of control — from “I call you” to “I announce, you react” — fundamentally changes how distributed systems are designed, scaled, and operated. The publisher does not know who its subscribers are, how many exist, or what they do with the event. This decoupling is the source of both EDA’s power and its complexity.

The three core components of any event-driven system are producers (publishers that emit events), channels (message brokers, event buses, or streams that transport events), and consumers (subscribers that react to events). The channel mediates all communication, providing buffering, routing, persistence, and delivery guarantees. Producers and consumers interact only with the channel, never directly with each other.

Fan-Out: One Event, Many Reactions

Fan-out is the most common EDA pattern. A single publisher emits an event, and the channel delivers it to multiple independent subscribers. When an e-commerce system processes an order, the OrderCreated event fans out to the payment service (charge the card), the inventory service (reserve stock), the notification service (send confirmation email), and the analytics service (track conversion). Each subscriber processes the event independently, at its own pace, with its own error handling.

The fan-out pattern excels when the publisher does not need to know the outcome of downstream processing. The order service does not wait for the email to be sent or the analytics to be recorded. If the notification service is down, the email will be delivered when it recovers (assuming the channel persists messages). This temporal decoupling means that adding a new subscriber — say, a fraud detection service — requires zero changes to the publisher. You simply subscribe the new service to the same event channel.
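To make the fan-out mechanics concrete, here is a minimal in-process sketch in Python. The `InMemoryEventBus` class and the handler names are hypothetical stand-ins for a real broker. Unlike a real channel, this toy delivers synchronously and does not persist or redeliver events, so the simulated notification outage is only reported. It still shows the two key properties: one subscriber's failure does not stop delivery to the others, and a new subscriber is added without touching the publisher.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryEventBus:
    """Hypothetical toy channel: delivers each event to every subscriber of its type."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher never learns who (or how many) subscribers exist.
        for handler in list(self._subscribers[event_type]):
            try:
                handler(payload)
            except Exception as exc:
                # One failing subscriber must not stop delivery to the others.
                print(f"{getattr(handler, '__name__', handler)} failed: {exc}")


bus = InMemoryEventBus()
log: list[str] = []


def payment(evt: dict) -> None:
    log.append(f"charge {evt['order_id']}")


def inventory(evt: dict) -> None:
    log.append(f"reserve {evt['order_id']}")


def notification(evt: dict) -> None:
    raise ConnectionError("SMTP unavailable")  # simulated outage


for handler in (payment, inventory, notification):
    bus.subscribe("OrderCreated", handler)

bus.publish("OrderCreated", {"order_id": "o-1"})


# Adding a new subscriber requires no change to the publisher.
def fraud_check(evt: dict) -> None:
    log.append(f"fraud-check {evt['order_id']}")


bus.subscribe("OrderCreated", fraud_check)
bus.publish("OrderCreated", {"order_id": "o-2"})
print(log)
```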

Fan-In: Aggregation from Multiple Sources

Fan-in is the inverse of fan-out. Multiple publishers emit events that converge on a single subscriber for aggregation, correlation, or complex event processing. A monitoring dashboard that collects metrics from dozens of microservices uses fan-in. A fraud detection engine that correlates events from payment, login, and session services uses fan-in. An ETL pipeline that merges data from multiple source databases uses fan-in.

The challenge with fan-in is event ordering and correlation. Events from different publishers arrive at different times and in unpredictable order. The aggregating subscriber needs to buffer, window, and correlate events, typically using a correlation ID (like order ID or session ID) to group related events across sources. Time-windowed aggregation (collect all events within a 5-second window) is another approach, but it introduces latency, and events that arrive after their window has closed can be missed or counted in the wrong window.
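The correlation-ID approach can be sketched as a small buffer keyed by correlation ID that releases a group once every expected source has reported. `Correlator`, `Event`, and the source names are hypothetical. The sketch deliberately omits timeouts, so a production version would also need to evict groups that never complete, which is where windowing comes back in.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    source: str           # which publisher emitted it, e.g. "payment"
    correlation_id: str   # key shared across sources, e.g. a session ID
    data: dict = field(default_factory=dict)


class Correlator:
    """Buffers out-of-order events until every expected source has reported."""

    def __init__(self, expected_sources: set[str]) -> None:
        self.expected = set(expected_sources)
        self.pending: dict[str, dict[str, Event]] = {}

    def on_event(self, event: Event) -> dict[str, Event] | None:
        group = self.pending.setdefault(event.correlation_id, {})
        group[event.source] = event
        if set(group) >= self.expected:
            # Complete: hand the group to the aggregation logic and free the buffer.
            return self.pending.pop(event.correlation_id)
        return None


correlator = Correlator({"payment", "login", "session"})
arrivals = [
    Event("session", "s-1"),
    Event("payment", "s-2"),
    Event("login", "s-1"),
    Event("payment", "s-1", {"amount": 120}),
]
for e in arrivals:
    complete = correlator.on_event(e)
    if complete is not None:
        print(e.correlation_id, sorted(complete))  # s-1 ['login', 'payment', 'session']

# Session s-2 is still waiting for its login and session events.
print(sorted(correlator.pending))  # ['s-2']
```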

Choreography: Decentralized Event Reactions

In choreography, there is no central coordinator. Each service listens for events it cares about and emits new events as a result. An order saga implemented as choreography works like this: the order service emits OrderCreated, the payment service hears it and emits PaymentCharged, the inventory service hears PaymentCharged and emits StockReserved, and the shipping service hears StockReserved and emits ShipmentScheduled. Each service knows its own rules but has no visibility into the overall workflow.

Choreography is elegant for simple linear workflows. But as the number of services and event types grows, the implicit flow becomes difficult to understand, debug, and modify. There is no single place that describes the complete workflow. A failure in the middle of the chain requires compensating events (like PaymentRefunded if stock reservation fails), and the logic for emitting compensating events is distributed across services. For teams managing complex event pipelines, choreography that spans more than a handful of services often becomes a maintenance burden.
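The choreographed order saga, including the PaymentRefunded compensation, can be sketched with a tiny queue-based bus in Python. The `Bus` class, the in-memory `stock` dictionary, and the `StockReservationFailed` event name are hypothetical additions for illustration. Each handler encodes only its own service's rule; the end-to-end flow exists only implicitly in the `history`, which is exactly what makes choreography hard to trace.

```python
from collections import defaultdict, deque


class Bus:
    """Hypothetical toy bus: queues emitted events and dispatches them in order."""

    def __init__(self) -> None:
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.history = []  # every event type dispatched, in order

    def on(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def emit(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


bus = Bus()
stock = {"widget": 0}  # hypothetical inventory: nothing in stock


def payment_service(order):
    bus.emit("PaymentCharged", order)


def inventory_service(order):
    if stock.get(order["sku"], 0) > 0:
        stock[order["sku"]] -= 1
        bus.emit("StockReserved", order)
    else:
        bus.emit("StockReservationFailed", order)


def shipping_service(order):
    bus.emit("ShipmentScheduled", order)


def payment_compensation(order):
    # The compensating rule lives in the payment service, not in a coordinator.
    bus.emit("PaymentRefunded", order)


bus.on("OrderCreated", payment_service)
bus.on("PaymentCharged", inventory_service)
bus.on("StockReserved", shipping_service)
bus.on("StockReservationFailed", payment_compensation)

bus.emit("OrderCreated", {"order_id": "o-7", "sku": "widget"})
bus.run()
print(bus.history)
# ['OrderCreated', 'PaymentCharged', 'StockReservationFailed', 'PaymentRefunded']
```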

Orchestration: Centralized Workflow Coordination

Orchestration uses a central coordinator (the orchestrator or saga manager) that explicitly directs each step of the workflow. The orchestrator sends commands to services and waits for their responses. If the order saga is orchestrated, the orchestrator sends ChargePayment to the payment service, waits for PaymentCharged, then sends ReserveStock to the inventory service, waits for StockReserved, then sends ScheduleShipment to the shipping service. If any step fails, the orchestrator executes compensating actions in reverse order.

Orchestration provides clear visibility: the orchestrator’s code is a readable description of the entire workflow. It makes failure handling explicit and centralized. The trade-off is that the orchestrator is a single point of coordination (though not necessarily a single point of failure, provided it persists workflow state durably and can resume after a crash). It also introduces tighter coupling: the orchestrator must know about every service in the workflow, while in choreography no single service has that knowledge.
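A minimal orchestrator with reverse-order compensation might look like the following Python sketch. `SagaStep` and `SagaOrchestrator` are hypothetical names, and plain function calls stand in for sending commands and waiting for replies over a channel. A production orchestrator would also persist each step's outcome so that it can resume after a crash.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SagaStep:
    name: str                           # command name, e.g. "ChargePayment"
    action: Callable[[dict], None]      # raises an exception on failure
    compensate: Callable[[dict], None]  # undoes a previously successful action


@dataclass
class SagaOrchestrator:
    steps: list[SagaStep]
    log: list[str] = field(default_factory=list)

    def run(self, order: dict) -> bool:
        completed: list[SagaStep] = []
        for step in self.steps:
            try:
                step.action(order)
            except Exception as exc:
                self.log.append(f"{step.name}: failed ({exc})")
                # Compensate the already-completed steps in reverse order.
                for done in reversed(completed):
                    done.compensate(order)
                    self.log.append(f"compensated {done.name}")
                return False
            self.log.append(f"{step.name}: ok")
            completed.append(step)
        return True


refunds: list[str] = []


def charge_payment(order: dict) -> None:
    pass  # pretend the payment service replied PaymentCharged


def refund_payment(order: dict) -> None:
    refunds.append(order["order_id"])


def reserve_stock(order: dict) -> None:
    raise RuntimeError("out of stock")


def release_stock(order: dict) -> None:
    pass


def schedule_shipment(order: dict) -> None:
    pass


def cancel_shipment(order: dict) -> None:
    pass


saga = SagaOrchestrator([
    SagaStep("ChargePayment", charge_payment, refund_payment),
    SagaStep("ReserveStock", reserve_stock, release_stock),
    SagaStep("ScheduleShipment", schedule_shipment, cancel_shipment),
])
ok = saga.run({"order_id": "o-9"})
print(ok, saga.log)
```

Because the whole sequence lives in one list of steps, reading the workflow (and its failure path) means reading one object, which is the visibility benefit described above.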

Event Types: Notifications, State Transfers, and Commands

Event notifications are thin messages that something happened: OrderCreated with just the order ID. The subscriber fetches the full data from the source via API if it needs more context. This keeps events small and reduces the risk of stale data in the event payload.

Event-carried state transfers include the full state in the event payload: OrderCreated with the complete order object including items, prices, shipping address, and customer details. Subscribers do not need to call back to the source, which eliminates a synchronous dependency but increases event size and the risk that subscribers act on, or cache, a snapshot that has since changed at the source.

Commands are not events in the strict sense — they are instructions to a specific service to do something: ChargePayment, ReserveStock. Commands are used in orchestration patterns where the coordinator directs specific actions. The distinction between events and commands matters: events are facts (something happened, and publishers do not care who reacts), while commands are intentions (do this specific thing, and I expect a response).
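The three message shapes can be contrasted with Python dataclasses. The field names, the `event_id`/`occurred_at` metadata, and the `reply_to` field on the command are illustrative assumptions rather than a standard schema. The point is the difference in naming (past-tense facts versus imperative instructions) and in payload size.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _new_id() -> str:
    return str(uuid.uuid4())


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class OrderCreatedNotification:
    """Event notification: a fact plus just enough to fetch details via the API."""
    order_id: str
    event_id: str = field(default_factory=_new_id)
    occurred_at: str = field(default_factory=_now)


@dataclass(frozen=True)
class OrderCreatedStateTransfer:
    """Event-carried state transfer: the fact plus a snapshot of the order."""
    order_id: str
    items: tuple[tuple[str, int], ...]  # (sku, quantity) pairs
    total_cents: int
    shipping_address: str
    customer_id: str
    event_id: str = field(default_factory=_new_id)
    occurred_at: str = field(default_factory=_now)


@dataclass(frozen=True)
class ChargePayment:
    """Command: an instruction to one specific service, which is expected to reply."""
    order_id: str
    amount_cents: int
    reply_to: str  # hypothetical channel the orchestrator listens on


thin = OrderCreatedNotification(order_id="o-1")
fat = OrderCreatedStateTransfer(
    order_id="o-1",
    items=(("sku-123", 2),),
    total_cents=4998,
    shipping_address="1 Example Street",
    customer_id="c-9",
)
command = ChargePayment(order_id="o-1", amount_cents=4998, reply_to="order-saga-replies")

for msg in (thin, fat, command):
    size = len(json.dumps(asdict(msg)).encode("utf-8"))
    print(type(msg).__name__, size, "bytes")
```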

Message Broker Selection

Apache Kafka is the dominant choice for high-throughput event streaming. Kafka stores events in persistent, ordered, partitioned logs with configurable retention; ordering is guaranteed within a partition, not across a whole topic. Consumers track their own position (offset) in the log, enabling replay and parallel consumption. Kafka excels at fan-out (every consumer group receives every event, while consumers within one group split the partitions between them), event sourcing, and stream processing, but it has operational complexity and is overkill for low-volume use cases.
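The offset model is what makes fan-out and replay cheap. The following toy Python model is not the Kafka client API: it keeps append-only partitions plus a separate offset per consumer group, uses `zlib.crc32` as a stand-in for Kafka's murmur2-based default partitioner, and, for brevity, lets one consumer read a whole partition instead of sharing partitions among group members.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


class PartitionedLog:
    """Toy model of a Kafka-style topic; NOT the Kafka client API."""

    def __init__(self, num_partitions: int) -> None:
        # Append-only partitions: reading never deletes anything.
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]
        # Each consumer group has its own next offset per partition.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def append(self, key: str, value: str) -> int:
        # Same key -> same partition, so events for one key stay in order.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition

    def poll(self, group: str, partition: int) -> list[str]:
        start = self.offsets[group][partition]
        records = self.partitions[partition][start:]
        self.offsets[group][partition] = start + len(records)  # "commit" the new position
        return records

    def seek(self, group: str, partition: int, offset: int) -> None:
        # Replay: rewind this group's position; other groups are unaffected.
        self.offsets[group][partition] = offset


topic = PartitionedLog(num_partitions=3)
for event in ("OrderCreated", "PaymentCharged", "StockReserved"):
    p = topic.append("order-1", event)

billing = topic.poll("billing", p)      # all three events, in order
analytics = topic.poll("analytics", p)  # independent group: also all three
caught_up = topic.poll("billing", p)    # empty: billing has nothing new
topic.seek("billing", p, 0)
replayed = topic.poll("billing", p)     # all three again, still retained
```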

RabbitMQ is a traditional message broker that supports multiple messaging patterns (point-to-point, pub/sub, routing, topics) with strong delivery guarantees. In its standard queue model, RabbitMQ deletes a message once a consumer acknowledges it (unlike Kafka, which retains events for the configured retention period), making it better for task queues and work distribution but less suitable for event sourcing or replay.

AWS SNS/SQS provides a managed pub/sub (SNS) and queue (SQS) combination. SNS fans out events to multiple SQS queues, each serving a different subscriber. This is the simplest operational choice for AWS-based systems, but it offers far more limited replay than Kafka's retained log and less routing flexibility than RabbitMQ.

Redis Streams offers a lightweight event streaming capability for systems already using Redis. It supports consumer groups, acknowledgment, and time-based querying, but has durability limitations compared to Kafka or dedicated message brokers. Redis Streams is a good fit for systems building real-time webhook delivery with moderate throughput requirements.

Designing for Failure in Event-Driven Systems

Every component in an event-driven system can fail: publishers can crash before emitting events, channels can lose messages, and consumers can fail during processing. Robust EDA design addresses each failure mode explicitly.

Outbox pattern: Instead of publishing an event directly, the publisher writes the event to a local outbox table in the same database transaction as the state change, and a separate process reads the outbox and publishes to the channel. This guarantees that the event is eventually published if and only if the state change is committed. The relay may publish the same event more than once (for example, if it crashes after publishing but before marking the row as sent), so delivery is at-least-once.
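The outbox pattern can be demonstrated end to end with Python's built-in `sqlite3` module. The table layout, `create_order`, and `relay_outbox` are hypothetical. In a real deployment the relay runs as a separate process (or uses change data capture), and `publish` would call a broker client rather than append to a list.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
""")


def create_order(order_id: str) -> None:
    # State change and event row share ONE transaction: both commit or neither does.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "CREATED"))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def relay_outbox(publish) -> int:
    """Relay: publish unsent rows in insertion order, then mark each as sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        # A crash between publish() and this UPDATE re-sends the event: at-least-once.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


delivered = []
create_order("o-1")
try:
    create_order("o-1")  # duplicate key: the whole transaction rolls back
except sqlite3.IntegrityError:
    pass

sent_count = relay_outbox(lambda event_type, payload: delivered.append((event_type, payload)))
print(sent_count, delivered)  # 1 [('OrderCreated', {'order_id': 'o-1'})]
```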

Dead letter queues (DLQ): Messages that fail processing after a configurable number of retries are moved to a DLQ for manual inspection and reprocessing. Without a DLQ, poison messages (events that always fail) block the consumer indefinitely.

Idempotent consumers: Because channels may deliver the same event more than once (at-least-once delivery), consumers must handle duplicates safely. This typically means checking an event ID against a processed-events table before executing side effects, and recording the ID in the same transaction as those side effects so that a crash cannot leave one without the other.
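Retries, dead-lettering, and duplicate detection fit naturally into one consumer loop. In this hypothetical Python sketch, an in-memory `set` stands in for the processed-events table and a list stands in for the DLQ. A durable implementation would record the event ID in the same transaction as the handler's side effects and would back off between retries.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    event_id: str
    payload: dict
    attempts: int = 0


class RetryingIdempotentConsumer:
    def __init__(self, handler: Callable[[dict], None], max_attempts: int = 3) -> None:
        self.handler = handler
        self.max_attempts = max_attempts
        self.processed_ids: set[str] = set()   # stand-in for a processed-events table
        self.dead_letters: list[Message] = []  # stand-in for a dead letter queue

    def consume(self, msg: Message) -> str:
        if msg.event_id in self.processed_ids:
            return "duplicate-skipped"         # redelivery: side effects not repeated
        while msg.attempts < self.max_attempts:
            msg.attempts += 1
            try:
                self.handler(msg.payload)
            except Exception:
                continue                       # retry immediately (no backoff here)
            self.processed_ids.add(msg.event_id)
            return "processed"
        self.dead_letters.append(msg)          # park the poison message, keep consuming
        return "dead-lettered"


charges: list[int] = []


def charge(payload: dict) -> None:
    if payload["amount"] <= 0:
        raise ValueError("invalid amount")
    charges.append(payload["amount"])


consumer = RetryingIdempotentConsumer(charge)
results = [
    consumer.consume(Message("e-1", {"amount": 10})),
    consumer.consume(Message("e-1", {"amount": 10})),  # duplicate delivery
    consumer.consume(Message("e-2", {"amount": -5})),  # always fails
]
print(results, charges)
```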

Event Sourcing and CQRS

Event sourcing takes the event-driven concept to its logical conclusion: instead of storing the current state of an entity, you store the sequence of events that produced it. An account balance is not stored as a single number; it is derived by replaying all deposit and withdrawal events. This provides a complete audit trail, enables temporal queries (what was the balance at time T?), and supports event replay for debugging and data recovery.
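Deriving a balance from events is a left fold over the log, and a temporal query is the same fold stopped at time T. This Python sketch uses hypothetical `Deposited`/`Withdrawn` event kinds, integer cents, and a sequence number in place of a wall-clock timestamp.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AccountEvent:
    kind: str    # hypothetical kinds: "Deposited" or "Withdrawn"
    amount: int  # in cents, always positive
    seq: int     # position in the stream, used for "as of" queries


def balance(events: list[AccountEvent], as_of: int | None = None) -> int:
    """Fold over the stored events; stop early for a point-in-time query."""
    total = 0
    for event in events:  # events are stored in seq order
        if as_of is not None and event.seq > as_of:
            break
        if event.kind == "Deposited":
            total += event.amount
        elif event.kind == "Withdrawn":
            total -= event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return total


history = [
    AccountEvent("Deposited", 10_000, seq=1),
    AccountEvent("Withdrawn", 2_500, seq=2),
    AccountEvent("Deposited", 1_000, seq=3),
]
print(balance(history))           # 8500
print(balance(history, as_of=2))  # 7500
```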

CQRS (Command Query Responsibility Segregation) separates the write model (commands that produce events) from the read model (materialized views optimized for queries). Events bridge the two: when a command produces a new event, projections consume the event and update the read model. This separation allows the write side to be optimized for consistency and the read side to be optimized for query performance, which pays off most when read and write workloads differ sharply in volume or shape. Because projections typically run asynchronously, the read model is eventually consistent with the write model.
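A compact Python sketch of the CQRS split follows. The command handler, the `project` function, and the `orders_by_customer` view are hypothetical. The projection runs synchronously here for simplicity, whereas a real system would usually feed it from the event channel. `rebuild_read_model` shows why keeping the events matters: any view can be regenerated by replaying them.

```python
from __future__ import annotations

from collections import defaultdict

event_store: list[dict] = []        # write side: append-only event log
placed_order_ids: set[str] = set()  # write-side state used to enforce invariants
orders_by_customer: defaultdict[str, list[str]] = defaultdict(list)  # read model


def handle_place_order(order_id: str, customer_id: str) -> None:
    """Command handler: validate against the write model, then record a fact."""
    if order_id in placed_order_ids:
        raise ValueError(f"order {order_id} already exists")
    event = {"type": "OrderPlaced", "order_id": order_id, "customer_id": customer_id}
    event_store.append(event)
    placed_order_ids.add(order_id)
    project(event)  # usually asynchronous in a real deployment


def project(event: dict) -> None:
    """Projection: maintain a denormalized view shaped for one query."""
    if event["type"] == "OrderPlaced":
        orders_by_customer[event["customer_id"]].append(event["order_id"])


def orders_for(customer_id: str) -> list[str]:
    """Query side: a direct lookup with no joins and no replay."""
    return list(orders_by_customer.get(customer_id, []))


def rebuild_read_model() -> None:
    """Regenerate the view from scratch by replaying the event store."""
    orders_by_customer.clear()
    for event in event_store:
        project(event)


handle_place_order("o-1", "c-1")
handle_place_order("o-2", "c-1")
handle_place_order("o-3", "c-2")
print(orders_for("c-1"))  # ['o-1', 'o-2']
rebuild_read_model()
print(orders_for("c-1"))  # ['o-1', 'o-2']
```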

Observability in Event-Driven Systems

Traditional request-response systems have natural observability through synchronous call chains and stack traces. Event-driven systems require deliberate observability design. Every event should carry a correlation ID that links all events belonging to the same workflow. Distributed tracing tools (Jaeger, Zipkin, AWS X-Ray) can reconstruct the full event flow across services when trace context (trace and span IDs) is propagated in event headers alongside the correlation ID. Consumer lag (per partition, the difference between the latest offset produced and the consumer group's committed offset) is a key health metric for event-driven systems. High or growing lag indicates that consumers are falling behind, which can lead to stale data and violated SLAs.
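Both ideas in this paragraph are easy to sketch in Python. `new_event` shows a child event inheriting the correlation ID of the event that caused it (the `causation_id` field is a common convention, included here as an assumption), and `consumer_lag` computes per-partition lag from end offsets and committed offsets. The offset numbers are made up for illustration; with Kafka, real values would come from the admin or consumer APIs or from a lag exporter.

```python
from __future__ import annotations

import uuid


def new_event(event_type: str, payload: dict, cause: dict | None = None) -> dict:
    """Create an event; children inherit the workflow's correlation ID from their cause."""
    return {
        "event_id": str(uuid.uuid4()),
        "type": event_type,
        "correlation_id": cause["correlation_id"] if cause else str(uuid.uuid4()),
        "causation_id": cause["event_id"] if cause else None,
        "payload": payload,
    }


def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: latest offset produced minus the group's committed offset.

    Partitions with no commit are treated as starting at offset 0 (an assumption;
    real behavior depends on the consumer's offset-reset policy).
    """
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


created = new_event("OrderCreated", {"order_id": "o-1"})
charged = new_event("PaymentCharged", {"order_id": "o-1"}, cause=created)

lag = consumer_lag(
    end_offsets={0: 1500, 1: 1200, 2: 900},
    committed={0: 1490, 1: 400, 2: 900},
)
print(lag)                                     # {0: 10, 1: 800, 2: 0}
print([p for p, n in lag.items() if n > 100])  # [1]: partition 1 is falling behind
```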

Last updated: May 25, 2026