Event-Driven Architecture Patterns — Event Flow Designer with Pub/Sub Topology
Event-driven architecture (EDA) has become a common design paradigm for distributed systems that need to scale independently, tolerate partial failures, and process data in real time. But designing event flows on a whiteboard or in documentation rarely captures the complexity of real-world topologies. The Event Flow Designer below provides an interactive canvas where you can create publishers, subscribers, and event channels, connect them to form topologies, and generate architecture diagrams, all in your browser.
Load a preset topology pattern to explore fan-out, fan-in, choreography, or orchestration architectures. Add custom nodes, draw connections between them, and compare synchronous request-response patterns against asynchronous event-driven alternatives. The generated diagram exports a text-based architecture description that you can include in design documents, architecture decision records, or pull request descriptions.
| Aspect | Synchronous (Request-Response) | Asynchronous (Event-Driven) |
|---|---|---|
| Coupling | Tight — caller knows callee | Loose — publisher unaware of subscribers |
| Latency | Blocking; latency = sum of sequential calls in the chain | Non-blocking for the publisher; consumers process in parallel |
| Failure Impact | Cascading — one failure blocks chain | Isolated — failures contained per consumer |
| Scalability | Limited by slowest service | Independent scaling per consumer group |
| Ordering | Implicit via call sequence | Explicit via partition keys or sequence numbers |
| Debugging | Simple stack traces | Requires distributed tracing (correlation IDs) |
| Data Consistency | Strong (transactions) | Eventual (compensating transactions / sagas) |
| Replay | Not possible unless requests are logged separately | Possible when the channel retains events (log-based brokers) |
The Foundations of Event-Driven Architecture
Event-driven architecture is built on a simple premise: when something happens in one part of a system, other parts that care about it should be notified asynchronously rather than being called directly. This inversion of control — from “I call you” to “I announce, you react” — fundamentally changes how distributed systems are designed, scaled, and operated. The publisher does not know who its subscribers are, how many exist, or what they do with the event. This decoupling is the source of both EDA’s power and its complexity.
The three core components of any event-driven system are producers (publishers that emit events), channels (message brokers, event buses, or streams that transport events), and consumers (subscribers that react to events). The channel mediates all communication, providing buffering, routing, persistence, and delivery guarantees. Producers and consumers interact only with the channel, never directly with each other.
Fan-Out: One Event, Many Reactions
Fan-out is the most common EDA pattern. A single publisher emits an event, and the channel delivers it to multiple independent subscribers. When an e-commerce system processes an order, the OrderCreated event fans out to the payment service (charge the card), the inventory service (reserve stock), the notification service (send confirmation email), and the analytics service (track conversion). Each subscriber processes the event independently, at its own pace, with its own error handling.
The fan-out pattern excels when the publisher does not need to know the outcome of downstream processing. The order service does not wait for the email to be sent or the analytics to be recorded. If the notification service is down, the email will be delivered when it recovers (assuming the channel persists messages). This temporal decoupling means that adding a new subscriber — say, a fraud detection service — requires zero changes to the publisher. You simply subscribe the new service to the same event channel.
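The fan-out behaviour described above can be sketched with a toy in-memory channel. This is a minimal illustration, not a real broker: the `EventBus` class and the handler names are hypothetical, delivery is synchronous, and a failed handler is only logged, whereas a persistent channel would redeliver once the subscriber recovers.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Toy in-memory channel: delivers each event to every subscriber of its type."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._subscribers[event_type]:
            try:
                handler(payload)
            except Exception as exc:
                # One failing subscriber must not block the others.
                print(f"{handler.__name__} failed: {exc}")


log: list[str] = []


def charge_card(event: dict) -> None:
    log.append(f"payment:{event['order_id']}")


def reserve_stock(event: dict) -> None:
    log.append(f"inventory:{event['order_id']}")


def send_email(event: dict) -> None:
    raise RuntimeError("notification service is down")


bus = EventBus()
for handler in (charge_card, send_email, reserve_stock):
    bus.subscribe("OrderCreated", handler)

# The publisher only names the event type; it never references a subscriber.
bus.publish("OrderCreated", {"order_id": "o-1"})
```

Adding a fraud detection subscriber would be one more `bus.subscribe("OrderCreated", ...)` call, with no change to the code that publishes.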
Fan-In: Aggregation from Multiple Sources
Fan-in is the inverse of fan-out. Multiple publishers emit events that converge on a single subscriber for aggregation, correlation, or complex event processing. A monitoring dashboard that collects metrics from dozens of microservices uses fan-in. A fraud detection engine that correlates events from payment, login, and session services uses fan-in. An ETL pipeline that merges data from multiple source databases uses fan-in.
The challenge with fan-in is event ordering and correlation. Events from different publishers arrive at different times and in unpredictable order. The aggregating subscriber needs to buffer, window, and correlate events — typically using a correlation ID (like order ID or session ID) to group related events across sources. Time-windowed aggregation (collect all events within a 5-second window) is another approach, but introduces latency and the risk of missed events at window boundaries.
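Correlation by ID can be sketched as a small buffer keyed on the correlation ID. The `Correlator` class and the three source names are assumptions made for illustration; a production aggregator would also need a timeout to evict groups that never complete.

```python
from collections import defaultdict
from typing import Optional

REQUIRED_SOURCES = {"payment", "login", "session"}  # hypothetical publishers


class Correlator:
    """Buffers events per correlation ID until every required source has reported."""

    def __init__(self) -> None:
        self._pending: dict[str, dict[str, dict]] = defaultdict(dict)

    def on_event(self, source: str, correlation_id: str, data: dict) -> Optional[dict]:
        bucket = self._pending[correlation_id]
        bucket[source] = data
        if REQUIRED_SOURCES.issubset(bucket):
            # Group is complete: hand it off and free the buffer.
            return self._pending.pop(correlation_id)
        return None


correlator = Correlator()
# Events for two sessions arrive interleaved and out of order.
r1 = correlator.on_event("session", "s-1", {"ip": "10.0.0.1"})
r2 = correlator.on_event("payment", "s-2", {"amount": 40})
r3 = correlator.on_event("payment", "s-1", {"amount": 900})
r4 = correlator.on_event("login", "s-1", {"failed_attempts": 3})
```

Only the last call returns a group, because it is the first moment at which all three sources have reported for `s-1`; the `s-2` event stays buffered.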
Choreography: Decentralized Event Reactions
In choreography, there is no central coordinator. Each service listens for events it cares about and emits new events as a result. An order saga implemented as choreography works like this: the order service emits OrderCreated, the payment service hears it and emits PaymentCharged, the inventory service hears PaymentCharged and emits StockReserved, and the shipping service hears StockReserved and emits ShipmentScheduled. Each service knows its own rules but has no visibility into the overall workflow.
Choreography is elegant for simple linear workflows. But as the number of services and event types grows, the implicit flow becomes difficult to understand, debug, and modify. There is no single place that describes the complete workflow. A failure in the middle of the chain requires compensating events (like PaymentRefunded if stock reservation fails), and the logic for emitting compensating events is distributed across services. For teams managing complex event pipelines, choreography beyond five services often becomes a maintenance burden.
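The order saga above can be sketched as independent handlers that each know only their own trigger and their own output. The `on` decorator, the `emit` helper, and the in-memory queue are hypothetical stand-ins for a broker client; note that the overall flow is not written down anywhere and only appears in the observed trail.

```python
from collections import defaultdict, deque

handlers = defaultdict(list)  # event type -> handlers registered by services
queue = deque()               # stands in for the broker
trail = []                    # observed events, recorded for illustration only


def on(event_type):
    """Register a service handler for one event type."""
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register


def emit(event_type, order_id):
    queue.append((event_type, order_id))


@on("OrderCreated")
def payment_service(order_id):
    emit("PaymentCharged", order_id)


@on("PaymentCharged")
def inventory_service(order_id):
    emit("StockReserved", order_id)


@on("StockReserved")
def shipping_service(order_id):
    emit("ShipmentScheduled", order_id)


def drain():
    """Deliver queued events until no service emits anything new."""
    while queue:
        event_type, order_id = queue.popleft()
        trail.append(event_type)
        for handler in handlers[event_type]:
            handler(order_id)


emit("OrderCreated", "o-1")
drain()
```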
Orchestration: Centralized Workflow Coordination
Orchestration uses a central coordinator (the orchestrator or saga manager) that explicitly directs each step of the workflow. The orchestrator sends commands to services and waits for their responses. If the order saga is orchestrated, the orchestrator sends ChargePayment to the payment service, waits for PaymentCharged, then sends ReserveStock to the inventory service, waits for StockReserved, then sends ScheduleShipment to the shipping service. If any step fails, the orchestrator executes compensating actions in reverse order.
Orchestration provides clear visibility — the orchestrator’s code is a readable description of the entire workflow. It makes failure handling explicit and centralized. The trade-off is that the orchestrator is a single point of coordination (though not a single point of failure if it is stateful and recoverable). It also introduces tighter coupling: the orchestrator must know about every service in the workflow, while in choreography no single service has that knowledge.
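A minimal sketch of the orchestrated saga follows, assuming each step is a pair of callables (an action and its compensation). The `run_saga` function and the step tuples are illustrative; a real orchestrator would persist its progress so that it can resume after a crash.

```python
from typing import Callable

# (step name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    history: list[str] = []
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            history.append(f"{name}:failed")
            for done_name, _, compensate in reversed(completed):
                compensate()
                history.append(f"{done_name}:compensated")
            return history
        history.append(f"{name}:ok")
        completed.append(step)
    return history


def noop() -> None:
    pass


def out_of_stock() -> None:
    raise RuntimeError("out of stock")


history = run_saga([
    ("ChargePayment", noop, noop),
    ("ReserveStock", out_of_stock, noop),
    ("ScheduleShipment", noop, noop),
])
```

Here the stock reservation fails, so the payment is compensated and the shipment step never runs. The whole workflow, including its failure handling, is readable in one place.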
Event Types: Notifications, State Transfers, and Commands
Event notifications are thin messages that something happened: OrderCreated with just the order ID. The subscriber fetches the full data from the source via API if it needs more context. This keeps events small and reduces the risk of stale data in the event payload.
Event-carried state transfers include the full state in the event payload: OrderCreated with the complete order object including items, prices, shipping address, and customer details. Subscribers do not need to call back to the source, which eliminates a synchronous dependency but increases event size and the risk of data staleness if the payload is cached.
Commands are not events in the strict sense — they are instructions to a specific service to do something: ChargePayment, ReserveStock. Commands are used in orchestration patterns where the coordinator directs specific actions. The distinction between events and commands matters: events are facts (something happened, and publishers do not care who reacts), while commands are intentions (do this specific thing, and I expect a response).
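The three message shapes can be contrasted as plain data classes. The field names below, including `reply_to`, are assumptions chosen for illustration rather than a standard schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderCreatedNotification:
    """Thin event: subscribers fetch details from the source if they need them."""
    order_id: str


@dataclass(frozen=True)
class OrderCreatedState:
    """Event-carried state transfer: the payload is self-sufficient."""
    order_id: str
    items: tuple[str, ...]
    total_cents: int
    shipping_address: str


@dataclass(frozen=True)
class ChargePayment:
    """Command: addressed to one service, and the sender expects a reply."""
    order_id: str
    amount_cents: int
    reply_to: str  # hypothetical name of the channel the reply should go to


thin = OrderCreatedNotification("o-1")
full = OrderCreatedState("o-1", ("sku-1", "sku-2"), 4_500, "1 Example Street")
command = ChargePayment("o-1", full.total_cents, reply_to="order-saga-replies")
```

Events are named in the past tense and carry no addressee, while the command is named in the imperative and carries a reply channel, which mirrors the facts-versus-intentions distinction.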
Message Broker Selection
Apache Kafka is the dominant choice for high-throughput event streaming. Kafka stores events in persistent, ordered, partitioned logs with configurable retention. Consumers track their own position (offset) in the log, enabling replay and parallel consumption. Kafka excels at fan-out (consumer groups), event sourcing, and stream processing, but has operational complexity and is overkill for low-volume use cases.
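The log-plus-offset model can be illustrated without a broker. The sketch below is a conceptual stand-in, not the Kafka client API: `PartitionLog` and `Consumer` are hypothetical classes that show why consumer-tracked offsets make replay and independent consumption possible.

```python
class PartitionLog:
    """Append-only log; reading never removes entries."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def append(self, event: str) -> int:
        self._entries.append(event)
        return len(self._entries) - 1  # offset of the new entry

    def read(self, offset: int, max_events: int = 10) -> list[str]:
        return self._entries[offset:offset + max_events]


class Consumer:
    """Tracks its own position, so it can rewind and replay."""

    def __init__(self, log: PartitionLog) -> None:
        self.log = log
        self.offset = 0

    def poll(self) -> list[str]:
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        self.offset = offset


log = PartitionLog()
for name in ("OrderCreated", "PaymentCharged", "StockReserved"):
    log.append(name)

billing, audit = Consumer(log), Consumer(log)
first = billing.poll()
audit_view = audit.poll()  # an independent consumer sees the same events
billing.seek(1)            # rewind to offset 1
replayed = billing.poll()
```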
RabbitMQ is a traditional message broker that supports multiple messaging patterns (point-to-point, pub/sub, routing, topics) with strong delivery guarantees. RabbitMQ queues remove messages once they have been acknowledged (unlike Kafka, which retains them), making them better suited to task queues and work distribution than to event sourcing or replay.
AWS SNS/SQS provides a managed pub/sub (SNS) and queue (SQS) combination. SNS fans out events to multiple SQS queues, each serving a different subscriber. This is the simplest operational choice for AWS-based systems but lacks the replay capability of Kafka and the routing flexibility of RabbitMQ.
Redis Streams offers a lightweight event streaming capability for systems already using Redis. It supports consumer groups, acknowledgment, and time-based querying, but has durability limitations compared to Kafka or dedicated message brokers. Redis Streams is a good fit for systems building real-time webhook delivery with moderate throughput requirements.
Designing for Failure in Event-Driven Systems
Every component in an event-driven system can fail: publishers can crash before emitting events, channels can lose messages, and consumers can fail during processing. Robust EDA design addresses each failure mode explicitly.
Outbox pattern: Instead of publishing an event directly, the publisher writes the event to a local outbox table in the same database transaction as the state change, and a separate relay process reads the outbox and publishes to the channel. This guarantees that the event is eventually published if and only if the state change is committed. The relay may publish the same event more than once after a crash, so the guarantee is at-least-once rather than exactly-once.
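The outbox pattern can be sketched with SQLite from the Python standard library. The table layout, the `create_order` and `relay` functions, and the `publish` callback are all assumptions made for illustration; a real relay would run as a separate process or use change data capture.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def create_order(order_id: str) -> None:
    # One transaction: both rows commit, or neither does.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'created')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def relay(publish) -> int:
    """Publish pending outbox rows in order; mark each one after it is sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        # A crash between publish and this update causes a re-publish,
        # so delivery is at-least-once.
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []
create_order("o-1")
relay(lambda event_type, payload: sent.append((event_type, payload)))
```

Marking the row only after a successful publish is what makes a lost event impossible and a duplicate event possible, which is why the pattern is usually paired with idempotent consumers.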
Dead letter queues (DLQ): Messages that fail processing after a configurable number of retries are moved to a DLQ for manual inspection and reprocessing. Without a DLQ, poison messages (events that always fail) block the consumer indefinitely.
Idempotent consumers: Because channels may deliver the same event more than once (at-least-once delivery), consumers must handle duplicates safely. This typically means checking an event ID against a processed-events table before executing side effects.
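Both ideas fit in one small consumer loop. In this sketch the set, the lists, `MAX_ATTEMPTS`, and the `charge` side effect are hypothetical in-memory stand-ins; in a real system the processed-events record and the side effect would ideally be written in the same transaction.

```python
MAX_ATTEMPTS = 3  # hypothetical retry budget

processed_ids: set[str] = set()  # stands in for a processed-events table
dead_letters: list[dict] = []    # stands in for the DLQ
charges: list[str] = []          # record of side effects


def charge(event: dict) -> None:
    if event["order_id"] == "bad":
        raise ValueError("malformed order")
    charges.append(event["order_id"])


def consume(event: dict) -> str:
    if event["event_id"] in processed_ids:
        return "duplicate"  # already handled: skip the side effect
    for _ in range(MAX_ATTEMPTS):
        try:
            charge(event)
        except Exception:
            continue  # retry until the budget is exhausted
        processed_ids.add(event["event_id"])
        return "processed"
    dead_letters.append(event)  # park the poison message and keep consuming
    return "dead-lettered"


results = [
    consume({"event_id": "e1", "order_id": "o-1"}),
    consume({"event_id": "e1", "order_id": "o-1"}),  # redelivery of e1
    consume({"event_id": "e2", "order_id": "bad"}),  # poison message
    consume({"event_id": "e3", "order_id": "o-2"}),
]
```

The redelivered event does not charge twice, and the poison message ends up in the dead-letter list without preventing `e3` from being processed.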
Event Sourcing and CQRS
Event sourcing takes the event-driven concept to its logical conclusion: instead of storing the current state of an entity, you store the sequence of events that produced it. An account balance is not stored as a single number; it is derived by replaying all deposit and withdrawal events. This provides a complete audit trail, enables temporal queries (what was the balance at time T?), and supports event replay for debugging and data recovery.
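The account example can be written as a fold over the event history. The `AccountEvent` fields and the integer timestamps are assumptions for illustration; real systems usually add snapshots so that long histories do not have to be replayed from the start.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccountEvent:
    timestamp: int      # hypothetical unit, e.g. seconds since some epoch
    kind: str           # "deposit" or "withdrawal"
    amount_cents: int


def balance_at(events: list[AccountEvent], as_of: Optional[int] = None) -> int:
    """Fold events into a balance, optionally stopping at a point in time."""
    balance = 0
    for event in sorted(events, key=lambda e: e.timestamp):
        if as_of is not None and event.timestamp > as_of:
            break
        if event.kind == "deposit":
            balance += event.amount_cents
        elif event.kind == "withdrawal":
            balance -= event.amount_cents
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return balance


history = [
    AccountEvent(100, "deposit", 5_000),
    AccountEvent(200, "withdrawal", 1_200),
    AccountEvent(300, "deposit", 700),
]

current = balance_at(history)             # 5000 - 1200 + 700 = 4500
at_t250 = balance_at(history, as_of=250)  # 5000 - 1200 = 3800
```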
CQRS (Command Query Responsibility Segregation) separates the write model (commands that produce events) from the read model (materialized views optimized for queries). Events bridge the two: when a command produces a new event, projections consume the event and update the read model. This separation allows the write side to be optimized for consistency and the read side to be optimized for query performance, which becomes increasingly valuable in high-throughput systems, including those processing hundreds of thousands of events per second.
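A minimal sketch of the write-side and read-side split follows. All names are hypothetical, and the projection is called inline for brevity; in practice it would consume events asynchronously from the channel, which is why the read model is only eventually consistent.

```python
event_store: list[dict] = []                   # write side: append-only events
orders_by_customer: dict[str, list[str]] = {}  # read side: query-shaped view


def project(event: dict) -> None:
    """Projection: fold one event into the read model."""
    if event["type"] == "OrderCreated":
        orders_by_customer.setdefault(event["customer_id"], []).append(event["order_id"])


def handle_create_order(customer_id: str, order_id: str) -> None:
    """Command handler: validate, then record the resulting event."""
    if any(e["order_id"] == order_id for e in event_store):
        raise ValueError("order already exists")
    event = {"type": "OrderCreated", "customer_id": customer_id, "order_id": order_id}
    event_store.append(event)
    project(event)  # inline here; normally delivered via the channel


def query_orders(customer_id: str) -> list[str]:
    """Query: reads only the materialized view, never the event store."""
    return orders_by_customer.get(customer_id, [])


def rebuild_read_model() -> None:
    """Views are disposable: replaying the event store recreates them."""
    orders_by_customer.clear()
    for event in event_store:
        project(event)


handle_create_order("c-1", "o-1")
handle_create_order("c-1", "o-2")
handle_create_order("c-2", "o-3")
```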
Observability in Event-Driven Systems
Traditional request-response systems have natural observability through synchronous call chains and stack traces. Event-driven systems require deliberate observability design. Every event should carry a correlation ID that links all events belonging to the same workflow. Distributed tracing tools (Jaeger, Zipkin, AWS X-Ray) apply the same idea, propagating a trace ID with each message so that the full event flow can be reconstructed across services. Consumer lag (the difference between the latest event produced and the latest event consumed, typically measured as a count of unprocessed events) is the primary health metric for event-driven systems. High lag indicates that consumers are falling behind, which can lead to stale data and violated SLAs.
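Both practices reduce to a few lines. The sketch below assumes events are plain dictionaries and that produced and consumed positions are available as per-partition offsets; the function names and the alert threshold are illustrative, not taken from any particular broker or tracing library.

```python
import uuid
from typing import Optional


def new_event(event_type: str, parent: Optional[dict] = None) -> dict:
    """Create an event that inherits the workflow's correlation ID from its parent."""
    correlation_id = parent["correlation_id"] if parent else str(uuid.uuid4())
    return {
        "event_id": str(uuid.uuid4()),
        "type": event_type,
        "correlation_id": correlation_id,
    }


def consumer_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> dict[int, int]:
    """Per-partition lag: latest produced offset minus latest consumed offset."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}


def is_falling_behind(lag: dict[int, int], threshold: int) -> bool:
    return sum(lag.values()) > threshold


root = new_event("OrderCreated")
child = new_event("PaymentCharged", parent=root)  # same workflow, new event ID

lag = consumer_lag(
    end_offsets={0: 1_500, 1: 980, 2: 2_010},
    committed={0: 1_480, 1: 980, 2: 1_200},
)
alert = is_falling_behind(lag, threshold=500)
```

Partition 2 accounts for almost all of the lag in this example, which is the kind of skew a per-partition breakdown makes visible and a single total would hide.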