Backend Architecture

Microservices & Event-Driven Architecture

A practical guide to building resilient, scalable distributed systems. From event sourcing and CQRS to circuit breakers and distributed tracing — every pattern explained with real TypeScript code.

Amit Sharma
Software Engineer, Backend Architecture at TurboDocx
March 23, 2026 · 22 min read

Microservices promise independent deployability, technology diversity, and team autonomy. But without the right architectural patterns, they deliver something very different: a distributed monolith with all the complexity of a network and none of the simplicity of a single process.

This guide covers the patterns that separate successful microservice architectures from painful ones. Whether you're building a high-throughput API platform or a document automation pipeline, these patterns will help you design systems that scale without becoming an operational nightmare.

| Pattern | Category | When To Use | Complexity |
| --- | --- | --- | --- |
| Event Sourcing | Data | Full audit trail, temporal queries, replay capability | High |
| CQRS | Architecture | Separate read/write models for performance | High |
| Saga Pattern | Transactions | Distributed transactions across services | High |
| Event-Driven Messaging | Communication | Async, decoupled service communication | Medium |
| API Gateway | Infrastructure | Single entry point, routing, auth aggregation | Medium |
| Circuit Breaker | Resilience | Prevent cascade failures across services | Medium |
| Service Mesh | Infrastructure | Traffic management, mTLS, observability at scale | High |
| Distributed Tracing | Observability | End-to-end request visibility across services | Medium |

Part 1: Microservices Fundamentals

Single Responsibility & Domain-Driven Design

The foundation of microservices is getting service boundaries right. Each service should own a single bounded context — a self-contained domain with its own data model, business rules, and language. If two services need to share a database table, they probably belong in the same service.

Domain-Driven Design (DDD) provides the vocabulary: aggregates, entities, value objects, and domain events. You don't need to adopt the entire DDD framework, but understanding bounded contexts is essential. At TurboDocx, our document generation engine and user management system are separate bounded contexts with clear APIs between them.
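To make that vocabulary concrete, here is a minimal sketch of the DDD building blocks in TypeScript. The names (`Money`, `GenerationJob`, `DocumentGenerated`) are illustrative, not a real TurboDocx model: a value object compared by value, an aggregate root that owns its invariants, and a domain event raised by a state change.

```typescript
// Illustrative DDD building blocks; all names here are hypothetical.

// Value object: immutable, compared by value, no identity of its own
class Money {
  constructor(readonly amount: number, readonly currency: string) {}
  add(other: Money): Money {
    if (other.currency !== this.currency) throw new Error('Currency mismatch');
    return new Money(this.amount + other.amount, this.currency);
  }
}

// Domain event: a record that something happened in this bounded context
interface DocumentGenerated {
  type: 'DocumentGenerated';
  documentId: string;
  occurredAt: Date;
}

// Aggregate root: the entity that enforces invariants and raises events
class GenerationJob {
  private events: DocumentGenerated[] = [];
  private status: 'pending' | 'done' = 'pending';
  constructor(readonly id: string, readonly price: Money) {}

  complete(documentId: string): void {
    if (this.status === 'done') throw new Error('Job already completed');
    this.status = 'done';
    this.events.push({ type: 'DocumentGenerated', documentId, occurredAt: new Date() });
  }

  // Events are collected on the aggregate and published after persistence
  pullEvents(): DocumentGenerated[] {
    const out = this.events;
    this.events = [];
    return out;
  }
}
```

The key property: only the aggregate mutates its own state, so invariants live in one place, and the events it raises become the public contract other services consume.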

* Code examples throughout this guide are simplified for illustrative purposes. Refer to the linked official documentation for complete API references and production-ready configurations.

Service Structure: A Practical Starting Point

A well-structured microservice separates transport (HTTP, gRPC), business logic, and data access into distinct layers. Here is a minimal Express.js microservice that follows this layering:

// order-service/src/app.ts — Transport Layer
import express from 'express';
import { OrderController } from './controllers/order.controller';
import { OrderService } from './services/order.service';
import { OrderRepository } from './repositories/order.repository';
import { KafkaProducer } from './events/kafka.producer';
const app = express();
app.use(express.json());
// Dependency injection — wire layers together
const producer = new KafkaProducer({ brokers: ['kafka:9092'] });
const repository = new OrderRepository(database); // `database`: your initialized DB client
const service = new OrderService(repository, producer);
const controller = new OrderController(service);
// Routes map to controller methods
app.post('/orders', (req, res) => controller.create(req, res));
app.get('/orders/:id', (req, res) => controller.getById(req, res));
app.patch('/orders/:id/status', (req, res) => controller.updateStatus(req, res));
// Health check — essential for orchestrators
app.get('/health', (req, res) => res.json({
status: 'healthy',
service: 'order-service',
timestamp: new Date().toISOString(),
uptime: process.uptime(),
}));
app.listen(3001, () => console.log('Order service running on :3001'));
// order-service/src/services/order.service.ts — Business Logic Layer
import { OrderRepository } from '../repositories/order.repository';
import { KafkaProducer } from '../events/kafka.producer';
import { ValidationError } from '../errors'; // shared error types (path illustrative)
import { Order, CreateOrderDTO } from '../types';
export class OrderService {
constructor(
private readonly repository: OrderRepository,
private readonly producer: KafkaProducer,
) {}
async createOrder(dto: CreateOrderDTO): Promise<Order> {
// Business rules live here, not in the controller
if (dto.items.length === 0) {
throw new ValidationError('Order must have at least one item');
}
const total = dto.items.reduce(
(sum, item) => sum + item.price * item.quantity, 0
);
const order = await this.repository.create({
...dto,
total,
status: 'pending',
createdAt: new Date(),
});
// Publish domain event — other services react asynchronously
await this.producer.publish('order.created', {
orderId: order.id,
customerId: dto.customerId,
total,
items: dto.items,
timestamp: new Date().toISOString(),
});
return order;
}
}

Avoid the distributed monolith

If every request to Service A requires a synchronous call to Service B, C, and D before it can respond, you have a distributed monolith. Services should be able to operate independently. Use events and eventual consistency to decouple them, and accept that some data will be stale for milliseconds.

Part 2: Event-Driven Patterns

Event Sourcing

Instead of storing the current state of an entity, event sourcing stores the sequence of events that produced that state. Every state change is an immutable event appended to a log. To reconstruct the current state, you replay the events from the beginning.

This gives you a complete audit trail for free, the ability to replay events to rebuild projections, and temporal queries (“what was the state at 3pm yesterday?”). It is particularly valuable for e-signature workflows where you need a legally defensible record of every action.

// Event Sourcing: store events, derive state
interface DomainEvent {
id: string;
aggregateId: string;
type: string;
payload: Record<string, unknown>;
timestamp: Date;
version: number;
correlationId?: string;
}
class ConcurrencyError extends Error {}
class EventStore {
private events: Map<string, DomainEvent[]> = new Map();
async append(aggregateId: string, event: DomainEvent): Promise<void> {
const stream = this.events.get(aggregateId) || [];
// Optimistic concurrency — reject if version conflicts
if (stream.length > 0 && stream[stream.length - 1].version >= event.version) {
throw new ConcurrencyError('Event version conflict');
}
stream.push(event);
this.events.set(aggregateId, stream);
}
async getEvents(aggregateId: string): Promise<DomainEvent[]> {
return this.events.get(aggregateId) || [];
}
}
// Rebuild state by replaying events
class OrderAggregate {
id: string = '';
status: string = 'unknown';
total: number = 0;
items: Array<{ productId: string; quantity: number }> = [];
apply(event: DomainEvent): void {
switch (event.type) {
case 'OrderCreated':
this.id = event.aggregateId;
this.status = 'pending';
this.items = event.payload.items as typeof this.items;
this.total = event.payload.total as number;
break;
case 'OrderConfirmed':
this.status = 'confirmed';
break;
case 'OrderShipped':
this.status = 'shipped';
break;
case 'OrderCancelled':
this.status = 'cancelled';
break;
}
}
static fromEvents(events: DomainEvent[]): OrderAggregate {
const aggregate = new OrderAggregate();
events.forEach(event => aggregate.apply(event));
return aggregate;
}
}

CQRS (Command Query Responsibility Segregation)

CQRS separates your system into two sides: the command side that handles writes and enforces business rules, and the query side that handles reads with optimized, denormalized views. This lets you scale reads and writes independently and optimize each for its specific workload.

CQRS pairs naturally with Event Sourcing. The command side writes events, and the query side builds read-optimized projections from those events. This is the architecture behind systems that need to handle high read throughput while maintaining strict write consistency — like a sales quoting platform where many users read proposals but writes require validation.

// CQRS: separate command and query models
// --- Command Side: enforces business rules, writes events ---
class OrderCommandHandler {
constructor(
private eventStore: EventStore,
private eventBus: EventBus,
) {}
async handle(command: CreateOrderCommand): Promise<string> {
// Validate business rules
if (command.items.length === 0) {
throw new ValidationError('Order must contain at least one item');
}
const orderId = generateId();
const event: DomainEvent = {
id: generateId(),
aggregateId: orderId,
type: 'OrderCreated',
payload: {
customerId: command.customerId,
items: command.items,
total: command.items.reduce((sum, i) => sum + i.price * i.quantity, 0),
},
timestamp: new Date(),
version: 1,
};
await this.eventStore.append(orderId, event);
await this.eventBus.publish(event); // Notify query side
return orderId;
}
}
// --- Query Side: read-optimized, denormalized views ---
class OrderProjection {
private views: Map<string, OrderView> = new Map();
// Event handler builds the read model
handleOrderCreated(event: DomainEvent): void {
this.views.set(event.aggregateId, {
id: event.aggregateId,
customerId: event.payload.customerId as string,
status: 'pending',
total: event.payload.total as number,
itemCount: (event.payload.items as unknown[]).length,
createdAt: event.timestamp,
updatedAt: event.timestamp,
});
}
handleOrderShipped(event: DomainEvent): void {
const view = this.views.get(event.aggregateId);
if (view) {
view.status = 'shipped';
view.updatedAt = event.timestamp;
}
}
// Queries are fast — no joins, no aggregation
async getOrder(id: string): Promise<OrderView | undefined> {
return this.views.get(id);
}
async getOrdersByCustomer(customerId: string): Promise<OrderView[]> {
return [...this.views.values()].filter(v => v.customerId === customerId);
}
}

Saga Pattern for Distributed Transactions

In a monolith, you wrap multiple database operations in a single transaction. In microservices, each service owns its database, so traditional ACID transactions don't work across service boundaries. The Saga pattern solves this by breaking a distributed transaction into a sequence of local transactions, each with a compensating action that undoes the work if a later step fails.

// Orchestration-based Saga: a central coordinator manages the flow
class OrderSaga {
constructor(
private orderService: OrderServiceClient,
private paymentService: PaymentServiceClient,
private inventoryService: InventoryServiceClient,
private notificationService: NotificationServiceClient,
) {}
async execute(command: PlaceOrderCommand): Promise<SagaResult> {
const compensations: Array<() => Promise<void>> = [];
try {
// Step 1: Create order
const order = await this.orderService.create(command);
compensations.push(() => this.orderService.cancel(order.id));
// Step 2: Reserve inventory
const reservation = await this.inventoryService.reserve(order.items);
compensations.push(() => this.inventoryService.release(reservation.id));
// Step 3: Process payment
const payment = await this.paymentService.charge({
orderId: order.id,
amount: order.total,
customerId: command.customerId,
});
compensations.push(() => this.paymentService.refund(payment.id));
// Step 4: Confirm order (no compensation needed — this is the final step)
await this.orderService.confirm(order.id);
// Step 5: Send notification (best-effort, don't compensate on failure)
await this.notificationService.send({
type: 'order.confirmed',
customerId: command.customerId,
orderId: order.id,
}).catch(err => console.error('Notification failed:', err));
return { success: true, orderId: order.id };
} catch (error) {
// Compensate in reverse order
console.error('Saga failed, compensating:', error);
for (const compensate of [...compensations].reverse()) {
try {
await compensate();
} catch (compError) {
// Log and alert — manual intervention may be needed
console.error('Compensation failed:', compError);
}
}
return { success: false, error: (error as Error).message };
}
}
}

Orchestration vs. Choreography

The code above uses orchestration — a central saga coordinator drives the flow. The alternative is choreography, where each service listens for events and decides what to do next. Orchestration is easier to understand and debug; choreography scales better and avoids a single point of failure. Most production systems use orchestration for critical business flows and choreography for everything else.
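For contrast, here is a minimal in-memory sketch of the same flow choreographed. The bus is illustrative (in production it would be Kafka or RabbitMQ, and handlers would be async): each service subscribes to the previous step's event and emits its own, with no coordinator in sight.

```typescript
// Choreography: each service reacts to the previous step's event.
// The in-memory bus is illustrative; in production this is Kafka or RabbitMQ.
type OrderEvent = { orderId: string };
type Handler = (event: OrderEvent) => void;

class InMemoryBus {
  private handlers = new Map<string, Handler[]>();
  subscribe(type: string, handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }
  publish(type: string, event: OrderEvent): void {
    for (const handler of this.handlers.get(type) ?? []) handler(event);
  }
}

const bus = new InMemoryBus();
export const log: string[] = [];

// Inventory service: nobody tells it to act, it reacts
bus.subscribe('order.created', (e) => {
  log.push(`inventory reserved for ${e.orderId}`);
  bus.publish('inventory.reserved', e);
});
// Payment service reacts to the reservation
bus.subscribe('inventory.reserved', (e) => {
  log.push(`payment charged for ${e.orderId}`);
  bus.publish('payment.charged', e);
});
// Order service confirms once payment succeeds
bus.subscribe('payment.charged', (e) => {
  log.push(`order ${e.orderId} confirmed`);
});

bus.publish('order.created', { orderId: 'o-1' });
```

Note what's missing: compensation. In choreography each service must also subscribe to failure events (e.g. a hypothetical payment.failed) and undo its own work, which is precisely why these flows are harder to trace and debug than an orchestrated saga.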

Part 3: Message Brokers & Communication

Kafka vs. RabbitMQ: Choosing the Right Broker

The two most popular message brokers serve different purposes. Kafka is a distributed log optimized for high-throughput event streaming and replay. RabbitMQ is a traditional message queue optimized for message routing and delivery guarantees. Your choice depends on your primary use case.

| Criteria | Apache Kafka | RabbitMQ |
| --- | --- | --- |
| Model | Distributed commit log | Message queue with exchanges |
| Throughput | Millions of msgs/sec | Tens of thousands/sec |
| Replay | Yes (offset-based) | No (consumed = gone) |
| Routing | Topic-based, partitioned | Flexible exchange types |
| Best For | Event streaming, logs, analytics | Task queues, RPC, complex routing |
| Ordering | Per-partition guarantee | Per-queue guarantee |

Kafka Producer/Consumer in Node.js

Here is a practical Kafka setup using kafkajs. The producer publishes domain events, and consumers process them asynchronously; this is the backbone of event-driven communication between your services.

// shared/kafka.ts — Kafka client configuration
import { Kafka, Producer, Consumer, EachMessagePayload } from 'kafkajs';
const kafka = new Kafka({
clientId: 'order-service',
brokers: (process.env.KAFKA_BROKERS || 'localhost:9092').split(','),
retry: { initialRetryTime: 100, retries: 8 },
});
// --- Producer: publish domain events ---
export class EventProducer {
private producer: Producer;
constructor() {
this.producer = kafka.producer({
allowAutoTopicCreation: false,
});
}
async connect(): Promise<void> {
await this.producer.connect();
console.log('Kafka producer connected');
}
async publish(topic: string, event: DomainEvent): Promise<void> {
await this.producer.send({
topic,
messages: [{
key: event.aggregateId, // Ensures ordering per aggregate
value: JSON.stringify(event),
headers: {
'event-type': Buffer.from(event.type),
'correlation-id': Buffer.from(event.correlationId || ''),
'timestamp': Buffer.from(event.timestamp.toISOString()),
},
}],
});
}
async disconnect(): Promise<void> {
await this.producer.disconnect();
}
}
// --- Consumer: process events with error handling ---
export class EventConsumer {
private consumer: Consumer;
constructor(groupId: string) {
this.consumer = kafka.consumer({
groupId,
sessionTimeout: 30000,
heartbeatInterval: 3000,
});
}
async subscribe(
topic: string,
handler: (event: DomainEvent) => Promise<void>,
): Promise<void> {
await this.consumer.connect();
await this.consumer.subscribe({ topics: [topic], fromBeginning: false });
await this.consumer.run({
eachMessage: async ({ message, partition, topic: t }: EachMessagePayload) => {
const event = JSON.parse(message.value!.toString()) as DomainEvent;
try {
await handler(event);
} catch (error) {
console.error(`Failed to process event on ${t}[${partition}]:`, error);
// Dead letter queue for failed messages
await this.publishToDeadLetter(t, message, error as Error);
}
},
});
}
private async publishToDeadLetter(
originalTopic: string,
message: unknown,
error: Error,
): Promise<void> {
// Publish to DLQ for manual inspection and replay
console.error(`DLQ candidate from ${originalTopic}: ${error.message}`);
}
}

Docs: KafkaJS

Async vs. sync: the golden rule

Use synchronous communication (HTTP/gRPC) only when the caller genuinely needs a response to continue. For everything else — notifications, analytics, projection updates, cross-service side effects — use asynchronous events. This keeps services decoupled and resilient. If operations teams need to generate a document after an order is placed, that should be an event consumer, not a synchronous call in the order flow.

Part 4: Resilience Patterns

Circuit Breaker Pattern

When a downstream service is failing, continuing to send requests to it makes things worse — you exhaust connection pools, increase latency, and create cascading failures. The circuit breaker pattern stops this by “opening” the circuit after a threshold of failures, returning a fast fallback response instead of waiting for timeouts.

// Circuit Breaker: prevent cascade failures
enum CircuitState {
CLOSED = 'CLOSED', // Normal operation — requests pass through
OPEN = 'OPEN', // Failing — requests are blocked, return fallback
HALF_OPEN = 'HALF_OPEN' // Testing — one request allowed to check recovery
}
class CircuitOpenError extends Error {}
class CircuitBreaker {
private state: CircuitState = CircuitState.CLOSED;
private failureCount = 0;
private lastFailureTime: number = 0;
private successCount = 0;
constructor(
private readonly options: {
failureThreshold: number; // Open after this many failures
recoveryTimeout: number; // Try again after this many ms
successThreshold: number; // Close after this many successes in half-open
}
) {}
async execute<T>(
operation: () => Promise<T>,
fallback?: () => T,
): Promise<T> {
if (this.state === CircuitState.OPEN) {
if (Date.now() - this.lastFailureTime > this.options.recoveryTimeout) {
this.state = CircuitState.HALF_OPEN;
this.successCount = 0;
} else {
if (fallback) return fallback();
throw new CircuitOpenError('Circuit is OPEN — service unavailable');
}
}
try {
const result = await operation();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
if (fallback) return fallback();
throw error;
}
}
private onSuccess(): void {
if (this.state === CircuitState.HALF_OPEN) {
this.successCount++;
if (this.successCount >= this.options.successThreshold) {
this.state = CircuitState.CLOSED;
this.failureCount = 0;
}
} else if (this.state === CircuitState.CLOSED) {
this.failureCount = 0;
}
}
private onFailure(): void {
this.failureCount++;
this.lastFailureTime = Date.now();
if (this.failureCount >= this.options.failureThreshold) {
this.state = CircuitState.OPEN;
}
}
getState(): CircuitState { return this.state; }
}
// Usage
const paymentBreaker = new CircuitBreaker({
failureThreshold: 5,
recoveryTimeout: 30000,
successThreshold: 3,
});
const result = await paymentBreaker.execute(
() => paymentService.charge(order),
() => ({ status: 'queued', message: 'Payment will be retried' }),
);

Retry with Exponential Backoff

Transient failures — network blips, momentary overloads, DNS hiccups — are the norm in distributed systems. A retry with exponential backoff and jitter gives the downstream service time to recover without creating a thundering herd.

// Retry with exponential backoff and jitter
async function withRetry<T>(
operation: () => Promise<T>,
options: {
maxRetries: number;
baseDelay: number; // Initial delay in ms
maxDelay: number; // Cap the delay
jitter: boolean; // Add randomness to prevent thundering herd
} = { maxRetries: 3, baseDelay: 1000, maxDelay: 30000, jitter: true },
): Promise<T> {
let lastError: Error | undefined;
for (let attempt = 0; attempt <= options.maxRetries; attempt++) {
try {
return await operation();
} catch (error) {
lastError = error as Error;
if (attempt === options.maxRetries) break;
// Don't retry non-transient errors
if ((error as { status?: number }).status === 400) throw error;
const delay = Math.min(
options.baseDelay * Math.pow(2, attempt),
options.maxDelay,
);
const finalDelay = options.jitter
? delay * (0.5 + Math.random() * 0.5)
: delay;
console.warn(
`Retry ${attempt + 1}/${options.maxRetries} after ${Math.round(finalDelay)}ms`
);
await new Promise(resolve => setTimeout(resolve, finalDelay));
}
}
throw lastError;
}

Bulkhead Pattern

Named after ship compartments that prevent a single breach from sinking the whole vessel, the bulkhead pattern isolates failures by giving each dependency its own resource pool. If the payment service is slow, it exhausts its own connection pool without affecting the inventory service's pool.

// Bulkhead: isolate resource pools per dependency
class BulkheadFullError extends Error {}
class Bulkhead {
private activeRequests = 0;
private queue: Array<{
resolve: (value: void) => void;
reject: (error: Error) => void;
}> = [];
constructor(
private readonly maxConcurrent: number,
private readonly maxQueue: number,
) {}
async execute<T>(operation: () => Promise<T>): Promise<T> {
if (this.activeRequests >= this.maxConcurrent) {
if (this.queue.length >= this.maxQueue) {
throw new BulkheadFullError(
`Bulkhead full: ${this.activeRequests} active, ${this.queue.length} queued`
);
}
// Wait for a slot to open
await new Promise<void>((resolve, reject) => {
this.queue.push({ resolve, reject });
});
}
this.activeRequests++;
try {
return await operation();
} finally {
this.activeRequests--;
if (this.queue.length > 0) {
const next = this.queue.shift()!;
next.resolve();
}
}
}
}
// Separate bulkheads per dependency
const paymentBulkhead = new Bulkhead(10, 50); // Max 10 concurrent, 50 queued
const inventoryBulkhead = new Bulkhead(20, 100);
const notificationBulkhead = new Bulkhead(5, 20);
// Payment slowness won't starve inventory requests
const payment = await paymentBulkhead.execute(() => paymentService.charge(order));
const stock = await inventoryBulkhead.execute(() => inventoryService.check(items));

Layer your resilience

These patterns compose. Wrap your HTTP client in a bulkhead, add a circuit breaker around that, and use retry with backoff for transient failures. The order matters: Bulkhead → Circuit Breaker → Retry → Timeout. Libraries like cockatiel (Node.js) and Polly (.NET) provide composable resilience policies out of the box.
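The inner two layers of that stack can be sketched in a few lines. This is a self-contained illustration with simplified retry and timeout helpers (the fuller versions appear earlier in this guide); in a real service the CircuitBreaker and Bulkhead from the previous sections would wrap the outside.

```typescript
// Composing resilience layers: retry wraps timeout, so every attempt is bounded.
// Simplified helpers for illustration; see the fuller versions in this guide.
async function withTimeout<T>(op: () => Promise<T>, ms: number): Promise<T> {
  let timer!: NodeJS.Timeout;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([op(), deadline]);
  } finally {
    clearTimeout(timer); // don't leak the timer once the race settles
  }
}

async function withSimpleRetry<T>(op: () => Promise<T>, attempts: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (error) {
      lastError = error; // in production: back off with jitter here
    }
  }
  throw lastError;
}

// Full layering would be: bulkhead(breaker(retry(timeout(op)))).
// Here a flaky dependency fails twice, then succeeds on the third attempt.
export let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error('transient failure');
  return 'ok';
};

export const result = await withSimpleRetry(() => withTimeout(flaky, 2000), 5);
```

The ordering matters because each layer bounds the one inside it: the timeout caps a single attempt, the retry caps total attempts, and the breaker caps how long you keep trying at all.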

Part 5: Observability & Monitoring

In a monolith, you can follow a request through the code with a debugger. In microservices, a single user action may touch five services, three message brokers, and two databases. Without observability, debugging is guesswork. The three pillars of observability are distributed tracing, structured logging, and metrics.

Distributed Tracing with OpenTelemetry

OpenTelemetry (OTel) is the industry standard for distributed tracing. It propagates a trace ID across service boundaries so you can see the full lifecycle of a request — which services it touched, how long each step took, and where it failed. This is invaluable when debugging why a template generation request is timing out.

// OpenTelemetry setup for a Node.js microservice
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { resourceFromAttributes } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';
const sdk = new NodeSDK({
resource: resourceFromAttributes({
[ATTR_SERVICE_NAME]: 'order-service',
[ATTR_SERVICE_VERSION]: '1.4.0',
environment: process.env.NODE_ENV || 'development',
}),
traceExporter: new OTLPTraceExporter({
url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://jaeger:4318'}/v1/traces`,
}),
metricReaders: [
new PeriodicExportingMetricReader({
exporter: new OTLPMetricExporter({
url: `${process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://jaeger:4318'}/v1/metrics`,
}),
exportIntervalMillis: 15000,
}),
],
],
instrumentations: [
getNodeAutoInstrumentations({
// Auto-instrument Express, HTTP, Kafka, database drivers
'@opentelemetry/instrumentation-express': { enabled: true },
'@opentelemetry/instrumentation-http': { enabled: true },
'@opentelemetry/instrumentation-kafkajs': { enabled: true },
}),
],
});
sdk.start();
console.log('OpenTelemetry tracing initialized');
// Graceful shutdown
process.on('SIGTERM', async () => {
await sdk.shutdown();
process.exit(0);
});

Docs: OpenTelemetry JS SDK

Structured Logging

Plain-text logs are impossible to search at scale. Structured logging outputs JSON with consistent fields — correlation IDs, service names, timestamps, and context — so your log aggregator (ELK, Datadog, Grafana Loki) can index and query them.

// Structured logger with correlation ID propagation
import { trace } from '@opentelemetry/api';
interface LogContext {
service: string;
correlationId?: string;
traceId?: string;
spanId?: string;
[key: string]: unknown;
}
class StructuredLogger {
constructor(private readonly service: string) {}
private getTraceContext(): Partial<LogContext> {
const span = trace.getActiveSpan();
if (!span) return {};
const ctx = span.spanContext();
return { traceId: ctx.traceId, spanId: ctx.spanId };
}
info(message: string, data?: Record<string, unknown>): void {
console.log(JSON.stringify({
level: 'info',
message,
service: this.service,
timestamp: new Date().toISOString(),
...this.getTraceContext(),
...data,
}));
}
error(message: string, error: Error, data?: Record<string, unknown>): void {
console.error(JSON.stringify({
level: 'error',
message,
service: this.service,
timestamp: new Date().toISOString(),
error: { name: error.name, message: error.message, stack: error.stack },
...this.getTraceContext(),
...data,
}));
}
warn(message: string, data?: Record<string, unknown>): void {
console.warn(JSON.stringify({
level: 'warn',
message,
service: this.service,
timestamp: new Date().toISOString(),
...this.getTraceContext(),
...data,
}));
}
}
// Usage
const logger = new StructuredLogger('order-service');
logger.info('Order created', { orderId: '123', customerId: '456', total: 99.99 });
// => {"level":"info","message":"Order created","service":"order-service",
// "timestamp":"2026-03-16T...","traceId":"abc...","orderId":"123",...}

Docs: OpenTelemetry JS Instrumentation

Health Checks

Container orchestrators like Kubernetes rely on health checks to decide whether a service is ready to receive traffic. A proper health check verifies not just that the process is running, but that it can reach its dependencies.

// Comprehensive health check endpoint
app.get('/health', async (req, res) => {
const checks = {
database: false,
kafka: false,
redis: false,
};
// Check each dependency in parallel
const [dbOk, kafkaOk, redisOk] = await Promise.allSettled([
database.query('SELECT 1').then(() => true),
kafkaAdmin.listTopics().then(() => true),
redis.ping().then(() => true),
]);
checks.database = dbOk.status === 'fulfilled' && dbOk.value;
checks.kafka = kafkaOk.status === 'fulfilled' && kafkaOk.value;
checks.redis = redisOk.status === 'fulfilled' && redisOk.value;
const healthy = Object.values(checks).every(Boolean);
res.status(healthy ? 200 : 503).json({
status: healthy ? 'healthy' : 'degraded',
service: 'order-service',
version: process.env.APP_VERSION || '1.0.0',
timestamp: new Date().toISOString(),
uptime: process.uptime(),
checks,
});
});
// Kubernetes probes:
// readinessProbe: GET /health — stop routing traffic while degraded
// livenessProbe: prefer a simpler, process-only endpoint, so a dependency
// outage removes the pod from rotation instead of triggering a restart loop

The observability stack

A production-ready observability stack: OpenTelemetry for tracing, Prometheus + Grafana for metrics, and ELK or Grafana Loki for logs. All three should share the same trace ID so you can jump from a metric spike to the trace that caused it to the log entry that explains it. This correlation is what makes distributed systems debuggable.

Key Takeaways

Start with Domain Boundaries

Use Domain-Driven Design to identify bounded contexts before splitting into services. Wrong boundaries create distributed monoliths.

Events as First-Class Citizens

Model your system around domain events. Event Sourcing and CQRS unlock audit trails, temporal queries, and independent scaling.

Design for Failure

Circuit breakers, retries with backoff, and bulkheads are not optional. Every network call will eventually fail.

Observability is Non-Negotiable

Distributed tracing, structured logging, and health checks turn opaque microservice systems into debuggable ones.

When to choose microservices

Microservices are not the default choice. If your team is small (under 10 engineers), your domain is well-understood, and you don't need independent scaling or deployment, a well-structured modular monolith is simpler and faster to build. Start monolithic, identify natural boundaries, and extract services only when you have a clear reason — like independent scaling, technology diversity, or team autonomy. The patterns in this guide apply to modular monoliths too; the only difference is that the “network” between modules is an in-process function call.

Applying These Patterns to Document Automation

Document automation platforms are a natural fit for event-driven microservices. Template rendering, PDF generation, e-signature tracking, and notification delivery are all independent bounded contexts that benefit from asynchronous communication. An order-confirmed event can trigger document generation, which triggers a signature request, which triggers a notification — all without tight coupling between services.

At TurboDocx, our API and SDK are designed so developers can integrate document generation as an event consumer in their existing microservice architecture. Whether you're using n8n automation workflows or a custom Kafka pipeline, document generation fits naturally into an event-driven flow.

Related Resources

Build Event-Driven Document Pipelines

Our API and SDK integrate seamlessly into your microservice architecture — trigger document generation from Kafka events, webhooks, or any async pipeline.
