DevOps · 8 min read

Kafka vs RabbitMQ: Event-Driven Microservices Reality Check

I've shipped event-driven architectures with both Kafka and RabbitMQ. Here's what actually matters when you're building microservices in production.

Jay Salot

Senior Full Stack AI Engineer

June 5, 2026 · 8 min read

Last year, I inherited a microservices platform running on RabbitMQ that was choking under load. The business wanted real-time analytics, and suddenly we needed to replay events from hours ago. That's when I learned the hard way that not all message brokers are created equal.

If you're building microservices with Node.js and TypeScript, you'll eventually hit the event-driven architecture decision. Kafka and RabbitMQ both solve the "how do my services talk to each other" problem, but they approach it from completely different angles. I've deployed both in production, and honestly, the choice matters more than most blog posts admit.

Why Event-Driven Architecture for Microservices

Before we get into the broker battle, let's talk about why you'd go event-driven in the first place. In a traditional REST-based microservices setup, services call each other directly. Order service needs to notify the inventory service? HTTP POST. Sounds simple until you have 15 services and you're debugging a cascade of failures at 2 AM.

Event-driven flips this. Services publish events ("order created", "payment processed") to a message broker. Other services subscribe to events they care about. The services don't know about each other. They know about events.
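To make that contract concrete, here's a minimal in-memory sketch. It is not a broker: there's no durability and no delivery across processes. `EventBus` and the event shapes are hypothetical, and they exist only to show that a publisher emits a typed event without knowing who, if anyone, is listening.

```typescript
// Hypothetical event shapes, for illustration only.
type DomainEvent =
  | { type: 'order.created'; orderId: string; total: number }
  | { type: 'payment.processed'; orderId: string; amount: number };

type EventType = DomainEvent['type'];
type EventOf<T extends EventType> = Extract<DomainEvent, { type: T }>;

// In-process stand-in for a broker: publishers only know event types, never subscribers.
class EventBus {
  private handlers = new Map<EventType, Array<(evt: DomainEvent) => void>>();

  subscribe<T extends EventType>(type: T, handler: (evt: EventOf<T>) => void): void {
    const list = this.handlers.get(type) ?? [];
    // publish() only hands events whose type is T to this list, so the cast is safe.
    list.push(handler as unknown as (evt: DomainEvent) => void);
    this.handlers.set(type, list);
  }

  // Returns how many subscribers received the event; zero is not an error.
  publish(evt: DomainEvent): number {
    const list = this.handlers.get(evt.type) ?? [];
    list.forEach((handler) => handler(evt));
    return list.length;
  }
}

const bus = new EventBus();
bus.subscribe('order.created', (evt) => console.log(`inventory: reserve stock for ${evt.orderId}`));
bus.publish({ type: 'order.created', orderId: 'o-1', total: 42 }); // prints "inventory: reserve stock for o-1"
```

Adding an email service later means one more `subscribe` call; the code that publishes `order.created` doesn't change.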

The wins are real:

Decoupling: Services don't need to know who consumes their events
Resilience: If the inventory service is down, the order service doesn't care
Scalability: Add consumers without touching publishers
Audit trail: Every event can double as a historical record, as long as something retains it

But here's the gotcha: you've traded synchronous complexity for asynchronous complexity. Debugging distributed async flows is harder than debugging HTTP calls. Much harder.

RabbitMQ: The Smart Broker

RabbitMQ is what I call a "smart broker". It handles routing, queuing, and delivery guarantees for you. It's built on AMQP and has been around since 2007. Rock solid.

RabbitMQ's Architecture Model

RabbitMQ uses exchanges, queues, and bindings. Publishers send messages to exchanges. Exchanges route to queues based on routing keys. Consumers read from queues. The broker tracks which messages have been acknowledged.
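That acknowledgment bookkeeping is what makes RabbitMQ a "smart" broker. Here's a deliberately simplified, in-memory model of it; `ToyQueue` is hypothetical, not RabbitMQ code. A delivered message sits in an unacked set until the consumer acks it, at which point it's deleted for good. A nack with requeue puts it back, and a prefetch limit caps how many unacked messages a consumer may hold at once.

```typescript
// Hypothetical, simplified model of RabbitMQ's per-queue ack bookkeeping.
class ToyQueue<T> {
  private ready: T[] = [];
  private unacked = new Map<number, T>();
  private nextTag = 1;

  constructor(private readonly prefetch: number) {}

  enqueue(msg: T): void {
    this.ready.push(msg);
  }

  // Returns undefined when the queue is empty or the consumer is at its prefetch limit.
  deliver(): { tag: number; msg: T } | undefined {
    if (this.unacked.size >= this.prefetch || this.ready.length === 0) return undefined;
    const msg = this.ready.shift() as T;
    const tag = this.nextTag++;
    this.unacked.set(tag, msg);
    return { tag, msg };
  }

  // Ack: the broker forgets the message entirely, so there is nothing left to replay.
  ack(tag: number): void {
    this.unacked.delete(tag);
  }

  // Nack: drop the message (or dead-letter it, if configured) unless requeue is true.
  nack(tag: number, requeue: boolean): void {
    if (!this.unacked.has(tag)) return;
    const msg = this.unacked.get(tag) as T;
    this.unacked.delete(tag);
    // This sketch puts it back at the head; real RabbitMQ aims for the original position.
    if (requeue) this.ready.unshift(msg);
  }

  get depth(): number {
    return this.ready.length + this.unacked.size;
  }
}
```

Real RabbitMQ also requeues a consumer's unacked messages when its channel or connection closes, which is why an unacked message is never silently lost.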

Here's a basic producer in Node.js with TypeScript:

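import amqp from 'amqplib';

interface OrderCreatedEvent {
  orderId: string;
  userId: string;
  total: number;
  timestamp: string; // ISO-8601: JSON has no Date type, so a Date would arrive as a string anyway
}

const EXCHANGE = 'orders';

// Reuse one connection and channel for the whole process; opening a
// connection per message is slow and wastes broker resources.
async function openChannel() {
  const connection = await amqp.connect('amqp://localhost');
  // A confirm channel lets us wait until the broker has accepted each message.
  const channel = await connection.createConfirmChannel();
  await channel.assertExchange(EXCHANGE, 'topic', { durable: true });
  return channel;
}

let channelPromise: ReturnType<typeof openChannel> | undefined;

async function publishOrderCreated(event: OrderCreatedEvent) {
  channelPromise ??= openChannel();
  const channel = await channelPromise;

  channel.publish(
    EXCHANGE,
    'order.created', // routing key
    Buffer.from(JSON.stringify(event)),
    { persistent: true, contentType: 'application/json' }
  );

  // Resolves once the broker confirms; rejects if it nacks the message.
  await channel.waitForConfirms();
  console.log(`Published order.created: ${event.orderId}`);
}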

And a consumer:

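// Your business logic (hypothetical signature).
declare function processOrderEvent(event: OrderCreatedEvent): Promise<void>;

async function consumeOrderEvents() {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();

  const exchange = 'orders';
  const queue = 'inventory-service-orders';

  await channel.assertExchange(exchange, 'topic', { durable: true });
  // Rejected messages go to a dead-letter exchange ('orders.dlx' is a placeholder:
  // declare it and bind a dead-letter queue to it).
  await channel.assertQueue(queue, { durable: true, deadLetterExchange: 'orders.dlx' });
  await channel.bindQueue(queue, exchange, 'order.*');

  // At most 10 unacknowledged messages in flight to this consumer.
  await channel.prefetch(10);

  await channel.consume(queue, async (msg) => {
    if (!msg) return; // the broker cancelled this consumer

    try {
      // Parse inside the try so a malformed payload is rejected instead of left unacked.
      const event: OrderCreatedEvent = JSON.parse(msg.content.toString());
      await processOrderEvent(event);
      channel.ack(msg);
    } catch (error) {
      console.error('Processing failed:', error);
      // requeue=false dead-letters the message; requeue=true would redeliver a
      // poison message forever.
      channel.nack(msg, false, false);
    }
  });
}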

Where RabbitMQ Shines

RabbitMQ is great when you need flexible routing. The exchange types (direct, topic, fanout, headers) give you fine-grained control. I used topic exchanges on a project where different services needed different subsets of user events. Works beautifully.
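Topic exchanges match dot-separated routing keys against binding patterns, where `*` stands for exactly one word and `#` for zero or more words. RabbitMQ does this matching server-side; the sketch below just reimplements the rule in plain TypeScript so you can sanity-check bindings like the `order.*` used in the consumer above. `topicMatches`, `route`, and the queue names are hypothetical.

```typescript
// Hypothetical helper mirroring AMQP topic-exchange matching:
// '*' matches exactly one word, '#' matches zero or more words.
function topicMatches(pattern: string, routingKey: string): boolean {
  const p = pattern.split('.');
  const k = routingKey.split('.');
  const match = (i: number, j: number): boolean => {
    if (i === p.length) return j === k.length;
    if (p[i] === '#') {
      // '#' either matches nothing more, or swallows one word and stays active.
      return match(i + 1, j) || (j < k.length && match(i, j + 1));
    }
    if (j === k.length) return false;
    return (p[i] === '*' || p[i] === k[j]) && match(i + 1, j + 1);
  };
  return match(0, 0);
}

interface Binding {
  queue: string;
  pattern: string;
}

// Each queue with at least one matching binding gets one copy of the message.
function route(bindings: Binding[], routingKey: string): string[] {
  const queues = bindings.filter((b) => topicMatches(b.pattern, routingKey)).map((b) => b.queue);
  return Array.from(new Set(queues));
}

const exampleBindings: Binding[] = [
  { queue: 'inventory-service-orders', pattern: 'order.*' },
  { queue: 'analytics-all', pattern: '#' },
  { queue: 'billing', pattern: 'payment.#' },
];
console.log(route(exampleBindings, 'order.created')); // [ 'inventory-service-orders', 'analytics-all' ]
```

Note that `order.*` would not match `order.created.eu`; if you expect deeper keys, `order.#` is the safer binding.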

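It also handles backpressure well. Consumer prefetch caps how many unacknowledged messages each consumer holds, and if slow consumers let queues grow until the broker crosses its memory or disk alarm thresholds, RabbitMQ blocks publishing connections until it recovers. In practice, this saved us when a payment processing service got hammered and couldn't keep up.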

The management UI is actually useful. You can see queue depths and message rates, inspect messages sitting in a queue, and publish test messages by hand. The ops team loves it.

RabbitMQ Pain Points

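Message retention is where RabbitMQ shows its age. With classic and quorum queues, once a message is acknowledged, it's gone, so you can't replay events from yesterday. (RabbitMQ 3.9 added Streams, an append-only log with non-destructive reads, which narrows this gap.) We ended up having to build a separate event store for audit and replay, which felt like solving the problem twice.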

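Clustering is another headache. RabbitMQ clustering works, but it's not designed for massive scale. We ran three nodes for HA, and honestly, the cluster management was finicky. With the default partition-handling mode (`ignore`), recovering from a network partition requires manual intervention; the `pause_minority` and `autoheal` modes and Raft-based quorum queues reduce that pain.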

In our setup, performance topped out around 50K messages per second on decent hardware; message size, persistence, and queue type all move that number. For most apps, that's plenty. But if you're doing high-throughput analytics or event sourcing, you'll hit the ceiling.

Kafka: The Distributed Log

Kafka isn't really a message queue. It's a distributed commit log. That's a mental shift, and it matters.

Kafka's Architecture Model

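In Kafka, you have topics divided into partitions. Producers append messages to partitions. Consumers read from partitions at their own pace, tracking their position with offsets. The broker is "dumb": it just stores messages for a configured retention period and doesn't track per-message acknowledgments. Each consumer group owns its progress, committing offsets back to Kafka (into the internal `__consumer_offsets` topic).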
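The log model is easiest to see in miniature. This in-memory `PartitionLog` is a hypothetical stand-in for a single Kafka partition: appends get increasing offsets, reads don't delete anything, and each consumer group only has to remember the next offset it wants. Retention is modeled crudely as trimming entries from the front; real Kafka deletes whole log segments by time or size.

```typescript
// Hypothetical in-memory model of one Kafka partition.
class PartitionLog<T> {
  private entries: T[] = [];
  private startOffset = 0; // offset of entries[0]; grows as retention trims the log

  // Returns the offset assigned to the appended record.
  append(value: T): number {
    this.entries.push(value);
    return this.startOffset + this.entries.length - 1;
  }

  // Non-destructive: any number of consumer groups can read the same offsets.
  read(fromOffset: number, maxRecords: number): Array<{ offset: number; value: T }> {
    const begin = Math.max(fromOffset, this.startOffset) - this.startOffset;
    return this.entries.slice(begin, begin + maxRecords).map((value, i) => ({
      offset: this.startOffset + begin + i,
      value,
    }));
  }

  // Crude retention: forget every record before `offset`.
  truncateBefore(offset: number): void {
    const drop = Math.min(Math.max(offset - this.startOffset, 0), this.entries.length);
    this.entries.splice(0, drop);
    this.startOffset += drop;
  }

  // The offset the next appended record will get.
  get endOffset(): number {
    return this.startOffset + this.entries.length;
  }
}

const orders = new PartitionLog<string>();
['created', 'paid', 'shipped'].forEach((e) => orders.append(e));
// A brand-new consumer group can start at offset 0 and replay everything still retained.
console.log(orders.read(0, 10).map((r) => r.value)); // [ 'created', 'paid', 'shipped' ]
```

Compare this with the RabbitMQ model: nothing here is removed when a consumer finishes, which is exactly what makes replay possible.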

Here's a producer using KafkaJS:

import { Kafka } from 'kafkajs';

interface OrderCreatedEvent {
  orderId: string;
  userId: string;
  total: number;
  timestamp: string; // ISO-8601: JSON has no Date type
}

const kafka = new Kafka({
  clientId: 'order-service',
  brokers: ['localhost:9092'],
});

const producer = kafka.producer();

// Connect once at service startup (and call producer.disconnect() on shutdown),
// not on every send.
async function startProducer() {
  await producer.connect();
}

async function publishOrderCreated(event: OrderCreatedEvent) {
  await producer.send({
    topic: 'order-events',
    messages: [
      {
        key: event.orderId, // same key -> same partition -> ordered per order
        value: JSON.stringify(event),
        headers: {
          eventType: 'order.created',
        },
      },
    ],
  });

  console.log(`Published order.created: ${event.orderId}`);
}

Consumer with consumer groups:

const consumer = kafka.consumer({ groupId: 'inventory-service' });

async function consumeOrderEvents() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'order-events', fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return; // tombstone (null value): nothing to process

      try {
        const event: OrderCreatedEvent = JSON.parse(message.value.toString());
        await processOrderEvent(event);
        // With the default autoCommit, KafkaJS periodically commits offsets
        // of messages whose handler has resolved.
      } catch (error) {
        console.error('Processing failed:', error);
        // Kafka has no per-message requeue: rethrowing makes KafkaJS retry this
        // message, so one that always fails stalls its partition. Publishing it
        // to a dead-letter topic and returning is the usual escape hatch.
        throw error;
      }
    },
  });
}

Where Kafka Dominates

Event replay changed everything for us. When we migrated to Kafka, we set retention to 7 days. New services can consume historical events. Analytics can reprocess yesterday's data. It's a game changer for event sourcing patterns.

Throughput is insane. We're processing 500K+ messages per second on a modest three-broker cluster. Kafka is built for scale. LinkedIn runs trillions of messages per day on it.

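The consumer group model is elegant. Add more consumers to a group, and Kafka automatically rebalances partitions among them. Horizontal scaling just works, up to the partition count: each partition is read by only one consumer in a group, so extra consumers sit idle.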
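A rebalance boils down to re-running a partition assignment whenever group membership changes. Here's a simple round-robin assignment, one of several strategies Kafka clients offer; `assignRoundRobin` is a hypothetical helper, not a KafkaJS API. Running it with three consumers and two partitions shows the idle-consumer effect.

```typescript
// Hypothetical round-robin assignor: partitions are dealt out like cards.
function assignRoundRobin(
  partitions: number[],
  consumerIds: string[],
): Record<string, number[]> {
  // Sort for a deterministic result regardless of join order.
  const members = [...consumerIds].sort();
  const assignment: Record<string, number[]> = {};
  for (const id of members) assignment[id] = [];
  if (members.length === 0) return assignment;
  [...partitions]
    .sort((a, b) => a - b)
    .forEach((p, i) => assignment[members[i % members.length]].push(p));
  return assignment;
}

console.log(assignRoundRobin([0, 1], ['a', 'b', 'c'])); // { a: [ 0 ], b: [ 1 ], c: [] }
```

This is why partition count is effectively your parallelism ceiling per consumer group, and why it's worth sizing topics with some headroom.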

Kafka Connect and Kafka Streams are powerful. We use Connect to sync Kafka topics to BigQuery for analytics. No custom code needed.

Kafka Pain Points

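Operations are not simple. Older Kafka deployments also meant running ZooKeeper; KRaft mode replaced it (production-ready since Kafka 3.3, and Kafka 4.0 dropped ZooKeeper entirely). Either way, you need to understand partitions, replication, in-sync replicas (ISRs), and log compaction. The learning curve is real.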

Message ordering is per-partition only. If you need global ordering across all messages, you're stuck with a single partition, which kills parallelism. We use message keys to ensure ordering per entity (per user, per order), which works but requires careful design.
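Keyed ordering works because the producer maps each key to a partition deterministically, so every event for `order-123` lands in the same partition and is read back in order. The sketch below uses FNV-1a purely as an illustrative stand-in; Kafka's Java client hashes keys with murmur2, so real partition numbers will differ. The helper names are hypothetical.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units: an illustration, NOT Kafka's murmur2.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Same key -> same partition, as long as the partition count doesn't change.
function partitionFor(key: string, numPartitions: number): number {
  return fnv1a(key) % numPartitions;
}

for (const key of ['order-123', 'order-456', 'order-123']) {
  console.log(key, '->', partitionFor(key, 6));
}
```

The "as long as" matters: adding partitions to a keyed topic remaps keys, so per-key ordering is only guaranteed within a fixed partition count.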

No message routing like RabbitMQ exchanges. You filter on the consumer side. For complex routing needs, you end up with more topics or custom consumer logic.
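Consumer-side filtering usually means putting the event type in a header (as the producer above does with `eventType`) and dispatching on it. KafkaJS hands header values back as Buffers, so decode them first (for example with `.toString()`); to stay dependency-free, the sketch below takes already-decoded strings. `dispatchByType` and the handler map are hypothetical.

```typescript
type EventHandler = (payload: unknown) => void;

// Returns true if a handler ran. Unknown types are skipped, not treated as errors,
// because a shared topic carries event types this service doesn't care about.
function dispatchByType(
  headers: Record<string, string | undefined>,
  payload: unknown,
  handlers: Map<string, EventHandler>,
): boolean {
  const type = headers['eventType'];
  const handler = type === undefined ? undefined : handlers.get(type);
  if (!handler) return false;
  handler(payload);
  return true;
}
```

Inside a KafkaJS `eachMessage` handler, returning normally for skipped messages still lets the offset advance, so filtering costs bandwidth but not correctness.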

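Offset commits deserve more care than the defaults suggest. Committing offsets manually, only after processing succeeds, gives you at-least-once delivery; true exactly-once requires Kafka transactions and only covers read-process-write pipelines that stay inside Kafka. Auto-commit can produce duplicates after a crash or rebalance, and it can lose messages if your handler hands work off to the background and returns before that work finishes. This bit me hard on a payment processing service. You need idempotent consumers anyway, but still.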
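Since at-least-once means the same event can arrive twice, the consumer has to make reprocessing harmless. A common approach is to record processed event IDs and skip repeats. The sketch keeps them in an in-memory `Set`, which only illustrates the idea: in production the ID check and the side effect need to share a durable store (for example, a unique-keyed table written in the same database transaction). `eventId`, `PaymentRequested`, and `IdempotentHandler` are hypothetical. With KafkaJS you'd pair this with `autoCommit: false` in `consumer.run` and call `consumer.commitOffsets` with the message's offset plus one once the handler succeeds.

```typescript
interface PaymentRequested {
  eventId: string; // hypothetical: a unique ID stamped by the producer
  orderId: string;
  amount: number;
}

class IdempotentHandler<T extends { eventId: string }> {
  // In-memory for illustration only; production needs a durable store
  // updated atomically with the side effect.
  private processed = new Set<string>();

  constructor(private readonly apply: (evt: T) => void) {}

  // Returns false when the event is a duplicate and was skipped.
  handle(evt: T): boolean {
    if (this.processed.has(evt.eventId)) return false;
    this.apply(evt); // if this throws, the ID is not recorded and a retry can run it
    this.processed.add(evt.eventId);
    return true;
  }
}
```

If the process dies between the side effect and recording the ID, you're back to a duplicate, which is exactly why the real version has to do both in one transaction.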

When to Choose Which

After shipping both in production, here's my decision tree:

Choose RabbitMQ when:

You need flexible message routing (topic exchanges are powerful)
You want simpler operations and don't have dedicated platform engineers
Your message volume is under 50K/sec
You need traditional queue semantics (work queues, priority queues)
You don't need message replay or event sourcing

Choose Kafka when:

You need event replay or event sourcing patterns
You're dealing with high throughput (100K+ messages/sec)
You want to build real-time analytics or stream processing
You have the ops expertise to run it (or use a managed service)
Multiple consumers need to read the same events independently

Hybrid Approaches in Practice

On my current project, we actually run both. Kafka handles the main event backbone – order events, user events, analytics. RabbitMQ handles work queues for things like email sending and image processing.

Why? Kafka's great for the event log pattern, but for a simple "process this job, then delete it" task, RabbitMQ's acknowledgment model is cleaner. (It's still at-least-once delivery, so jobs should be idempotent.) We use Kafka for events (things that happened) and RabbitMQ for commands (things to do).
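One way to keep that split honest in a TypeScript codebase is to make it part of the type. The discriminated union below is a hypothetical convention, not something either broker requires: events are named in the past tense and go to a Kafka topic, commands are imperative and go to a RabbitMQ work queue. The message and destination names are placeholders.

```typescript
// Hypothetical message types: facts that happened vs. jobs to be done.
type OutboundMessage =
  | { kind: 'event'; name: 'OrderCreated' | 'PaymentProcessed'; payload: unknown }
  | { kind: 'command'; name: 'SendEmail' | 'ResizeImage'; payload: unknown };

type Destination =
  | { broker: 'kafka'; topic: string }
  | { broker: 'rabbitmq'; queue: string };

function destinationFor(msg: OutboundMessage): Destination {
  if (msg.kind === 'event') {
    // Events go to a retained log so any number of services can read (and re-read) them.
    return { broker: 'kafka', topic: 'domain-events' };
  }
  // Commands go to a work queue: one worker handles each, and it's deleted once acked.
  return { broker: 'rabbitmq', queue: `jobs.${msg.name}` };
}
```

Because the union is closed, adding a new message name forces you to decide up front whether it's a fact or a job.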

The operational overhead isn't as bad as it sounds. Kafka runs managed on Confluent Cloud (on GCP), while RabbitMQ is self-hosted on our Kubernetes cluster via a Helm chart because it's so lightweight.

Observability and Debugging

This is where event-driven architectures get painful. You lose the nice stack traces of synchronous calls.

For Kafka, we use:

Kafdrop for browsing topics and messages during development
Prometheus + Grafana for metrics (consumer lag is critical)
Distributed tracing with OpenTelemetry to track events across services
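Consumer lag per partition is the log end offset minus the group's committed offset; summed across partitions, it tells you how far behind a consumer group is. Prometheus exporters for Kafka typically report this for you, but the arithmetic is simple enough to sketch. The `PartitionOffsets` shape is hypothetical; in KafkaJS the inputs can come from the admin client (`fetchTopicOffsets` for the end offsets, `fetchOffsets` for the group's committed offsets).

```typescript
interface PartitionOffsets {
  partition: number;
  logEndOffset: number;    // offset the next produced record will get
  committedOffset: number; // next offset the consumer group will read
}

function consumerLag(offsets: PartitionOffsets[]): { total: number; byPartition: Record<number, number> } {
  const byPartition: Record<number, number> = {};
  let total = 0;
  for (const o of offsets) {
    const lag = Math.max(o.logEndOffset - o.committedOffset, 0); // records not yet processed
    byPartition[o.partition] = lag;
    total += lag;
  }
  return { total, byPartition };
}
```

Alert on lag that keeps growing rather than on any single value: a steady lag is a consumer keeping up with a delay, a climbing one is a consumer falling behind.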

For RabbitMQ:

The built-in management UI (actually quite good)
CloudAMQP's monitoring if you're using their hosted service
Dead letter queues for failed messages – essential for debugging
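Dead-lettering pairs naturally with a bounded retry budget: retry a few times with backoff, then stop and let the message land in the DLQ for someone to inspect. The policy below is a hypothetical sketch with made-up defaults. Where the attempt count comes from depends on your setup: quorum queues track redeliveries in an `x-delivery-count` header, or you can carry your own counter in a message header.

```typescript
type RetryDecision =
  | { action: 'retry'; delayMs: number }
  | { action: 'dead-letter' };

// Exponential backoff with a cap, then give up and dead-letter.
function decideRetry(
  attemptsSoFar: number,
  maxAttempts = 5,
  baseDelayMs = 500,
  maxDelayMs = 30000,
): RetryDecision {
  if (attemptsSoFar >= maxAttempts) return { action: 'dead-letter' };
  const delayMs = Math.min(baseDelayMs * 2 ** attemptsSoFar, maxDelayMs);
  return { action: 'retry', delayMs };
}
```

With the defaults, delays run 500 ms, 1 s, 2 s, 4 s, 8 s, and the sixth failure goes to the DLQ.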

Pro tip: always include a correlation ID in your events. We use UUIDs generated at the API gateway. Being able to trace an event flow from HTTP request through 5 services to final output saves hours of debugging.
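The mechanics are just "copy the ID forward": every event a service emits while handling a request or another event carries the same correlation ID, plus its own event ID. The envelope below is a hypothetical shape. On the wire you'd map `correlationId` to a Kafka header, or to the `correlationId` property that amqplib's `publish` options accept for RabbitMQ. The ID generator is injected to keep the sketch dependency-free; in Node, `crypto.randomUUID()` is a reasonable choice.

```typescript
interface Envelope<T> {
  eventId: string;       // unique per event
  correlationId: string; // shared by everything caused by one inbound request
  type: string;
  payload: T;
}

// Start a flow with the correlation ID minted at the API gateway.
function makeEnvelope<T>(type: string, payload: T, newId: () => string, correlationId: string): Envelope<T> {
  return { eventId: newId(), correlationId, type, payload };
}

// Continue a flow: new event ID, inherited correlation ID (never regenerated downstream).
function childOf<T>(incoming: Envelope<unknown>, type: string, payload: T, newId: () => string): Envelope<T> {
  return makeEnvelope(type, payload, newId, incoming.correlationId);
}
```

Putting the same ID into your log lines and OpenTelemetry span attributes is what turns it into a searchable trail.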

Key Takeaways

Event-driven architecture with Kafka or RabbitMQ transforms how your microservices communicate, but it's not a silver bullet. The asynchronous nature adds complexity you need to be ready for.

RabbitMQ is the pragmatic choice for most teams. It's easier to operate, has flexible routing, and handles typical microservices workloads well. If you're not doing high-scale event sourcing or stream processing, start here.

Kafka is the right tool when you need scale, event replay, or stream processing. But it requires more operational maturity. Don't choose it because it's trendy – choose it because you need what it offers.

In my experience, the winning pattern is often hybrid: Kafka for your event backbone and analytics, RabbitMQ for task queues. But that's only worth the complexity if you're at serious scale.

The most important thing? Pick one, learn it deeply, and build proper observability from day one. Event-driven systems are powerful, but they're only as good as your ability to understand what's happening in production.

