What is Apache Kafka?
Apache Kafka is a distributed event streaming platform built around one deceptively simple data structure: an append-only commit log. Producers write events to the end of the log, consumers read forward at their own pace, and the broker stores everything durably for a configured retention period. This log-centric design is what lets a single Kafka cluster move millions of events per second across thousands of services while staying durable, ordered, and replayable.
In production, Kafka becomes the central nervous system of an architecture: orders, clicks, payments, sensor readings, and database changes all flow through it as immutable events. Understanding the log model — and how it differs from a traditional message queue — is the key to using Kafka well.
The log, not a queue
The single most important mental model for Kafka is this: a topic is a durable log, not a queue.
In a traditional message queue (RabbitMQ, ActiveMQ, SQS), a message is delivered to a consumer and then deleted. Consumption is destructive — once read, the message is gone. Throughput is coupled to how fast consumers can drain the queue.
In Kafka, reading an event does not remove it. Each event keeps its position (an offset) in the partition, and every consumer tracks its own offset independently. Events stay on disk until a retention policy expires them (by time or size), so:
- A new consumer can join later and replay the full retained history, starting from the earliest available offset (offset 0 if nothing has expired yet).
- Multiple independent consumers (analytics, search indexing, auditing) read the same stream without interfering.
- You can reprocess data after fixing a bug by simply resetting an offset.
Think of Kafka as a shared, replayable ledger that many readers tail — not a mailbox that empties as you read it.
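The difference from a queue can be sketched with a toy in-memory log. This is an illustration only, not Kafka client code: `PartitionLog`, `poll`, and `seek` are hypothetical names chosen for this example, and the log stands in for a single partition with no retention.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy append-only log standing in for one partition (not a Kafka API)."""
    events: list = field(default_factory=list)
    positions: dict = field(default_factory=dict)  # consumer name -> next offset to read

    def append(self, event) -> int:
        """Append at the tail and return the offset assigned to the event."""
        self.events.append(event)
        return len(self.events) - 1

    def poll(self, consumer: str, max_events: int = 10) -> list:
        """Return the next events for this consumer and advance only its own offset."""
        start = self.positions.get(consumer, 0)
        batch = self.events[start:start + max_events]
        self.positions[consumer] = start + len(batch)
        return batch

    def seek(self, consumer: str, offset: int) -> None:
        """Reset one consumer's position, e.g. to reprocess after a bug fix."""
        self.positions[consumer] = offset


log = PartitionLog()
for name in ["order-placed", "order-paid", "order-shipped"]:
    log.append(name)

analytics_first = log.poll("analytics")    # reading does not remove anything
audit_first = log.poll("audit")            # a second reader sees the same events
log.seek("analytics", 0)                   # rewind only the analytics reader
analytics_replay = log.poll("analytics")   # same events again
```

Each reader only moves its own position, so a slow or rewound consumer has no effect on the others.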
Core building blocks
Kafka’s model is built from a small number of concepts that compose cleanly.
| Concept | What it is |
|---|---|
| Event (record) | An immutable fact: a key, a value, a timestamp, and optional headers. |
| Topic | A named, logical stream of events (e.g. orders, payments). |
| Partition | An ordered, append-only shard of a topic. Ordering is guaranteed within a partition. |
| Offset | A monotonically increasing ID identifying an event’s position in a partition. |
| Broker | A Kafka server that stores partitions and serves reads/writes. A cluster is many brokers. |
| Producer | A client that publishes events to topics, choosing the partition (often by key hash). |
| Consumer | A client that reads events and commits offsets to track progress. |
| Consumer group | A set of consumers that share the work of a topic; each partition goes to exactly one member. |
A topic is split into partitions to enable horizontal scale and parallelism: more partitions means more consumers can read in parallel. Each partition is replicated across brokers (controlled by the replication factor) so the data survives broker failure.
Topic: "orders" (replication factor 3)
┌──────────────────────────────────────────────────────────┐
│ Partition 0 │ e0 │ e1 │ e2 │ e3 │ e4 │ ──▶ (offset grows)│
│ Partition 1 │ e0 │ e1 │ e2 │ ──▶                         │
│ Partition 2 │ e0 │ e1 │ e2 │ e3 │ ──▶                    │
└──────────────────────────────────────────────────────────┘
▲ writes appended at the tail      reads start at any offset ▼

Producers ──▶ [ Broker 1 ][ Broker 2 ][ Broker 3 ] ──▶ Consumer Group
              (partition leaders + replicas)           C1   C2   C3
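Because each partition is assigned to exactly one consumer in a group, ordering is preserved per key: events with the same key always land in the same partition (as long as the partition count does not change), and that partition has a single reader within the group. Throughput, meanwhile, scales with the number of partitions.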
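A minimal sketch of the two mappings described above: key to partition, and partition to group member. Both functions are simplifications written for this page. `partition_for` uses CRC32 as a stand-in hash (the Java client hashes keys with murmur2, so real partition numbers will differ), and `assign` is a plain round-robin rather than any of Kafka's actual assignment strategies.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 here, as an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions: list, consumers: list) -> dict:
    """Round-robin partitions over group members; each partition gets one owner."""
    owned = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owned[consumers[i % len(consumers)]].append(partition)
    return owned


NUM_PARTITIONS = 3
events = [
    ("order-1", "placed"),
    ("order-2", "placed"),
    ("order-1", "paid"),
    ("order-1", "shipped"),
]

# Route each event to a partition by key; append order is kept per partition.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# All events for order-1 sit in one partition, in the order they were produced.
order_1_partition = partitions[partition_for("order-1", NUM_PARTITIONS)]
order_1_history = [value for key, value in order_1_partition if key == "order-1"]

three_members = assign([0, 1, 2], ["C1", "C2", "C3"])
four_members = assign([0, 1, 2], ["C1", "C2", "C3", "C4"])  # C4 owns nothing
```

With three partitions, a fourth group member receives no partitions and sits idle, which is why partition count caps the parallelism of a single group.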
What Kafka is used for
Kafka’s durability, ordering, and replay make it the backbone for several patterns:
- Event-driven microservices — services communicate asynchronously by emitting and reacting to events, decoupling producers from consumers.
- Stream processing — continuous transformations, joins, and aggregations over live data with Kafka Streams or Flink.
- Data pipelines / ingestion — moving data between databases, data lakes, and warehouses, typically via Kafka Connect.
- Log and metrics aggregation — collecting high-volume application and infrastructure telemetry into one durable stream.
- Event sourcing — using the log itself as the source of truth, rebuilding state by replaying events.
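The event sourcing pattern in the list above can be sketched as a fold over the log. The event shapes and field names here (`type`, `order_id`, `total`) are hypothetical and chosen only for illustration; in a real system the events would be read from a topic rather than a Python list.

```python
def apply_event(state: dict, event: dict) -> dict:
    """Fold one event into the current state and return a new state dict."""
    order_id = event["order_id"]
    new_state = dict(state)
    if event["type"] == "order-placed":
        new_state[order_id] = {"status": "placed", "total": event["total"]}
    elif event["type"] == "order-paid":
        new_state[order_id] = {**state[order_id], "status": "paid"}
    elif event["type"] == "order-cancelled":
        new_state[order_id] = {**state[order_id], "status": "cancelled"}
    return new_state


def rebuild(events: list) -> dict:
    """Rebuild state from scratch by replaying every event in log order."""
    state = {}
    for event in events:
        state = apply_event(state, event)
    return state


events = [
    {"type": "order-placed", "order_id": "o1", "total": 40},
    {"type": "order-placed", "order_id": "o2", "total": 15},
    {"type": "order-paid", "order_id": "o1"},
    {"type": "order-cancelled", "order_id": "o2"},
]

current = rebuild(events)          # state after the full history
as_of_two = rebuild(events[:2])    # state as it was after the first two events
```

Because state is derived rather than stored, replaying a prefix of the log reconstructs the state at any earlier point.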
Kafka vs a message queue
| Dimension | Apache Kafka | Traditional message queue |
|---|---|---|
| Storage model | Durable, append-only log (retained) | Transient; messages deleted on ack |
| Consumption | Non-destructive; consumers track offsets | Destructive; message removed when consumed |
| Replay | Yes — reset offset to reread history | No — once consumed, it’s gone |
| Multiple consumers | Many independent groups read the same stream | Competing consumers share/divide messages |
| Ordering | Guaranteed per partition | Best-effort or per-queue, varies by broker |
| Scaling | Partition the topic, add consumers | Add queues/consumers, harder to scale a stream |
| Typical throughput | Very high (millions/sec) | Moderate to high |
This is why Kafka is described as a streaming platform rather than a message broker: it stores history as a first-class feature.
KRaft: no more ZooKeeper
Historically Kafka used ZooKeeper to store cluster metadata (broker membership, topic configs, controller election). Modern Kafka replaces this with KRaft (Kafka Raft), a built-in consensus protocol where dedicated controller nodes manage metadata using the Raft algorithm.
KRaft simplifies operations: there is a single system to deploy and secure, controller failover is faster, and the metadata layer is designed to support far more partitions per cluster than ZooKeeper-based deployments. KRaft has been production-ready since Kafka 3.3, and Kafka 4.0 removed ZooKeeper mode entirely.
When standing up a new cluster today, use KRaft. ZooKeeper-based setups are legacy and should only appear when maintaining older deployments.
Best Practices
- Model topics around events, not commands: name them after facts (order-placed), and treat each as an immutable stream.
- Choose keys deliberately: the key determines the partition and therefore ordering; use a stable business key (e.g. orderId) when per-entity ordering matters.
- Size partitions for parallelism: you cannot have more active consumers in a group than partitions, so plan partition count for peak throughput.
- Set replication factor to 3 in production, typically with acks=all on producers and min.insync.replicas=2, so the cluster tolerates a broker failure without losing acknowledged writes.
- Treat retention as a deliberate design choice: decide whether a topic keeps days of data or is compacted to keep only the latest value per key.
- Adopt KRaft for new clusters: avoid new ZooKeeper deployments entirely.
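The retention choice in the list above comes down to two different rules for what a topic keeps. The sketch below models both on a list of `(offset, key, value, timestamp_ms)` tuples; the function names are hypothetical, and the model is deliberately simplified (real brokers apply retention per log segment and handle delete markers during compaction). On a real topic the choice is made with the `cleanup.policy` setting (`delete` or `compact`), with `retention.ms` and `retention.bytes` bounding the delete policy.

```python
def retain_by_time(records: list, now_ms: int, retention_ms: int) -> list:
    """Time-based retention: drop records older than the retention window."""
    return [r for r in records if now_ms - r[3] <= retention_ms]


def compact(records: list) -> list:
    """Compaction: keep only the latest record per key, in offset order."""
    latest = {}
    for offset, key, value, timestamp_ms in records:
        latest[key] = (offset, key, value, timestamp_ms)  # later offsets overwrite
    return sorted(latest.values())  # offsets are unique, so this sorts by offset


records = [
    (0, "order-1", "placed", 1000),
    (1, "order-2", "placed", 2000),
    (2, "order-1", "paid", 3000),
]

recent_only = retain_by_time(records, now_ms=3500, retention_ms=2000)
latest_per_key = compact(records)
```

Time-based retention suits event streams where history eventually stops mattering; compaction suits topics that act as a changelog of current values per key.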
Related Topics
- Why Kafka? — the problems Kafka solves and when to reach for it.
- Event streaming concepts — events, streams, and the streaming mindset.
- Kafka architecture overview — brokers, controllers, and replication in depth.
- Installation — run a KRaft cluster locally.
- Your first application — produce and consume your first events.