Brokers & Clusters

A Kafka cluster is nothing more than a set of cooperating servers called brokers. Each broker stores a slice of your data, serves reads and writes for the partitions it owns, and coordinates with its peers to keep the system available even when machines fail. Understanding how brokers form a cluster — and how clients discover and talk to them — is the foundation for everything else in Kafka, from throughput tuning to fault tolerance.

What is a broker?

A broker is a single Kafka server process that runs on a host, listens on a TCP port (typically 9092), and persists messages to local disk. Every broker in a cluster has a unique numeric node.id and is responsible for a subset of the cluster’s partitions. When a producer sends a record or a consumer fetches one, it is ultimately talking to the specific broker that hosts the relevant partition.

A broker does three core jobs:

Storage: it appends records to partition log segments on disk and serves them back to consumers.
Replication: it acts as the leader for some partitions and a follower for others, copying data to keep replicas in sync.
Coordination: it tracks cluster metadata and, if the node is also configured with the controller role, may act as the active controller that assigns partition leadership.
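The storage job can be pictured as an append-only log split into segments. The sketch below is a toy in-memory model, not Kafka's implementation. The class names `Segment` and `PartitionLog` are hypothetical, and the roll rule, a fixed record count, is an assumption made for brevity. Real brokers roll segments by size (`segment.bytes`, 1 GiB by default) or age (`segment.ms`) and name each segment file after its base offset.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int                              # offset of the first record in the segment
    records: list = field(default_factory=list)   # records in append order


class PartitionLog:
    """Toy in-memory model of one partition's log; not Kafka's implementation."""

    def __init__(self, max_records_per_segment: int = 3):
        # Hypothetical roll rule by record count; real brokers roll by size or time.
        self.max_records = max_records_per_segment
        self.segments = [Segment(base_offset=0)]
        self.next_offset = 0

    def append(self, record: str) -> int:
        """Append to the active (last) segment, rolling a new one when it is full."""
        if len(self.segments[-1].records) >= self.max_records:
            self.segments.append(Segment(base_offset=self.next_offset))
        self.segments[-1].records.append(record)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records (offset, record) pairs starting at offset."""
        out = []
        for seg in self.segments:
            for i, rec in enumerate(seg.records):
                if seg.base_offset + i >= offset:
                    out.append((seg.base_offset + i, rec))
                    if len(out) == max_records:
                        return out
        return out


log = PartitionLog()
for r in ["a", "b", "c", "d", "e"]:
    log.append(r)
print([s.base_offset for s in log.segments])  # [0, 3]
print(log.read(2))                             # [(2, 'c'), (3, 'd'), (4, 'e')]
```

Consumers address data purely by offset, so a read can span segment boundaries without knowing where one file ends and the next begins.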

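In modern Kafka (KRaft mode), the cluster manages its own metadata: a quorum of controller nodes, which may run alongside brokers or on dedicated hosts, replicates a metadata log using the Raft consensus protocol, and brokers fetch metadata updates from that quorum. This eliminates the legacy ZooKeeper dependency.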
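A Raft quorum makes progress only while a majority of its voters are reachable. The arithmetic below, a minimal sketch with hypothetical helper names, shows how many controller failures a quorum of a given size can survive.

```python
def majority(voters: int) -> int:
    """Votes a Raft leader needs: more than half of the voters."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Voters that can fail while the remaining ones still form a majority."""
    return voters - majority(voters)


for n in range(1, 6):
    print(f"{n} controllers: majority {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Three controllers tolerate one failure and five tolerate two, while four need three votes and still tolerate only one. This is why controller quorums are normally sized at an odd number.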

How brokers form a cluster

Brokers do not work in isolation. All nodes in a cluster share a common cluster.id, and one node acts as the active controller, responsible for cluster-wide decisions such as which broker leads each partition; in KRaft mode it is elected from among the controller nodes. Brokers register with the active controller, send it periodic heartbeats to report their state, and receive metadata updates describing the full topic and partition layout.

Data in Kafka is divided into partitions, and partitions are distributed across brokers. Each partition has one leader replica and zero or more follower replicas. All produce traffic for a partition, and by default all consume traffic, flows through its leader; followers continuously fetch from the leader's log so they can take over if the leader fails. Spreading leaders evenly across brokers balances load and prevents any single machine from becoming a bottleneck.
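One simple way to get an even spread is to rotate the starting broker for each partition. The sketch below uses a hypothetical helper, `assign_replicas`, and is deliberately simplified. Kafka's built-in assignment is more involved: it randomizes the starting broker, shifts follower placement, and can be rack-aware. For three partitions on three brokers, this rotation reproduces the layout used in the example that follows.

```python
from collections import Counter


def assign_replicas(num_partitions, replication_factor, brokers):
    """Simplified round-robin layout: partition p starts at broker index p."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


layout = assign_replicas(3, 3, [1, 2, 3])
print(layout)  # {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
# The first replica in each list is the partition's preferred leader.
print(Counter(replicas[0] for replicas in layout.values()))  # Counter({1: 1, 2: 1, 3: 1})
```

Because the first replica rotates, every broker becomes the preferred leader of exactly one partition, which is the balanced-leadership property described above.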

A 3-broker cluster

The diagram below shows a topic orders with three partitions and a replication factor of three across a 3-broker cluster. Each partition has exactly one leader (L) and two followers (F), and leadership is balanced so every broker leads one partition.

                 Kafka Cluster (cluster.id = abc123)
 ┌────────────────┐   ┌────────────────┐   ┌────────────────┐
 │   Broker 1     │   │   Broker 2     │   │   Broker 3     │
 │  node.id = 1   │   │  node.id = 2   │   │  node.id = 3   │
 ├────────────────┤   ├────────────────┤   ├────────────────┤
 │ orders-0  (L)  │   │ orders-0  (F)  │   │ orders-0  (F)  │
 │ orders-1  (F)  │   │ orders-1  (L)  │   │ orders-1  (F)  │
 │ orders-2  (F)  │   │ orders-2  (F)  │   │ orders-2  (L)  │
 └────────────────┘   └────────────────┘   └────────────────┘
        ▲                     ▲                     ▲
        └─────────── replication between brokers ───┘

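If Broker 2 fails, the controller promotes an in-sync follower of orders-1 (on Broker 1 or 3) to leader, and clients pick up the new leader on their next metadata refresh and redirect their requests to it. Until Broker 2 returns, every partition runs with two in-sync replicas instead of three.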
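The failover choice can be modelled as "the first replica, in assignment order, that is both in sync and alive". The function below is a simplified sketch of clean leader election under that assumption. `elect_leader` is a hypothetical name, and details such as broker fencing and the `unclean.leader.election.enable` override are left out.

```python
def elect_leader(replicas, isr, live_brokers):
    """Return the first replica, in assignment order, that is in sync and alive.

    Returns None when no such replica exists; with unclean election disabled
    the partition then stays offline rather than risk losing acknowledged data.
    """
    for broker in replicas:
        if broker in isr and broker in live_brokers:
            return broker
    return None


# orders-1 from the diagram: replicas 2,3,1, all in sync; broker 2 has failed.
print(elect_leader([2, 3, 1], isr={1, 2, 3}, live_brokers={1, 3}))  # 3
print(elect_leader([2, 3, 1], isr={2}, live_brokers={1, 3}))        # None
```

In this model the new leader of orders-1 is broker 3, the first surviving in-sync replica in its replica list.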

bootstrap.servers and client discovery

Clients never need to know the full cluster topology up front. They are configured with bootstrap.servers, a list of one or more broker addresses used purely as an entry point. On connect, the client asks any bootstrap broker for cluster metadata: the complete list of brokers, the partitions of each topic, and which broker currently leads each partition. From then on the client sends each request directly to the correct leader, refreshing its metadata periodically and whenever a broker reports that it is no longer the leader for a partition.
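The routing loop can be simulated without a real cluster. In the sketch below, `FakeCluster`, `Client`, and `NotLeaderError` are hypothetical stand-ins. A real broker rejects a request sent to a non-leader with the `NOT_LEADER_OR_FOLLOWER` protocol error, and a real client library handles the retry internally. The model only shows the pattern: cache leaders, send directly, and refresh metadata and retry when the cache turns out to be stale.

```python
class NotLeaderError(Exception):
    """Hypothetical stand-in for the broker's NOT_LEADER_OR_FOLLOWER error."""


class FakeCluster:
    """Hypothetical stand-in for the brokers; tracks the real leader of each partition."""

    def __init__(self, leaders):
        self.leaders = dict(leaders)

    def metadata(self):
        return dict(self.leaders)

    def produce(self, broker, partition, record):
        if self.leaders[partition] != broker:
            raise NotLeaderError(partition)
        return f"broker {broker} wrote {record!r} to partition {partition}"


class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.leader_cache = cluster.metadata()  # initial fetch via a bootstrap broker
        self.refreshes = 0

    def send(self, partition, record, retries=1):
        for _ in range(retries + 1):
            broker = self.leader_cache[partition]
            try:
                return self.cluster.produce(broker, partition, record)
            except NotLeaderError:
                # Cached leader is stale: refresh metadata, then retry.
                self.leader_cache = self.cluster.metadata()
                self.refreshes += 1
        raise RuntimeError("leader still unknown after retries")


cluster = FakeCluster({0: 1, 1: 2, 2: 3})
client = Client(cluster)
print(client.send(1, "a"))  # broker 2 wrote 'a' to partition 1
cluster.leaders[1] = 3      # leadership of partition 1 moves to broker 3
print(client.send(1, "b"))  # broker 3 wrote 'b' to partition 1
print(client.refreshes)     # 1
```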

You should list at least two or three brokers so the client can still bootstrap if one is down.

# Plain producer / consumer client config
bootstrap.servers=broker1.internal:9092,broker2.internal:9092,broker3.internal:9092
client.id=orders-service

Spring Boot exposes the same setting for Spring for Apache Kafka as spring.kafka.bootstrap-servers:

spring:
  kafka:
    bootstrap-servers: broker1.internal:9092,broker2.internal:9092,broker3.internal:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
    consumer:
      group-id: orders-service
      auto-offset-reset: earliest

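The admin CLI tools let you inspect the live cluster: kafka-metadata-quorum.sh reports the state of the KRaft controller quorum, and kafka-topics.sh shows how a topic's partitions map to brokers: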

kafka-metadata-quorum.sh --bootstrap-server broker1.internal:9092 describe --status
kafka-topics.sh --bootstrap-server broker1.internal:9092 \
  --describe --topic orders

Example output from the kafka-topics.sh command (exact columns vary by Kafka version):

Topic: orders   PartitionCount: 3   ReplicationFactor: 3
  Topic: orders  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
  Topic: orders  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
  Topic: orders  Partition: 2  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
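For a quick health check, the describe output can be parsed to count leaders per broker and to flag partitions whose ISR is smaller than their replica list. The sketch below assumes the line layout shown above. Other Kafka versions add fields, and partitions with no current leader do not match its pattern. For the ISR check alone, the CLI's own `--under-replicated-partitions` filter on `kafka-topics.sh --describe` is the simpler route.

```python
import re
from collections import Counter

DESCRIBE_OUTPUT = """\
Topic: orders   PartitionCount: 3   ReplicationFactor: 3
  Topic: orders  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
  Topic: orders  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
  Topic: orders  Partition: 2  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
"""

# Assumes the layout above; lines without a numeric leader are not matched.
LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(\d+)\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


def summarize(text):
    """Return (leaders per broker, list of under-replicated partitions)."""
    leaders = Counter()
    under_replicated = []
    for m in LINE.finditer(text):
        partition, leader = int(m.group(1)), int(m.group(2))
        replicas = m.group(3).split(",")
        isr = [b for b in m.group(4).split(",") if b]
        leaders[leader] += 1
        if len(isr) < len(replicas):
            under_replicated.append(partition)
    return leaders, under_replicated


leaders, under = summarize(DESCRIBE_OUTPUT)
print(dict(leaders))  # {1: 1, 2: 1, 3: 1}
print(under)          # []
```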

Tip: Point clients at stable DNS names rather than raw IPs. A single load-balanced address can serve as the bootstrap entry point, but every broker still needs its own reachable address, because clients connect directly to partition leaders. Each broker's advertised.listeners setting determines the address it tells clients to use; if it is wrong, bootstrap succeeds but subsequent connections to leaders fail.

Scaling the cluster horizontally

Kafka scales out by adding brokers. A new broker joins by registering with the controller, but existing partitions do not move to it automatically; only newly created topics or partitions may be placed there. To actually use the new capacity, you reassign partitions to spread them (and their leadership) across the larger set of brokers.

# Generate a reassignment plan that includes the new broker (id 4)
kafka-reassign-partitions.sh --bootstrap-server broker1.internal:9092 \
  --topics-to-move-json-file topics.json \
  --broker-list "1,2,3,4" --generate

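# Save the proposed reassignment JSON printed by --generate as plan.json, then execute it
kafka-reassign-partitions.sh --bootstrap-server broker1.internal:9092 \
  --reassignment-json-file plan.json --execute

# Check progress; once the move is complete, --verify also removes any throttle
kafka-reassign-partitions.sh --bootstrap-server broker1.internal:9092 \
  --reassignment-json-file plan.json --verify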
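Before executing a plan, it helps to check where replicas and preferred leaders (the first broker in each replica list) will land. The sketch below builds the `topics.json` input in the tool's `{"version": 1, "topics": [...]}` shape and summarizes a plan in its `{"version": 1, "partitions": [...]}` shape. The plan itself is hypothetical, made up for illustration rather than captured from `--generate`. The helper names are also hypothetical. Actual leadership moves to the preferred leader only after a preferred-leader election, either automatic or triggered manually.

```python
import json
from collections import Counter


def topics_to_move(*topics):
    """Contents for --topics-to-move-json-file."""
    return {"version": 1, "topics": [{"topic": t} for t in topics]}


def summarize_plan(plan):
    """Count replicas and preferred leaders (first listed replica) per broker."""
    replicas, leaders = Counter(), Counter()
    for entry in plan["partitions"]:
        replicas.update(entry["replicas"])
        leaders[entry["replicas"][0]] += 1
    return replicas, leaders


# Hypothetical plan for illustration only; not output captured from the tool.
PLAN_JSON = """
{"version": 1, "partitions": [
  {"topic": "orders", "partition": 0, "replicas": [4, 1, 2]},
  {"topic": "orders", "partition": 1, "replicas": [1, 2, 3]},
  {"topic": "orders", "partition": 2, "replicas": [2, 3, 4]}
]}
"""

print(json.dumps(topics_to_move("orders")))  # {"version": 1, "topics": [{"topic": "orders"}]}
replicas, leaders = summarize_plan(json.loads(PLAN_JSON))
for broker in sorted(replicas):
    print(f"broker {broker}: {replicas[broker]} replicas, {leaders[broker]} preferred leaderships")
```

With three partitions spread over four brokers, one broker (broker 3 here) ends up as the preferred leader of nothing, because a three-partition topic has only three leaders.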

Because partitions are the unit of parallelism, a consumer group can have no more active consumers for a topic than the topic has partitions, and the topic's leaders, which handle all of its produce traffic, can occupy at most as many brokers as it has partitions. Plan partition counts generously up front so you can rebalance onto new brokers without first having to increase the partition count, which remaps keys.
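The consumer-side limit follows from each partition having exactly one owner within a group. The sketch below uses a hypothetical round-robin helper. Kafka's built-in assignors (range, round-robin, sticky, cooperative-sticky) differ in detail, but none gives a partition to more than one consumer in the same group.

```python
def assign_partitions(partitions, consumers):
    """Deal partitions to consumers round-robin; each partition gets exactly one owner."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


group = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
print(group)  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
idle = [c for c, owned in group.items() if not owned]
print(idle)   # ['c4']
```

A fourth consumer on a three-partition topic sits idle until a partition frees up.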

Action	Effect on the cluster
Add a broker	Increases total capacity; no data moves until you reassign
Reassign partitions	Rebalances storage and leadership across brokers
Increase replication factor	More fault tolerance, higher network/disk cost
Increase partitions	More parallelism, but existing keys may map to different partitions, so per-key ordering is not preserved across the change
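Key-based ordering depends on a stable key-to-partition mapping. Kafka's default partitioner hashes a record's key (with murmur2), makes the hash non-negative, and takes it modulo the partition count, so changing the count changes where keys land. The sketch below follows that shape but, as a stated assumption, swaps in `zlib.crc32` as a stand-in hash. It does not compute Kafka's real partition numbers. The helper names are hypothetical.

```python
import zlib


def stable_hash(key: str) -> int:
    # Stand-in for Kafka's murmur2 key hash; crc32 is used here only for illustration.
    return zlib.crc32(key.encode("utf-8"))


def partition_for(key, num_partitions, hash_fn=stable_hash):
    # Same shape as the default partitioner: non-negative hash modulo partition count.
    return (hash_fn(key) & 0x7FFFFFFF) % num_partitions


keys = [f"customer-{i}" for i in range(10)]
moved = [k for k in keys if partition_for(k, 3) != partition_for(k, 4)]
print(f"{len(moved)} of {len(keys)} keys change partition going from 3 to 4")
```

Records for a moved key written before the change sit in one partition and later records in another, so consumers can no longer rely on reading that key's history in order.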

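Warning: Partition reassignment copies data over the network and competes with live traffic. Throttle it with --throttle, which takes a rate in bytes per second, and run large moves during off-peak windows to avoid degrading producers and consumers. Once the move completes, run the tool again with --verify to remove the throttle.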
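A back-of-the-envelope estimate helps size the off-peak window. Assuming the throttle caps the replication bandwidth into a destination broker, copying B bytes onto that broker takes at least B divided by the throttle rate. The figures below (200 GB onto a new broker at 50 MB/s) are hypothetical. New writes arriving during the move add to the total, so treat the result as a floor.

```python
def min_copy_hours(bytes_to_receive: int, throttle_bytes_per_sec: int) -> float:
    """Lower bound on copy time for one destination broker capped at the throttle rate."""
    return bytes_to_receive / throttle_bytes_per_sec / 3600


GB, MB = 10**9, 10**6
# Hypothetical example: 200 GB of replicas landing on the new broker, throttled to 50 MB/s.
print(f"{min_copy_hours(200 * GB, 50 * MB):.1f} h")  # 1.1 h
```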

Best Practices

List multiple brokers in bootstrap.servers so clients can still join the cluster when one broker is down.
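Run an odd number of controllers (3 or 5) in KRaft mode; the metadata quorum needs a majority online, and a fourth controller tolerates no more failures than three.
Use a replication factor of at least 3 in production, together with min.insync.replicas=2 and acks=all on producers, so that losing a single broker causes neither data loss nor unavailability.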
Configure advertised.listeners correctly per broker — most “can connect but then time out” issues trace back to this.
Spread leadership evenly; an unbalanced cluster overloads the brokers that lead the most partitions.
Throttle partition reassignments and schedule large rebalances during low-traffic periods.
Over-provision partition counts modestly at design time, since adding partitions later remaps keys to different partitions and disrupts per-key ordering.
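The replication-factor advice works together with `min.insync.replicas` (a broker or topic setting) and `acks=all` on producers. The sketch below classifies a partition after some of its replica brokers fail. It assumes every replica was in sync beforehand and that each failed broker held exactly one replica. The function name and status labels are hypothetical.

```python
def acks_all_status(replication_factor: int, min_insync: int, failed: int) -> str:
    """Classify a partition after `failed` of its replica brokers go down.

    Assumes every replica was in sync beforehand and each failed broker held
    exactly one replica of the partition.
    """
    remaining = replication_factor - failed
    if remaining <= 0:
        return "offline"    # no replica left to become leader
    if remaining < min_insync:
        return "read-only"  # a leader exists, but acks=all produces are rejected
    return "writable"


for failed in range(4):
    print(failed, acks_all_status(3, 2, failed))
```

With a replication factor of 3 and min.insync.replicas=2, one failure leaves the partition writable. Two failures stop acks=all writes but keep the data readable. Three failures take the partition offline.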