ZooKeeper (Legacy)
For most of Kafka’s history, every cluster shipped with a companion service: Apache ZooKeeper. ZooKeeper was the external coordination layer that stored cluster metadata, elected the controller, and let brokers discover each other. As of Kafka 4.0, ZooKeeper support has been removed entirely in favour of the built-in KRaft consensus protocol. This page exists to explain what ZooKeeper did, why it caused operational friction, and how to move off it — it is legacy material for teams still running older clusters.
Legacy notice: New clusters must use KRaft. ZooKeeper mode was deprecated in Kafka 3.5 and removed in Kafka 4.0. If you are designing a new system, skip straight to KRaft mode.
What ZooKeeper stored
ZooKeeper is a hierarchical key-value store (a “znode” tree) with strong consistency and watch notifications. Kafka used it as the single source of truth for cluster-wide state that brokers themselves did not own. The most important pieces of metadata were:
| Metadata | ZooKeeper path | Purpose |
|---|---|---|
| Broker registry | /brokers/ids/<id> | Ephemeral znodes for live broker discovery and host/port info |
| Topic configuration | /config/topics/<topic> | Per-topic overrides like retention and cleanup policy |
| Partition state | /brokers/topics/<topic> and /brokers/topics/<topic>/partitions/<partition>/state | Replica assignment per topic; leader and ISR per partition |
| Controller election | /controller | Which broker holds the controller role |
| ACLs | /kafka-acl/ | Authorization rules when using the ZooKeeper-backed authorizer |
| Consumer offsets (very old) | /consumers/... | Pre-0.9 offset storage, later moved to the __consumer_offsets topic |
Brokers registered an ephemeral znode under /brokers/ids on startup. When a broker died, its session expired and the znode vanished, which is how the cluster detected failures.
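The failure-detection behaviour described above can be illustrated with a toy in-memory model. This is a minimal sketch, not a ZooKeeper client: the `EphemeralRegistry` class, its method names, and the session IDs are all hypothetical, and real session expiry is driven by missed heartbeats rather than an explicit call.

```python
class EphemeralRegistry:
    """Toy stand-in for ephemeral znodes under /brokers/ids (hypothetical, not a real client)."""

    def __init__(self):
        # znode path -> id of the session that created it
        self._owner_by_path = {}

    def register(self, session_id, broker_id):
        """Create the broker's ephemeral znode, owned by its session."""
        path = f"/brokers/ids/{broker_id}"
        self._owner_by_path[path] = session_id
        return path

    def expire_session(self, session_id):
        """Drop every ephemeral znode owned by the expired session."""
        self._owner_by_path = {
            path: owner
            for path, owner in self._owner_by_path.items()
            if owner != session_id
        }

    def live_broker_ids(self):
        """Broker ids that still have a znode, i.e. are considered alive."""
        return sorted(int(path.rsplit("/", 1)[1]) for path in self._owner_by_path)


registry = EphemeralRegistry()
registry.register("session-a", 1)
registry.register("session-b", 2)
registry.expire_session("session-b")  # broker 2 died, so its session timed out
print(registry.live_broker_ids())
```

The point of the model is that nobody explicitly reports broker 2 as dead: its absence from the registry is the signal.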
Controller election via ZooKeeper
Exactly one broker acted as the controller, responsible for leader election, partition reassignment, and propagating metadata changes. ZooKeeper elected it with a simple race: every broker tried to create the ephemeral /controller znode, and the one that succeeded won. If that broker failed, the znode disappeared and the remaining brokers raced again.
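The race can be sketched as a create-if-absent operation on a single slot. The snippet below is an illustrative simulation only: `ControllerZnode` and `elect` are hypothetical names, and the shuffled order stands in for whichever broker's request happens to reach ZooKeeper first.

```python
import random


class ControllerZnode:
    """Toy model of the ephemeral /controller znode (hypothetical, not a real client)."""

    def __init__(self):
        self.holder = None  # broker id that currently owns the znode, if any

    def try_create(self, broker_id):
        """Create-if-absent: succeeds only when no broker holds the znode."""
        if self.holder is None:
            self.holder = broker_id
            return True
        return False

    def session_expired(self, broker_id):
        """The znode vanishes when its owner's session expires."""
        if self.holder == broker_id:
            self.holder = None


def elect(znode, live_brokers, rng):
    """Every live broker attempts the create; return whoever holds the znode afterwards."""
    contenders = list(live_brokers)
    rng.shuffle(contenders)  # arrival order at ZooKeeper is arbitrary
    for broker_id in contenders:
        znode.try_create(broker_id)
    return znode.holder


rng = random.Random(0)
znode = ControllerZnode()
first = elect(znode, [1, 2, 3], rng)

znode.session_expired(first)  # the controller fails
survivors = [b for b in [1, 2, 3] if b != first]
second = elect(znode, survivors, rng)
```

After the second round, `second` is one of the surviving brokers and `first` no longer holds the role.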
You could inspect the current controller directly with the ZooKeeper shell:
zookeeper-shell.sh localhost:2181 get /controller
Output:
{"version":2,"brokerid":3,"timestamp":"1717200000000","kraftControllerEpoch":-1}
Connected to localhost:2181
This model worked, but a newly elected controller had to read large amounts of state from ZooKeeper before it could take over, which made failover slow on big clusters.
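For scripting, the JSON payload shown above can be decoded with the standard library. This is a small sketch that assumes the `timestamp` field is a string of epoch milliseconds, as it appears in the sample; `parse_controller_znode` is a hypothetical helper, not part of Kafka's tooling.

```python
import json
from datetime import datetime, timezone


def parse_controller_znode(payload):
    """Extract the controller's broker id and the znode timestamp from the JSON payload."""
    data = json.loads(payload)
    # Assumption: "timestamp" is epoch milliseconds encoded as a string.
    written_at = datetime.fromtimestamp(int(data["timestamp"]) / 1000, tz=timezone.utc)
    return {"broker_id": data["brokerid"], "written_at": written_at}


sample = '{"version":2,"brokerid":3,"timestamp":"1717200000000","kraftControllerEpoch":-1}'
info = parse_controller_znode(sample)
print(info["broker_id"], info["written_at"].isoformat())
```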
Configuring a broker for ZooKeeper (legacy)
A pre-4.0 broker pointed at its ZooKeeper ensemble through server.properties:
broker.id=1
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
zookeeper.connection.timeout.ms=18000
zookeeper.session.timeout.ms=18000
listeners=PLAINTEXT://0.0.0.0:9092
log.dirs=/var/lib/kafka/logs
The trailing /kafka is a chroot — it namespaces all of Kafka’s znodes under a subtree so multiple Kafka clusters can share one ZooKeeper ensemble. ZooKeeper itself ran from a separate zookeeper.properties and was started before any broker.
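To make the structure of `zookeeper.connect` concrete, here is a minimal sketch that splits the value into its host list and chroot. `parse_zookeeper_connect` is a hypothetical helper; it assumes hostnames or IPv4 addresses (no IPv6 literals), falls back to ZooKeeper's default client port 2181 when a port is omitted, and reports `/` when no chroot is given.

```python
def parse_zookeeper_connect(value):
    """Split a zookeeper.connect string into ([(host, port), ...], chroot)."""
    # Everything from the first "/" onward is the chroot path.
    hosts_part, slash, chroot_rest = value.partition("/")
    hosts = []
    for entry in hosts_part.split(","):
        host, _, port = entry.strip().partition(":")
        hosts.append((host, int(port) if port else 2181))
    chroot = slash + chroot_rest if slash else "/"
    return hosts, chroot


hosts, chroot = parse_zookeeper_connect("zk1:2181,zk2:2181,zk3:2181/kafka")
print(hosts)
print(chroot)
```

Note that the chroot appears once, at the end of the whole string, and applies to every host in the list.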
Operational pain points
Running ZooKeeper alongside Kafka effectively meant operating two distributed systems with different tuning, monitoring, and failure modes. The recurring complaints were:
- Two systems to run. Separate ensembles, separate JVMs, separate upgrade cycles, and a separate quorum to keep healthy.
- Slow controller failover. Metadata lived outside the brokers, so a controller change required reloading state from ZooKeeper — painful on clusters with hundreds of thousands of partitions.
- Scalability ceiling. ZooKeeper’s write throughput and watch fan-out limited how many partitions a cluster could practically hold.
- Metadata divergence. The controller's in-memory view and the state in ZooKeeper could drift apart during network partitions, producing subtle, hard-to-debug inconsistencies.
- Security surface. A second service needed its own TLS, authentication, and ACL hardening.
Deprecation timeline
| Kafka version | Status |
|---|---|
| 2.8 | KRaft introduced as early access (KIP-500) |
| 3.3 | KRaft declared production-ready for new clusters |
| 3.4 | ZooKeeper-to-KRaft migration introduced as early access (KIP-866) |
| 3.5 | ZooKeeper mode officially deprecated |
| 3.6–3.9 | Migration path stabilized; dual-write bridge supported |
| 4.0 | ZooKeeper support removed — KRaft only |
Migrating to KRaft
The supported path is an online migration that runs the cluster in a temporary “dual-write” bridge mode, copying metadata from ZooKeeper into a KRaft controller quorum before cutting over. At a high level:
- Provision a dedicated KRaft controller quorum and give it a cluster ID.
- Enable migration on the controllers and existing brokers with zookeeper.metadata.migration.enable=true.
- Restart brokers so they register with the new controllers while ZooKeeper still holds authoritative state.
- Wait for the migration to report complete, then restart brokers in pure KRaft mode and decommission ZooKeeper.
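# On the KRaft controllers during migration
process.roles=controller
node.id=3000
controller.quorum.voters=3000@ctrl1:9093,3001@ctrl2:9093,3002@ctrl3:9093
# Listener used by the controller quorum; the port must match the voters list above
controller.listener.names=CONTROLLER
listeners=CONTROLLER://:9093
# Enable the dual-write bridge and point at the existing ensemble (same chroot as the brokers)
zookeeper.metadata.migration.enable=true
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka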
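Before restarting anything, it can help to sanity-check a controller's properties file against the migration settings named above. The following is a rough sketch under stated assumptions: `parse_properties` and `migration_problems` are hypothetical helpers, the required-key list covers only the settings shown on this page, and the parser handles simple `key=value` lines rather than the full Java properties syntax.

```python
REQUIRED_KEYS = (
    "process.roles",
    "node.id",
    "controller.quorum.voters",
    "zookeeper.metadata.migration.enable",
    "zookeeper.connect",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def migration_problems(props):
    """Return human-readable problems; an empty list means the basic checks pass."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in props]
    if props.get("zookeeper.metadata.migration.enable", "").lower() != "true":
        problems.append("zookeeper.metadata.migration.enable is not true")
    voters = props.get("controller.quorum.voters", "")
    voter_ids = [voter.split("@", 1)[0] for voter in voters.split(",") if voter]
    if props.get("node.id") not in voter_ids:
        problems.append("node.id is not listed in controller.quorum.voters")
    return problems


config_text = """
process.roles=controller
node.id=3000
controller.quorum.voters=3000@ctrl1:9093,3001@ctrl2:9093,3002@ctrl3:9093
zookeeper.metadata.migration.enable=true
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
"""
print(migration_problems(parse_properties(config_text)))
```

A check like this catches copy-paste mistakes only. It does not replace validating the procedure in staging.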
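Warning: Migration is version-sensitive, and the final cutover is one-way. Take a full ZooKeeper snapshot and validate the procedure in staging before touching production. You can revert to ZooKeeper only while the cluster is still in dual-write mode; once the brokers run in pure KRaft mode and the migration is finalized, you cannot roll back.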
For the full destination architecture — how brokers, controllers, and the metadata log fit together without ZooKeeper — see Controller and metadata and KRaft mode.
Best Practices
- Do not deploy ZooKeeper for new clusters. Start on KRaft; ZooKeeper is removed from Kafka 4.0 onward.
- Plan migrations off ZooKeeper now if you are on 3.x, while the bridge tooling still exists — it is gone in 4.0.
- Always chroot legacy clusters (e.g. /kafka) so metadata is isolated and easy to wipe or migrate.
- Snapshot ZooKeeper before any change. Its data directory is the only recovery point for legacy metadata.
- Use an odd-sized ensemble (3 or 5 nodes) on dedicated hosts to preserve quorum during failures.
- Monitor session expirations and watch counts, the leading indicators of ZooKeeper-induced controller instability.
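The odd-sized ensemble recommendation follows from majority-quorum arithmetic: ZooKeeper needs a strict majority of ensemble members to stay available. The short sketch below (function names are illustrative) shows why a fourth node adds cost without adding fault tolerance.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 6):
    print(n, quorum_size(n), tolerated_failures(n))
```

Ensembles of 3 and 4 nodes both survive a single failure, and ensembles of 5 and 6 both survive two, so the even sizes buy nothing extra.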