MirrorMaker 2
MirrorMaker 2 (MM2) is Kafka’s built-in tool for replicating data between clusters — across data centers, regions, or cloud accounts. Unlike the original MirrorMaker, MM2 is built on the Kafka Connect framework, so it inherits Connect’s scalability, fault tolerance, and offset tracking. Beyond copying messages, MM2 synchronizes topic configurations, ACLs, and — critically — translates consumer offsets so applications can fail over to a remote cluster and resume roughly where they left off. This makes it the foundation for disaster recovery and geo-distributed architectures.
How MirrorMaker 2 works
MM2 runs as a set of Connect connectors that move data and metadata between a source cluster and a target cluster:
| Connector | Responsibility |
|---|---|
| MirrorSourceConnector | Copies topic records and topic configs from source to target |
| MirrorCheckpointConnector | Emits consumer-group offset checkpoints and translates them for the target |
| MirrorHeartbeatConnector | Produces periodic heartbeats to measure replication health and lag |
Each replication flow is written as source->target and gets its own set of these connectors. You can run many flows at once (primary->backup, backup->primary, us-east->eu-west), and MM2 manages them independently within the same cluster of worker processes.
Remote topic naming
To avoid loops and name collisions, MM2 prefixes replicated topics with the source cluster alias by default. A topic orders on cluster primary, replicated to backup, appears on backup as primary.orders. This DefaultReplicationPolicy makes the data lineage obvious and lets bidirectional replication coexist without one flow re-copying the other’s output.
primary cluster               backup cluster
---------------               --------------
orders        --MM2-->        primary.orders
payments      --MM2-->        primary.payments
If you need flat topic names (so orders stays orders on the target), use the IdentityReplicationPolicy. It is convenient for one-way active-passive setups but is unsafe for active-active, because two flows can form an infinite replication loop.
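The naming rules above can be modeled in a few lines of Python. This is a simplified sketch, not MM2's actual Java implementation: the function names, the `known_aliases` parameter, and the loop check are illustrative assumptions, and the "." separator is assumed to be the default.

```python
from typing import List, Set

SEPARATOR = "."  # assumed default separator between cluster alias and topic name


def remote_topic_default(source_alias: str, topic: str) -> str:
    """Model of the default policy: prefix the topic with the source alias."""
    return f"{source_alias}{SEPARATOR}{topic}"


def remote_topic_identity(source_alias: str, topic: str) -> str:
    """Model of the identity policy: the topic name is left unchanged."""
    return topic


def upstream_aliases(topic: str, known_aliases: Set[str]) -> List[str]:
    """Return the cluster aliases encoded as leading prefixes of a topic name."""
    chain = []
    while SEPARATOR in topic:
        alias, rest = topic.split(SEPARATOR, 1)
        if alias not in known_aliases:
            break  # an ordinary dot in a topic name, not a cluster prefix
        chain.append(alias)
        topic = rest
    return chain


def would_loop(topic: str, target_alias: str, known_aliases: Set[str]) -> bool:
    """A topic that already came from the target must not be copied back to it."""
    return target_alias in upstream_aliases(topic, known_aliases)


aliases = {"primary", "backup"}
mirrored = remote_topic_default("primary", "orders")  # "primary.orders" on backup
print(would_loop(mirrored, "primary", aliases))       # True: skip in backup->primary
flat = remote_topic_identity("primary", "orders")     # "orders" on backup
print(would_loop(flat, "primary", aliases))           # False: lineage is lost
```

With prefixed names the reverse flow can recognize and skip `primary.orders`. With flat names nothing distinguishes the copy from a locally produced topic, which is why the identity policy is risky in bidirectional setups.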
Configuration example
MM2 is configured with a single properties file passed to connect-mirror-maker.sh. Define the clusters, the bootstrap servers for each, and which flows are enabled.
# mm2.properties — replicate primary -> backup
clusters = primary, backup
primary.bootstrap.servers = primary-kafka:9092
backup.bootstrap.servers = backup-kafka:9092
# enable the replication flow primary -> backup
primary->backup.enabled = true
primary->backup.topics = orders|payments|inventory
# keep the reverse flow off for active-passive
backup->primary.enabled = false
# sync topic configs and consumer offsets
sync.topic.configs.enabled = true
sync.group.offsets.enabled = true
emit.checkpoints.enabled = true
refresh.topics.interval.seconds = 30
replication.factor = 3
checkpoints.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
Run it as a dedicated MM2 cluster (it self-manages Connect internally):
connect-mirror-maker.sh mm2.properties
Illustrative output (exact log lines vary by Kafka version):
INFO Starting with 1 enabled replication flows: [primary->backup]
INFO [MirrorSourceConnector|task-0] Starting with 3 topic-partitions
INFO [MirrorCheckpointConnector] Syncing offsets for groups: [order-service]
INFO Mirroring topics: primary.orders, primary.payments, primary.inventory
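Before starting MM2, it can help to confirm which flows a properties file actually enables. The following Python sketch does that for the simple `key = value` layout used above. The parser and the `enabled_flows` helper are hypothetical, they ignore the full Java properties syntax (escapes, `:` separators, line continuations), and they assume a flow without an explicit `enabled = true` counts as disabled.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def enabled_flows(props: Dict[str, str]) -> List[str]:
    """List every source->target pair whose '<flow>.enabled' is set to true."""
    clusters = [c.strip() for c in props.get("clusters", "").split(",") if c.strip()]
    flows = []
    for source in clusters:
        for target in clusters:
            if source == target:
                continue
            flag = props.get(f"{source}->{target}.enabled", "false")
            if flag.lower() == "true":
                flows.append(f"{source}->{target}")
    return flows


EXAMPLE = """
clusters = primary, backup
primary->backup.enabled = true
primary->backup.topics = orders|payments|inventory
backup->primary.enabled = false
"""

print(enabled_flows(parse_properties(EXAMPLE)))  # ['primary->backup']
```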
Config and ACL sync
MM2 keeps target topics consistent with their source automatically. When you add partitions to orders on primary, the MirrorSourceConnector propagates the change to primary.orders on backup within refresh.topics.interval.seconds. Topic-level configs (retention, cleanup policy, compression) are mirrored when sync.topic.configs.enabled is true. Topic ACL replication is controlled by sync.topic.acls.enabled, which is on by default, so principals keep read access to the replicated topics after failover; MM2 does not grant write permissions on remote topics. Note that internal MM2 topics and any configs listed in config.properties.exclude are never copied.
Consumer offset translation
Raw offsets are meaningless across clusters because partition offsets diverge during replication. MM2 solves this in two steps. The MirrorSourceConnector records the mapping between source offsets and the corresponding target offsets in an offset-syncs topic. The MirrorCheckpointConnector then reads those mappings and emits checkpoints that translate committed consumer-group offsets into target-cluster offsets.
With sync.group.offsets.enabled = true (it is off by default), MM2 periodically writes translated offsets directly into the target’s __consumer_offsets, but only for groups that have no active consumers on the target. After failover, a consumer group simply connects to backup, subscribes to primary.orders, and resumes near where it stopped on the source, which avoids both reprocessing the entire topic and silently skipping records.
# inspect translated offsets for a group on the target cluster
kafka-consumer-groups.sh --bootstrap-server backup-kafka:9092 \
--describe --group order-service
Offset translation is approximate. Applications must still be idempotent or tolerant of small amounts of duplicate processing, because the translated offset can land slightly before the true position.
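To see why the translated offset tends to land before the true position, consider this simplified Python model of conservative translation. It is a sketch under assumptions, not the connector's real bookkeeping: the `OffsetSync` type and `translate` function are hypothetical, and the model only knows the sparse (source offset, target offset) pairs that were recorded.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class OffsetSync(NamedTuple):
    upstream: int    # offset of a record on the source partition
    downstream: int  # offset of the same record on the target partition


def translate(syncs: List[OffsetSync], committed: int) -> Optional[int]:
    """Translate a source-side committed offset into a target-side offset.

    `syncs` must be sorted by upstream offset. The result never points past
    a record the group has not consumed yet, so it may be earlier than ideal.
    """
    upstream_offsets = [s.upstream for s in syncs]
    idx = bisect_right(upstream_offsets, committed) - 1
    if idx < 0:
        return None  # no sync recorded at or before the committed offset
    sync = syncs[idx]
    if committed == sync.upstream:
        return sync.downstream
    # The record at sync.upstream was consumed, so resume right after its copy.
    return sync.downstream + 1


syncs = [OffsetSync(100, 40), OffsetSync(200, 140)]
print(translate(syncs, 150))  # 41
print(translate(syncs, 200))  # 140
print(translate(syncs, 50))   # None
```

In the first call the group had committed offset 150 on the source, but the nearest earlier sync is at 100, so the group resumes at 41 on the target and reprocesses some records. How far back it lands depends on how densely syncs are recorded.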
Active-passive vs active-active
These are the two canonical topologies, and the choice drives your replication policy and flow configuration.
| Aspect | Active-passive | Active-active |
|---|---|---|
| Traffic | All writes to one cluster; other is standby | Writes to both clusters |
| Flows | One direction (primary->backup) | Both directions (primary->backup and backup->primary) |
| Replication policy | Identity or Default | Must use DefaultReplicationPolicy |
| Failover | Promote standby, repoint clients | Clients already use the nearest cluster |
| Use case | Disaster recovery | Geo-locality, regional read/write |
In active-active, each cluster holds both its local topics and the remote-prefixed copies from the other cluster. A consumer that wants all events subscribes with a pattern such as orders|.*\.orders to read both local and replicated streams.
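The subscription pattern can be checked offline before deploying it. The sketch below uses Python's `re.fullmatch` as a stand-in for the consumer's pattern subscription, on the assumption that the whole topic name has to match the pattern; the topic names are made up for illustration.

```python
import re

# Same pattern as in the text: the local topic or any cluster-prefixed copy.
SUBSCRIPTION = re.compile(r"orders|.*\.orders")


def matches(topic: str) -> bool:
    """True if the entire topic name matches the subscription pattern."""
    return SUBSCRIPTION.fullmatch(topic) is not None


topics = [
    "orders",             # local topic
    "backup.orders",      # replicated from backup
    "primary.orders",     # replicated from primary
    "payments",           # unrelated topic
    "reorders",           # similar name, no cluster prefix
    "backup.orders.dlq",  # does not end in ".orders"
]

print([t for t in topics if matches(t)])
# ['orders', 'backup.orders', 'primary.orders']
```

Escaping the dot matters: without the backslash, the second alternative would also match names such as `reorders`.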
Best Practices
- Always run MM2 as its own dedicated cluster of workers, sized independently from your brokers, so replication load never competes with production traffic.
- Co-locate MM2 with the target cluster’s region; a remote consume + local produce pattern is more resilient to WAN hiccups than the reverse.
- Use DefaultReplicationPolicy (cluster-prefixed names) for any bidirectional setup to prevent replication loops.
- Keep emit.heartbeats.enabled on and scrape the heartbeat/checkpoint topics or JMX metrics to alert on replication lag before it becomes a recovery-point problem.
- Pre-create internal topics (offset-syncs, checkpoints, heartbeats) with RF ≥ 3 and protect them with ACLs just like business topics.
- Keep consumer applications idempotent: offset translation is best-effort, so design for at-least-once semantics after failover.
- Test failover regularly: promote the standby, repoint a real consumer group, and confirm it resumes from the translated offset rather than the topic head.