min.insync.replicas Explained
min.insync.replicas is the single most important durability knob in Kafka, yet it does nothing on its own: it only takes effect when producers write with acks=all. Together they form a contract in which a write is acknowledged only once enough in-sync replicas have appended it to their logs (by default Kafka does not wait for an fsync, so durability comes from replication rather than from a disk flush). Get this pairing wrong and you either silently lose data on broker failure or block all writes the moment one broker hiccups. This page explains exactly how the setting behaves and how to tune it for production.
What min.insync.replicas actually does
Every partition has a set of replicas. The subset that is fully caught up with the leader is called the in-sync replica set (ISR). min.insync.replicas defines the minimum size the ISR must have for a write to be accepted.
The rule fires only for acks=all producers:
- With acks=all, the leader waits for every replica currently in the ISR to acknowledge the record. Before doing so, it checks the ISR size. If |ISR| < min.insync.replicas, the leader rejects the write immediately and the producer receives a NotEnoughReplicasException (or a NotEnoughReplicasAfterAppendException if the ISR shrank after the leader had already appended the record).
- With acks=1 or acks=0, min.insync.replicas is ignored entirely; the write is accepted without checking the ISR size. This is the most common misconfiguration: setting min.insync.replicas=2 while producers run with acks=1 (the client default before Kafka 3.0) and assuming you are safe.
So durability comes from the combination acks=all plus min.insync.replicas >= 2. Neither half is sufficient alone.
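The acceptance rule can be modelled in a few lines. This is a simplified sketch of the decision described above, not broker code: the `MinIsrRule` class, its `Acks` enum, and the `acceptsWrite` method are names invented for illustration.

```java
// Illustrative model only: none of these types are part of the Kafka API.
final class MinIsrRule {
    enum Acks { NONE, LEADER, ALL } // acks=0, acks=1, acks=all

    /**
     * Returns true if the write would be accepted.
     * Only acks=all producers are subject to the min.insync.replicas check.
     */
    static boolean acceptsWrite(Acks acks, int isrSize, int minInsyncReplicas) {
        if (acks != Acks.ALL) {
            return true; // acks=0 and acks=1 bypass the ISR size check
        }
        return isrSize >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        // RF=3, min.insync.replicas=2, one follower has dropped out of the ISR.
        System.out.println(acceptsWrite(Acks.ALL, 2, 2));    // true
        // A second replica drops out: acks=all writes are now rejected.
        System.out.println(acceptsWrite(Acks.ALL, 1, 2));    // false
        // An acks=1 producer is still accepted with only the leader in sync.
        System.out.println(acceptsWrite(Acks.LEADER, 1, 2)); // true
    }
}
```

The last call is the misconfiguration in miniature: the topic setting is correct, but the producer never triggers the check.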
The availability vs. durability trade-off
The setting is a direct lever between two competing goals.
| min.insync.replicas (RF=3) | Replicas a write is guaranteed on | Broker losses tolerated while still accepting writes | Risk |
|---|---|---|---|
| 1 | 1 | 2 | Acked data can vanish if the sole leader dies before replicating |
| 2 | 2 | 1 | Balanced — recommended default |
| 3 | 3 | 0 | One broker down or restarting blocks all writes to the partition |
Set it too high (e.g. equal to the replication factor) and any single broker being down, restarting for a rolling upgrade, or lagging behind drops the ISR below the threshold, and producers stall with NotEnoughReplicas. Set it too low (1) and Kafka will happily acknowledge a write that exists on only the leader; if that leader fails before the followers catch up, the acknowledged record is lost.
The sweet spot for most clusters is replication factor 3 with min.insync.replicas=2. That tolerates the loss of any one broker while still guaranteeing every acknowledged write lives on at least two machines.
Tip: Always keep min.insync.replicas strictly less than the replication factor. With RF=3 and min.insync.replicas=3 you have zero headroom, so routine maintenance becomes an outage. The replication factor must give you at least one spare replica beyond the minimum.
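The "broker losses tolerated" column in the table is simply the replication factor minus min.insync.replicas. A minimal sketch of that arithmetic follows; the `FaultTolerance` class and `tolerableLosses` method are hypothetical names, and rejecting a minimum above the replication factor is a choice made for this model rather than something Kafka enforces in this way.

```java
// Illustrative arithmetic only: not part of any Kafka API.
final class FaultTolerance {
    /** Brokers that can be lost while acks=all writes are still accepted. */
    static int tolerableLosses(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                    "expected 1 <= min.insync.replicas <= replication factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    public static void main(String[] args) {
        int rf = 3;
        // Reproduces the rows of the RF=3 table.
        for (int minIsr = 1; minIsr <= rf; minIsr++) {
            System.out.printf("RF=%d min.insync.replicas=%d tolerates %d broker loss(es)%n",
                    rf, minIsr, tolerableLosses(rf, minIsr));
        }
    }
}
```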
Configuring it
min.insync.replicas is a broker/topic-level setting, while acks is a producer-level setting. Both ends must be configured.
Set it as a broker default and override per topic:
    # server.properties: cluster-wide default
    min.insync.replicas=2
    default.replication.factor=3
Override on an individual topic with the admin CLI:
    kafka-topics.sh --create \
      --bootstrap-server localhost:9092 \
      --topic payments \
      --partitions 6 \
      --replication-factor 3 \
      --config min.insync.replicas=2

    # Change it on an existing topic
    kafka-configs.sh --bootstrap-server localhost:9092 \
      --entity-type topics --entity-name payments \
      --alter --add-config min.insync.replicas=2
On the producer side you must opt into acks=all:
    spring:
      kafka:
        producer:
          acks: all
          retries: 2147483647
          properties:
            enable.idempotence: true
            max.in.flight.requests.per.connection: 5
            delivery.timeout.ms: 120000
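Because the producer half of the contract is easy to miss, it can help to check producer settings programmatically. The sketch below audits a plain `java.util.Properties` object using the standard Kafka producer keys `acks` and `enable.idempotence`. The `ProducerConfigAudit` class is hypothetical, and treating an unset value as a problem is an assumption made here to force explicit configuration (newer clients default both settings safely).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: flags producer settings that weaken the durability contract.
final class ProducerConfigAudit {
    /** Returns human-readable problems; an empty list means both settings are explicit and safe. */
    static List<String> audit(Properties producerProps) {
        List<String> problems = new ArrayList<>();

        String acks = producerProps.getProperty("acks");
        // "-1" is the numeric alias Kafka accepts for "all".
        if (!"all".equals(acks) && !"-1".equals(acks)) {
            problems.add("acks is " + (acks == null ? "unset" : acks)
                    + "; min.insync.replicas only applies when acks=all");
        }

        if (!"true".equals(producerProps.getProperty("enable.idempotence"))) {
            problems.add("enable.idempotence is not explicitly true; retries may duplicate records");
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("acks", "1");
        props.setProperty("enable.idempotence", "false");
        // Prints one line per problem found.
        audit(props).forEach(System.out::println);
    }
}
```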
Handling NotEnoughReplicas in application code
When the ISR shrinks, a strong-durability producer should fail loudly rather than silently downgrade. With the Spring KafkaTemplate the failure surfaces through the returned CompletableFuture:
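    import java.util.concurrent.CompletableFuture;
    import org.apache.kafka.common.errors.NotEnoughReplicasException;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.kafka.support.SendResult;
    import org.springframework.stereotype.Service;

    @Service
    public class PaymentPublisher {
        private final KafkaTemplate<String, PaymentEvent> kafkaTemplate;

        public PaymentPublisher(KafkaTemplate<String, PaymentEvent> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }

        // Return the future so the caller can observe the failure. handle() is used because
        // an exception thrown from whenComplete() is discarded when the send already failed.
        public CompletableFuture<SendResult<String, PaymentEvent>> publish(PaymentEvent event) {
            return kafkaTemplate.send("payments", event.id(), event)
                .handle((result, ex) -> {
                    if (ex == null) {
                        return result;
                    }
                    // The template may wrap the client error, so check the cause too.
                    Throwable cause = ex.getCause() != null ? ex.getCause() : ex;
                    if (cause instanceof NotEnoughReplicasException) {
                        // ISR is below min.insync.replicas: do NOT treat as written.
                        throw new DurabilityException("Write rejected: insufficient in-sync replicas", cause);
                    }
                    throw new DurabilityException("Publish failed", cause);
                });
        }
    }

    // DurabilityException is the application's own unchecked exception (not shown).
    public record PaymentEvent(String id, long amountCents, String currency) {}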
A transient NotEnoughReplicasException is retriable: once a follower rejoins the ISR, retries (configured above) will succeed automatically. The danger is treating the error as “probably fine” and dropping the message.
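Depending on the client wrapper, the broker error may reach your callback as the cause of another exception rather than as the top-level throwable. A small cause-chain search makes the check robust to that. This is a generic sketch using only the JDK: the `Causes` class is a hypothetical helper, and `IllegalStateException` merely stands in for `NotEnoughReplicasException` so the example runs without the Kafka client on the classpath.

```java
import java.util.Optional;

// Hypothetical helper: finds a specific exception type anywhere in a cause chain.
final class Causes {
    /** Walks the cause chain and returns the first throwable of the given type, if any. */
    static <T extends Throwable> Optional<T> find(Throwable error, Class<T> type) {
        Throwable current = error;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; current != null && depth < 32; depth++) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            current = current.getCause();
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for a wrapper exception around a broker-side error.
        Throwable wrapped = new RuntimeException("send failed",
                new IllegalStateException("stand-in for NotEnoughReplicasException"));
        System.out.println(find(wrapped, IllegalStateException.class).isPresent()); // true
        System.out.println(find(wrapped, java.io.IOException.class).isPresent());   // false
    }
}
```

In a real producer callback you would pass `NotEnoughReplicasException.class` as the type and treat a present result as a rejected write.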
You can inspect the current ISR to confirm headroom before traffic ramps up:
    kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic payments

Output:

    Topic: payments Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
    Topic: payments Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3
Here partition 1 has only two replicas in sync. With min.insync.replicas=2 it still accepts writes, but a second broker loss would block it.
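The headroom calculation for each partition can be scripted from that output. The following is a rough sketch that assumes the line layout shown above, with an `Isr:` label followed by comma-separated broker ids. The `IsrHeadroomCheck` class and its methods are hypothetical, and real output may differ between Kafka versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives ISR headroom from one line of --describe output.
final class IsrHeadroomCheck {
    // Captures the comma-separated broker ids that follow "Isr:".
    private static final Pattern ISR_FIELD = Pattern.compile("Isr:\\s*([0-9,\\s]*)");

    /** Counts the broker ids listed in the Isr field of a partition line. */
    static int isrSize(String describeLine) {
        Matcher m = ISR_FIELD.matcher(describeLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no Isr field in: " + describeLine);
        }
        String ids = m.group(1).trim();
        return ids.isEmpty() ? 0 : ids.split("\\s*,\\s*").length;
    }

    /** Replicas that can still leave the ISR before acks=all writes are rejected. */
    static int headroom(String describeLine, int minInsyncReplicas) {
        return isrSize(describeLine) - minInsyncReplicas;
    }

    public static void main(String[] args) {
        String p0 = "Topic: payments Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3";
        String p1 = "Topic: payments Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3";
        System.out.println(headroom(p0, 2)); // 1
        System.out.println(headroom(p1, 2)); // 0
    }
}
```

A headroom of 0 means the partition is still writable but one more replica leaving the ISR will block acks=all producers.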
Best Practices
- Use RF=3, min.insync.replicas=2, acks=all as your default for any topic carrying data you cannot lose.
- Never set min.insync.replicas equal to the replication factor; leave at least one replica of slack for rolling restarts and upgrades.
- Remember that min.insync.replicas is inert without acks=all; audit producer configs, not just topic configs.
- Enable producer idempotence (enable.idempotence=true) so retries after a transient NotEnoughReplicasException do not create duplicates.
- Set retries high (effectively unlimited) with a bounded delivery.timeout.ms so brief ISR dips self-heal instead of failing the request.
- Treat a NotEnoughReplicasException that reaches application code as a hard failure: surface it, alert on it, and never acknowledge the write to upstream callers.
- Monitor UnderMinIsrPartitionCount and IsrShrinksPerSec so you are warned before partitions become unwritable.