Batching & linger.ms

Batching is one of the most effective levers for Kafka producer throughput. Instead of shipping every record to the broker the instant send() is called, the producer groups records destined for the same partition into batches and sends each batch as a unit, with batches bound for the same broker sharing a produce request. Larger batches amortize network and broker overhead across many records, trading a small, bounded amount of latency for a large gain in throughput. Tuning batch.size and linger.ms is how you dial in that trade-off for your workload.

How batching works

When you call send(), the record is not transmitted immediately. It is serialized, assigned a partition, and appended to an in-memory batch for that partition, held in the producer's record accumulator. A separate background I/O thread (the “Sender”) drains ready batches and sends them to the broker.

A batch becomes eligible to send when either of two conditions is met:

The batch reaches batch.size bytes (the batch is full).
linger.ms milliseconds have elapsed since the first record was added to the batch.

Whichever happens first wins. Crucially, batching is per partition — each partition has its own accumulating batch, so a topic with many partitions has many batches filling in parallel.

send(recordA) ─┐
send(recordB) ─┼──► [ Record Accumulator ]
send(recordC) ─┘        │  partition 0: [A][B][C] ──┐ (full OR linger expired)
                        │  partition 1: [D]         │
                        ▼                           ▼
                  Sender thread ───────────► Broker (single produce request)
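The readiness rule above can be sketched as a small predicate. This is a toy model under simplifying assumptions, not the client's actual accumulator code: `BatchReadiness` and its parameter names are hypothetical, and the real client also treats other conditions (for example an in-progress flush()) as making a batch sendable.

```java
/** Toy model of when one partition's batch becomes eligible to send. */
final class BatchReadiness {

    /**
     * @param batchBytes bytes already appended to the batch (assumed > 0)
     * @param batchSize  configured batch.size in bytes
     * @param ageMs      time since the first record was appended
     * @param lingerMs   configured linger.ms
     */
    static boolean isReady(int batchBytes, int batchSize, long ageMs, long lingerMs) {
        boolean full = batchBytes >= batchSize;    // size trigger
        boolean lingerExpired = ageMs >= lingerMs; // time trigger
        return full || lingerExpired;              // whichever happens first
    }

    public static void main(String[] args) {
        System.out.println(isReady(16_384, 16_384, 1, 10)); // batch is full
        System.out.println(isReady(2_000, 16_384, 10, 10)); // linger has expired
        System.out.println(isReady(2_000, 16_384, 3, 10));  // neither trigger yet
    }
}
```

Note that with lingerMs set to 0 the time trigger is always satisfied, which matches the behavior of sending whatever has accumulated as soon as the Sender is free.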

The two key knobs

Config	Default	Meaning
`batch.size`	16384 (16 KB)	Upper bound in bytes on a batch for one partition. The producer stops appending once the next record would not fit; a single record larger than this gets a batch of its own, which therefore exceeds batch.size.
`linger.ms`	0 (5 since Kafka 4.0)	Time the producer waits to let more records accumulate before sending a non-full batch.
`buffer.memory`	33554432 (32 MB)	Total memory available for buffering unsent records across all partitions.
`max.request.size`	1048576 (1 MB)	Upper bound on the entire produce request (may contain multiple batches).

With linger.ms=0 (the default before Kafka 4.0, which raised it to 5), the producer still batches whatever records are already waiting when the Sender thread is free; it just doesn’t wait deliberately. Setting linger.ms to a small positive value introduces an intentional delay so more records pile into each batch.

Increasing linger.ms adds at most that many milliseconds of accumulator wait to a record, and only when traffic is light. Under high load, batches fill on size before the linger timer fires, so the setting adds little or no latency.
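A back-of-the-envelope calculation makes the light-versus-heavy distinction concrete. The sketch below assumes a steady per-partition record rate and ignores compression and per-record overhead; `LingerWaitEstimate` and its method names are hypothetical helpers, not part of the Kafka client.

```java
/** Rough estimate of how long a batch waits in the accumulator. */
final class LingerWaitEstimate {

    /** Milliseconds for one partition's batch to fill at a steady rate. */
    static double fillTimeMs(int batchSizeBytes, double recordsPerSec, int recordBytes) {
        double bytesPerMs = recordsPerSec * recordBytes / 1000.0;
        return batchSizeBytes / bytesPerMs;
    }

    /** The batch is sent when the first trigger fires, so the wait is the smaller value. */
    static double maxWaitMs(double fillTimeMs, long lingerMs) {
        return Math.min(fillTimeMs, lingerMs);
    }

    public static void main(String[] args) {
        int batchSize = 64 * 1024;
        long lingerMs = 10;

        // Light traffic: 100 records/s of 1 KB each. Filling takes about 640 ms,
        // so the linger timer fires first and the full 10 ms is added.
        double light = fillTimeMs(batchSize, 100, 1024);
        System.out.printf("light: fill=%.2f ms, wait=%.2f ms%n", light, maxWaitMs(light, lingerMs));

        // Heavy traffic: 50,000 records/s. The batch fills in about 1.3 ms,
        // well before the linger timer, so linger.ms adds nothing.
        double heavy = fillTimeMs(batchSize, 50_000, 1024);
        System.out.printf("heavy: fill=%.2f ms, wait=%.2f ms%n", heavy, maxWaitMs(heavy, lingerMs));
    }
}
```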

Throughput vs. latency trade-off

Bigger batches mean fewer, larger requests — less per-request CPU, fewer network round trips, and better compression ratios (see compression). The cost is that records may sit in the accumulator slightly longer.

The table below illustrates the typical shape of the trade-off for a moderate-throughput workload (numbers are illustrative, not guaranteed):

`batch.size`	`linger.ms`	Approx. throughput	p99 produce latency
16 KB	0	~120 MB/s	~3 ms
64 KB	5	~280 MB/s	~8 ms
256 KB	20	~420 MB/s	~25 ms
1 MB	100	~480 MB/s	~110 ms

Throughput climbs steeply at first and then flattens: past a point of diminishing returns, you pay latency without buying much more throughput. For most high-volume pipelines, a batch.size of 64–256 KB and a linger.ms of 5–20 ms is a strong starting point.
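One way to see the diminishing returns is to compute how much throughput each extra millisecond of p99 latency buys between adjacent rows. The sketch below uses only the illustrative numbers from the table above; `MarginalGain` is a hypothetical helper, and the results say nothing about a real cluster.

```java
/** Marginal throughput gained per extra millisecond of p99 latency. */
final class MarginalGain {

    /** (throughput delta in MB/s) divided by (p99 latency delta in ms). */
    static double gainPerMs(double mbps1, double p99Ms1, double mbps2, double p99Ms2) {
        return (mbps2 - mbps1) / (p99Ms2 - p99Ms1);
    }

    public static void main(String[] args) {
        // Rows of the illustrative table: {throughput MB/s, p99 latency ms}
        double[][] rows = { {120, 3}, {280, 8}, {420, 25}, {480, 110} };
        for (int i = 1; i < rows.length; i++) {
            double gain = gainPerMs(rows[i - 1][0], rows[i - 1][1], rows[i][0], rows[i][1]);
            System.out.printf("step %d: %.2f MB/s per extra ms%n", i, gain);
        }
    }
}
```

With these numbers the first step buys 32 MB/s per millisecond, the second roughly 8, and the last less than 1, which is the flattening the paragraph describes.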

Tuning with the plain Java client

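import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();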
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

// Throughput-oriented batching
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 128 * 1024);   // 128 KB batches
props.put(ProducerConfig.LINGER_MS_CONFIG, 10);            // wait up to 10 ms
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64 * 1024 * 1024); // 64 MB buffer
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");  // batching + compression pair well

try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    for (int i = 0; i < 1_000_000; i++) {
        producer.send(new ProducerRecord<>("events", "key-" + i, "payload-" + i));
    }
    producer.flush(); // force any lingering partial batches out
}

flush() blocks until every buffered record has been sent (or failed), regardless of linger.ms. It is the right way to drain the accumulator before shutdown or at a checkpoint boundary.

Tuning in Spring Boot

In a Spring for Apache Kafka application, set the same keys under spring.kafka.producer:

spring:
  kafka:
    bootstrap-servers: broker1:9092,broker2:9092
    producer:
      batch-size: 131072          # 128 KB
      buffer-memory: 67108864     # 64 MB
      compression-type: lz4
      properties:
        linger.ms: 10             # linger.ms has no dedicated property, set via properties

Spring exposes batch-size, buffer-memory, and compression-type as first-class properties, but linger.ms must be supplied through the generic properties map, since it has no dedicated relaxed-binding key.

You can check whether these settings are having the intended effect by watching batching in action through the producer's metrics:

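import org.apache.kafka.clients.producer.Producer;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.stereotype.Component;

@Component
public class BatchMetricsLogger {

    private final ProducerFactory<String, String> producerFactory;

    public BatchMetricsLogger(ProducerFactory<String, String> producerFactory) {
        this.producerFactory = producerFactory;
    }

    public void logBatchSize() {
        // Close the handle when done so the producer is released back to the factory
        // instead of being left dangling.
        try (Producer<String, String> producer = producerFactory.createProducer()) {
            producer.metrics().forEach((name, metric) -> {
                // metricValue() returns Object, so check the type before formatting it
                if ("batch-size-avg".equals(name.name()) && metric.metricValue() instanceof Double) {
                    System.out.printf("avg batch size = %.0f bytes%n", (Double) metric.metricValue());
                }
            });
        }
    }
}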

Output:

avg batch size = 124982 bytes

An average batch size near your batch.size means batches are filling on size (you are throughput-bound and well tuned). An average far below it means batches are being sent before they fill, usually because the linger timer fires first; raise linger.ms if you want bigger batches.
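The comparison can be turned into a simple check. This is only a sketch: `BatchFillCheck`, its labels, and the 0.8 cutoff are hypothetical choices made for illustration, not values defined by Kafka.

```java
/** Classifies a producer by how full its batches are on average. */
final class BatchFillCheck {

    /** Ratio of the batch-size-avg metric to the configured batch.size. */
    static double fillRatio(double batchSizeAvg, int batchSize) {
        return batchSizeAvg / batchSize;
    }

    /** "size-bound" if batches are mostly full, otherwise "linger-bound". */
    static String diagnose(double batchSizeAvg, int batchSize, double threshold) {
        return fillRatio(batchSizeAvg, batchSize) >= threshold ? "size-bound" : "linger-bound";
    }

    public static void main(String[] args) {
        int batchSize = 128 * 1024; // 131072, as configured earlier
        double threshold = 0.8;     // assumed cutoff, tune to taste

        // The sample metric value shown above: about 95% full
        System.out.println(diagnose(124_982, batchSize, threshold));
        // A much smaller average: batches leave long before they fill
        System.out.println(diagnose(20_000, batchSize, threshold));
    }
}
```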

Best Practices

Start with a batch.size of 64–256 KB and a linger.ms of 5–20 ms, then measure batch-size-avg and adjust toward the regime your latency budget allows.
Pair larger batches with compression (lz4 or zstd); compression operates per batch, so bigger batches compress better.
Ensure buffer.memory is large enough for batch.size × number of active partitions; when the buffer fills, send() blocks for up to max.block.ms and then fails with a timeout.
Keep batch.size at or below both the producer's max.request.size and the broker's message.max.bytes (or the topic-level max.message.bytes); the broker rejects a batch that exceeds its limit.
Use flush() (or a graceful close) before shutdown so partial, lingering batches are not lost.
Treat linger.ms as nearly free under sustained load — it only adds latency when traffic is too sparse to fill batches by size.
Watch record-queue-time-avg and request-latency-avg together; rising queue time with stable request latency means you have room to increase linger.ms.
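The two sizing rules in this list are simple arithmetic and can be checked before deployment. The sketch below is an assumption-laden illustration: `ProducerConfigSanity` and its methods are hypothetical, and the partition counts are made-up inputs.

```java
/** Pre-deployment arithmetic checks for producer batching settings. */
final class ProducerConfigSanity {

    /** Memory needed for one full batch per active partition. */
    static long minBufferBytes(int batchSize, int activePartitions) {
        return (long) batchSize * activePartitions; // widen first to avoid int overflow
    }

    static boolean bufferLargeEnough(long bufferMemory, int batchSize, int activePartitions) {
        return bufferMemory >= minBufferBytes(batchSize, activePartitions);
    }

    /** A full batch has to fit inside a single produce request. */
    static boolean batchFitsRequest(int batchSize, int maxRequestSize) {
        return batchSize <= maxRequestSize;
    }

    public static void main(String[] args) {
        int batchSize = 128 * 1024;            // 128 KB
        long bufferMemory = 64L * 1024 * 1024; // 64 MB
        int maxRequestSize = 1024 * 1024;      // 1 MB default

        // 256 partitions need 32 MB, which fits; 600 need 75 MB, which does not.
        System.out.println(bufferLargeEnough(bufferMemory, batchSize, 256));
        System.out.println(bufferLargeEnough(bufferMemory, batchSize, 600));
        System.out.println(batchFitsRequest(batchSize, maxRequestSize));
    }
}
```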