
Throughput Tuning

Throughput is the number of records (or bytes) Kafka can move per second across producers, brokers, and consumers. In production you tune for throughput when you are ingesting high-volume event streams, change-data-capture feeds, or log pipelines where a few milliseconds of extra per-message latency is an acceptable trade for far higher aggregate rate. The core idea is simple: amortize fixed costs (network round trips, request headers, disk I/O) over larger, compressed batches, and then scale horizontally with partitions and parallel consumers.

The throughput equation

Every Kafka request carries fixed overhead. A producer that sends one record per request wastes most of its bandwidth on protocol framing and network round trips. The single most effective lever is batching: group many records into one request so the per-record overhead shrinks toward zero.

effective_throughput ≈ (batch_size × compression_ratio) / (per_request_overhead + transfer_time)

Three families of settings move this equation: producer batching/compression, partition count, and consumer fetch sizing. Tune them together — adding partitions without enough consumer threads, or compressing without bigger batches, leaves throughput on the table.
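
To make the equation concrete, here is a minimal sketch that evaluates it for a few batch sizes. It assumes batch_size is the compressed, on-wire size and models transfer_time as batch_size divided by link bandwidth. The 1 ms overhead, 100 MB/s link, and 3x compression ratio are illustrative assumptions, not measurements, and the model ignores request pipelining.

```java
public class ThroughputModel {

    /**
     * Logical (uncompressed) bytes per second for a single request stream.
     * batchBytes is the on-wire batch size; transfer time is derived from bandwidth.
     */
    public static double effectiveThroughput(double batchBytes, double compressionRatio,
                                             double perRequestOverheadSec,
                                             double bandwidthBytesPerSec) {
        double transferTimeSec = batchBytes / bandwidthBytesPerSec;
        return (batchBytes * compressionRatio) / (perRequestOverheadSec + transferTimeSec);
    }

    public static void main(String[] args) {
        double overheadSec = 0.001; // assumed fixed cost per request
        double bandwidth = 100e6;   // assumed link speed in bytes per second
        double ratio = 3.0;         // assumed compression ratio
        for (int kb : new int[] {1, 16, 128, 256}) {
            double bytesPerSec = effectiveThroughput(kb * 1024.0, ratio, overheadSec, bandwidth);
            System.out.printf("batch=%4d KB -> %.1f MB/s logical%n", kb, bytesPerSec / 1e6);
        }
    }
}
```

In this model the fixed overhead dominates for small batches, and throughput approaches compression_ratio × bandwidth as batches grow.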

Producer: batch harder and compress

With linger.ms at 0 (the default before Kafka 4.0, which raised it to 5), the producer sends a batch as soon as the sender thread is free, so lightly loaded producers ship many small requests. Raising batch.size and adding a small linger.ms lets batches fill before they are sent, which sharply increases records per request. Compression then shrinks the on-wire and on-disk payload, saving network bandwidth and reducing the bytes the broker writes.

| Property | Default | Throughput setting | Effect |
| --- | --- | --- | --- |
| batch.size | 16384 (16 KB) | 131072–262144 (128–256 KB) | More records per request |
| linger.ms | 0 | 10–100 | Lets batches fill before sending |
| compression.type | none | lz4 or zstd | 2–5x smaller payloads |
| buffer.memory | 33554432 (32 MB) | 67108864–134217728 (64–128 MB) | Buffers more in-flight batches |
| acks | all | all (keep) | Durability; pair with idempotence |
| max.in.flight.requests.per.connection | 5 | 5 | Keeps the pipeline full |

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

// Throughput recipe
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 256 * 1024);          // 256 KB batches (per partition)
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);                   // wait up to 50 ms
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");        // high-ratio compression
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 128L * 1024 * 1024); // 128 MB buffer
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

// payload(i) stands for your own method that builds the record value
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    for (int i = 0; i < 1_000_000; i++) {
        // send() is asynchronous: records queue into per-partition batches
        producer.send(new ProducerRecord<>("events", Integer.toString(i), payload(i)));
    }
} // close() waits for buffered batches to be sent
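
To see how batch.size and linger.ms interact, the following sketch estimates how many records end up in one batch for a single partition. It is a simplified model, not the producer's actual accumulator logic: it assumes a steady arrival rate and a fixed record size, and it ignores the extra batching that happens naturally while the sender is busy. The class and parameter names are hypothetical.

```java
public class BatchFillEstimate {

    /**
     * Estimated records per batch for one partition: a batch is sent when it is
     * full (batch.size) or when linger.ms expires, whichever comes first.
     */
    public static long recordsPerBatch(int batchSizeBytes, long lingerMs,
                                       int avgRecordBytes, double recordsPerSecPerPartition) {
        long capacity = Math.max(1, batchSizeBytes / avgRecordBytes);
        long arrivedDuringLinger =
                (long) Math.floor(recordsPerSecPerPartition * lingerMs / 1000.0);
        // At least one record is always sent, even with linger.ms = 0.
        return Math.max(1, Math.min(capacity, arrivedDuringLinger));
    }

    public static void main(String[] args) {
        int recordBytes = 500;   // assumed average record size
        double rate = 20_000;    // assumed records per second per partition
        System.out.println("16 KB, linger 0:   " + recordsPerBatch(16 * 1024, 0, recordBytes, rate));
        System.out.println("16 KB, linger 50:  " + recordsPerBatch(16 * 1024, 50, recordBytes, rate));
        System.out.println("256 KB, linger 50: " + recordsPerBatch(256 * 1024, 50, recordBytes, rate));
    }
}
```

The model shows why the two settings belong together: a long linger cannot help if batch.size caps the batch first, and a large batch.size cannot help if the linger window closes before it fills.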

Tip: zstd usually gives the best compression ratio of the codecs Kafka supports, which tends to pay off when network or disk is the bottleneck; lz4 costs less CPU per byte and is the safer pick when producer cores are saturated. Always benchmark with your real payloads, because text and JSON compress far better than already-compressed binary.
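
A quick way to sanity-check how compressible your payloads are is to measure the ratio offline. The sketch below uses the JDK's Deflater (the algorithm behind gzip) purely as a stand-in, because zstd and lz4 are not part of the standard library; absolute ratios will differ from what Kafka's codecs achieve. The sample JSON records are invented for illustration and should be replaced with real payloads.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class CompressionRatioCheck {

    /** Returns the number of bytes produced by deflating the input. */
    public static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Original size divided by compressed size; higher means more compressible. */
    public static double ratio(byte[] input) {
        return (double) input.length / compressedSize(input);
    }

    /** Builds a batch of made-up JSON events, one per line. */
    public static byte[] sampleJsonBatch(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("{\"id\":").append(i)
              .append(",\"type\":\"page_view\",\"user\":\"user-").append(i % 50)
              .append("\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Random bytes approximate already-compressed binary data. */
    public static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        new Random(42).nextBytes(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] batch = sampleJsonBatch(1000);
        System.out.printf("JSON batch of 1000: %.2fx%n", ratio(batch));
        System.out.printf("single JSON record: %.2fx%n", ratio(sampleJsonBatch(1)));
        System.out.printf("random bytes:       %.2fx%n", ratio(randomBytes(batch.length)));
    }
}
```

Comparing the single-record case with the 1000-record batch also illustrates why compression and batching reinforce each other: the compressor needs repeated structure across records to do well.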

In Spring Boot, set the same keys under spring.kafka.producer:

spring:
  kafka:
    bootstrap-servers: broker1:9092,broker2:9092
    producer:
      acks: all
      batch-size: 262144        # 256 KB
      buffer-memory: 134217728  # 128 MB
      compression-type: zstd
      properties:
        linger.ms: 50
        enable.idempotence: true

Scale out with partitions

A single partition is processed by at most one consumer in a group, so partition count is the hard ceiling on consumer parallelism. To raise consumer-side throughput you must have at least as many partitions as you want concurrent consumers.

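kafka-topics.sh --bootstrap-server broker1:9092 \
  --alter --topic events --partitions 24

# Confirm the new partition count
kafka-topics.sh --bootstrap-server broker1:9092 \
  --describe --topic events
Output (abridged):
Topic: events  PartitionCount: 24  ReplicationFactor: 3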

Warning: Increasing partitions on a keyed topic changes the key-to-partition mapping, so new records for a key can land in a different partition than that key's earlier records, and per-key ordering across the change is no longer guaranteed. Plan partition counts up front for keyed streams. Kafka does not support reducing a topic's partition count.
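
The remapping is easy to demonstrate with any hash-modulo scheme. The sketch below uses String.hashCode() as a stand-in hash, which is an assumption made for brevity: Kafka's default partitioner hashes the serialized key bytes with murmur2, so the concrete partition numbers differ, but the effect of changing the modulus is the same.

```java
public class KeyRemapDemo {

    /** Stand-in partitioner: non-negative hash modulo partition count (not Kafka's murmur2). */
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    /** Counts how many of the keys user-0..user-(keys-1) change partition. */
    public static int countMoved(int keys, int oldPartitions, int newPartitions) {
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            String key = "user-" + i;
            if (partitionFor(key, oldPartitions) != partitionFor(key, newPartitions)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int moved = countMoved(1000, 12, 24);
        System.out.println(moved + " of 1000 keys map to a different partition after 12 -> 24");
    }
}
```

Every key that moves has older records in one partition and newer records in another, which is exactly where per-key ordering is lost.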

A practical sizing heuristic: target partitions ≥ peak consumers, and keep each partition’s sustained write rate well under broker capacity. See partition sizing for the full method.
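
As a rough illustration of that heuristic (not the full sizing method referenced above), the sketch below takes the larger of two lower bounds: the peak consumer count, and the number of partitions needed to keep each one under a per-partition write budget. The budget parameter is an assumption you would derive from your own broker benchmarks.

```java
public class PartitionSizing {

    /**
     * Suggests a partition count that satisfies both constraints:
     * at least one partition per peak consumer, and per-partition write rate
     * no higher than the given budget.
     */
    public static int suggestedPartitions(int peakConsumers, double peakWriteMBps,
                                          double perPartitionBudgetMBps) {
        int forThroughput = (int) Math.ceil(peakWriteMBps / perPartitionBudgetMBps);
        return Math.max(peakConsumers, forThroughput);
    }

    public static void main(String[] args) {
        // Assumed workload: 12 consumers at peak, 200 MB/s of writes, 10 MB/s budget per partition.
        System.out.println(suggestedPartitions(12, 200.0, 10.0));
    }
}
```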

Consumer: fetch bigger, process in parallel

Consumers also benefit from batching. Raising fetch.min.bytes tells the broker to wait until it has accumulated enough data before responding, so each fetch returns more records. fetch.max.wait.ms caps that wait so you do not starve under low load. Then run one consumer thread per partition.

| Property | Default | Throughput setting | Effect |
| --- | --- | --- | --- |
| fetch.min.bytes | 1 | 65536–1048576 (64 KB–1 MB) | Bigger, fewer fetches |
| fetch.max.wait.ms | 500 | 100–500 | Bounds wait when data is thin |
| max.partition.fetch.bytes | 1048576 (1 MB) | 2097152+ (2 MB+) | More bytes per partition per fetch |
| max.poll.records | 500 | 1000–2000 | Larger in-memory processing batches |

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "events-processor");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

// Throughput recipe
props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024 * 1024);      // wait for 1 MB
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 250);           // but no longer than 250 ms
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 4 * 1024 * 1024); // up to 4 MB per partition per fetch
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 2000);           // up to 2000 records per poll()
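
The interplay between fetch.min.bytes and fetch.max.wait.ms can be sketched as a simple model: the broker answers a fetch once enough bytes have accumulated or the wait expires, whichever happens first. The code below assumes a caught-up consumer and a steady inbound rate, which real traffic rarely provides, so treat the numbers as directional. The class and method names are hypothetical.

```java
public class FetchWaitModel {

    /** Milliseconds the broker holds a fetch before responding, under a steady inbound rate. */
    public static double responseDelayMs(long fetchMinBytes, long fetchMaxWaitMs,
                                         double inboundBytesPerSec) {
        if (inboundBytesPerSec <= 0) {
            return fetchMaxWaitMs; // no data arriving: the wait cap decides
        }
        double msToFill = fetchMinBytes / inboundBytesPerSec * 1000.0;
        return Math.min(fetchMaxWaitMs, msToFill);
    }

    /** Approximate bytes returned by each fetch in the same model. */
    public static double bytesPerFetch(long fetchMinBytes, long fetchMaxWaitMs,
                                       double inboundBytesPerSec) {
        return inboundBytesPerSec
                * responseDelayMs(fetchMinBytes, fetchMaxWaitMs, inboundBytesPerSec) / 1000.0;
    }

    public static void main(String[] args) {
        long minBytes = 1024 * 1024; // fetch.min.bytes = 1 MB
        long maxWaitMs = 250;        // fetch.max.wait.ms = 250
        for (double rate : new double[] {1e6, 10e6, 100e6}) { // assumed inbound bytes per second
            System.out.printf("inbound %.0f MB/s: delay %.1f ms, ~%.0f KB per fetch%n",
                    rate / 1e6, responseDelayMs(minBytes, maxWaitMs, rate),
                    bytesPerFetch(minBytes, maxWaitMs, rate) / 1024);
        }
    }
}
```

At low inbound rates the wait cap, not fetch.min.bytes, determines how large each fetch is, so fetch.max.wait.ms doubles as the added consumer latency you are willing to accept.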

In Spring, run multiple containers per listener with concurrency so each thread owns a slice of the partitions. A List parameter only works on a batch listener, enabled here with batch = "true" (available since Spring for Apache Kafka 2.8; older versions set this on the container factory):

@KafkaListener(topics = "events", groupId = "events-processor",
               concurrency = "12", batch = "true")
public void consume(List<ConsumerRecord<String, String>> batch) {
    // batch listener: process many records per poll for higher throughput
    for (ConsumerRecord<String, String> record : batch) {
        process(record.value());
    }
}

Set concurrency no higher than the partition count — extra threads simply idle.
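
The idle-thread effect follows directly from the one-consumer-per-partition rule. The sketch below spreads partition ids over threads round-robin, which is an illustrative stand-in rather than any of Kafka's actual assignors, and counts the threads left without work.

```java
import java.util.ArrayList;
import java.util.List;

public class ConcurrencyVsPartitions {

    /** Round-robin spread of partition ids over consumer threads (illustrative only). */
    public static List<List<Integer>> assign(int partitions, int threads) {
        List<List<Integer>> byThread = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            byThread.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            byThread.get(p % threads).add(p);
        }
        return byThread;
    }

    /** Number of threads that received no partition at all. */
    public static long idleThreads(int partitions, int threads) {
        return assign(partitions, threads).stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        System.out.println("24 partitions, 12 threads: idle = " + idleThreads(24, 12));
        System.out.println("24 partitions, 30 threads: idle = " + idleThreads(24, 30));
    }
}
```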

Best Practices

  • Tune batching first: a larger batch.size plus 10–100 ms of linger.ms is usually the biggest single throughput win.
  • Always enable compression (zstd or lz4) for text/JSON workloads and benchmark the ratio with real data.
  • Raise buffer.memory so producers can keep batching during broker slowdowns instead of blocking on send().
  • Provision partitions for peak consumer parallelism up front; you can add but never remove them, and adding breaks key ordering.
  • Match consumer concurrency to partition count, and use batch listeners with a higher max.poll.records to amortize per-record processing cost.
  • Keep acks=all with idempotence enabled — modern brokers deliver high throughput without sacrificing durability.
  • Measure end-to-end (producer, broker, consumer) and change one variable at a time so you can attribute each gain.
Last updated June 1, 2026