Concurrency & Listener Containers

A single @KafkaListener method does not have to mean a single consumer thread. Spring for Apache Kafka lets one application instance run several consumers in the same group, each on its own thread, by setting concurrency. Used correctly this multiplies your throughput without deploying more pods; used carelessly it spins up threads that sit idle. Understanding how the container model maps threads to partitions is what separates a consumer that scales linearly from one that quietly bottlenecks in production.

The two container types

Spring ships two MessageListenerContainer implementations, and the relationship between them is the key to the whole concurrency model.

KafkaMessageListenerContainer is the low-level unit. It owns exactly one Kafka Consumer, runs one poll loop on one dedicated thread, and delivers records to your listener. It is single-threaded by design — one container, one consumer, one thread.
ConcurrentMessageListenerContainer is a thin manager. It does not poll anything itself; instead it creates N child KafkaMessageListenerContainer instances, where N is the concurrency value, and starts each on its own thread.

When you annotate a method with @KafkaListener, the default ConcurrentKafkaListenerContainerFactory builds a ConcurrentMessageListenerContainer for it. So concurrency = 3 means three child containers, three Kafka consumers, three threads — all joined to the same consumer group.

ConcurrentMessageListenerContainer (concurrency = 3)
 ├─ KafkaMessageListenerContainer #0  → Consumer → Thread "...-0-C-1"
 ├─ KafkaMessageListenerContainer #1  → Consumer → Thread "...-1-C-1"
 └─ KafkaMessageListenerContainer #2  → Consumer → Thread "...-2-C-1"
        (all three share group.id = "order-processing")

How threads map to partitions

Because every child container is a real consumer in the same group, Kafka’s group coordinator distributes the topic’s partitions across them through the normal rebalance protocol. This is the single most important rule:

Partitions are the unit of parallelism. The effective parallelism of a listener is min(concurrency, partitionsAssignedToThisInstance). If a topic has 4 partitions and you set concurrency = 8, four child consumers each own one partition and the other four sit idle with no work — they are not bugs, just wasted threads.

Within one application instance, partitions are spread as evenly as possible. With 6 partitions and concurrency = 3, each child container handles 2 partitions. Records from a given partition are always processed in order by a single thread, which preserves Kafka’s per-partition ordering guarantee even while you run multiple threads.

When you scale horizontally, the same arithmetic spans the whole group. Two instances each with concurrency = 3 produce 6 consumers; a 6-partition topic gives each exactly one partition. Total concurrency for a group is therefore bounded by total partition count across all instances.

Setting concurrency

You can set concurrency in three places. On the factory it becomes the default for every listener; on the annotation it overrides the factory per listener.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, OrderEvent> kafkaListenerContainerFactory(
            ConsumerFactory<String, OrderEvent> consumerFactory) {

        ConcurrentKafkaListenerContainerFactory<String, OrderEvent> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.setConcurrency(3); // default for all listeners using this factory
        return factory;
    }
}

Per-listener overrides win, and the annotation value is a String (it supports property placeholders such as "${app.order.concurrency:3}"):

@KafkaListener(topics = "orders", groupId = "order-processing", concurrency = "6")
public void onMessage(OrderEvent event) {
    // processed by up to 6 threads, capped by partition count
}

The DTO used above:

public record OrderEvent(String orderId, String customerId, double amount) {}

Seeing the thread/partition relationship

Logging the current thread and partition makes the mapping concrete. With concurrency = 3 against a 6-partition topic:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.stereotype.Component;

@Component
public class OrderListener {

    @KafkaListener(topics = "orders", groupId = "order-processing", concurrency = "3")
    public void onMessage(OrderEvent event,
                          @Header(KafkaHeaders.RECEIVED_PARTITION) int partition) {
        System.out.printf("thread=%s partition=%d order=%s%n",
                Thread.currentThread().getName(), partition, event.orderId());
    }
}

Output:

thread=org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1 partition=0 order=ORD-1001
thread=org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1 partition=1 order=ORD-1002
thread=org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1 partition=2 order=ORD-1003
thread=org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1 partition=4 order=ORD-1004

Each child container keeps a stable thread name (...#0-<childIndex>-C-1) and a fixed set of partitions for the life of an assignment, so order is preserved within every partition.

Concurrency settings at a glance

Where	API / Key	Scope	Notes
Factory bean	`factory.setConcurrency(int)`	All listeners on that factory	Default when annotation omits it
Annotation	`@KafkaListener(concurrency = "N")`	Single listener	Overrides factory; supports placeholders
Effective value	`min(concurrency, partitions)`	Runtime	Excess child consumers stay idle
Property (Boot)	`spring.kafka.listener.concurrency`	Auto-configured factory	Applies when you rely on Boot’s default factory

Best Practices

Size concurrency to the topic’s partition count (per instance), never above it — extra threads only consume memory and add rebalance churn.
Plan partition count up front for your peak target throughput, since you cannot exceed partitions with threads and increasing partitions later disrupts key-based ordering.
Keep listener methods fast; concurrency adds threads but each still must finish work within max.poll.interval.ms or it triggers a rebalance.
Use a stable id and externalize concurrency via a placeholder so you can tune parallelism per environment without recompiling.
Remember that per-partition ordering still holds with high concurrency — do not assume global ordering across partitions just because one thread handles many.
When horizontally scaling, account for total consumers across all instances against total partitions; idle instances are a sign you have more capacity than partitions.