Concurrency & Listener Containers
A single @KafkaListener method does not have to mean a single consumer thread. Spring for Apache Kafka lets one application instance run several consumers in the same group, each on its own thread, by setting concurrency. Used correctly this multiplies your throughput without deploying more pods; used carelessly it spins up threads that sit idle. Understanding how the container model maps threads to partitions is what separates a consumer that scales linearly from one that quietly bottlenecks in production.
The two container types
Spring ships two MessageListenerContainer implementations, and the relationship between them is the key to the whole concurrency model.
KafkaMessageListenerContaineris the low-level unit. It owns exactly one KafkaConsumer, runs one poll loop on one dedicated thread, and delivers records to your listener. It is single-threaded by design — one container, one consumer, one thread.ConcurrentMessageListenerContaineris a thin manager. It does not poll anything itself; instead it creates N childKafkaMessageListenerContainerinstances, where N is theconcurrencyvalue, and starts each on its own thread.
When you annotate a method with @KafkaListener, the default ConcurrentKafkaListenerContainerFactory builds a ConcurrentMessageListenerContainer for it. So concurrency = 3 means three child containers, three Kafka consumers, three threads — all joined to the same consumer group.
ConcurrentMessageListenerContainer (concurrency = 3)
├─ KafkaMessageListenerContainer #0 → Consumer → Thread "...-0-C-1"
├─ KafkaMessageListenerContainer #1 → Consumer → Thread "...-1-C-1"
└─ KafkaMessageListenerContainer #2 → Consumer → Thread "...-2-C-1"
(all three share group.id = "order-processing")
How threads map to partitions
Because every child container is a real consumer in the same group, Kafka’s group coordinator distributes the topic’s partitions across them through the normal rebalance protocol. This is the single most important rule:
Partitions are the unit of parallelism. The effective parallelism of a listener is
min(concurrency, partitionsAssignedToThisInstance). If a topic has 4 partitions and you setconcurrency = 8, four child consumers each own one partition and the other four sit idle with no work — they are not bugs, just wasted threads.
Within one application instance, partitions are spread as evenly as possible. With 6 partitions and concurrency = 3, each child container handles 2 partitions. Records from a given partition are always processed in order by a single thread, which preserves Kafka’s per-partition ordering guarantee even while you run multiple threads.
When you scale horizontally, the same arithmetic spans the whole group. Two instances each with concurrency = 3 produce 6 consumers; a 6-partition topic gives each exactly one partition. Total concurrency for a group is therefore bounded by total partition count across all instances.
Setting concurrency
You can set concurrency in three places. On the factory it becomes the default for every listener; on the annotation it overrides the factory per listener.
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
@Configuration
public class KafkaConsumerConfig {
@Bean
public ConcurrentKafkaListenerContainerFactory<String, OrderEvent> kafkaListenerContainerFactory(
ConsumerFactory<String, OrderEvent> consumerFactory) {
ConcurrentKafkaListenerContainerFactory<String, OrderEvent> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory);
factory.setConcurrency(3); // default for all listeners using this factory
return factory;
}
}
Per-listener overrides win, and the annotation value is a String (it supports property placeholders such as "${app.order.concurrency:3}"):
@KafkaListener(topics = "orders", groupId = "order-processing", concurrency = "6")
public void onMessage(OrderEvent event) {
// processed by up to 6 threads, capped by partition count
}
The DTO used above:
public record OrderEvent(String orderId, String customerId, double amount) {}
Seeing the thread/partition relationship
Logging the current thread and partition makes the mapping concrete. With concurrency = 3 against a 6-partition topic:
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.stereotype.Component;
@Component
public class OrderListener {
@KafkaListener(topics = "orders", groupId = "order-processing", concurrency = "3")
public void onMessage(OrderEvent event,
@Header(KafkaHeaders.RECEIVED_PARTITION) int partition) {
System.out.printf("thread=%s partition=%d order=%s%n",
Thread.currentThread().getName(), partition, event.orderId());
}
}
Output:
thread=org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1 partition=0 order=ORD-1001
thread=org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1 partition=1 order=ORD-1002
thread=org.springframework.kafka.KafkaListenerEndpointContainer#0-1-C-1 partition=2 order=ORD-1003
thread=org.springframework.kafka.KafkaListenerEndpointContainer#0-2-C-1 partition=4 order=ORD-1004
Each child container keeps a stable thread name (...#0-<childIndex>-C-1) and a fixed set of partitions for the life of an assignment, so order is preserved within every partition.
Concurrency settings at a glance
| Where | API / Key | Scope | Notes |
|---|---|---|---|
| Factory bean | factory.setConcurrency(int) | All listeners on that factory | Default when annotation omits it |
| Annotation | @KafkaListener(concurrency = "N") | Single listener | Overrides factory; supports placeholders |
| Effective value | min(concurrency, partitions) | Runtime | Excess child consumers stay idle |
| Property (Boot) | spring.kafka.listener.concurrency | Auto-configured factory | Applies when you rely on Boot’s default factory |
Best Practices
- Size
concurrencyto the topic’s partition count (per instance), never above it — extra threads only consume memory and add rebalance churn. - Plan partition count up front for your peak target throughput, since you cannot exceed partitions with threads and increasing partitions later disrupts key-based ordering.
- Keep listener methods fast; concurrency adds threads but each still must finish work within
max.poll.interval.msor it triggers a rebalance. - Use a stable
idand externalizeconcurrencyvia a placeholder so you can tune parallelism per environment without recompiling. - Remember that per-partition ordering still holds with high concurrency — do not assume global ordering across partitions just because one thread handles many.
- When horizontally scaling, account for total consumers across all instances against total partitions; idle instances are a sign you have more capacity than partitions.