Cluster Sizing & Capacity Planning
Sizing a Kafka cluster is the difference between a platform that absorbs traffic spikes gracefully and one that falls over under load or quietly bankrupts you on cloud storage. The goal is to translate business requirements — how many messages, how big, and how long you keep them — into concrete numbers of brokers and the CPU, memory, disk, and network each one needs. Get this right early, because rebalancing a poorly sized cluster under production load is one of the most painful operations you can perform.
Start with throughput and retention
Every sizing exercise begins with two inputs: how much data flows in per second (write throughput), and how long you keep it (retention). Multiply average message size by message rate to get raw bytes per second, then factor in read fan-out — every consumer group re-reads the topic, so read throughput is often several times write throughput.
Capture your baseline as a small set of variables:
Write throughput (W) = avg message size x produce rate
Read fan-out (F) = number of independent consumer groups
Read throughput (R) = W x F
Retention (T) = how long messages live on disk (seconds)
Replication factor (RF) = number of copies per partition (usually 3)
Always size for peak throughput, not average. A daily batch job or a marketing campaign can produce 5-10x your steady-state rate, and a cluster sized for the average will fall behind exactly when it matters most.
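The baseline variables reduce to a few lines of arithmetic. The sketch below is illustrative only: the message size, produce rate, and peak multiplier are made-up inputs, and the names `Baseline` and `baseline` are hypothetical helpers, not part of any Kafka tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    write_mb_s: float         # W, peak write throughput in MB/s
    read_mb_s: float          # R = W x F
    retention_s: int          # T, in seconds
    replication_factor: int   # RF


def baseline(avg_msg_bytes, msgs_per_s, peak_multiplier, fan_out,
             retention_days, replication_factor=3):
    """Derive the sizing inputs from raw workload numbers (decimal MB)."""
    steady_mb_s = avg_msg_bytes * msgs_per_s / 1_000_000
    write_mb_s = steady_mb_s * peak_multiplier  # size for peak, not average
    return Baseline(
        write_mb_s=write_mb_s,
        read_mb_s=write_mb_s * fan_out,
        retention_s=retention_days * 86_400,
        replication_factor=replication_factor,
    )


# Assumed workload: 1 KB messages, 10,000 msg/s steady state,
# a 5x peak, 4 consumer groups, 7 days of retention.
b = baseline(avg_msg_bytes=1_000, msgs_per_s=10_000, peak_multiplier=5,
             fan_out=4, retention_days=7)
print(b)
```

Keeping the peak multiplier as an explicit input makes it harder to size for the average by accident.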
Calculate storage
Disk is usually the first constraint you hit. Total stored bytes is write throughput multiplied by retention, multiplied by the replication factor (every message is stored RF times across brokers). Add headroom so you never run a broker above ~70% disk utilization — a full disk takes a broker offline and can cascade.
Raw data = W x T
Replicated data = W x T x RF
Provisioned = Replicated data / 0.70 (30% headroom)
Per broker = Provisioned / number_of_brokers
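As a rough sketch, the four storage formulas translate directly into code. The function name `storage_plan` is hypothetical, units are decimal (1 TB = 1,000,000 MB), and the inputs in the example call are assumptions chosen for illustration.

```python
def storage_plan(write_mb_s, retention_s, replication_factor, brokers,
                 max_utilization=0.70):
    """Return (raw, replicated, provisioned, per_broker) in decimal TB."""
    raw_tb = write_mb_s * retention_s / 1_000_000       # MB -> TB
    replicated_tb = raw_tb * replication_factor          # RF copies of everything
    provisioned_tb = replicated_tb / max_utilization     # keep disks <= 70% full
    return raw_tb, replicated_tb, provisioned_tb, provisioned_tb / brokers


# Assumed inputs: 50 MB/s peak writes, 7-day retention, RF=3, 18 brokers.
raw, replicated, provisioned, per_broker = storage_plan(
    write_mb_s=50, retention_s=7 * 86_400, replication_factor=3, brokers=18)
print(f"raw={raw:.1f} TB  replicated={replicated:.1f} TB  "
      f"provisioned={provisioned:.1f} TB  per_broker={per_broker:.1f} TB")
```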
Brokers for throughput and partitions
You need enough brokers to satisfy three independent limits: aggregate network/disk throughput, total partition count, and the provisioned storage calculated above. A single broker comfortably handles tens to low hundreds of MB/s depending on disk and NIC, and roughly 2,000-4,000 partitions (counting replicas) before metadata overhead, recovery time, and leader-election latency degrade. Compute a broker count for each limit, round each one up, and take the largest.
Brokers (throughput) = (W + R) x RF / per_broker_throughput
Brokers (partitions) = total_partitions x RF / partitions_per_broker
Brokers (storage) = Provisioned / per_broker_disk_capacity
Brokers = max(throughput_brokers, partition_brokers, storage_brokers, RF) # never fewer than RF
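A minimal sketch of the broker-count formulas, with each count rounded up before taking the maximum. The function name `broker_count` and the example inputs are hypothetical. The optional `storage_brokers` argument lets a count derived from the storage calculation take part in the same maximum.

```python
import math


def broker_count(write_mb_s, read_mb_s, replication_factor,
                 per_broker_mb_s, total_partitions, partitions_per_broker,
                 storage_brokers=0):
    """Largest of the per-limit broker counts, never fewer than RF."""
    throughput_brokers = math.ceil(
        (write_mb_s + read_mb_s) * replication_factor / per_broker_mb_s)
    partition_brokers = math.ceil(
        total_partitions * replication_factor / partitions_per_broker)
    return max(throughput_brokers, partition_brokers,
               storage_brokers, replication_factor)


# Assumed inputs: 50 MB/s writes, 200 MB/s reads, RF=3, 150 MB/s per broker,
# and a budget of 4,000 partition replicas per broker.
few_partitions = broker_count(50, 200, 3, 150,
                              total_partitions=2_000,
                              partitions_per_broker=4_000)
many_partitions = broker_count(50, 200, 3, 150,
                               total_partitions=10_000,
                               partitions_per_broker=4_000)
print(few_partitions, many_partitions)
```

With 2,000 partitions the throughput limit decides the count; at 10,000 partitions the partition budget takes over.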
| Resource | Per-broker rule of thumb | Notes |
|---|---|---|
| CPU | 8-16 vCPU | TLS and compression are CPU-heavy; size up if both are on |
| RAM | 32-64 GB | Most goes to OS page cache, not heap |
| JVM heap | 6-8 GB | Larger heaps hurt GC; let the OS cache do the work |
| Disk | 70% max utilization | NVMe/SSD for low-latency tiers |
| Network | 10 GbE minimum | Replication adds (RF-1) x W of outbound traffic on top of consumer reads |
Memory: leave it for the page cache
Kafka is designed to run with a small JVM heap (6-8 GB is a common production setting) and relies on the Linux page cache for read performance. Consumers that read recently produced data are served from RAM rather than disk, which is why you want a lot of free memory. As a guideline, the page cache should comfortably hold your “hot” window, typically the last few minutes of data across all partitions on the broker.
# broker JVM heap (set via KAFKA_HEAP_OPTS, not server.properties)
# -Xms6g -Xmx6g
# everything else (~80-90% of system RAM) stays free for the OS page cache
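To check whether the page cache can hold the hot window, one rough approach is to divide the memory left after the heap by the rate at which the broker writes data. The sketch below assumes evenly balanced partitions, so each broker writes about W x RF / brokers, and reserves a small, arbitrary amount of RAM for the OS and other processes. The function name and the `os_reserve_gb` default are hypothetical.

```python
def page_cache_window_s(ram_gb, heap_gb, per_broker_write_mb_s,
                        os_reserve_gb=2.0):
    """Seconds of recent writes that fit in memory left for the page cache."""
    cache_gb = ram_gb - heap_gb - os_reserve_gb   # os_reserve_gb is a guess
    if cache_gb <= 0:
        raise ValueError("no memory left for the page cache")
    return cache_gb * 1_000 / per_broker_write_mb_s   # decimal GB -> MB


# Assumed cluster: 50 MB/s writes, RF=3, 18 brokers, 64 GB RAM, 8 GB heap.
per_broker_write = 50 * 3 / 18            # leader + follower writes, MB/s
window_s = page_cache_window_s(ram_gb=64, heap_gb=8,
                               per_broker_write_mb_s=per_broker_write)
print(f"page cache holds roughly {window_s / 60:.0f} minutes of writes")
```

This is an upper bound: the kernel also uses that memory for other files, so treat the result as an estimate rather than a guarantee.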
Disk type and IOPS
Throughput-oriented workloads (large sequential writes, long retention) can use cost-effective volumes, but latency-sensitive workloads need SSD/NVMe. On cloud block storage, provisioned IOPS and per-volume throughput caps matter as much as capacity — a large but slow volume will throttle a broker long before it fills.
| Volume choice | When to use | Trade-off |
|---|---|---|
| Local NVMe | Lowest latency, highest IOPS | Ephemeral; rely on replication for durability |
| Provisioned-IOPS SSD | Latency-sensitive, high throughput | Most expensive |
| General-purpose SSD | Balanced default | IOPS may be capacity-linked |
| Throughput-optimized HDD | Cold/long-retention tiers | Poor random read latency |
Never put two brokers’ data on the same physical disk or the same cloud volume. You lose the independent-failure assumption that replication depends on.
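A quick way to see whether a volume's throughput cap, rather than its capacity, is the binding limit is to compare the cap against the broker's sustained write rate. This sketch assumes evenly balanced partitions and ignores disk reads from lagging consumers. The 125 MB/s cap is a made-up figure, so substitute the limit your storage provider documents, and note that `disk_write_headroom` is a hypothetical helper.

```python
def disk_write_headroom(write_mb_s, replication_factor, brokers,
                        volume_cap_mb_s):
    """Ratio of the volume's throughput cap to the broker's write rate.

    Values below 1.0 mean the volume throttles the broker regardless of
    how much free space it has.
    """
    per_broker_write = write_mb_s * replication_factor / brokers
    return volume_cap_mb_s / per_broker_write


# Hypothetical volume capped at 125 MB/s.
comfortable = disk_write_headroom(50, 3, brokers=18, volume_cap_mb_s=125)
throttled = disk_write_headroom(500, 3, brokers=6, volume_cap_mb_s=125)
print(f"headroom: {comfortable:.1f}x vs {throttled:.1f}x")
```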
Network bandwidth
Outbound network is frequently the silent bottleneck. A produced message is replicated to RF-1 followers and read by F consumer groups, so a single inbound byte can generate several outbound bytes. Size NICs against the worst case.
Inbound per broker = (produce: W + replication: W x (RF-1)) / brokers
Outbound per broker = (replication: W x (RF-1) + reads: R) / brokers
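The per-broker network figures can be sketched as follows, assuming leaders and consumers are evenly spread across brokers. The sketch reports replication traffic received by followers as its own inbound term, so it can be counted or ignored as needed. The function name and the NIC conversion (10 Gbit/s taken as 1,250 MB/s) are illustrative assumptions.

```python
def per_broker_network(write_mb_s, fan_out, replication_factor, brokers,
                       nic_gbit_s=10):
    """Per-broker traffic in MB/s, assuming an evenly balanced cluster."""
    produce_in = write_mb_s / brokers
    replication = write_mb_s * (replication_factor - 1) / brokers
    consumer_out = write_mb_s * fan_out / brokers      # R / brokers
    outbound = replication + consumer_out
    nic_mb_s = nic_gbit_s * 1000 / 8                   # Gbit/s -> MB/s
    return {
        "produce_in": produce_in,
        "replication_in": replication,   # received by followers
        "outbound": outbound,            # replication sent + consumer reads
        "nic_outbound_utilization": outbound / nic_mb_s,
    }


# Assumed cluster: 50 MB/s writes, 4 consumer groups, RF=3, 18 brokers.
net = per_broker_network(write_mb_s=50, fan_out=4,
                         replication_factor=3, brokers=18)
for name, value in net.items():
    print(f"{name}: {value:.3f}")
```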
Worked example
Suppose you ingest 50 MB/s at peak, retain for 7 days, run RF=3, and have 4 consumer groups. Read throughput is 50 x 4 = 200 MB/s.
Replicated storage = 50 MB/s x 604,800 s x 3 ≈ 90.7 TB
Provisioned (70%) = 90.7 TB / 0.70 ≈ 129.6 TB
Aggregate traffic = (50 + 200) MB/s x 3 (RF) = 750 MB/s
Per broker @150MB/s, @8TB disk:
throughput_brokers = 750 / 150 = 5
storage_brokers = 129.6 / 8 = 16.2, rounded up to 17
Brokers = max(5, 17, 3) = 17
Output:
Recommended starting point:
Brokers : 18 (round up + 1 for headroom/maintenance)
CPU : 16 vCPU each
RAM : 64 GB each (8 GB heap)
Disk : 8 TB NVMe/SSD each
Network : 10 GbE
In this example storage, not throughput, drives the broker count — which is typical for long-retention clusters. If retention dropped to 1 day, the storage requirement would fall ~7x and throughput would become the binding constraint instead.
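The sensitivity to retention is easy to explore by re-running the same arithmetic for several retention periods. The sketch below reuses the example's assumptions (150 MB/s and 8 TB per broker, 70% maximum disk utilization) and a hypothetical helper named `brokers_for`. It does not add the extra maintenance broker.

```python
import math


def brokers_for(write_mb_s, fan_out, retention_days, rf=3,
                per_broker_mb_s=150, per_broker_disk_tb=8, max_util=0.70):
    """Return (throughput_brokers, storage_brokers, brokers)."""
    read_mb_s = write_mb_s * fan_out
    throughput_brokers = math.ceil(
        (write_mb_s + read_mb_s) * rf / per_broker_mb_s)
    provisioned_tb = (write_mb_s * retention_days * 86_400 * rf
                      / 1_000_000 / max_util)
    storage_brokers = math.ceil(provisioned_tb / per_broker_disk_tb)
    return (throughput_brokers, storage_brokers,
            max(throughput_brokers, storage_brokers, rf))


for days in (1, 3, 7, 14):
    t, s, n = brokers_for(write_mb_s=50, fan_out=4, retention_days=days)
    binding = "storage" if s > t else "throughput"
    print(f"{days:>2} days: throughput={t} storage={s} "
          f"brokers={n} ({binding}-bound)")
```

With these inputs the crossover falls between 1 and 3 days of retention: below it throughput sets the broker count, above it storage does.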
Validate the topic layout
Once you know the broker count, confirm a sample topic fits. Use the standard tooling to inspect partition distribution and run a load test before going live.
# create a sized topic
kafka-topics.sh --bootstrap-server broker:9092 \
  --create --topic orders --partitions 36 --replication-factor 3

# verify leader/replica spread across brokers
kafka-topics.sh --bootstrap-server broker:9092 --describe --topic orders

# load test to confirm per-broker throughput assumptions
kafka-producer-perf-test.sh --topic orders --num-records 5000000 \
  --record-size 1024 --throughput -1 \
  --producer-props bootstrap.servers=broker:9092 acks=all
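Once the replica assignments from the describe output are in hand, checking the spread is a simple counting exercise. The sketch below assumes the assignments have already been transcribed into a dictionary of partition id to replica list, and that the first replica in each list is the leader (the preferred leader). The parsing step is not shown, the sample data is synthetic, and `replica_spread` is a hypothetical helper.

```python
from collections import Counter


def replica_spread(assignments):
    """Count leaders and total replicas per broker.

    assignments: {partition_id: [broker_id, ...]}, with the first entry
    treated as the leader.
    """
    leaders = Counter(reps[0] for reps in assignments.values())
    replicas = Counter(b for reps in assignments.values() for b in reps)
    return leaders, replicas


def imbalance(counts):
    """Difference between the most and least loaded broker."""
    return max(counts.values()) - min(counts.values())


# Synthetic round-robin layout: 6 partitions, RF=3, brokers 0-2.
sample = {p: [(p + i) % 3 for i in range(3)] for p in range(6)}
leaders, replicas = replica_spread(sample)
print("leaders per broker :", dict(leaders))
print("replicas per broker:", dict(replicas))
print("leader imbalance   :", imbalance(leaders))
```

A leader imbalance well above zero suggests that some brokers will carry a disproportionate share of produce and consume traffic.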
Best Practices
- Size for peak throughput, keep disks below 70% utilization, and add one spare broker so a single failure or maintenance window never pushes you over capacity.
- Keep the JVM heap small (6-8 GB) and give the rest of RAM to the OS page cache — this is the single biggest read-performance lever.
- Treat partition count as a budget: aim for under ~4,000 partitions per broker including replicas to keep recovery and leader election fast.
- Provision IOPS and network bandwidth explicitly on cloud volumes; capacity alone does not guarantee throughput.
- Never co-locate broker data on shared physical disks or volumes — independent failure domains are what make replication meaningful.
- Re-run the calculation whenever retention, RF, or consumer fan-out changes; these inputs dominate the result more than raw produce rate.
- Load-test with kafka-producer-perf-test.sh against your real hardware before committing the design to production.