Cluster Sizing & Capacity Planning

Sizing a Kafka cluster is the difference between a platform that absorbs traffic spikes gracefully and one that falls over under load or quietly bankrupts you on cloud storage. The goal is to translate business requirements — how many messages, how big, and how long you keep them — into concrete numbers of brokers and the CPU, memory, disk, and network each one needs. Get this right early, because rebalancing a poorly sized cluster under production load is one of the most painful operations you can perform.

Start with throughput and retention

Every sizing exercise begins with two inputs: how much data flows in per second (write throughput), and how long you keep it (retention). Multiply average message size by message rate to get raw bytes per second, then factor in read fan-out — every consumer group re-reads the topic, so read throughput is often several times write throughput.

Capture your baseline as a small set of variables:

Write throughput (W)   = avg message size x produce rate
Read fan-out (F)       = number of independent consumer groups
Read throughput (R)    = W x F
Retention (T)          = how long messages live on disk (seconds)
Replication factor (RF)= number of copies per partition (usually 3)

Always size for peak throughput, not average. A daily batch job or a marketing campaign can produce 5-10x your steady-state rate, and a cluster sized for the average will fall behind exactly when it matters most.

Calculate storage

Disk is usually the first constraint you hit. Total stored bytes is write throughput multiplied by retention, multiplied by the replication factor (every message is stored RF times across brokers). Add headroom so you never run a broker above ~70% disk utilization — a full disk takes a broker offline and can cascade.

Raw data        = W x T
Replicated data = W x T x RF
Provisioned     = Replicated data / 0.70   (30% headroom)
Per broker      = Provisioned / number_of_brokers

Brokers for throughput and partitions

You need enough brokers to satisfy two independent limits: aggregate network/disk throughput and total partition count. A single broker comfortably handles tens of MB/s to low hundreds of MB/s depending on disk and NIC, and roughly 2,000-4,000 partitions before metadata overhead, recovery time, and leader-election latency degrade. Take the larger of the two broker counts.

Brokers (throughput) = (W + R) x RF / per_broker_throughput
Brokers (partitions) = total_partitions x RF / partitions_per_broker
Brokers = max(throughput_brokers, partition_brokers, RF)   # never fewer than RF

Resource	Per-broker rule of thumb	Notes
CPU	8-16 vCPU	TLS and compression are CPU-heavy; size up if both are on
RAM	32-64 GB	Most goes to OS page cache, not heap
JVM heap	6-8 GB	Larger heaps hurt GC; let the OS cache do the work
Disk	70% max utilization	NVMe/SSD for low-latency tiers
Network	10 GbE minimum	Replication doubles outbound traffic

Memory: leave it for the page cache

Kafka deliberately keeps a small JVM heap (6-8 GB) and relies on the Linux page cache for read performance. Consumers that read recently produced data are served from RAM rather than disk, which is why you want a lot of free memory. As a guideline, the page cache should comfortably hold your “hot” window — typically the last few minutes of data across all partitions on the broker.

# broker JVM heap (set via KAFKA_HEAP_OPTS, not server.properties)
# -Xms6g -Xmx6g

# everything else (~80-90% of system RAM) stays free for the OS page cache

Disk type and IOPS

Throughput-oriented workloads (large sequential writes, long retention) can use cost-effective volumes, but latency-sensitive workloads need SSD/NVMe. On cloud block storage, provisioned IOPS and per-volume throughput caps matter as much as capacity — a large but slow volume will throttle a broker long before it fills.

Volume choice	When to use	Trade-off
Local NVMe	Lowest latency, highest IOPS	Ephemeral; rely on replication for durability
Provisioned-IOPS SSD	Latency-sensitive, high throughput	Most expensive
General-purpose SSD	Balanced default	IOPS may be capacity-linked
Throughput-optimized HDD	Cold/long-retention tiers	Poor random read latency

Never put two brokers’ data on the same physical disk or the same cloud volume. You lose the independent-failure assumption that replication depends on.

Network bandwidth

Outbound network is frequently the silent bottleneck. A produced message is replicated to RF-1 followers and read by F consumer groups, so a single inbound byte can generate several outbound bytes. Size NICs against the worst case.

Inbound  per broker = W / brokers
Outbound per broker = (replication: W x (RF-1) + reads: R) / brokers

Worked example

Suppose you ingest 50 MB/s at peak, retain for 7 days, run RF=3, and have 4 consumer groups. Read throughput is 50 x 4 = 200 MB/s.

Replicated storage = 50 MB/s x 604,800 s x 3        ≈ 90.7 TB
Provisioned (70%)  = 90.7 TB / 0.70                 ≈ 129.6 TB
Aggregate traffic  = (50 + 200) MB/s x 3 (RF)       = 750 MB/s
Per broker @150MB/s, @8TB disk:
  throughput_brokers = 750 / 150  = 5
  storage_brokers    = 129.6 / 8  = ~17
Brokers = max(5, 17, 3) = 17

Output:

Recommended starting point:
  Brokers      : 18  (round up + 1 for headroom/maintenance)
  CPU          : 16 vCPU each
  RAM          : 64 GB each (8 GB heap)
  Disk         : 8 TB NVMe/SSD each
  Network      : 10 GbE

In this example storage, not throughput, drives the broker count — which is typical for long-retention clusters. If retention dropped to 1 day, the storage requirement would fall ~7x and throughput would become the binding constraint instead.

Validate the topic layout

Once you know the broker count, confirm a sample topic fits. Use the standard tooling to inspect partition distribution and run a load test before going live.

# create a sized topic
kafka-topics.sh --bootstrap-server broker:9092 \
  --create --topic orders --partitions 36 --replication-factor 3

# verify leader/replica spread across brokers
kafka-topics.sh --bootstrap-server broker:9092 --describe --topic orders

# load test to confirm per-broker throughput assumptions
kafka-producer-perf-test.sh --topic orders --num-records 5000000 \
  --record-size 1024 --throughput -1 \
  --producer-props bootstrap.servers=broker:9092 acks=all

Best Practices

Size for peak throughput and add 30% disk + one spare broker so a single failure or maintenance window never pushes you over capacity.
Keep the JVM heap small (6-8 GB) and give the rest of RAM to the OS page cache — this is the single biggest read-performance lever.
Treat partition count as a budget: aim for under ~4,000 partitions per broker including replicas to keep recovery and leader election fast.
Provision IOPS and network bandwidth explicitly on cloud volumes; capacity alone does not guarantee throughput.
Never co-locate broker data on shared physical disks or volumes — independent failure domains are what make replication meaningful.
Re-run the calculation whenever retention, RF, or consumer fan-out changes; these inputs dominate the result more than raw produce rate.
Load-test with kafka-producer-perf-test.sh against your real hardware before committing the design to production.