Running on Kubernetes (Strimzi)

Running Kafka on Kubernetes by hand means juggling StatefulSets, headless services, persistent volumes, and config maps, and then reinventing all the operational logic for rolling upgrades, broker reconfiguration, and certificate rotation. Strimzi replaces that toil with a Kubernetes operator: you declare the cluster you want as custom resources, and the operator reconciles reality to match. This page walks through installing Strimzi, defining a KRaft-based Kafka cluster, managing topics and users declaratively, and understanding how the operator performs safe rolling updates.

How Strimzi works

Strimzi extends the Kubernetes API with Custom Resource Definitions (CRDs) such as Kafka, KafkaNodePool, KafkaTopic, and KafkaUser. The Cluster Operator watches the Kafka and KafkaNodePool resources and creates the underlying pods (managed through StrimziPodSets in current releases, StatefulSets in older ones), services, and volumes. Two other operators, the Topic Operator and the User Operator, are deployed by the Cluster Operator as part of the Entity Operator. They reconcile KafkaTopic and KafkaUser resources against the running brokers.

Because the desired state lives in version-controlled YAML, Strimzi fits naturally into GitOps workflows. Instead of running kafka-topics.sh against production, you commit a KafkaTopic and let the operator apply it.

Installing the operator

Install the Cluster Operator into a dedicated namespace. The simplest path applies the published install bundle, which always tracks the latest release. For production, prefer the Helm chart or OperatorHub so that the operator version is pinned and upgrades are deliberate.

kubectl create namespace kafka

# Install the Cluster Operator, scoped to the "kafka" namespace
kubectl apply -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka

# Wait for the operator to become ready
kubectl wait deployment/strimzi-cluster-operator \
  --for=condition=Available --timeout=120s -n kafka

Output:

deployment.apps/strimzi-cluster-operator condition met

Defining a Kafka cluster (KRaft)

Modern Strimzi runs Kafka in KRaft mode, eliminating ZooKeeper. Brokers and controllers are described with KafkaNodePool resources, while the Kafka resource holds cluster-wide configuration like listeners. The strimzi.io/kraft: enabled annotation activates KRaft, and strimzi.io/node-pools: enabled opts into node pools.

The example below defines a controller pool and a broker pool of three replicas each, with persistent storage and two internal listeners: a plaintext listener on port 9092 and a TLS-encrypted listener on port 9093. Neither listener is exposed outside the Kubernetes cluster.

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  namespace: kafka
  labels:
    strimzi.io/cluster: prod-cluster
spec:
  replicas: 3
  roles:
    - controller
  storage:
    type: persistent-claim
    size: 20Gi
    deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: broker
  namespace: kafka
  labels:
    strimzi.io/cluster: prod-cluster
spec:
  replicas: 3
  roles:
    - broker
  storage:
    type: persistent-claim
    size: 500Gi
    class: fast-ssd
    deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: prod-cluster
  namespace: kafka
  annotations:
    strimzi.io/kraft: enabled
    strimzi.io/node-pools: enabled
spec:
  kafka:
    version: 3.9.0
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
  entityOperator:
    topicOperator: {}
    userOperator: {}

Apply it and watch the operator build the cluster:

kubectl apply -f kafka-cluster.yaml
kubectl wait kafka/prod-cluster --for=condition=Ready --timeout=300s -n kafka

Keep controllers on a dedicated node pool rather than combining controller and broker roles on the same pods. Isolating the metadata quorum protects cluster stability during heavy broker load and makes scaling brokers independent of the controller count.
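For contrast, this is roughly what the discouraged layout looks like: a single pool whose pods carry both roles. It is a sketch for a throwaway development cluster only; the pool name dual-role and the cluster name dev-cluster are hypothetical and do not appear elsewhere on this page.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: dual-role                      # hypothetical pool name
  namespace: kafka
  labels:
    strimzi.io/cluster: dev-cluster    # hypothetical development cluster
spec:
  replicas: 3
  roles:                               # every pod is both a controller and a broker
    - controller
    - broker
  storage:
    type: persistent-claim
    size: 20Gi
    deleteClaim: false
```

With this layout, scaling brokers also changes the number of controller-eligible nodes, which is exactly the coupling that dedicated pools avoid.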

Managing topics declaratively

With the Topic Operator enabled, you create topics by applying KafkaTopic resources. The operator keeps Kafka and Kubernetes in sync — edit the YAML to change partitions or retention and the change is applied to the broker.

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: kafka
  labels:
    strimzi.io/cluster: prod-cluster
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: "604800000"
    min.insync.replicas: "2"
    cleanup.policy: "delete"

Partition counts can only be increased, never decreased. Lowering partitions in a KafkaTopic is rejected by the operator — plan partitioning up front.
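As a rough illustration of that rule, growing the orders topic is a one-line change to the same resource. The target of 24 partitions is an arbitrary value chosen for this sketch.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: kafka
  labels:
    strimzi.io/cluster: prod-cluster
spec:
  partitions: 24        # raised from 12 (illustrative); cannot be lowered afterwards
  replicas: 3
  config:
    retention.ms: "604800000"
    min.insync.replicas: "2"
    cleanup.policy: "delete"
```

Existing messages stay in their original partitions, and keyed producers may map a given key to a different partition after the increase, so per-key ordering is not preserved across the change.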

Managing users and access

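When a listener has authentication enabled, the User Operator provisions credentials from KafkaUser resources. The operator generates the credentials (a TLS certificate or a SCRAM password), stores them in a Kubernetes Secret, and configures matching ACLs on the brokers. The ACLs only take effect if the Kafka resource also enables authorization under spec.kafka.authorization, and the user's authentication type must match the one configured on the listener.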

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: orders-service
  namespace: kafka
  labels:
    strimzi.io/cluster: prod-cluster
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: orders
          patternType: literal
        operations: [Read, Write, Describe]
        host: "*"

The generated credentials land in a secret named after the user:

kubectl get secret orders-service -n kafka -o jsonpath='{.data.password}' | base64 -d
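The cluster definition earlier on this page does not enable authentication or authorization, so the KafkaUser above would have nothing to authenticate against. The fragment below is a minimal sketch of the matching cluster-side settings, assuming you want the existing tls listener on port 9093 to accept SCRAM-SHA-512 logins. It is a partial Kafka resource, not a complete manifest.

```yaml
# Fragment of the prod-cluster Kafka resource (merge into the existing spec)
spec:
  kafka:
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
        authentication:
          type: scram-sha-512    # must match spec.authentication.type of the KafkaUser
    authorization:
      type: simple               # must match spec.authorization.type of the KafkaUser
```

Enabling authorization affects every client of the cluster, so clients without a KafkaUser and suitable ACLs may lose access once it is applied.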

How the operator handles rolling updates

The Cluster Operator owns the lifecycle of every broker pod. When you change the Kafka resource — a new version, a config tweak, or a certificate renewal — the operator rolls pods one at a time. Before restarting a broker it verifies the broker is not hosting any under-replicated partitions and that taking it down will not drop any partition below min.insync.replicas. If a restart would break availability, the operator waits.

Configuration changes that Kafka supports dynamically are applied via the Admin API without a restart at all; only changes requiring a process restart trigger a rolling bounce. Version upgrades happen in two phases: the binaries roll first, then the metadata version (spec.kafka.metadataVersion) is bumped once every node is on the new release. In KRaft mode the metadata version takes over the role that inter.broker.protocol.version played in ZooKeeper-based clusters. This sequencing keeps upgrades zero-downtime as long as each topic's replication factor is higher than its min.insync.replicas (for example, 3 and 2) and clients retry.
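The two upgrade phases map onto two fields of the Kafka resource. The fragments below are a sketch that assumes the cluster was previously running Kafka 3.8.0; the metadata version strings are illustrative and should be checked against the versions your Strimzi release supports.

```yaml
# Phase 1: roll the binaries to the new release, keep the old metadata version
spec:
  kafka:
    version: 3.9.0
    metadataVersion: 3.8-IV0    # assumed previous metadata version
---
# Phase 2: once every node runs 3.9.0 and the cluster is healthy, raise the metadata version
spec:
  kafka:
    version: 3.9.0
    metadataVersion: 3.9-IV0    # assumed target metadata version
```

Raising the metadata version is generally not something you can undo, so it is worth letting the cluster run on the new binaries for a while before applying phase 2.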

CR change	Operator action
Dynamic config (e.g. some topic defaults)	Applied via Admin API, no restart
Static broker config	Rolling restart, availability-checked
`version` bump	Two-phase rolling upgrade
TLS certificate renewal	Rolling restart, automatic
`replicas` increase in node pool	New pods added, no existing pod restart
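The last row of the table corresponds to a one-field change in the node pool. The fragment below is a sketch against the broker pool defined earlier; the new count of 4 is an arbitrary example.

```yaml
# Fragment of the "broker" KafkaNodePool
spec:
  replicas: 4    # raised from 3 (illustrative); adds one new broker pod
```

A new broker starts with no partitions. Existing partitions stay where they are until they are reassigned, so adding brokers does not relieve load on its own.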

Best Practices

Use separate KafkaNodePool resources for controllers and brokers so you can scale and upgrade each role independently.
Pin spec.kafka.version explicitly and upgrade deliberately; never rely on a floating version in production.
Store all CRs in Git and apply them through a GitOps pipeline so the cluster state is auditable and reproducible.
Set deleteClaim: false on storage to prevent accidental data loss when a node pool or cluster resource is deleted.
Always run replication factor 3 with min.insync.replicas: 2 so the availability-aware rolling restart logic can actually keep the cluster online.
Enable the Entity Operator and manage topics and users as KafkaTopic/KafkaUser resources rather than imperative CLI commands.
Configure pod anti-affinity and spread brokers across availability zones so a single node or zone failure cannot take down a quorum.
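The last practice can be sketched as two fragments: rack awareness on the Kafka resource and pod anti-affinity on the broker pool. The zone label is the standard Kubernetes topology label. The strimzi.io/pool-name pod label is an assumption about how Strimzi labels node pool pods; confirm it with kubectl get pods --show-labels before relying on it.

```yaml
# Fragment of the prod-cluster Kafka resource: spread replicas across zones
spec:
  kafka:
    rack:
      topologyKey: topology.kubernetes.io/zone
---
# Fragment of the "broker" KafkaNodePool: at most one broker per Kubernetes node
spec:
  template:
    pod:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  strimzi.io/cluster: prod-cluster
                  strimzi.io/pool-name: broker    # assumed label; verify on your pods
              topologyKey: kubernetes.io/hostname
```

A required anti-affinity rule leaves pods pending when there are fewer schedulable nodes than brokers; a preferred rule trades that strictness for easier scheduling.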