Auditing & Compliance
Once Kafka carries production data, security is no longer just about keeping intruders out: you must also prove who did what, protect personally identifiable information (PII), and satisfy regulations like GDPR, HIPAA, or PCI-DSS. Kafka and its surrounding tooling give you the raw material for this through authorizer audit logging, field-level masking in Kafka Connect or application code, and storage-level encryption. It is up to you to assemble these into a defensible compliance posture. This page covers practical auditing, PII handling, and encryption at rest for a modern KRaft-mode cluster.
Audit logging with the authorizer
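Every authorization decision in Kafka flows through an authorizer. In KRaft mode, the built-in StandardAuthorizer writes each decision to the kafka.authorizer.logger logger. Denials are logged at INFO and allowed operations at DEBUG, so the default INFO level records only denials. Auditors usually want successful operations too. Enable the authorizer in the broker's server.properties:
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
super.users=User:admin
Logger levels do not belong in server.properties. They go in the broker's Log4j configuration, which is config/log4j.properties in Kafka 3.x. Kafka 4.0 moved to Log4j 2, so on 4.0 translate the settings below to that format. Raise the authorizer logger to DEBUG:
# DEBUG logs allowed requests as well as denials
log4j.logger.kafka.authorizer.logger=DEBUG, authorizerAppender
log4j.additivity.kafka.authorizer.logger=false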
Route kafka.authorizer.logger to a dedicated appender. Audit records then land in their own file with their own rotation schedule, and you can restrict write access to that file at the OS level:
log4j.appender.authorizerAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.authorizerAppender.File=${kafka.logs.dir}/kafka-authorizer.log
log4j.appender.authorizerAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.authorizerAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.authorizerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
A typical entry tells you the principal, operation, resource, and the decision:
Output:
[2026-06-01 10:42:17,003] INFO Principal = User:orders-svc is Allowed Operation = Write
from host = 10.4.2.11 on resource = Topic:LITERAL:orders for request = Produce (kafka.authorizer.logger)
[2026-06-01 10:42:18,551] INFO Principal = User:analytics is Denied Operation = Read
from host = 10.4.9.30 on resource = Topic:LITERAL:orders for request = Fetch (kafka.authorizer.logger)
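Each entry follows a fixed "Principal = ... is Allowed/Denied ... operation = ..." shape, so a small parser can turn lines into structured records before they reach a SIEM. The sketch below is a minimal example, not part of Kafka. It assumes each entry arrives as a single line, as Log4j writes it. The AuditEntry and AuditLogParser names are hypothetical. Real entries may carry extra trailing fields, such as a resource reference count, and the pattern ignores them. The key "operation" is matched case-insensitively because its capitalization can differ between authorizer versions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical structured view of one kafka.authorizer.logger entry.
record AuditEntry(String principal, boolean allowed, String operation,
                  String host, String resource, String request) {}

final class AuditLogParser {
    // Matches the fields shown in the sample entries. find() is used so a leading
    // timestamp/level and any trailing fields or logger name are ignored.
    private static final Pattern ENTRY = Pattern.compile(
        "Principal = (.+?) is (Allowed|Denied) (?i:operation) = (\\S+)"
        + " from host = (\\S+) on resource = (\\S+) for request = (\\S+)");

    static Optional<AuditEntry> parse(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.find()) {
            return Optional.empty(); // not an authorizer decision line
        }
        return Optional.of(new AuditEntry(
            m.group(1), "Allowed".equals(m.group(2)), m.group(3),
            m.group(4), m.group(5), m.group(6)));
    }
}
```

An ingestion job could parse every shipped line this way and, for example, alert on Denied entries against sensitive topics.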
Tip: Ship kafka-authorizer.log to a centralized, append-only SIEM or log store (Splunk, Elastic, Loki) immediately. Logs that live only on the broker can be tampered with, or lost when a node is replaced.
For richer audit trails, raise kafka.request.logger to DEBUG to capture request-level metadata, including the client.id each application sets. TRACE adds full request details, and both levels are very verbose. Correlating on client.id lets you attribute activity to a service rather than just a TLS principal.
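Attribution only works if every service sets a stable, descriptive client.id. The sketch below uses the standard client property names bootstrap.servers and client.id. The ClientConfigs helper and the <service>-<instance> naming convention are hypothetical, shown only for illustration.

```java
import java.util.Properties;

// Hypothetical helper that stamps every client with a service-scoped client.id,
// so broker request logs can attribute traffic to an application.
final class ClientConfigs {
    static Properties forService(String bootstrapServers, String service, String instance) {
        if (service == null || service.isBlank()) {
            throw new IllegalArgumentException("service name is required for audit attribution");
        }
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Assumed convention: <service>-<instance>, e.g. orders-svc-0
        props.put("client.id", service + "-" + instance);
        return props;
    }
}
```

Pass the resulting Properties, plus serializer settings, to a producer or consumer. The same client.id then shows up in the broker's request log for every request that client sends.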
Handling PII in messages
Kafka is an immutable, append-only log, which collides with regulations that demand data minimization and deletion. The first defense is to never put raw PII on a topic you do not control. Use these layered strategies.
Mask fields with a Single Message Transform
If you use Kafka Connect, you can drop or mask sensitive fields before they ever land on a topic using the MaskField SMT:
{
"transforms": "maskPII",
"transforms.maskPII.type": "org.apache.kafka.connect.transforms.MaskField$Value",
"transforms.maskPII.fields": "ssn,creditCard",
"transforms.maskPII.replacement": "****"
}
For producer-side control, apply masking in application code before sending. With Spring for Apache Kafka and a Java record event, redact at the boundary:
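import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

// The event carries only the masked SSN. Note: email is still PII and is sent unmasked here.
public record CustomerEvent(String id, String email, String maskedSsn) {
    // Keep only the last four characters; reject values too short to mask safely
    public static CustomerEvent of(String id, String email, String rawSsn) {
        if (rawSsn == null || rawSsn.length() < 4) {
            throw new IllegalArgumentException("SSN missing or too short to mask");
        }
        String masked = "***-**-" + rawSsn.substring(rawSsn.length() - 4);
        return new CustomerEvent(id, email, masked);
    }
}

@Service
public class CustomerPublisher {
    private final KafkaTemplate<String, CustomerEvent> kafkaTemplate;

    public CustomerPublisher(KafkaTemplate<String, CustomerEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Masking happens inside CustomerEvent.of, so the raw SSN is never serialized
    public void publish(String id, String email, String rawSsn) {
        kafkaTemplate.send("customers", id, CustomerEvent.of(id, email, rawSsn));
    }
}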
Separate topics and tokenization
Keep PII in dedicated topics with tight ACLs and short retention, and emit only a token or surrogate key on broadly consumed topics. This limits blast radius: revoking access to one topic removes access to the sensitive data without touching the main event stream.
| Strategy | What it does | Best for |
|---|---|---|
| Field masking (SMT/code) | Replaces/redacts values before storage | Fields never needed downstream |
| Separate PII topic + ACLs | Isolates raw data behind tight permissions | Data some consumers legitimately need |
| Tokenization / vault lookup | Stores a reference, not the value | Cross-system de-identification |
| Encryption (envelope) | Per-field ciphertext in the payload | Selective, reversible protection |
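A common stand-in for a vault-issued token is a keyed hash. The same subject always maps to the same surrogate key, so consumers can still join on it, but the raw value cannot be recovered without the secret. The sketch below uses the JDK's HMAC-SHA256. The Tokenizer class and its in-memory secret are assumptions. Unlike a vault lookup, this token is one-way, so anything that needs the original value must read it from the restricted PII topic.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical deterministic tokenizer: HMAC-SHA256(secret, value) rendered as hex.
// In production the secret would come from a KMS or vault, never from code.
final class Tokenizer {
    private final SecretKeySpec key;

    Tokenizer(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    String token(String piiValue) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256"); // Mac is not thread-safe, so create one per call
            mac.init(key);
            byte[] digest = mac.doFinal(piiValue.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }
}
```

Publish the token as the key or a field on the broadly consumed topic. Keep the token-to-value mapping only in the tightly ACL'd PII topic.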
Right-to-be-forgotten and immutable logs
Because partitions are immutable, you cannot edit a record in place to honor a deletion request. The two workable patterns are:
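- Compaction with tombstones: use a compacted topic keyed by subject ID, then produce a record with that key and a null value (a tombstone). When the log cleaner compacts the segments holding the subject’s earlier records, it drops those values. The tombstone itself is kept for delete.retention.ms so consumers can observe the deletion, and is then removed as well. The cleaner never compacts the active segment, so set max.compaction.lag.ms to put an upper bound on how long a record can wait before it becomes eligible for compaction.
- Crypto-shredding: encrypt each subject’s PII with a per-subject key, then delete the key. The ciphertext remains but becomes unrecoverable, which satisfies erasure without rewriting the log.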
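# Create a compacted topic so tombstones can delete keyed PII.
# max.compaction.lag.ms (7 days here) caps how long a record can wait before it is eligible for compaction.
kafka-topics.sh --bootstrap-server localhost:9092 --create \
--topic customer-pii \
--config cleanup.policy=compact \
--config delete.retention.ms=86400000 \
--config min.cleanable.dirty.ratio=0.1 \
--config max.compaction.lag.ms=604800000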
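To see why keying matters, here is a deliberately simplified in-memory model of the state compaction converges to: the latest value per key, with a tombstone (null value) removing the key entirely. It ignores segments, the dirty ratio, and the delete.retention.ms delay. CompactionModel and KeyedRecord are hypothetical names, not Kafka APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One produced record; a null value is a tombstone.
record KeyedRecord(String key, String value) {}

final class CompactionModel {
    // Returns the state a fully compacted topic converges to once the cleaner
    // has run and tombstones have aged past delete.retention.ms.
    static Map<String, String> compact(List<KeyedRecord> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (KeyedRecord r : log) {
            if (r.value() == null) {
                latest.remove(r.key());         // tombstone: drop every earlier value for this key
            } else {
                latest.put(r.key(), r.value()); // newer value supersedes older ones
            }
        }
        return latest;
    }
}
```

Consider a log holding subject-42's profile, an order-7 event that copies the same email, and a tombstone for subject-42. Compacting it leaves the order-7 record, email included, because that PII sits under a different key.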
Warning: A tombstone only removes records with the same key. If PII is scattered across messages with different keys, compaction will not erase it — design your keying for deletability from day one.
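Crypto-shredding can be sketched with the JDK's AES-GCM support. The example assumes an in-memory, per-subject key map standing in for a real KMS or key vault; KeyVault is a hypothetical name. Once shred() deletes a subject's key, every ciphertext already written to Kafka for that subject is unreadable, even though the records themselves are untouched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical per-subject key store; a real deployment would keep keys in a KMS or vault.
final class KeyVault {
    private static final int IV_BYTES = 12, TAG_BITS = 128;
    private final Map<String, SecretKey> keys = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Encrypts PII for one subject; output layout is 12-byte IV || ciphertext+tag.
    byte[] encrypt(String subjectId, String pii) throws GeneralSecurityException {
        SecretKey key = keys.computeIfAbsent(subjectId, id -> newKey());
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(pii.getBytes(StandardCharsets.UTF_8));
        return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
    }

    // Decrypts, or fails if the subject's key has been shredded.
    String decrypt(String subjectId, byte[] blob) throws GeneralSecurityException {
        SecretKey key = keys.get(subjectId);
        if (key == null) {
            throw new IllegalStateException("key shredded for " + subjectId);
        }
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
        byte[] pt = c.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }

    // Right to be forgotten: destroy the key, leaving existing ciphertext unrecoverable.
    void shred(String subjectId) {
        keys.remove(subjectId);
    }

    private static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Shredding only erases data if every copy of the key is destroyed, including caches and key backups.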
Encryption at rest
Kafka has no native broker-side disk encryption, so you protect data at rest at the storage layer. The standard approach is full-disk or volume encryption on the log directories.
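- LUKS / dm-crypt for on-prem disks holding log.dirs.
- Cloud-managed encryption such as AWS EBS encryption, GCP CMEK, or Azure disk encryption: enable it when you create the volumes, before formatting and mounting them as log.dirs.
- In KRaft mode, apply the same encryption to metadata.log.dir if the cluster metadata log lives on a separate volume.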
This protects against stolen disks and offline access but does not protect against a compromised broker process, which sees plaintext. For that threat model, combine volume encryption with end-to-end (envelope) encryption in the producer/consumer, so brokers only ever store ciphertext. Pair encryption at rest with TLS for encryption in transit to close the gap on the wire.
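The end-to-end (envelope) pattern encrypts each payload with a fresh data key (DEK). That DEK is stored alongside the ciphertext, itself encrypted under a long-lived key-encryption key (KEK), so the broker only ever sees ciphertext. The sketch below uses only the JDK. The Envelope record and EnvelopeCrypto class are hypothetical, and the KEK is held in memory purely for illustration. A real system would keep the KEK in a KMS and ask the KMS to wrap and unwrap DEKs.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// What a producer would serialize into the Kafka record value (hypothetical layout).
record Envelope(byte[] wrappedDek, byte[] dekIv, byte[] payload, byte[] payloadIv) {}

final class EnvelopeCrypto {
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newAesKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    // AES-GCM encrypt or decrypt with a caller-supplied 12-byte IV.
    private static byte[] gcm(int mode, SecretKey key, byte[] iv, byte[] data)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, key, new GCMParameterSpec(TAG_BITS, iv));
        return c.doFinal(data);
    }

    private static byte[] randomIv() {
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        return iv;
    }

    // Producer side: fresh DEK per message, DEK encrypted ("wrapped") under the KEK.
    static Envelope seal(SecretKey kek, byte[] plaintext) throws GeneralSecurityException {
        SecretKey dek = newAesKey();
        byte[] payloadIv = randomIv(), dekIv = randomIv();
        byte[] payload = gcm(Cipher.ENCRYPT_MODE, dek, payloadIv, plaintext);
        byte[] wrapped = gcm(Cipher.ENCRYPT_MODE, kek, dekIv, dek.getEncoded());
        return new Envelope(wrapped, dekIv, payload, payloadIv);
    }

    // Consumer side: unwrap the DEK with the KEK, then decrypt the payload.
    static byte[] open(SecretKey kek, Envelope e) throws GeneralSecurityException {
        byte[] dekBytes = gcm(Cipher.DECRYPT_MODE, kek, e.dekIv(), e.wrappedDek());
        SecretKey dek = new SecretKeySpec(dekBytes, "AES");
        return gcm(Cipher.DECRYPT_MODE, dek, e.payloadIv(), e.payload());
    }
}
```

A fresh DEK per message keeps the exposure of any single leaked data key small. The cost is one KEK operation per message, which is why some systems reuse a DEK across a batch.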
Best Practices
- Enable allowed-and-denied authorizer logging and ship it to an append-only SIEM with retention that meets your regulatory window.
- Set a meaningful client.id per service so audit logs attribute activity to applications, not anonymous principals.
- Never write raw PII to broadly consumed topics: mask it, tokenize it, or isolate it behind dedicated topics with strict ACLs.
- Design topic keys for deletability so compaction tombstones can actually erase a subject’s data.
- Prefer crypto-shredding for right-to-be-forgotten when rewriting history is impractical.
- Always enable volume-level encryption on log.dirs, and add end-to-end encryption when brokers must not see plaintext.
- Document your data flows and retention settings: auditors will ask you to prove, not just assert, your controls.