Protobuf Serialization
Protocol Buffers (Protobuf) is Google’s compact, strongly-typed binary serialization format, and it is a first-class citizen in the Confluent Kafka ecosystem alongside Avro. You define your message contract in a .proto file, generate fast Java classes from it, and let KafkaProtobufSerializer/KafkaProtobufDeserializer register and resolve schemas through a Schema Registry. For teams already using Protobuf for gRPC or internal RPC, reusing those definitions on Kafka means one contract spanning both the synchronous and event-driven sides of a system.
Defining a .proto schema
A Protobuf schema is a .proto file, conventionally kept under src/main/proto/. It declares a syntax, a package, and one or more message types with numbered fields. The field numbers — not the names — are what gets written on the wire, which is the foundation of Protobuf’s evolution model.
syntax = "proto3";

package com.devcraftly.events;

option java_package = "com.devcraftly.events";
option java_outer_classname = "OrderProto";
option java_multiple_files = true;

message OrderPlaced {
  string order_id = 1;
  string customer_id = 2;
  int64 amount_cents = 3;
  string currency = 4;
  int64 placed_at = 5;  // epoch milliseconds
}
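To make the point about field numbers concrete, here is a minimal sketch that hand-encodes one int64 field the way the Protobuf encoding describes it: a key of (field_number << 3) | wire_type followed by a varint value. The class name FieldTagSketch and its helpers are made up for illustration; real applications should rely on the generated classes rather than encoding by hand.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: hand-rolls a tiny part of the Protobuf encoding.
// FieldTagSketch is a hypothetical name, not part of any library.
final class FieldTagSketch {

    static final int WIRE_VARINT = 0;            // int32, int64, bool, enum
    static final int WIRE_LENGTH_DELIMITED = 2;  // string, bytes, nested messages

    // A field's key on the wire is (field_number << 3) | wire_type.
    // The field name never appears in the encoded bytes.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Base-128 varint: 7 payload bits per byte, high bit set on every byte but the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes a single int64 field as key + varint value.
    static byte[] encodeInt64Field(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, tag(fieldNumber, WIRE_VARINT));
        writeVarint(out, value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // amount_cents is field 3 in OrderPlaced; 4999 is a sample amount.
        byte[] bytes = encodeInt64Field(3, 4999L);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "18 87 27"
    }
}
```

Renaming amount_cents would leave these three bytes unchanged, while changing its number from 3 to anything else would change the first byte and break existing readers.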
Generating Java classes
Protobuf records are not hand-written; the protoc compiler turns each .proto into an immutable, builder-based Java class. With Maven the protobuf-maven-plugin binds to the generate-sources phase; with Gradle the com.google.protobuf plugin does the same.
<plugin>
  <groupId>org.xolstice.maven.plugins</groupId>
  <artifactId>protobuf-maven-plugin</artifactId>
  <version>0.6.1</version>
  <configuration>
    <!-- ${os.detected.classifier} is provided by the kr.motd.maven:os-maven-plugin build extension -->
    <protocArtifact>com.google.protobuf:protoc:3.25.5:exe:${os.detected.classifier}</protocArtifact>
  </configuration>
  <executions>
    <execution>
      <goals><goal>compile</goal></goals>
    </execution>
  </executions>
</plugin>
This produces com.devcraftly.events.OrderPlaced, a class with typed getters and a fluent newBuilder(). Add the Confluent serializer dependency (io.confluent:kafka-protobuf-serializer) and the Confluent Maven repository (https://packages.confluent.io/maven/) so that KafkaProtobufSerializer is on the classpath; the artifact is not published to Maven Central.
The wire format
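Like Avro, Protobuf records on Kafka are not fully self-describing. The serializer registers the schema (or, when auto.register.schemas is false, looks up the already-registered one), receives an integer ID, and prefixes the payload with a 5-byte header: one magic byte followed by the 4-byte schema ID. Protobuf adds one extra detail after that header: a list of message-index varints that identifies which message type within the .proto was used, since a single file can declare several and messages can be nested.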
[ magic byte: 0x00 ][ 4-byte schema ID ][ message-index varint(s) ][ Protobuf payload ]
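The layout above can be read back with nothing more than a ByteBuffer. The following is a minimal sketch, not Confluent's implementation: the class name ConfluentHeaderSketch is hypothetical, and it assumes the message-index list is encoded as Confluent documents it, namely a zigzag varint count followed by zigzag varint indexes, with a single 0 byte as shorthand for "the first message type in the file".

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative parser for the framing that precedes the Protobuf payload.
// ConfluentHeaderSketch is a hypothetical name; in real code the
// KafkaProtobufDeserializer does this work.
final class ConfluentHeaderSketch {

    static final class Header {
        final int schemaId;
        final List<Integer> messageIndexes;
        final int payloadOffset; // index of the first Protobuf payload byte

        Header(int schemaId, List<Integer> messageIndexes, int payloadOffset) {
            this.schemaId = schemaId;
            this.messageIndexes = messageIndexes;
            this.payloadOffset = payloadOffset;
        }
    }

    // Reads one zigzag-encoded varint (assumed encoding of the index list).
    static int readZigZagVarint(ByteBuffer buf) {
        int raw = 0;
        int shift = 0;
        while (true) {
            byte b = buf.get();
            raw |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return (raw >>> 1) ^ -(raw & 1);
    }

    static Header parse(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record); // big-endian by default
        byte magic = buf.get();
        if (magic != 0x00) {
            throw new IllegalArgumentException("Unknown magic byte: " + magic);
        }
        int schemaId = buf.getInt(); // 4-byte schema ID
        int count = readZigZagVarint(buf);
        List<Integer> indexes = new ArrayList<>();
        if (count == 0) {
            indexes.add(0); // shorthand: first message type in the .proto
        } else {
            for (int i = 0; i < count; i++) {
                indexes.add(readZigZagVarint(buf));
            }
        }
        return new Header(schemaId, indexes, buf.position());
    }

    public static void main(String[] args) {
        // Made-up sample: schema ID 42, first message type, then 3 payload bytes.
        byte[] record = {0x00, 0x00, 0x00, 0x00, 0x2A, 0x00, 0x18, (byte) 0x87, 0x27};
        Header h = parse(record);
        System.out.println(h.schemaId + " " + h.messageIndexes + " " + h.payloadOffset);
    }
}
```

A parser like this is mainly useful for debugging raw topic contents, for example when a consumer fails with an unknown magic byte because a producer wrote plain Protobuf without the registry framing.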
Configuring producer and consumer
Wire the Confluent serializers and point them at the registry. On the consumer, you must tell the deserializer which generated class to materialize via specific.protobuf.value.type; otherwise it returns a DynamicMessage.
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer
      properties:
        schema.registry.url: http://localhost:8081
        auto.register.schemas: false
    consumer:
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer
      properties:
        schema.registry.url: http://localhost:8081
        specific.protobuf.value.type: com.devcraftly.events.OrderPlaced
Producing and consuming
With the generated class in hand, the application code is straightforward. Build the message, send it, and Spring’s KafkaTemplate handles serialization through the configured serializer.
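import com.devcraftly.events.OrderPlaced;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class OrderProducer {

    private final KafkaTemplate<String, OrderPlaced> kafkaTemplate;

    public OrderProducer(KafkaTemplate<String, OrderPlaced> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publish(String orderId, String customerId, long amountCents) {
        // Build the immutable message through the generated builder.
        OrderPlaced event = OrderPlaced.newBuilder()
                .setOrderId(orderId)
                .setCustomerId(customerId)
                .setAmountCents(amountCents)
                .setCurrency("USD")
                .setPlacedAt(System.currentTimeMillis())
                .build();
        // The order ID is the record key, so events for one order share a partition.
        kafkaTemplate.send("orders", orderId, event);
    }
}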
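import com.devcraftly.events.OrderPlaced;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class OrderConsumer {

    // The payload arrives as OrderPlaced because specific.protobuf.value.type is set.
    @KafkaListener(topics = "orders", groupId = "billing")
    public void onOrder(OrderPlaced event) {
        System.out.printf("Order %s for customer %s: %d %s%n",
                event.getOrderId(), event.getCustomerId(),
                event.getAmountCents(), event.getCurrency());
    }
}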
Output:
Order ord-9341 for customer cust-77: 4999 USD
Protobuf vs Avro
Both formats are compact, schema-backed, and registry-integrated. The choice usually comes down to your existing toolchain and how you think about evolution.
| Aspect | Protobuf | Avro |
|---|---|---|
| Schema language | .proto IDL | JSON (.avsc) |
| Identity of fields | Field numbers | Field names |
| Code generation | Generated (protoc) or DynamicMessage | Generated or GenericRecord |
| Cross-protocol reuse | Shared with gRPC | Kafka-centric |
| Default handling | Implicit defaults (no null) | Explicit defaults + unions |
| Evolution model | Add fields with new numbers | Add fields with defaults |
In proto3 there is no true null for scalar fields: an unset int64 reads back as 0 and an unset string as "". If you must distinguish "absent" from "zero", use a wrapper type such as google.protobuf.Int64Value or the optional keyword (supported in proto3 since Protobuf 3.15) rather than relying on the zero value.
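The reason a reader cannot tell "unset" from "zero" is visible at the byte level: a plain proto3 scalar that holds its default value is simply not written, whereas a field declared optional is written whenever it has been set, even to zero. The sketch below imitates that behavior by hand; PresenceSketch and its methods are hypothetical names used only for illustration, and a null Long stands in for "never set".

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: mimics how implicit-presence and explicit-presence
// int64 fields end up on the wire. PresenceSketch is a hypothetical name.
final class PresenceSketch {

    // Base-128 varint, as used by Protobuf for integer fields and keys.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static void writeInt64(ByteArrayOutputStream out, int fieldNumber, long value) {
        writeVarint(out, (long) fieldNumber << 3); // wire type 0 (varint)
        writeVarint(out, value);
    }

    // Plain proto3 scalar: the default value 0 is never serialized.
    static byte[] encodeImplicit(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value != 0) {
            writeInt64(out, fieldNumber, value);
        }
        return out.toByteArray();
    }

    // Field declared "optional": serialized whenever it was set, even to 0.
    static byte[] encodeExplicit(int fieldNumber, Long valueOrNull) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (valueOrNull != null) {
            writeInt64(out, fieldNumber, valueOrNull);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeImplicit(3, 0L).length);   // 0 bytes: zero looks like "unset"
        System.out.println(encodeExplicit(3, 0L).length);   // 2 bytes: zero was explicitly set
        System.out.println(encodeExplicit(3, null).length); // 0 bytes: genuinely unset
    }
}
```

With generated Java classes, the same distinction surfaces as a hasAmountCents()-style accessor, which protoc emits for optional and wrapper-typed fields but not for plain proto3 scalars.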
Inspecting the registered schema
The default subject for a topic’s value is <topic>-value. Query the registry to confirm that the schema was registered under that subject with the type PROTOBUF.
curl -s http://localhost:8081/subjects/orders-value/versions/latest | jq '.schemaType, .version'
Output:
"PROTOBUF"
1
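The same lookup can be done from Java, for instance in a smoke test. This is a minimal sketch using the JDK's built-in HTTP client; the class RegistryLookupSketch and its helper names are hypothetical, it assumes the default <topic>-value subject naming described above, and it assumes the subject contains no characters that need URL escaping.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative only: RegistryLookupSketch is a hypothetical helper,
// not part of the Confluent client libraries.
final class RegistryLookupSketch {

    // Default subject name for a topic's value schema.
    static String valueSubject(String topic) {
        return topic + "-value";
    }

    // Same endpoint the curl command calls.
    static URI latestVersionUri(String registryUrl, String subject) {
        return URI.create(registryUrl + "/subjects/" + subject + "/versions/latest");
    }

    // Returns the raw JSON body; parsing is left to the caller.
    static String fetchLatest(String registryUrl, String topic) throws Exception {
        HttpRequest request = HttpRequest
                .newBuilder(latestVersionUri(registryUrl, valueSubject(topic)))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Requires a Schema Registry listening on localhost:8081.
        System.out.println(fetchLatest("http://localhost:8081", "orders"));
    }
}
```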
Best Practices
- Keep .proto files in version control as the contract source of truth and generate classes at build time; never edit generated code.
- Never reuse or renumber an existing field tag: assign a new number for every added field and mark retired ones as reserved to keep the wire format compatible.
- Set auto.register.schemas: false in production and register schemas through CI so incompatible changes fail before deployment.
- Set specific.protobuf.value.type on consumers to get typed messages instead of an untyped DynamicMessage.
- Treat proto3 zero values deliberately: use wrapper types or optional when “unset” and “zero” must differ.
- Reuse the same .proto definitions across gRPC and Kafka where it makes sense, but pin a per-subject compatibility mode (usually BACKWARD) so the registry enforces the rules you expect.
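The field-number rule in particular lends itself to an automated check. The sketch below is a deliberately simplified, hypothetical CI-style helper, not how Schema Registry evaluates compatibility: it models each schema version as a map from field number to type name and flags numbers that changed type, disappeared without being reserved, or were reused after being reserved. It is stricter than Protobuf itself, which treats some type changes (for example int32 to int64) as wire-compatible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative only: FieldNumberCheckSketch is a hypothetical helper.
// The registry's own compatibility check remains the authoritative gate.
final class FieldNumberCheckSketch {

    static List<String> violations(Map<Integer, String> oldFields,
                                   Map<Integer, String> newFields,
                                   Set<Integer> reserved) {
        List<String> problems = new ArrayList<>();
        // TreeMap gives a stable, ascending order for the report.
        for (Map.Entry<Integer, String> e : new TreeMap<>(oldFields).entrySet()) {
            int number = e.getKey();
            String newType = newFields.get(number);
            if (newType == null) {
                if (!reserved.contains(number)) {
                    problems.add("field " + number + " removed but not reserved");
                }
            } else if (!newType.equals(e.getValue())) {
                problems.add("field " + number + " changed type from "
                        + e.getValue() + " to " + newType);
            }
        }
        for (Integer number : new TreeMap<>(newFields).keySet()) {
            if (reserved.contains(number)) {
                problems.add("field " + number + " is reserved and must not be reused");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Old OrderPlaced fields 1, 3 and 5; the new version drops 5 and retypes 3.
        Map<Integer, String> oldFields = Map.of(1, "string", 3, "int64", 5, "int64");
        Map<Integer, String> newFields = Map.of(1, "string", 3, "string", 6, "string");
        violations(oldFields, newFields, Set.of()).forEach(System.out::println);
    }
}
```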