Apache Kafka in depth - partitions, consumer groups, producer/consumer code, topic configuration, operational notes

Kafka

Kafka is a distributed event streaming platform built around a partitioned, replicated, append-only log. Producers write to topics; consumers read from offsets they control; messages stick around for a retention period (or forever).

Core Vocabulary

Term	What it is
Topic	A named log; the unit of categorisation
Partition	A sub-log within a topic; ordering is per-partition
Broker	A Kafka server; a cluster has many
Producer	Writes records to topics
Consumer	Reads records by partition + offset
Consumer group	A set of consumers cooperating; each partition assigned to one consumer in the group
Offset	A consumer's position in a partition
Replication factor	How many copies of each partition exist across brokers
ISR (in-sync replicas)	Replicas keeping up with the leader
Controller	The broker coordinating the cluster (KRaft-mode; ZooKeeper-free)

Partitions and Ordering

Every topic has N partitions. The producer chooses one per message:

No key → round-robin across partitions
Key set → hash(key) % partitions → same key always lands in the same partition

Ordering is guaranteed within a partition, not across. If you need "all events for user X in order," set the user ID as the key — every event for that user goes to the same partition.

key=user-7      ┐
key=user-7      │── partition 2 (ordered)
key=user-7      ┘

key=user-9      ┐── partition 0
key=user-9      ┘

Consumer Groups

A consumer group is the unit of parallelism:

Topic: orders (6 partitions)        Consumer Group A (3 consumers)
                                    Consumer A1 ── partitions 0,1
                                    Consumer A2 ── partitions 2,3
                                    Consumer A3 ── partitions 4,5

Rules to internalise:

One partition is read by exactly one consumer in a group. If you add more consumers than partitions, the extras sit idle.
Two groups are independent. "Order processor" and "audit logger" can both read every message at their own pace.
Rebalancing happens when consumers join, leave, or crash — partitions are reassigned. Briefly disruptive; tune with partition.assignment.strategy.

To scale processing, add partitions (up-front) and add consumers to the group (any time, up to the partition count).

Producer

// Node.js — kafkajs
const { Kafka, CompressionTypes } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'order-service',
  brokers: ['kafka-1:9092', 'kafka-2:9092', 'kafka-3:9092'],
});

const producer = kafka.producer({
  idempotent: true,              // exactly-once on a single producer session
  maxInFlightRequests: 5,
});

await producer.connect();

await producer.send({
  topic: 'orders',
  compression: CompressionTypes.LZ4,
  messages: [
    {
      key: order.userId,                                  // partition by user
      value: JSON.stringify(order),
      headers: { 'content-type': 'application/json' },
    },
  ],
  acks: -1,                       // wait for all in-sync replicas
});

Producer Knobs Worth Knowing

Setting	What it controls	Sensible default
`acks`	Durability — `0` fire-and-forget, `1` leader only, `-1`/`all` all ISRs	`all` for production
`idempotent: true`	Producer dedupes retries within a session	`true`
`maxInFlightRequests`	Parallel unacked requests per broker	`5` (with idempotence)
`compression`	`none`, `gzip`, `snappy`, `lz4`, `zstd`	`lz4` or `zstd`
`linger.ms`	Wait this long for more messages to batch	5-50ms for throughput
`batch.size`	Bytes per batch	16-64 KB
`retries`	Retry attempts on transient failure	Many; rely on `delivery.timeout.ms` for overall deadline

Consumer

// Node.js — kafkajs
const consumer = kafka.consumer({
  groupId: 'order-processor',
  sessionTimeout: 30000,
  heartbeatInterval: 3000,
});

await consumer.connect();
await consumer.subscribe({ topic: 'orders', fromBeginning: false });

await consumer.run({
  autoCommit: false,                              // commit manually for safety
  eachBatchAutoResolve: false,
  partitionsConsumedConcurrently: 3,

  eachBatch: async ({ batch, resolveOffset, heartbeat, commitOffsetsIfNecessary }) => {
    for (const message of batch.messages) {
      await processOrder(JSON.parse(message.value.toString()));
      resolveOffset(message.offset);
      await heartbeat();
    }
    await commitOffsetsIfNecessary();
  },
});

Commit Patterns

Pattern	Behavior	Trade-off
`autoCommit: true` (default)	Periodic offset commit	At-least-once; you'll re-process on crash
Commit after processing	Manual commit per batch	At-least-once, fewer duplicates
Commit before processing	Rare	At-most-once; messages lost on crash
External offset store (DB) + transactional outbox	Commit alongside DB writes	Truly exactly-once with care

Design every consumer to be idempotent. "Exactly once" is a property of the system, not a checkbox.

Topic Configuration

kafka-topics --create \
  --bootstrap-server kafka:9092 \
  --topic orders \
  --partitions 12 \
  --replication-factor 3 \
  --config retention.ms=604800000 \           # 7 days
  --config cleanup.policy=delete \
  --config min.insync.replicas=2 \
  --config compression.type=producer

Numbers that matter:

Config	Effect
`partitions`	Caps your consumer parallelism. Hard to increase later (breaks key-based ordering for existing data). Start with more than you think you need.
`replication.factor`	Durability. 3 is the production minimum; tolerates one broker loss.
`min.insync.replicas`	Producer `acks=all` requires this many in-sync replicas. Set to `replication.factor - 1` (e.g. 2 with RF 3).
`retention.ms` / `retention.bytes`	When to delete old segments.
`cleanup.policy=delete`	Time-based deletion (the default).
`cleanup.policy=compact`	Keep the latest value per key forever — perfect for state snapshots (current-balance, user-profile topics).

Schemas

Producers and consumers drift. Use a schema registry (Confluent Schema Registry, Apicurio) with Avro / Protobuf / JSON-Schema:

const { SchemaRegistry } = require('@kafkajs/confluent-schema-registry');

const registry = new SchemaRegistry({ host: 'http://schema-registry:8081' });
const schemaId = await registry.getLatestSchemaId('orders-value');

await producer.send({
  topic: 'orders',
  messages: [{
    key: order.userId,
    value: await registry.encode(schemaId, order),
  }],
});

The registry rejects publishes that break compatibility (forward / backward / full) — schema drift becomes a build-time error instead of a 3 a.m. parse error.

Kafka Streams and ksqlDB

For computation on top of Kafka topics — joins, windowed aggregations, materialised views — Kafka has its own stream-processing layer:

Kafka Streams — a Java library; runs in your app's JVM
ksqlDB — SQL on Kafka topics; runs as a separate cluster
Flink — the heavy-duty alternative for complex streaming, exactly-once semantics, large state

Decision: small in-app aggregations? Streams. Bigger, cross-team? Flink.

Operational Notes

Cluster Sizing

Lever	Guidance
Brokers	Start with 3 (tolerates one loss); scale by partitions × throughput
Partitions per broker	Keep under ~4000 total partitions per broker for healthy ops
Disk	Fast SSD; sequential write workload; size for retention × throughput
Network	10 GbE+ for clusters above ~100 MB/s aggregate
JVM heap	6 GB is plenty; the OS page cache does the heavy lifting

KRaft vs ZooKeeper

Kafka has moved off ZooKeeper (in KRaft mode) — don't deploy a new cluster with ZooKeeper. KRaft is GA and the future; the controller role is built into Kafka brokers.

Tools You'll Want

Tool	Use
`kafka-topics`, `kafka-console-producer`, `kafka-console-consumer`	First-line debugging
`kafka-consumer-groups --describe`	See lag per partition
Kafka UI (`provectuslabs/kafka-ui`)	Web UI for topics, messages, consumer groups
Cruise Control	Partition rebalancing
Burrow	Consumer lag monitoring

Lag Is the Number to Watch

A consumer's lag is "how many records behind the latest." Growing lag = your consumer can't keep up. Alert on it.

docker compose exec kafka kafka-consumer-groups \
  --bootstrap-server kafka:9092 \
  --group order-processor --describe

GROUP            TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
order-processor  orders  0          12453           12500            47
order-processor  orders  1          12200           12500           300

Kafka

Kafka

Core Vocabulary

Partitions and Ordering

Consumer Groups

Producer

Producer Knobs Worth Knowing

Consumer

Commit Patterns

Topic Configuration

Schemas

Kafka Streams and ksqlDB

Operational Notes

Cluster Sizing

KRaft vs ZooKeeper

Tools You'll Want

Lag Is the Number to Watch

Best Practices Checklist

On this page