Consumer Groups, Offsets, and the __consumer_offsets Topic
What Is a Consumer Group?
A consumer group is a set of consumer instances that jointly consume a topic. Kafka assigns each partition to exactly one consumer within the group at a time. This is what enables parallel processing: multiple consumers in the same group read different partitions simultaneously.
flowchart LR
subgraph Topic["Topic: orders — 4 partitions"]
P0["Partition 0"]
P1["Partition 1"]
P2["Partition 2"]
P3["Partition 3"]
end
subgraph CG["Consumer Group: inventory-service"]
C1["Consumer Instance 1"]
C2["Consumer Instance 2"]
end
P0 --> C1
P1 --> C1
P2 --> C2
P3 --> C2
Note1["2 consumers, 4 partitions\n→ each consumer handles 2 partitions\n→ 2× throughput vs single consumer"]
Every consumer instance identifies itself with a group.id. All instances with the same group.id form one consumer group. Kafka’s group coordinator (a broker) manages membership and offset storage; in the classic protocol, the assignment itself is computed by one of the consumers (the group leader) and distributed through the coordinator.
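The assignment idea can be sketched in a few lines. This mimics the contiguous-chunk behavior of Kafka's default range assignor, but it is a simplified illustration, not the client library's actual logic:

```python
# Sketch: how partitions get spread over the consumers in one group.
# Mimics the RangeAssignor idea (contiguous chunks per consumer);
# the real assignor runs inside the Kafka client, not here.

def range_assign(partitions, consumers):
    """Give each consumer a contiguous chunk; each partition has exactly one owner."""
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(sorted(consumers)):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

print(range_assign(["P0", "P1", "P2", "P3"], ["consumer-1", "consumer-2"]))
# {'consumer-1': ['P0', 'P1'], 'consumer-2': ['P2', 'P3']}
```

With 2 consumers and 4 partitions, each instance ends up owning 2 partitions, matching the diagram above.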
Multiple Consumer Groups: Independent Reads
Different consumer groups are completely independent. Each group maintains its own offset per partition — meaning the same events are delivered to every group that subscribes to the topic.
flowchart LR
subgraph Topic["Topic: orders"]
P0["Partition 0"]
P1["Partition 1"]
end
subgraph CG1["Consumer Group: inventory-service\n(at offset 150 on P0)"]
Inv["Inventory Consumer"]
end
subgraph CG2["Consumer Group: notification-service\n(at offset 90 on P0)"]
Notif["Notification Consumer"]
end
subgraph CG3["Consumer Group: analytics-service\n(at offset 0 — replaying)"]
Analytics["Analytics Consumer"]
end
P0 --> Inv
P0 --> Notif
P0 --> Analytics
P1 --> Inv
P1 --> Notif
P1 --> Analytics
This is fundamentally different from a traditional message queue where a message is consumed by exactly one consumer. In Kafka, every consumer group gets every message — making it behave like a publish-subscribe system at the group level.
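The independence comes from each group keeping its own offset. A minimal sketch, with hypothetical log contents and offsets:

```python
# Sketch: groups read the same log independently because each keeps
# its own committed offset. All values here are illustrative.

log = ["OrderPlaced", "OrderPlaced", "OrderCancelled", "OrderShipped"]

offsets = {                       # committed offset per group, one partition
    "inventory-service": 3,
    "notification-service": 1,
    "analytics-service": 0,       # replaying from the beginning
}

def unread(group):
    """Records this group has not yet consumed."""
    return log[offsets[group]:]

print(unread("inventory-service"))       # ['OrderShipped']
print(len(unread("analytics-service")))  # 4 -- full replay
```

Advancing one group's offset has no effect on what the other groups see.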
Offsets: Tracking Position in the Log
An offset is a sequential integer identifying a record’s position within a partition. Each partition has its own independent offset sequence starting at 0.
flowchart LR
subgraph P0["Partition 0"]
direction LR
R0["Offset 0\nOrderPlaced\ncust-1"]:::consumed
R1["Offset 1\nOrderPlaced\ncust-3"]:::consumed
R2["Offset 2\nOrderCancelled\ncust-1"]:::consumed
R3["Offset 3\nOrderPlaced\ncust-7"]:::fetched
R4["Offset 4\nOrderShipped\ncust-3"]:::available
LEO["← LEO = 5"]
R0 --> R1 --> R2 --> R3 --> R4 --> LEO
end
classDef consumed fill:#86efac,stroke:#16a34a,color:#000
classDef fetched fill:#fde68a,stroke:#d97706,color:#000
classDef available fill:#e5e7eb,stroke:#9ca3af,color:#000
- Green (0–2): consumed and committed — consumer has confirmed processing
- Yellow (3): fetched but not yet committed — consumer is processing
- Grey (4): not yet fetched
- LEO (Log End Offset): the next offset to be written (=5 here)
The consumer’s committed offset is the position Kafka stores on behalf of the consumer; by convention it is the next offset to read. On restart, the consumer resumes from its committed offset, so it re-receives offset 3 (fetched but never committed).
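The restart rule from the diagram, as a tiny sketch:

```python
# Sketch: the committed offset is, by convention, the NEXT offset to
# read. Offsets 0-2 were confirmed (committed = 3), so a restarted
# consumer re-receives the in-flight record at offset 3.

log_offsets = [0, 1, 2, 3, 4]   # LEO = 5 (next offset to be written)
committed = 3                   # records 0-2 confirmed

redelivered = [o for o in log_offsets if o >= committed]
print(redelivered)   # [3, 4]
```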
The __consumer_offsets Topic
Kafka stores committed offsets in an internal topic called __consumer_offsets. It is a regular Kafka topic with 50 partitions by default, using log compaction to retain only the latest offset per (group, topic, partition) key.
flowchart TD
Consumer["Consumer\ngroup: inventory-service"]
Commit["OffsetCommitRequest\ngroup=inventory-service\ntopic=orders\npartition=0\noffset=42"]
CO["__consumer_offsets\n(internal topic, 50 partitions)"]
Restart["Consumer restarts"]
Fetch["OffsetFetchRequest\ngroup=inventory-service\ntopic=orders\npartition=0"]
Resume["Resume from offset 42"]
Consumer -->|"processes record 42,\ncommits offset 43"| Commit
Commit -->|persisted in| CO
Restart -->|on startup| Fetch
Fetch -->|reads from| CO
CO -->|"returns 43"| Resume
The key stored is (group.id, topic, partition). The value is the committed offset plus metadata (timestamp, leader epoch, optional commit metadata string).
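Because the topic is compacted, only the newest commit per key survives. A sketch of that key-value behavior, with illustrative numbers:

```python
# Sketch: __consumer_offsets as a compacted key-value log. Each commit
# appends a record keyed by (group, topic, partition); compaction keeps
# only the newest value per key.

commits = [                                    # append-only, oldest first
    (("inventory-service", "orders", 0), 40),
    (("inventory-service", "orders", 0), 42),
    (("inventory-service", "orders", 0), 43),
    (("notification-service", "orders", 0), 90),
]

compacted = {}
for key, offset in commits:    # later records win, as compaction does
    compacted[key] = offset

print(compacted[("inventory-service", "orders", 0)])   # 43
```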
Offset Commit Strategies
Auto-Commit (Default)
With enable.auto.commit=true, the consumer automatically commits its current position (the offset after the last record returned by poll) once a fixed interval has elapsed (auto.commit.interval.ms, default 5000ms).
sequenceDiagram
participant Consumer
participant Kafka
Consumer->>Kafka: fetch(partition=0, offset=10)
Kafka-->>Consumer: records 10, 11, 12, 13, 14
Consumer->>Consumer: process records (no explicit commit)
Note over Consumer: 5 seconds pass
Consumer->>Kafka: commitOffset(partition=0, offset=15) ← auto
Consumer->>Kafka: fetch(partition=0, offset=15)
Kafka-->>Consumer: records 15, 16...
Note over Consumer,Kafka: Problem: if the consumer crashes at offset 12\nbefore the next auto-commit fires,\nit restarts at offset 10 (last committed)\n→ records 10-12 are re-processed (at-least-once)
Auto-commit gives at-least-once delivery semantics — records may be re-processed after a crash, but none are skipped. It is neither at-most-once (which would tolerate missed records) nor exactly-once (which requires transactions).
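The at-least-once effect in the note above, as a sketch:

```python
# Sketch: why periodic auto-commit yields at-least-once. The consumer
# dies after processing offset 12, but the last auto-commit recorded
# only offset 10, so 10-12 are delivered again after restart.

last_auto_commit = 10    # commit interval had not elapsed again
crashed_after = 12       # last record actually processed before the crash

restart_from = last_auto_commit
duplicates = list(range(restart_from, crashed_after + 1))
print(duplicates)   # [10, 11, 12] -- re-processed, never skipped
```

Duplicates are the price of never losing a record; downstream processing must tolerate them (idempotency).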
Manual Commit
With enable.auto.commit=false, the application controls when offsets are committed:
sequenceDiagram
participant Consumer
participant Kafka
Consumer->>Kafka: fetch(partition=0, offset=10)
Kafka-->>Consumer: records 10, 11, 12
Consumer->>Consumer: process record 10 ✓
Consumer->>Consumer: process record 11 ✓
Consumer->>Consumer: process record 12 ✓
Consumer->>Kafka: commitSync(partition=0, offset=13)
Note over Consumer,Kafka: Offset committed only AFTER\nsuccessful processing\n→ at-least-once (safer than auto-commit)
Manual commit is explored in depth in the Offset Management article.
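The commit-after-processing pattern from the diagram can be sketched as a loop. Note the convention: the commit names the next offset to read, not the last one processed:

```python
# Sketch: manual commit only after successful processing. After handling
# records 10-12 the consumer commits 13, matching the commitSync call
# in the diagram above. handle() is a stand-in for real business logic.

def handle(record):
    return record.upper()

records = {10: "a", 11: "b", 12: "c"}
committed = None
for offset in sorted(records):
    handle(records[offset])
    committed = offset + 1    # commit only AFTER processing succeeds
print(committed)   # 13
```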
Rebalancing: Partitions Moving Between Consumers
When a consumer joins or leaves a group, Kafka rebalances — it redistributes partitions across the current members. With the default (eager) protocol, all consumers in the group stop consuming during the rebalance (stop-the-world).
sequenceDiagram
participant Coordinator as Group Coordinator\n(a broker)
participant C1 as Consumer 1
participant C2 as Consumer 2
participant C3 as Consumer 3 (new)
Note over C1,C2: Before: C1 owns P0+P1, C2 owns P2+P3
C3->>Coordinator: JoinGroup request
Coordinator->>C1: Revoke all partitions
Coordinator->>C2: Revoke all partitions
C1->>C1: Commit offsets for P0, P1
C2->>C2: Commit offsets for P2, P3
Coordinator->>Coordinator: Calculate new assignment
Coordinator->>C1: Assign P0, P1
Coordinator->>C2: Assign P2
Coordinator->>C3: Assign P3
Note over C1,C3: After: C1→P0+P1, C2→P2, C3→P3\nAll consumers resume
Rebalancing causes a pause in processing (typically 1–30 seconds, depending on configuration). For long-running processing jobs, this pause can be significant. The max.poll.interval.ms setting (default 5 minutes) bounds the time between calls to poll(); a consumer that exceeds it is considered failed and removed from the group, triggering another rebalance.
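If each batch legitimately takes a long time to process, raise the timeout instead of letting the coordinator evict the consumer. In Spring Kafka it can be set the same way as any other consumer property (the 10-minute value here is illustrative):

spring.kafka.consumer.properties.max.poll.interval.ms=600000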
Cooperative (Incremental) Rebalancing
Since Kafka 2.4, cooperative rebalancing (CooperativeStickyAssignor) avoids the stop-the-world pause. Only the partitions that need to move are revoked — consumers keep their other partitions running during the transition.
sequenceDiagram
participant C1 as Consumer 1\n(owns P0, P1)
participant C2 as Consumer 2\n(owns P2, P3)
participant C3 as Consumer 3 (joining)
participant Coord as Coordinator
C3->>Coord: JoinGroup
Note over C1,C2: Round 1: Only revoke P3 from C2
Coord->>C2: Revoke P3 (keep P2)
C2->>Coord: RevokeComplete
Coord->>C3: Assign P3
Note over C1,C3: C1 and C2 keep consuming during this\n→ no stop-the-world
Enable cooperative rebalancing in Spring Kafka:
spring.kafka.consumer.properties.partition.assignment.strategy=\
org.apache.kafka.clients.consumer.CooperativeStickyAssignor
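The core idea of the cooperative protocol is a diff between assignments. A sketch, assuming the before/after assignments from the diagram (the real assignor also handles generations and multiple revocation rounds):

```python
# Sketch: cooperative rebalancing revokes only the partitions that
# actually move between the old and new assignment plans.

def to_revoke(old, new):
    """Per consumer, the partitions it owns but loses in the new plan."""
    out = {}
    for consumer, parts in old.items():
        lost = [p for p in parts if p not in new.get(consumer, [])]
        if lost:
            out[consumer] = lost
    return out

before = {"C1": ["P0", "P1"], "C2": ["P2", "P3"]}
after = {"C1": ["P0", "P1"], "C2": ["P2"], "C3": ["P3"]}
print(to_revoke(before, after))   # {'C2': ['P3']} -- everything else keeps running
```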
Consumer Group State Machine
A consumer group goes through several states managed by the group coordinator:
stateDiagram-v2
[*] --> Empty : No members
Empty --> PreparingRebalance : Consumer joins
PreparingRebalance --> CompletingRebalance : All members joined (JoinGroup phase)
CompletingRebalance --> Stable : Leader sends assignment (SyncGroup phase)
Stable --> PreparingRebalance : Member joins/leaves/heartbeat timeout
Stable --> Empty : All members leave
Empty --> Dead : Group expires (offsets deleted after offsets.retention.minutes)
- Empty: no consumers, but offsets may still be stored
- PreparingRebalance: waiting for all members to join
- CompletingRebalance: leader calculates and submits assignment
- Stable: normal operation — partitions assigned, consumers processing
- Dead: group has been deleted (offsets cleaned up)
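The lifecycle above can be sketched as a transition table (event names here are descriptive shorthand, not the protocol's RPC names):

```python
# Sketch: the coordinator's group states as a simple state machine,
# simplified from the states described above.

TRANSITIONS = {
    ("Empty", "member joins"): "PreparingRebalance",
    ("PreparingRebalance", "all joined"): "CompletingRebalance",
    ("CompletingRebalance", "assignment synced"): "Stable",
    ("Stable", "membership change"): "PreparingRebalance",
    ("Stable", "all leave"): "Empty",
    ("Empty", "offsets expire"): "Dead",
}

state = "Empty"
for event in ("member joins", "all joined", "assignment synced"):
    state = TRANSITIONS[(state, event)]
print(state)   # Stable
```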
What Happens on Consumer Restart
flowchart TD
Start["Consumer starts\ngroup.id = inventory-service"]
FetchOffset["Fetch committed offset from\n__consumer_offsets\n(group, topic, partition)"]
HasOffset{Offset exists?}
NoOffset["No committed offset\n→ apply auto.offset.reset"]
Resume["Resume from committed offset"]
AutoReset{auto.offset.reset}
Earliest["earliest → read from offset 0\n(re-process entire topic)"]
Latest["latest → read only new records\n(skip historical events)"]
NoneReset["none → throw exception"]
Start --> FetchOffset
FetchOffset --> HasOffset
HasOffset -->|Yes| Resume
HasOffset -->|No| NoOffset
NoOffset --> AutoReset
AutoReset -->|earliest| Earliest
AutoReset -->|latest| Latest
AutoReset -->|none| NoneReset
auto.offset.reset only applies when no committed offset exists for the group — typically the first time a consumer group reads a topic, or after offsets expire (controlled by offsets.retention.minutes, default 7 days).
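The decision flow above, as a sketch: a committed offset always wins, and auto.offset.reset is consulted only when none exists.

```python
# Sketch of the startup decision. Offsets are illustrative numbers.

def starting_offset(committed, reset_policy, log_start, log_end):
    if committed is not None:
        return committed                 # committed offset always wins
    if reset_policy == "earliest":
        return log_start                 # re-process the entire topic
    if reset_policy == "latest":
        return log_end                   # only new records from now on
    raise RuntimeError("no committed offset and auto.offset.reset=none")

print(starting_offset(42, "latest", 0, 100))      # 42
print(starting_offset(None, "earliest", 0, 100))  # 0
print(starting_offset(None, "latest", 0, 100))    # 100
```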
Lag: The Distance Between Producer and Consumer
Consumer lag is the difference between the latest offset (LEO) and the consumer’s committed offset. It tells you how far behind a consumer is.
flowchart LR
subgraph Topic["orders — partition 0"]
direction LR
C["Committed offset: 85"]
H["Latest offset (LEO): 99"]
end
Lag["Lag = 99 − 85 = 14 records behind"]
C -.->|"14 records unprocessed"| Lag
H -.-> Lag
High lag means the consumer is falling behind the producer. Causes:
- Consumer processing is too slow (CPU-bound, database calls, external API calls)
- Consumer has too few instances (fewer instances than partitions)
- Consumer is stuck (exception, deadlock)
Monitor lag with kafka-consumer-groups.sh --describe or Micrometer metrics. Alert when lag grows continuously. Covered in depth in Monitoring.
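The lag arithmetic from the diagram generalizes per partition; total group lag is the sum. A sketch with illustrative offsets:

```python
# Sketch: consumer lag per partition (LEO minus committed offset),
# plus the total across the topic. Numbers are illustrative.

def lag(log_end, committed):
    return {p: log_end[p] - committed.get(p, 0) for p in log_end}

leo = {0: 99, 1: 120}
committed = {0: 85, 1: 120}

per_partition = lag(leo, committed)
print(per_partition)                  # {0: 14, 1: 0}
print(sum(per_partition.values()))    # 14 -- total lag for the group
```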
Key Takeaways
- A consumer group is a set of instances sharing work: each partition is assigned to exactly one instance in the group
- Different consumer groups are independent — every group gets every message, enabling pub-sub semantics
- Offsets are sequential position numbers per partition; committed offsets are stored in __consumer_offsets
- Auto-commit (default) periodically commits offsets — gives at-least-once delivery semantics
- Manual commit gives precise control — commit after successful processing
- Rebalancing reassigns partitions when the group membership changes; cooperative rebalancing minimizes the pause
- auto.offset.reset controls where a new consumer group starts reading (earliest, latest, or exception)
- Consumer lag measures how far behind the consumer is relative to the producer’s latest offset
Next: KRaft Mode: Running Kafka Without ZooKeeper — the new consensus protocol that eliminates the ZooKeeper dependency and simplifies Kafka cluster operations.