Consumer Groups, Offsets, and the __consumer_offsets Topic

What Is a Consumer Group?

A consumer group is a set of consumer instances that jointly consume a topic. Kafka assigns each partition to exactly one consumer within the group at a time. This is what enables parallel processing: multiple consumers in the same group read different partitions simultaneously.

flowchart LR
    subgraph Topic["Topic: orders — 4 partitions"]
        P0["Partition 0"]
        P1["Partition 1"]
        P2["Partition 2"]
        P3["Partition 3"]
    end

    subgraph CG["Consumer Group: inventory-service"]
        C1["Consumer Instance 1"]
        C2["Consumer Instance 2"]
    end

    P0 --> C1
    P1 --> C1
    P2 --> C2
    P3 --> C2

    Note1["2 consumers, 4 partitions\n→ each consumer handles 2 partitions\n→ 2× throughput vs single consumer"]

Every consumer instance identifies itself with a group.id. All instances with the same group.id form one consumer group. Kafka’s group coordinator (a broker) manages the assignment.
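A minimal sketch of such a consumer using the plain Java client; the broker address is a placeholder, and the topic and group names are taken from the diagram:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class InventoryConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "inventory-service");       // all instances with this id form one group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // the coordinator assigns this instance its share of partitions
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition %d, offset %d: %s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}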


Multiple Consumer Groups: Independent Reads

Different consumer groups are completely independent. Each group maintains its own offset per partition — meaning the same events are delivered to every group that subscribes to the topic.

flowchart LR
    subgraph Topic["Topic: orders"]
        P0["Partition 0"]
        P1["Partition 1"]
    end

    subgraph CG1["Consumer Group: inventory-service\n(at offset 150 on P0)"]
        Inv["Inventory Consumer"]
    end

    subgraph CG2["Consumer Group: notification-service\n(at offset 90 on P0)"]
        Notif["Notification Consumer"]
    end

    subgraph CG3["Consumer Group: analytics-service\n(at offset 0 — replaying)"]
        Analytics["Analytics Consumer"]
    end

    P0 --> Inv
    P0 --> Notif
    P0 --> Analytics
    P1 --> Inv
    P1 --> Notif
    P1 --> Analytics

This is fundamentally different from a traditional message queue where a message is consumed by exactly one consumer. In Kafka, every consumer group gets every message — making it behave like a publish-subscribe system at the group level.
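A sketch making the independence concrete: the only configuration difference between the two groups below is the group.id (names taken from the diagrams; the broker address is a placeholder):

// Sketch: two consumers that differ only in group.id form independent groups,
// so each one receives every record on the topic.
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class IndependentGroups {
    static KafkaConsumer<String, String> consumerIn(String groupId) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);                   // the only difference
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("orders"));
        return consumer;
    }

    public static void main(String[] args) {
        var inventory = consumerIn("inventory-service");
        var notifications = consumerIn("notification-service");
        while (true) {
            // Polling both in one thread for the demo (KafkaConsumer is not thread-safe);
            // each group sees the full stream and commits its own offsets.
            inventory.poll(Duration.ofMillis(200)).forEach(r -> System.out.println("inventory: " + r.value()));
            notifications.poll(Duration.ofMillis(200)).forEach(r -> System.out.println("notifications: " + r.value()));
        }
    }
}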


Offsets: Tracking Position in the Log

An offset is a sequential integer identifying a record’s position within a partition. Each partition has its own independent offset sequence starting at 0.

flowchart LR
    subgraph P0["Partition 0"]
        direction LR
        R0["Offset 0\nOrderPlaced\ncust-1"]:::consumed
        R1["Offset 1\nOrderPlaced\ncust-3"]:::consumed
        R2["Offset 2\nOrderCancelled\ncust-1"]:::consumed
        R3["Offset 3\nOrderPlaced\ncust-7"]:::fetched
        R4["Offset 4\nOrderShipped\ncust-3"]:::available
        LEO["← LEO = 5"]
        R0 --> R1 --> R2 --> R3 --> R4 --> LEO
    end

classDef consumed fill:#86efac,stroke:#16a34a,color:#000
classDef fetched  fill:#fde68a,stroke:#d97706,color:#000
classDef available fill:#e5e7eb,stroke:#9ca3af,color:#000

  • Green (0–2): consumed and committed — consumer has confirmed processing
  • Yellow (3): fetched but not yet committed — consumer is processing
  • Grey (4): not yet fetched
  • LEO (Log End Offset): the next offset to be written (=5 here)

The consumer’s committed offset is the position Kafka stores on the consumer’s behalf; by convention it is the next offset to read, not the last offset processed. On restart, the consumer resumes from its committed offset: here it would resume at offset 3 and re-receive it, because offset 3 was fetched but never committed.
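That convention (commit the last processed offset + 1) is easy to get wrong, so here is a fragment sketching it; processRecord is a hypothetical stand-in for business logic, and the consumer is assumed to be configured as in the first example:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Map;

// Sketch: commit last-processed + 1, so a restart resumes at the first
// record that was never confirmed as processed.
static void pollAndCommit(KafkaConsumer<String, String> consumer) {
    TopicPartition tp = new TopicPartition("orders", 0);
    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500)).records(tp)) {
        processRecord(record); // hypothetical business logic
        consumer.commitSync(Map.of(tp, new OffsetAndMetadata(record.offset() + 1)));
    }
}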


The __consumer_offsets Topic

Kafka stores committed offsets in an internal topic called __consumer_offsets. It is a regular Kafka topic with 50 partitions by default, using log compaction to retain only the latest offset per (group, topic, partition) key.

flowchart TD
    Consumer["Consumer\ngroup: inventory-service"]
    Commit["OffsetCommitRequest\ngroup=inventory-service\ntopic=orders\npartition=0\noffset=42"]
    CO["__consumer_offsets\n(internal topic, 50 partitions)"]
    Restart["Consumer restarts"]
    Fetch["OffsetFetchRequest\ngroup=inventory-service\ntopic=orders\npartition=0"]
    Resume["Resume from offset 42"]

    Consumer -->|"processes record 42,\ncommits offset 43"| Commit
    Commit -->|persisted in| CO
    Restart -->|on startup| Fetch
    Fetch -->|reads from| CO
    CO -->|"returns 43"| Resume

The key stored is (group.id, topic, partition). The value is the committed offset plus metadata (timestamp, leader epoch, optional commit metadata string).
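You rarely read __consumer_offsets directly; the AdminClient exposes the same compacted view. A sketch (the broker address is a placeholder):

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import java.util.Map;
import java.util.Properties;

public class ShowCommittedOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        try (Admin admin = Admin.create(props)) {
            // Reads the compacted latest-offset-per-partition view for one group.
            Map<TopicPartition, OffsetAndMetadata> offsets =
                    admin.listConsumerGroupOffsets("inventory-service")
                         .partitionsToOffsetAndMetadata()
                         .get();
            offsets.forEach((tp, om) ->
                    System.out.printf("%s -> committed offset %d%n", tp, om.offset()));
        }
    }
}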


Offset Commit Strategies

Auto-Commit (Default)

With enable.auto.commit=true, the consumer automatically commits its current position (the last polled offset + 1) at a fixed interval (auto.commit.interval.ms, default 5000 ms). The commit is issued from inside poll(), so it effectively happens on the first poll() after the interval has elapsed.

sequenceDiagram
    participant Consumer
    participant Kafka

    Consumer->>Kafka: fetch(partition=0, offset=10)
    Kafka-->>Consumer: records 10, 11, 12, 13, 14
    Consumer->>Consumer: process records (no explicit commit)
    Note over Consumer: 5 seconds pass
    Consumer->>Kafka: commitOffset(partition=0, offset=15) ← auto
    Consumer->>Kafka: fetch(partition=0, offset=15)
    Kafka-->>Consumer: records 15, 16...

    Note over Consumer,Kafka: Problem: if consumer crashes at offset 12,\nit restarts at offset 10 (last auto-committed)\n→ records 10-12 are re-processed (at-least-once)

Auto-commit gives at-least-once delivery semantics — records may be re-processed after a crash. It does not give at-most-once semantics (where records could be lost but never duplicated), nor exactly-once semantics (which requires transactions).
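Sketched against the consumer Properties from the first example; both values are the defaults, spelled out explicitly:

// Sketch: auto-commit settings, added to the consumer Properties shown earlier.
// Both values are the documented defaults, written out for clarity.
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");      // default: auto-commit on
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000"); // default: commit every 5 s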

Manual Commit

With enable.auto.commit=false, the application controls when offsets are committed:

sequenceDiagram
    participant Consumer
    participant Kafka

    Consumer->>Kafka: fetch(partition=0, offset=10)
    Kafka-->>Consumer: records 10, 11, 12
    Consumer->>Consumer: process record 10 ✓
    Consumer->>Consumer: process record 11 ✓
    Consumer->>Consumer: process record 12 ✓
    Consumer->>Kafka: commitSync(partition=0, offset=13)
    Note over Consumer,Kafka: Offset committed only AFTER\nsuccessful processing\n→ at-least-once (safer than auto-commit)

Manual commit is explored in depth in the Offset Management article.
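As a preview, a minimal sketch of that loop; it assumes enable.auto.commit=false on the consumer, and processRecord is a hypothetical stand-in for business logic:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;

// Sketch: commit only after every record in the batch has been processed.
static void consumeWithManualCommit(KafkaConsumer<String, String> consumer) {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            processRecord(record); // hypothetical business logic
        }
        if (!records.isEmpty()) {
            consumer.commitSync(); // commits the offsets of the batch just returned by poll()
        }
    }
}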


Rebalancing: Partitions Moving Between Consumers

When a consumer joins or leaves a group, Kafka rebalances: it reassigns partitions across the group’s current members. With the default (eager) protocol, all consumers in the group stop consuming for the duration of the rebalance (stop-the-world).

sequenceDiagram
    participant Coordinator as Group Coordinator\n(a broker)
    participant C1 as Consumer 1
    participant C2 as Consumer 2
    participant C3 as Consumer 3 (new)

    Note over C1,C2: Before: C1 owns P0+P1, C2 owns P2+P3
    C3->>Coordinator: JoinGroup request
    Coordinator->>C1: Revoke all partitions
    Coordinator->>C2: Revoke all partitions
    C1->>C1: Commit offsets for P0, P1
    C2->>C2: Commit offsets for P2, P3
    Coordinator->>Coordinator: Calculate new assignment
    Coordinator->>C1: Assign P0, P1
    Coordinator->>C2: Assign P2
    Coordinator->>C3: Assign P3
    Note over C1,C3: After: C1→P0+P1, C2→P2, C3→P3\nAll consumers resume

Rebalancing causes a brief pause in processing (typically 1–30 seconds, depending on configuration). For long-running processing jobs, this pause can be significant. The max.poll.interval.ms setting (default 5 minutes) bounds the time allowed between poll() calls: a consumer that exceeds it is considered failed, and the coordinator removes it from the group and triggers a rebalance.
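The timing settings involved, sketched against the consumer Properties from the first example (values shown are the current defaults; session.timeout.ms changed from 10 s to 45 s in Kafka 3.0):

// Sketch: the settings that govern when the coordinator triggers a rebalance.
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000"); // max gap between poll() calls (5 min)
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");    // missed-heartbeat window before eviction
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");  // background heartbeat frequency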

Cooperative (Incremental) Rebalancing

Since Kafka 2.4, cooperative rebalancing (CooperativeStickyAssignor) avoids the stop-the-world pause. Only the partitions that need to move are revoked — consumers keep their other partitions running during the transition.

sequenceDiagram
    participant C1 as Consumer 1\n(owns P0, P1)
    participant C2 as Consumer 2\n(owns P2, P3)
    participant C3 as Consumer 3 (joining)
    participant Coord as Coordinator

    C3->>Coord: JoinGroup
    Note over C1,C2: Round 1: Only revoke P3 from C2
    Coord->>C2: Revoke P3 (keep P2)
    C2->>Coord: RevokeComplete
    Coord->>C3: Assign P3
    Note over C1,C3: C1 and C2 keep consuming during this\n→ no stop-the-world

Enable cooperative rebalancing in Spring Kafka:

spring.kafka.consumer.properties.partition.assignment.strategy=\
  org.apache.kafka.clients.consumer.CooperativeStickyAssignor
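For the plain Java client, the equivalent is one consumer property (a sketch against the Properties object from the first example):

// Sketch: the same assignor for a plain (non-Spring) Kafka consumer.
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        org.apache.kafka.clients.consumer.CooperativeStickyAssignor.class.getName());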

Consumer Group State Machine

A consumer group goes through several states managed by the group coordinator:

stateDiagram-v2
    [*] --> Empty : No members
    Empty --> PreparingRebalance : Consumer joins
    PreparingRebalance --> CompletingRebalance : All members joined (JoinGroup phase)
    CompletingRebalance --> Stable : Leader sends assignment (SyncGroup phase)
    Stable --> PreparingRebalance : Member joins/leaves/heartbeat timeout
    Stable --> Empty : All members leave
    Empty --> Dead : Group expires (offsets deleted after offsets.retention.minutes)

  • Empty: no consumers, but offsets may still be stored
  • PreparingRebalance: waiting for all members to join
  • CompletingRebalance: leader calculates and submits assignment
  • Stable: normal operation — partitions assigned, consumers processing
  • Dead: group has been deleted (offsets cleaned up)

What Happens on Consumer Restart

flowchart TD
    Start["Consumer starts\ngroup.id = inventory-service"]
    FetchOffset["Fetch committed offset from\n__consumer_offsets\n(group, topic, partition)"]
    HasOffset{Offset exists?}
    NoOffset["No committed offset\n→ apply auto.offset.reset"]
    Resume["Resume from committed offset"]

    AutoReset{auto.offset.reset}
    Earliest["earliest → read from offset 0\n(re-process entire topic)"]
    Latest["latest → read only new records\n(skip historical events)"]
    NoneReset["none → throw exception"]

    Start --> FetchOffset
    FetchOffset --> HasOffset
    HasOffset -->|Yes| Resume
    HasOffset -->|No| NoOffset
    NoOffset --> AutoReset
    AutoReset -->|earliest| Earliest
    AutoReset -->|latest| Latest
    AutoReset -->|none| NoneReset

auto.offset.reset only applies when no committed offset exists for the group — typically the first time a consumer group reads a topic, or after offsets expire (controlled by offsets.retention.minutes, default 7 days).
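In the plain Java client this is a single consumer property; a sketch against the Properties from the first example, choosing earliest for illustration (the default is latest):

// Sketch: where a group with NO committed offset starts reading.
// "latest" is the default; "earliest" replays retained history; "none" throws.
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");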


Lag: The Distance Between Producer and Consumer

Consumer lag is the difference between the latest offset (LEO) and the consumer’s committed offset. It tells you how far behind a consumer is.

flowchart LR
    subgraph Topic["orders — partition 0"]
        direction LR
        C["Committed offset: 85"]
        H["Latest offset (LEO): 99"]
    end
    Lag["Lag = 99 − 85 = 14 records behind"]
    C -.->|"14 records unprocessed"| Lag
    H -.-> Lag

High lag means the consumer is falling behind the producer. Causes:

  • Consumer processing is too slow (CPU-bound, database calls, external API calls)
  • Consumer has too few instances (fewer instances than partitions)
  • Consumer is stuck (exception, deadlock)

Monitor lag with kafka-consumer-groups.sh --describe or Micrometer metrics, and alert when lag grows continuously. Covered in depth in the Monitoring article.
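For example, checking lag for one group (the broker address is a placeholder; the output is illustrative, abbreviated to the lag-relevant columns, and uses the numbers from the diagram above):

kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group inventory-service

GROUP              TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
inventory-service  orders  0          85              99              14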


Key Takeaways

  • A consumer group is a set of instances sharing work: each partition is assigned to exactly one instance in the group
  • Different consumer groups are independent — every group gets every message, enabling pub-sub semantics
  • Offsets are sequential position numbers per partition; committed offsets are stored in __consumer_offsets
  • Auto-commit (default) periodically commits offsets — gives at-least-once delivery semantics
  • Manual commit gives precise control — commit after successful processing
  • Rebalancing reassigns partitions when the group membership changes; cooperative rebalancing minimizes the pause
  • auto.offset.reset controls where a new consumer group starts reading (earliest, latest, or exception)
  • Consumer lag measures how far behind the consumer is relative to the producer’s latest offset

Next: KRaft Mode: Running Kafka Without ZooKeeper — the new consensus protocol that eliminates the ZooKeeper dependency and simplifies Kafka cluster operations.