KRaft Mode: Running Kafka Without ZooKeeper

Why ZooKeeper Had to Go

For the first decade of Kafka’s existence, every Kafka cluster required a separate Apache ZooKeeper ensemble to manage metadata: controller election, topic configurations, partition leadership, access control lists, and, in early versions, consumer group offsets.

This created real problems:

flowchart TB
    subgraph OldArchitecture["Old Architecture: Kafka + ZooKeeper"]
        subgraph ZK["ZooKeeper Cluster (3+ nodes)"]
            Z1["ZK Node 1"]
            Z2["ZK Node 2"]
            Z3["ZK Node 3"]
        end
        subgraph Kafka["Kafka Cluster"]
            K1["Broker 1\n(Controller)"]
            K2["Broker 2"]
            K3["Broker 3"]
        end
        K1 <-->|metadata| ZK
        K2 <-->|metadata| ZK
        K3 <-->|metadata| ZK
    end

    Problems["Problems:\n• Must operate TWO distributed systems\n• ZooKeeper has different ops model\n• Controller must sync from ZK on startup\n• Metadata changes are slow (ZK round-trips)\n• Max ~200k partitions per cluster"]
    OldArchitecture --- Problems

Operational burden was the biggest issue: teams had to learn, monitor, and maintain ZooKeeper as a separate system alongside Kafka. ZooKeeper also became a bottleneck for large clusters — the metadata propagation path was slow (ZooKeeper → Controller → all brokers), limiting partition counts and rebalance speed.


KRaft: Kafka Raft Metadata Protocol

KRaft (Kafka Raft) replaces ZooKeeper with a built-in consensus protocol based on the Raft algorithm. Metadata is stored in a special internal topic (__cluster_metadata) that is itself managed by Kafka — no external system needed.

flowchart TB
    subgraph KRaftArchitecture["KRaft Architecture (Kafka 3.3+)"]
        subgraph Controllers["Controller Quorum (Raft)"]
            C1["Broker 1\n⭐ Active Controller\n(Raft Leader)"]
            C2["Broker 2\n(Raft Follower)"]
            C3["Broker 3\n(Raft Follower)"]
        end
        subgraph DataBrokers["Data Brokers"]
            D4["Broker 4"]
            D5["Broker 5"]
            D6["Broker 6"]
        end
        C1 <-->|Raft consensus| C2
        C1 <-->|Raft consensus| C3
        C1 -->|metadata push| D4
        C1 -->|metadata push| D5
        C1 -->|metadata push| D6
    end

    Benefits["Benefits:\n• One system to operate\n• Metadata stored in Kafka itself\n• Controller knows current state (no sync on startup)\n• Millions of partitions per cluster\n• Faster rebalances and failovers"]
    KRaftArchitecture --- Benefits

Roles: Combined vs Dedicated

In KRaft mode, each node declares one or more process roles via the process.roles setting:

flowchart LR
    subgraph Combined["Combined Mode\n(small clusters, dev/test)"]
        CB1["Broker 1\nroles: broker, controller"]
        CB2["Broker 2\nroles: broker, controller"]
        CB3["Broker 3\nroles: broker, controller"]
    end

    subgraph Separated["Dedicated Mode\n(production, large clusters)"]
        subgraph ControlPlane["Controller Nodes (3)"]
            CC1["Node 1\nroles: controller"]
            CC2["Node 2\nroles: controller"]
            CC3["Node 3\nroles: controller"]
        end
        subgraph DataPlane["Broker Nodes (N)"]
            DB1["Node 4\nroles: broker"]
            DB2["Node 5\nroles: broker"]
            DB3["Node 6\nroles: broker"]
        end
    end

  • Combined: every node is both a controller and a broker — simpler, but the controller quorum shares resources with data workloads
  • Dedicated: controller nodes are separate from data brokers — recommended for production clusters with high throughput, since controller nodes are not impacted by data plane load (see the config sketch below)
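
A minimal sketch of how the two roles might be split in dedicated mode. The hostnames (ctrl1, ctrl2, ctrl3, broker4) are placeholders, and the full property reference follows later in this article:

# Controller-only node (example: node 1)
process.roles=controller
node.id=1
controller.quorum.voters=1@ctrl1:9093,2@ctrl2:9093,3@ctrl3:9093
listeners=CONTROLLER://0.0.0.0:9093
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT

# Broker-only node (example: node 4)
process.roles=broker
node.id=4
controller.quorum.voters=1@ctrl1:9093,2@ctrl2:9093,3@ctrl3:9093
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://broker4:9092
# A broker-only node still needs the controller listener name and protocol mapping to reach the quorum
controller.listener.names=CONTROLLER
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT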

The __cluster_metadata Topic

The controller quorum stores all cluster metadata in a special internal Kafka topic: __cluster_metadata. This topic:

  • Has exactly 1 partition (all metadata is ordered)
  • Uses Raft for replication among controllers (not the standard ISR mechanism)
  • Is not accessible to regular producers/consumers
  • Contains: broker registrations, topic/partition configurations, leadership records, ISR changes, access control entries

sequenceDiagram
    participant Admin as Admin (kafka-topics.sh)
    participant AC as Active Controller\n(Raft Leader)
    participant FC1 as Follower Controller 1
    participant FC2 as Follower Controller 2
    participant Broker as Data Broker

    Admin->>AC: Create topic "orders" (3 partitions, RF=3)
    AC->>AC: Append to __cluster_metadata
    AC->>FC1: Raft replicate
    AC->>FC2: Raft replicate
    FC1-->>AC: Ack
    FC2-->>AC: Ack
    AC->>Broker: Push metadata update (new partitions)
    AC-->>Admin: Topic created

Because metadata is stored in Kafka itself, every controller in the quorum keeps the full, current metadata in memory. A newly elected controller therefore has nothing to reload on startup (unlike the old ZooKeeper-mode controller, which first had to read all metadata out of ZooKeeper), and failover to a new active controller completes in seconds.
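
If you want to see these records for yourself, kafka-dump-log.sh can decode the metadata log straight from a controller’s log directory. A read-only peek, assuming the default log.dirs from the configuration reference below and the first log segment (the segment file name will differ on a long-running cluster):

# Decode cluster metadata records from the Raft log
kafka-dump-log.sh --cluster-metadata-decoder \
  --files /var/kafka/data/__cluster_metadata-0/00000000000000000000.log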


Controller Failover in KRaft

sequenceDiagram
    participant AC as Active Controller\n(Node 1, Raft term 1)
    participant F1 as Follower Controller\n(Node 2)
    participant F2 as Follower Controller\n(Node 3)
    participant Broker as Data Broker

    Note over AC: Node 1 crashes
    F1->>F1: Heartbeat timeout detected
    F1->>F2: RequestVote (term 2, I want to be leader)
    F2-->>F1: VoteGranted
    F1->>F1: Become Active Controller (term 2)
    F1->>Broker: Notify: new controller is Node 2
    Note over F1,Broker: Failover in seconds\n(Raft is fast)

Raft requires a quorum (majority) of controllers to be alive to elect a leader. With 3 controllers, the cluster tolerates 1 controller failure. With 5 controllers, it tolerates 2.
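
To check which controller currently holds leadership and whether the followers are keeping up, Kafka ships a quorum inspection tool (shown here against the local bootstrap address used elsewhere in this article):

# Show the current Raft leader, epoch, and voter set
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status

# Show per-voter replication progress (lag behind the leader)
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --replication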


KRaft Configuration Reference

Every KRaft node (broker, controller, or both) needs these properties:

# Unique identifier for this node
node.id=1

# Roles this node plays
process.roles=broker,controller

# Raft voters: nodeId@host:controllerPort for every controller
controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093

# Listener names
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://kafka1:9092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
controller.listener.names=CONTROLLER

# Log directory
log.dirs=/var/kafka/data

# Replication settings (covered in later articles)
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2

Before starting a fresh cluster, generate a unique cluster UUID:

# Generate cluster UUID (do this once per cluster)
KAFKA_CLUSTER_ID="$(kafka-storage.sh random-uuid)"

# Format storage on every broker node with the same UUID
kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c /path/to/server.properties

The storage format step writes a meta.properties file containing the cluster ID into each log directory. Without it, Kafka refuses to start.
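
After formatting, each log directory contains that meta.properties file tying the node to the cluster, and the node can then be started against the same properties file. Roughly (exact contents may vary by version):

# /var/kafka/data/meta.properties (written by the format step)
version=1
cluster.id=<the UUID generated above>
node.id=1

# Start the node (add -daemon to run it in the background)
kafka-server-start.sh /path/to/server.properties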


KRaft in Docker Compose (used throughout this series)

version: '3.8'
services:
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    hostname: kafka
    container_name: kafka
    ports:
      - "9092:9092"
      - "9093:9093"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "false"
      KAFKA_LOG_DIRS: /var/lib/kafka/data
      CLUSTER_ID: "MkU3OEVBNTcwNTJENDM2Qk"
    volumes:
      - kafka-data:/var/lib/kafka/data

volumes:
  kafka-data:

The CLUSTER_ID is a pre-generated UUID. The Confluent image handles the kafka-storage format step automatically when CLUSTER_ID is set.
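
If you prefer to generate your own ID rather than reuse the sample value, one option is a one-off container run that bypasses the image’s startup script (assuming the Confluent image keeps kafka-storage on its PATH, which recent versions do):

docker run --rm --entrypoint kafka-storage confluentinc/cp-kafka:7.6.0 random-uuid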

Start Kafka:

docker compose up -d
docker compose logs -f kafka  # watch for "started" message

Verify it is running (note that the Confluent image ships the Kafka CLI tools without the .sh suffix):

docker exec kafka kafka-broker-api-versions --bootstrap-server localhost:9092
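
As a further smoke test, create and describe a topic from inside the container (the topic name smoke-test is just an example):

docker exec kafka kafka-topics --bootstrap-server localhost:9092 \
  --create --topic smoke-test --partitions 3 --replication-factor 1

docker exec kafka kafka-topics --bootstrap-server localhost:9092 \
  --describe --topic smoke-test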

ZooKeeper Mode vs KRaft Mode: Migration

If you are migrating an existing cluster from ZooKeeper mode to KRaft, Kafka provides a migration path (early access since Kafka 3.4, production-ready since Kafka 3.6). The migration is done in phases:

flowchart LR
    ZK["ZooKeeper Mode\n(all brokers)"]
    Bridge["Bridge Mode\n(ZK + KRaft controllers\nrunning together)"]
    KRaft["KRaft Mode\n(ZooKeeper decommissioned)"]

    ZK -->|"Phase 1:\ndeploy KRaft controllers"| Bridge
    Bridge -->|"Phase 2:\nbrokers migrate to KRaft"| KRaft

For new deployments, always use KRaft mode. ZooKeeper mode is removed in Kafka 4.0.
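
The bridge phase is switched on by a dedicated migration flag on the new KRaft controllers, alongside their normal controller settings from the configuration reference above. A simplified sketch of what Phase 1 adds (the ZooKeeper connection string is a placeholder, and the full procedure involves more steps than shown here):

# Added to the new KRaft controller nodes during the bridge phase
zookeeper.metadata.migration.enable=true
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181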


Key Takeaways

  • ZooKeeper was Kafka’s external metadata manager — KRaft replaces it with a built-in Raft consensus protocol
  • In KRaft mode, metadata is stored in __cluster_metadata, a special internal Kafka topic
  • Each node has a process.roles: broker, controller, or both (broker,controller)
  • The controller quorum uses Raft — typically 3 or 5 controller nodes; tolerates floor(N/2) failures
  • Combined mode (broker+controller on same node) is fine for development; use dedicated controllers in production
  • Format storage with a shared CLUSTER_ID before starting any node in a new cluster
  • For this series, a single Docker Compose node in combined mode is used for simplicity

Next: Starting a Kafka Cluster: Single-Broker and 3-Broker with KRaft — hands-on setup of a local Kafka cluster using Docker Compose, ready for all upcoming examples.