What Is Apache Kafka: Event Streaming From First Principles

The Problem Kafka Solves

Imagine an e-commerce platform. A customer places an order. What needs to happen next?

  • Inventory must be reserved
  • Payment must be charged
  • A confirmation email must be sent
  • The warehouse must be notified to pick and pack
  • Analytics must record the sale
  • Fraud detection must evaluate the transaction

One request. Six downstream systems. In a traditional REST architecture, the Order Service calls each of those six services directly — synchronously, one after another. This creates a tight web of dependencies: if the email service is slow, the customer waits. If the analytics service is down, the order fails. Every new consumer of order data requires a change to the Order Service.

Kafka breaks this coupling. The Order Service publishes one event: OrderPlaced. Every downstream system subscribes to that event independently. The Order Service does not know or care who is listening — it just writes to Kafka and returns. New consumers can be added without touching the producer. Consumers can be slow or temporarily offline without blocking the producer. Each consumer reads at its own pace.

This is the core value proposition: decoupled, durable, replayable event streams.
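
To make the producer side concrete, here is a minimal sketch using Spring for Apache Kafka, the stack this series uses. The OrderPlaced record, topic name, and serializer setup are illustrative assumptions, not the series' final code:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

// Hypothetical event payload; assumes a JSON serializer is configured
// for the value type in application properties.
record OrderPlaced(String orderId, String customerId, long amountCents) {}

@Service
public class OrderEventPublisher {

    private final KafkaTemplate<String, OrderPlaced> kafkaTemplate;

    public OrderEventPublisher(KafkaTemplate<String, OrderPlaced> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publishOrderPlaced(OrderPlaced event) {
        // Keying by orderId routes all events for one order to the same
        // partition, which preserves their relative order.
        kafkaTemplate.send("orders", event.orderId(), event);
        // Done. The Order Service neither knows nor waits for consumers.
    }
}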


What Kafka Is — And What It Is Not

Kafka Is an Event Streaming Platform

Apache Kafka is a distributed event streaming platform. The official definition covers three capabilities:

  1. Publish and subscribe to streams of events (like a message queue)
  2. Store streams of events durably and reliably for as long as you want (like a database)
  3. Process streams of events as they occur or retrospectively (like a stream processor)

The key insight is the word store. Unlike a traditional message queue (RabbitMQ, ActiveMQ), Kafka retains messages for a configurable period — days, weeks, forever — regardless of whether they have been consumed. This changes the fundamental model: consumers are not racing to drain a queue; they are reading from an immutable, ordered log and tracking their own position in it.
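
Retention is an ordinary per-topic setting. Here is a hedged sketch using Kafka's Java AdminClient; the topic name, partition count, and seven-day value are illustrative:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class CreateOrdersTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 3; records are kept for
            // 7 days whether or not anyone has consumed them.
            NewTopic orders = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of(TopicConfig.RETENTION_MS_CONFIG,
                            String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(Set.of(orders)).all().get();
        }
    }
}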

Kafka Is Not a Traditional Message Queue

                   Traditional Queue                             Kafka
Message storage    Deleted after consumption                     Retained for a configurable period
Consumer model     Competing consumers drain one queue           Each consumer group has its own cursor
Replay             Not possible (message is gone)                Any consumer can re-read from any offset
Ordering           Often not guaranteed                          Guaranteed within a partition
Throughput         Moderate                                      Millions of events per second
Routing            Complex routing logic (exchanges, bindings)   Simple: topics and partitions

Kafka Is Not a Database

Kafka stores data, but it is not designed for random reads by primary key. You cannot run SELECT * FROM orders WHERE id = 42 against Kafka. Its access pattern is sequential: consumers read a stream of events from a position forward. Kafka complements databases — events flow through Kafka and land in databases (or search indexes, data warehouses) where point queries are needed.


Core Concepts at a Glance

Before we go deeper in the next article, here are the building blocks:

flowchart TB
    subgraph Cluster["Kafka Cluster"]
        subgraph B1["Broker 1"]
            P0["Topic: orders\nPartition 0\n[0][1][2][3][4][5]"]
        end
        subgraph B2["Broker 2"]
            P1["Topic: orders\nPartition 1\n[0][1][2][3]"]
        end
        subgraph B3["Broker 3 (replica)"]
            R0["Replica of P0"]
            R1["Replica of P1"]
        end
    end

    Producer["⬆ Producer\n(Order Service)"] -->|append| P0
    Producer -->|append| P1
    Consumer["⬇ Consumer\n(Inventory Service)\noffset: 4 on P0, 2 on P1"] -->|read| P0
    Consumer -->|read| P1

Event (Record): the unit of data — a key, a value, a timestamp, and optional headers.

Topic: a named, append-only log. orders, payments, inventory-updates.

Partition: a topic is split into partitions for parallelism. Partition 0 and Partition 1 of the orders topic can be read by different consumer instances simultaneously.

Offset: the position of a record within a partition. Offset 0 is the first record, offset 1 is the second. A consumer remembers its offset and continues from there after a restart.

Broker: a single Kafka server. A cluster has multiple brokers for fault tolerance and throughput.

Consumer Group: a set of consumers that jointly consume a topic. Each partition is assigned to exactly one member of the group at a time — this is how parallel processing works.
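
To see how this vocabulary surfaces in code, here is a hedged sketch of a Spring Boot consumer; the topic, group name, and String payloads are illustrative:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class InventoryOrderListener {

    // "inventory-service" is the consumer group. Every instance of this
    // service joins it, and each partition of "orders" is assigned to
    // exactly one instance at a time.
    @KafkaListener(topics = "orders", groupId = "inventory-service")
    public void onOrderEvent(ConsumerRecord<String, String> record) {
        // The record carries the key, value, timestamp, partition, and
        // offset described above.
        System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                record.partition(), record.offset(), record.key(), record.value());
        // Once the offset is committed, a restarted instance resumes
        // from this position rather than re-reading from the start.
    }
}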


Kafka in the E-Commerce Domain

Throughout this series, we build an order-processing platform with five services:

flowchart LR
    OrderSvc[Order Service\n Publisher]
    InventorySvc[Inventory Service\n Consumer]
    PaymentSvc[Payment Service\n Consumer + Publisher]
    NotificationSvc[Notification Service\n Consumer]
    AnalyticsSvc[Analytics Service\n Consumer]

    OrderSvc -->|orders topic| Kafka[(Apache Kafka)]
    Kafka --> InventorySvc
    Kafka --> PaymentSvc
    Kafka --> NotificationSvc
    Kafka --> AnalyticsSvc
    PaymentSvc -->|payments topic| Kafka

Each service is a Spring Boot application. The Order Service publishes events; the others consume them. Some (like Payment) are both consumers and publishers — they consume order events and publish payment events.
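
A hedged sketch of that dual role for the Payment Service (topic names match the diagram; the payload handling and payment logic are placeholders):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public PaymentService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Consumer side: reads order events from the orders topic...
    @KafkaListener(topics = "orders", groupId = "payment-service")
    public void onOrderPlaced(String orderEvent) {
        String paymentEvent = charge(orderEvent);
        // ...publisher side: emits a payment event for downstream consumers.
        kafkaTemplate.send("payments", paymentEvent);
    }

    private String charge(String orderEvent) {
        // Placeholder for real payment processing.
        return "payment-completed-for:" + orderEvent;
    }
}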

This architecture means:

  • Adding a new service (say, a loyalty points service) requires zero changes to the Order Service
  • If the Notification Service goes down for an hour, it catches up from its saved offset when it restarts — no events are lost
  • The Analytics Service can replay the entire order history by seeking to offset 0
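
The last point, replay, deserves a concrete illustration. Here is a hedged sketch using the plain Java consumer API (broker address, topic, and group name are illustrative; in Spring the same rewind can be done with a ConsumerSeekAware listener):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

public class ReplayOrders {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-replay");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
                @Override public void onPartitionsRevoked(Collection<TopicPartition> partitions) {}
                @Override public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Rewind every assigned partition to offset 0.
                    consumer.seekToBeginning(partitions);
                }
            });
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(r ->
                        System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
            }
        }
    }
}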

Why Kafka Was Built at LinkedIn

Kafka was created at LinkedIn and open-sourced in 2011. LinkedIn had a problem: it needed to move hundreds of billions of events per day between systems — page views, profile updates, connection requests, search queries — to feed its recommendation engine, analytics pipeline, and operational monitoring.

Existing systems could not handle the volume or provide the replayability they needed. So they built Kafka as a unified log — a single, durable, high-throughput stream that all systems could read from independently.

The insight that makes Kafka different is treating the log as the primary data structure. A log is append-only, ordered, and persistent. Every database internally uses a log (the write-ahead log, the binlog). Kafka exposes the log as the API, making it the canonical source of truth for what happened and in what order.


Kafka’s Throughput Numbers (and Why)

Kafka can sustain millions of events per second on commodity hardware. The reasons are architectural:

Sequential disk I/O: Kafka writes to disk sequentially. Sequential reads and writes are dramatically faster than random I/O on both spinning disks and SSDs, fast enough to approach memory-like throughput on modern hardware. Kafka never updates existing records — it only appends.

Zero-copy: Kafka uses the OS sendfile() syscall to transfer data from disk to the network socket without copying it through user space. The data moves: disk → kernel buffer → network card — never touching the JVM heap.
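
On the JVM, sendfile() is exposed as FileChannel.transferTo, which is the call Kafka's broker uses when serving consumers. A minimal sketch of the primitive itself (the file path, host, and port are illustrative):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(
                     Path.of("/tmp/segment.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("localhost", 9000))) {
            // transferTo maps to sendfile() on Linux: bytes move from the
            // page cache to the socket without entering the JVM heap.
            long position = 0;
            long remaining = log.size();
            while (remaining > 0) {
                long sent = log.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}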

Batching: Producers and consumers batch records. Instead of one network round trip per record, a single request carries thousands of records. This amortizes the latency cost across many events.
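
Batching is tunable on the producer. A hedged sketch of the main knobs (the values are illustrative, not recommendations):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class BatchingProducer {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Collect up to 64 KB of records per partition before sending...
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);
        // ...and wait up to 10 ms for a batch to fill, trading a little
        // latency for far fewer network round trips.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);
        // Compression is applied per batch, amortizing its cost as well.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        return new KafkaProducer<>(props);
    }
}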

Page cache: The OS caches recently written data in RAM. Consumers that are keeping up with the head of the log read from the page cache rather than from disk — effectively making Kafka an in-memory queue for active consumers.


When to Use Kafka (and When Not To)

Use Kafka when:

  • You need to decouple a producer from multiple consumers
  • You need durability — events must survive consumer downtime
  • You need replay — historical events must be re-processable
  • You need high throughput — thousands to millions of events per second
  • You need ordering guarantees within a logical partition (same customer ID, same order ID)
  • You are building event sourcing or CQRS architectures

Do not use Kafka when:

  • You need request-reply with a sub-10ms response (use gRPC or HTTP)
  • You need complex routing based on message content (use RabbitMQ with routing keys)
  • Your team is not prepared to operate a distributed system (use a managed service like Confluent Cloud, AWS MSK, or Azure Event Hubs instead of self-hosted Kafka)
  • Your message volume is low (< 1,000 events/day) and a simple database table would suffice

Key Takeaways

  • Kafka is an event streaming platform, not just a message queue — it stores events durably and allows replay
  • Unlike traditional queues, Kafka retains messages after consumption; consumers track their own position (offset)
  • A topic is an append-only log, split into partitions for parallelism; each partition has an ordered, immutable sequence of events
  • Consumer groups enable parallel consumption — each partition is assigned to exactly one consumer in the group
  • Kafka’s throughput comes from sequential I/O, zero-copy networking, and aggressive batching
  • Throughout this series we build an e-commerce platform: Order Service publishes events; Inventory, Payment, Notification, and Analytics services consume them

Next: Kafka Architecture: Brokers, Topics, Partitions, and Replicas — how the cluster is structured, how leaders and followers work, and what happens when a broker fails.