Apache Kafka architecture components

Apache Kafka is a distributed event streaming platform built on a cluster architecture designed for high throughput and fault tolerance.^[kafka-01.md]

Core Components

Broker

A Kafka cluster consists of one or more servers known as brokers. Each broker is identified by an integer ID (broker.id) that must be unique across the cluster^[kafka-01.md]. A broker's primary responsibility is to store data and serve client read and write requests.

Topic

Data is organized into categories called topics.^[kafka-01.md] Producers write records to a topic, and consumers subscribe to topics to read data.

Partition

For scalability, topics are divided into partitions.^[kafka-01.md] This allows data to be distributed across multiple brokers, enabling greater parallelism for both data production and consumption.
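As a sketch of how records are routed to partitions: producers with a record key map that key to a partition index by hashing, so all records with the same key land in the same partition. Kafka's default partitioner uses murmur2 hashing; the CRC32 used below is a stand-in chosen only because it is deterministic and in the Python standard library.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Real Kafka producers hash keys with murmur2; CRC32 stands in
    here as a deterministic illustration of the same idea.
    """
    return zlib.crc32(key) % num_partitions


# The same key always routes to the same partition, preserving
# per-key ordering within that partition.
p1 = choose_partition(b"user-42", 3)
p2 = choose_partition(b"user-42", 3)
```

Because routing depends only on the key and the partition count, per-key ordering is preserved; note, however, that changing the number of partitions changes the mapping.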

Producer

The producer is the component that sends (publishes) data to Kafka topics.^[kafka-01.md]
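Conceptually, producing a record means appending it to the end of one partition's log, and the broker assigns the record its offset in that partition. The in-memory `InMemoryTopic` class below is a hypothetical toy model of this append semantics, not the Kafka API.

```python
class InMemoryTopic:
    """Toy model of a topic: each partition is an append-only list.

    Illustrative only; a real producer sends records to a broker,
    which performs the append and returns the assigned offset.
    """

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> int:
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1  # offset assigned to the new record


topic = InMemoryTopic(2)
off_a = topic.append(0, "a")  # first record in partition 0 -> offset 0
off_b = topic.append(0, "b")  # next record -> offset 1
```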

Consumer

The consumer reads (subscribes to) data from topics.^[kafka-01.md] Consumers can be organized into consumer groups to work together on processing data.
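Within a consumer group, each partition is assigned to exactly one group member, so members split the work without reading duplicates. The round-robin assignment sketched below is one simple strategy; Kafka ships several assignors, and this function is an illustration rather than Kafka's implementation.

```python
def assign_round_robin(partitions: list, consumers: list) -> dict:
    """Assign each partition to exactly one consumer, round-robin.

    Illustrative stand-in for a group assignor: every partition gets
    one owner, and owners receive partitions in turn.
    """
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# Four partitions split across a two-member group.
result = assign_round_robin([0, 1, 2, 3], ["c1", "c2"])
# result == {"c1": [0, 2], "c2": [1, 3]}
```

A consequence of one-owner-per-partition: running more consumers in a group than there are partitions leaves the extra consumers idle.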

Offset

To track progress, Kafka uses an offset.^[kafka-01.md] This is a sequential ID that uniquely identifies each record within a partition, allowing a consumer to keep track of its position in the data stream and resume from where it left off.
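The mechanics can be sketched as follows: a consumer remembers the offset of the next record it should read, advances it as records are fetched, and can resume from that position later. This `SimpleConsumer` is a hypothetical illustration of offset tracking, not the Kafka consumer API.

```python
class SimpleConsumer:
    """Tracks a position in one partition's log via an offset.

    Illustrative only: the 'log' is a plain list, and 'offset' is
    the index of the next record to read.
    """

    def __init__(self, log: list, start_offset: int = 0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_records: int = 1) -> list:
        records = self.log[self.offset:self.offset + max_records]
        self.offset += len(records)  # advance past what was read
        return records


log = ["a", "b", "c"]
consumer = SimpleConsumer(log)
first = consumer.poll(2)   # ["a", "b"]; offset advances to 2
second = consumer.poll(2)  # ["c"]; only one record remains
```

Because the position is just a number, a consumer that restarts from a saved offset picks up exactly where it stopped.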

ZooKeeper

Kafka relies on ZooKeeper to manage the cluster state and coordination. Brokers connect to ZooKeeper via a connection string (e.g., localhost:2181) to register themselves and maintain metadata.^[kafka-01.md]

Configuration

Key configuration parameters for a broker include:

  • Data Storage: The log.dirs parameter specifies the directories where message log files are stored^[kafka-01.md].
  • Topic Management: num.partitions sets the default number of log partitions for topics created without an explicit partition count^[kafka-01.md]. The delete.topic.enable property controls whether users can delete topics^[kafka-01.md].
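Putting these parameters together, a broker's server.properties might contain entries like the following. The specific values here are illustrative, not recommendations.

```properties
# Hypothetical server.properties excerpt; values are illustrative.
broker.id=0                         # unique integer ID for this broker
log.dirs=/tmp/kafka-logs            # directories holding message log files
num.partitions=1                    # default partitions for new topics
delete.topic.enable=true            # allow users to delete topics
zookeeper.connect=localhost:2181    # ZooKeeper connection string
```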

Sources

^[kafka-01.md]