# Apache Kafka architecture components
Apache Kafka is a distributed event streaming platform built on a cluster architecture designed for high throughput and fault tolerance.^[kafka-01.md]
## Core Components
### Broker
A Kafka cluster consists of one or more servers known as brokers. Each broker is identified by a unique integer ID (`broker.id`), which must be distinct across the cluster.^[kafka-01.md] A broker's primary responsibility is to store and serve data.
### Topic
Data is organized into specific categories named topics.^[kafka-01.md] Producers write records to a topic, and consumers subscribe to specific topics to read data.
### Partition
For scalability, topics are divided into partitions.^[kafka-01.md] This allows data to be distributed across multiple brokers, enabling greater parallelism for both data production and consumption.
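To make the partition idea concrete, here is a minimal Python sketch of key-based partition selection. Kafka's default partitioner hashes the record key with murmur2; CRC32 stands in here purely for illustration, and the function name is hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key deterministically to a partition index.

    Illustrative stand-in for Kafka's default partitioner, which
    uses murmur2 rather than CRC32.
    """
    return zlib.crc32(key) % num_partitions


# Records with the same key always land in the same partition,
# which is what preserves per-key ordering in Kafka.
assert choose_partition(b"user-42", 3) == choose_partition(b"user-42", 3)
```

Because the mapping is deterministic, all records for a given key go to one partition, so consumers see that key's records in the order they were produced.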
### Producer
The producer is the component that sends (publishes) data to Kafka topics.^[kafka-01.md]
### Consumer
The consumer reads (subscribes to) data from topics.^[kafka-01.md] Consumers can be organized into consumer groups to work together on processing data.
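A quick sketch of how a consumer group splits partitions among its members. This round-robin scheme is a simplified, hypothetical stand-in for Kafka's actual assignment strategies (range, round-robin, sticky); the function name is illustrative.

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict:
    """Distribute partitions across group members round-robin.

    Simplified model: Kafka's real group coordinator supports
    several pluggable assignment strategies.
    """
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# Four partitions shared by two group members: each gets two,
# so the group processes the topic in parallel without overlap.
groups = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])
```

Each partition is consumed by exactly one member of the group, which is how a consumer group parallelizes work while keeping per-partition ordering.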
### Offset
To track progress, Kafka uses an offset.^[kafka-01.md] This acts as a unique identifier for a record within a partition, allowing the consumer to keep track of its position in the data stream.
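The offset mechanism can be sketched with an in-memory model of a single partition; the class below is purely illustrative, not a real Kafka API.

```python
class PartitionLog:
    """In-memory stand-in for one Kafka partition: an append-only
    log where each record's position is its offset."""

    def __init__(self) -> None:
        self._records: list[str] = []

    def append(self, record: str) -> int:
        """Append a record and return the offset assigned to it."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> list[str]:
        """Return all records at or after the given offset."""
        return self._records[offset:]


log = PartitionLog()
for payload in ("a", "b", "c"):
    log.append(payload)

# A consumer that last committed offset 1 resumes with "b" and "c":
# the offset is all it needs to remember its position in the stream.
resumed = log.read_from(1)
```

Because offsets are just positions in an append-only log, a consumer can resume, replay, or skip ahead simply by choosing which offset to read from.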
### ZooKeeper
Kafka relies on ZooKeeper to manage cluster state and coordination. Brokers connect to ZooKeeper via a connection string (e.g., `localhost:2181`) to register themselves and maintain metadata.^[kafka-01.md]
## Configuration
Key configuration parameters for a broker include:
- Data Storage: The `log.dirs` parameter specifies the directories where message log files are stored.^[kafka-01.md]
- Topic Management: By default, `num.partitions` determines the number of log partitions per topic if not specified during creation.^[kafka-01.md] The `delete.topic.enable` property controls whether users can delete topics.^[kafka-01.md]
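Taken together, these settings might appear in a broker's `server.properties` roughly as follows; the specific values and paths are illustrative, not from the source.

```properties
# Hypothetical server.properties excerpt (values are examples only)
broker.id=1                          # unique integer ID for this broker
log.dirs=/var/lib/kafka/logs         # where message log files are stored
num.partitions=3                     # default partitions for new topics
delete.topic.enable=true             # allow users to delete topics
zookeeper.connect=localhost:2181     # ZooKeeper connection string
```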
## Sources
^[kafka-01.md]