Apache Kafka is a distributed, fault-tolerant event streaming platform capable of handling trillions of events per day — built for real-time data pipelines and stream processing at scale.
Everything in Kafka revolves around a handful of powerful primitives.
Producer: Publishes (writes) records to one or more Kafka topics. The producer decides which partition a record lands in, by hashing the record key or, for records without a key, by spreading them across partitions (round-robin or sticky batching, depending on the client version).
Consumer: Reads records from topics. Each consumer tracks its own position (offset), which allows replay and lets independent consumers read the same topic without coordinating with each other.
Topic: A named, persistent log of records. Topics are split into partitions for parallelism and replicated for fault tolerance. Ordering is guaranteed within each partition, not across the whole topic.
Partition: An ordered, immutable sequence of records within a topic. Partitions enable horizontal scaling, since each can live on a different broker.
Offset: A unique, sequential integer assigned to each record within a partition. Consumers commit offsets to record where they left off.
Broker: A Kafka server that stores and serves records. A Kafka cluster typically consists of multiple brokers for redundancy and throughput distribution.
Consumer group: A set of consumers that together consume a topic. Each partition is assigned to exactly one consumer in the group, which balances load across the members.
Key: An optional key attached to each record. With the default partitioner, records with the same key go to the same partition (as long as the partition count does not change), preserving ordering for that key.
Replication: Each partition is replicated across N brokers (the replication factor). One replica is the leader; the others are followers that replicate from it.
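The key-to-partition mapping and the one-consumer-per-partition rule described above can be sketched in a few lines of plain Python. This is only an illustration: `partition_for` and `assign_partitions` are hypothetical helper names, CRC32 is a stand-in hash rather than the one Kafka's partitioner uses, and round-robin is just one possible assignment strategy.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition from a stable hash of the key.

    CRC32 is a stand-in chosen for this sketch; it is not the hash
    Kafka's own partitioner uses.
    """
    return zlib.crc32(key) % num_partitions


def assign_partitions(partitions, consumers):
    """Hand each partition to exactly one consumer in the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# The same key always maps to the same partition, so its records stay ordered.
assert partition_for(b"user-42", 6) == partition_for(b"user-42", 6)

# Six partitions shared by a two-member group.
print(assign_partitions(range(6), ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 2, 4], 'consumer-b': [1, 3, 5]}
```

If the group has more consumers than the topic has partitions, the extra consumers in this model receive an empty list, which mirrors the rule that a partition is never shared within a group.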
Kafka's architecture decouples producers from consumers through a fault-tolerant, distributed commit log.
Each topic partition is an append-only, immutable log. Records are written to the end and identified by their offset. Ordering is guaranteed within a partition, so records that share a key stay in order, and replication across brokers provides durability.
Topics retain data for a configurable time (e.g. 7 days) or size. Data past the limit is deleted; topics configured for compaction are compacted instead.
For changelog topics, Kafka can keep only the latest value per key by compacting away older records, which suits state stores and caches.
Each partition has one leader broker, which handles all writes and, by default, all reads, and N-1 followers. If the leader fails, an in-sync follower is elected as the new leader automatically.
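The "latest value per key" behaviour of compaction can be shown with a small in-memory model. This is an idealized sketch, not how the broker implements it: `compact` is a hypothetical function, the log is a plain list of `(offset, key, value)` tuples, and the whole log is compacted in one pass.

```python
def compact(log):
    """Return the log with only the latest record per key.

    `log` is a list of (offset, key, value) tuples in offset order.
    Surviving records keep their original offsets.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # a later record replaces an earlier one
    return sorted(latest.values())  # restore offset order


changelog = [
    (0, "user-1", "email=a@example.com"),
    (1, "user-2", "email=b@example.com"),
    (2, "user-1", "email=c@example.com"),
]
print(compact(changelog))
# [(1, 'user-2', 'email=b@example.com'), (2, 'user-1', 'email=c@example.com')]
```

A consumer that reads the compacted log from the beginning still ends up with the current value for every key, which is why this layout works for rebuilding state stores and caches.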
Producer API (Write): Lets applications publish streams of records to topics. Supports batching, compression (gzip, snappy, lz4, zstd), and async or sync delivery.
Consumer API (Read): Subscribes to topics and processes record streams. Consumers form groups for parallel, load-balanced consumption with offset management.
Streams API (Transform): A lightweight stream-processing library. Build stateful, real-time apps that transform or aggregate data as it flows through Kafka topics.
Connect API (Integrate): Scalable, fault-tolerant connectors to import and export data from databases, S3, Elasticsearch, JDBC sources, and many other systems.
Admin API (Manage): Manage topics, brokers, ACLs, consumer groups, and configs programmatically. Essential for infrastructure-as-code workflows.
Replace traditional message queues (RabbitMQ, ActiveMQ) with a high-throughput, durable, replayable event bus connecting microservices without tight coupling.
Stream clickstream, IoT sensor data, or log events into analytics pipelines (Flink, Spark, ksqlDB) for dashboards with sub-second latency.
Capture every row-level change in databases (PostgreSQL, MySQL) via Debezium connectors and stream them to data warehouses or caches in real time.
Collect logs from hundreds of services into a central Kafka cluster, then fan out to Elasticsearch, S3, or Splunk — decoupling collection from storage.
Use Kafka as the system of record for domain events. Services rebuild state by replaying the event log, enabling time-travel debugging and audit trails.
Replace nightly batch ETL with continuous streams: transform, filter, join, and route data between dozens of sources and sinks in motion.
Implement saga patterns and event-driven choreography between services with guaranteed delivery and independent scaling of each consumer.
Stream features to a feature store in real time, ensuring ML models always have fresh data for inference without batch lag.
Kafka gives you fine-grained control over delivery semantics via producer acknowledgment and idempotency settings.
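To make the idempotency setting concrete, here is a toy model of what happens when an acknowledgment is lost and the producer retries. It assumes a simplified picture in which each producer has an ID and numbers its records, and the broker drops any sequence number it has already appended. `PartitionLog` and `send_with_lost_ack` are hypothetical names for this sketch, not part of any Kafka client.

```python
class PartitionLog:
    """Toy broker-side log for one partition."""

    def __init__(self, idempotent):
        self.idempotent = idempotent
        self.records = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        if self.idempotent and seq <= self.last_seq.get(producer_id, -1):
            return  # duplicate of an already-appended record: drop it
        self.records.append(value)
        self.last_seq[producer_id] = seq


def send_with_lost_ack(log):
    """Send one record, lose the acknowledgment, and retry the same send."""
    log.append(producer_id=1, seq=0, value="order-created")
    log.append(producer_id=1, seq=0, value="order-created")  # retry after timeout
    return log.records


print(send_with_lost_ack(PartitionLog(idempotent=False)))
# ['order-created', 'order-created']
print(send_with_lost_ack(PartitionLog(idempotent=True)))
# ['order-created']
```

Without deduplication the retry yields a duplicate (at-least-once behaviour); with it, the retry is absorbed and the record appears once.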
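At-most-once: the producer does not wait for any broker acknowledgment (acks=0). Fastest but least safe.
At-least-once: the producer waits for the in-sync replicas to acknowledge and retries on failure, so duplicates are possible (acks=all). Most common default.
Exactly-once: the broker deduplicates producer retries (enable.idempotence=true).
Transactional reads: consumers set isolation.level=read_committed to see only committed data.

How does Kafka stack up against other messaging and streaming technologies?
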
| Feature | Kafka | RabbitMQ | AWS SQS | Pulsar |
|---|---|---|---|---|
| Message Persistence | ✔ Log-based | ~ Queue | ~ Configurable | ✔ Ledger |
| Replay Messages | ✔ By offset | ✗ | ✗ | ✔ Cursor reset |
| High Throughput | ✔ 1M+ msg/s | ~ Moderate | ~ Managed | ✔ High |
| Exactly-Once | ✔ Transactions | ~ With effort | ✗ | ✔ Native |
| Stream Processing | ✔ Kafka Streams | ✗ External | ✗ External | ~ Pulsar Functions |
| Multi-consumer | ✔ Consumer groups | ~ Exchanges | ✗ Single | ✔ Subscriptions |
| Operational Complexity | ~ Medium | ✔ Low | ✔ Managed | ~ Medium-High |