Apache Kafka is a commonly used open source tool. In this blog, we break down what Apache Kafka is and how Kafka works.
Apache Kafka is a stream processor that can also be used as a message broker. If your architecture requires low end-to-end latency with strong durability (persistence), Kafka provides this along with other functionality.
Strictly speaking, Kafka is not a traditional message broker. It is a stream processor. The difference matters, even though you can use Kafka in an application as a message handler. Kafka has a publish-subscribe feature, like many message brokers, but unlike many message brokers, Kafka is a distributed streaming platform.
This means it can publish and subscribe to streams of records, store those streams in a durable, fault-tolerant way, and process them as they occur.
Kafka can run as a cluster. A cluster can be local, or it can be geographically dispersed, with nodes on opposite sides of a state or of the world. Records are stored in topics. A record has 3 parts: a key, a value, and a timestamp.
Like Apache ActiveMQ, Artemis, and other brokers, the Kafka platform provides a producer API and a consumer API. There is also a streams API and a connector API. The streams API assists in wiring applications together to manipulate streams, so that they act as a “stream processor”.
The producer and consumer APIs are self-explanatory: they allow applications to act as producers and consumers. There is one more API, the admin API, which gives management applications control over the stream processor cluster.
If none of this seems different from what a message broker offers, Kafka might not fit your system architecture, and a message broker might be the correct tool for you. What Kafka provides is extremely low-latency (FAST) transfer of data (messages) between disconnected, distant parts of a system.
Kafka breaks the data out into topics. A topic allows multiple subscribers to connect, meaning it can have zero, one, or many subscribers.
Topics are divided into partitions. A partition is an ordered (in the order records are published), immutable (unchangeable) sequence of records that is continuously appended to. Each record is assigned a sequential id, called an offset, as it is added to its partition.
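To make the idea of a partition concrete, here is a minimal in-memory sketch. It is not Kafka code: the Record and Partition classes are hypothetical names made up for illustration, and the record fields (key, value, timestamp) simply mirror the record layout described earlier. It only shows the append-only behavior and how offsets are handed out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """One record; frozen=True makes it read-only once created."""
    key: Optional[str]
    value: str
    timestamp_ms: int


@dataclass
class Partition:
    """Toy stand-in for a single topic partition (in memory only)."""
    _records: List[Record] = field(default_factory=list)

    def append(self, record: Record) -> int:
        """Add a record at the end and return the offset it was assigned."""
        offset = len(self._records)
        self._records.append(record)
        return offset

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, Record]]:
        """Return up to max_records (offset, record) pairs starting at offset."""
        if offset < 0:
            raise ValueError("offset must not be negative")
        chunk = self._records[offset:offset + max_records]
        return list(enumerate(chunk, start=offset))


partition = Partition()
first = partition.append(Record("user-1", "login", 1000))
second = partition.append(Record("user-2", "login", 1001))
third = partition.append(Record("user-1", "logout", 1002))
```

Records are only ever added at the end, so an offset, once assigned, always points at the same record.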
Kafka persists all records, whether they have been consumed or not. Records are kept for the retention period defined in the broker configuration (the log.retention.hours setting); the default is 7 days (168 hours).
# The minimum age of a log file to be eligible for deletion due to age
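As a rough illustration of age-based retention, the sketch below drops records older than the 7-day default. The function names are hypothetical, and it is deliberately simplified: it filters individual records, whereas the broker removes whole log files once they are old enough.

```python
from typing import List, Tuple

RETENTION_HOURS = 168  # 7 days, the default retention period
RETENTION_MS = RETENTION_HOURS * 60 * 60 * 1000


def is_expired(timestamp_ms: int, now_ms: int, retention_ms: int = RETENTION_MS) -> bool:
    """A record is expired once its age exceeds the retention period."""
    return now_ms - timestamp_ms > retention_ms


def apply_retention(
    records: List[Tuple[int, str]], now_ms: int, retention_ms: int = RETENTION_MS
) -> List[Tuple[int, str]]:
    """Keep only (timestamp_ms, value) pairs that are still within retention.

    Whether a record has been consumed plays no part in the decision.
    """
    return [(ts, value) for ts, value in records if not is_expired(ts, now_ms, retention_ms)]


DAY_MS = 24 * 60 * 60 * 1000
now = 10 * DAY_MS
log = [(1 * DAY_MS, "nine days old"), (5 * DAY_MS, "five days old")]
kept = apply_retention(log, now)
```

Here only the five-day-old record survives, because the nine-day-old one is past the 7-day window.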
Topic partitions are replicated across a number of brokers in the cluster for fault tolerance.
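The sketch below shows one simple way replicas could be spread over brokers so that losing a single broker never takes out every copy of a partition. It is an assumption-laden simplification, not Kafka's actual placement algorithm; the function name and the broker ids are made up for the example.

```python
from typing import Dict, List


def assign_replicas(
    num_partitions: int, brokers: List[int], replication_factor: int
) -> Dict[int, List[int]]:
    """Place each partition's replicas on consecutive brokers, round-robin.

    Assumes the broker ids in `brokers` are distinct.
    """
    if not 1 <= replication_factor <= len(brokers):
        raise ValueError("replication factor must be between 1 and the broker count")
    assignment: Dict[int, List[int]] = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + replica) % len(brokers)]
            for replica in range(replication_factor)
        ]
    return assignment


layout = assign_replicas(num_partitions=4, brokers=[101, 102, 103], replication_factor=2)
```

With a replication factor of 2, every partition lives on two different brokers, so any one broker can fail and each partition still has a copy elsewhere.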
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
Producers publish data to topics. The producer chooses which partition each record is placed in. There are multiple ways of doing this; “round-robin balancing” is the simplest.
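A minimal sketch of partition selection might look like the following. The class name is hypothetical. Records without a key are spread round-robin; as one example of another method, records with a key are hashed so the same key always lands in the same partition. The CRC32 hash is used here only as a stand-in and is not the hash function Kafka's own client uses.

```python
import itertools
import zlib
from typing import Optional


class RoundRobinPartitioner:
    """Toy partition chooser: round-robin without a key, hash-based with one."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self._num_partitions = num_partitions
        self._cycle = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes] = None) -> int:
        """Return the partition number for the next record."""
        if key is not None:
            # Same key, same partition (stand-in hash, see the note above).
            return zlib.crc32(key) % self._num_partitions
        # No key: hand out partitions 0, 1, 2, ... and wrap around.
        return next(self._cycle)


partitioner = RoundRobinPartitioner(num_partitions=3)
unkeyed = [partitioner.partition() for _ in range(5)]
keyed_a = partitioner.partition(b"user-1")
keyed_b = partitioner.partition(b"user-1")
```

Round-robin balances load evenly, while key-based selection keeps all records for one key in a single partition and therefore in order.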
Consumers consume data from the topics. Almost no per-consumer state is kept. This is important because it allows for near-stateless consumption, faster throughput, fewer errors, and more. A consumer has the offset of its consumption (the record it is currently consuming) and nothing more. Consumers are cheap: they do not use much memory (from the broker’s perspective), and they can come and go with relative ease.
Consumers also have a lot of freedom with Kafka: they can process messages as they like. They can reprocess older messages by moving their offset back, or skip ahead to the end of the log and process only the newest messages.
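The following sketch illustrates why offset-only consumers are so flexible. The Consumer class here is a hypothetical toy, not the Kafka client, although the real consumer client offers comparable seek operations. The log is a plain Python list standing in for one partition.

```python
from typing import List


class Consumer:
    """Toy consumer whose only state is its current offset."""

    def __init__(self, log: List[str], offset: int = 0) -> None:
        self._log = log  # shared log; the consumer never modifies it
        self.offset = offset

    def poll(self, max_records: int = 1) -> List[str]:
        """Read up to max_records from the current offset and advance past them."""
        batch = self._log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move to any valid offset, for example to reprocess old records."""
        if not 0 <= offset <= len(self._log):
            raise ValueError("offset out of range")
        self.offset = offset

    def seek_to_end(self) -> None:
        """Skip everything already in the log and wait for new records."""
        self.offset = len(self._log)


log = ["a", "b", "c"]
reader = Consumer(log)
first_batch = reader.poll(max_records=2)   # reads "a" and "b"
reader.seek(0)                             # rewind to reprocess
replayed = reader.poll()                   # reads "a" again

latecomer = Consumer(log)
latecomer.seek_to_end()                    # ignore existing records
log.append("d")
newest = latecomer.poll()                  # sees only the new record
```

Because each consumer carries its own offset, rewinding one consumer has no effect on any other consumer reading the same log.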
With MirrorMaker you can replicate your cluster data across the globe to different clusters.
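Kafka offers some high-level guarantees, taken from the Kafka documentation at the Apache Software Foundation. Records sent by a producer to a particular topic partition are appended in the order they are sent. A consumer sees records in the order they are stored in the log. A topic with replication factor N tolerates up to N-1 server failures without losing any records committed to the log.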
Kafka is a stream processing platform that is distributable across clusters, near or far, providing reliable, FAST, and durable message processing for the entire enterprise stack. If you do not want to use it as a stream processor, it is an excellent message broker.
If you need help with implementing Kafka into your stack, connect with an OpenLogic expert. Try the OpenLogic free trial to open a consultative support ticket to work with an enterprise architect today!
Enterprise Solutions Architect, OpenLogic by Perforce
Andrew has been working in the IT industry since 1996, ranging from hardware and networking to application development. Andrew’s #1 specialty is Apache Tomcat, and he is recognized in the Tomcat community as a subject matter expert, assisting the Tomcat open source project in many ways.