What Is Apache Kafka and How Does Kafka Work?
Wondering how Kafka works and why it is so popular? With low end-to-end latency, exceptional durability, and the ability to handle massive amounts of streaming data, Apache Kafka has quickly become a go-to tool for stream processing.
In this blog, we give a high-level overview of how Apache Kafka works, talk about Kafka topics, and discuss when Kafka should be used. For more in-depth information about Kafka in the enterprise, download the Decision Maker's Guide to Apache Kafka.
What Is Apache Kafka?
Apache Kafka is a popular open source stream processor / middleware tool that can also be used as a message broker. Kafka provides low end-to-end latency with exceptional durability (persistence).
Kafka is a stream processor, and while you can use Kafka in an application as a message handler, it is not technically a message broker.
Kafka has a publish-subscribe feature, like many message brokers, but it is a distributed streaming platform. This means Kafka has the ability to publish and subscribe to streams of records, store streams of records in a durable, fault-tolerant way, and process streams as they occur.
How Does Kafka Work?
Kafka can run as a cluster. These clusters can be local, or they can be geographically distributed, with brokers on separate sides of a state, or of the world.
Records are stored in topics. A record has 3 parts:
- A Key
- A Value
- A Timestamp
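The three parts above can be modeled as a simple record type. This is a conceptual sketch in Python, not Kafka's actual record class (the field names and the JSON payload are illustrative assumptions):

```python
import time
from typing import NamedTuple, Optional

class Record(NamedTuple):
    """Minimal model of a Kafka record: a key, a value, and a timestamp."""
    key: Optional[bytes]   # used for partition assignment; may be None
    value: bytes           # the message payload
    timestamp: float       # real Kafka uses epoch milliseconds; seconds here

rec = Record(key=b"user-42", value=b'{"event": "login"}', timestamp=time.time())
print(rec.key, rec.value)
```

In real clients the key is optional, but when present it typically determines which partition the record lands in, as discussed below.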
Like Apache ActiveMQ/Artemis and other brokers, there are producer and consumer APIs available with the Kafka platform. There are also a streams API and a connector API. The streams API assists in wiring applications together to manipulate streams and act as a "stream processor".
The producer and consumer APIs are self-explanatory: they allow applications to act as producers and consumers. There is one more API, the admin API, which gives management applications control over the stream processing cluster.
Message Broker Tool
If none of this sounds different from what a message broker offers, Kafka might not fit your system architecture, and a message broker might be the correct tool for you. Where Kafka stands out is in providing extremely low latency (FAST) transfer of data (messages) between disconnected, abstracted, distant parts of a system.
What Is a Kafka Topic?
Kafka breaks the data out into topics. Topics in Kafka allow multiple subscribers to connect; a topic can have zero, one, or many subscribers.
Topics are divided into partitions. A Kafka partition is an ordered (in the order records are published), immutable (unchangeable) sequence of records that is continually appended to. Each record is assigned a sequential ID, called an offset, as it is added to the partition.
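The append-only, offset-numbered structure of a partition can be sketched in a few lines of Python. This is an illustration of the concept, not Kafka's actual storage format:

```python
class Partition:
    """Toy model of a Kafka partition: an append-only sequence of records."""

    def __init__(self):
        self._log = []  # records in publish order; never mutated in place

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._log.append(record)
        return len(self._log) - 1  # offsets start at 0 and only grow

    def read(self, offset):
        """Records are addressed by offset; reading does not remove them."""
        return self._log[offset]

p = Partition()
assert p.append("first") == 0
assert p.append("second") == 1
assert p.read(0) == "first"  # earlier records remain available after reads
```

Note that reading a record does not delete it, which is what lets Kafka retain records for every consumer, as described next.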
Kafka persists all records, whether they have been consumed or not. Records are persisted for the retention period defined in the broker configuration; the default is 7 days (168 hours).
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
Topic partitions are replicated across a number of brokers in the cluster for fault tolerance.
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
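Replication can be illustrated with a toy cluster: each partition is copied to N brokers, so up to N-1 of them can fail without losing committed records. This is a conceptual sketch (the broker names are made up), not Kafka's actual replication protocol:

```python
def replicate(records, brokers, replication_factor):
    """Copy a partition's records to `replication_factor` brokers."""
    return {b: list(records) for b in brokers[:replication_factor]}

records = ["m0", "m1", "m2"]
replicas = replicate(records, ["broker-1", "broker-2", "broker-3"],
                     replication_factor=3)

# Simulate N-1 = 2 broker failures; a surviving replica still has every record.
for failed in ("broker-1", "broker-2"):
    del replicas[failed]
surviving = next(iter(replicas.values()))
assert surviving == records
```

In a real cluster one replica is the leader and the rest follow it, but the fault-tolerance arithmetic is the same: N copies tolerate N-1 failures.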
Producers publish the data to the topics. A producer can choose which partition to place each record in. There are multiple methods of doing this, but "round-robin balancing" is the simplest.
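Round-robin balancing simply cycles through the partitions; a common alternative is hashing the record key so that records with the same key always land in the same partition. A sketch of both strategies (illustrative only; this is not any Kafka client's actual hashing algorithm):

```python
import itertools

NUM_PARTITIONS = 3
_round_robin = itertools.cycle(range(NUM_PARTITIONS))

def choose_partition(key=None):
    """Keyed records go to a stable partition; keyless records rotate."""
    if key is not None:
        return hash(key) % NUM_PARTITIONS  # same key -> same partition
    return next(_round_robin)

assert choose_partition(b"user-42") == choose_partition(b"user-42")
print([choose_partition() for _ in range(4)])  # -> [0, 1, 2, 0]
```

Key-based placement matters because ordering in Kafka is only guaranteed within a partition, so keeping one key's records together preserves their order.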
Consuming Kafka Data
Consumers consume data from the topics. The broker keeps almost no per-consumer state. This is important, because it allows for stateless consumption, faster throughput, fewer errors, and more. Consumers track the offset of their consumption (which record they are currently consuming) and nothing more. Consumers are cheap: they do not use much memory (from the broker's perspective), and they can come and go with relative ease.
Consumers also have a lot of freedom with Kafka: freedom to process messages as they like. They can reprocess older messages by rewinding their offset, or skip ahead to start processing the newest messages.
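Because a consumer's position is just an offset, replaying or skipping records amounts to moving that number. A toy sketch over the list-based log model (real clients expose the same idea through a seek operation, e.g. `KafkaConsumer.seek` in the kafka-python library):

```python
class Consumer:
    """Toy consumer: holds nothing but its position (offset) in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0  # the only state the consumer needs

    def poll(self):
        """Return the record at the current offset, then advance."""
        record = self.log[self.offset]
        self.offset += 1
        return record

    def seek(self, offset):
        """Rewind to reprocess old records, or jump ahead to newer ones."""
        self.offset = offset

log = ["m0", "m1", "m2", "m3"]
c = Consumer(log)
assert c.poll() == "m0"
c.seek(3)                 # skip ahead to the newest record
assert c.poll() == "m3"
c.seek(1)                 # rewind and reprocess from an earlier point
```

Because records persist regardless of consumption, two consumers at different offsets read the same log without interfering with each other.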
With MirrorMaker, you can replicate your cluster's data to other clusters across the globe.
Why Use Apache Kafka?
Kafka offers some high-level guarantees; these are from the Kafka documentation at the Apache Software Foundation:
- Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
- A consumer instance sees records in the order they are stored in the log.
- For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log.
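The first two guarantees can be checked against the toy partition model: records from one producer keep their send order, and a consumer reads them back in that stored order (illustrative only):

```python
log = []  # stands in for one topic partition

def produce(record):
    """Append a record and return the offset it was assigned."""
    log.append(record)
    return len(log) - 1

off_m1 = produce("M1")  # sent first
off_m2 = produce("M2")  # sent second, by the same producer
assert off_m1 < off_m2                          # M1 gets the lower offset
assert log[off_m1:off_m2 + 1] == ["M1", "M2"]   # read back in stored order
```

The third guarantee is the replication-factor arithmetic sketched earlier: N copies of a partition tolerate N-1 broker failures.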
Kafka is a stream processing platform that can be distributed across clusters, near or far, providing reliable, FAST, and durable message processing for the entire enterprise stack. And if you do not want to use it as a stream processor, it can serve as an excellent message broker.
Hopefully this blog has given you a better understanding of how Kafka works and different applications in enterprise settings. For more, including comparisons of Kafka alternatives and configuration strategies, check out The Decision Maker's Guide to Apache Kafka.
Get Help With Apache Kafka
Implementing Kafka requires skill and successfully maintaining Kafka deployments requires patience and expertise. OpenLogic's enterprise architects can help your team get the most out of Kafka and provide 24/7/365 technical support, backed by SLAs.
- Case Study - Credit Card Processing Company Avoids Kafka Exploit
- Webinar - How to Use Kafka Data Lakes
- Blog - Kafka vs. RabbitMQ
- Blog - Using Apache Kafka for Stream Processing
- Blog - Using Kafka with ZooKeeper
- Blog - Exploring Kafka Connect
- White Paper - The New Stack: Cassandra, Kafka, and Spark
- Blog - 5 Apache Kafka Security Best Practices