How to Develop a Winning Kafka Partition Strategy
For teams using Apache Kafka as a messaging system, Kafka partitions play a key role. Though technically a distributed streaming platform, Apache Kafka has excellent capabilities as a message broker, and understanding how topic partitions function within the greater architecture is essential.
In this blog, we’ll discuss Kafka partitions in more depth, with a focus on how to develop and implement a partitioning strategy based on your use case and application needs.
- What Is a Kafka Partition?
- How Are Kafka Partitions Used?
- Common Kafka Partitioning Strategies
- How to Pick the Right Kafka Partition Strategy
- Final Thoughts
What Is a Kafka Partition?
In Apache Kafka, partitions are the primary unit of parallelism for topics. A topic, a named log dedicated to a category of events or messages, is split into one or more partitions spread across one or more Kafka brokers.
Before we jump into Kafka partitioning strategy, it’s helpful to have a high-level grasp of how Kafka structures data (messages). Messages in Kafka are organized into topics, which are divided into partitions. As a distributed system, Kafka runs in a cluster, and each server in the cluster is called a broker. Partitioning is what enables messages to be processed in parallel across several brokers in the cluster.
Using this method of parallelism, Kafka scales linearly to support multiple consumers and producers simultaneously. Partitions are also how Kafka handles redundancy: when partition replicas are introduced to the environment, each partition's data is copied to other brokers.
How Are Kafka Partitions Used?
Kafka partitions work by creating multiple logs from a single topic log and spreading them across one or more brokers, as shown in the images below. As previously mentioned, partitions are what makes Kafka scalable. The number of partitions can be defined at topic creation, or they can be added later using the kafka-topics.sh utility.
More partitions mean more clients can work with a given topic in parallel. Keep in mind, however, that while one consumer can read from many different partitions, each partition can be consumed by only one consumer in a given consumer group at a time. So the goal is to have parity between consumers and partitions: having more consumers than partitions does not produce any gain in throughput or performance, because the extra consumers simply sit idle.
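The parity point above can be sketched with a toy assignment, assuming a single consumer group reading a hypothetical four-partition topic. The simple round-robin assignment below stands in for Kafka's real assignor strategies, but it shows the same effect: with six consumers and four partitions, two consumers are idle.

```python
# Sketch: why more consumers than partitions yields no throughput gain.
# Topic name, partition count, and consumer names are illustrative.

def assign_partitions(partitions, consumers):
    """Assign each partition to exactly one consumer, round-robin."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [f"orders-{i}" for i in range(4)]
consumers = [f"consumer-{i}" for i in range(6)]

assignment = assign_partitions(partitions, consumers)
for consumer, owned in assignment.items():
    print(consumer, owned if owned else "idle")
# consumer-4 and consumer-5 end up with no partitions to read.
```

Adding partitions (rather than consumers) is what actually raises the parallelism ceiling.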
Single Broker With Multiple Partitions
Multiple Partitions, Multiple Brokers, With Replica Factor 2
In the last image, each primary partition is the partition leader, and each replica is a follower. Producers write messages to the partition leader, and consumers read from that same leader by default, while the replicas are used for redundancy. There is more that could be said about Kafka replication, redundancy, and in-sync replicas, but for the purposes of this article, just know that Kafka preserves data and achieves redundancy through partitions and their replicas.
Common Kafka Partitioning Strategies
There are two primary partitioning strategies that producers can utilize, and each has its own benefits and drawbacks. We will discuss both of these methods, as well as custom partitioning approaches.
Round Robin Partitioning
This is the partitioning method used when no message key is supplied. (Note that in Kafka 2.4 and later, the default partitioner handles keyless messages with a "sticky" variant, introduced in KIP-480, that fills a batch on one partition before moving to the next; the effect over time is still an even spread.) This strategy is generally used when message order is not important and having the data evenly balanced across partitions and broker nodes is desirable. Stateless applications tend to work well with round robin partitioning, since they usually do not depend on message order. If message order is required by the application, round robin partitioning is not a good option.
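A minimal sketch of round-robin selection for keyless messages (topic and message names are made up; a real producer does this internally):

```python
from itertools import cycle

# Round-robin partition selection for keyless messages: each message
# goes to the next partition in turn, spreading load evenly.
num_partitions = 3
next_partition = cycle(range(num_partitions))

messages = ["evt-a", "evt-b", "evt-c", "evt-d", "evt-e", "evt-f"]
placement = {msg: next(next_partition) for msg in messages}
print(placement)
# {'evt-a': 0, 'evt-b': 1, 'evt-c': 2, 'evt-d': 0, 'evt-e': 1, 'evt-f': 2}
```

Note that consecutive messages land on different partitions, which is exactly why relative order across the message stream is lost.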
Message Key Partitioning
Message key partitioning is the behavior of the default partitioner whenever a message key is provided. The key is run through a hashing function (murmur2), and all messages with the same key are placed onto the same partition, preserving per-key message order. This method is used when message order is important or when messages should be grouped together. Multi-tenant environments are a common use case for this type of partitioning, with messages grouped together based on a customer key. One thing to note is that this strategy can lead to partitions being unevenly distributed: more active message keys will produce larger partitions than less active message keys.
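The grouping behavior can be illustrated with a small stand-in hash. Kafka's default partitioner actually uses murmur2; FNV-1a is substituted below only to keep the sketch self-contained, and the key names and partition count are hypothetical.

```python
# Sketch of key-based partitioning: hash the key, mod by partition count.
# FNV-1a stands in for Kafka's murmur2 so the example runs anywhere.

def fnv1a(data: bytes) -> int:
    h = 0x811C9DC5
    for b in data:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h

def partition_for(key: str, num_partitions: int) -> int:
    return fnv1a(key.encode()) % num_partitions

num_partitions = 6
for key in ["customer-42", "customer-42", "customer-7"]:
    print(key, "->", partition_for(key, num_partitions))
# Both "customer-42" messages land on the same partition,
# preserving their relative order.
```

This also makes the skew problem visible: every message for a hot key goes to the same single partition, no matter how busy that key gets.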
Custom partitioning is also possible, as a Kafka producer can be written to follow any number of rules-based methodologies for assigning messages to a particular partition. For instance, a partition can be specified in the record itself, or by implementing the Partitioner interface provided in the org.apache.kafka.clients.producer package. Most enterprise use cases will not require custom partitioning, but if you do go down this route, note that some changes have occurred between 2.4 and more recent versions of Kafka, such as 3.3. See KIP-794 for further information.
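As a sketch of the rules-based idea (in Python rather than the Java Partitioner interface; the "vip-" routing rule and the four-partition layout are hypothetical):

```python
# Sketch of a custom, rules-based partitioner: reserve partition 0 for
# keys with a hypothetical "vip-" prefix, hash everyone else over the
# remaining partitions. A real deployment would plug equivalent logic
# into the client's partitioner hook.

def tenant_partitioner(key_bytes, all_partitions, available_partitions):
    """Route VIP tenants to partition 0; spread other keys over the rest."""
    if key_bytes is not None and key_bytes.startswith(b"vip-"):
        return all_partitions[0]
    rest = all_partitions[1:] or all_partitions
    # sum() of the key bytes is a deliberately crude hash for the sketch
    return rest[sum(key_bytes) % len(rest)]

partitions = [0, 1, 2, 3]
print(tenant_partitioner(b"vip-acme", partitions, partitions))   # 0
print(tenant_partitioner(b"tenant-9", partitions, partitions))   # some partition in 1..3
```

Some client libraries expose this as a pluggable hook (for example, kafka-python's KafkaProducer accepts a partitioner callable with roughly this signature), while in Java you implement the Partitioner interface and set it via the partitioner.class producer config.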
How to Pick the Right Kafka Partition Strategy
Selecting the best partitioning method largely depends on the needs of the application in question. If message ordering is important, partitioning by message key is preferable. If the application is stateless and message order is not required, round robin partitioning is fine. In the majority of use cases, one of these two strategies will be the method of choice. In situations where a different strategy is needed, implementing a custom partitioning method is possible, but this will not be necessary for the vast majority of teams using Kafka as a message broker.
Topic partitions are at the heart of any Kafka strategy, environment design, and implementation. Choosing the partitioning method that fits your particular use case is the first step in ensuring long-term success with Kafka and benefitting from its remarkable durability and speed in applications.
Get Expert Support for Your Kafka Deployments
If you're working with Kafka at scale, when things go wrong, they go wrong at the same scale. With OpenLogic, you get SLA-backed, 24/7/365 technical support delivered by enterprise architects who have years of experience working with Kafka.
- Guide - Apache Kafka 101
- Blog - Using Apache Kafka for Stream Processing
- Blog - What Is Apache Kafka?
- Webinar - Harnessing Streaming Data with Apache Kafka
- Training - Apache Kafka Training Course
- White Paper - The New Stack: Cassandra, Kafka, and Spark
- Case Study - Credit Card Processing Company Avoids Kafka Exploit With Support From OpenLogic
- Blog - 5 Apache Kafka Security Best Practices