July 11, 2025
Apache Kafka is a powerful event streaming platform that can be found at the heart of some of the largest data pipelines around the world. Responsible for the heavy lifting from data sources to data sinks, Apache Kafka is capable of processing millions of records or messages per second while still maintaining sub-second end-to-end latency. However, this is only possible if we keep our Apache Kafka clusters, along with their producers and consumers, secured.
In this post we will discuss some standard Apache Kafka security best practices to help us do exactly that, including recommendations for authentication, encryption, updates, access control lists, and more.
Getting Started with Apache Kafka
Some of the information discussed in this blog builds on prerequisite concepts that it will be helpful for the reader to be familiar with. Understanding basic Kafka concepts such as topics, partitions, and consumer groups will be very handy. I recommend visiting our Enterprise Kafka Resources hub, downloading The Decision Maker's Guide to Apache Kafka, or watching the video below, which covers some basic tips for configuring and testing your deployments.
8 Kafka Security Best Practices
“Out of the box” default configurations are great for prototyping and proof-of-concept designs, but to really unleash the performance and reliability of Kafka, it must be properly secured.
We all know the easiest way to stand up any enterprise resource is to just run with its default configuration, perhaps bypassing encryption like TLS or skipping hardening processes like SELinux. But if we can't guarantee the integrity of our services, then we cannot provide the reliability of services and accuracy of data required for enterprise operations.
With that in mind, here are a few elemental Apache Kafka security best practices your organization should be applying in your Kafka environments.
1. Authenticate Everything
An often ignored security practice we find when doing Kafka environment analysis for customers is client authentication. Many organizations fail to authenticate their Kafka producers and consumers, which may be understandable in some contexts, but with strong support for multiple SASL offerings — including SASL/GSSAPI (Kerberos), SASL/OAUTHBEARER, SASL/SCRAM-SHA-256, SASL/SCRAM-SHA-512, and SASL/PLAIN — securing your cluster to only talk with authenticated clients is fairly straightforward, and definitely worth your time.
It's unfortunately common that organizations will spend a lot of time securing external attack vectors while ignoring their internal attack surfaces. This creates a hardened outer shell, but leaves the inside “gooey” and easily compromised by internal threats. Adding client authentication is a major step in hardening that “gooey” center.
In addition to authenticating clients, we should be authenticating broker communications to ZooKeeper as well. Support for mutual TLS (mTLS) was introduced with ZooKeeper 3.5.6, which shipped with Kafka 2.4, and authentication support was expanded to include SASL mechanisms starting with Kafka 2.5. In KRaft mode, where ZooKeeper is out of the picture, the equivalent step is enabling TLS for inter-broker and broker-to-controller communication. With Strimzi, TLS for inter-broker communication is enabled and configured out of the box. This is not the case with vanilla Kafka: security.inter.broker.protocol must be set to SSL (or SASL_SSL), and the TLS keystore and truststore must be configured manually in the server.properties file.
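As a concrete (if simplified) illustration, here is roughly what a SASL-authenticated Java producer configuration might look like. The broker addresses, credentials, and truststore path below are placeholders, and your chosen SASL mechanism and listener setup may differ:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AuthenticatedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker addresses -- replace with your own SASL_SSL listeners
        props.put("bootstrap.servers", "broker1.example.com:9093,broker2.example.com:9093");
        // Encrypt the connection and authenticate with SASL/SCRAM
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        // Hypothetical credentials -- in practice, pull these from a secrets manager
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"app-producer\" password=\"changeit\";");
        // Trust the CA that signed the brokers' certificates (placeholder path)
        props.put("ssl.truststore.location", "/etc/kafka/secrets/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("example-topic", "key", "hello, secured cluster"));
        }
    }
}
```

However you wire it up, avoid hard-coding credentials like this in real code; injecting them from a vault or secrets manager at runtime keeps them out of source control.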
2. Encrypt Everything
The availability of free and easy-to-implement cryptography is ubiquitous in the modern enterprise. With freely available PKI solutions that either self-sign or utilize free services like Let's Encrypt, there is very little reason to have unencrypted traffic crossing your network infrastructure. While disabled by default, encryption on your Kafka cluster is fairly easy to enable and helps ensure the integrity of your cluster.
Keep in mind, there are performance considerations in regard to CPU and JVM implementations when enabling cluster encryption, but the benefits of enabling encryption will almost always outweigh the performance considerations.
Also keep in mind that some older clients do not support encryption; TLS requires version 0.9.0 or higher of the consumer and producer APIs (which ties into the next security best practice).
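To sketch what the client side of an encrypted connection can look like, the following Java consumer assumes a TLS listener on port 9093 and uses placeholder truststore and keystore paths; the keystore lines are only needed if your brokers require mutual TLS:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TlsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap server pointing at a TLS listener
        props.put("bootstrap.servers", "broker1.example.com:9093");
        props.put("group.id", "tls-example-group");
        // Encrypt all traffic between this consumer and the brokers
        props.put("security.protocol", "SSL");
        // Trust store holding the CA that signed the broker certificates (placeholder paths)
        props.put("ssl.truststore.location", "/etc/kafka/secrets/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");
        // Optional key store for mutual TLS, if the brokers require client certificates
        props.put("ssl.keystore.location", "/etc/kafka/secrets/client.keystore.jks");
        props.put("ssl.keystore.password", "changeit");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("%s -> %s%n", record.key(), record.value());
            }
        }
    }
}
```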
3. Update Regularly
While I would consider this a “performance tuning” best practice as well (and it is definitely applicable to more than just Kafka), keeping your software updated with the most recent bug and security fixes is a must.
We are all too familiar with looking out over our enterprise services and feeling that sinking feeling in the pit of our stomachs when anyone even mentions the word “upgrade,” but it's paramount that updates get done in a timely manner. To make that sinking “pit of doom” feeling a little less pronounced, have an upgrade plan. You should have both a long-term and short-term upgrade plan within your organization.
What versions will you be running in 3-4 months?
How about in 6-12 months?
18-24 months?
These plans should not only include your infrastructure and DevOps folks, but your development teams as well. The responsibility for maintaining Kafka infrastructure like brokers, ZooKeeper, etc., and the responsibility for maintaining consumer and producer code will probably fall across multiple teams or groups. The people who are responsible for upgrading your cluster very likely won't be responsible for maintaining your producer or consumer code. Coordinating your upgrade plans between these two groups is crucial as there can be breaking changes in Kafka versions that require changes to the producer or consumer code.
One challenge with keeping up-to-date with Kafka versions is its release cadence. With three yearly planned releases and a short window of community support (12-16 months) for each, staying on top of your Kafka upgrades can be difficult, particularly for large enterprises that must maintain hundreds or thousands of clusters. At that scale, most enterprises can't turn the ship, so to speak, that quickly. That's where a solution like OpenLogic's Kafka LTS can be helpful for patching sunsetted versions, giving teams additional time to plan and implement their upgrades on a schedule that works for them.
Getting these changes scheduled into your developers' sprints ahead of time will take out a lot of the heartburn of upgrading your Kafka infrastructure.
4. Audit All the Things
A major pillar of any security posture is auditing. To do any auditing, we must have logs to audit. Unfortunately, logging infrastructure is an all-too-often overlooked step in the rollout of a great number of projects. Many times, it's only after something goes wrong and organizations need to be able to view and correlate logs across an array of disparate and distributed systems that the necessity of centralized logging and metrics collection becomes clear.
With Kafka being an inherently distributed system, a robust approach to logging and metrics is even more critical to long-term success. Throw the Strimzi operator into the mix, and the ephemeral nature of Kubernetes means there is a risk of losing logs altogether without centralized logging. Luckily, cloud-native solutions like Prometheus and Loki make the process a breeze. Solutions like Logstash and OpenSearch also present a low barrier to entry for a centralized logging solution. In many hosted Kafka on Kubernetes environments, like Google Cloud's GKE, simply configuring the platform's built-in log collection can be enough to create a searchable, centralized collection of logs.
Regardless of the tools, once a centralized and searchable logging infrastructure is in place, organizations can easily audit things like ACL changes, authentication events, topic creation/deletion and consumer group activity.
5. Enable and Configure Access Control Lists
Now that we have authentication and encryption enabled, are running the latest versions, and have a solid strategy for auditing, we want to make sure we are controlling “who” (consumers and producers) is talking to “what” (topics, broker configurations, etc.).
To do this, organizations should be enabling and configuring access control lists (ACLs). ACLs control a number of client operations, such as creating, deleting, or altering the configurations of topics, reading or writing events to a topic, and even managing the creation and deletion of ACLs for a topic. This step is a must in almost all multi-tenant environments, but even single-tenant environments will benefit from implementing ACLs.
To store ACLs, Kafka utilizes a pluggable Authorizer and ships with an out-of-the-box Authorizer that leverages ZooKeeper, or the __cluster_metadata topic when running in KRaft mode. You can change the Authorizer in server.properties. Kafka ACLs are defined with a general format of “Principal ‘A’ is allowed/denied Operation ‘B’ from host ‘C’ on any resource ‘D’ matching resource pattern ‘E’.” By default, if no resource pattern matches a given resource, then that resource is considered to not have any ACLs and can only be accessed by super users. To add, remove, or list ACLs, Kafka provides an authorizer CLI (kafka-acls.sh).
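For illustration, here is a minimal sketch of creating an ACL programmatically with the Java Admin client; the topic name, principal, and broker address are hypothetical, and the same result can be achieved with the kafka-acls.sh CLI:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class CreateTopicAcl {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address; in a secured cluster this would also carry TLS/SASL settings
        props.put("bootstrap.servers", "broker1.example.com:9093");

        try (Admin admin = Admin.create(props)) {
            // Allow the (hypothetical) principal User:orders-consumer to read the "orders" topic
            ResourcePattern topic =
                new ResourcePattern(ResourceType.TOPIC, "orders", PatternType.LITERAL);
            AccessControlEntry allowRead = new AccessControlEntry(
                "User:orders-consumer", "*", AclOperation.READ, AclPermissionType.ALLOW);

            admin.createAcls(Collections.singletonList(new AclBinding(topic, allowRead)))
                 .all()
                 .get(); // block until the brokers have applied the ACL
        }
    }
}
```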
6. Do the Hard Things
It’s all too easy to take shortcuts when it comes to security, but putting in the extra time and effort up front can save you time and heartache down the road.
SELinux is commonly thrown into this category. It is so tempting to just put SELinux into permissive mode (or disable it altogether), but this hamstrings your operating system's strongest hardening tool and defense. Because SELinux can help protect against threats we are not even aware of, taking the time and effort to properly configure custom SELinux policies for production workloads is well worth it. This should apply to any production workload, not just Kafka.
While this can be a frustrating process, tools are available to help in crafting the proper policies to allow workloads access to the appropriate server resources. Tools like audit2allow, sealert, and the auditd logs can go a long way toward eliminating that frustration. It might not be nearly as easy as just running “setenforce 0”, but I know I sleep a lot better at night when “getenforce” returns “Enforcing”.
7. Secure External Systems
In our increasingly interconnected world, systems are only as secure as the systems they connect to and the data/code pulled from external sources. The rise of supply chain attacks in recent years has only made this more readily apparent. The same thought and planning for security must be put into any external systems accessing your Kafka clusters: your schema registry and Kafka Connect connectors should enforce the same level of TLS encryption, authentication, authorization, and auditing.
8. Automate Security Scanning
Speaking of supply chain attacks, with plugin-based frameworks like Connect or Camel, automating security scanning of third-party code bases becomes increasingly important. Integrating security analysis tools into your CI/CD pipelines is an excellent way to protect your organization from these supply chain attacks. In addition to third-party products, open source tools like Open Policy Agent can use automated scanning to detect security policy violations in Kafka clusters as well. It can detect things like brokers not being configured for TLS, overly permissive ACLs on topics, and disabled SELinux environments, effectively catching security misconfigurations before they roll out to your enterprise.
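As a rough sketch of what that automation might look like, the snippet below POSTs a hypothetical broker configuration snapshot to a locally running OPA server's data API and prints whatever violations the policy reports. The endpoint, policy path, and input fields are assumptions you would adapt to your own Rego policies and CI pipeline:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OpaConfigCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical broker configuration snapshot to evaluate; in a real pipeline this
        // would be generated from your actual server.properties or cluster state.
        String input = "{\"input\": {\"broker\": {"
            + "\"security.inter.broker.protocol\": \"PLAINTEXT\","
            + "\"allow.everyone.if.no.acl.found\": \"true\"}}}";

        // Hypothetical OPA endpoint and policy path -- adjust to wherever your policies live
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8181/v1/data/kafka/security/violations"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(input))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        // The "result" field of the response contains any violations the policy reports;
        // a CI job could fail the build whenever it is non-empty.
        System.out.println(response.body());
    }
}
```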
Final Thoughts
We all know it's so much easier to not implement encryption or authentication, and the fact that Kafka ships out the door with these features disabled makes skipping them even more tempting. Running a simple one-line command to eliminate a problem caused by SELinux is so much simpler than using tools like audit2allow to craft a custom policy.
Doing the hard things, however, is how we do our job correctly. Are there use cases and contexts where we can skimp on the security details? Perhaps, but when it comes to enterprise-level workloads, it's absolutely critical to take the time to do things correctly. Considering how costly a data breach can be to both a company's reputation and bottom line, it's always worth it to do the hard things.
If your organization is lacking in high-level Kafka expertise, consider partnering with OpenLogic. Our Enterprise Architects have been providing technical support and professional services for Kafka since 2017. Additionally, we now offer long-term support for some EOL Kafka versions and a service bundle for teams that want more hands-on, continuous monitoring and assistance.
Additional Resources
- Blog - Apache Kafka vs. Confluent Kafka
- White Paper - Decision Maker's Guide to Apache Kafka
- Blog - Get Ready for Kafka 4: Changes and Upgrade Considerations
- Blog - Kafka Raft Mode: Running Kafka Without ZooKeeper
- Blog - How to Develop a Winning Kafka Partition Strategy
- Blog - Using Apache Kafka for Stream Processing
- Case Study - Credit Card Processing Company Avoids Kafka Exploit
- Training Course - Building, Maintaining, and Monitoring Applications With Apache Kafka
- Blog - Exploring Kafka Connect