Kafka MirrorMaker Overview: Key Features and Benefits
Kafka MirrorMaker is a tool for mirroring data between Apache Kafka clusters. It leverages the Kafka Connect framework to replicate data, which improves resiliency.
In this blog, one of our Kafka experts explains how MirrorMaker works and its key features, as well as how to set up MirrorMaker 2 (MM2), and why teams should consider using MirrorMaker in their Kafka deployments.
- What Is Kafka MirrorMaker?
- How Kafka MirrorMaker Works
- MirrorMaker 2 Key Features and Capabilities
- How to Set Up MirrorMaker 2
- Final Thoughts
What Is Kafka MirrorMaker?
MirrorMaker is tool in Apache Kafka that simplifies data mirroring (or topics replication) from one Kafka cluster (the source) to another (the destination).
MirrorMaker 2 (MM2) became available in Kafka 2.4.0 (see KIP-382) and leverages the Kafka Connect framework to simplify its configuration and operations. The Connect framework and ecosystem essentially acts as a basement layer for MM2 operations; MM2 extends its existing functionality.
While the Connect framework was built to enable data streaming from Kafka to other data systems, in the case of MM2, the framework facilitates data mirroring in cross-Kafka operations.
How Kafka MirrorMaker Works
According to the official documentation, MirrorMaker "uses a Kafka consumer to consume messages from the source cluster, and then re-publishes those messages to the local (target) cluster using an embedded Kafka producer."
MirrorMaker 2 relies on a collection of standard Kafka Connect connectors. Each of the connectors has its own role. Here is a list of connectors and what they do:
1. MirrorSourceConnector: Replicates topics, topic ACLs & configs of a single source cluster and emits offset syncs.
2. MirrorSinkConnector: Consumes from the primary cluster and replicate topics to a single target cluster.
3. MirrorCheckpointConnector: Emits consumer offset checkpoints.
4. MirrorHeartBeatConnector: Checks connectivity between clusters.
Depending on the the chosen mode to run MirrorMaker, an administrator may or may not be required to explicitly configure the connectors.
- Dedicated MirrorMaker cluster: This mode is also called driver mode. This is the easiest production-ready mode to implement as it requires no previous knowledge of the Kafka Connect framework.
- MirrorMaker in a Connect cluster: This mode is suitable for situations when a previously deployed Connect cluster already exists in an organization.
- Standalone MirrorMaker connector: This mode is good for testing a single Connect worker.
MirrorMaker 2 Key Features and Capabilities
MirrorMaker 2 improves upon MirrorMaker 1 (MM1) in several ways. The built-in high-level driver in MirrorMaker 2 significantly simplifies initial configuration and connectors management in a dedicated cluster. The auto-discovery feature also allows new topics and partitions to be automatically detected. It takes a bit of time, but eventually the new topic gets replicated.
MirrorMaker 2 also automatically syncs topic configuration between clusters, including ACLs. This is a change from MM1 where topics merely picked the default cluster configuration settings.
MirrorMaker 2 supports various topologies and data flows including “active/active" or “active/passive" cluster pairs, cross-datacenter replication, aggregation, and more.
In MirrorMaker 2, new metrics were added, including end-to-end replication latency across multiple clusters (Connect Monitoring). Customer offsets now get synced in MM2 as well (KIP-545), which allows consumer migration between clusters. In MM1, the clusters were loosely coupled, making this “customer transfer” scenario not possible.
The naming of a topic's source vs. destination was also changed in MM2. Unlike in MM1, where topics had the same name, a prefix now gets assigned to the replicated topics in MM2. However, it is worth mentioning, should one wish to replicate MM1's naming behavior, this is still possible.
Finally, MM2 offers a significantly reduced rebalance count.
Download the Decision Maker's Guide to Apache Kafka
Our new white paper covers everything you need to know to successfully implement and optimize Kafka in your organization, including configuration strategies, security best practices, and more.
How to Set Up MirrorMaker 2
As previously mentioned, MirrorMaker 2 can be run in different modes, and the easiest to configure and proof-of-concept test is the “dedicated cluster” mode. This mode can be tested in a lab environment either using Docker and containers or virtual machines. I prefer the latter, so that's what I will be describing here. To get started:
1. Install Oracle VM VirtualBox and Vagrant on a test machine.
2. Using Vagrant, deploy 3 virtual machines:
- dcA (data center A) / ip 192.168.51.101 / Zookeeper and Kafka services.
- dcB (data center B) / ip 192.168.51.102 / Zookeeper and Kafka services.
- mm2 / ip 192.168.51.103 / MirrorMaker 2.
3. Browse to the folder containing the Vagrant script file and execute “vagrant run && vagrant reload.” This process will take some time.
4. Configure MirrorMaker 2.
- vagrant ssh mm2
- Run the following 6 commands.
sudo sed -i ‘26 s/A_host1:9092, A_host2:9092, A_host3:9092/192.168.51.101:9092/’ connect-mirror-maker.properties
sudo sed -i ‘27 s/B_host1:9092, B_host2:9092, B_host3:9092/192.168.51.102:9092/’ connect-mirror-maker.properties
sudo sed -i ‘35 s/true/false/’ connect-mirror-maker.properties
echo -e "192.168.51.101 dcA\n192.168.51.102 dcB" | sudo tee -a /etc/hosts
echo "replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy" | sudo tee -a connect-mirror-maker.properties
The final configuration will look like this:
5. Start MirrorMaker process.
6. Create a test topic “testtopic” in dcA, populate it with some data and let it replicate to dcB. Please take note the topic name will be the same dcA vs dcB since we’ve configured “IdentityReplicationPolicy” in the step above. Commands:
- vagrant ssh dcA
- kafka-topics.sh --bootstrap-server localhost:9092 --topic testtopic --create --partitions 1 --replication-factor 1
- kafka-verifiable-producer.sh --bootstrap-server localhost:9092 --throughput 5 --max-messages 10000 --topic testtopic
kafka-verifiable-producer will start generating 5 messages per second until 10000 messages have been reached.
7. Connect to dcB to monitor MirrorMaker2 has replicated the topic (takes some time) and is replicating the topic messages right now. Commands:
- vagrant ssh dcB
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testtopic
Now we can see the replication is happening.
There are several other replication tools in the market that cover similar or identical scenarios as MirrorMaker 2. MM2, however, comes bundled with the original Kafka installation package, is free, and is backed by Apache community support. The learning curve is neither long nor complicated, meaning for teams with Kafka deployments, using MirrorMaker will make the most sense.
Need Help With Your Kafka Deployments?
OpenLogic's team of Enterprise Architects have extensive experience supporting Kafka in enterprise applications. Our technical support includes unlimited tickets and up to 24/7/365 SLAs.
- Blog - Top 5 Kafka Security Best Practices
- Blog - Using Apache Kafka for Stream Processing
- Blog - Deploying Kafka on Kubernetes
- On-Demand Webinar - Harnessing Streaming Data with Apache Kafka
- Guide - Apache Kafka 101
- White Paper - The New Stack: Cassandra, Kafka, and Spark