Blog

April 27, 2023

Kafka MirrorMaker Overview: Key Features and Benefits

Apache Kafka

Kafka MirrorMaker is a tool for mirroring data between Apache Kafka clusters. It leverages the Kafka Connect framework to replicate data, which improves resiliency.

In this blog, one of our Kafka experts explains how MirrorMaker works and its key features, as well as how to set up MirrorMaker 2 (MM2), and why teams should consider using MirrorMaker in their Kafka deployments.

What Is Kafka MirrorMaker?
How Kafka MirrorMaker Works
MirrorMaker 2 Key Features and Capabilities
How to Set Up MirrorMaker 2
Final Thoughts

What Is Kafka MirrorMaker?

MirrorMaker is tool in Apache Kafka that simplifies data mirroring (or topics replication) from one Kafka cluster (the source) to another (the destination).

MirrorMaker 2 (MM2) became available in Kafka 2.4.0 (see KIP-382) and leverages the Kafka Connect framework to simplify its configuration and operations. The Connect framework and ecosystem essentially acts as a basement layer for MM2 operations; MM2 extends its existing functionality.

While the Connect framework was built to enable data streaming from Kafka to other data systems, in the case of MM2, the framework facilitates data mirroring in cross-Kafka operations.

How Kafka MirrorMaker Works

According to the official documentation, MirrorMaker "uses a Kafka consumer to consume messages from the source cluster, and then re-publishes those messages to the local (target) cluster using an embedded Kafka producer."

Connectors

MirrorMaker 2 relies on a collection of standard Kafka Connect connectors. Each of the connectors has its own role. Here is a list of connectors and what they do:

1. MirrorSourceConnector: Replicates topics, topic ACLs & configs of a single source cluster and emits offset syncs.

2. MirrorSinkConnector: Consumes from the primary cluster and replicate topics to a single target cluster.

3. MirrorCheckpointConnector: Emits consumer offset checkpoints.

4. MirrorHeartBeatConnector: Checks connectivity between clusters.

Depending on the chosen mode to run MirrorMaker, an administrator may or may not be required to explicitly configure the connectors.

Running Modes

Dedicated MirrorMaker cluster: This mode is also called driver mode. This is the easiest production-ready mode to implement as it requires no previous knowledge of the Kafka Connect framework.
MirrorMaker in a Connect cluster: This mode is suitable for situations when a previously deployed Connect cluster already exists in an organization.
Standalone MirrorMaker connector: This mode is good for testing a single Connect worker.

MirrorMaker 2 Key Features and Capabilities

MirrorMaker 2 improves upon MirrorMaker 1 (MM1) in several ways. The built-in high-level driver in MirrorMaker 2 significantly simplifies initial configuration and connectors management in a dedicated cluster. The auto-discovery feature also allows new topics and partitions to be automatically detected. It takes a bit of time, but eventually the new topic gets replicated.

MirrorMaker 2 also automatically syncs topic configuration between clusters, including ACLs. This is a change from MM1 where topics merely picked the default cluster configuration settings.

MirrorMaker 2 supports various topologies and data flows including “active/active" or “active/passive" cluster pairs, cross-datacenter replication, aggregation, and more.

In MirrorMaker 2, new metrics were added, including end-to-end replication latency across multiple clusters (Connect Monitoring). Customer offsets now get synced in MM2 as well (KIP-545), which allows consumer migration between clusters. In MM1, the clusters were loosely coupled, making this “customer transfer” scenario not possible.

The naming of a topic's source vs. destination was also changed in MM2. Unlike in MM1, where topics had the same name, a prefix now gets assigned to the replicated topics in MM2. However, it is worth mentioning, should one wish to replicate MM1's naming behavior, this is still possible.

Finally, MM2 offers a significantly reduced rebalance count.

The Decision Maker's Guide to Apache Kafka
Our new white paper covers everything you need to know to successfully implement and optimize Kafka in any environment. Get expert guidance on configuration strategies, security best practices, Kafka on Kubernetes, and more to leverage your streaming data insights.
Get My Copy

How to Set Up MirrorMaker 2

As previously mentioned, MirrorMaker 2 can be run in different modes, and the easiest to configure and proof-of-concept test is the “dedicated cluster” mode. This mode can be tested in a lab environment either using Docker and containers or virtual machines. I prefer the latter, so that's what I will be describing here. To get started:

1. Install Oracle VM VirtualBox and Vagrant on a test machine.

2. Using Vagrant, deploy 3 virtual machines:

dcA (data center A) / ip 192.168.51.101 / Zookeeper and Kafka services.
dcB (data center B) / ip 192.168.51.102 / Zookeeper and Kafka services.
mm2 / ip 192.168.51.103 / MirrorMaker 2.

3. Browse to the folder containing the Vagrant script file and execute “vagrant run && vagrant reload.” This process will take some time.

4. Configure MirrorMaker 2.

vagrant ssh mm2
Run the following 6 commands.

cd /opt/kafka_2.12-3.4.0/config/

sudo sed -i ‘26 s/A_host1:9092, A_host2:9092, A_host3:9092/192.168.51.101:9092/’ connect-mirror-maker.properties

sudo sed -i ‘27 s/B_host1:9092, B_host2:9092, B_host3:9092/192.168.51.102:9092/’ connect-mirror-maker.properties

sudo sed -i ‘35 s/true/false/’ connect-mirror-maker.properties

echo -e "192.168.51.101 dcA\n192.168.51.102 dcB" | sudo tee -a /etc/hosts

echo "replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy" | sudo tee -a connect-mirror-maker.properties

The final configuration will look like this:

5. Start MirrorMaker process.

6. Create a test topic “testtopic” in dcA, populate it with some data and let it replicate to dcB. Please take note the topic name will be the same dcA vs dcB since we’ve configured “IdentityReplicationPolicy” in the step above. Commands:

vagrant ssh dcA
kafka-topics.sh --bootstrap-server localhost:9092 --topic testtopic --create --partitions 1 --replication-factor 1

kafka-verifiable-producer.sh --bootstrap-server localhost:9092 --throughput 5 --max-messages 10000 --topic testtopic

kafka-verifiable-producer will start generating 5 messages per second until 10000 messages have been reached.

7. Connect to dcB to monitor MirrorMaker2 has replicated the topic (takes some time) and is replicating the topic messages right now. Commands:

vagrant ssh dcB
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testtopic

Now we can see the replication is happening.

Final Thoughts

There are several other replication tools in the market that cover similar or identical scenarios as MirrorMaker 2. MM2, however, comes bundled with the original Kafka installation package, is free, and is backed by Apache community support. The learning curve is neither long nor complicated, meaning for teams with Kafka deployments, using MirrorMaker will make the most sense.

Need Help With Your Kafka Deployments?
OpenLogic experts have extensive experience supporting Kafka in enterprise applications. If you're paying for commercial Kafka, we can help you migrate off and manage your open source implementation. We also offer standalone technical support and LTS for end-of-life Kafka.
Talk to Us

Additional Resources

Blog - Apache Kafka vs. Confluent Kafka
Video - Running Kafka on K8s With Strimzi
Blog - Top 5 Kafka Security Best Practices
Blog - Using Apache Kafka for Stream Processing
Blog - Deploying Kafka on Kubernetes
On-Demand Webinar - Harnessing Streaming Data with Apache Kafka
Guide - Apache Kafka 101
White Paper - The New Stack: Cassandra, Kafka, and Spark

Featured Product

Kafka Service Bundle

Services

Training

Taking an Open Source Approach to Big Data Management