Architectural Breakdown: Transitioning Monolith to Kubernetes
Today, we’ll take a look at a real example of how we can use technologies like Camel, Kafka, Spark, Cassandra, and Kubernetes to build an application ecosystem where the remnants of the monoliths we tear down are first-class citizens that continue to serve the business cost-effectively while uplifting developers into DevOps enlightenment. Whoa, déjà vu! It’s not a glitch in the Matrix; it’s part two of my blog series on a Cloud Native, DevOps state of mind. Part one is available here.
Before we can tear down the monolith, we must first understand the monolith. I recently completed the migration of a Colorado-based company’s inventory management and chain-of-custody software. I wrote the first architecture and the proof of concept for this software, so when I sat down with the team to rearchitect it for Kubernetes (OpenShift Origin, specifically), I was very fortunate: having been involved since the inception, I didn’t have to go through the excruciating detail of documenting and understanding the application at the level this kind of project really requires. Not that that isn’t fun! It just meant I could jack straight into the most fun part: writing the new hotness.
Figure 1: Old and Busted (The Monolith Architecture)
This complex architecture represents line-of-business (LOB) software developed for a highly regulated industry that needed to meet pharmaceutical-grade chain-of-custody requirements in the United States.
The LOB application was developed with a combination of Camel and SwitchYard as a message-driven app that integrated with KIE (Drools and jBPM) for human task orchestration. Zabbix monitored the system. Kibana reported on the eventually consistent data journaled to Elasticsearch, while the ACID-transacted “live” state of the system was persisted to PostgreSQL. WildFly, the upstream open source project behind JBoss EAP, hosted all of the Java applications. In fact, the community editions of all of these products were used.
All of these services were highly available, and most ran Active/Active. Camel and SwitchYard were stateless, KIE jBPM used ZooKeeper to replicate its Git filesystem, and Infinispan clustered the session information. PostgreSQL was managed by Pgpool-II, and Elasticsearch relied on its native clustering.
However, because the LOBAPP deployment in WildFly was co-located with the KIE jBPM Workbench deployment, high availability was achieved in a hot Active/Passive manner: uptime was guaranteed, but none of the rewards of running an n-tier, or even load-balanced, system were reaped. To avoid bottlenecks, customers were given two VMs per deployment. This further inflated resources and costs for the LOBAPP, which was sold as SaaS, and it hurt the company developing the software, as stakeholders questioned whether the model could scale.
Nginx provided path-based load balancing internal to the services on the VM, and Puppet and Foreman deployed this incredibly complex VM. The system ran well, didn’t require much intervention, and was clean and easy to troubleshoot. The biggest problems, however, were resource utilization and time-to-provision for new customers. Salespeople had to wait up to an hour for a new instance to finish provisioning: a miracle in 2005, but a nightmare in 2015, when teams would prep a day in advance of a meeting just to make sure the environment was ready for a demo, eating up even more resources on the private cloud. This SaaS offering was eating too much of its margin on the expensive per-customer deployment. Migrating to a multi-tenant microservice mesh would greatly improve the system’s scalability, and thanks to the application’s message-driven nature, it wouldn’t require a major rewrite.
Let’s walk step by step through the architectural decisions that were made when migrating the application from a VM monolith to a Kubernetes multi-tenant microservice.
Architectural maze running
Early on in the architecture process, we decided that because of the message-driven nature of the application, the microservices architecture should look exactly like one of the VMs, but exploded:
The WAR deployment methodology should be similarly exploded: the services broken apart by domain and interconnected through a message broker like ActiveMQ, with calls to HTTPS endpoints handled by a service mesh like Istio.
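As a toy sketch of that explosion (not code from the LOBAPP itself), the snippet below fakes the broker with an in-process `BlockingQueue` standing in for ActiveMQ. The producer, which used to be a direct method call inside the monolith, and the extracted domain service now share nothing but the message contract; all class and message names here are mine:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of the "exploded" WAR: producer and domain consumer share only
// a message contract; the BlockingQueue stands in for a broker like ActiveMQ.
public class BrokerDemo {
    record Message(String domain, String body) {}

    // The extracted domain service's handler, now reachable only via messages.
    static String handle(Message m) {
        return m.domain() + " handled: " + m.body();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Message> broker = new ArrayBlockingQueue<>(16);

        // Producer side: what used to be a direct in-process method call.
        broker.put(new Message("inventory", "CreateLot LOT-7"));

        // Consumer side: the standalone inventory microservice.
        System.out.println(handle(broker.take())); // inventory handled: CreateLot LOT-7
    }
}
```

In production the queue would of course be a real broker and the consumer a separate deployment, but the design point survives the simplification: once the contract is a message rather than a method signature, the two sides can scale and deploy independently.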
How could we say that with confidence, though? Architecturally, we understood two things as absolutes: all external “client” requests were made over HTTPS (the global DDoS prevention service and datacenter load balancer had exactly two rules: ports 80 and 443), and there was a service endpoint connected to JMS for each action on the LOB APIs. So the first recommendation for you, the brave cyber pioneer tearing down the monolith and achieving Cloud Native zen, is to be able to enumerate the services on your monolith both externally and internally. Externally is easier because you can ask the firewall team; internally is a bit more difficult. How does Module A in your service communicate with Module B? If it’s not using HTTPS, is it using a portable protocol that will be compatible with your service mesh? Can it be encapsulated in HTTPS, perhaps over WebSocket?
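To make the one-queue-per-action absolute concrete, here is a minimal sketch of the kind of naming convention that makes internal services enumerable: derive each JMS queue name mechanically from the LOB API action, so listing your internal endpoints reduces to listing your actions. The prefix `lobapp` and the action names are illustrative, not taken from the original system:

```java
import java.util.List;
import java.util.Locale;

// Sketch: derive internal JMS queue names from external API actions,
// so the internal service inventory can be enumerated mechanically.
public class QueueNaming {
    // Hypothetical convention: lobapp.<domain>.<action>, lower-cased.
    static String queueFor(String domain, String action) {
        return String.format("lobapp.%s.%s",
                domain.toLowerCase(Locale.ROOT),
                action.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Illustrative actions for an inventory / chain-of-custody domain.
        List<String> actions = List.of("CreateLot", "TransferCustody", "AuditTrail");
        for (String action : actions) {
            System.out.println(queueFor("inventory", action));
        }
    }
}
```

If your monolith lacks such a convention, building this inventory by hand, one module-to-module conversation at a time, is the unglamorous prerequisite for everything that follows.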
Because the application had already implemented JWT and SSO correctly (through Keycloak), multi-tenancy wasn’t an issue as far as authentication was concerned. All of the customers already came through url.tld/app/customerinstance to be routed to their VM, so a soft multi-tenancy model could continue to be supported in this way. The global VIP outside of the private datacenters could load balance traffic to a geographically appropriate DC or a hot-spare DC.
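Here’s a minimal sketch of what that path-based tenant routing looks like, assuming the first segment after /app/ names the tenant, as in the url.tld/app/customerinstance shape above. The class and method names are my own, not from the LOBAPP codebase:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pull the tenant ("customerinstance") out of the request path
// so a router or mesh rule can target the right tenant's backend.
public class TenantRouter {
    private static final Pattern TENANT_PATH =
            Pattern.compile("^/app/([a-z0-9-]+)(/.*)?$");

    static Optional<String> tenantOf(String path) {
        Matcher m = TENANT_PATH.matcher(path);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(tenantOf("/app/acme-co/orders/42")); // Optional[acme-co]
        System.out.println(tenantOf("/health"));                // Optional.empty
    }
}
```

In practice this logic would live in the ingress or the mesh’s routing rules rather than application code, but the point stands: because the tenant was already in the URL, the soft multi-tenancy model carried over without touching authentication.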
That’s where this exercise can get exciting, because in the case of this application, the monolith deconstructs pretty gracefully. But what if the “WildFly / KIE jBPM / Drools” aspect wasn’t something that could be containerized or even virtualized? What if it was an AS/400 mainframe? This is where the abstraction layers in Camel can get exciting for your team: extract as much as possible away from the monolith, so that you can n-tier and elasticize the workload. If you can’t figure out how to do that, or want a second set of eyes, we’d love to talk.
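One way to picture that abstraction layer, whether it’s built from Camel routes or plain Java, is an interface that hides whether the backend is the elastically scalable new service or the untouchable legacy system. The names below (`InventoryBackend`, `LegacyBackend`, and so on) are hypothetical, a sketch of the idea rather than anything from the real migration:

```java
// Sketch: hide the legacy system behind an interface so the rest of the
// workload can be n-tiered and elasticized independently of it.
interface InventoryBackend {
    String lookupLot(String lotId);
}

// Stand-in for the piece you can't containerize (e.g., an AS/400 bridge).
class LegacyBackend implements InventoryBackend {
    public String lookupLot(String lotId) {
        return "legacy:" + lotId; // would call the mainframe here
    }
}

// The modern, containerized replacement.
class CloudBackend implements InventoryBackend {
    public String lookupLot(String lotId) {
        return "cloud:" + lotId; // would call the microservice here
    }
}

public class BackendDemo {
    // Selection by configuration lets you migrate workload by workload.
    static InventoryBackend pick(boolean migrated) {
        return migrated ? new CloudBackend() : new LegacyBackend();
    }

    public static void main(String[] args) {
        System.out.println(pick(false).lookupLot("LOT-7")); // legacy:LOT-7
        System.out.println(pick(true).lookupLot("LOT-7"));  // cloud:LOT-7
    }
}
```

Camel’s endpoint URIs play exactly this role at the integration layer: the route stays the same while the endpoint behind it changes, which is what lets you peel workload off the monolith a piece at a time.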
Next up in the blog series, we’ll modernize the architecture that I developed previously and deploy it via Ansible. Spark, Kafka, and Cassandra will replace Elasticsearch, PostgreSQL, and ActiveMQ, and we’ll throw in a touch of Grafana and Prometheus as well!
OpenLogic architects are available to assist you with this and other popular open source solutions!