An Enterprise Apache Tomcat Clustering Guide - Part 2
Editor's note October 2015: We recently created a white paper, “Increasing system availability by leveraging Apache Tomcat clustering”, that covers all three parts of the series with a few updates to the content. Be sure to take a look!
This post is part 2 of a 3-part blog series that looks at leveraging Apache Tomcat clustering to increase your system’s availability.
What are your options?
In determining your configuration you must evaluate the resources at hand. This section describes the possible options in general terms, without taking any particular set of resources into consideration. The next section suggests which configurations your company may leverage depending on the resources available.
Vertical / Horizontal / Hybrid Clusters
A “Vertical” cluster expands vertically. A “Horizontal” cluster expands, you guessed it, horizontally. What does this mean? Expanding horizontally means adding machines; expanding vertically means adding Tomcat instances to a machine you already have. A vertical cluster therefore lives on a single machine. (A machine can be many things, including a physical device or a virtual host.) As need increases, more Tomcat instances are spawned on the same machine, using configuration tweaks (such as giving each instance its own ports) that allow multiple instances to run on the same system.
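One of those configuration tweaks is making sure no two instances on the machine listen on the same ports. The following is a minimal sketch of a port plan, not a prescribed method: it starts from the ports in a stock Tomcat server.xml (8005 shutdown, 8080 HTTP, 8009 AJP) and shifts each additional instance by a fixed step. The function name `port_plan` and the step of 100 are assumptions made for illustration.

```python
# Ports used by a stock Tomcat server.xml.
DEFAULT_PORTS = {"shutdown": 8005, "http": 8080, "ajp": 8009}


def port_plan(instance_count, step=100):
    """Return one port mapping per instance, shifted so that no ports collide.

    The step of 100 is an arbitrary illustrative choice, not a Tomcat setting.
    """
    plans = []
    for index in range(instance_count):
        offset = index * step
        plans.append({name: port + offset for name, port in DEFAULT_PORTS.items()})
    return plans


# Print a plan for three instances on one machine.
for number, ports in enumerate(port_plan(3), start=1):
    print(f"instance {number}: {ports}")
```

Each mapping would then be written into that instance's own server.xml by hand or by a deployment script.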
A horizontal cluster contains Tomcat instances running on separate machines. If demand for processing increases and you have a pure horizontal cluster configuration, the network technician (or you) installs a new machine, virtual or physical, and runs a new Tomcat instance on it. What is deployed on that Tomcat instance is discussed in the next section.
Real life is often very different from theory. Companies rarely have a pure “Horizontal” or “Vertical” cluster configuration. Most systems are hybrids. A hybrid cluster is a mixture of vertical and horizontal clustering to facilitate a specific need and/or to match the hardware provided.
Homogeneous / Heterogeneous
Is your setup going to be for many applications, or just a few, or maybe just one? Do you have applications that require specific hardware? The answers determine whether you use a heterogeneous or a homogeneous setup. While this section defines your options, the next section will help you decide which option suits your needs.
A homogeneous setup is very common. Companies will often duplicate their Tomcat environment, launching servers on many devices with a simple copy of the Tomcat directory. A Tomcat cluster that has the same web applications deployed on all nodes is considered homogeneous. A cluster whose nodes host different sets of web applications is considered heterogeneous.
Homogeneous setups can be hard to keep truly identical. Sometimes, especially after node failure and replacement, it can be hard to synchronize the Tomcat instances. The best way to do this is to create an image of the Tomcat setup from a node designated as the “primary” node. As long as this image stays up-to-date you can distribute it over as many Tomcat setups as you prefer.
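Before re-imaging a node, it can help to know which nodes have actually drifted from the primary. The sketch below is one possible check, under the assumption that you can already obtain the list of deployed applications for each node. The node names, application names, and the function `find_drift` are all hypothetical.

```python
def find_drift(primary_apps, node_apps):
    """Compare each node's deployed apps to the primary's list.

    Returns a dict containing only the nodes that differ from the primary.
    """
    expected = set(primary_apps)
    drift = {}
    for node, apps in node_apps.items():
        deployed = set(apps)
        missing = sorted(expected - deployed)
        extra = sorted(deployed - expected)
        if missing or extra:
            drift[node] = {"missing": missing, "extra": extra}
    return drift


# Hypothetical inventory: what the primary has, and what each node reports.
primary = ["shop.war", "admin.war"]
nodes = {
    "node2": ["shop.war", "admin.war"],
    "node3": ["shop.war", "old-report.war"],
}
print(find_drift(primary, nodes))
```

A node that shows up in the result is a candidate for redeployment from the primary image.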
Load Balancing
Load balancing happens outside of the Tomcat cluster. The broad topic of load balancing is beyond the scope of this document. We are concerned with the Apache Httpd server and its built-in load balancing / gateway features. Because it is free and readily available, it is a common solution in many enterprise systems.
To use Apache Httpd as a load balancer we will configure it as a gateway. Once it is aware of its “nodes”, it is able to balance traffic across them. In part 3 of this series we will show an example configuration, using mod_proxy_ajp, of an Httpd gateway with “Round Robin” load balancing.
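To make the “Round Robin” idea concrete, here is a minimal sketch of the policy itself: each new request goes to the next node in a fixed rotation. This illustrates the concept only and is not how Apache Httpd implements it. The class name and the node addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out nodes in a fixed rotating order."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._rotation = cycle(nodes)

    def next_node(self):
        """Return the node that should receive the next request."""
        return next(self._rotation)


# Hypothetical AJP endpoints for two Tomcat nodes.
balancer = RoundRobinBalancer(["tomcat1:8009", "tomcat2:8009"])
picks = [balancer.next_node() for _ in range(4)]
print(picks)
```

With two nodes, four consecutive requests alternate between them, so each node receives half of the traffic regardless of how busy it is.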
Another common enterprise option for load balancing is the hardware load balancer (HLB). An HLB performs the same tasks as a software balancer (like the one in the Apache Httpd server). The main difference between a software balancer and a hardware balancer, besides price, is resources. An HLB has dedicated hardware resources (RAM, processor, network adapters, etc.). This typically allows hardware balancers to handle more traffic while providing more features. It is also a far more expensive approach, given that many free Open Source load balancing solutions are available.
What options fit my organization and our available resources?
As you can see from this post, many factors go into determining your cluster configuration. When choosing how to configure your cluster you must examine the resources available.
How do you choose your scalability options? This relies heavily on the availability of hardware resources. For instance, suppose you have 4 low-end servers, each with 1 processor and 1 to 4 gigabytes of RAM. This would be an ideal situation for a “Horizontal” cluster. Each member of the cluster would be able to run 1 instance of Tomcat efficiently. One of the servers could be used as a balancer running the Apache Httpd server. Here is a drawing of the architecture.
If your situation were a bit different, and you had more powerful servers, you could consider a hybrid cluster. Servers with 2 or more processors and a large amount of RAM (8 gigabytes or more) are well suited to running multiple Tomcat instances. In this configuration you can set up a hybrid cluster by running multiple instances of Tomcat on multiple machines, and multiple instances of Apache Httpd to share the work of load balancing. This configuration could look something like this:
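Apart from the drawings, the rules of thumb in these two scenarios can be written down as a small sketch. The thresholds (2 or more processors and 8 gigabytes or more of RAM for multiple instances) come from the text above. Everything else is an assumption for illustration, including the names `Machine` and `suggest_layout` and the decision to treat any machine below those thresholds as a single-instance host.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    processors: int
    ram_gb: int


def can_host_multiple_instances(machine):
    """Thresholds from the text: 2+ processors and 8+ GB of RAM."""
    return machine.processors >= 2 and machine.ram_gb >= 8


def suggest_layout(machines):
    """Suggest a cluster style for the given machines (a rule of thumb only)."""
    if not machines:
        raise ValueError("at least one machine is required")
    any_multi = any(can_host_multiple_instances(m) for m in machines)
    if len(machines) == 1:
        # Assumption: a single small machine cannot form a cluster at all.
        return "vertical" if any_multi else "single instance"
    return "hybrid" if any_multi else "horizontal"


# Hypothetical inventory matching the first scenario: four low-end servers.
low_end = [Machine(f"srv{i}", processors=1, ram_gb=4) for i in range(1, 5)]
print(suggest_layout(low_end))
```

Real sizing decisions also depend on the applications themselves, so treat the result as a starting point rather than an answer.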
Heterogeneous / Homogeneous Tomcat Configuration
Determining heterogeneous vs. homogeneous setup can be simple in some situations. The easiest situation is one with a single web application. If there is only one application to deploy, you deploy it to all members in the cluster. This is a very straightforward homogeneous configuration.
Unfortunately, most companies do not have one single web application. However, having several is not overly complicated unless the company has an extremely large number of applications.
How you divide your applications over the Tomcat nodes is your choice. The Tomcat configuration of the nodes will be a little more complicated, and we will discuss it in part 3 of this series. If the company has an application that requires heavy processing and large amounts of RAM (call it HPR1), you can set up this application on two nodes by itself. After this, take the remaining applications (call them GUI) and place them on two different nodes in the cluster. This will prevent the GUI applications from being bogged down when HPR1 is consuming the CPU and RAM. This cluster might look like this:
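The same division can also be written as a simple placement map, which makes it easy to check that HPR1 really is isolated. This is only a sketch. The node names and helper functions are hypothetical, and “GUI” stands in for the remaining applications as it does in the text.

```python
# Hypothetical placement: HPR1 alone on two nodes, the GUI apps on two others.
PLACEMENT = {
    "node1": ["HPR1"],
    "node2": ["HPR1"],
    "node3": ["GUI"],
    "node4": ["GUI"],
}


def nodes_for(app, placement):
    """Return the nodes that host the given application."""
    return sorted(node for node, apps in placement.items() if app in apps)


def is_isolated(app, placement):
    """True if every node hosting the app hosts nothing else."""
    hosts = nodes_for(app, placement)
    return bool(hosts) and all(placement[node] == [app] for node in hosts)


def is_homogeneous(placement):
    """True if all nodes carry the same set of applications."""
    app_sets = {frozenset(apps) for apps in placement.values()}
    return len(app_sets) <= 1


print(nodes_for("HPR1", PLACEMENT))
print(is_isolated("HPR1", PLACEMENT))
print(is_homogeneous(PLACEMENT))
```

Because the nodes do not all carry the same applications, this layout is a heterogeneous cluster, and the balancer has to route each application's requests only to the nodes that host it.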
There are many things to take into consideration when designing and building your cluster. If a large company is relying on you to provide a reliable, highly available application implementation, clustering / load balancing is the right choice. Whether you are new to clustering or an old hand, consider purchasing a support contract. There are companies that provide Open Source Software support for a nominal fee. Companies like OpenLogic provide 24x7 support for your Tomcat and Apache Httpd configuration. This allows you to offer your customers an extremely reliable, available service while also giving you someone to turn to if you run into problems.