Today, businesses require high availability (HA) and zero downtime for many IT services. Redundancy and failover mechanisms are essential parts of the HA infrastructure. CentOS 6 provides a reliable, flexible, and powerful clustering solution that lets you use multiple servers to avoid downtime.
CentOS's sophisticated HA suite provides all the packages you need for reliable and secure internode communication, including fencing of failed nodes. It can allocate complex resources and services across multiple nodes. If a service fails, it is either restarted or reallocated to a different node according to a predefined scenario, so end users don't see the failure.
CentOS's HA suite relies on a few key components:
CentOS's clustering components are designed to work together to ensure high availability. If you want to see how they work, here's how to set up a simple cluster with two nodes sharing an IP address. Once you know the basics, you can extend your setup with more nodes and more complicated options.
To begin, each of the nodes, which we'll call n1 and n2, should have a minimal CentOS install. In CentOS 6 you can use the small installation images CentOS-6.0-i386-minimal.iso and CentOS-6.0-x86_64-minimal.iso for 32- and 64-bit architectures. Each node must have two network interfaces (hardware or virtual); one, which is used for external and inter-node communication, is always up, while the second, used for sharing one IP address between the nodes, is set up only on the first node by default.
You must permit communication between nodes in iptables. The easiest way to do this is to add the directive -A INPUT -s IP -j ACCEPT to /etc/sysconfig/iptables before the last REJECT statements. Make sure to replace IP with the other node's IP address.
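The placement of the rule matters, since it has to come before the REJECT statements. A minimal sketch of that edit follows, assuming a rules file in the stock CentOS 6 layout; `add_peer_rule` is a hypothetical helper name, not part of any package, and it refuses to change the file if it finds no REJECT rule.

```shell
#!/bin/sh
# Hypothetical helper: insert an ACCEPT rule for the peer node
# immediately before the first REJECT rule in an iptables rules file.
add_peer_rule() {
    rules_file=$1
    peer_ip=$2
    tmp=$(mktemp) || return 1
    awk -v rule="-A INPUT -s $peer_ip -j ACCEPT" '
        !inserted && /-j REJECT/ { print rule; inserted = 1 }
        { print }
        END { exit (inserted ? 0 : 1) }
    ' "$rules_file" > "$tmp" && cat "$tmp" > "$rules_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example use on n1 (as root), with n2 assumed to be 10.1.1.2:
#   add_peer_rule /etc/sysconfig/iptables 10.1.1.2
#   service iptables restart
```

Run the same step on n2 with n1's address.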
Edit the file /etc/sysconfig/selinux, set SELinux from enforcing to permissive, and reboot. You don't want to enforce SELinux in the beginning because it would prevent you from getting the HA cluster operational. Once you have the cluster running properly, you can re-enable SELinux, as described in this article.
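If you prefer to script the SELinux change rather than edit the file by hand, something like the following sketch could work. It assumes the file contains a line reading exactly `SELINUX=enforcing`; `set_selinux_permissive` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: switch SELINUX=enforcing to SELINUX=permissive
# in the given config file, leaving every other line untouched.
set_selinux_permissive() {
    config_file=$1
    tmp=$(mktemp) || return 1
    sed 's/^SELINUX=enforcing[[:space:]]*$/SELINUX=permissive/' \
        "$config_file" > "$tmp" && cat "$tmp" > "$config_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example use (as root), followed by the reboot described above:
#   set_selinux_permissive /etc/sysconfig/selinux
```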
Now you can install the packages needed for HA. Run the command yum groupinstall "High Availability" on both nodes to install the HA core packages, including cman and rgmanager. You may also want to run yum install luci on n1 to install Luci, CentOS's web interface for managing a cluster. Luci makes CentOS HA cluster administration easier because it lets you navigate through all the complex options. However, its reliability is questionable, and it's not as powerful as the command line.
Ensure that the necessary services are started by default. Run on both nodes:
chkconfig cman on
chkconfig rgmanager on
chkconfig modclusterd on
chkconfig ricci on
On the management node, run chkconfig luci on if you want to have Luci started too. You can then access Luci at https://n1's_ip_address:8084. For authentication, use the root login.
Before you start your new cluster you have to configure it. You can perform all the cluster configuration on node n1; it will be sent to node n2 automatically. To begin, run the Cluster Configuration System command ccs_tool create -2 hacluster. This will create the file /etc/cluster/cluster.conf with default skeleton settings for a two-node cluster.
Edit /etc/cluster/cluster.conf and replace the names of the two nodes NEEDNAME-01 and NEEDNAME-02 with the names of your nodes. It's important that these names resolve to those of their respective hosts; if you are using short names like n1 and n2 you might have to add them to /etc/hosts with their corresponding IP addresses.
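For short names such as n1 and n2, the hosts entries can be added with a small idempotent script. The sketch below assumes the addresses 10.1.1.1 and 10.1.1.2 for the two nodes; `add_host_entry` is a hypothetical helper name, and it skips names that already appear on a non-comment line.

```shell
#!/bin/sh
# Hypothetical helper: append "IP<TAB>name" to a hosts file unless the
# name is already listed on a non-comment line.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example use on both nodes (as root); the addresses are assumptions:
#   add_host_entry /etc/hosts 10.1.1.1 n1
#   add_host_entry /etc/hosts 10.1.1.2 n2
```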
The next step is to ensure that the configuration is valid. Run the command ccs_config_validate without any arguments. If you've correctly filled in the values for the hosts you should see the confirmation message Configuration validates. If you haven't, the command will report the exact error, such as Cannot find node name in cluster.conf.
Once you know the configuration is valid you can update the configuration file on all the nodes by manually copying the file /etc/cluster/cluster.conf. You'll need to do this only once, before the cluster software has been started. Once you have a working cluster, you can apply all subsequent configuration changes with the command cman_tool version -r. cman_tool will ask for the root passwords on both nodes.
Now start the cluster daemons with the command service cman start && service rgmanager start && service modclusterd start && service ricci start. To verify that both members are connected and synchronized after the change, run the command clustat. The output should be similar to this:
Cluster Status for hacluster @ Tue Nov 22 02:32:06 2011
Member Status: Quorate

 Member Name                ID   Status
 ------ ----                ---- ------
 10.1.1.1                      1 Online, Local
 10.1.1.2                      2 Online
Now you have a working cluster, but no software is taking advantage of it. To set that up, add an IP address service for the cluster. This will establish its virtual IP address (VIP), which we'll set to 10.1.1.100 in our case.
Important note: Because our virtual IP address 10.1.1.100 is in the same network as our primary interfaces' IP addresses, it does not require additional configuration. However, if it were in a different network, it could not be started on the same primary interfaces and would require a separate physical or virtual interface configured especially for it. This rule applies to all cluster services, such as httpd. Before entrusting any service's management to the cluster, you have to ensure that the service is properly configured; the cluster's only task is to start and stop it.
In our simple scenario, the idea is that if the main node (n1) fails, the second node handles all traffic to the VIP. This will ensure availability across the cluster as long as critical services are mirrored on both nodes. To add a VIP address, edit the rm (resource manager) configuration in /etc/cluster/cluster.conf as follows:
<rm>
    <failoverdomains/>
    <resources/>
    <service autostart="1" exclusive="0" name="IP" recovery="relocate">
        <ip address="10.1.1.100" monitor_link="on" sleeptime="10"/>
    </service>
</rm>
Note: Each time you make changes to the file /etc/cluster/cluster.conf, be sure you increment config_version="n". If you don't, cman_tool will not recognize the configuration file as changed and will not proceed.
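Because forgetting to raise the version number is an easy mistake, the increment can be scripted. This is only a sketch: it assumes the attribute appears once in the file as `config_version="N"` with an integer value, and `bump_config_version` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read the current config_version from a
# cluster.conf file and rewrite it as current + 1.
bump_config_version() {
    conf_file=$1
    current=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf_file" | head -n 1)
    [ -n "$current" ] || return 1    # attribute not found
    next=$((current + 1))
    tmp=$(mktemp) || return 1
    sed "s/config_version=\"$current\"/config_version=\"$next\"/" \
        "$conf_file" > "$tmp" && cat "$tmp" > "$conf_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example use on n1 (as root), after editing the file:
#   bump_config_version /etc/cluster/cluster.conf
#   ccs_config_validate
```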
After you add the new service and increment config_version, apply the configuration with the command cman_tool version -r. You should be able to see the changes logged in /var/log/messages, as follows:
Nov 22 02:36:07 n1 modcluster: Updating cluster.conf
Nov 22 02:36:10 n1 corosync[1117]: [QUORUM] Members[2]: 1 2
Nov 22 02:36:19 n1 rgmanager[1423]: Reconfiguring
Nov 22 02:36:19 n1 rgmanager[1423]: Loading Service Data
Nov 22 02:36:49 n1 rgmanager[1423]: Stopping changed resources.
Nov 22 02:36:49 n1 rgmanager[1423]: Restarting changed resources.
Nov 22 02:36:49 n1 rgmanager[1423]: Starting changed resources.
Nov 22 02:36:49 n1 rgmanager[1423]: Initializing service:IP
Nov 22 02:36:52 n1 rgmanager[4129]: Removing IPv4 address 10.1.1.100/24 from eth1
Nov 22 02:37:03 n1 rgmanager[1423]: Starting stopped service service:IP
Nov 22 02:37:08 n1 rgmanager[4241]: Adding IPv4 address 10.1.1.100/24 to eth1
Nov 22 02:37:16 n1 rgmanager[1423]: Service service:IP started
To verify the cluster configuration and check that the new service is running, run clustat again. You should see output similar to this:
Cluster Status for hacluster @ Tue Nov 22 03:07:50 2011
Member Status: Quorate

 Member Name                ID   Status
 ------ ----                ---- ------
 10.1.1.1                      1 Online, Local, rgmanager
 10.1.1.2                      2 Online, rgmanager

 Service Name               Owner (Last)               State
 ------- ----               ----- ------               -----
 service:IP                 10.1.1.1                   started
As the above output shows, the VIP is allocated to 10.1.1.1 (n1). If you want to reallocate the VIP, use the command clusvcadm -r IP -m 10.1.1.2, where IP is the service and 10.1.1.2 is the node IP or hostname that you want to serve as your primary node.
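Before relocating a service from a script, it can help to check which node currently owns it. The sketch below reads clustat output on standard input and prints the second column of the matching service row; it assumes the column layout of the clustat listing in this article, and `service_owner` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the owner column for a service, given
# clustat output on standard input. Prints nothing if no row matches.
service_owner() {
    awk -v svc="service:$1" '$1 == svc { print $2; exit }'
}

# Example use on a cluster node:
#   clustat | service_owner IP
```

For a service that is not running, the owner column may hold something other than a plain node name, so treat the result as informational.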
Congratulations – you now have a working cluster with a simple configuration that provides basic HA. If the active node goes down and is unreachable, the VIP is automatically reallocated to the second node.
At least that's the theory. You may experience some issues, and when you do, you'll find that troubleshooting HA clusters is a complex task because of the way they work – various daemons and kernel modules are responsible for different HA tasks, and each of these essential parts of the cluster can cause problems.
Checking logs can provide some clues when things go wrong. By default, all cluster-related messages go to /var/log/messages. These simple messages give basic information about major changes in the cluster, such as starting, stopping, or moving a service. This log also collects errors about problems with cluster communication.
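When /var/log/messages is busy, narrowing it to the resource manager's entries for one service makes the sequence of events easier to follow. The following is a sketch that assumes the log line format shown in this article; `service_events` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print rgmanager log lines that mention a given
# cluster service, in the order they were logged.
service_events() {
    log_file=$1
    svc=$2
    grep 'rgmanager\[' "$log_file" | grep -F "service:$svc"
}

# Example use (as root):
#   service_events /var/log/messages IP
```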
You can find additional logs in the directory /var/log/cluster/. Each of the cluster's components has a log file; the most important ones are:
corosync-objctl
fence_tool dump
We created a simple cluster here, but using CentOS's HA cluster is not always this straightforward. If you want to support complex services such as the Global File System (GFS) and provide reliable fencing, things can get much more complicated. As you explore advanced options, you can find more information in the Red Hat clustering documentation.