Ensure High Availability with CentOS 6 Clustering

Today, businesses require high availability (HA) and zero downtime for many IT services. Redundancy and failover mechanisms are essential parts of the HA infrastructure. CentOS 6 provides a reliable, flexible, and powerful clustering solution that lets you use multiple servers to avoid downtime.

CentOS's sophisticated HA suite provides all the packages you need for reliable and secure internode communication, including fencing of failed nodes. It can handle allocating complex resources and services across multiple nodes, and ensure that if a service fails, end users don't see the failure because the service is either restarted or reallocated to a different node, according to a predefined scenario.

CentOS's HA suite relies on a few key components:

    • Corosync is responsible for the clustering infrastructure, including node membership and cluster quorum.
    • RGManager manages the cluster's services, and is responsible for their state and node locations.
    • The CMAN cluster management framework is responsible for the connection between the Corosync clustering infrastructure and RGManager's service management.

CentOS's clustering components are designed to work together to ensure high availability. If you want to see how they work, here's how to set up a simple cluster with two nodes sharing an IP address. Once you know the basics, you can extend your setup with more nodes and more complicated options.

Installation

To begin, each of the nodes, which we'll call n1 and n2, should have a minimal CentOS install. In CentOS 6 you can use the small installation images CentOS-6.0-i386-minimal.iso and CentOS-6.0-x86_64-minimal.iso for the 32- and 64-bit architectures respectively. Each node must have two network interfaces (physical or virtual). The first, used for external and inter-node communication, is always up; the second, which will carry the IP address shared between the nodes, is configured only on the first node by default.

You must permit communication between nodes in iptables. The easiest way to do this is to add the directive -A INPUT -s IP -j ACCEPT to /etc/sysconfig/iptables before the last REJECT statements. Make sure to replace IP with the other node's IP address.
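
For example, with the node addresses 10.1.1.1 (n1) and 10.1.1.2 (n2) used later in this article, the rules might look like this; substitute your own addresses:

# In /etc/sysconfig/iptables on n1: accept all traffic from n2, before the final REJECT lines
-A INPUT -s 10.1.1.2 -j ACCEPT

# In /etc/sysconfig/iptables on n2: accept all traffic from n1
-A INPUT -s 10.1.1.1 -j ACCEPT

Reload the firewall on each node afterward with service iptables restart.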

Edit the file /etc/sysconfig/selinux, change SELinux from enforcing to permissive, and reboot. You don't want to enforce SELinux at first, because it would prevent you from getting the HA cluster operational. Once the cluster is running properly, you can re-enable SELinux.
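
After the change, the relevant line in /etc/sysconfig/selinux should look like this:

# /etc/sysconfig/selinux
SELINUX=permissive

If you prefer not to wait for the reboot, setenforce 0 switches the running system to permissive mode immediately; the file change is still needed so the setting persists.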

Now you can install the packages needed for HA. Run the command yum groupinstall "High Availability" on both nodes to install the HA core packages, including cman and rgmanager. You may also want to run yum install luci on n1 to install Luci, CentOS's web interface for managing a cluster. Luci makes CentOS HA cluster administration easier because it lets you navigate through all the complex options. However, its reliability is questionable, and it's not as powerful as the command line.
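
In terms of commands, the installation step looks like this (Luci is optional and needed only on the management node):

# On both nodes: core HA packages, including cman and rgmanager
yum groupinstall "High Availability"

# On n1 only (optional): the Luci web interface
yum install luci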

Ensure that the necessary services are started by default. Run on both nodes:


chkconfig cman on
chkconfig rgmanager on
chkconfig modclusterd on
chkconfig ricci on

On the management node, run chkconfig luci on if you want to have Luci started too. You can then access Luci at https://n1's_ip_address:8084. For authentication, use the root login.
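
If you use Luci, enabling and starting it on the management node looks like this:

# On n1 only
chkconfig luci on
service luci start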

Configuration

Before you start your new cluster you have to configure it. You can perform all the cluster configuration on node n1; it will be sent to node n2 automatically. To begin, run the Cluster Configuration System command ccs_tool create -2 hacluster. This will create the file /etc/cluster/cluster.conf with default skeleton settings for a two-node cluster.

Edit /etc/cluster/cluster.conf and replace the names of the two nodes NEEDNAME-01 and NEEDNAME-02 with the names of your nodes. It's important that these names resolve to those of their respective hosts; if you are using short names like n1 and n2 you might have to add them to /etc/hosts with their corresponding IP addresses.
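
Only the name="NEEDNAME-01" and name="NEEDNAME-02" attributes need to change, for example to name="n1" and name="n2". If those short names aren't resolvable through DNS, /etc/hosts on both nodes might look like this (a sketch using the addresses that appear later in this article):

# /etc/hosts on both nodes
10.1.1.1   n1
10.1.1.2   n2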

The next step is to ensure that the configuration is valid. Run the command ccs_config_validate without any arguments. If you've correctly filled in the values for the hosts you should see the confirmation message Configuration validates. If you haven't, the command will report the exact error, such as Cannot find node name in cluster.conf.

Once you know the configuration is valid you can update the configuration file on all the nodes by manually copying the file /etc/cluster/cluster.conf. You'll need to do this only once, before the cluster software has been started. Once you have a working cluster, you can apply all subsequent configuration changes with the command cman_tool version -r. cman_tool will ask for the root passwords on both nodes.
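
One way to perform that initial copy is with scp, assuming root SSH access between the nodes:

# On n1: push the validated configuration to n2 before starting the cluster daemons
scp /etc/cluster/cluster.conf root@n2:/etc/cluster/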

Now start the cluster daemons on both nodes with the command service cman start && service rgmanager start && service modclusterd start && service ricci start. To verify that both members are connected and synchronized, run the command clustat. The output should be similar to this:


Cluster Status for hacluster @ Tue Nov 22 02:32:06 2011
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 10.1.1.1                                   1 Online, Local
 10.1.1.2                                   2 Online

Now you have a working cluster, but no software is taking advantage of it. To set that up, add an IP address service for the cluster. This will establish its virtual IP address (VIP), which we'll set to 10.1.1.100 in our case.

Important note: Because our virtual IP address 10.1.1.100 is in the same network as our primary interfaces' IP addresses, it does not require additional configuration. However, if it were in a different network, it could not be started on the same primary interfaces and would require a separate physical or virtual interface configured especially for it. This rule applies to all cluster services, such as httpd. Before entrusting any service's management to the cluster, you have to ensure that it is properly configured, and the cluster's only task is to start and stop it.
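
Purely as an illustration of that rule, and not part of this tutorial's configuration, an already-configured httpd could be handed to the cluster through rgmanager's generic script resource agent, so that the cluster merely starts and stops it:

<!-- Hypothetical example only: httpd must already be fully configured on every node -->
<service autostart="1" name="web" recovery="relocate">
    <script file="/etc/init.d/httpd" name="httpd"/>
</service>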

In our simple scenario, the idea is that if the main node (n1) fails, the second node handles all traffic to the VIP. This will ensure availability across the cluster as long as critical services are mirrored on both nodes. To add a VIP address, edit the rm (resource manager) configuration in /etc/cluster/cluster.conf as follows:


<rm>
    <failoverdomains/>
    <resources/>
    <service autostart="1" exclusive="0" name="IP" recovery="relocate">
        <ip address="10.1.1.100" monitor_link="on" sleeptime="10"/>
    </service>
</rm>

Note: Each time you make changes to the file /etc/cluster/cluster.conf, be sure you increment config_version="n". If you don't, cman_tool will not recognize the configuration file as changed and will not proceed.
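
The version number lives in the opening cluster tag at the top of the file; after your first change it might look like this:

<!-- increment config_version with every edit, for example from 1 to 2 -->
<cluster name="hacluster" config_version="2">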

After you add the new resource and service and increment config_version, apply the configuration with the command cman_tool version -r. You should see the changes logged in /var/log/messages, as follows:


Nov 22 02:36:07 n1 modcluster: Updating cluster.conf
Nov 22 02:36:10 n1 corosync[1117]: [QUORUM] Members[2]: 1 2
Nov 22 02:36:19 n1 rgmanager[1423]: Reconfiguring
Nov 22 02:36:19 n1 rgmanager[1423]: Loading Service Data
Nov 22 02:36:49 n1 rgmanager[1423]: Stopping changed resources.
Nov 22 02:36:49 n1 rgmanager[1423]: Restarting changed resources.
Nov 22 02:36:49 n1 rgmanager[1423]: Starting changed resources.
Nov 22 02:36:49 n1 rgmanager[1423]: Initializing service:IP
Nov 22 02:36:52 n1 rgmanager[4129]: Removing IPv4 address 10.1.1.100/24 from eth1
Nov 22 02:37:03 n1 rgmanager[1423]: Starting stopped service service:IP
Nov 22 02:37:08 n1 rgmanager[4241]: Adding IPv4 address 10.1.1.100/24 to eth1
Nov 22 02:37:16 n1 rgmanager[1423]: Service service:IP started

To verify the cluster configuration and check that the new service is running, run clustat again. You should see output similar to this:


Cluster Status for hacluster @ Tue Nov 22 03:07:50 2011
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 10.1.1.1                                   1 Online, Local, rgmanager
 10.1.1.2                                   2 Online, rgmanager

 Service Name                Owner (Last)                State
 ------- ----                ----- ------                -----
 service:IP                  10.1.1.1                    started

As the above output shows, the VIP is allocated to 10.1.1.1 (n1). If you want to reallocate the VIP, use the command clusvcadm -r IP -m 10.1.1.2, where IP is the service and 10.1.1.2 is the node IP or hostname that you want to serve as your primary node.
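
For example, moving the service to n2 and confirming the move might look like this; clustat should then list 10.1.1.2 as the owner, and the address should appear on n2's interface:

# Relocate the IP service to the second node and verify
clusvcadm -r IP -m 10.1.1.2
clustat
ip addr show eth1    # on n2: 10.1.1.100 should now be listed here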

Congratulations – you now have a working cluster with a simple configuration that provides basic HA. If the active node goes down and is unreachable, the VIP is automatically reallocated to the second node.
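
In a test environment, one quick way to see that failover in action (don't do this on a production node) is to stop the cluster services on the active node and watch the service move:

# On n1: leave the cluster cleanly; rgmanager should relocate service:IP to n2
service rgmanager stop && service cman stop

# On n2: the service's owner should change to 10.1.1.2
clustat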

HA Troubleshooting

At least that's the theory. You may experience some issues, and when you do, you'll find that troubleshooting HA clusters is a complex task because of the way they work – various daemons and kernel modules are responsible for different HA tasks, and each of these essential parts of the cluster can cause problems.

Checking logs can provide some clues when things go wrong. By default, all cluster-related messages go to /var/log/messages. These simple messages give basic information about major changes in the cluster, such as starting, stopping, or moving a service. This log also collects errors about problems with cluster communication.
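
A simple way to follow cluster activity while you test is to watch that file, for example:

# Follow cluster-related messages as services start, stop, and move
tail -f /var/log/messages | grep -E 'rgmanager|corosync|fenced'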

You can find additional logs in the directory /var/log/cluster/. Each of the cluster's components has a log file; the most important ones are:

    • corosync.log, which provides information about inter-node communication and synchronization within the cluster. You can also use the command corosync-objctl without any arguments to show the cluster's configuration in full detail.
    • rgmanager.log, which lists information related to the cluster's services.
    • fenced.log, which provides information about fencing nodes. This log is not very detailed by default. It's better to use the command fence_tool dump when debugging fenced daemon problems.
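
For quick reference, the diagnostic commands mentioned in the list above can be run directly on any node:

# Dump corosync's full runtime configuration in detail
corosync-objctl

# Dump the fenced daemon's internal debug buffer (more detail than fenced.log)
fence_tool dump

# Follow rgmanager's service-level events
tail -f /var/log/cluster/rgmanager.log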

Conclusion

We created a simple cluster here, but using CentOS's HA cluster is not always this straightforward. If you want to support complex services such as the Global File System (GFS) and provide reliable fencing, things can get much more complicated. As you explore advanced options, you can find more information in the Red Hat clustering documentation.




This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Comments

hi, it is a good tutorial. i have only a doubt. How to setup a virtual ip? ty
Posted @ Wednesday, December 19, 2012 2:04 PM by Lucas
I already finish the installation and it looks to be working fine, but when I set off the owner cluster, it goes offline and does not made the change. Can someone help me please ?
Posted @ Wednesday, April 17, 2013 6:47 PM by Marcos Castro
Hello,

Thanks a lot for the article! As explained, my configuration is up now, but on the Luci GUI I don't see the cluster. Could you please suggest?

Thanks!
Posted @ Saturday, August 31, 2013 6:32 AM by Sumit
This is exactly what I was looking for. The upgrade to RHEL 6.4 forced cman and support for previous solutions disappeared. This was a simple push in the right direction. Thanks!
Posted @ Friday, September 20, 2013 10:29 PM by Aaron
the failoverdomains section <rm> of the cluster.conf file is wrong as shown. 
you have to close it with  
< rm > 
< failoverdomains/ > 
< resources/ > 
< service autostart="1" exclusive="0" name="IP" recovery="relocate" > 
< ip address="10.1.1.100" monitor_link="on" sleeptime="10"/ > 
< /service > 
</resources> 
</failoverdomains> 
< /rm > 
 
Posted @ Tuesday, October 29, 2013 9:40 AM by Rob
Thanks for posting this article, the explanation is simple and easy to understand.  
Can we expect more details on managing clusters using luci.
Posted @ Monday, November 25, 2013 2:16 AM by Naveen
Good Article.
Posted @ Wednesday, April 23, 2014 7:39 AM by Mahesh
An updated tutorial with a little more background is available here: http://www.bigthinkingapplied.com/creating-a-ha-cluster-with-red-hat-cluster-suite-part-2/
Posted @ Wednesday, July 16, 2014 6:27 PM by nneko