Ensure High Availability with CentOS 6 Clustering

Posted by Anatoliy A. Dimitrov on Tue, Nov 29, 2011
  
  

Today, businesses require high availability (HA) and zero downtime for many IT services. Redundancy and failover mechanisms are essential parts of the HA infrastructure. CentOS 6 provides a reliable, flexible, and powerful clustering solution that lets you use multiple servers to avoid downtime.



CentOS's sophisticated HA suite provides all the packages you need for reliable and secure internode communication, including fencing of failed nodes. It can handle allocating complex resources and services across multiple nodes, and ensure that if a service fails, end users don't see the failure because the service is either restarted or reallocated to a different node, according to a predefined scenario.



CentOS's HA suite relies on a few key components:




    • Corosync is responsible for the clustering infrastructure, including node membership and cluster quorum.

    • RGManager manages the cluster's services, and is responsible for their state and node locations.

    • The CMAN cluster management framework is responsible for the connection between the Corosync clustering infrastructure and RGManager's service management.



CentOS's clustering components are designed to work together to ensure high availability. If you want to see how they work, here's how to set up a simple cluster with two nodes sharing an IP address. Once you know the basics, you can extend your setup with more nodes and more complicated options.



Installation



To begin, each of the nodes, which we'll call n1 and n2, should have a minimal CentOS install. In CentOS 6 you can use the small installation images CentOS-6.0-i386-minimal.iso and CentOS-6.0-x86_64-minimal.iso for 32- and 64-bit architectures. Each node must have two network interfaces (hardware or virtual); one, which is used for external and inter-node communication, is always up, while the second, used for sharing one IP address between the nodes, is set up only on the first node by default.



You must permit communication between nodes in iptables. The easiest way to do this is to add the directive -A INPUT -s IP -j ACCEPT to /etc/sysconfig/iptables before the last REJECT statements. Make sure to replace IP with the other node's IP address.
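The edit described above can be scripted. The following is a minimal sketch, assuming the stock rules file contains at least one line with `-j REJECT`; `add_peer_rule` is a hypothetical helper name used only for illustration, not part of CentOS.

```shell
# Sketch: insert an ACCEPT rule for the peer node ahead of the first REJECT rule.
# add_peer_rule is a hypothetical helper, not a CentOS command.
add_peer_rule() {
    rules_file=$1   # for example /etc/sysconfig/iptables
    peer_ip=$2      # the other node's IP address
    rule="-A INPUT -s $peer_ip -j ACCEPT"

    # Do nothing if the exact rule is already present, so reruns are harmless.
    if grep -qxF -e "$rule" "$rules_file"; then
        return 0
    fi

    tmp=$(mktemp) || return 1
    # Print the new rule just before the first line containing "-j REJECT".
    # If the file has no such line, it is left unchanged.
    awk -v rule="$rule" '
        !done && /-j REJECT/ { print rule; done = 1 }
        { print }
    ' "$rules_file" > "$tmp" && cat "$tmp" > "$rules_file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

On n1 you might call it as `add_peer_rule /etc/sysconfig/iptables 10.1.1.2` and then reload the firewall with `service iptables restart`; on n2, pass n1's address instead.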

Edit the file /etc/sysconfig/selinux, set SELinux from enforcing to permissive, and reboot. You don't want to enforce SELinux in the beginning because it would prevent you from getting the HA cluster operational. Once you have the cluster running properly, you can re-enable SELinux, as described in this article.
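If you prefer to script the SELinux change, here is a minimal sketch. `set_selinux_mode` is a hypothetical helper name. It rewrites only the `SELINUX=` line and writes the result back through the existing path, which matters because /etc/sysconfig/selinux is typically a symlink to /etc/selinux/config.

```shell
# Sketch: switch the SELINUX= setting in the SELinux config file.
# set_selinux_mode is a hypothetical helper, not a CentOS command.
set_selinux_mode() {
    config_file=$1   # for example /etc/sysconfig/selinux
    mode=$2          # enforcing, permissive, or disabled

    case $mode in
        enforcing|permissive|disabled) ;;
        *) echo "unknown SELinux mode: $mode" >&2; return 1 ;;
    esac

    tmp=$(mktemp) || return 1
    # Only the SELINUX= line changes; comments and SELINUXTYPE= are untouched.
    # Writing with "cat >" keeps a symlinked config file a symlink.
    sed "s/^SELINUX=.*/SELINUX=$mode/" "$config_file" > "$tmp" \
        && cat "$tmp" > "$config_file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, `set_selinux_mode /etc/sysconfig/selinux permissive` followed by a reboot matches the step described above.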



Now you can install the packages needed for HA. Run the command yum groupinstall "High Availability" on both nodes to install the HA core packages, including cman and rgmanager. You may also want to run yum install luci on n1 to install Luci, CentOS's web interface for managing a cluster. Luci makes CentOS HA cluster administration easier because it lets you navigate through all the complex options. However, its reliability is questionable, and it's not as powerful as the command line.



Ensure that the necessary services are started by default. Run on both nodes:




chkconfig cman on
chkconfig rgmanager on
chkconfig modclusterd on
chkconfig ricci on



On the management node, run chkconfig luci on if you want to have Luci started too. You can then access Luci at https://n1's_ip_address:8084. For authentication, use the root login.

 


 

Configuration



Before you start your new cluster you have to configure it. You can perform all the cluster configuration on node n1; it will be sent to node n2 automatically. To begin, run the Cluster Configuration System command ccs_tool create -2 hacluster. This will create the file /etc/cluster/cluster.conf with default skeleton settings for a two-node cluster.



Edit /etc/cluster/cluster.conf and replace the names of the two nodes NEEDNAME-01 and NEEDNAME-02 with the names of your nodes. It's important that these names resolve to those of their respective hosts; if you are using short names like n1 and n2 you might have to add them to /etc/hosts with their corresponding IP addresses.
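As an illustration of the /etc/hosts step, the sketch below appends a name only if no existing entry already lists it. `add_host_entry` is a hypothetical helper name, and the mapping of n1 and n2 to 10.1.1.1 and 10.1.1.2 is an assumption based on the addresses used in this article's examples.

```shell
# Sketch: add "IP<TAB>name" to a hosts file unless the name is already listed.
# add_host_entry is a hypothetical helper, not a CentOS command.
add_host_entry() {
    hosts_file=$1   # for example /etc/hosts
    ip=$2
    name=$3

    # Look for the name as a whole field after the address on any
    # non-comment line; awk exits 0 when it is found.
    if awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hosts_file"; then
        return 0
    fi

    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}
```

Running `add_host_entry /etc/hosts 10.1.1.1 n1` and `add_host_entry /etc/hosts 10.1.1.2 n2` on both nodes would give each node a consistent view of the names used in cluster.conf.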



The next step is to ensure that the configuration is valid. Run the command ccs_config_validate without any arguments. If you've correctly filled in the values for the hosts you should see the confirmation message Configuration validates. If you haven't, the command will report the exact error, such as Cannot find node name in cluster.conf.



Once you know the configuration is valid, distribute it by manually copying /etc/cluster/cluster.conf from n1 to the same path on n2 (for example, with scp). You'll need to do this only once, before the cluster software has been started. Once you have a working cluster, you can apply all subsequent configuration changes with the command cman_tool version -r. cman_tool will ask for the root passwords on both nodes.



Now start the cluster daemons with the command service cman start && service rgmanager start && service modclusterd start && service ricci start. To verify that both members are connected and synchronized after the change, run the command clustat. The output should be similar to this:




Cluster Status for hacluster @ Tue Nov 22 02:32:06 2011
Member Status: Quorate

 Member Name     ID   Status
 ------ ----     ---- ------
 10.1.1.1           1 Online, Local
 10.1.1.2           2 Online



Now you have a working cluster, but no software is taking advantage of it. To set that up, add an IP address service for the cluster. This will establish its virtual IP address (VIP), which we'll set to 10.1.1.100 in our case.



Important note: Because our virtual IP address 10.1.1.100 is in the same network as our primary interfaces' IP addresses, it does not require additional configuration. However, if it were in a different network, it could not be started on the same primary interfaces and would require a separate physical or virtual interface configured especially for it. The same rule applies to every cluster service, such as httpd: before you hand a service over to the cluster, make sure it is properly configured, because the cluster's only task is to start and stop it.



In our simple scenario, the idea is that if the main node (n1) fails, the second node handles all traffic to the VIP. This will ensure availability across the cluster as long as critical services are mirrored on both nodes. To add a VIP address, edit the rm (resource manager) configuration in /etc/cluster/cluster.conf as follows:




<rm>
  <failoverdomains/>
  <resources/>
  <service autostart="1" exclusive="0" name="IP" recovery="relocate">
    <ip address="10.1.1.100" monitor_link="on" sleeptime="10"/>
  </service>
</rm>



Note: Each time you make changes to the file /etc/cluster/cluster.conf, be sure you increment config_version="n". If you don't, cman_tool will not recognize the configuration file as changed and will not proceed.
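Because it is easy to forget this step, you may want to script it. The following is a minimal sketch, assuming config_version appears once in the file as a plain decimal number; `bump_config_version` is a hypothetical helper name.

```shell
# Sketch: increment config_version="N" in a cluster.conf file by one
# and print the new value.
# bump_config_version is a hypothetical helper, not part of the cluster suite.
bump_config_version() {
    conf=$1   # for example /etc/cluster/cluster.conf

    # Read the current value; matching the full attribute name avoids the
    # version="1.0" in the XML declaration.
    current=$(sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    if [ -z "$current" ]; then
        echo "no config_version found in $conf" >&2
        return 1
    fi

    next=$((current + 1))
    tmp=$(mktemp) || return 1
    sed "s/config_version=\"$current\"/config_version=\"$next\"/" "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    [ "$status" -eq 0 ] && echo "$next"
}
```

A typical sequence would then be `bump_config_version /etc/cluster/cluster.conf`, followed by ccs_config_validate and cman_tool version -r.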



After you add the new service with its IP resource and increment config_version, apply the configuration with the command cman_tool version -r. You should be able to see the changes logged in /var/log/messages, as follows:




Nov 22 02:36:07 n1 modcluster: Updating cluster.conf
Nov 22 02:36:10 n1 corosync[1117]: [QUORUM] Members[2]: 1 2
Nov 22 02:36:19 n1 rgmanager[1423]: Reconfiguring
Nov 22 02:36:19 n1 rgmanager[1423]: Loading Service Data
Nov 22 02:36:49 n1 rgmanager[1423]: Stopping changed resources.
Nov 22 02:36:49 n1 rgmanager[1423]: Restarting changed resources.
Nov 22 02:36:49 n1 rgmanager[1423]: Starting changed resources.
Nov 22 02:36:49 n1 rgmanager[1423]: Initializing service:IP
Nov 22 02:36:52 n1 rgmanager[4129]: Removing IPv4 address 10.1.1.100/24 from eth1
Nov 22 02:37:03 n1 rgmanager[1423]: Starting stopped service service:IP
Nov 22 02:37:08 n1 rgmanager[4241]: Adding IPv4 address 10.1.1.100/24 to eth1
Nov 22 02:37:16 n1 rgmanager[1423]: Service service:IP started



To verify the cluster configuration and check that the new service is running, run clustat again. You should see output similar to this:




Cluster Status for hacluster @ Tue Nov 22 03:07:50 2011
Member Status: Quorate

 Member Name     ID   Status
 ------ ----     ---- ------
 10.1.1.1           1 Online, Local, rgmanager
 10.1.1.2           2 Online, rgmanager

 Service Name    Owner (Last)    State
 ------- ----    ----- ------    -----
 service:IP      10.1.1.1        started



As the above output shows, the VIP is allocated to 10.1.1.1 (n1). If you want to reallocate the VIP, use the command clusvcadm -r IP -m 10.1.1.2, where IP is the service and 10.1.1.2 is the node IP or hostname that you want to serve as your primary node.
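If a script needs to know which node currently owns a service, it can read the Owner column from clustat's output. Here is a minimal sketch, assuming the whitespace-separated layout shown above; `service_owner` is a hypothetical helper name.

```shell
# Sketch: print the owner of a service from clustat output read on stdin.
# service_owner is a hypothetical helper, not part of the cluster suite.
service_owner() {
    service=$1   # for example service:IP

    # The service table rows start with the service name; the second
    # whitespace-separated field is the Owner (Last) column.
    awk -v svc="$service" '$1 == svc { print $2; exit }'
}
```

You would use it as `clustat | service_owner service:IP`. Nothing is printed if the service is not listed.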



Congratulations – you now have a working cluster with a simple configuration that provides basic HA. If the active node goes down and is unreachable, the VIP is automatically reallocated to the second node.
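To confirm on a given node that the VIP is actually configured there, you can inspect the output of `ip -o -4 addr show`; addresses added this way may not appear in ifconfig output. The sketch below assumes the one-line-per-address format of `ip -o`, and `has_address` is a hypothetical helper name.

```shell
# Sketch: succeed if an IPv4 address appears in `ip -o -4 addr show`
# output read on stdin.
# has_address is a hypothetical helper, not a CentOS command.
has_address() {
    wanted=$1   # for example 10.1.1.100

    # In `ip -o` output the fourth field is address/prefix, e.g. 10.1.1.100/24.
    awk -v want="$wanted" '
        $3 == "inet" { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit !found }
    '
}
```

For example, `ip -o -4 addr show | has_address 10.1.1.100 && echo "VIP is on this node"` prints the message only on the node that currently holds the address.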



HA Troubleshooting



At least that's the theory. You may experience some issues, and when you do, you'll find that troubleshooting HA clusters is a complex task because of the way they work – various daemons and kernel modules are responsible for different HA tasks, and each of these essential parts of the cluster can cause problems.



Checking logs can provide some clues when things go wrong. By default, all cluster-related messages go to /var/log/messages. These simple messages give basic information about major changes in the cluster, such as starting, stopping, or moving a service. This log also collects errors about problems with cluster communication.
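Since /var/log/messages also collects everything else the system logs, it helps to filter for the cluster daemons. This is a minimal sketch that matches the daemon names seen in the log excerpt earlier in this article, plus fenced; `cluster_messages` is a hypothetical helper name.

```shell
# Sketch: show only syslog lines written by the cluster daemons.
# cluster_messages is a hypothetical helper, not part of the cluster suite.
cluster_messages() {
    log_file=${1:-/var/log/messages}

    # Match "name: " or "name[pid]: " for each daemon of interest.
    # grep exits with status 1 when no line matches.
    grep -E ' (corosync|rgmanager|fenced|modcluster)(\[[0-9]+\])?: ' "$log_file"
}
```

Run `cluster_messages | tail -n 50` to see the most recent cluster events, or pass another file name as the first argument.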



You can find additional logs in the directory /var/log/cluster/. Each of the cluster's components has a log file; the most important ones are:




    • corosync.log, which provides information about internode communication and synchronization. You can also use the command corosync-objctl without any arguments to show the cluster's configuration in full detail.

    • rgmanager.log, which lists information related to the cluster's services.

    • fenced.log, which provides information about fencing nodes. This log is not very detailed by default. It's better to use the command fence_tool dump when debugging fenced daemon problems.



Conclusion

 

We created a simple cluster here, but using CentOS's HA cluster is not always this straightforward. If you want to support complex services such as the Global File System (GFS) and provide reliable fencing, things can get much more complicated. As you explore advanced options, you can find more information in the Red Hat clustering documentation.

This work is licensed under a Creative Commons Attribution 3.0 Unported License
Tags: Linux, CentOS, Technical, Tutorial, Server, SELinux, corosync, iptables

Comments

Hi, it is a good tutorial. I have only one question: how do I set up a virtual IP? Thanks.
Posted @ Wednesday, December 19, 2012 2:04 PM by Lucas
I have already finished the installation and it looks to be working fine, but when I shut down the node that owns the service, the node goes offline and the service does not move to the other node. Can someone help me, please?
Posted @ Wednesday, April 17, 2013 6:47 PM by Marcos Castro