Open Source Software Technical Articles

How to Troubleshoot Your CentOS Linux Server

  
  
  

Ever felt bewildered when troubleshooting a problem with your CentOS server? A handful of tools and best practices for troubleshooting can help you solve most of your problems swiftly and efficiently.



Some problems are easier than others to diagnose. For example, both the cause and the solution are clear when you see an error such as "Web application cannot write to /var/www/cache/". Unfortunately, such detailed output is rarely available because it may raise performance or security issues.



In many production environments, detailed logging that shows the result of every operation is too expensive, in terms of additional CPU and disk input/output operations. Nevertheless, checking the logs is the first step to follow once you realize that the solution to a problem is not obvious. By default, all logs in CentOS are located in the directory /var/log/. Some important log files are:




    • /var/log/messages, which stores logs from many native CentOS services, such as the kernel logger, the network manager, and many other services that don't have their own log files. This log file tells you if there are kernel problems (kernel panic messages) or kernel limit violations, such as exceeding the maximum number of open files, which can cause system problems. You can fix kernel misconfigurations by editing the file /etc/sysctl.conf, changing the value of the parameter named in the error, and applying the change with sysctl -p.

    • /var/log/dmesg, which contains information about hardware found by the kernel drivers. It can help you troubleshoot hardware problems and missing drivers. You can also use the command /bin/dmesg for similar purposes. /bin/dmesg prints the kernel's current message buffer, including events that occurred after boot, while the log file keeps the messages recorded at boot time for later reference.

    • /var/log/audit/audit.log, which is the file in which the Linux Auditing System (auditd) writes its logs, including all SELinux information. If auditd is disabled, SELinux sends its logs to /var/log/messages. SELinux is a common suspect for any strange behavior and problems in CentOS. It is enabled by default in CentOS 6 and should not be frivolously disabled, as it is important for security. You can check its status with the command sestatus. A Wazi article about Linux server hardening covers the basics of SELinux, including how to adjust its policies in order to avoid problems.

    • Service- and application-specific logs – Many applications create logs in other places, and have options that control where and what to log. By default in CentOS the Apache web server logs in the directory /var/log/httpd/, mail servers log in /var/log/maillog, and MySQL logs in /var/log/mysqld.log. However, not all logs are located in the logs directory. Some applications, such as user-space programs, may not have privileges to write there. Others prefer to log inside their own root directory. You may need to consult an application's manual to learn where it writes its logs.
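
As a first pass over any of the log files listed above, it can help to filter for lines that hint at trouble. The following is a minimal sketch; the function name scan_log and the keyword list are illustrative assumptions, not part of CentOS, and should be adapted to the service you are investigating.

```shell
# scan_log FILE: print the lines in FILE that mention common signs of trouble.
# The keyword list is only an assumption; extend it for your own services.
scan_log() {
    grep -iE 'error|fail|panic|denied|too many open files' "$1"
}

# Typical use on a CentOS server (run as root so the logs are readable):
#   scan_log /var/log/messages | tail -n 20
#   scan_log /var/log/audit/audit.log | tail -n 20
```

Piping the result through tail keeps the focus on the most recent events, which are usually the ones related to the problem at hand.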



Logs differ between applications and services but most of them describe events specifying the time of occurrence, a severity level and a message. An example log entry from a MySQL server looks like this:




120503 7:34:22 [ERROR] Can't create IP socket: Too many open files in system


The first columns show the date (120503, i.e. 3 May 2012) and time. ERROR is the severity level. Severity rises from debug, info, and notice, which are informative rather than descriptions of an issue, through warning and error, which point to an actual problem, to the most alarming levels: critical, alert, and emergency.
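
The sample entry above reports that the system-wide limit on open files was reached. One way to compare current usage with that limit is to read /proc/sys/fs/file-nr, as in the sketch below. The function name open_file_usage is hypothetical, and the value 200000 in the comments is only an example, not a recommendation.

```shell
# open_file_usage [FILE]: report allocated file handles against the kernel
# limit. FILE defaults to /proc/sys/fs/file-nr, whose three fields are:
# allocated handles, allocated but unused handles, and the maximum (fs.file-max).
open_file_usage() {
    awk '{ printf "%d of %d file handles in use (%d%%)\n", $1, $3, ($1 * 100) / $3 }' \
        "${1:-/proc/sys/fs/file-nr}"
}

# To raise the limit (example value, as root):
#   sysctl -w fs.file-max=200000      # takes effect now, lost on reboot
#   add "fs.file-max = 200000" to /etc/sysctl.conf, then run: sysctl -p
```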



If the logs you examine don't reveal much information, try increasing the logging level by changing the configuration of a service or application you are troubleshooting. For example, if you are debugging web problems with Apache, edit the file /etc/httpd/conf/httpd.conf and change LogLevel warn to LogLevel debug. Most services and applications support debug level, which makes the logger describe in detail what is happening with a service.
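
The LogLevel change described above can also be scripted, which makes it easy to switch back once debugging is done. This is a sketch under the assumption that the directive appears on a line of its own; the function name set_apache_loglevel is hypothetical.

```shell
# set_apache_loglevel LEVEL [FILE]: rewrite the LogLevel directive in FILE
# (default: /etc/httpd/conf/httpd.conf), keeping a backup copy as FILE.bak.
# Commented-out lines such as "# LogLevel: ..." are left untouched.
set_apache_loglevel() {
    sed -i.bak -E "s/^([[:space:]]*LogLevel[[:space:]]+).*/\1$1/" \
        "${2:-/etc/httpd/conf/httpd.conf}"
}

# Typical use (as root), followed by a reload so Apache picks up the change:
#   set_apache_loglevel debug && service httpd reload
#   set_apache_loglevel warn  && service httpd reload   # when finished
```

Remember to return to the original level afterwards, because debug logging is exactly the kind of expensive logging that production servers try to avoid.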



Of course, information from logs is not always sufficient to help you zero in on the cause of a problem. After all, logs contain mostly predefined messages about break points that have been designed by programmers. That's why sometimes logs not only lack information but may also contain "unknown" errors or misleading messages.



Tracing Network Problems



You may run into problems if clients are unable to connect to an application or service over the network. Check whether the application is really listening on the IP address and port that you expect by executing the command netstat -ntulp, where n is for numeric output, t is for TCP, u is for UDP, l is for listening sockets, and p also shows the PID and program name along with the rest of the attributes. Here is an example of output you might see:




Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 2661/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 3546/master


Two things are important in the above example. First, sshd is listening on all local addresses (0.0.0.0) on port 22. This is how it is supposed to look when it works properly. Second, master, which is the Postfix mail daemon, is listening only on the local 127.0.0.1 address. This is a typical example of a problem; remote clients would not be able to connect to and use the mail server. To fix it you would have to edit one of Postfix's configuration files, /etc/postfix/main.cf, change inet_interfaces = localhost to inet_interfaces = all, and restart Postfix.
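
On a server with many services, picking out the loopback-only listeners by eye gets tedious. The sketch below filters netstat output for them; the function name loopback_only is hypothetical, and it assumes the column layout shown in the example above.

```shell
# loopback_only: read `netstat -ntulp` output on stdin and print the local
# address and PID/program of every socket bound to a loopback address.
loopback_only() {
    awk '$1 ~ /^(tcp|udp)/ && $4 ~ /^(127\.|::1:)/ { print $4, $NF }'
}

# Typical use (as root, so that program names are shown for all sockets):
#   netstat -ntulp | loopback_only
```

For the sample output above, this prints only the Postfix line (127.0.0.1:25 3546/master).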



Sometimes programs don't listen on the interfaces (usually the external ones) they are supposed to, such as the mail server from the above example. That may be intentional; restricting a service to the local interface, like using a non-standard port, can be a good security measure against unauthorized access. Other times the port specified for the application is the problem. The port could already be in use, or the application may not have the necessary privileges to bind to it; ports below 1024 require applications to be started with superuser privileges.



Next, check the firewall. CentOS has a strict firewall policy enabled by default that allows only SSH connections from outside and blocks external access to any other installed services. You can check the current firewall rules using the command /sbin/iptables -L -n, where L says to list the rules and n specifies numeric output. Here is the output from a default CentOS 6 iptables setup:




Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22
REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target prot opt source destination
REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target prot opt source destination


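This shows that only SSH is allowed for new incoming connections, while everything is allowed for outgoing connections. In more detail, there are three chains – INPUT for incoming packets, FORWARD for routed packets, and OUTPUT for outgoing packets. Next to each chain name is the default policy, which here is ACCEPT, meaning that a packet matching none of the chain's rules is allowed. Rules are evaluated from top to bottom and the first match wins, so the final REJECT rule in the INPUT chain is what blocks everything that was not explicitly accepted above it. The third rule, which appears to accept all traffic, applies only to the loopback interface; the interface column is shown only when you add the -v option. If the policy were DROP, unmatched packets would be discarded and the catch-all REJECT rule would be unnecessary. For more information, check the iptables howto.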
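
Because rule order decides the outcome, a quick way to test whether a TCP port is open is to look for an ACCEPT rule for it ahead of the first blocking rule. The following is a rough sketch only: the function name port_allowed is hypothetical, and it deliberately ignores source restrictions, multiport rules, and chain policies.

```shell
# port_allowed PORT: read `iptables -L INPUT -n` output on stdin and succeed
# if an ACCEPT rule for tcp dpt:PORT appears before the first REJECT or DROP.
port_allowed() {
    awk -v want="dpt:$1" '
        $1 == "REJECT" || $1 == "DROP" { exit }      # blocking rule reached first
        $1 == "ACCEPT" && $2 == "tcp" {
            for (i = 3; i <= NF; i++)
                if ($i == want) { found = 1; exit }
        }
        END { exit !found }                          # status 0 only if a rule matched
    '
}

# Typical use (as root):
#   /sbin/iptables -L INPUT -n | port_allowed 80 && echo open || echo blocked
```

With the default rules shown above, the check succeeds for port 22 and fails for port 80.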



The fastest way to eliminate the firewall as the cause of your problem is to disable it by executing the command service iptables stop, but think twice before you disable software designed to keep your system safe from malicious outsiders. If disabling the firewall resolves your issue, start it again, amend the firewall rules, and save the changes. Because rules are matched in order, a new rule must be inserted before the final REJECT rule (rule 5 in the default setup); appending it with -A would place it after REJECT, where it would never match. Here are commands that enable TCP port 80 for incoming connections to Apache in CentOS 6, save the changes, and restart the firewall service:



/sbin/iptables -I INPUT 5 -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
service iptables save
service iptables restart


Tracking Processes and Open Files



The next step is to investigate running processes and the commands and files associated with them. You may unearth a hung or leftover process that has to be killed before another application can start properly. Knowing all commands and files associated with a process can tell you how the process was started and by whom, which may help you troubleshoot security problems.



First, use the command /bin/ps auxf to list the current processes. The arguments mean the command should list all processes (ax) along with their owners (u), drawn as a tree (f) that shows which process started which, together with the commands that started them. If the list is too long you can use the pipe and grep commands to search for something specific.
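
When the process list is long, the STAT column is a quick way to spot trouble: Z marks a zombie (defunct) process and D marks a process stuck in uninterruptible sleep, usually waiting on I/O. The sketch below filters for both. The function name stuck_procs is hypothetical, and it expects plain ps aux output, because the tree drawing added by f shifts the command column.

```shell
# stuck_procs: read `ps aux` output on stdin and print the user, PID, state
# and command of processes in state Z (zombie) or D (uninterruptible sleep).
stuck_procs() {
    awk 'NR > 1 && $8 ~ /^[ZD]/ { print $1, $2, $8, $11 }'
}

# Typical use:
#   ps aux | stuck_procs
```

A zombie cannot be killed itself; it disappears once its parent process collects its exit status or the parent exits.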



Make sure that the owner has the correct access privileges to all necessary files. For example, an Apache web process running as the user apache (the default in CentOS; some setups use nobody) must be able to read all files inside its document root and write to any directories the application writes to. The most practical way to achieve this is to make that user the owner of these files, though granting write access only where it is needed is safer.



Knowing what files are associated with a program is useful because problems with a program's files can reflect on the program itself. The best way to see what files were opened by a program is to run /usr/sbin/lsof. Without any additional arguments, lsof lists all opened files system-wide for all users. You can refine the output by grepping for a specific command (process name), user, or file name.



One common Linux situation is having files locked by one process, preventing them from being written by another. lsof can debug such cases; you can grep its output for the locked file name. If the locking process is responsive, stop it by running its initialization script: service name_of_service stop. If it's not, you can kill the process by executing kill pid_of_process, or kill -9 pid_of_process if it ignores the default signal.



lsof shows even deleted files associated with currently running processes. This may help you to troubleshoot situations with files deleted on purpose, as may happen if you've been the victim of a security breach. Checking deleted files is also useful when troubleshooting free disk space problems. Even though a file has been deleted, its disk space might not have been reclaimed yet because a process has not fully released it. In such cases you have to stop the process to let the system reclaim the deleted file's space.
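
To find out which processes are holding deleted files open, you can filter lsof output for the "(deleted)" marker it appends to such entries. This is a minimal sketch; the function name deleted_but_open is hypothetical.

```shell
# deleted_but_open: read lsof output on stdin and print the command name and
# PID of each process that still holds a deleted file open, without repeats.
deleted_but_open() {
    awk '/\(deleted\)$/ { print $1, $2 }' | sort -u
}

# Typical use (as root):
#   /usr/sbin/lsof | deleted_but_open
# lsof can also do the selection itself: `lsof +L1` lists open files whose
# link count is below 1, that is, files that have been unlinked.
```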



Finally, strace can help you trace system calls and signals. Use it by placing it before the command you are debugging, like this: /usr/bin/strace some_command. Failed attempts to access files are indicated by "No such file or directory" (ENOENT) messages. However, these errors about missing files do not necessarily indicate a problem, because a file might legitimately be absent when, for example, a feature is disabled or a package is not installed.
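
strace output is verbose, so it helps to reduce it to the list of paths that could not be found. The sketch below does that with sed; the function name missing_files is hypothetical, and it assumes the path is the first quoted argument of the call, as it is for open, stat, and access.

```shell
# missing_files: read strace output on stdin and print each path whose
# system call failed with ENOENT (No such file or directory), without repeats.
missing_files() {
    sed -n 's/^[^"]*"\([^"]*\)".*= -1 ENOENT.*/\1/p' | sort -u
}

# Typical use (strace writes to stderr, hence 2>&1; -e trace=file limits
# tracing to file-related calls and -f follows child processes):
#   /usr/bin/strace -f -e trace=file some_command 2>&1 | missing_files
```

Review the resulting list with the caveat above in mind: only the paths the program actually needs are a cause for concern.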



Resolving Missing Dependencies



Problems with missing dependencies usually occur when your operating system or applications have been built via custom installations. CentOS installations using yum are straightforward because all dependencies are resolved automatically. However, when you perform manual installations, you can run into missing libraries and programs. It gets even worse when you are running a 64-bit version of CentOS and you are installing a program that has 32-bit dependencies. An example of such an error is some_app: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory.



To rectify the problem, carefully read the installation manual and ensure that all required packages are installed. If you are not sure what package provides a given file or library, use yum whatprovides. If, for instance, you got the error about the missing libstdc++.so.5, execute yum whatprovides '*/libstdc++.so.5' (quoted so that the shell does not expand the wildcard), and you will see that this file is provided by the compat-libstdc++-33 package.



If you have already installed the package that's responsible for the error, check whether the custom installation requires 32-bit libraries. The easiest way to do this is with the command ldd, as in ldd /full/path_to_custom_command; 32-bit dependencies are searched for in /lib/ instead of /lib64/. In CentOS you can install packages for both 32- and 64-bit architectures, even though this is not a good practice from a performance or stability point of view.
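
ldd marks every dependency it cannot resolve with "not found", so the missing libraries can be listed in one step. This is a minimal sketch; the function name missing_libs is hypothetical.

```shell
# missing_libs: read ldd output on stdin and print the name of every shared
# library that the dynamic linker could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical use:
#   ldd /full/path_to_custom_command | missing_libs
# To see whether the binary itself is 32-bit or 64-bit:
#   file /full/path_to_custom_command
```

Each name printed can then be passed to yum whatprovides to find the package that supplies it.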



Boot in Rescue Mode as a Last Resort



In some extreme cases you may not be able to perform any of the above troubleshooting steps because your system has crashed and cannot even boot. In such cases, turn to the CentOS installation disk and its rescue mode.



First, prepare to boot your system from installation media containing a CentOS installation image. Even the minimal installation image is good for this. Then, from the first boot-up menu, choose "Rescue installed system" and follow the wizard that guides you through the process. At one point you are informed that the rescue environment will try to find your installation and mount it in /mnt/sysimage/. Choose "Continue" so that the system is mounted in read/write mode.



When the rescue wizard completes, choose to start a shell in the final step. This gives you a bash shell, and you can go to your damaged installation by chrooting into it with the command chroot /mnt/sysimage/. After you do that you can work as if your old system had booted properly and perform all the troubleshooting you need.



All of these steps we've talked about may be helpful for you even if you have minimal CentOS background, though the better you understand CentOS, the more efficiently you can troubleshoot problems with it.




This work is licensed under a Creative Commons Attribution 3.0 Unported License
Creative Commons License.



Comments

Thanks you for this article! It help me a lot to trouble shoot problems in a centos server. kinda weird I experience our centos server experience downtime everyday sad to say we already set up a cron to restart the server when its down it so happen our main server is down that's why the cron is not running properly.
Posted @ Sunday, February 02, 2014 10:03 PM by Ariel Licas
How about if your website is not showing? how to diagnose? 
Posted @ Friday, July 25, 2014 11:28 PM by bryan