This morning, we noticed that all our web pages on one server were not accessible. So first I checked the apache error logs. /var/log/apache2/error.log was 0 bytes long. So I went to error.log.1:
# tail /var/log/apache2/error.log.1 … [Sun Jul 14 07:51:20 2013] [notice] Graceful restart requested, doing restart [Sun Jul 14 07:51:21 2013] [notice] seg fault or similar nasty error detected in the parent process
So it looks like apache hung during a graceful restart.
For those who do not know what a graceful restart is:
The graceful signal causes the parent process to advise the children to exit after their current request (or to exit immediately if they’re not serving anything). The parent re-reads its configuration files and re-opens its log files. When a child dies off the parent replaces it with a child of the new generation of the configuration. This immediately begins serving new requests.
You can send such a graceful signal to apache2 like this:
# apachectl graceful
If it worked fine, you should see the following in /var/log/apache2/error.log:
[Sun Jul 14 12:50:59 2013] [notice] Graceful restart requested, doing restart [Sun Jul 14 12:51:03 2013] [notice] Digest: generating secret for digest authentication ... [Sun Jul 14 12:51:03 2013] [notice] Digest: done [Sun Jul 14 12:51:03 2013] [notice] mod_python: Creating 8 session mutexes based on 150 max processes and 0 max threads. [Sun Jul 14 12:51:03 2013] [notice] mod_python: using mutex_directory /tmp [Sun Jul 14 12:51:03 2013] [notice] Apache/2.2.16 (Debian) DAV/2 mod_fcgid/2.3.6 Phusion_Passenger/3.0.11 mod_python/3.3.1 Python/2.6.6 mod_ssl/2.2.16 OpenSSL/0.9.8o mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
Depending on which mods are installed, your output might be different.
Now to the problem we had this morning.
When apache2 is running, you have the parent process (usually running under user root in order to be able to listen to the standard HTTP/S ports) and a few children processes:
# ps -Af | grep apache2 | grep -v grep www-data 10777 22889 0 12:31 ? 00:00:02 /usr/sbin/apache2 -k start www-data 14601 22889 0 12:51 ? 00:00:00 /usr/sbin/apache2 -k start www-data 14618 22889 1 12:51 ? 00:00:02 /usr/sbin/apache2 -k start www-data 14619 22889 0 12:51 ? 00:00:00 /usr/sbin/apache2 -k start www-data 14620 22889 0 12:51 ? 00:00:00 /usr/sbin/apache2 -k start www-data 14621 22889 0 12:51 ? 00:00:00 /usr/sbin/apache2 -k start www-data 14622 22889 0 12:51 ? 00:00:00 /usr/sbin/apache2 -k start www-data 14623 22889 0 12:51 ? 00:00:00 /usr/sbin/apache2 -k start www-data 14627 22889 0 12:51 ? 00:00:00 /usr/sbin/apache2 -k start www-data 14629 22889 0 12:51 ? 00:00:00 /usr/sbin/apache2 -k start www-data 14636 22889 0 12:51 ? 00:00:00 /usr/sbin/apache2 -k start root 22889 1 0 11:12 ? 00:00:01 /usr/sbin/apache2 -k start
This morning, it looked like this:
# ps -Af | grep apache2 | grep -v grep www-data 14601 1 3 12:51 ? 00:00:10 /usr/sbin/apache2 -k start
So the parent process as well as almost all children processes are gone. Only one child process seems to hang.
The solution is simple, kill the last apache2 process and restart apache2:
# killall apache2 # service apache2 start
If you’ve already checked the processes as shown above you can also just use kill with the PID instead of killall:
# kill 14601 # service apache2 start
I haven’t yet been able to spend much time looking for the reason why this happened. A graceful restart of apache is usually done because the configuration changed or because the log files were rotated.
Many *.log.1 log files have the same timestamp as the entry in /var/log/apache2/error.log.1 and I could see that rsyslogd also got HUPed 1 second later:
syslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="1244" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
So I assume it’s related to rotating log files. CPanel seems to use graceful restarts a lot but our server runs Plesk and it seems not to use graceful restarts by default (still need to check the configuration in the database).