As of a few weeks back I started getting very slow shutdowns ...
Anyone else seeing this or got some color on the problem ?
here's what seems to be happening:
shutdown ..
network goes offline ntpd asks for an update (looks like it traps the kill -15 .. and stays up trying to clean up ) named times out .. and gets error .. wont shutdown cleanly (it has received a kill -15 which it has trapped) named waits trying to clean up - cannot clean up coz network is down. ntpd finally exits (and stops nagging named)
delay of shutdown now about 4 mins .. . eventually named gets killed harder harder (and finally exits)
gene/
====================== messages ======================= Mar 31 06:55:38 lap1 ntpd[2767]: 192.5.41.41 interface 10.10.10.70 -> (null) Mar 31 06:55:43 lap1 avahi-daemon[1467]: Got SIGTERM, quitting. Mar 31 06:55:43 lap1 avahi-daemon[1467]: Leaving mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1. Mar 31 06:55:43 lap1 avahi-daemon[1467]: avahi-daemon 0.6.27 exiting. Mar 31 06:55:43 lap1 ntpd[2767]: ntpd exiting on signal 15 Mar 31 06:55:43 lap1 ntpd[1546]: ntpd 4.2.6p3-RC10@1.2239-o Thu Nov 25 16:18:33 UTC 2010 (1) Mar 31 06:55:43 lap1 ntpd[1549]: proto: precision = 0.839 usec Mar 31 06:55:43 lap1 ntpd[1549]: 0.0.0.0 c01d 0d kern kernel time sync enabled Mar 31 06:55:43 lap1 ntpd[1549]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123 Mar 31 06:55:43 lap1 ntpd[1549]: Listen and drop on 1 v6wildcard :: UDP 123 Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 2 lo 127.0.0.1 UDP 123 Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 3 virbr0 192.168.122.1 UDP 123 Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 4 wlan0 fe80::21f:3bff:fe27:f3c5 UDP 123 Mar 31 06:55:43 lap1 ntpd[1549]: Listen normally on 5 lo ::1 UDP 123 Mar 31 06:55:43 lap1 ntpd[1549]: Listening on routing socket on fd #22 for interface updates Mar 31 06:55:44 lap1 named[1483]: error (network unreachable) resolving 'tock.usno.navy.mil/AAAA/IN': 10.10.10.63#53 Mar 31 06:55:44 lap1 named[1483]: error (network unreachable) resolving 'charon.nofs.navy.mil/A/IN': 10.10.10.63#53
... (loads of similar errors)
Mar 31 06:55:49 lap1 named[1483]: error (network unreachable) resolving 'i.root-servers.net/AAAA/IN': 2001:dc3::35#53 Mar 31 06:55:49 lap1 named[1483]: error (network unreachable) resolving 'l.root-servers.net/AAAA/IN': 2001:500:1::803f:235#53 Mar 31 06:55:49 lap1 ntpd[1549]: ntpd exiting on signal 15 Mar 31 06:55:49 lap1 named[1483]: received control channel command 'stop' Mar 31 06:55:49 lap1 named[1483]: shutting down: flushing changes Mar 31 06:55:49 lap1 named[1483]: stopping command channel on 127.0.0.1#953 Mar 31 06:55:49 lap1 named[1483]: stopping command channel on ::1#953 Mar 31 06:55:49 lap1 named[1483]: no longer listening on 127.0.0.1#53 Mar 31 06:56:13 lap1 dnsmasq[1991]: no servers found in /etc/resolv.conf, will retry
==================================================================
On 03/31/2011 01:01 PM, Genes MailLists wrote:
This may be the bug:
On 03/31/2011 01:01 PM, Genes MailLists wrote:
As of a few weeks back I started getting very slow shutdowns ...
Anyone else seeing this or got some color on the problem ?
here's what seems to be happening:
...
Mar 31 06:55:49 lap1 named[1483]: no longer listening on 127.0.0.1#53 Mar 31 06:56:13 lap1 dnsmasq[1991]: no servers found in /etc/resolv.conf, will retry
==================================================================
Also why is dnsmasq running at all - its turned off - and named is doing DNS ?
chkconfig --list dnsmasq dnsmasq 0:off 1:off 2:off 3:off 4:off 5:off 6:off
Is this part of the problem ???
On Thu, Mar 31, 2011 at 7:10 PM, Genes MailLists lists@sapience.com wrote:
Also why is dnsmasq running at all - its turned off - and named is doing DNS ?
chkconfig --list dnsmasq dnsmasq 0:off 1:off 2:off 3:off 4:off 5:off 6:off
Is this part of the problem ???
I am running some tests on i686 with the 2.6.38 kernel - and like you I find that dnsmasq is running despite it being off under chkconfig.
I find that it is started after NetworkManager has started up during boot in the messages log:
Mar 31 19:37:33 lapmike3 dnsmasq[2237]: started, version 2.52 cachesize 150 Mar 31 19:37:33 lapmike3 dnsmasq[2237]: compile time options: IPv6 GNU-getopt DBus no-I18N DHCP TFTP Mar 31 19:37:33 lapmike3 dnsmasq-dhcp[2237]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h Mar 31 19:37:33 lapmike3 dnsmasq[2237]: reading /etc/resolv.conf Mar 31 19:37:33 lapmike3 dnsmasq[2237]: using nameserver 10.0.0.1#53 Mar 31 19:37:33 lapmike3 dnsmasq[2237]: read /etc/hosts - 2 addresses
However like you I don't know what is starting dnsmasq or why it needs to be running at all.
I am going to turn it off and then shutdown - and see if I get any delay - I also have ntpd running so I will report back once I have turned off and rebooted.
Interestingly I am on a 10.0.x.x network so why dnsmasq is looking at 192.x.x.x I have no idea!
On Thu, Mar 31, 2011 at 7:52 PM, mike cloaked mike.cloaked@gmail.com wrote:
On Thu, Mar 31, 2011 at 7:10 PM, Genes MailLists lists@sapience.com wrote:
Also why is dnsmasq running at all - its turned off - and named is doing DNS ?
chkconfig --list dnsmasq dnsmasq 0:off 1:off 2:off 3:off 4:off 5:off 6:off
Is this part of the problem ???
I am running some tests on i686 with the 2.6.38 kernel - and like you I find that dnsmasq is running despite it being off under chkconfig.
I find that it is started after NetworkManager has started up during boot in the messages log:
Mar 31 19:37:33 lapmike3 dnsmasq[2237]: started, version 2.52 cachesize 150 Mar 31 19:37:33 lapmike3 dnsmasq[2237]: compile time options: IPv6 GNU-getopt DBus no-I18N DHCP TFTP Mar 31 19:37:33 lapmike3 dnsmasq-dhcp[2237]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h Mar 31 19:37:33 lapmike3 dnsmasq[2237]: reading /etc/resolv.conf Mar 31 19:37:33 lapmike3 dnsmasq[2237]: using nameserver 10.0.0.1#53 Mar 31 19:37:33 lapmike3 dnsmasq[2237]: read /etc/hosts - 2 addresses
However like you I don't know what is starting dnsmasq or why it needs to be running at all.
I am going to turn it off and then shutdown - and see if I get any delay - I also have ntpd running so I will report back once I have turned off and rebooted.
Interestingly I am on a 10.0.x.x network so why dnsmasq is looking at 192.x.x.x I have no idea!
Interestingly: [root@lapmike3 log]# service dnsmasq stop Shutting down dnsmasq: [FAILED] [root@lapmike3 log]# service dnsmasq status dnsmasq (pid 2237) is running...
Am confused!
On Thu, Mar 31, 2011 at 7:52 PM, mike cloaked mike.cloaked@gmail.com wrote:
On Thu, Mar 31, 2011 at 7:10 PM, Genes MailLists lists@sapience.com wrote:
Also why is dnsmasq running at all - its turned off - and named is doing DNS ?
chkconfig --list dnsmasq dnsmasq 0:off 1:off 2:off 3:off 4:off 5:off 6:off
Is this part of the problem ???
I am running some tests on i686 with the 2.6.38 kernel - and like you I find that dnsmasq is running despite it being off under chkconfig.
I find that it is started after NetworkManager has started up during boot in the messages log:
Mar 31 19:37:33 lapmike3 dnsmasq[2237]: started, version 2.52 cachesize 150 Mar 31 19:37:33 lapmike3 dnsmasq[2237]: compile time options: IPv6 GNU-getopt DBus no-I18N DHCP TFTP Mar 31 19:37:33 lapmike3 dnsmasq-dhcp[2237]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h Mar 31 19:37:33 lapmike3 dnsmasq[2237]: reading /etc/resolv.conf Mar 31 19:37:33 lapmike3 dnsmasq[2237]: using nameserver 10.0.0.1#53 Mar 31 19:37:33 lapmike3 dnsmasq[2237]: read /etc/hosts - 2 addresses
However like you I don't know what is starting dnsmasq or why it needs to be running at all.
I am going to turn it off and then shutdown - and see if I get any delay - I also have ntpd running so I will report back once I have turned off and rebooted.
I had a shutdown that appeared uneventful (i686) in about 10 or 15 seconds:
Mar 31 19:57:04 lapmike3 ntpd[2959]: ntpd exiting on signal 15 Mar 31 19:57:05 lapmike3 named[3530]: received control channel command 'stop' Mar 31 19:57:05 lapmike3 named[3530]: shutting down: flushing changes Mar 31 19:57:05 lapmike3 named[3530]: stopping command channel on 127.0.0.1#953 Mar 31 19:57:05 lapmike3 named[3530]: no longer listening on 127.0.0.1#53 Mar 31 19:57:05 lapmike3 named[3530]: no longer listening on 10.0.0.69#53 Mar 31 19:57:05 lapmike3 named[3530]: no longer listening on 192.168.122.1#53 Mar 31 19:57:05 lapmike3 named[3530]: exiting Mar 31 19:57:05 lapmike3 NetworkManager[1300]: <info> caught signal 15, shutting down normally.
with other lines and then ending the shutdown at: Mar 31 19:57:07 lapmike3 kernel: Kernel logging (proc) stopped. Mar 31 19:57:07 lapmike3 rsyslogd: [origin software="rsyslogd" swVersion="4.6.3" x-pid="1197" x-info="http://www.rsyslog.com"] exiting on signal 15.
So there was no significant delay once ntpd was shut down till the machine was off. I did a restart and the next log entry for bootup was about 40 seconds later after restart and POST.
In your log entries quoted in the original post there seemed to be about 35 seconds covered by that part of the log output - so what section of the log covered the three or so minutes? Will that give a clue I wonder?
On 03/31/2011 03:09 PM, mike cloaked wrote:
On Thu, Mar 31, 2011 at 7:52 PM, mike cloaked mike.cloaked@gmail.com wrote:
On Thu, Mar 31, 2011 at 7:10 PM, Genes MailLists lists@sapience.com wrote:
about 40 seconds later after restart and POST.
In your log entries quoted in the original post there seemed to be about 35 seconds covered by that part of the log output - so what section of the log covered the three or so minutes? Will that give a clue I wonder?
Sorry I thought I said - they are all the same errors from named similar to the first 2 and last 2 I showed - i removed the rest and put ellipsis in their stead.
On Thu, Mar 31, 2011 at 8:24 PM, Genes MailLists lists@sapience.com wrote:
Sorry I thought I said - they are all the same errors from named similar to the first 2 and last 2 I showed - i removed the rest and put ellipsis in their stead.
OK I thought the ellipsis was in in the middle of the time range (about 5 seconds in the time listed for the log entries)? You mean there were other log messages outside the range where there was the long delay?
Either way I don't see this for i686 and am not running an equivalent 64 bit machine. So it would be useful to see other corresponding reports to see if this is only being manifest on x86_64?
On Thu, 31 Mar 2011 14:10:32 -0400 Genes MailLists wrote:
Also why is dnsmasq running at all
Because (almost certainly) due to libvirtd running, which always starts dnsmasq to enable its "default" network model. I always use virsh to delete the default network and config all my virtual machines with a bridge network.
At least that was what I found when I finally tracked down the dnsmasq origins on my home system.
On Thu, Mar 31, 2011 at 8:02 PM, Tom Horsley horsley1953@gmail.com wrote:
On Thu, 31 Mar 2011 14:10:32 -0400 Genes MailLists wrote:
Also why is dnsmasq running at all
Because (almost certainly) due to libvirtd running, which always starts dnsmasq to enable its "default" network model. I always use virsh to delete the default network and config all my virtual machines with a bridge network.
At least that was what I found when I finally tracked down the dnsmasq origins on my home system.
OK makes sense and I do have libvirtd running.
On 03/31/2011 03:16 PM, mike cloaked wrote:
On Thu, Mar 31, 2011 at 8:02 PM, Tom Horsley horsley1953@gmail.com wrote:
On Thu, 31 Mar 2011 14:10:32 -0400 Genes MailLists wrote:
Also why is dnsmasq running at all
Because (almost certainly) due to libvirtd running, which always starts dnsmasq to enable its "default" network model. I always use virsh to delete the default network and config all my virtual machines with a bridge network.
At least that was what I found when I finally tracked down the dnsmasq origins on my home system.
OK makes sense and I do have libvirtd running.
Ahhh .. thank you .. .but I use virtualbox - do I need libvirtd ?