Hi,
I've been experiencing weird reboots lately -- not the "bad-RAM" kinda reboots; they seem to be software-related, because they always seem to happen at the same point.
It usually happens right after boot, when ntpd is synchronizing. This is what appears in /var/log/messages:
Apr 2 22:21:43 localhost ntpd[2300]: synchronized to LOCAL(0), stratum 10
Apr 2 22:21:43 localhost ntpd[2300]: kernel time sync enabled 0001
Apr 2 22:22:49 localhost ntpd[2300]: synchronized to 200.218.160.160, stratum 2
here the system rebooted
Apr 2 22:23:45 localhost syslogd 1.4.1: restart.
Apr 2 22:23:45 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
After it reboots, it "survives" -- and even synchronizes again:
Apr 2 22:27:09 localhost ntpd[2319]: synchronized to LOCAL(0), stratum 10
Apr 2 22:27:09 localhost ntpd[2319]: kernel time sync enabled 0001
Apr 2 22:29:16 localhost ntpd[2319]: synchronized to 193.6.222.47, stratum 2
Apr 2 22:43:16 localhost ntpd[2319]: time reset -2.639872 s
Apr 2 22:47:17 localhost ntpd[2319]: synchronized to LOCAL(0), stratum 10
Apr 2 22:48:22 localhost ntpd[2319]: synchronized to 193.6.222.47, stratum 2
Anyone ever seen something similar? Is it really possible that time syncs could cause reboots? Any other log file I could check for additional clues?
I am using the latest kernel (2.6.20-1.2933.fc6).
TIA
Andre
On Monday 02 April 2007, Andre Costa wrote:
> Hi,
> I've been experiencing weird reboots lately -- not the "bad-RAM" kinda reboots; they seem to be software-related, because they always seem to happen at the same point.
> It usually happens right after boot, when ntpd is synchronizing. This is what appears in /var/log/messages:
> Apr 2 22:21:43 localhost ntpd[2300]: synchronized to LOCAL(0), stratum 10
> Apr 2 22:21:43 localhost ntpd[2300]: kernel time sync enabled 0001
> Apr 2 22:22:49 localhost ntpd[2300]: synchronized to 200.218.160.160, stratum 2
> here the system rebooted
> Apr 2 22:23:45 localhost syslogd 1.4.1: restart.
> Apr 2 22:23:45 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
> After it reboots, it "survives" -- and even synchronizes again:
> Apr 2 22:27:09 localhost ntpd[2319]: synchronized to LOCAL(0), stratum 10
> Apr 2 22:27:09 localhost ntpd[2319]: kernel time sync enabled 0001
> Apr 2 22:29:16 localhost ntpd[2319]: synchronized to 193.6.222.47, stratum 2
> Apr 2 22:43:16 localhost ntpd[2319]: time reset -2.639872 s
> Apr 2 22:47:17 localhost ntpd[2319]: synchronized to LOCAL(0), stratum 10
> Apr 2 22:48:22 localhost ntpd[2319]: synchronized to 193.6.222.47, stratum 2
> Anyone ever seen something similar? Is it really possible that time syncs could cause reboots? Any other log file I could check for additional clues?
I believe that big backwards crash corrections can cause this. I know that the Fedora releases all save the time to the mobo's hardware clock at shutdown, so the clock should be reasonably close when it's used to set the system time at the next boot. However, if the CMOS battery is getting on in years and lying down on the job of keeping the hardware clock somewhere near coherent while powered off, the wrong time might be recovered at bootup and not corrected until ntpd starts up. That startup actually does a crash correction using ntpdate before handing the keep-it-correct chores off to ntpd itself, which in turn fine-tunes the second to maintain the system clock within a few milliseconds of the network time servers.
The location of the reboot in your logs would be the #1 clue for this theory, to me. The fact that it usually keeps running after one reboot, because the hardware clock hasn't had time to go doofy now that it's running on electric power, is another clue.
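To check where the reboot falls in the log without eyeballing it, something like the following minimal Python sketch might help. It is only an illustration, not a tested diagnostic tool: the function names are made up for this example, the sample lines are copied from the excerpt earlier in this thread, and it assumes the "syslogd ... restart." line is a usable reboot marker. That marker can also show up for other reasons (a clean reboot, or syslogd being restarted for log rotation), so a match is a hint, not proof.

```python
import re

# Sample lines copied from the /var/log/messages excerpt in this thread.
SAMPLE_LOG = """\
Apr 2 22:21:43 localhost ntpd[2300]: synchronized to LOCAL(0), stratum 10
Apr 2 22:21:43 localhost ntpd[2300]: kernel time sync enabled 0001
Apr 2 22:22:49 localhost ntpd[2300]: synchronized to 200.218.160.160, stratum 2
Apr 2 22:23:45 localhost syslogd 1.4.1: restart.
Apr 2 22:23:45 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
Apr 2 22:27:09 localhost ntpd[2319]: synchronized to LOCAL(0), stratum 10
"""

# Assumption: a syslogd "restart." line marks the point where the system came back up.
RESTART = re.compile(r"syslogd [\d.]+: restart\.")


def last_lines_before_restart(log_text):
    """Return (previous_line, restart_line) pairs, one per syslogd restart found."""
    pairs = []
    previous = None
    for line in log_text.splitlines():
        if RESTART.search(line):
            pairs.append((previous, line))
        previous = line
    return pairs


def ntpd_preceded_restart(log_text):
    """True if any restart marker was immediately preceded by an ntpd message."""
    return any(
        prev is not None and " ntpd[" in prev
        for prev, _ in last_lines_before_restart(log_text)
    )


if __name__ == "__main__":
    for prev, restart in last_lines_before_restart(SAMPLE_LOG):
        print("before restart:", prev)
        print("restart marker:", restart)
```

To run it against the real file, read /var/log/messages (usually as root) and pass its contents to `last_lines_before_restart` instead of `SAMPLE_LOG`.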
So I'd check the CMOS battery on the motherboard with a digital meter as step one, after it's been off overnight. Over 3 volts would be considered decent for a wee bit yet; below 2.7 or so would be grounds to replace it soonest. ISTR most of them are around 3.3 to 3.6 volts new, but read the voltage stamped on the cell to be sure. Less than, say, 85% of that rated voltage would be grounds to write the cell's type number down and get one the next time you are in town.
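The 85% rule of thumb above is simple arithmetic; as a rough Python sketch (the function name and the default threshold parameter are just for illustration, and the rated voltage must come from what is stamped on the cell):

```python
def battery_verdict(measured_volts, rated_volts, threshold=0.85):
    """Classify a meter reading against the 85%-of-rated rule of thumb.

    measured_volts: reading taken after the machine has been off overnight.
    rated_volts: the voltage stamped on the cell.
    threshold: fraction of rated voltage below which the cell gets replaced
               (0.85 is the rough figure suggested above, not a spec value).
    """
    if rated_volts <= 0:
        raise ValueError("rated_volts must be positive")
    if measured_volts < threshold * rated_volts:
        return "replace"
    return "ok for now"


if __name__ == "__main__":
    # Hypothetical readings for a cell stamped 3.0 V.
    for reading in (2.9, 2.5):
        print(reading, "V:", battery_verdict(reading, 3.0))
```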
> I am using the latest kernel (2.6.20-1.2933.fc6).
> TIA
> Andre
-- Andre Oliveira da Costa
Hi Gene,
On Mon, 02 Apr 2007 23:01:02 -0400 Gene Heskett gene.heskett@verizon.net wrote:
> On Monday 02 April 2007, Andre Costa wrote:
> > Hi,
> > I've been experiencing weird reboots lately -- not the "bad-RAM" kinda reboots; they seem to be software-related, because they always seem to happen at the same point.
> > It usually happens right after boot, when ntpd is synchronizing. This is what appears in /var/log/messages:
> > Apr 2 22:21:43 localhost ntpd[2300]: synchronized to LOCAL(0), stratum 10
> > Apr 2 22:21:43 localhost ntpd[2300]: kernel time sync enabled 0001
> > Apr 2 22:22:49 localhost ntpd[2300]: synchronized to 200.218.160.160, stratum 2
> > here the system rebooted
> > Apr 2 22:23:45 localhost syslogd 1.4.1: restart.
> > Apr 2 22:23:45 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
> > After it reboots, it "survives" -- and even synchronizes again:
> > Apr 2 22:27:09 localhost ntpd[2319]: synchronized to LOCAL(0), stratum 10
> > Apr 2 22:27:09 localhost ntpd[2319]: kernel time sync enabled 0001
> > Apr 2 22:29:16 localhost ntpd[2319]: synchronized to 193.6.222.47, stratum 2
> > Apr 2 22:43:16 localhost ntpd[2319]: time reset -2.639872 s
> > Apr 2 22:47:17 localhost ntpd[2319]: synchronized to LOCAL(0), stratum 10
> > Apr 2 22:48:22 localhost ntpd[2319]: synchronized to 193.6.222.47, stratum 2
> > Anyone ever seen something similar? Is it really possible that time syncs could cause reboots? Any other log file I could check for additional clues?
> I believe that big backwards crash corrections can cause this. I know that the Fedora releases all save the time to the mobo's hardware clock at shutdown, so the clock should be reasonably close when it's used to set the system time at the next boot. However, if the CMOS battery is getting on in years and lying down on the job of keeping the hardware clock somewhere near coherent while powered off, the wrong time might be recovered at bootup and not corrected until ntpd starts up. That startup actually does a crash correction using ntpdate before handing the keep-it-correct chores off to ntpd itself, which in turn fine-tunes the second to maintain the system clock within a few milliseconds of the network time servers.
Mmmh... the weary CMOS battery theory indeed makes sense; it's about time I replaced that. But what's new to me is that ntpdate needs to reboot the machine in order to correct large time drifts.
> The location of the reboot in your logs would be the #1 clue for this theory, to me. The fact that it usually keeps running after one reboot, because the hardware clock hasn't had time to go doofy now that it's running on electric power, is another clue.
Right, so far I agree it makes perfect sense. Also, IIRC it tends to happen more frequently after I spend a couple of days away from the computer (I guess the clock drifts more and more the longer it relies solely on the CMOS battery). E.g., today the "auto-reboot" did not take place (I used the computer yesterday).
> So I'd check the CMOS battery on the motherboard with a digital meter as step one, after it's been off overnight. Over 3 volts would be considered decent for a wee bit yet; below 2.7 or so would be grounds to replace it soonest. ISTR most of them are around 3.3 to 3.6 volts new, but read the voltage stamped on the cell to be sure. Less than, say, 85% of that rated voltage would be grounds to write the cell's type number down and get one the next time you are in town.
Nah, I will replace it right away, it is about time =)
Thanks a lot for sharing your thoughts. Even though it is still unconfirmed, your theory makes perfect sense so far (I would never have thought of that =)). And, if it proves to be right, it was a hardware problem after all ;-)
Regards,
Andre