I'm guessing no one has an answer as to why the FC14 install hangs at 'waiting for hardware to initialize' while I can install CentOS with no problems whatsoever? Or what to do to try and get FC14 installed (tried the standard ide=nodma noapic acpi=off options already). Below is the output of dmesg and lspci, as JB previously requested.
-------- Original Message --------
Subject: Re: [Fedora] Re: FC14 Installation Hangs
Date: Fri, 21 Jan 2011 14:40:08 -0700
From: Ashley M. Kirchner ashley@pcraft.com
Reply-To: Community support for Fedora users users@lists.fedoraproject.org
To: Community support for Fedora users users@lists.fedoraproject.org
On 1/21/2011 1:22 PM, JB wrote:
Just install CentOS (as if you were installing F14 - layout mainly), select desktop set (X, GNOME), nothing more. It will take 15 min or so. Then pass to the list the following displays:
$ dmesg
$ lspci
JB
That I can do. In the interest of not spamming the list with them, I've put them on the web:
http://www.yeehaw.net/dmesg.txt http://www.yeehaw.net/lspci.txt
Hopefully someone can make something out of them and tell me how to get FC14 to work ...
On 1/22/2011 11:11 AM, JB wrote:
I have to guess somewhat, which is a polite way to admit ... Pass these on kernel line: xdriver=vesa nomodeset agp=off
JB
No dice. Same spot.
At this point I may just have to continue running CentOS, it's not like it's the end of the world anyway. Fedora had such a great run up til now ...
On 1/22/2011 11:31 AM, JB wrote:
Remove "quiet" and add "ignore_loglevel" (without "" characters) to the boot options to see where it hangs.
There was no 'quiet' option on the boot line to begin with.
After the 'waiting for hardware to initialize...' line, there are several lines that appear, but the last few are (manually typing these in by the way):
[ 10.232634] scsi 0:0:1:0: CD-ROM TOSHIBA CD-ROM XM-6702B 1007 PQ: 0 ANSI 5
[ 10.233039] sd 0:0:1:0: [sda] Mode Sense: 00 3a 00 00
[ 10.234443] sr0: scsi3-mmc drive: 48x/48x cd/rw xa/form2 cdda tray
[ 10.234653] Uniform CD-ROM driver Revision: 3.20
[ 10.236336] sr 0:0:1:0: Attached scsi CD-ROM sr0
[ 10.237553] sr 0:0:1:0: Attached scsi generic sg1 type 5
It looks like after it detects the primary drive, then the slave CD-ROM, it just stops. I tried a completely different optical drive, same result. I don't think it's the optical drive getting it stuck, but I could be wrong.
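Since the hang point is being identified by the last line printed, here is a minimal Python sketch for pulling the final timestamped kernel line out of captured text (for example, a transcription of the screen). The function name `last_kernel_line` is made up for illustration, and the sample lines are the ones typed in above.

```python
import re

# Matches the "[ 10.232634] message" prefix the kernel puts on each line.
STAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")


def last_kernel_line(text):
    """Return (seconds, message) for the last timestamped line, or None."""
    last = None
    for line in text.splitlines():
        match = STAMP_RE.match(line.strip())
        if match:
            last = (float(match.group(1)), match.group(2))
    return last


# Sample taken from the lines quoted in this message.
SAMPLE = """\
[ 10.234653] Uniform CD-ROM driver Revision: 3.20
[ 10.236336] sr 0:0:1:0: Attached scsi CD-ROM sr0
[ 10.237553] sr 0:0:1:0: Attached scsi generic sg1 type 5
"""

print(last_kernel_line(SAMPLE))
```

This only reports the last thing that was printed; whatever hung the machine may be the step that comes after it and never got to print anything.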
On 1/22/2011 12:03 PM, JB wrote:
Kernel command line: always keep --> ignore_loglevel add --> enforcing=0
The only major difference this time is that it's also detecting the additional drives that are on the add-on card. Previously it wasn't, but now I can see them after it says 'waiting for hardware to initialize...'. It's going through ata1.00, ata1.01, ata3.00, and ata3.01 - which accounts for the four hard drives, and then it lists the CD-ROM. But it still dies at the same exact spot after detecting the CD-ROM ...
For the record, I previously tried without the add-on card plugged in and it too went nowhere. So there is something either with the CD-ROM or just after it that's causing it to hang. And when I say hang, I'm talking complete lock up. NUM key doesn't do anything, the machine is a brick. Only recovery is to hold the power button till it shuts off.
By the way, when booting CentOS, it also displays that 'waiting for hardware to initialize...' however it sits there for a mere 5 seconds then moves on and boots normally.
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
there is something perhaps with your two Ethernet setups (you have 2 NIC controllers shown in lspci).
always keep --> ignore_loglevel enforcing=0 add --> ether=0,0,eth1
this will force probing both Ethernets.
Another kernel option you can try as well is: nousb
JB
On 1/22/2011 12:35 PM, JB wrote:
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
there is something perhaps with your two Ethernet setups (you have 2 NIC controllers shown in lspci).
always keep --> ignore_loglevel enforcing=0 add --> ether=0,0,eth1
this will force probing both Ethernets.
Another kernel option you can try as well is: nousb
It detects the ethernets just fine, one pops up just before the 'waiting for the cows to come home' line and the other just after. Adding the ether option and nousb made no difference.
On Sat, 22 Jan 2011 19:35:16 +0000 (UTC) JB jb.1234abcd@gmail.com wrote:
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
there is something perhaps with your two Ethernet setups (you have 2 NIC controllers shown in lspci).
always keep --> ignore_loglevel enforcing=0 add --> ether=0,0,eth1
this will force probing both Ethernets.
Only on ISA bus so that won't make any difference.
On Saturday, January 22, 2011 01:14:40 pm Ashley M. Kirchner wrote:
On 1/22/2011 12:03 PM, JB wrote:
Kernel command line: always keep --> ignore_loglevel add --> enforcing=0
The only major difference this time is that it's also detecting the additional drives that are on the add-on card. Previously it wasn't, but now I can see them after it says 'waiting for hardware to initialize...'. It's going through ata1.00, ata1.01, ata3.00, and ata3.01 - which accounts for the four hard drives, and then it lists the CD-ROM. But it still dies at the same exact spot after detecting the CD-ROM ...
For the record, I previously tried without the add-on card plugged in and it too went nowhere. So there is something either with the CD-ROM or just after it that's causing it to hang. And when I say hang, I'm talking complete lock up. NUM key doesn't do anything, the machine is a brick. Only recovery is to hold the power button till it shuts off.
By the way, when booting CentOS, it also displays that 'waiting for hardware to initialize...' however it sits there for a mere 5 seconds then moves on and boots normally.
I looked at URL, http://fedoraproject.org/wiki/KernelCommonProblems, for hints. It has a suggestion, under "Crashes/Hangs", "# initcall_debug will allow to see the last thing the kernel tried to initialise before it hung." I have the impression, from what is said, the kernel is hanging trying to initialize something. Perhaps this parameter will help give us a hint.
On 1/22/2011 12:38 PM, Rick Sewill wrote:
I looked at URL, http://fedoraproject.org/wiki/KernelCommonProblems, for hints. It has a suggestion, under "Crashes/Hangs", "# initcall_debug will allow to see the last thing the kernel tried to initialise before it hung." I have the impression, from what is said, the kernel is hanging trying to initialize something. Perhaps this parameter will help give us a hint.
Well now this is interesting. The very last line just after the CD-ROM part, and also where it hangs, is:
sdd:
Which indicates one of the drives, and in this particular case, it would be the main drive (primary bus, master) based on the specs it's listing a few lines above. So a) why is it being listed as sdd and not sda? b) why is it hanging? The CD-ROM is being listed as sdc ... So what happened to sda and sdb? When I boot into CentOS, it correctly sees sda and sdb for the primary drive and CD-ROM, then it moves to sde, sdf, and sdg (which are all on the add-on card). Is it possible that Fedora 14 is somehow reading the add-on card first and assigning sda and sdb to two of the drives there? And if so, then why is it ignoring the 3rd drive that's also on the add-on?
I should point out that I have tried various other drives as the primary boot drive when I first started experiencing these lock ups, before I started posting to the list. So I'm fairly certain it's not the drive itself (otherwise the other 4 that I tried would also be defective, and 2 of those were brand new out of the box.)
At this point, I'm inclined to just keep it as is, with CentOS. The machine's been off the net for 24 hours now and I have to get it back into production. So unless someone has a eureka moment, or a burst of bright ideas, I'm going to continue configuring the machine with CentOS and have it in production by Monday.
ata3.01 - which accounts for the four hard drives, and then it lists the CD-ROM. But it still dies at the same exact spot after detecting the CD-ROM ...
Ok try this
Remove the 'quiet' add
irqpoll initcall_debug
The first one tries to catch and deal with hangs due to IRQ routing bugs in the BIOS etc, the second will print a trace of each function called during initialisation. It's not exciting to most people but as part of a Fedora bug report it will let the Fedora kernel maintainers see which initialiser hung the machine.
Alan
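To make the initcall_debug output easier to read once it has been captured (typed in, or transcribed from a photo), here is a rough Python sketch that lists initialisers that were called but never reported returning. It assumes the usual "calling <function> @ <pid>" and "initcall <function> returned <code> after <n> usecs" line shapes; the function names in the sample are invented for illustration and are not from this machine.

```python
import re

# Assumed line shapes printed when booting with initcall_debug:
#   calling  some_init+0x0/0x10 @ 1
#   initcall some_init+0x0/0x10 returned 0 after 97 usecs
CALL_RE = re.compile(r"calling\s+(\S+)\s+@\s+\d+")
RETURN_RE = re.compile(r"initcall\s+(\S+)\s+returned\s+-?\d+\s+after\s+\d+\s+usecs")


def unfinished_initcalls(log_text):
    """Return initcalls that were started but have no matching return line."""
    pending = []
    for line in log_text.splitlines():
        called = CALL_RE.search(line)
        if called:
            pending.append(called.group(1))
            continue
        returned = RETURN_RE.search(line)
        if returned and returned.group(1) in pending:
            pending.remove(returned.group(1))
    return pending


# Hypothetical sample; example_a_init and example_b_init are made-up names.
SAMPLE = """\
[    1.000000] calling  example_a_init+0x0/0x10 @ 1
[    1.000100] initcall example_a_init+0x0/0x10 returned 0 after 97 usecs
[    1.000200] calling  example_b_init+0x0/0x20 @ 1
"""

print(unfinished_initcalls(SAMPLE))
```

If the list ends up empty, the hang probably happened outside an initcall (or the capture is incomplete), so the last few raw lines are still worth including in a bug report.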
Ok try this
Remove the 'quiet' add
irqpoll initcall_debug
The first one tries to catch and deal with hangs due to IRQ routing bugs in the BIOS etc, the second will print a trace of each function called during initialisation. It's not exciting to most people but as part of a Fedora bug report it will let the Fedora kernel maintainers see which initialiser hung the machine.
Alan
I'll try this on Monday when I'm back in the office. In the meantime, is there a way to capture this so I don't have to try and type it all out? While the machine has serial ports, they're all disabled in BIOS and I don't know how else I can capture all the stuff that gets spewed out, other than to take a physical picture of the screen when it hangs again.
meantime, is there a way to capture this so I don't have to try and type it all out? While the machine has serial ports, they're all disabled in BIOS and I don't know how else I can capture all the stuff that gets spewed out, other than to take a physical picture of the screen when it hangs again.
The critical thing to capture if it hangs is the last value dumped by the initcall debug before it hangs. That's the one that says what the last initialiser (the one that hung the box) was.
You can also attach digital photos to bugs, which is a good way to capture long oopses on dead boxen
On 1/22/2011 2:53 PM, Alan Cox wrote:
Ok try this Remove the 'quiet' add
irqpoll initcall_debug
The first one tries to catch and deal with hangs due to IRQ routing bugs in the BIOS etc, the second will print a trace of each function called during initialisation. It's not exciting to most people but as part of a Fedora bug report it will let the Fedora kernel maintainers see which initialiser hung the machine.
Ok, finally got a moment to try this. Current kernel line is:
xdriver=vesa nomodeset ide=nodma noapic acpi=off ignore_loglevel enforcing=0 initcall_debug
That doesn't give me anything I don't already know or have seen - it locks up after discovering the main drive and CD. Adding 'irqpoll' to the line just locks up way earlier in the boot process:
....
[ 0.000000] SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] - RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] - RCU-based detection of stalled CPUs is disabled.
[ 0.000000] - Verbose stalled-CPUs detection is disabled.
[ 0.000000] NR_IRQS:2304
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] spurious 8259A interrupt: IRQ7.
... and it dies.
If I try to boot with *just* 'ignore_loglevel irqpoll', same thing, it dies like above.
What I don't get is how this dies with FC14 but not with CentOS 5 ... did they figure out something that the FC developers haven't?
A
On Mon, 2011-01-24 at 12:09 -0700, Ashley M. Kirchner wrote:
On 1/22/2011 2:53 PM, Alan Cox wrote:
Ok try this Remove the 'quiet' add
irqpoll initcall_debug
The first one tries to catch and deal with hangs due to IRQ routing bugs in the BIOS etc, the second will print a trace of each function called during initialisation. It's not exciting to most people but as part of a Fedora bug report it will let the Fedora kernel maintainers see which initialiser hung the machine.
Ok, finally got a moment to try this. Current kernel line is:
xdriver=vesa nomodeset ide=nodma noapic acpi=off ignore_loglevel enforcing=0 initcall_debug
That doesn't give me anything I don't already know or have seen - it locks up after discovering the main drive and CD. Adding 'irqpoll' to the line just locks up way earlier in the boot process:
....
[ 0.000000] SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] - RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] - RCU-based detection of stalled CPUs is disabled.
[ 0.000000] - Verbose stalled-CPUs detection is disabled.
[ 0.000000] NR_IRQS:2304
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] spurious 8259A interrupt: IRQ7.
... and it dies.
If I try to boot with *just* 'ignore_loglevel irqpoll', same thing, it dies like above.
What I don't get is how this dies with FC14 but not with CentOS 5 ... did they figure out something that the FC developers haven't?
A
Install with basic video? Or whatever the correct verbiage is.
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
...
I am looking at that dmesg posted by you - CentOS had some errors too.
We have to reset the kernel line and run it again with max verbose debugging.
Please reset kernel parameters to only the ones needed for debugging (and keep them always), remove anything else:
ignore_loglevel enforcing=0 initcall_debug
Run it.
JB
On 1/24/2011 1:43 PM, JB wrote:
I am looking at that dmesg posted by you - CentOS had some errors too. We have to reset the kernel line and run it again with max verbose debugging.
Please reset kernel parameters to only the ones needed for debugging (and keep them always), remove anything else:
ignore_loglevel enforcing=0 initcall_debug
Run it.
JB
I tried that and posted the results of that in my last e-mail. It hangs at the same spot, after detecting the main drive and CD-ROM. If I add the irqpoll, it hangs even earlier on. And while CentOS might've run into errors, at least it knew to move on and work. FC14 just hangs, hard.
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
On 1/24/2011 1:43 PM, JB wrote:
I am looking at that dmesg posted by you - CentOS had some errors too. We have to reset the kernel line and run it again with max verbose debugging.
Please reset kernel parameters to only the ones needed for debugging (and keep them always), remove anything else:
ignore_loglevel enforcing=0 initcall_debug
Run it.
JB
I tried that and posted the results of that in my last e-mail. It hangs at the same spot, after detecting the main drive and CD-ROM. If I add the irqpoll, it hangs even earlier on. And while CentOS might've run into errors, at least it knew to move on and work. FC14 just hangs, hard.
OK. I saw your first post today like below, and that's why asked to remove all unneeded parameters.
Ok, finally got a moment to try this. Current kernel line is: xdriver=vesa nomodeset ide=nodma noapic acpi=off ignore_loglevel enforcing=0 initcall_debug
I am looking at kernel parameters and Fedora kernel problems - there is so much of it that could go wrong that the head is spinning.
Let's hope that Alan finds time to come back to the thread - he is the real expert here.
I will continue looking into it as well, so stick around. Will try something.
JB
On 1/24/2011 2:06 PM, JB wrote:
I am looking at kernel parameters and Fedora kernel problems - there is so much of it that could go wrong that the head is spinning.
Let's hope that Alan finds time to come back to the thread - he is the real expert here.
I will continue looking into it as well, so stick around. Will try something.
JB
At the moment, CentOS is running with:
Kernel command line: ro root=LABEL=/ ide=nodma noapic acpi=off apm=off
I just added the apm option at the last reboot because I was seeing errors in dmesg - not that it was causing any harm, but at the same time, I couldn't care less for apm on this machine.
The one thing I'm noticing is, even though I have acpi=off, I still see this as well:
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
PCI: Firmware left 0000:00:03.0 e100 interrupts enabled, disabling
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
PCI: Firmware left 0000:01:04.0 e100 interrupts enabled, disabling
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
PCI: Firmware left 0000:01:05.0 e100 interrupts enabled, disabling
This machine has one on-board e100 and an add-in card with 2 e100 on it. That's what the above error messages are referring to. If I remove the add-on card, the error message appears only once, referring to the on-board e100. I swapped the card for a brand new one and the same 2 errors re-appeared (bringing the total to 3). Card and on-board appear to be working just fine though ...
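For anyone wanting to check the same thing against a saved dmesg file, here is a small Python sketch that lists which PCI devices triggered the "Firmware left ... interrupts enabled" message. The helper name `firmware_left_devices` is invented for illustration; the sample is a shortened excerpt of the messages quoted above.

```python
import re
from collections import Counter

# Captures the PCI address and the driver name from lines such as:
#   PCI: Firmware left 0000:00:03.0 e100 interrupts enabled, disabling
FIRMWARE_RE = re.compile(r"PCI: Firmware left (\S+) (\S+) interrupts enabled")


def firmware_left_devices(text):
    """Return a list of (pci_address, driver) pairs found in the text."""
    return FIRMWARE_RE.findall(text)


# Shortened excerpt of the dmesg output quoted in this message.
SAMPLE = """\
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
PCI: Firmware left 0000:00:03.0 e100 interrupts enabled, disabling
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
PCI: Firmware left 0000:01:04.0 e100 interrupts enabled, disabling
ACPI Error (tbget-0168): Invalid address flags 8 [20060707]
PCI: Firmware left 0000:01:05.0 e100 interrupts enabled, disabling
"""

devices = firmware_left_devices(SAMPLE)
by_driver = Counter(driver for _, driver in devices)

for address, driver in devices:
    print(address, driver)
print(dict(by_driver))
```

Comparing the addresses against lspci output should confirm whether each message maps to the on-board NIC or to one of the ports on the add-in card.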
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
...
Let's take a shot at hard disk problems:
keep only -> ignore_loglevel enforcing=0 initcall_debug
Keep these 2 runs separately.
1st run: add -> pci=nomsi pci=nommconf pci=nocrs
2nd run (remove 1st run added parameters): rdblacklist=ahci
JB
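To keep the runs straight (base debugging parameters always present, each run's extras added on their own), the plan above can be written out as a small Python sketch. The names `BASE`, `RUNS`, and `build_cmdlines` are only for illustration.

```python
# Debugging parameters that stay on the kernel line for every run.
BASE = ["ignore_loglevel", "enforcing=0", "initcall_debug"]

# Extra parameters for each run; each run starts again from BASE only.
RUNS = [
    ["pci=nomsi", "pci=nommconf", "pci=nocrs"],  # 1st run
    ["rdblacklist=ahci"],                        # 2nd run
]


def build_cmdlines(base, runs):
    """Return one complete kernel parameter string per run."""
    return [" ".join(base + extra) for extra in runs]


for number, line in enumerate(build_cmdlines(BASE, RUNS), start=1):
    print(f"run {number}: {line}")
```

Writing the lines out this way makes it harder to carry a parameter from one run into the next by accident, which matters when only one variable is supposed to change at a time.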
On 1/24/2011 2:32 PM, JB wrote:
Let's take a shot at hard disk problems: keep only -> ignore_loglevel enforcing=0 initcall_debug
Keep these 2 runs separately.
1st run: add -> pci=nomsi pci=nommconf pci=nocrs
2nd run (remove 1st run added parameters): rdblacklist=ahci
JB
Now we're starting to get into the more cryptic stuff ...
1st run:
On tty0, it gets past the 'waiting for the cows to come home' message and gives me a blue screen, asking me to test the Media. If I let it test the media, it goes nowhere. The screen also has some junk on it [1]. If I tell it to skip it, the dialog box goes away, and that's as far as it goes. When I switch to tty2 to see what's going on, I see this:
Detected stage 2 image on CD (url: cdrom:///dev/sr0:/mnt/stage2)
loader: stage 2 url is cdrom:///dev/sr0:/mnt/stage2
ata3: lost interrupt (Status 0x58)
ata3: drained 65536 bytes to clear DRQ.
ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
sr 2:0:1:0: [sr0] CDB: Read(10): 28 00 00 00 4f 40 00 00 40 00
ata3.01: cmd a0/01:00:00:00:fc/00:00:00:00:00/b0 tag 0 dma 131072 in
res 40/00:02:00:0c:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
ata3.01: status: { DRDY }
ata3: soft resetting link
ata3.00: configured for MWDMA2
ata3.01: configured for UDMA/25
ata3: EH complete
This repeats 3 times, and eventually it bails with a few lines of:
Buffer I/O error on device sr0, logical block [various]
Then cycles back to the top error section.
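As a rough aid for comparing runs, here is a Python sketch that summarises which ATA port reported a lost interrupt and which device was frozen by the error handler. The function name `summarize_ata_errors` is made up, the patterns only cover the message shapes quoted above, and the sample lines are taken from that output.

```python
import re

# Port-level message, e.g. "ata3: lost interrupt (Status 0x58)"
LOST_IRQ_RE = re.compile(r"^(ata\d+): lost interrupt", re.MULTILINE)
# Device-level message, e.g. "ata3.01: exception Emask 0x0 ... frozen"
FROZEN_RE = re.compile(r"^(ata\d+\.\d+): exception .*\bfrozen\b", re.MULTILINE)


def summarize_ata_errors(text):
    """Return the ports that lost interrupts and the devices that were frozen."""
    return {
        "lost_interrupt": sorted(set(LOST_IRQ_RE.findall(text))),
        "frozen": sorted(set(FROZEN_RE.findall(text))),
    }


# Lines taken from the tty2 output quoted in this message.
SAMPLE = """\
ata3: lost interrupt (Status 0x58)
ata3: drained 65536 bytes to clear DRQ.
ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata3.01: status: { DRDY }
ata3: soft resetting link
ata3: EH complete
"""

print(summarize_ata_errors(SAMPLE))
```

Running it over the output of each attempt should show at a glance whether the failing port moves between runs.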
Ok, so perhaps I have a bad disk. Perhaps the drive itself is also bad even though it ran the CentOS install just fine - I did it through CD install, and it read all 7 disks with no problems, no errors, nothing.
So, I ripped the drive out, put in a DVD drive instead. Burned a DVD install disk and powered up the machine. It gets as far as 'GRUB Loading stage2..' on screen and that's it. The drive's light is blinking non-stop and it's been sitting there for about 15 minutes now. Worse than before.
Back to the CD drive and a *new* install disk, just in case the old one was indeed fubared ... No dice, same thing as posted above. And I made sure the checksums passed.
Now on to run 2:
Same thing!
As the machine is now in production (running CentOS), it's more difficult to go through rebooting the thing every time to try and figure this out. Just saying. I'm willing to keep trying things, but I have to limit the number of times I reboot this thing every day (and eventually stop altogether.)
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
...
One more time for today :-) I incorporated a probable fix (1st run) from similar problems in Bugzilla.
Try:
keep only -> ignore_loglevel enforcing=0 initcall_debug
Keep these runs separate.
1st run: add -> pci=use_crs
2nd run (remove 1st run added parameters): add -> noapic nolapic nolapic_timer
3rd run (remove 2nd run added parameters): nohz=off highres=off
4th run (add last parameter to 3rd run): nohz=off highres=off clocksource=acpi_pm
JB
On 1/24/2011 4:04 PM, JB wrote:
Try: keep only -> ignore_loglevel enforcing=0 initcall_debug
Keep these runs separate.
1st run: add -> pci=use_crs
2nd run (remove 1st run added parameters): add -> noapic nolapic nolapic_timer
3rd run (remove 2nd run added parameters): nohz=off highres=off
All of them produced the same result:
Gets past 'waiting for cows to come home' ...
loader: Detected stage 2 image on CD
loader: stage2 url is cdrom:///dev/sr0:/mnt/stage2
loader: Loading SELinux policy
... insert same error as previous e-mail, ata3 losing interrupt and what not ...
4th run (add last parameter to 3rd run): nohz=off highres=off clocksource=acpi_pm
Now this one produced the same result, EXCEPT it's not ata3, it's ata1 now.
What I don't get is that FC14 boot up keeps saying that the main drive and CD are on ata3 ... CentOS says they are on ata1 and the add-on card is ata3.
*sigh*
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
...
Read Lamar Owen's post.
You may search Google and Bugzilla for problems related to your Broadcom CNB20LE board.
There is a chance that Alan drops by and he is expert on hard disks.
Tomorrow will try some more.
JB
On 1/24/2011 4:33 PM, JB wrote:
Read Lamar Owen's post. You may search Google and Bugzilla for problems related to your Broadcom CNB20LE board.
There is a chance that Alan drops by and he is expert on hard disks.
Tomorrow will try some more.
JB
Yep, I'm going to try and boot an FC13 install disk tomorrow morning, see how that fares ...
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
On 1/24/2011 4:33 PM, JB wrote:
Read Lamar Owen's post. You may search Google and Bugzilla for problems related to your Broadcom CNB20LE board.
There is a chance that Alan drops by and he is expert on hard disks.
Tomorrow will try some more.
JB
Yep, I'm going to try and boot an FC13 install disk tomorrow morning, see how that fares ...
Check the board's BIOS date.
In Bugzilla 665109 they claim that this board can have old or incomplete BIOS. Does it seem to be outdated ? Is there any update on manufacturer's or reseller's web site ?
I would look at the BIOS settings too (sometimes their "automatic" settings work better than our manual ones).
JB
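On a running Linux system (the CentOS install here, for instance) the BIOS details can often be read without rebooting into setup. Below is a minimal Python sketch that reads them from sysfs, assuming the kernel exposes /sys/class/dmi/id; older kernels may not, in which case `dmidecode -s bios-release-date` (run as root) is the usual alternative. The helper name `read_dmi_field` is invented for illustration.

```python
from pathlib import Path


def read_dmi_field(field, base="/sys/class/dmi/id"):
    """Return the contents of one DMI sysfs file, or None if it is unavailable."""
    path = Path(base) / field
    try:
        return path.read_text().strip()
    except OSError:
        # File missing or unreadable, e.g. the kernel does not expose DMI in sysfs.
        return None


for field in ("bios_vendor", "bios_version", "bios_date"):
    print(field, "=", read_dmi_field(field))
```

A value of None just means the file could not be read on this system; it says nothing about the BIOS itself.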
On 1/25/2011 5:41 AM, JB wrote:
Check the board's BIOS date. In Bugzilla 665109 they claim that this board can have old or incomplete BIOS. Does it seem to be outdated ? Is there any update on manufacturer's or reseller's web site ?
I would look at the BIOS settings too (sometimes their "automatic" settings work better than our manual ones).
JB
Well, that was a major pain in the you-know-what. Sheesh. Intel only provides floppy BIOS updates for this board (considering how old it is, I don't blame them.) So I had to find a floppy drive, find a floppy, and do all the run around with that just to update the BIOS from 1.7 to 1.13 ... Changes? Not on the surface, but about to go try and boot FC13 now. Stay tuned ...
On Tue, Jan 25, 2011 at 14:26:00 -0700, "Ashley M. Kirchner" ashley@pcraft.com wrote:
Well, that was a major pain in the you-know-what. Sheesh. Intel only provides floppy BIOS updates for this board (considering how old it is, I don't blame them.) So I had to find a floppy drive, find a floppy, and do all the run around with that just to update the BIOS from 1.7 to 1.13 ... Changes? Not on the surface, but about to go try and boot FC13 now. Stay tuned ...
It's possible to boot floppy images off a disk drive for some of these old boards. biosdisk is one tool to help with this.
Well, with the confetti guns at the ready, I tried FC13, no dice. Boot options were:
ide=nodma noapic acpi=off ignore_loglevel initcall_debug
It quit at the same point it has been lately, which is giving me garbage on screen like my image posted yesterday, and ata3 times out, same error as yesterday. By now I have tried:
FC14 & FC13 First CD install
FC13 First CD install
FC14 CD netinst
FC14 & FC13 DVD install
FC14 & FC13 DVD Live
CD-Drive (at least 3 different ones)
DVD-Drive (two different ones)
Different IDE cables
Nothing, it seems stuck at either 'waiting for the cows to come home' or it goes past it but then fails with ata3 timeouts which eventually bombs. I'm not willing to continue trying older versions.
So, I'm giving up. CentOS boot disk worked, install worked, the system is up and running and stable, so far. It will remain like that till the day the hardware fails completely and I push the thing off of the back dock.
Thank you everyone who tried helping. While there's been no solution to the problem, I'm glad for the help and learned that things don't always work. And when they don't, move on to something that will. In this case, CentOS won the battle. Oh, and that sharp piece of metal that left a nice gash in the palm of my hand while swapping drives for the umpteenth time. The machine can now claim to have my DNA on it.
Tomorrow is another day, and possibly another battle. Hopefully one with a much better outcome.
Ashley
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
Well, with the confetti guns at the ready, I tried FC13, no dice. Boot options were:
ide=nodma noapic acpi=off ignore_loglevel initcall_debug ...
Do not worry, be happy. You are a brave girl - the way you read all that extended output proves that you are a pro :-)
I am afraid you have to give it a shot or two more.
Firstly, the reason you updated BIOS was to potentially fix ACPI as well. But you tried F13 with acpi=off kernel parameter ... So, back to a drawing board :-)
Once again, remove all parameters, except debugging-output: ignore_loglevel initcall_debug Run it.
Secondly, as I asked you before, take a look at BIOS. Just for a kick, every menu (there may be some new stuff as well due to update), do not try to change anything, just get a sense of it all. Then consider if restoring all defaults would be an option, or selecting automatic (where available), or giving up any unnecessary/fancy manual option. Run it. As above.
Thirdly, stick around the thread for many days (even weeks) - there is a good chance somebody will have time (like Lamar next week) and come up with a good idea. Or F15 devs will deliver new code that will fix these things in a few months. Do not expect wonders - yes, some of these guys are true pussycats, but these devs are heroes as well - they do not have access to specs, have to deal with proprietary code (like in your case) - but look Ma, they come up with working software, again and again.
JB
-----Original Message-----
From: users-bounces@lists.fedoraproject.org [mailto:users-bounces@lists.fedoraproject.org] On Behalf Of JB
Sent: Tuesday, January 25, 2011 4:43 PM
To: users@lists.fedoraproject.org
Subject: Re: [Fedora] Re: FC14 Installation Hangs
Do not worry, be happy. You are a brave girl - the way you read all that extended output proves that you are a pro :-)
I am afraid you have to give it a shot or two more.
Firstly, the reason you updated BIOS was to potentially fix ACPI as well. But you tried F13 with acpi=off kernel parameter ... So, back to a drawing board :-)
Once again, remove all parameters, except debugging-output: ignore_loglevel initcall_debug Run it.
Secondly, as I asked you before, take a look at BIOS. Just for a kick, every menu (there may be some new stuff as well due to update), do not try to change anything, just get a sense of it all. Then consider if restoring all defaults would be an option, or selecting automatic (where available), or giving up any unnecessary/fancy manual option. Run it. As above.
Thirdly, stick around the thread for many days (even weeks) - there is a good chance somebody will have time (like Lamar next week) and come up with a good idea. Or F15 devs will deliver new code that will fix these things in a few months. Do not expect wonders - yes, some of these guys are true pussycats, but these devs are heroes as well - they do not have access to specs, have to deal with proprietary code (like in your case) - but look Ma, they come up with working software, again and again.
Restoring the BIOS to default settings is something the update does by default. In fact, it completely clears the CMOS, updates the BIOS and upon reboot a message pops up saying the CMOS isn't set and it's reverting to default values. The only thing I changed after that was to set the power failure option to 'power on' when AC is restored. Everything else is at default. That was one of the things I tried early on too, just to make sure it wasn't me that messed something up.
And I also did just boot up, with no parameters at all, after the update. Then slowly started adding stuff ... The acpi=off was one of the first parameters I added after the first boot failed. By now I've seen so many different iterations of the lock up, I couldn't tell you where exactly it locked up.
There are other hardware quirks that I've discovered throughout all of this. For example, if I were to disable the on-board SCSI bus, it pegs the HDD light to on at all times. No clue why. Leaving the SCSI bus at the default 'enabled' state, the HDD light works as expected. I'd rather disable it since it's not being used at all but if I do that, someone else will inevitably call me at 3 in the morning just to tell me the machine is overloaded because the HDD light is pegged on. Not a phone call I'm willing to take and he or she who called will not want to face me the next morning. Floppy drive? What floppy drive? By default that's turned on in BIOS, as is the bus itself (yes, this board allows you to disable one or both) ... disabling the floppy is a two-step process: disable it on the main screen, exit out of it, go back in just to see it enabled again, select disable again and now it sticks.
So you see, I know the motherboard has issues, issues I had hoped would eventually get fixed through BIOS updates. I gave Intel the benefit of the doubt and upgraded from 1.1 to 1.3, then 1.5, then 1.7 when I stopped. And then today to 1.13 ... the quirks are still there (and they know about them too because I have a rather lengthy thread from them about these problems.)
With the machine now in full production, and having "settled" if you will, I'm more inclined to just say 'To hell with it.' And move on. I have other servers to tend to - like a second RH7.3, also from the same era, but completely different hardware. All in all, I have 9 servers that need an upgrade, some more urgent than others. This was the first one, and was supposed to take all of about 4 hours, not 4 days. :)
Now as for sticking around, that I will. As you pointed out, there's a possibility that someone else might have a completely different take on the problem and suggest a different approach, like Lamar.
Ash
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
...
You are much happier and reasonable this time - throwing books at Fedora, however deserved and fun, will not change anything :-)
Please file a new bug report with Bugzilla as an F14 kernel problem. If you need help, tell us. https://bugzilla.redhat.com/
Supply them with a short description of a problem.
Supply board's BIOS date, so kernel maintainers know what applies in their code.
Attach to the report the CentOS dmesg and lspci output files. Add one more output file 'lspci -vv'.
Refer to this thread in report's additional info section like this: Fedora users list - thread with test results http://lists.fedoraproject.org/pipermail/users/2011-January/391177.html
You may want to offer to share thread excerpts (if not confidential) between you and Intel about board and BIOS issues - it will help kernel devs as well.
You will be notified by e-mail of any activity in that report. Be ready to respond to them, with comments or data. The kernel maintainers may ask you to provide them with some info (perhaps digital photos of the debugging output, etc).
This will serve as a blueprint for upgrade process of your other servers :-)
When you file the report, let us know (here in this thread) its Bugzilla # so we can follow it as well.
JB
On Wed, 26 Jan 2011 09:50:42 +0000 (UTC) JB jb.1234abcd@gmail.com wrote:
Ashley M. Kirchner <ashley <at> pcraft.com> writes:
...
You are much happier and reasonable this time - throwing books at Fedora, however deserved and fun, will not change anything :-)
I agree with that :)
I have added ignore_loglevel enforcing=0 initcall_debug to the upgrade kernel entry:
title Upgrade to Fedora 14 (Laughlin)
    kernel /upgrade/vmlinuz preupgrade ignore_loglevel enforcing=0 initcall_debug repo=hd::/var/cache/yum/preupgrade ks=hd:UUID=c46f0b8e-ea34-47f5-8c7f-57a7f3a5d531:/upgrade/ks.cfg stage2=http://fedora.mirror.garr.it/mirrors/fedora/linux/releases/14/Fedora/i386/os...
    initrd /upgrade/initrd.img
but the output still scrolls by far too fast to catch anything.
I was able to glimpse, for a split second, that one of the last lines contains 43xx, and using dmesg in F13, where I am now, I see:
Broadcom 43xx driver loaded [ Features: PMLS, Firmware-ID: FW13 ]
cfg80211: Calling CRDA for country: IT
cfg80211: Regulatory domain changed to country: IT
(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
(2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
(5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
(5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm)
(5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm)
EXT3-fs (dm-0): using internal journal
So maybe it is an issue with the SATA drive... => EXT3-fs (dm-0): using internal journal
I would like to slow down the stream of lines before 'waiting for hardware to initialize',
or better still, avoid the clear screen. Can the Fedora people avoid this clear screen?
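If the initcall_debug output can ever be saved to a file instead of read off the screen, a small script can point at the init call that never came back. This is only a rough sketch: the two line shapes in the comment are what I assume initcall_debug prints, and the function name pending_initcalls is made up for illustration, so check both against a real capture.

```python
import re

# Assumed initcall_debug line shapes (verify against your own capture):
#   calling  foo_init+0x0/0x20 @ 1
#   initcall foo_init+0x0/0x20 returned 0 after 12 usecs
CALL_RE = re.compile(r"calling\s+(\S+)")
DONE_RE = re.compile(r"initcall\s+(\S+).*\sreturned")


def pending_initcalls(log_text):
    """Return initcalls that were entered but never reported as returned."""
    pending = []
    for line in log_text.splitlines():
        done = DONE_RE.search(line)
        if done:
            # This init call finished, so it cannot be the one that hangs.
            if done.group(1) in pending:
                pending.remove(done.group(1))
            continue
        call = CALL_RE.search(line)
        if call:
            pending.append(call.group(1))
    return pending
```

Whatever is left in the returned list after the last captured line is a candidate for the hang, nothing more than that.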
-- maurizio
On Tuesday, January 25, 2011 06:43:27 pm JB wrote:
Thirdly, stick around the thread for many days (even weeks) - there is a good chance somebody will have time (like Lamar next week) and come up with a good idea.
Given what I've seen of Ashley's symptoms, it may be more BIOS related than chipset related, and I think the only OSB4 chipset board with a PC-style BIOS I have is the FORCE CPCI one; I checked my other boxen that are currently running, and the only other ServerWorks board I have up right now is a Dell 1600SC server, and it's an OSB5.
I think I have four boxen with dual PIII's that probably have OSB4's, but they are oddballs that boot into Sun-style OpenPROM x86 and don't have a PC-style BIOS (Network Appliance NetCaches; would love to get a Linux on them, but AFAIK, and I'd love to be shown that I'm wrong, only the SPARC Linux distributions can boot from OpenPROM, which is used on virtually all Sun SPARC boxen). And even then, if it's a BIOS issue, that won't help any.
As the ancient troubleshooting axiom goes 'to fix a problem you must find the problem.'
[Note that there is quite a list of things below to look at, but do look at the bugzilla entry at the very bottom and try that, possibly even first.]
On Monday, January 24, 2011 02:09:52 pm Ashley M. Kirchner wrote:
What I don't get is how this dies with FC14 but not with CentOS 5... did they figure out something that the FC developers haven't?
No. The Fedora kernel is much newer than the C5 kernel, in terms of kernel version and IDE/ATA driver stack. CentOS 5, and Red Hat Enterprise Linux 5, are somewhat akin to Fedora *6* in terms of the versioning of the kernel. This does not mean security fixes since that kernel version have not been applied; they have been backported by Red Hat. What it does mean is that 17 versions of the 2.6 kernel (half of the versions to date) have passed, and the IDE/ATA drive handling has gone from the older IDE/ATA driver stack to the new libata driver stack, which makes the IDE/ATA drives be handled in the SCSI layer (and thus they become /dev/sdX# instead of /dev/hdX#).
I find in the file "/usr/share/doc/kernel-doc-2.6.35.10/Documentation/i2c/busses/i2c-piix4"
the statements:
+++++++++++++++
If you own Force CPCI735 motherboard or other OSB4 based systems you may need to change the SMBus Interrupt Select register so the SMBus controller uses the SMI mode.
1) Use lspci command and locate the PCI device with the SMBus controller:
   00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 4f)
   The line may vary for different chipsets. Please consult the driver source for all possible PCI ids (and lspci -n to match them). Lets assume the device is located at 00:0f.0.
2) Now you just need to change the value in 0xD2 register. Get it first with command:
   lspci -xxx -s 00:0f.0
   If the value is 0x3 then you need to change it to 0x1
   setpci -s 00:0f.0 d2.b=1
Please note that you don't need to do that in all cases, just when the SMBus is not working properly.
++++++++++++++++
Don't know if you're hitting this or not. Although I think I actually have one of those Force Computers CompactPCI boards; I'll have to check, if so I can test this there, at some point (not this week; too busy).
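The register check in step 2 of the quoted kernel doc can be sketched in a few lines of Python, for anyone who would rather not count hex columns by eye. It assumes the usual lspci -xxx layout of one row per 16 bytes, each row prefixed with its hex offset and a colon; the helper names config_byte and needs_smi_fix are mine, not anything from pciutils or the kernel.

```python
import re

# One hex-dump row: an offset, a colon, then exactly 16 two-digit hex bytes.
ROW_RE = re.compile(r"^([0-9a-f]+):((?: [0-9a-f]{2}){16})\s*$")


def config_byte(lspci_dump, offset):
    """Return the config-space byte at `offset` from `lspci -xxx` text, or None."""
    row_start = offset & ~0xF  # rows begin at multiples of 16
    for line in lspci_dump.splitlines():
        match = ROW_RE.match(line.strip())
        if match and int(match.group(1), 16) == row_start:
            return int(match.group(2).split()[offset & 0xF], 16)
    return None  # row not present in the dump


def needs_smi_fix(lspci_dump):
    """True if register 0xD2 reads 0x3, the case the doc says to change to 0x1."""
    return config_byte(lspci_dump, 0xD2) == 0x03
```

Feed it the saved output of lspci -xxx -s 00:0f.0; if needs_smi_fix comes back True, the setpci line from the doc is the thing to try, and only if the SMBus is actually misbehaving.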
Your dmesg shows you do, in fact, have an OSB4-based system, so this might be a part of the problem. Check to see if RHEL6 support is available for this system; if so, there might be a workaround for RHEL6 that might apply to the kernel in F14.
Hmmm, looking closer at your C5 dmesg, I find:
[snip]
type=1404 audit(1295620090.833:2): selinux=0 auid=4294967295 ses=4294967295
hdb: ATAPI 48X CD-ROM drive, 128kB Cache, (U)DMA
Uniform CD-ROM driver Revision: 3.20
piix4_smbus 0000:00:0f.0: Found 0000:00:0f.0 device
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
[snip]
Yeah, the line right after the CD-ROM line, which is where F14 is hanging, is a callout for the piix4_smbus init. Doesn't mean that's the culprit....
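That "what comes next on the box that works" comparison is easy to repeat for any other marker line. A minimal sketch, assuming the working dmesg is saved as text; line_after is a made-up helper name, and init order is not guaranteed to match between two kernels this far apart, so treat the answer as a hint only.

```python
def line_after(dmesg_text, marker):
    """Return the first non-empty line after the last line containing `marker`."""
    lines = [line for line in dmesg_text.splitlines() if line.strip()]
    index = None
    for i, line in enumerate(lines):
        if marker in line:
            index = i  # keep the last match, closest to where the hang shows up
    if index is None or index + 1 >= len(lines):
        return None
    return lines[index + 1]
```

Called with the C5 dmesg and the marker "Uniform CD-ROM driver", it hands back the piix4_smbus line discussed above.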
Another possibility is tracked in the thread, and kernel bugzilla, started here: http://kerneltrap.org/mailarchive/linux-kernel/2010/8/13/4606665
Can you see if F13 will boot up? F14 shipped with 2.6.35.6; F13 with 2.6.33.3.
You might be hitting a variant of https://bugzilla.redhat.com/show_bug.cgi?id=665109
There is some specific advice in that Bugzilla entry to try. It seems some of these machines with this chipset actually have ACPI, but it's not exactly 'all there' and a workaround has to be used.