"panic: unable to mount root" => need sleep after loading SATA driver in initrd! Should udevstart wait?

Stuart Levy slevy at ncsa.uiuc.edu
Wed Sep 14 19:54:39 UTC 2005


On several of our SATA+Opteron-based systems we've been unable to
run more recent Fedora Core 3 packaged kernels -- 2.6.10, 2.6.11.
Other kinds of problems show up too, but I'm writing about one
that I think I understand a bit now.

The initrd phase loads a bunch of modules -- scsi_mod, sd_mod,
then either sata_sil (for Tyan K8W/2885 motherboards with
Silicon Image SATA) or sata_nv (for Tyan K8WE/2895's with nForce SATA),
and (maybe irrelevant to the problem) 3w-9xxx.
Then it probes for a partition with label "/".

On most of our systems this step reliably fails, so I've been running
stock kernel.org kernels instead (e.g. 2.6.11.11), generally with success.

On our new Tyan 2895 motherboard, 2.6.11.11 boots, but the latest
stock kernel, 2.6.13.1, fails with the same problem: it can't find LABEL=/.
It also fails when given a numeric root device, e.g. "root=0802".
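
For anyone unsure what "root=0802" refers to: the old numeric form is
four hex digits, major number then minor number. Major 8 is the SCSI disk
driver (which SATA disks also use), and minor 2 on the first disk is the
second partition, i.e. /dev/sda2. A small sketch that splits such a value
(the function name decode_root is made up for illustration):

```shell
# Split an old-style numeric root= value (four hex digits) into its
# major and minor device numbers. decode_root is a hypothetical helper.
decode_root() {
    hex=$1
    major=$((0x${hex%??}))   # first two hex digits
    minor=$((0x${hex#??}))   # last two hex digits
    echo "major=$major minor=$minor"
}
```

Calling "decode_root 0802" prints the two numbers in decimal. The point is
that the failure is not specific to label lookup: naming the device
directly fails as well, which fits the idea that the disk simply isn't
there yet.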

It appears that the sata_nv module can finish loading *before* it
finishes scanning for attached SATA devices.  So there's a race:
which happens first, udevstart (which I assume does the label scan)
or the completion of the SATA scan?  It appears that the SATA scan
loses, so detecting the "/" partition, or even the existence of the
/dev/sda device, hasn't happened by the time it's needed.

I wish there were a "don't finish initializing until scan is complete"
kind of option for the various SATA driver modules...

I worked around this by unpacking the initrd image, adding a "sleep"
(a command which fortunately is built into nash), and repackaging.

Recipe:

	mkdir /tmp/scrap
	cd /tmp/scrap

	zcat /boot/initrd-<whatever>.img | cpio -idvm
	# now edit "init", adding "sleep 4" after the load of the last SCSI module
	find * -print | cpio -oc | gzip -9 > /boot/initrd-<whatever>.img
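
The hand-edit step in the recipe could be scripted, roughly as sketched
below. This assumes the unpacked "init" script loads modules with lines
beginning "insmod ", and it puts the sleep after the last insmod of any
kind rather than specifically the last SCSI one; either way the delay
lands before the root filesystem is looked for. The function name is
made up for illustration.

```shell
# add_sleep_after_last_insmod FILE [SECONDS]
# Insert "sleep SECONDS" (default 4) after the last line of FILE that
# starts with "insmod ". Hypothetical helper, not part of mkinitrd.
add_sleep_after_last_insmod() {
    file=$1
    secs=${2:-4}
    # Line number of the last insmod line; 0 if there is none.
    last=$(awk '/^insmod /{n=NR} END{print n+0}' "$file")
    [ "$last" -gt 0 ] || return 1
    awk -v n="$last" -v s="$secs" \
        '{print} NR==n{print "sleep " s}' "$file" > "$file.new" || return 1
    # Copy the contents back instead of renaming, so FILE keeps its
    # executable permission bits.
    cat "$file.new" > "$file" && rm -f "$file.new"
}
```

Run it as "add_sleep_after_last_insmod init 4" inside /tmp/scrap, between
the unpack and repack steps.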

But of course this means if I do a "make install" from a kernel
source tree, or pick up a fresh Fedora kernel, it'll become
unbootable again.

I see that the mkinitrd script can insert a "sleep"
in a couple of conditions -- after loading usb-storage and zfcp,
whatever that is.  

mkinitrd also contains this interesting snippet:

    # HACK: module loading + device creation isn't necessarily synchronous...
    # this will make sure that we have all of our devices before trying
    # things like RAID or LVM
    if [ -n "$USE_UDEV" ]; then
      echo "/sbin/udevstart" >> $RCFILE
    fi

That sounds like my problem.  But it's apparently not solving it.
udevstart's man page says it scans sysfs for devices,
but despite the above comment, it probably doesn't know to wait
for them to appear, right?
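
What "waiting for them to appear" might look like, as opposed to a fixed
sleep, is sketched below: poll sysfs until the disk shows up or a timeout
expires. This is only an illustration in ordinary POSIX shell. As far as
I know nash has no loops, so it could not run there as written and would
need a real shell in the initrd. The function name and its arguments are
invented for the example.

```shell
# wait_for_block_dev NAME [TIMEOUT_SECONDS] [SYSFS_BLOCK_DIR]
# Return 0 as soon as SYSFS_BLOCK_DIR/NAME exists, or 1 if it has not
# appeared after TIMEOUT_SECONDS (default 10). Hypothetical helper.
wait_for_block_dev() {
    name=$1
    timeout=${2:-10}
    dir=${3:-/sys/block}
    waited=0
    while [ ! -e "$dir/$name" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Something like "wait_for_block_dev sda 10" placed before /sbin/udevstart
would return as soon as the SATA scan registers the disk, instead of
always paying the full delay.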

   Stuart Levy, slevy at ncsa.uiuc.edu

[I tried posting this yesterday too, but that was just before I joined
fedora-list -- guess it doesn't take postings from non-members.]



