Thanks to some debugging statements, I've tracked down the source of the /sbin/loader crash I was experiencing that was hurting Mazu's installation process.
The bug is in loader2/cdinstall.c's use of Kudzu's probeDevices:
    devices = probeDevices(CLASS_CDROM, BUS_UNSPEC, 0);
    if (!devices) {
        logMessage(ERROR, "No CDROM devices found!");
        return 1;
    }
Unfortunately, Kudzu does not return a null pointer in this case. Here is the official documentation (out of /usr/include/kudzu/kudzu.h) for what is actually returned:
/* Probe for devices of the specified class, on the specified bus,
 * with the specified class. Returns a NULL-terminated array of
 * device (or subclass) pointers */
struct device ** probeDevices (enum deviceClass probeClass,
                               enum deviceBus probeBus,
                               int probeFlags);
A NULL-terminated array of zero elements is not a NULL pointer.
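To make the distinction concrete, here is a minimal sketch of a check that covers both a NULL pointer and an empty array. It uses a stand-in struct device rather than Kudzu's real definition, and have_devices is a hypothetical helper, not part of Kudzu or the loader:

```c
#include <stddef.h>

/* Stand-in for Kudzu's struct device; only the field used in this thread. */
struct device {
    char *device;   /* device node name, e.g. "sr0"; may be NULL */
};

/*
 * Returns 1 if the probe result holds at least one entry, 0 otherwise.
 * Handles both a NULL pointer and a NULL-terminated array of zero elements.
 */
static int have_devices(struct device **devices)
{
    return devices != NULL && devices[0] != NULL;
}
```

A caller would test have_devices(devices) instead of !devices before touching devices[0].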
So, first of all, if someone can tell me how I can go about submitting a patch for the bug in cdinstall.c, I would appreciate knowing what to do.
Second of all, it appears that Kudzu 1.2.34.3 (as distributed in the FC 5 RPM kudzu-devel-1.2.34.3-1) is not detecting the CD drive on the IBM BladeCenter I've been testing on.
This is an IBM Machine Type 8677, model # 2XX. (8677-2XX), manufactured in April of 2003.
I'll write directly to Bill Nottingham, the current Kudzu maintainer, and see if I can help make that probing work in Kudzu.
Thanks to Dan Carpenter for his helpful reply when I wrote to this list initially about debugging /sbin/loader.
Steven
On Mon, 2006-05-08 at 20:33 -0400, Steven Augart wrote:
Thanks to some debugging statements, I've tracked down the source of the /sbin/loader crash I was experiencing that was hurting Mazu's installation process.
[snip]
Unfortunately, Kudzu does not return a null pointer in this case. Here is the official documentation (out of /usr/include/kudzu/kudzu.h) for what is actually returned:
A NULL pointer is returned in a few cases where probing fails, so checking for NULL here is entirely appropriate and correct. See the inner loop, which iterates over the elements of the array.
So, first of all, if someone can tell me how I can go about submitting a patch for the bug in cdinstall.c, I would appreciate knowing what to do.
In general, bug fixes can either be submitted as patches here or via bugzilla. The more complicated the change, the more helpful it is to send here for discussion.
Jeremy
Steven Augart (saugart@mazunetworks.com) said:
Second of all, it appears that Kudzu 1.2.34.3 (as distributed in the FC 5 RPM kudzu-devel-1.2.34.3-1) is not detecting the CD drive on the IBM BladeCenter I've been testing on.
This is an IBM Machine Type 8677, model # 2XX. (8677-2XX), manufactured in April of 2003.
Is USB working correctly? Does it show up in the USB device lists in /proc or /sys in the installer?
Bill
On Mon, 2006-05-08 at 22:33 -0400, Bill Nottingham wrote:
Steven Augart (saugart@mazunetworks.com) said:
Second of all, it appears that Kudzu 1.2.34.3 (as distributed in the FC 5 RPM kudzu-devel-1.2.34.3-1) is not detecting the CD drive on the IBM BladeCenter I've been testing on.
This is an IBM Machine Type 8677, model # 2XX. (8677-2XX), manufactured in April of 2003.
Is USB working correctly? Does it show up in the USB device lists in /proc or /sys in the installer?
Dear Bill,
Thanks for the quick reply! I'll build a new installation CD with a modified Anaconda that doesn't crash on the failure to detect, and then I'll be able to get a shell (or at least anaconda-busybox) and I can tell you what's happening. (I'll have access to the machine again in the morning; it's midnight here in Massachusetts.)
I can also write some test code that will exercise the Kudzu CVS head on those hosts. We have some blades that are running FC 4, and I can see if the Kudzu CVS head can detect the CD under our variant of the FC 4 stock kernel, with all modules loaded. The security appliance products we embed Linux into don't normally use the CD ROM drive, so access to it may be broken on those machines except when booting off of that CD drive; I'll find out in the morning.
On Mon, 2006-05-08 at 22:33 -0400, Bill Nottingham wrote:
Steven Augart (saugart@mazunetworks.com) said:
Second of all, it appears that Kudzu 1.2.34.3 (as distributed in the FC 5 RPM kudzu-devel-1.2.34.3-1) is not detecting the CD drive on the IBM BladeCenter I've been testing on.
This is an IBM Machine Type 8677, model # 2XX. (8677-2XX), manufactured in April of 2003.
Is USB working correctly? Does it show up in the USB device lists in /proc or /sys in the installer?
Dear Bill:
I still haven't managed to get a shell out of /sbin/loader when it crashes, despite some hacking. However, it does look as if USB is working correctly. Here is more information that will probably answer your question. Please let me know if you want me to put more effort into hacking so that I can get a prompt where "ls" and "cat" will work at this early stage in installation. As matters stand, I don't get a shell until Python/Anaconda is running, after Kudzu has been weird, and I will explain why I think USB is ok at that point (at least):
As a reminder, the phase where we probe is right after /sbin/loader prints "getting kickstart file from first CDROM".
This particular machine, as you probably know, has a USB CD-ROM drive.
I get an INFO-level message from /sbin/loader saying: "inserted /tmp/usb-storage.ko"
The kernel log messages include the following, most of which I think are relevant. (The earlier ones had scrolled off the top of the screen.)
<6>USB Mass Storage support registered.
<5> Vendor: TEAC Model: CD-224E Rev: 2.9B
<5> Type: CD-ROM ANSI SCSI revision: 00
<5> Vendor: TEAC Model: FD-05PUB Rev: 2000
<5> Type: Direct-Access ANSI SCSI revision: 00
<5>sd 0:0:0:0: Attached scsi removable disk sda
<7>usb-storage: device scan complete
<4>sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
<6>Uniform CD-ROM driver Revision: 3.20
<7>sr 1:0:0:0: Attached scsi CD-ROM sr0
<7>usb-storage: device scan complete
After those messages, the kernel starts to chat about the tg3.c:v3.49 driver (for the Ethernet controller on the BladeCenter), but it certainly appears that USB is installed.
Via debugging code, I've seen that at this point probeDevices is actually returning a one-element device array with at least one pointer to a "device" structure, but that structure's "device" (device pathname) field is a NULL pointer.
Here is what I've done to anaconda/loader2/cdinstall.c to get the debugging info I just described:
/*
 * Return 0 if we get the kickstart file from the CDROM. Nonzero if we fail
 * to do so.
 */
int kickstartFromCD(char *kssrc, int flags) {
    int rc;
    char *p, *kspath;
    struct device ** devices;

    logMessage(INFO, "getting kickstart file from first CDROM");

    devices = probeDevices(CLASS_CDROM, BUS_UNSPEC, 0);
    if (!devices) {
        logMessage(ERROR, "No CDROM devices found"
                   " (probeDevices returned NULL)!");
        return 1;
    }

    /* format is ks=cdrom:[/path/to/ks.cfg] */
    kspath = "";
    p = strchr(kssrc, ':');
    if (p)
        kspath = p + 1;

    if (!p || strlen(kspath) < 1)
        kspath = "/ks.cfg";

    struct device *devp = devices[0];
    if (!devp) {
        logMessage(ERROR, "No CDROM devices found"
                   " (probeDevices returned a zero-element array)!");
        return 1;
    }
    char *devname = NULL;
    for (struct device **devpp = devices; *devpp; ++devpp) {
        char *devname = (*devpp)->device;
        if (!devname ) {
            newtWinMessage(_("Error"), _("Try more entries"),
                           _("Internal error: probeDevices returned"
                             " a non-existent device name for CD ROM Devices."
                             " This is probably a bug in Kudzu."
                             " Trying to recover with any following entry."));
            continue;
        }
    }
    if (!devname) {
        newtWinMessage(_("Error Wackiness"), _("I Surrender"),
                       _("probeDevices() returned"
                         " no useful device names for CD ROM Devices."
                         " This is probably a bug in Kudzu."));
        return 1;
    }
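Note that in the loop as quoted, the inner declaration of devname shadows the outer one, so the outer devname stays NULL and the final check always takes the failure path. A minimal sketch of a loop that keeps the first usable name is below. It uses a stand-in struct device rather than Kudzu's real one, and first_device_name is a hypothetical helper, not loader code:

```c
#include <stddef.h>

/* Stand-in for Kudzu's struct device; only the field used in this thread. */
struct device {
    char *device;   /* device node name; NULL when the probe found none */
};

/*
 * Return the first non-NULL device name in a NULL-terminated array,
 * or NULL if the array is NULL, empty, or holds no usable names.
 */
static const char *first_device_name(struct device **devices)
{
    if (devices == NULL)
        return NULL;
    for (struct device **devpp = devices; *devpp != NULL; ++devpp) {
        if ((*devpp)->device != NULL)
            return (*devpp)->device;   /* skip entries without a name */
    }
    return NULL;
}
```

Returning the name from a helper avoids having two variables with the same name in nested scopes.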
Interestingly enough, perhaps, with these hacks I do now go into Python Anaconda (without a kickstart file available, of course). And at this point, the CD is in fact successfully mounted, off of /tmp/cdrom, which is block device major=11, minor=0.
On tty4, there are some kernel messages that seem to indicate that the system is attempting to treat the first SCSI device as a CDROM drive, even though that appears to be a way to get to the floppy instead:
<6>device-mapper: 4.5.0-ioctl .....
Then, the following block of 4 lines appears (call it "Block 4"):
<5>SCSI device sda: 2880 512-byte hdwr sectors (1MB)
<5>sda: Write Protect is off
<7>sda: Mode Sense: 00 46 94 00
<3>sda: assuming drive cache: write through
Followed by a duplicate appearance of Block 4, followed by the intriguing line:
<6> sda: unknown partition table
And then: <6>SELinux: initialized (dev sda, type vfat), uses genfs_contexts
Followed by two more repetitions of Block 4.
Followed by:
<6> sda: unknown partition table
<4>VFS: Can't find an ext2 filesystem on dev sda
<4>Unable to identify CD-ROM format,
<3>cramfs: wrong magic
<6>SELinux: initialized (dev sda, type vfat), uses genfs_contexts
At this point, there is now a shell running on tty2.
In /proc/devices, I see:
In that shell, I can explore /sys/bus/usb/devices, which has seven entries, four of "00" and three of "09".
I've attached the contents of /proc/bus/usb/devices. They seem reasonable to me.
Please let me know what it would be appropriate to look for in that directory that would help to debug the problem.
Thank you again for the help and advice. I look forward to hearing from you.
--Steven
Steven Augart (saugart@mazunetworks.com) said:
at this early stage in installation. As matters stand, I don't get a shell until Python/Anaconda is running, after Kudzu has been weird, and I will explain why I think USB is ok at that point (at least):
Right, the shell doesn't start until the second stage installer. Actually, if you get to that point on a CD install, then the CD *is* working...
So, looking at the code briefly, I suspect that kickstartFromCD needs to iterate over the devices (and check for NULL.) Or not probe the USB bus. (in kudzu parlance, USB CDs are SCSI.)
Try the *completely untested* patch attached. It may need some fudging.
Bill
[ I sent the following message to Bill and to anaconda-devel-list. Unfortunately, the attachments made it longer than 40 KB, and the "mailman" program held back the message.
Rather than spam all of your mailboxes with attachments which may only be of interest to some, I've put them up on the web. The attachments referred to in the body of the text can be found at:
http://augart.com/Linux/Anaconda-2006-May/anaconda-11.0.5-1-DebugHack-3mazu....
http://augart.com/Linux/Anaconda-2006-May/anaconda-11.0.5-handleKernelModule...
http://augart.com/Linux/Anaconda-2006-May/mazu-anaconda.spec ]
On Tue, 2006-05-09 at 22:24 -0400, Bill Nottingham wrote:
So, looking at the code briefly, I suspect that kickstartFromCD needs to iterate over the devices (and check for NULL.) Or not probe the USB bus. (in kudzu parlance, USB CDs are SCSI.)
Try the *completely untested* patch attached. It may need some fudging.
Dear Bill,
Thanks for the suggestion. Your patch is essentially equivalent to the one that I used to get to the Python stage of the installer; in both cases, we detect these NULL values and have kickstartFromCD warn the user and return the failure value of 1.
Unfortunately, once I was in the Python stage, that stage crashed and offered me an anacdump.txt file. I've attached it so that others who are CC'd on this correspondence can make use of it; I doubt it has much to do with the Kudzu issue that you and I are attempting to address and fix.
I've also attached the complete patch that I have successfully used to get past the crash in /sbin/loader on Anaconda, since others on anaconda-devel-list may encounter the same problem. The patch is against Anaconda 11.0.5 instead of against the CVS head, since I was not able to get the CVS head to build disks; there must have been internal changes that broke the scripts I've been using (a variant of Dan Carpenter's scripts).
When you have a chance to look at the USB devices list dump I sent you, please let me know if I can provide more helpful information. For Mazu's internal purposes (getting an installer to work on the BladeCenter), I can probably add some kludges that will detect the CD-ROM and use mknod() to create the appropriate block device in /tmp so that we can read the ks.cfg file off of the CD. However, I'd certainly prefer to make the changes more generally useful to other users of libkudzu.
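The mknod() kludge mentioned above could look roughly like the sketch below. It assumes the drive is the first SCSI CD-ROM (major 11, minor 0, the numbers seen for /tmp/cdrom earlier in this thread). The helper names and the 0600 mode are illustrative assumptions, not loader code:

```c
/* glibc: expose mknod(), S_IFBLK and makedev() even in strict ISO C mode. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Device number for the first SCSI CD-ROM (sr0): major 11, minor 0. */
static dev_t cdrom_devnum(void)
{
    return makedev(11, 0);
}

/*
 * Create a block device node for sr0 at the given path. Needs root.
 * Returns 0 on success, -1 on failure with errno set by mknod().
 */
static int make_cdrom_node(const char *path)
{
    return mknod(path, S_IFBLK | 0600, cdrom_devnum());
}
```

The loader would call make_cdrom_node() with a path under /tmp and then mount that node to read ks.cfg. Hard-coding sr0 is only reasonable on hardware known to have a single CD drive.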
At this point, the major blocker for our using FC5's Anaconda as the installer for our Linux-based network security appliance is the crash in the second-stage Python code.
Steven
anaconda-devel@lists.fedoraproject.org