Grub

Sun Aug 19 01:24:48 UTC 2007

Back when I was first getting acquainted with PC technology, I stumbled upon 
a so-called "Hardware Book" (I don't have it any more, so I cannot give a 
reference, sorry...). Looking at the contents, my attention was drawn to the 
question "what happens when you turn on the computer?", and while answering 
"the OS gets loaded, of course" in my head, I turned the pages to see what the 
author had to say about it. To my surprise, the question was repeated there, 
more specifically: "what happens between the moment you push the power button 
and the moment the OS starts to load?" ;-)

I was amazed! The sequence of events was explained in a way that did not 
require much technical background, while at the same time it was extremely 
detailed. It started off along the lines of "voltage on the circuitry rises, 
and the oscillator starts ticking. The first few ticks activate a circuit that 
sends a reset signal to the processor...".

The moral of the story: it was far more complicated than I had thought, and 
the author still warned that his answer was somewhat oversimplified...

On Saturday 18 August 2007 19:27, Karl Larsen wrote:
>     All computers have a bios that does many things.

True. But be careful about what exactly the bios is! The screen that you see 
when pressing the Del key is called the "bios setup", which is a user 
interface between a human and the bios, not the bios itself. The bios is a 
set of routines (programs) stored in ROM (i.e. on a chip) somewhere on the 
motherboard. These routines know how to write letters to the screen, how to 
read or write data on the disk, how to communicate with the serial controller 
(for example), how to report how much memory is installed, etc. In essence, 
it is an *operating system*, a rudimentary one, but conceptually much like 
the Linux or Windows kernels. It serves the same purpose when you do not have 
a better one loaded in RAM.

>     The one thing I'm 
> interested in today is it's assignment of hard drives that are ide
> devices. On my bios it reads the hard drive and finds out if it is set
> as a Master or a Slave. It puts them in it's list in that way.

Not quite. The bios does not "read the hard drive". It asks the disk 
controller to provide data about the disk parameters. That data is stored in 
the drive's firmware (on a chip on the drive's own electronics), and the 
controller passes it on. No data on the actual disk platters gets accessed. Yet.
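To make "data about the disk parameters" a bit more concrete: on IDE/ATA 
drives this is the 512-byte block the drive returns for the IDENTIFY DEVICE 
command. The sketch below pulls the capacity out of such a block, assuming 
512-byte sectors and using words 60-61 (28-bit LBA sector count) and words 
100-103 (48-bit LBA sector count). The function name identify_capacity and the 
fake block are made up for illustration; a real bios does this in firmware, 
not in Python.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors, as on drives of that era


def identify_capacity(identify: bytes) -> int:
    """Return the capacity in bytes from a 512-byte ATA IDENTIFY DEVICE block."""
    if len(identify) != 512:
        raise ValueError("IDENTIFY data must be exactly 512 bytes")
    words = struct.unpack("<256H", identify)  # 256 little-endian 16-bit words
    # Words 60-61: number of sectors addressable with 28-bit LBA.
    lba28 = words[60] | (words[61] << 16)
    # Words 100-103: number of sectors addressable with 48-bit LBA (0 if unsupported).
    lba48 = words[100] | (words[101] << 16) | (words[102] << 32) | (words[103] << 48)
    sectors = lba48 if lba48 else lba28
    return sectors * SECTOR_SIZE


# Made-up IDENTIFY block for a 160 GB drive (312,500,000 sectors).
fake = [0] * 256
fake[60], fake[61] = 0xFFFF, 0x0FFF  # the 28-bit field saturates on big drives
sectors = 312_500_000
fake[100], fake[101] = sectors & 0xFFFF, sectors >> 16
block = struct.pack("<256H", *fake)
print(identify_capacity(block))  # 160000000000
```

Note that nothing here touches the platters; it is all data the drive 
electronics hand back on request.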

>     Once it has the hard drives assigned the bios is done.

Wrong. Among other things, the bios decides which device the OS is to be 
booted from (that is defined in the boot sequence in the setup), and then it 
needs to *read* the Master Boot Record (the very first 512 bytes of that 
device) and execute the code that should be there. Note that this is the first 
point where the bios actually has to access the data on the disk. And that is 
the very beginning of the disk. And only 512 bytes. And that data is the 
primary bootloader: grub, for example, or the Windows bootloader, or 
whatever...
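If you want to look at that first sector yourself, it is just the first 512 
bytes of the device. The sketch below reads it from a path and checks the 
0x55 0xAA signature in its last two bytes, which most bioses look for before 
jumping to the code. The function names are hypothetical; on a real system 
the path would be something like /dev/sda (an assumption about your disk 
naming, and reading it usually needs root), so the demo uses a throwaway 
image file instead.

```python
import os
import tempfile

SECTOR_SIZE = 512


def read_mbr(path: str) -> bytes:
    """Read the first 512-byte sector of a disk or disk image."""
    with open(path, "rb") as f:
        sector = f.read(SECTOR_SIZE)
    if len(sector) != SECTOR_SIZE:
        raise ValueError("device or image is shorter than one sector")
    return sector


def has_boot_signature(sector: bytes) -> bool:
    """True if bytes 510 and 511 hold the 0x55 0xAA boot signature."""
    return sector[510] == 0x55 and sector[511] == 0xAA


# Demo on a tiny fake disk image instead of a real device.
image = bytearray(SECTOR_SIZE * 4)
image[510:512] = b"\x55\xaa"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(image)
mbr = read_mbr(tmp.name)
os.unlink(tmp.name)
print(has_boot_signature(mbr))  # True
```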

>     Now in Linux 
> there is the new Grub which is pretty simple. It stores all it's stuff
> at /boot/grub/ and it puts the file grub.conf in /etc/. It needs to put
> some information in the Master Boot Record part of a hard drive so it
> can read that information and boot the proper system.

Just to clarify a couple of points.

First, the "information" found in the MBR, as I explained above, is grub 
itself. The executable program. A set of instructions that the processor is 
to execute in order to load the OS. A set of instructions that is AT MOST 512 
bytes long (in fact at most 446 bytes, since the rest of the sector holds the 
partition table and a two-byte boot signature). It is a very *small* program. 
And being small, it is not very bright itself --- it depends on the bios 
routines, and its main job is to load the rest of grub from the disk. 
Remember, the bios is the only operating system at this point. Grub is a 
program executed under this OS, and depends on it, in the same way firefox or 
evolution or whatever depends on the Linux kernel.
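Just how little room the first-stage loader gets is easiest to see by 
splitting the 512-byte sector into its parts: 446 bytes of boot code, four 
16-byte primary partition entries, and the 2-byte signature. This is a sketch 
of the classic MBR layout only; parse_mbr and Partition are hypothetical 
names, and the partition values in the demo are made up.

```python
import struct
from typing import List, NamedTuple, Tuple

BOOT_CODE_SIZE = 446       # all the space the first-stage loader has
PARTITION_ENTRY_SIZE = 16
ENTRY_FORMAT = "<B3sB3sII"  # status, CHS start, type, CHS end, start LBA, sector count


class Partition(NamedTuple):
    bootable: bool
    type_id: int
    start_lba: int
    num_sectors: int


def parse_mbr(sector: bytes) -> Tuple[bytes, List[Partition]]:
    """Split a 512-byte MBR into boot code and its four primary partition entries."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    boot_code = sector[:BOOT_CODE_SIZE]
    partitions = []
    for i in range(4):
        offset = BOOT_CODE_SIZE + i * PARTITION_ENTRY_SIZE
        status, _, type_id, _, start_lba, count = struct.unpack_from(ENTRY_FORMAT, sector, offset)
        partitions.append(Partition(status == 0x80, type_id, start_lba, count))
    return boot_code, partitions


# Made-up MBR with one bootable Linux (type 0x83) partition.
mbr = bytearray(512)
struct.pack_into(ENTRY_FORMAT, mbr, 446, 0x80, b"\0\0\0", 0x83, b"\0\0\0", 63, 208782)
mbr[510:512] = b"\x55\xaa"
code, parts = parse_mbr(bytes(mbr))
print(len(code), parts[0])
```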

Second, the system is not yet aware that there is a filesystem on the disk. 
The directories /boot/grub/ and /etc/ are written somewhere on the disk, but 
there is no kernel to provide access to them. The only thing that *can* read 
the disk is the *bios*, and in a very clumsy way --- the program that wants 
the data needs to specify the physical position where the data is stored 
(cylinder, head and sector), and ask the bios to instruct the controller to 
move the heads to that position and read off the data. When grub wants to 
read its configuration file (on Fedora that is /boot/grub/grub.conf; 
/etc/grub.conf is just a symlink to it), that is precisely what it needs to 
do. ASK THE BIOS TO READ IT. Based on that configuration, a kernel executable 
is then loaded (and executed). The kernel executable is a file called 
vmlinuz-something, residing in /boot/. So how does grub read the kernel file? 
ASK THE BIOS TO READ IT.

And now we get to the point. Some (older) bioses cannot read past cylinder 
1024. If the kernel file lies beyond that point, the bios fails to read it. So 
grub fails too. And the computer does not boot.
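Where does that wall sit in bytes? The classic CHS read call can only express 
1024 cylinders, so the reachable area is cylinders times heads times sectors 
times sector size. With the usual translated geometry of 255 heads and 63 
sectors per track (an assumption; bioses that support the LBA-based int 13h 
extensions do not have this limit at all), the arithmetic gives the familiar 
8.4 GB figure:

```python
# Limits of the classic CHS interface, assuming 255 heads and 63 sectors per track.
MAX_CYLINDERS = 1024
HEADS = 255
SECTORS_PER_TRACK = 63
SECTOR_SIZE = 512

max_sectors = MAX_CYLINDERS * HEADS * SECTORS_PER_TRACK
max_bytes = max_sectors * SECTOR_SIZE
print(max_sectors)  # 16450560
print(max_bytes)    # 8422686720, i.e. about 8.4 GB (roughly 7.8 GiB)
```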

Note that this has nothing to do with the 160 GB figure that is displayed in 
the setup menus. That number is just what the disk controller reports to the 
bios when asked for the size. It says nothing about whether the bios is able 
to read a disk of that size.
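To put the two numbers side by side: the reported size and the part the bios 
can reach through CHS calls are independent. The sketch below, again assuming 
255 heads and 63 sectors per track, shows that a 160 GB disk spans roughly 
19,000 cylinders while only the first 1024 are reachable. The function name 
chs_view is hypothetical.

```python
from typing import Tuple

HEADS, SECTORS_PER_TRACK, SECTOR_SIZE = 255, 63, 512   # assumed geometry
CYLINDER_BYTES = HEADS * SECTORS_PER_TRACK * SECTOR_SIZE  # 8,225,280 bytes per cylinder
CHS_LIMIT_BYTES = 1024 * CYLINDER_BYTES


def chs_view(reported_bytes: int) -> Tuple[int, int]:
    """Return (full cylinders on the disk, bytes reachable through CHS calls)."""
    cylinders = reported_bytes // CYLINDER_BYTES
    reachable = min(reported_bytes, CHS_LIMIT_BYTES)
    return cylinders, reachable


cyl, reach = chs_view(160_000_000_000)
print(cyl, f"{reach / 1e9:.1f} GB")  # 19452 8.4 GB
```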

So, to make sure that the kernel file is below cylinder 1024, the user (the 
human user) must put it there. That amounts to setting up a separate /boot 
partition during the installation of Fedora. That partition must be at the 
beginning of the drive, and must be small enough to end below cylinder 1024. 
If that is the case, all kernels will afterwards be written there, and the 
bios can access them during bootup.
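Whether a given partition satisfies this can be checked from its start sector 
and length (the two numbers stored in each partition table entry). This 
sketch assumes the same 255-head, 63-sector geometry as before; the function 
name and the example values are made up.

```python
CHS_LIMIT_SECTORS = 1024 * 255 * 63  # first sector the bios cannot reach (assumed geometry)


def partition_fits_below_limit(start_lba: int, num_sectors: int) -> bool:
    """True if the whole partition lies below cylinder 1024."""
    if num_sectors <= 0:
        return False
    last_lba = start_lba + num_sectors - 1
    return last_lba < CHS_LIMIT_SECTORS


print(partition_fits_below_limit(63, 208782))             # True: a ~100 MB /boot at the start
print(partition_fits_below_limit(208845, 39_062_500))     # False: a ~20 GB / after it
```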

However, if the human user thinks he is smart enough and does not create the 
/boot partition, but only one huge / partition, the place where the kernel 
will subsequently be written is a complete gamble. It *may* be below cylinder 
1024, but it also *may not*. When the OS gets updated, newer kernel files are 
written to the disk, *somewhere*, and there is no guarantee that this 
"somewhere" is below the famous cylinder limit. So the next time the system 
boots, grub tries to load the newest kernel, the bios hits the wall, grub 
fails, and the computer won't boot.
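The gamble can be expressed as a check over where the kernel file's pieces 
actually landed. This sketch assumes you already have the file's extents as 
(start sector, length in sectors) pairs counted from the start of the disk; 
tools such as filefrag -v list a file's extents, but converting its output 
into absolute 512-byte sectors (block size, partition offset) is left out 
here. The function name and extents are hypothetical.

```python
from typing import Iterable, Tuple

CHS_LIMIT_SECTORS = 1024 * 255 * 63  # assumed 255-head, 63-sector geometry


def kernel_bootable(extents: Iterable[Tuple[int, int]]) -> bool:
    """True if every (start_sector, length) extent of the file lies below cylinder 1024."""
    extents = list(extents)
    return bool(extents) and all(start + length <= CHS_LIMIT_SECTORS
                                 for start, length in extents)


old_kernel = [(2_000_000, 3_800)]     # made-up: written early, well below the limit
new_kernel = [(16_449_000, 2_000)]    # made-up: straddles the limit after an update
print(kernel_bootable(old_kernel), kernel_bootable(new_kernel))  # True False
```

Same disk, same grub, same bios: only the location of the newest file 
changed.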

What the user sees is that grub fails on reboot after a yum update. 
Naturally, the user blames the update and grub, while the actual culprits are 
the hardware limit of an old bios and PEBKAC (playing smart with partitions).

Two final notes. First, if grub (with the help of the bios) succeeds in 
loading the kernel, the kernel gets executed. The kernel is much larger than 
512 bytes, and there is room in it for the code of a "driver", a program that 
can talk directly to the disk controller and read/write files on the disk 
**without** asking the bios to do it. That's why, once the kernel is loaded, 
it is irrelevant whether or not the bios can access beyond cylinder 1024. The 
kernel sees the whole drive, irrespective of the bios. Second, this whole 
story is (or should be, afaik) exactly the same whether one uses grub, lilo, 
NTLDR (i.e. Windows), Fedora, SuSE, *buntu, Windows, *BSD, MS DOS, or 
whatever. The boot process is the same on all PC hardware, whatever software 
is being booted. Of course, it may be different for Mac, Commodore, Sparc, 
Amiga, Atari, and other non-PC architectures.

I hope it is now a bit clearer what goes on between the bios, grub, the 
kernel and the disk during the boot process. ;-)

Best regards, :-)
Marko

Marko Vojinovic
Institute of Physics
University of Belgrade
======================
e-mail: vmarko at phy.bg.ac.yu