Fedora 15, new and exciting plans

Alasdair G Kergon agk at redhat.com
Mon Nov 15 16:37:23 UTC 2010


On Sun, Nov 14, 2010 at 10:15:20AM +0000, Richard W.M. Jones wrote:
> On Sun, Nov 14, 2010 at 01:14:18AM +0100, Lennart Poettering wrote:
> > LVM actually slows down boot considerably. Not primarily because its
> > code was slow or anything, but simply because it isn't really written in
> > the way that things are expected to work these days. 

Almost - the delays are not a fundamental part of LVM2 itself, but
come from the simplistic way it is invoked by scripts during
startup.

"Wait a long time for things to settle, then see what storage we can
assemble and hope we can see all the bits we need."

If instead you defined rules about what to activate and when, you could
switch to an event-driven mechanism hooked into udev:
"When you see the whole of the Volume Group called rootvg, activate it."

I once had a discussion about how we might incorporate this into upstart,
but it was never a priority and I don't think anything came of it.

> > The LVM assembly at
> > boot is expected to be run at a time where all disks have been found by
> > the kernel and identified. 

(Again, that's just a constraint imposed by the current scripts, not
anything fundamental to LVM.)

> > The right way how to implement a logic like this is to wait
> > exactly until all disks actually *needed* have shown up and at that time
> > assemble LVM. 

Absolutely!  (On a VG-by-VG basis.)

> > Currently, to make LVM work, we however try to wait until
> > *everything* thinkable is enumerated, not only the disks that are
> > actually needed. 

A convenience - it makes the scripts simpler - but not a necessity.
(And lvm2 can be - and slowly is being - improved to make this easier
for such scripts.)

> I'd really like to hear from an LVM expert or two about this, because
> I can't believe that it's impossible to make this work better for the
> common single-disk-is-boot-disk single-PV case.  The LVM metadata
> (which I've written code to read and decode in the past) contains the
> information needed.

It's just a matter of differing priorities, not helped by responsibilities
being split between the developers of different packages.
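
To make that concrete - and this is only a rough userspace sketch, using the
ordinary lvm2 reporting commands to stand in for a cached copy of the
metadata, not how anything is implemented today - deciding whether a VG is
complete needs nothing more than:

   # Rough sketch: is every PV of a VG present?  Uses the lvm2 reporting
   # commands in place of a cached copy of the on-disk metadata.
   import subprocess

   def vg_is_complete(vg_name):
       # How many PVs the VG's metadata says it contains:
       expected = int(subprocess.check_output(
           ["vgs", "--noheadings", "-o", "pv_count", vg_name]).decode().strip())
       # How many PV devices belonging to that VG are currently visible:
       out = subprocess.check_output(
           ["pvs", "--noheadings", "-o", "vg_name"]).decode()
       present = sum(1 for line in out.splitlines() if line.strip() == vg_name)
       return present == expected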

An example of the way I see it working is like this:
Say you have a Volume Group VG1 spread across two PVs, PV1 and PV2, containing a
Logical Volume LV1 that holds the root filesystem.
You have a trigger rule saying "When you see the whole of VG1, activate LV1
inside it" and another saying "When you see the filesystem with UUID X, mount
it."  (Default rules can be generic of course, like 'activate any VG when you
see it' and 'activate any LV when you see it' and 'mount any filesystem when
you see it'.)

   Device containing PV1 appears on system.
   Kernel sends uevent.
   udev rules ask each storage subsystem in turn "Is this yours?"
   - lvm2 subsystem spots the PV signature and claims ownership of it.  It
     caches the LVM metadata from it.  
     - Rule check performed but no rules match.

   Device containing PV2 appears on system.
   Kernel sends uevent.
   udev rules ask each storage subsystem in turn "Is this yours?"
   - lvm2 subsystem spots the PV signature and claims ownership of it.  It
     caches the LVM metadata from it.
     - The rule to activate LV1 when the whole of VG1 is seen is triggered.

   LV1 is activated.
   Kernel sends uevent for arrival of the new VG1-LV1 block device on the system.
   udev rules ask each storage subsystem in turn "Is this yours?"
   - lvm2 subsystem finds no signature and answers "No."
   - filesystem subsystem spots the filesystem signature and claims ownership.
     - The rule to mount the (root) filesystem is triggered.
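
Expressed as (very rough) code - the helpers here are placeholders for the
real metadata parsing and activation steps, not existing lvm2 or udev APIs -
the lvm2 side of that walkthrough boils down to something like:

   # Sketch only: a handler a udev rule could run for each new block device.
   import subprocess

   def read_pv_metadata(device):
       """Placeholder: return {'vg': ..., 'pv_uuid': ..., 'all_pv_uuids': [...]}
       from the LVM metadata on 'device', or None if it isn't a PV."""
       return None

   def activate(vg, lv):
       # Activating the LV makes the kernel emit a uevent for the new
       # VG-LV block device, which then goes around the same loop.
       subprocess.check_call(["lvchange", "-ay", "%s/%s" % (vg, lv)])

   seen = {}     # VG name -> PV UUIDs seen so far
   wanted = {}   # VG name -> all PV UUIDs the VG's metadata lists

   def handle_uevent(device):
       md = read_pv_metadata(device)
       if md is None:
           return                      # not ours - another subsystem may claim it
       wanted[md["vg"]] = set(md["all_pv_uuids"])
       seen.setdefault(md["vg"], set()).add(md["pv_uuid"])
       if seen[md["vg"]] == wanted[md["vg"]]:
           # "When you see the whole of VG1, activate LV1 inside it."
           activate(md["vg"], "LV1")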

In summary:
  Each block device on the system has (at most) one owning subsystem
  recorded in a single database.  (Perhaps based around libblkid.)

  Storage subsystems may support actions like "scan" and "activate".

  Udev rules trigger "scan".

  The exact nature of the abstractions used by the triggers would need
  further exploration.  (E.g. how much handling is storage-subsystem-specific:
  a central 'blob' cache with general triggers, or private caches with
  private triggers.)
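
If it helps, here's the shape of the abstraction I have in mind, again as an
illustrative sketch rather than a proposed API - the names are made up for
this mail:

   class StorageSubsystem(object):
       def claims(self, device):
           """'Is this yours?' - look for this subsystem's on-disk signature."""
           return False
       def scan(self, device):
           """Cache metadata and record ownership in the shared database."""
           pass
       def activate(self, target):
           """Bring a higher-level device (LV, array, ...) online."""
           pass

   def on_uevent(device, subsystems, database):
       for sub in subsystems:
           if sub.claims(device):
               database[device] = sub    # at most one owning subsystem per device
               sub.scan(device)          # a scan may satisfy a trigger rule
               return sub
       return None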

Alasdair


