Preupgrade still sucks. Maybe sucks less, maybe sucks more.

Michael H. Warfield mhw at WittsEnd.com
Fri Jun 3 18:34:51 UTC 2011


Ok...

The subject line may well get some unwanted attention and ignite a flame
war, but just bear with me.  The last couple of days have not been fun
for me.

Understand one thing about me.  I manage a number of remote servers to
which I have no console access other than serial consoles.  I do have
remote power control over them.  If I have to drive an hour out to a
colo facility to fix a broken install or upgrade, it's a very bad day.

Classically, those servers (some of which originally started out on
FC1) have been upgraded using the yum upgrade method.  There have been
one or two times when that has been challenging, thanks to odd
dependencies, but not many.  I've gotten it down to a science: I dump
the current rpm table using "rpm -qa --qf '%{NAME}\n' | sort -u" into a
file, simply remove any conflicting packages until the upgrade goes
through, and then reinstall based on what's in that file.  The work
that has been done on the yum upgrade page, simplifying the process to
the point where it's just a "yum update ; yum clean all ; yum
--releasever=??", is incredible.  It works so smoothly now compared to
what you had to do years ago.  And the server stays up the entire time
of the upgrade.  I don't incur any significant downtime with those
machines.
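
For anyone curious, the routine boils down to something like this (a
rough sketch from memory, not a literal transcript; the release number
and file name are just examples, and the wiki page has the definitive
incantation):

    # Snapshot the installed package set so anything I have to rip out
    # to break a dependency knot can be reinstalled afterwards.
    rpm -qa --qf '%{NAME}\n' | sort -u > /root/pkgs-before-upgrade.txt

    # Bring the current release fully up to date, clear the caches,
    # then point yum at the new release and pull everything across.
    yum -y update
    yum clean all
    yum --releasever=15 distro-sync

    # Afterwards, reinstall anything that had to be removed along the
    # way, working from the saved list.  Most of the list is already
    # installed; yum just skips those.
    yum -y install $(cat /root/pkgs-before-upgrade.txt)

The last step is overkill, but it catches anything I yanked to get past
a conflict, and yum ignores what's already there.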

But...  With each new release of Fedora, I do try to give preupgrade a
shot and see how bad it is, or see if they've improved things.  I do
this on my local workstations, where I can see how much downtime is
involved and whether there are any gotchas.

So it was this time.  My intent was to take two of my 64-bit machines
and upgrade one, "Forest", using preupgrade and the other, "MtKing"
(both names from the old game of Colossal Cave Adventure), using yum
upgrade, to compare the upgrade time, downtime, and the resulting rpm
sets.  Both machines had the same rpms installed and both were up to
date.  I also run the pkgcacher package on one of my other servers, so
I'm only downloading each package once and both machines can suck them
in from that cache.
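
(The pkgcacher side of that setup is a story for another day, and I
don't recall the exact details offhand; on the yum side it amounts to
pointing the clients at the cache box, something like this in
/etc/yum.conf - the host and port here are made-up examples:

    # /etc/yum.conf on each client: send package downloads through the
    # local cache box instead of hitting the mirrors directly.
    proxy=http://pkgcache.example.com:8080/

Nothing fancy, but it means the second machine's download is basically
free.)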

So...  Forest ran preupgrade, downloaded all the packages, and was
ready to reboot, which I did.  MtKing downloaded all the packages and
ran into dependency problems, which isn't uncommon with the yum upgrade
path.  So I decided, what the hell, and ran preupgrade on it as well,
then rebooted it.  With the package cacher, the downloads of something
like 2000-3000 packages took less than 5 minutes for each machine.

While it was down and grinding on its disk, Forest got a non-specific
unhandled exception and was stuck at the console, requiring a manual
reboot.  Well, that tells me right there that preupgrade is still not
deployment grade yet.  Not for remote servers, at least.  That would
have cost me a trip to the colo if it had been remote.  Not good.
MtKing also failed preupgrade, but that was because it was short on
disk space in one partition.  That was easy to fix but, again, it
requires console interaction or it's dead.  The yum upgrade would also
have told me it was short on disk space for the install before getting
this far down the road, with the bloody server offline at the time,
saving me a couple of reboots and other jacking around.
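
(For what it's worth, the kind of pre-flight check I mean is nothing
exotic.  As I understand it, preupgrade stages its installer images
under /boot and the downloaded packages under /var, so those are the
partitions that bite you.  Something as dumb as:

    # Eyeball the partitions preupgrade actually needs room on, before
    # the machine goes down.  Rough sanity check, nothing more.
    df -h /boot /var /

done by the tool up front, before the reboot, would have saved the
whole round trip.)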

So I switched plans, knowing what the problem was on MtKing and having
no clue what caused anaconda to hurl chunks on Forest.  I freed up some
space on MtKing and reran preupgrade while I ran yum upgrade on Forest.

Forest had some dependency problems, like MtKing (to be expected - they
were the same), but this time I simply dumped the rpm table to a file,
like I always do, and started removing the bad boys.  A couple of minor
things, really.  The stick in the mud was avidmux, which really had it
tied in a knot over a missing library, but I had no problem pulling
that package, and then yum upgrade was chugging away (I still can't
reinstall avidmux because of that missing library).  Half an hour
later, it was done and the machine was rebooted and up, more or less.
Then I found that someone had screwed up IPv6 over bridges by forcing
accept_ra = 0 and forwarding = 1 in the bloody scripts.  I'll deal with
that with a bug report later.  Absolutely stupid.  A bridge is not
forwarding, it's bridging, and that misguided step breaks autoconf.
But that's another story.  Shortly after sorting that out, I had Forest
up with the half dozen LXC virtual machines running on him, and
everything was right with the world.
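
(For anyone hitting the same thing, the usual workaround is along these
lines - br0 is just an example interface name:

    # With forwarding turned on, the kernel ignores router
    # advertisements entirely unless accept_ra is set to 2
    # ("accept RAs even when forwarding is enabled").
    sysctl -w net.ipv6.conf.br0.accept_ra=2

That gets autoconf limping again until the scripts get fixed properly.)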

ITMT...  I rebooted MtKing into the preupgrade process and turned it
loose.  Strangely, it DIDN'T run into the unhandled exception like
Forest had.  The machines should have been the same.  Oh well.
Something like two HOURS later, though, it was still grinding on the
disk.  WTH?  Why is preupgrade taking four times longer to upgrade a
system, and that with the system down and out of service during the
entire process?  Well, it finally finished, and I rebooted into the F15
kernel and was almost immediately greeted with a kernel panic: unable
to mount root fs on unknown-block(0,0).  Sigh...  This would be really
great at a remote location.  Ok, I'm screwed.  Yum upgrade worked over
on Forest where preupgrade demonstrated an epic fail, and now MtKing
has succumbed to another failure.  I tried booting into one of the F14
kernels that were still on the system.  You can forget that noise as
well.  I ended up at the "Welcome to emergency mode. Use systemctl
default or ^D to activate the default mode" prompt.  Grrr...  I logged
in and tried that "systemctl default"...  No joy.  "Failed to issue
method call: Transaction is destructive."  Great.  That's a
delightfully spooky error that tells you absolutely nothing.  Looks
like it burned my bridges on the way out the door.

OTOH...  My son, who is another skilled developer and Linux enthusiast,
has used preupgrade successfully on one of his 64-bit stations, but he
also noticed that the upgrade took seemingly forever.  Like hours.  So
that's not just me.

Well, now I've got a dead machine to try to recover.  I've heard all
the arguments about how preupgrade should be so much better because
you're running anaconda on an install kernel.  That has simply NOT been
my experience at all.  On the contrary - exactly the opposite.
Preupgrade fails to do the disk space and dependency checking that
would head off these failures, all of which could have been resolved
remotely on a live, running system without requiring repeated reboots.
I have no idea what anaconda is doing that is so broken that it takes
over four times longer to upgrade a system than yum, but the yum
upgrade path has worked flawlessly (not always effortlessly, but
flawlessly) for years.  For now - preupgrade => epic fail * 2.  If
anyone has any thoughts on what caused either of the two remaining
problems (the kernel panic on the F15 kernel or the failure to boot the
F14 kernel), I'd be happy to hear them.  ITMT, I guess I'll start
building a recovery CD to try and fix this mess.
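
(The rough plan with the recovery CD, assuming the panic is just a
missing or botched initramfs - the paths are the usual anaconda rescue
layout and KVER is a placeholder, not something I've verified yet:

    # From the rescue environment, with the installed system mounted
    # under /mnt/sysimage:
    mount --bind /dev  /mnt/sysimage/dev
    mount --bind /proc /mnt/sysimage/proc
    mount --bind /sys  /mnt/sysimage/sys
    chroot /mnt/sysimage

    # Rebuild the initramfs for the F15 kernel (replace KVER with the
    # version shown by "rpm -q kernel") and confirm grub points at it.
    dracut --force /boot/initramfs-KVER.img KVER
    grep initrd /boot/grub/grub.conf

If that's not it, then it's real spelunking time.)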

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!