Action plan for koji01 reboots

Jon Stanley jonstanley at gmail.com
Sun Apr 4 20:46:57 UTC 2010


So I'd like to put together an action plan to deal with the koji01
reboot issues. Right now, we're not capturing crash dumps on this
machine (or on any other, but I'm not sure there's value in doing so
unless we have an active, systemic problem like the one we're facing
here). I'm not saying there'd be any dumps *to* capture, but there
probably are. I'd like to set up kdump on this machine after the beta
freeze is over, but I'd like buy-in from other people before doing it.
Here's what I'd propose:

1) Present another LV from bxen02 to koji01 and mount it at /var/crash
(the rootfs on koji01 is only 10GB, and we'd need more for a crash
dump or two - I'd say 20GB would be sufficient to hold two crashes,
since it's an 8GB domain). It looks like VolGroup01, where koji01
lives, has about 80GB free.
2) Install kexec-tools and configure it appropriately (this includes
adding crashkernel=128M@16M to grub.conf)
3) Reboot the machine and wait for it to crash again.
4) Analyze the (hopefully) resulting crash dumps :)
5) Profit!
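For anyone who wants to sanity-check steps 1 and 2, here's a rough
Python sketch. It assumes the simple crashkernel=SIZE@OFFSET form used
above (not the range or other variant syntaxes), and it assumes a
worst-case vmcore is about the size of the guest's RAM, i.e. no
filtering or compression by the core collector. The helper names are
hypothetical and are not part of kexec-tools.

```python
import re

# Multipliers for the K/M/G suffixes used in kernel memory sizes.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
GIB = 1024 ** 3


def parse_size(text):
    """Convert a kernel-style size such as '128M' into bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if m is None:
        raise ValueError("unrecognised size: %r" % text)
    return int(m.group(1)) * _UNITS[m.group(2)]


def crashkernel_reservation(cmdline):
    """Return (size_bytes, offset_bytes) for a crashkernel=SIZE@OFFSET
    argument, or None if the command line has no crashkernel= argument.

    Assumption: only the simple SIZE[@OFFSET] form is handled; range
    syntax such as '512M-2G:64M' raises ValueError.
    """
    for arg in cmdline.split():
        if arg.startswith("crashkernel="):
            value = arg.split("=", 1)[1]
            size, _, offset = value.partition("@")
            return parse_size(size), (parse_size(offset) if offset else 0)
    return None


def dumps_that_fit(volume_bytes, guest_ram_bytes):
    """Worst-case number of full vmcores a volume can hold, assuming each
    dump is as large as guest RAM (no filtering or compression)."""
    return volume_bytes // guest_ram_bytes


if __name__ == "__main__":
    print(crashkernel_reservation("ro root=/dev/sda1 crashkernel=128M@16M"))
    print(dumps_that_fit(20 * GIB, 8 * GIB))
```

After the reboot in step 3, passing in the contents of /proc/cmdline
should confirm that the crashkernel= argument actually reached the
kernel. Under the worst-case assumption, the proposed 20GB LV holds two
dumps of the 8GB domain, while the 10GB rootfs would hold at most one.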

Any objections?

