[Fedora-livecd-list] pxeboot images out of space

Frederick Grose fgrose at gmail.com
Mon May 2 21:21:27 UTC 2011


On Mon, May 2, 2011 at 4:46 PM, Howard Powell <hbp4c at virginia.edu> wrote:

> Hi -
>
> I've been using the livecd set of tools to build a pxeboot image for a set
> of compute nodes in our local HPC environment.  The livecd project has
> allowed me to make all of the compute nodes diskless, and any software
> errors are trivial to fix (just reboot).
>
> I've run into one problem: if any process on a node writes a large amount
> of data to /tmp (somewhere around 0.5GiB or more in one operation), the
> root filesystem aborts its journal, goes read-only, and the node must be
> rebooted.
>
> Creating the image is as simple as:
> # LANG=C livecd-creator --config=/local/nodes/hyades-nodes.cfg \
>       --fslabel=hyades -t /local/nodes/
> # livecd-iso-to-pxeboot /local/nodes/hyades.iso
>
> The exact error caused during the I/O operation on a compute node is logged
> as:
> May  2 16:11:32 eth-c31.cluster kernel: device-mapper: snapshots:
> Invalidating snapshot: Unable to allocate exception.
> May  2 16:11:32 eth-c31.cluster syslogd: /var/log/messages: Read-only file
> system
> May  2 16:11:32 eth-c31.cluster kernel: Buffer I/O error on device dm-0,
> logical block 997925
> May  2 16:11:32 eth-c31.cluster kernel: lost page write due to I/O error on
> dm-0
> May  2 16:11:32 eth-c31.cluster kernel: Aborting journal on device dm-0.
> May  2 16:11:32 eth-c31.cluster kernel: __journal_remove_journal_head:
> freeing b_committed_data
> May  2 16:11:32 eth-c31.cluster last message repeated 5 times
> May  2 16:11:32 eth-c31.cluster kernel: journal commit I/O error
> May  2 16:11:32 eth-c31.cluster kernel: ext3_abort called.
> May  2 16:11:32 eth-c31.cluster kernel: EXT3-fs error (device dm-0):
> ext3_journal_start_sb: Detected aborted journal
> May  2 16:11:32 eth-c31.cluster kernel: Remounting filesystem read-only
> May  2 16:11:32 eth-c31.cluster kernel: __journal_remove_journal_head:
> freeing b_committed_data
> May  2 16:11:32 eth-c31.cluster kernel: __journal_remove_journal_head:
> freeing b_committed_data
> May  2 16:11:32 eth-c31.cluster kernel: __journal_remove_journal_head:
> freeing b_frozen_data
> May  2 16:11:32 eth-c31.cluster kernel: __journal_remove_journal_head:
> freeing b_frozen_data
> May  2 16:11:32 eth-c31.cluster kernel: __journal_remove_journal_head:
> freeing b_committed_data
> May  2 16:11:32 eth-c31.cluster kernel: __journal_remove_journal_head:
> freeing b_frozen_data
> May  2 16:11:43 eth-c31.cluster kernel: printk: 259144 messages suppressed.
> May  2 16:11:43 eth-c31.cluster kernel: Buffer I/O error on device dm-0,
> logical block 737
> May  2 16:11:43 eth-c31.cluster kernel: lost page write due to I/O error on
> dm-0
> May  2 16:11:43 eth-c31.cluster kernel: Buffer I/O error on device dm-0,
> logical block 115035
> May  2 16:11:43 eth-c31.cluster kernel: lost page write due to I/O error on
> dm-0
>
>
> Googling for information suggests that the device underlying the filesystem
> is running out of space, which explains why the filesystem crashes.   df
> reports that the / filesystem should have space:
>
> [root@c31 ~]# df -h
> /dev/mapper/live-rw   6.0G  1.2G  4.8G  19% /
>
> I've adjusted the "part / --size 6144" parameter in my kickstart file, but
> the only effect is that the size df reports changes to match what I
> specify. Writing a file larger than about 512MB to /tmp still crashes the
> filesystem, even though the space is reported as available.
>
> Each compute node has 32GB of system memory, and is running an x86_64
> kernel.
>
> I'm open to any suggestions on how to fix this issue.
>
> Thanks!
> Howard
>
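
The "Unable to allocate exception" message comes from the device-mapper
snapshot target, which logs it when the copy-on-write store behind a snapshot
has no free space left. df only sees the ext3 filesystem on top of
/dev/mapper/live-rw, not that store, which would explain why it still reports
free space. As a rough sketch (the helper name snapshot_pct_used is made up
here, and it assumes the status line carries "<allocated>/<total>" sector
counts in its fourth field, as snapshot targets normally report), the fill
level can be computed from a dmsetup status line:

```shell
# Print how full a device-mapper snapshot overlay is, as a whole percentage.
# $1: one line of `dmsetup status` output for a snapshot target, e.g.
#     0 12582912 snapshot 524288/1048576 2056
# Prints "invalid" and returns 1 if the fourth field is not "<used>/<total>"
# (an overflowed snapshot reports "Invalid" there).
snapshot_pct_used() {
    usage=$(printf '%s\n' "$1" | awk '{ print $4 }')
    case "$usage" in
        */*) ;;
        *)   echo "invalid"; return 1 ;;
    esac
    used=${usage%/*}     # sectors already allocated in the overlay
    total=${usage#*/}    # total sectors available in the overlay
    echo $(( used * 100 / total ))
}
```

On a node, as root, something like

    snapshot_pct_used "$(dmsetup status live-rw)"

should show how close the overlay is to filling up while a large write to
/tmp is in progress. live-rw is the device name taken from the df output
above.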

I'm not familiar with livecd-iso-to-pxeboot, but a standard LiveOS image
places /tmp in a tmpfs.  See
http://git.fedorahosted.org/git/?p=spin-kickstarts.git;a=blob;f=fedora-live-base.ks;h=88bbf7057d099eb872f844b09fbf596bbee5eb32;hb=master#l171

You may try adjusting that line in /etc/rc.d/init.d/livesys, if that fits
your situation.
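
A minimal sketch of how one might check whether /tmp already sits on a tmpfs
before changing anything. The helper name is_tmpfs is hypothetical, and the
optional second argument exists only so the function can be pointed at a
sample mounts table instead of /proc/mounts:

```shell
# Succeed if the given mount point is currently backed by tmpfs.
# $1: mount point to check (e.g. /tmp)
# $2: mounts table to read; defaults to /proc/mounts
is_tmpfs() {
    # The last entry for a mount point is the one mounted on top.
    awk -v mp="$1" '
        $2 == mp { fstype = $3 }
        END      { exit (fstype != "tmpfs") }
    ' "${2:-/proc/mounts}"
}
```

If it is not, a line along these lines (run as root, for example from the
livesys init script) would keep /tmp writes out of the snapshot overlay:

    is_tmpfs /tmp || mount -t tmpfs -o size=8g tmpfs /tmp

The size=8g cap is only an assumption for nodes with 32GB of memory. tmpfs
pages live in RAM, so the limit should leave enough room for the compute
jobs themselves.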

       --Fred