On Thu, Mar 7, 2013 at 10:41 AM, <John.Florian@dart.biz> wrote:
> From: Frederick Grose <fgrose@gmail.com>
> On Wed, Mar 6, 2013 at 3:59 PM, <John.Florian@dart.biz> wrote:
> root@aos-61:46 # # Lets now make it all go wonky:
> root@aos-61:46 # time dd if=/dev/zero of=/foo
> Bus error
> real    1m15.775s
> user    0m2.818s
> sys     0m24.129s
> root@aos-61:46 #
> root@aos-61:46 # ls /root
> -bash: /bin/ls: Input/output error
> root@aos-61:46 # df -h
> -bash: /usr/bin/df: Input/output error
> root@aos-61:46 # mount
> -bash: /usr/bin/mount: Input/output error
> root@aos-61:46 # cat /proc/meminfo
> -bash: /usr/bin/cat: Input/output error
> Is this expected?  Is there anything I can do, e.g., configuration-
> wise, that can prevent this?  Ideally this would fail much like any
> other full disk situation.  I understand that the overlay consumes
> space, i.e., memory, for this file growth, including file removals,
> but I'd at least like to be able to remotely reboot a system when in
> this state, however I can't even do that because the reboot command
> will either return the same I/O error or it may succeed but get the
> I/O error when systemd tries to read /usr/lib/systemd/system/reboot.target.
> I dug around in bugzilla, but found nothing there.  I can file a
> bug, but which package is likely at fault here?
> --
> John Florian

> See https://fedoraproject.org/wiki/LiveOS_image for some background
> and potential workarounds.

>         --Fred --

There's really not much on that page that helps me here.  I'm trying to use Live images for a mostly-stateless embedded appliance OS deployed to hundreds or thousands of devices.  I realize that the COW design is always going to be limited, but a more graceful failure mode is really needed, somehow.  For our use, the biggest gain in stability here actually comes from systemd's journal with its trim-before-write approach, instead of the legacy "write now, trim asynchronously" approach we used to have.  However, that only covers one specific use case: logging.  Writing to proper persistent storage would let me avoid the root file system overlay, but most of these embedded devices use CF or SD cards for storage, which have limited write cycles that must be respected.

Is there a way to implement an artificial capacity limit that would prevent processes from exhausting the overlay so that the reserve might be used for recording the event and rebooting back to a safer state?

At the very least, I think this page could benefit from a little stronger, more explicit wording of this failure case.  While it talks a little about some work-arounds, it actually says very little about why they are needed.  Only in the "Overlay Recovery" section does it hint at the crash potential.

John Florian

Thank you for the review!  I've updated the wiki page based on your comments.

Documenting that a temporary overlay is a 0.5 GiB sparse file in a RAM filesystem gave me the idea to try an overlay size greater than available memory, in the hope that kernel out-of-memory warnings would intervene before the device-mapper filesystem invalidation.

I modified /usr/sbin/dmsquash-live-root in the initramfs to create a temporary 512 GiB sparse overlay:

dd if=/dev/null of=/overlay bs=1024 count=1 seek=$((512*1024*1024)) 2> /dev/null
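
For anyone unfamiliar with this dd idiom: reading from /dev/null copies zero bytes, so dd only seeks to the requested offset and truncates the output file there.  The result is a sparse file whose apparent size is bs * seek, with few or no blocks actually allocated.  Below is a small sketch of the same idea at a harmless size.  The temporary file and the 64 MiB size are only for illustration, and `stat -c` assumes GNU coreutils:

```shell
#!/bin/sh
# Illustration only: build a 64 MiB sparse file the same way the overlay is made.
f=$(mktemp)
dd if=/dev/null of="$f" bs=1024 count=1 seek=$((64*1024)) 2> /dev/null

apparent=$(stat -c %s "$f")   # logical (apparent) size in bytes
blocks=$(stat -c %b "$f")     # blocks actually allocated on disk
echo "apparent size: $apparent bytes, allocated blocks: $blocks"
rm -f "$f"

# Same arithmetic for the overlay command above: bs (1024) times seek.
overlay_gib=$(( 1024 * 512*1024*1024 / 1024 / 1024 / 1024 ))
echo "overlay apparent size: $overlay_gib GiB"
```

The allocated block count depends on the filesystem, but it should stay far below what the apparent size would need.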

Then, after booting an updated Fedora 18 Live desktop from a read-only LiveUSB and running your failure demo,

time dd if=/dev/zero of=/foo

I got out-of-memory warnings after a file of about 450 MiB was written and the command returned--no crash!

Some post-test output:

[root@localhost ~]# dmsetup status
live-osimg-min: 0 8388608 snapshot 2584/2584 24
live-rw: 0 8388608 snapshot 921720/1073741824 3600

top - 18:11:53 up 17 min,  3 users,  load average: 0.68, 0.75, 0.57
Tasks: 182 total,   2 running, 180 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.6 us,  1.6 sy,  0.0 ni, 96.5 id,  0.0 wa,  0.2 hi,  0.0 si,  0.0 st
KiB Mem:   3339812 total,  3260284 used,    79528 free,   316384 buffers
KiB Swap:  3341308 total,        0 used,  3341308 free,  1948108 cached
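
A note on reading the dmsetup numbers: as far as I understand the snapshot target, its status fields are <sectors allocated>/<total sectors> followed by the metadata sectors, all in 512-byte sectors.  Here is a rough sketch that converts the live-rw line from this test into more familiar units.  The line is pasted in as a literal string; on a running system it would come from dmsetup status instead:

```shell
#!/bin/sh
# Status line copied verbatim from the test output above.
line='live-rw: 0 8388608 snapshot 921720/1073741824 3600'

# Pick out the "allocated/total" field, wherever it sits in the line.
usage=$(echo "$line" | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+\/[0-9]+$/) print $i }')
used=${usage%/*}     # sectors allocated in the overlay
total=${usage#*/}    # total sectors available in the overlay

# Sectors are 512 bytes each; integer division rounds down.
used_mib=$((used * 512 / 1024 / 1024))
total_gib=$((total * 512 / 1024 / 1024 / 1024))
echo "overlay used: $used_mib MiB of $total_gib GiB"
```

This prints "overlay used: 450 MiB of 512 GiB", which agrees with the roughly 450 MiB file written before the out-of-memory warnings appeared.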

You might test this method on your systems and let us know how it works.