From: Frederick Grose <fgrose(a)gmail.com>
On Wed, Mar 6, 2013 at 3:59 PM, <John.Florian(a)dart.biz> wrote:
<snip>
root@aos-61:46 # # Lets now make it all go wonky:
root@aos-61:46 # time dd if=/dev/zero of=/foo
Bus error
real 1m15.775s
user 0m2.818s
sys 0m24.129s
root@aos-61:46 #
root@aos-61:46 # ls /root
-bash: /bin/ls: Input/output error
root@aos-61:46 # df -h
-bash: /usr/bin/df: Input/output error
root@aos-61:46 # mount
-bash: /usr/bin/mount: Input/output error
root@aos-61:46 # cat /proc/meminfo
-bash: /usr/bin/cat: Input/output error
Is this expected? Is there anything I can do, e.g., configuration-wise,
to prevent this? Ideally this would fail much like any other full-disk
situation. I understand that the overlay consumes space, i.e., memory,
for this file growth, and even for file removals, but I'd at least like
to be able to remotely reboot a system when it is in this state. I
can't even do that, because the reboot command either returns the same
I/O error, or it succeeds only for systemd to hit the I/O error when it
tries to read /usr/lib/systemd/system/reboot.target.
I dug around in bugzilla, but found nothing there. I can file a
bug, but which package is likely at fault here?
--
John Florian
See https://fedoraproject.org/wiki/LiveOS_image for some background
and potential workarounds.
--Fred --
There's really not much on that page that helps me here. I'm trying to
use Live images for a mostly-stateless embedded appliance OS deployed to
hundreds or thousands of devices. I realize that the COW design is always
going to be limited, but a more graceful failure mode is really needed,
somehow. For our use, the biggest gain in stability here actually comes
from systemd's journal, with its trim-before-write approach, instead of the
legacy write-now, trim-asynchronously approach we used to have. However,
that only covers one specific use case: logging. Writing to proper
persistent storage allows me to avoid the root file system overlay, but
most of these embedded devices use CF or SD cards for storage, which have
limited write cycles that must be respected.
Is there a way to implement an artificial capacity limit that would
prevent processes from exhausting the overlay so that the reserve might be
used for recording the event and rebooting back to a safer state?
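
To make the question concrete, here is a minimal sketch of the kind of
watchdog I have in mind. It assumes the overlay is the device-mapper
snapshot named "live-rw" and that `dmsetup status` reports it as
"<start> <length> snapshot <allocated>/<total> <metadata>". The 90%
threshold and the function names are made up for illustration. Note that it
only detects a nearly full overlay; it does not enforce a limit, and a fast
writer like the dd above could still outrun a polling check.

```python
import re
import subprocess

# Assumption: the overlay is the device-mapper snapshot "live-rw".
OVERLAY_DEVICE = "live-rw"
# Hypothetical threshold: treat the overlay as "nearly full" at 90%.
LIMIT_PERCENT = 90.0


def overlay_percent_used(status_line):
    """Return the percentage of snapshot sectors already allocated.

    Expects a `dmsetup status` line such as
    '0 8388608 snapshot 42296/1048576 176'.
    """
    match = re.search(r"\bsnapshot\s+(\d+)/(\d+)", status_line)
    if match is None:
        # No allocated/total pair, e.g. the snapshot is reported as invalid.
        raise ValueError("no snapshot usage in: %r" % status_line)
    allocated, total = int(match.group(1)), int(match.group(2))
    if total == 0:
        raise ValueError("snapshot reports zero total sectors")
    return 100.0 * allocated / total


def overlay_nearly_full(device=OVERLAY_DEVICE, limit=LIMIT_PERCENT):
    """Query dmsetup (needs root) and compare usage with the threshold."""
    result = subprocess.run(
        ["dmsetup", "status", device],
        check=True,
        capture_output=True,
        text=True,
    )
    return overlay_percent_used(result.stdout) >= limit
```

What to do once overlay_nearly_full() returns True (record the event, then
reboot) would be up to whatever calls it, e.g. a timer-driven service.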
At the very least, I think this page could benefit from somewhat stronger,
more explicit wording about this failure case. While it talks a little about
some workarounds, it actually says very little about why they are needed.
Only in the "Overlay Recovery" section does it hint at the crash
potential.
--
John Florian