> From: Frederick Grose <fgrose@gmail.com>
> On Wed, Mar 6, 2013 at 3:59 PM, <John.Florian@dart.biz> wrote:
<snip>
> root@aos-61:46 # # Lets now make it all go wonky:
> root@aos-61:46 # time dd if=/dev/zero of=/foo
> Bus error
>
> real    1m15.775s
> user    0m2.818s
> sys     0m24.129s
> root@aos-61:46 #
> root@aos-61:46 # ls /root
> -bash: /bin/ls: Input/output error
> root@aos-61:46 # df -h
> -bash: /usr/bin/df: Input/output error
> root@aos-61:46 # mount
> -bash: /usr/bin/mount: Input/output error
> root@aos-61:46 # cat /proc/meminfo
> -bash: /usr/bin/cat: Input/output error
>
> Is this expected?  Is there anything I can do, e.g., configuration-
> wise, that can prevent this?  Ideally this would fail much like any
> other full disk situation.  I understand that the overlay consumes
> space, i.e., memory, for this file growth, including file removals,
> but I'd at least like to be able to remotely reboot a system when in
> this state, however I can't even do that because the reboot command
> will either return the same I/O error or it may succeed but get the
> I/O error when systemd tries to read /usr/lib/systemd/system/reboot.target.
>
> I dug around in bugzilla, but found nothing there.  I can file a
> bug, but which package is likely at fault here?
> --
> John Florian
>
> See https://fedoraproject.org/wiki/LiveOS_image for some background
> and potential workarounds.
>
>         --Fred --

There's really not much on that page that helps me here.  I'm trying to use Live images for a mostly stateless embedded appliance OS deployed to hundreds or thousands of devices.  I realize that the COW design is always going to be limited, but a more graceful failure mode is really needed.  For our use, the biggest gain in stability actually comes from systemd's journal, whose trim-before-write approach replaces the legacy write-now, trim-asynchronously approach we used to have.  However, that only covers one specific use case: logging.  Writing to proper persistent storage would let me avoid the root file system overlay, but most of these embedded devices use CF or SD cards for storage, which have limited write cycles that must be respected.

Is there a way to implement an artificial capacity limit that would prevent processes from exhausting the overlay so that the reserve might be used for recording the event and rebooting back to a safer state?
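
A user-space approximation of such a limit could be sketched as follows. This is only an illustration under assumptions, not a known fix: it assumes the overlay is a device-mapper snapshot whose `dmsetup status` line carries an `allocated/total` sector field, that the device is named `live-rw` (an assumption; check `dmsetup ls` on the actual image), and that a 90% threshold leaves enough reserve. The function names are hypothetical. Polling cannot stop a fast writer from outrunning the check, so this is a watchdog rather than a true quota.

```python
#!/usr/bin/env python3
"""Sketch: poll the Live overlay fill level and flag it before exhaustion."""
import subprocess


def overlay_fraction_used(status_line):
    """Return allocated/total from a device-mapper snapshot status line.

    Expects a line such as '0 8388608 snapshot 42296/1048576 176'.
    Returns None when no usage field can be parsed (e.g. 'Invalid').
    """
    fields = status_line.split()
    try:
        idx = fields.index("snapshot")
        allocated, total = fields[idx + 1].split("/")
        return int(allocated) / int(total)
    except (ValueError, IndexError, ZeroDivisionError):
        return None


def overlay_needs_action(status_line, limit=0.90):
    """True when usage is at or above the limit, or the status is unparsable."""
    used = overlay_fraction_used(status_line)
    # An unparsable status is treated as a failure so the caller still reacts.
    return used is None or used >= limit


def read_status(device="live-rw"):
    """Run 'dmsetup status <device>'; the device name is an assumption."""
    result = subprocess.run(
        ["dmsetup", "status", device],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    if overlay_needs_action(read_status()):
        # Placeholder: record the event and trigger a reboot here.
        print("overlay nearly exhausted")
```

Because binaries on the root file system become unreadable once the overlay is exhausted, a check like this would have to be a long-running process already resident in memory, with whatever it needs for logging and rebooting loaded ahead of time.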

At the very least, I think this page could benefit from stronger, more explicit wording about this failure case.  While it covers a few workarounds, it says very little about why they are needed.  Only the "Overlay Recovery" section hints at the crash potential.

--
John Florian