(I'm getting a sense of deja vu, that I'm learning the same lesson someone else recently learned here. Let's see if the 3rd time is the charm...)
Attached is a revision to the persistence implementation that I posted a couple weeks ago. This is mainly for Jeremy, Tim, and anyone else who is interested in working on this, or something similar. I.e. at the very least, it is worth a read to look at the issues I've dealt with, and the several that are in comments as TODO.
It may well be that a simpler persistence implementation, one that just extracts tarballs from USB sticks into the normal ram overlay, would be useful instead of (or even in addition to) this kind of implementation. (Or perhaps some implementation of unionfs will make it into Fedora eventually?)
The main points of note, since the first post are-
- all sorts of bugs fixed
- I moved the overlay storage filesystem to be visible as /mnt/overlayfs always. This solves some aspects of the current problem of not easily being able to see how much writable space you really have available on the rootfs. (the real answer is a combination of the device mapper overlay file AND the filesystem it resides on).
- I've included modified /etc/rc.d/init.d/halt and functions, to handle getting things cleanly shutdown (which is VERY important)
- ntfs is somewhat present, but not really working. I have tested with vfat and ext3. Note that ext3 is a PITA when not cleanly unmounted- see TODOs.
- rudimentary testing of the choice selection when multiple possible overlay images are detected suggests it works.
- the patch format merely reflects my learning process with git; I am not suggesting that code this immature is anywhere near ready for merging. (E.g. it includes halt and functions together with the originals I based them on. Refer to the list archives for documentation on how to use addidir/addsdir if needed.)
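To make the free-space point concrete, here is a minimal sketch of the sparse-file trick the ram overlay relies on, assuming GNU dd and du. The helper names (make_sparse_overlay, apparent_bytes, allocated_kb) and the temporary path are made up for illustration and are not part of the patch.

```shell
#!/bin/sh
# Illustration only: helper names and paths are hypothetical.

# Create a sparse file of SIZE_KB kilobytes: dd copies nothing from
# /dev/null, but truncates the output file at the seek offset.
make_sparse_overlay() {   # FILE SIZE_KB
    dd if=/dev/null of="$1" bs=1024 count=1 seek="$2" 2>/dev/null
}

# Apparent size in bytes (what ls -l reports).
apparent_bytes() {
    echo $(( $(wc -c < "$1") ))
}

# Space actually allocated so far, in KB.
allocated_kb() {
    du -k "$1" | cut -f1
}

demo_dir=$(mktemp -d)
make_sparse_overlay "$demo_dir/overlay" $((512 * 1024))
echo "apparent:  $(apparent_bytes "$demo_dir/overlay") bytes"
echo "allocated: $(allocated_kb "$demo_dir/overlay") KB"
# Free space on the filesystem holding the overlay is the other half of
# the answer: the overlay file can never grow past it.
df -k "$demo_dir" | tail -n 1
```

The apparent size is the ceiling the snapshot can use, the allocated size is what has been written so far, and the df line is what the backing filesystem can still give.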
As always, comments/criticisms/suggestions are more than welcome.
peace...
-dmc
On Mon, 2007-08-20 at 05:23 -0500, Douglas McClendon wrote:
(I'm getting a sense of deja vu, that I'm learning the same lesson someone else recently learned here. Lets see if the 3rd time is the charm...)
It looks like you're getting hit by what Colin was where the list is eating some attachments :-/ FWIW, the "best" way of sending patches directly from git is git-format-patch followed by git-send-email; I need to sit down and write up some simple docs for using git + livecd-creator and best practices. Where's that 36 hour day? ;-)
I'll try to get access to the list admin page later and tweak things to avoid the problem in future; I suspect one of the default mailman settings blocks attachments that look like mail, to avoid some of the WINMAIL.DAT crap.
Attached is a revision to the persistence implementation that I posted a couple weeks ago. This is mainly for Jeremy, Tim, and anyone else who is interested in working on this, or something similar. I.e. at the very least, it is worth a read to look at the issues I've dealt with, and the several that are in comments as TODO.
Since the patch isn't attached, I'll guess. This is still doing a file which is loopback mounted and then added to the dm device?
It may well be that a simpler persistence implementation that involves just extracting tarballs from usbsticks into the normal ram overlay, may be useful instead of (or even in addition to) this kind of implementation. (or perhaps some implementation of unionfs will make it into fedora eventually?)
There's work on doing unionfs via fuse which could be interesting for that in the medium term. But I'm not sure how useful tarballs/unionfs are when we think about the user experience. If it's going to persist, we want changes to "persist" as soon as they're done, not after some set of stuff is done to apply them.
The main points of note, since the first post are-
- all sorts of bugs fixed
- I moved the overlay storage filesystem to be visible as /mnt/overlayfs always. This solves some aspects of the current problem of not easily being able to see how much writable space you really have available on the rootfs. (the real answer is a combination of the device mapper overlay file AND the filesystem it resides on).
This sounds good.
- I've included modified /etc/rc.d/init.d/halt and functions, to handle getting things cleanly shutdown (which is VERY important)
And I can see about getting these integrated with the initscripts bits. Shouldn't be too bad to do.
- ntfs is somewhat present, but not really working. I have tested with vfat and ext3. Note that ext3 is a PITA when not cleanly unmounted- see TODOs.
Let's make things simpler for the moment and just ignore ntfs. If we get things happy with ext3 and vfat, then we can start to think about ntfs.
- rudimentary testing of the choice selection when multiple possible overlay images are detected suggests it works.
Cool.
Jeremy
On Mon, 2007-08-20 at 11:10 -0400, Jeremy Katz wrote:
On Mon, 2007-08-20 at 05:23 -0500, Douglas McClendon wrote:
(I'm getting a sense of deja vu, that I'm learning the same lesson someone else recently learned here. Lets see if the 3rd time is the charm...)
It looks like you're getting hit by what Colin was where the list is eating some attachments :-/
This should hopefully be fixed now. We'll find out the next time someone tries :-)
Jeremy
Jeremy Katz wrote:
On Mon, 2007-08-20 at 05:23 -0500, Douglas McClendon wrote:
(I'm getting a sense of deja vu, that I'm learning the same lesson someone else recently learned here. Lets see if the 3rd time is the charm...)
It looks like you're getting hit by what Colin was where the list is eating some attachments :-/ FWIW, the "best" way of sending patches directly from git is git-format-patch followed by git-send-email; I need to sit down and write up some simple docs for using git + livecd-creator and best practices. Where's that 36 hour day? ;-)
I'll try to get access to the list admin page later and try to tweak stuff to avoid the problem later as I suspect it's that one of the default mailman settings blocks attachments that look like mail to avoid some of the WINMAIL.DAT crap.
Hmm... The 3rd time did appear to be the charm for me. Perhaps your email client is eating a broader class of attachments than the list itself. To see the patch, see the archive link of the post you replied to-
https://www.redhat.com/archives/fedora-livecd-list/2007-August/msg00168.html
Attached is a revision to the persistence implementation that I posted a couple weeks ago. This is mainly for Jeremy, Tim, and anyone else who is interested in working on this, or something similar. I.e. at the very least, it is worth a read to look at the issues I've dealt with, and the several that are in comments as TODO.
Since the patch isn't attached, I'll guess. This is still doing a file which is loopback mounted and then added to the dm device?
correct.
It may well be that a simpler persistence implementation that involves just extracting tarballs from usbsticks into the normal ram overlay, may be useful instead of (or even in addition to) this kind of implementation. (or perhaps some implementation of unionfs will make it into fedora eventually?)
There's work on doing unionfs via fuse which could be interesting for that in the medium term. But I'm not sure how useful tarballs/unionfs are when we think about the user experience. If it's going to persist, we want changes to "persist" as soon as they're done, not after some set of stuff is done to apply them.
Well, the way ubuntu is trying to do it of course, is with unionfs (since of course they use unionfs rather than dm-snapshot to get cow in the first place).
And as such, unionfs can provide just as persistful an implementation as the direction I've been going. In both cases you can think of the persistence as another embedded layer in the total root filesystem.
The main points of note, since the first post are-
- all sorts of bugs fixed
- I moved the overlay storage filesystem to be visible as /mnt/overlayfs always. This solves some aspects of the current problem of not easily being able to see how much writable space you really have available on the rootfs. (the real answer is a combination of the device mapper overlay file AND the filesystem it resides on).
This sounds good.
- I've included modified /etc/rc.d/init.d/halt and functions, to handle getting things cleanly shutdown (which is VERY important)
And can see about getting these integrated with the initscripts bits. Shouldn't be too bad to do
- ntfs is somewhat present, but not really working. I have tested with vfat and ext3. Note that ext3 is a PITA when not cleanly unmounted- see TODOs.
Let's make things simpler for the moment and just ignore ntfs. If we get things happy with ext3 and vfat, then we can start to think about ntfs.
I was ignoring ntfs, though not enough to remove the stuff that could support it.
-dmc
- rudimentary testing of the choice selection when multiple possible overlay images are detected suggests it works.
Cool.
Jeremy
-- Fedora-livecd-list mailing list Fedora-livecd-list@redhat.com https://www.redhat.com/mailman/listinfo/fedora-livecd-list
On Mon, 2007-08-20 at 14:10 -0500, Douglas McClendon wrote:
Jeremy Katz wrote:
On Mon, 2007-08-20 at 05:23 -0500, Douglas McClendon wrote:
(I'm getting a sense of deja vu, that I'm learning the same lesson someone else recently learned here. Lets see if the 3rd time is the charm...)
It looks like you're getting hit by what Colin was where the list is eating some attachments :-/ FWIW, the "best" way of sending patches directly from git is git-format-patch followed by git-send-email; I need to sit down and write up some simple docs for using git + livecd-creator and best practices. Where's that 36 hour day? ;-)
I'll try to get access to the list admin page later and try to tweak stuff to avoid the problem later as I suspect it's that one of the default mailman settings blocks attachments that look like mail to avoid some of the WINMAIL.DAT crap.
Hmm... The 3rd time did appear to be the charm for me. Perhaps your email client is eating a broader class of attachments than the list itself. To see the patch, see the archive link of the post you replied to-
https://www.redhat.com/archives/fedora-livecd-list/2007-August/msg00168.html
Weird. In any case, I'm also now an admin for the list, and like I said, I've _hopefully_ adjusted things so they should work anyway. We'll see I guess. And review coming up from looking at the patch next...
It may well be that a simpler persistence implementation that involves just extracting tarballs from usbsticks into the normal ram overlay, may be useful instead of (or even in addition to) this kind of implementation. (or perhaps some implementation of unionfs will make it into fedora eventually?)
There's work on doing unionfs via fuse which could be interesting for that in the medium term. But I'm not sure how useful tarballs/unionfs are when we think about the user experience. If it's going to persist, we want changes to "persist" as soon as they're done, not after some set of stuff is done to apply them.
Well, the way ubuntu is trying to do it of course, is with unionfs (since of course they use unionfs rather than dm-snapshot to get cow in the first place).
And as such, unionfs can provide just as persistful an implementation as the direction I've been going. In both cases you can think of the persistence as another embedded layer in the total root filesystem.
It's been quite a while since I looked at unionfs, but I vaguely remember that it was more subtree overlays. I guess you could perhaps do a subtree of /. But even so, I don't know that supporting multiple ways of achieving the same goal is really where we want to go. But it's somewhat academic at the moment, so probably not much discussion needed.
Let's make things simpler for the moment and just ignore ntfs. If we get things happy with ext3 and vfat, then we can start to think about ntfs.
I was ignoring ntfs, though not enough to remove the stuff that could support it.
I say let's just pull it. If it doesn't work right now, we might as well save the effort of testing it and/or someone trying it, finding it doesn't work, and filing a bug. Plus, it then makes the changes clearer for looking at.
Jeremy
Jeremy Katz wrote:
On Mon, 2007-08-20 at 14:10 -0500, Douglas McClendon wrote:
Well, the way ubuntu is trying to do it of course, is with unionfs (since of course they use unionfs rather than dm-snapshot to get cow in the first place).
And as such, unionfs can provide just as persistful an implementation as the direction I've been going. In both cases you can think of the persistence as another embedded layer in the total root filesystem.
It's been quite a while since I looked at unionfs, but I vaguely remember that it was more subtree overlays. I guess you could perhaps do a subtree of /. But even so, I don't know that supporting multiple ways of achieving the same goal is really where we want to go. But it's somewhat academic at the moment, so probably not much discussion needed.
I'm not sure if there is some meaning of subtree that is different than subdir. But the way most livecds work is by having a big squashfs with your root filesystem (all of it, not separated into subdirs or anything), then having a tmpfs, then using unionfs to make the tmpfs act as a layer over the squashfs, and then doing pivot_root to that single unionfs filesystem.
Kadischi used the method that was predominant before the unionfs method, which was to have many subdirs (/usr, /opt) be read-only, and then have some subdirs (/tmp, /var, ...) be read-write, perhaps using bind mounts or symlinks to handle some specific sub-subdirs.
Back to unionfs- The major disadvantage of unionfs is that it is not 'perfect' as a real rootfs (why AFAIK fedora/rh refused to merge it). I.e. there are some known bugs, which knoppix and ubuntu just take as an acceptable tradeoff.
The major advantage of unionfs, for the specific persistence topic at hand, is that when you delete a file from the COW rootfs, in unionfs, the memory is actually freed. Whereas for the dm-snapshot implementation of persistence, that is not the case.
This may be acceptable. There may be workarounds for it (using shred to delete files into 0's, and then resparsifying the persistence overlay?)
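The "resparsify" half of that idea can at least be tried on an ordinary file. A minimal sketch, assuming GNU cp with --sparse=always; resparsify is a made-up helper name. It says nothing about whether dm-snapshot itself would ever release the blocks, which is the open question.

```shell
#!/bin/sh
# Illustration only: resparsify() is a hypothetical helper, not part of the patch.

# Rewrite FILE so that runs of zero bytes become holes. GNU cp is assumed.
resparsify() {
    cp --sparse=always "$1" "$1.sparse" && mv "$1.sparse" "$1"
}

work=$(mktemp -d)
# 4 MB of real zeros, standing in for a file that was "deleted into 0's".
dd if=/dev/zero of="$work/zeroed" bs=1024 count=4096 2>/dev/null
before=$(du -k "$work/zeroed" | cut -f1)
resparsify "$work/zeroed"
after=$(du -k "$work/zeroed" | cut -f1)
echo "allocated before: ${before} KB, after: ${after} KB"
```

The apparent size of the file stays the same; only the allocation on the backing filesystem should shrink, and how much depends on that filesystem.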
Anyway... yes, academic.
And unionfs can't get rebootless installation (bwa ha ha....)
-dmc
Douglas McClendon wrote:
Jeremy Katz wrote:
On Mon, 2007-08-20 at 14:10 -0500, Douglas McClendon wrote:
Well, the way ubuntu is trying to do it of course, is with unionfs (since of course they use unionfs rather than dm-snapshot to get cow in the first place).
And as such, unionfs can provide just as persistful an implementation as the direction I've been going. In both cases you can think of the persistence as another embedded layer in the total root filesystem.
It's been quite a while since I looked at unionfs, but I vaguely remember that it was more subtree overlays. I guess you could perhaps do a subtree of /. But even so, I don't know that supporting multiple ways of achieving the same goal is really where we want to go. But it's somewhat academic at the moment, so probably not much discussion needed.
I'm not sure if there is some meaning of subtree that is different than subdir. But the way most livecds work, is by having a big squashfs with your root filesystem (all of it, not seperated into subdirs or anything), and then having a tmpfs, and then using unionfs to make the tmpfs act as a layer over the squashfs, and then doing pivotroot to that single unionfs filesystem.
Kadischi used the method that was predominant before that unionfs method, which was to have many subdirs (/usr, /opt) be read only, and then have some subdirs (/tmp, /var, ...) be read only. Perhaps using bindmounting or symlinks to handle some specific sub-subdirs.
Back to unionfs- The major disadvantage of unionfs is that it is not 'perfect' as a real rootfs (why AFAIK fedora/rh refused to merge it). I.e. there are some known bugs, which knoppix and ubuntu just take as an acceptable tradeoff.
The problem is with symlinks in unionfs. With device-mapper this problem is not there, but there is another issue: the snapshot size. The method I follow is, of course, device-mapper. I also tried fuse and funionfs, but have not tested them rigorously.
ashok shankar das wrote:
Douglas McClendon wrote:
Jeremy Katz wrote:
On Mon, 2007-08-20 at 14:10 -0500, Douglas McClendon wrote:
Well, the way ubuntu is trying to do it of course, is with unionfs (since of course they use unionfs rather than dm-snapshot to get cow in the first place).
And as such, unionfs can provide just as persistful an implementation as the direction I've been going. In both cases you can think of the persistence as another embedded layer in the total root filesystem.
It's been quite a while since I looked at unionfs, but I vaguely remember that it was more subtree overlays. I guess you could perhaps do a subtree of /. But even so, I don't know that supporting multiple ways of achieving the same goal is really where we want to go. But it's somewhat academic at the moment, so probably not much discussion needed.
I'm not sure if there is some meaning of subtree that is different than subdir. But the way most livecds work, is by having a big squashfs with your root filesystem (all of it, not seperated into subdirs or anything), and then having a tmpfs, and then using unionfs to make the tmpfs act as a layer over the squashfs, and then doing pivotroot to that single unionfs filesystem.
Kadischi used the method that was predominant before that unionfs method, which was to have many subdirs (/usr, /opt) be read only, and then have some subdirs (/tmp, /var, ...) be read only. Perhaps using bindmounting or symlinks to handle some specific sub-subdirs.
Back to unionfs- The major disadvantage of unionfs is that it is not 'perfect' as a real rootfs (why AFAIK fedora/rh refused to merge it). I.e. there are some known bugs, which knoppix and ubuntu just take as an acceptable tradeoff.
the problem is in symlinks(unionfs). incase of devmaper this problem is not there. But there is another issue. The snapshot size. The method I follow is ofcourse Devmaper. But i tryed with fuse and funionfs but not tested vigourously.
Yes, I just ran across a page describing how ubuntu actually used to use devicemapper snapshot, but switched to unionfs, because they said it was faster and more flexible (and because, with snapshots, the overlay memory usage does not decrease when cow created files get deleted).
As mentioned, it might be interesting work (for some day far from today) to see if you could get rm to zero out files, and then have them not take up space in the overlay device afterwards. I notice from the chattr and shred man pages that there is already some work on the zero-out part, and perhaps dm-snapshot is already smart enough to detect that the newly changed overlay blocks match the base blocks (0s in both cases) and free the associated memory.
It would be nice, though, if there were a bulletproof unionfs implementation, one as reliable for the rootfs as, say, LVM has proven to be.
-dmc
Douglas McClendon wrote:
ashok shankar das wrote:
Douglas McClendon wrote:
Jeremy Katz wrote:
On Mon, 2007-08-20 at 14:10 -0500, Douglas McClendon wrote:
Well, the way ubuntu is trying to do it of course, is with unionfs (since of course they use unionfs rather than dm-snapshot to get cow in the first place).
And as such, unionfs can provide just as persistful an implementation as the direction I've been going. In both cases you can think of the persistence as another embedded layer in the total root filesystem.
It's been quite a while since I looked at unionfs, but I vaguely remember that it was more subtree overlays. I guess you could perhaps do a subtree of /. But even so, I don't know that supporting multiple ways of achieving the same goal is really where we want to go. But it's somewhat academic at the moment, so probably not much discussion needed.
I'm not sure if there is some meaning of subtree that is different than subdir. But the way most livecds work, is by having a big squashfs with your root filesystem (all of it, not seperated into subdirs or anything), and then having a tmpfs, and then using unionfs to make the tmpfs act as a layer over the squashfs, and then doing pivotroot to that single unionfs filesystem.
Kadischi used the method that was predominant before that unionfs method, which was to have many subdirs (/usr, /opt) be read only, and then have some subdirs (/tmp, /var, ...) be read only. Perhaps using bindmounting or symlinks to handle some specific sub-subdirs.
Back to unionfs- The major disadvantage of unionfs is that it is not 'perfect' as a real rootfs (why AFAIK fedora/rh refused to merge it). I.e. there are some known bugs, which knoppix and ubuntu just take as an acceptable tradeoff.
the problem is in symlinks(unionfs). incase of devmaper this problem is not there. But there is another issue. The snapshot size. The method I follow is ofcourse Devmaper. But i tryed with fuse and funionfs but not tested vigourously.
Yes, I just ran across a page describing how ubuntu actually used to use devicemapper snapshot, but switched to unionfs, because they said it was faster and more flexible (and the overlay memory usage not decreasing when cow created files get deleted).
As mentioned, it might be interesting work (for some day far from today), to see if you could get rm to zero out files, and then have them not take up space in the overlay device afterwords. I notice from man chattr(and shred) there is already some work on the zero-out parts, and perhaps dm-snapshot is already smart enough to detect that the newly changed overlay blocks match the base blocks (0s in both cases), and free the associated memory.
It would be nice however if there was a bulletproof unionfs implementation though... that was as reliable for the rootfs as say, LVM has proven to be.
I agree with your views. The LVM stuff is really nice, but why not experiment with new things? If somebody can make *unionfs/funionfs* rock solid, then the livecd persistence issue would be solved. I could even dream of a diskless persistent handheld device too ;)
Jeremy Katz wrote:
On Mon, 2007-08-20 at 05:23 -0500, Douglas McClendon wrote:
(I'm getting a sense of deja vu, that I'm learning the same lesson someone else recently learned here. Lets see if the 3rd time is the charm...)
It looks like you're getting hit by what Colin was where the list is eating some attachments :-/ FWIW, the "best" way of sending patches directly from git is git-format-patch followed by git-send-email; I need to sit down and write up some simple docs for using git + livecd-creator and best practices. Where's that 36 hour day? ;-)
While I may use git-send-email someday, for the moment I prefer attaching the patch manually to a thunderbird composed email. What I did to get the 3rd one through, was to manually delete the 4 email header style lines from the git-format-patch patch (which I then verified is still a validly apply-able patch syntax).
Anyway, FWIW, here are the notes I made to myself while learning the second minimal level of git functionality. Obviously no need for a livecd-tools rehash of the fine git tutorials and docs already available. Or at least not for anything much more complicated than what is attached.
-dmc
# initial checkout
git clone git://git.fedoraproject.org/git/hosted/livecd

cd livecd

# create new overlay branch
git branch overlay

# list branches (now master and overlay)
git branch

# switch to overlay branch
git checkout overlay

# edit Makefile, first thing

# run git gui, notice nothing reflecting the Makefile edit, as it is
# uncommitted
gitk

# set identity (commit complains if not set)
git config user.email "dmc@filteredperception.org"
git config user.name "Douglas McClendon"

# commit makefile (and all changes)
git commit -a

# add findoverlay script
cd creator
cp /somewhere/findoverlay ./findoverlay
git add findoverlay

# commit findoverlay
git commit -a

# notice gitk coolness
gitk

# generate patchset (1 per commit in mailer format)
git-format-patch master

# notice that sending in that format as an attachment causes problems,
# due to embedded mailer fields, so manually edit and remove those top
# 4 lines, then verified that patch still understands the format, and
# the result does make it to fedora-livecd-list
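The manual header edit in the last note could be scripted. A minimal sketch: strip_mail_header is a made-up helper, and the sample patch text is hand-written for the demonstration. Instead of counting four lines it drops everything up to and including the first blank line, on the assumption that the mail-style header always ends there; that also removes the blank separator line itself.

```shell
#!/bin/sh
# Illustration only: strip_mail_header is a hypothetical helper.

# Drop the mbox-style header (everything up to and including the first
# blank line) from a git-format-patch file read on stdin.
strip_mail_header() {
    sed '1,/^$/d'
}

# A hand-written stand-in for a format-patch file, for demonstration.
sample='From 0123456789abcdef Mon Sep 17 00:00:00 2001
From: A Hacker <hacker@example.com>
Date: Mon, 20 Aug 2007 05:23:00 -0500
Subject: [PATCH] add findoverlay

---
 creator/findoverlay | 1 +
diff --git a/creator/findoverlay b/creator/findoverlay'

printf '%s\n' "$sample" | strip_mail_header
```

The result is no longer something git-am would take as mail, but as noted above, patch still understood the remaining diff.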
On Mon, 2007-08-20 at 17:26 -0500, Douglas McClendon wrote:
Anyway, FWIW, here are the notes I made to myself while learning the second minimal level of git functionality. Obviously no need for a livecd-tools rehash of the fine git tutorials and docs already available. Or at least not for anything much more complicated than what is attached.
One little addition for anyone else who might be in the same boat.
# set identity (commit complains if not set)
git config user.email "dmc@filteredperception.org"
git config user.name "Douglas McClendon"
You can use "git config --global" to set this globally and not have to do so on a per-project basis.
Jeremy
On Mon, 2007-08-20 at 05:23 -0500, Douglas McClendon wrote:
Attached is a revision to the persistence implementation that I posted a couple weeks ago. This is mainly for Jeremy, Tim, and anyone else who is interested in working on this, or something similar. I.e. at the very least, it is worth a read to look at the issues I've dealt with, and the several that are in comments as TODO.
Couple of little things just to make reviewing easier, but not huge problems:
* Might be good to keep the initscripts changes as patches rather than an orig and a modified version. Will make things work even if other things in initscripts change and also makes it easier to know what's going on.
* It's good to get into the habit of doing git commits for each separate change. Then you can get a patch per change. And that would avoid having the addidir/addsdir stuff being in the same changes.
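For anyone following along, the commit-per-change habit can be tried in a throwaway repository. A minimal sketch: the file contents, commit messages, and identity are placeholders, and the repository lives in a temporary directory so nothing real is touched.

```shell
#!/bin/sh
# Illustration only: throwaway repository, file names are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Identity set per-repository so the sketch does not touch global config.
git config user.email "you@example.com"
git config user.name "Your Name"

echo base > Makefile
git add Makefile
git commit -q -m "initial import"
git branch -M master            # make sure the base branch is called master

git checkout -q -b overlay
echo "# overlay tweak" >> Makefile
git commit -q -a -m "Makefile: install findoverlay"

echo "#!/bin/sh" > findoverlay
git add findoverlay
git commit -q -m "add findoverlay script"

# One patch file per commit on overlay that is not on master.
git format-patch -o patches master
ls patches
```

Two commits on the branch give two patch files, each reviewable on its own.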
Now to get to the meat of things
index 0000000..8962720
--- /dev/null
+++ b/creator/etc_rc.d_init.d_functions
I suspect that Bill might have some reservations about the hard-coded overlayfs piece. At the same time, it's all I can think of and it's not that out of line from other things in halt/rc.sysinit.
For the overlay info bit, we could potentially just stuff it in /sbin/halt.local for now I think.
diff --git a/creator/findoverlay b/creator/findoverlay
new file mode 100755
index 0000000..e0674cc
--- /dev/null
+++ b/creator/findoverlay
This looks pretty good to me...
+# load filesystem modules that may be required for overlay
+# TODO: only load these conditionally if vol_id detects a fs that needs them
Do they not get loaded automatically on the filesystem mount? That at least used to work.
+# IMPORTANT TODO: while mount scanning find a way to determine if the
+# filesystem was not cleanly unmounted. If so, IGNORE IT,
+# as it may be part of a hibernated OS !!!!!!!
Maybe instead of using cleanly unmounted vs not as the key, we should look at swaps to see if they have the SWSUSP signature? That's a pretty straight-forward thing to check, but I can't quite convince myself if it's as safe or not.
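A rough sketch of what such a check could look like. It assumes a 4096-byte page size, so the 10-byte signature sits at offset 4086, and that the in-kernel swsusp marker is the string S1SUSPEND; other suspend implementations write other strings, so this is not a complete test. The function names are made up, and the demo uses fake files rather than real swap devices.

```shell
#!/bin/sh
# Illustration only: function names are hypothetical; a 4096-byte page
# size is assumed, so the 10-byte signature sits at offset 4086.

# Print the swap signature of DEVICE_OR_FILE with any NUL padding removed.
swap_signature() {
    dd if="$1" bs=1 skip=4086 count=10 2>/dev/null | tr -d '\000'
}

# Succeed if the signature says a swsusp hibernation image is stored here.
has_hibernate_image() {
    case "$(swap_signature "$1")" in
        S1SUSPEND) return 0 ;;
        *) return 1 ;;
    esac
}

# Build fake "swap" files for the demo: zeros, then a signature.
make_fake_swap() {   # FILE SIGNATURE
    { dd if=/dev/zero bs=1 count=4086 2>/dev/null; printf '%s' "$2"; } > "$1"
}

demo=$(mktemp -d)
make_fake_swap "$demo/normal" SWAPSPACE2
make_fake_swap "$demo/hibernated" S1SUSPEND
for f in normal hibernated; do
    if has_hibernate_image "$demo/$f"; then
        echo "$f: hibernation image present, leave the disks alone"
    else
        echo "$f: no hibernation image"
    fi
done
```

Whether this is as safe as the unclean-unmount test is exactly the open question; it only answers "is there a suspend image in this swap area".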
+# CAVEAT: If the overlay file has a kin file with the suffix .inuse, this
+# is evidence that the overlay device was not unmounted cleanly.
+# In _this_ case, look at the filesystem(???) and determine whether
+# or not the most recent mount of the filesystem is more recent than
+# the inuse file. *If and only if* NOT, then it is safe to assume
+# that the filesystem is not part of a hibernated OS, and rather was
+# most recently used as a persistence device that failed to be
+# shutdown cleanly, thus it is safe to fsck the overlayfs, and then
+# fsck the overlay-rootfs
If we checked for the swsusp case instead, would we be able to skip this?
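For reference, the marker-file idea from the CAVEAT can be sketched on its own, leaving out the mount-time comparison. The function names are made up, the device and filesystem values are examples, and removing the marker at clean shutdown is my assumption about what the halt changes do.

```shell
#!/bin/sh
# Illustration only: function names are hypothetical, and removing the
# marker at shutdown is an assumption about what the halt changes do.

# Record that OVERLAY is in use, in the key=value style the patch writes.
mark_inuse() {   # OVERLAY DEV FSTYPE
    {
        echo "overlayfs_dev=$2"
        echo "overlayfs_fstype=$3"
    } > "$1.inuse"
}

# Called on a clean shutdown, after the overlay has been detached.
clear_inuse() {
    rm -f "$1.inuse"
}

# Succeed if a leftover marker shows the overlay was not shut down cleanly.
was_unclean() {
    [ -e "$1.inuse" ]
}

# Read one key back out of the marker file.
inuse_value() {  # OVERLAY KEY
    sed -n "s/^$2=//p" "$1.inuse"
}

dir=$(mktemp -d)
: > "$dir/overlay"
mark_inuse "$dir/overlay" /dev/sdb1 vfat
was_unclean "$dir/overlay" && echo "marker present, fstype $(inuse_value "$dir/overlay" overlayfs_fstype)"
clear_inuse "$dir/overlay"
was_unclean "$dir/overlay" || echo "marker gone: clean shutdown"
```

If the swsusp check turns out to be enough, only was_unclean would still be needed, to decide whether an fsck is due.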
+# IMPORTANT TODO: since ext3 is such a pain (possible?) to mount readonly,
+# and since similar issues may exist in other fs (ntfs???),
+# I think it would be good to have a function called
+# really_mount_readonly() which does a blockdev --setro, then
+# does a devicemapper snapshot to ram, then does a mount of
+# the snapshotted device, then checks for existence of
+# overlay and .inuse files.
If the blockdev is read-only, do we really need to snapshot it too?
+# RELATED: Given the above function, if a persistence file is detected,
+# but the above inuse/recent-mount-stamp test fails, give
+# the user a 30-60 second timeout option to force an fsck and mount
+# of the uncleanly mounted overlayfs, defaulting to not using it.
Probably fair. fsck in the initramfs might have fun around controlling terminals and sometimes wanting to drop to a shell, so needs some testing to make sure it's sane
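The prompt itself is the easy part. A minimal sketch, assuming the initramfs shell is bash (read -t is a bash feature, not POSIX sh); confirm_with_timeout is a made-up name, and the demo feeds answers on stdin rather than waiting on a terminal.

```shell
#!/bin/bash
# Illustration only: confirm_with_timeout is a hypothetical helper.
# read -t is a bash feature, so this assumes the initramfs shell is bash.

# Ask PROMPT, wait up to SECONDS for an answer on stdin, and succeed
# only on an explicit yes. Timeout, EOF, or anything else means "no".
confirm_with_timeout() {   # PROMPT SECONDS
    local answer=""
    printf '%s [y/N] ' "$1" >&2
    read -r -t "$2" answer || answer=""
    case "$answer" in
        y|Y|yes|YES) return 0 ;;
        *) return 1 ;;
    esac
}

question="Overlay was not shut down cleanly. fsck and use it anyway?"
printf 'y\n' | confirm_with_timeout "$question" 30 \
    && echo "would fsck and use the overlay"
confirm_with_timeout "$question" 30 < /dev/null \
    || echo "no answer: defaulting to not using the overlay"
```

The controlling-terminal worries apply to fsck itself, not to this prompt, so that part still needs the real testing.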
+# TODO: All this multiple candidate code hasn't been tested recently (can't
+# remember if it ever really did work). Though I have tested the
+# typical auto case where one overlay is found and used.
Probably the most important one :)
+# TODO: verify that filesystems other than ext3 work. I know this will
+# probably mean some interesting special case code.
+
+# TODO: handle nfs/network(fuse-httpfs?) persistence devices. This will
+# require the ability to set up the network here, which is probably
+# not trivial.
This is one of the reasons I want to get rid of mayflower and build up mkinitrd; mkinitrd already has all kinds of network setup code for nfs/iscsi root and then we could take advantage of that. And fwiw, I spent a little bit of time getting a branch of mkinitrd started being able to do so, but then ran into a need for modprobe to do something more. Will get back there eventually. On the plus side, trying to make sure that we can do that switch without it mattering much for things like the overlay finding code (just have to do the little plug-in similar to the mayflower change)
+# TODO: handle fsck'ing the rootfs if need be correctly? Or does the right
+# thing just happen. I know that trying to use a persistence file
+# from something that got unmounted uncleanly, seems to cause problems
+# VERY quickly. This may be a fatal flaw... (or at least require
+# some work)
fsck of the combined fs should happen fine once we get into the normal userspace. And the ro rootfs shouldn't need fsck'ing. So I *think* we should be fine. Only needing to then worry about the case of a persistence file from an uncleanly mounted filesystem. Which maybe can be punted by saying you use ext3 (with journal, therefore no need for fsck usually) or vfat (unclean unmount is less disastrous)
+ losetup /dev/loop119 /mnt/overlayfs/overlay
+ echo "overlayfs_dev=tmpfs" > /mnt/overlayfs/overlay.inuse
+ echo "overlayfs_fstype=tmpfs" >> /mnt/overlayfs/overlay.inuse
+ echo "overlayfs_path=/overlay" >> /mnt/overlayfs/overlay.inuse
+ echo "/mnt/overlayfs/overlay.inuse" > /overlay.info
Am I missing where this is used or is it just informational?
diff --git a/creator/mayflower b/creator/mayflower
index c1c5258..29cc8ec 100755
--- a/creator/mayflower
+++ b/creator/mayflower
@@ -268,6 +290,21 @@ for o in `cat /proc/cmdline` ; do
 live_locale=*) live_locale=${o#live_locale=} ;;
+ #
+ # dmc overlay: aesthetics, undecided about name persistence vs overlay
I actually kind of like overlay. But yeah, aesthetics :)
+ # dmc overlay: if non-ram overlay searching is desired, do it,
+ # otherwise, create overlay in ram as usual
+ if [ "x${overlay}" != "x" ]; then
+     /sbin/findoverlay "$overlay"
+ else
+     mkdir -p /mnt/overlayfs
+     mount -n -t tmpfs -o mode=0755 none /mnt/overlayfs
+     dd if=/dev/null of=/mnt/overlayfs/overlay bs=1024 count=1 seek=$((512*1024)) 2> /dev/null
+     losetup /dev/loop119 /mnt/overlayfs/overlay
+     echo "overlayfs_dev=tmpfs" > /mnt/overlayfs/overlay.inuse
+     echo "overlayfs_fstype=tmpfs" >> /mnt/overlayfs/overlay.inuse
+     echo "/mnt/overlayfs/overlay.inuse" > /overlay.info
+ fi
This looks good; though as we had previously discussed, once this is working, we probably want auto to be the default and to be able to have overlay=off or overlay=ram or something to go back to the current mode.
So yeah, overall, this is looking pretty spiffily good to me and I'm leaning towards starting to get it merged in so that we can start getting real use of it
Jeremy
Jeremy Katz wrote:
On Mon, 2007-08-20 at 05:23 -0500, Douglas McClendon wrote:
Attached is a revision to the persistence implementation that I posted a couple weeks ago. This is mainly for Jeremy, Tim, and anyone else who is interested in working on this, or something similar. I.e. at the very least, it is worth a read to look at the issues I've dealt with, and the several that are in comments as TODO.
Couple of little things just to make reviewing easier, but not huge problems
- Might be good to keep the initscripts changes as patches rather than an orig and a modified version. Will make things work even if other things in initscripts change and also makes it easier to know what's going on.
Agreed, this was still just a second pass....
- It's good to get into the habit of doing git commits for each separate change. Then you can get a patch per change. And that would avoid having the addidir/addsdir stuff being in the same changes
Actually, I was sort of making a combined point, that I was using addsdir as my method of including the modified initscripts.
Long term, some sort of elegant, flexible, permanent change to the initscripts is needed. Medium term, I was planning on having just the patches, and actually copying the patches into the initrd and having the livecd boot sequence patch the init scripts.
Hopefully this illustrates why as a developer, having addsdir (or better named, --add-dir-to-system) functionality is very nice. If you can show me a better workflow...
Now to get to the meat of things
index 0000000..8962720
--- /dev/null
+++ b/creator/etc_rc.d_init.d_functions
I suspect that Bill might have some reservations about the hard-coded overlayfs piece. At the same time, it's all I can think of and it's not that out of line from other things in halt/rc.sysinit.
I agree with the reservations. I'm open to suggestions. For the moment, there is a lot of much uglier stuff to deal with first.
For the overlay info bit, we could potentially just stuff it in /sbin/halt.local for now I think.
I saw halt.local. I don't think you noticed how brutally ugly what I was doing was.
The goal of that code after halt.local is to get the overlayfs cleanly unmounted.
The way I currently accomplished that, was to YANK the snapshot overlay out of the root device. The only thing that makes this even remotely palatable, is the fact that the root device has been remounted read only. Which is the one thing that has happened between this code and the halt.local. (thus making halt.local not a workable place for this code)
Thinking about it, the way to make it less horrendously ugly, would be to copy the binaries used from the rootfs (dmsetup, losetup, rm, mount) to a tmpfs first, since after the yanking, there is really no guarantee that any data read from the rootfs can be relied on.
Or at least those are my thoughts on the issue right now.
diff --git a/creator/findoverlay b/creator/findoverlay
new file mode 100755
index 0000000..e0674cc
--- /dev/null
+++ b/creator/findoverlay
This looks pretty good to me...
+# load filesystem modules that may be required for overlay
+# TODO: only load these conditionally if vol_id detects a fs that needs them
Do they not get loaded automatically on the filesystem mount? That at least used to work.
Probably. These were liberal notes. Though maybe the fuse ntfs doesn't work as nicely. Not a big deal.
+# IMPORTANT TODO: while mount scanning find a way to determine if the
+# filesystem was not cleanly unmounted. If so, IGNORE IT,
+# as it may be part of a hibernated OS !!!!!!!
Maybe instead of using cleanly unmounted vs not as the key, we should look at swaps to see if they have the SWSUSP signature? That's a pretty straight-forward thing to check, but I can't quite convince myself if it's as safe or not.
My worry about this- is things like *3* current hibernate implementations for linux. That means that you have many possible signatures to check, and there is no way to predict signature changes in future versions of hibernation.
Another possible clincher is things like suspend2's (sorry, 'tux-on-ice' now) support for hibernation to files in the rootfs. I.e. I used to, and intend in the future to, set up my system with no swap partition at all, doing swapfiles, and suspend2-suspend-to-file. (though I admit I'm currently getting some mileage out of F7's much improved suspend out of the box)
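For reference, the swap-signature check being discussed could be sketched roughly as below. This is only an illustration, not code from the patch: the function name swap_has_hibernate_sig is made up, the 4096-byte page size is an assumption, and the list of signature strings is an assumption that is very likely incomplete (which is exactly the worry raised above). It also does nothing for hibernation to a file inside a filesystem.

```shell
# Hypothetical helper: succeed if the swap area at $1 carries a known
# hibernation signature instead of the normal swap signature.
# ASSUMPTION: the page size is 4096 bytes unless PAGE_SIZE is set.
# ASSUMPTION: the signature list below is illustrative, not exhaustive.
swap_has_hibernate_sig() {
    local dev=$1
    local page_size=${PAGE_SIZE:-4096}
    local sig
    # The swap header signature lives in the last 10 bytes of the first page.
    # Strip NUL padding so the shell can hold the value.
    sig=$(dd if="$dev" bs=1 skip=$((page_size - 10)) count=10 2>/dev/null | tr -d '\0')
    case "$sig" in
        S1SUSPEND|S2SUSPEND|ULSUSPEND|LINHIB0001) return 0 ;;
        *) return 1 ;;
    esac
}
```

Something like "swap_has_hibernate_sig /dev/sda2 && echo skip" would then let the scan leave that machine's disks alone.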
+# CAVEAT: If the overlay file has a kin file with the suffix .inuse, this
+# is evidence that that the overlay device was not unmounted cleanly.
+# In _this_ case, look at the filesystem(???) and determine whether
+# or not the most recent mount of the filesystem is more recent than
+# the inuse file. *If and only if* NOT, then it is safe to assume
+# that the filesystem is not part of a hibernated OS, and rather was
+# most recently used as a persistence device that failed to be
+# shutdown cleanly, thus it is safe to fsck the overlayfs, and then
+# fsck the overlay-rootfs
If we checked for the swsusp case instead, would we be able to skip this?
see above...
Also, I just noticed that dumpe2fs does get me cleanly vs uncleanly mounted detection for ext2/3. And vfat I almost don't care about. I would like the same for ntfs, but as I'll mention again, I agree, ntfs support can be saved for the long term.
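A rough sketch of how the dumpe2fs check might look, reading the header output on stdin so it can be tried without a real device. The function name e2fs_looks_clean is made up for illustration. It assumes the usual "Filesystem state:" and "Filesystem features:" lines of dumpe2fs -h, and treats a pending journal replay (the needs_recovery feature flag) as unclean too, which is my assumption about what we would want here.

```shell
# Hypothetical helper: read `dumpe2fs -h` output on stdin and succeed only
# if the filesystem state is "clean" and no journal recovery is pending.
e2fs_looks_clean() {
    local line state="" features=""
    while IFS= read -r line; do
        case "$line" in
            "Filesystem state:"*)    state=${line#*:} ;;
            "Filesystem features:"*) features=${line#*:} ;;
        esac
    done
    # Collapse the padding dumpe2fs puts after the colon.
    state=$(echo $state)
    [ "$state" = "clean" ] || return 1
    # An ext3 journal that still needs replay means an unclean unmount
    # (or a filesystem that is in use by a hibernated OS).
    case " $features " in
        *" needs_recovery "*) return 1 ;;
    esac
    return 0
}
```

Usage would be along the lines of "dumpe2fs -h /dev/sdb1 2>/dev/null | e2fs_looks_clean || echo 'skipping, not clean'".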
+# IMPORTANT TODO: since ext3 is such a pain (possible?) to mount readonly,
+# and since similar issues may exist in other fs (ntfs???),
+# I think it would be good to have a function called
+# really_mount_readonly() which does a blockdev --setro, then
+# does a devicemapper snapshot to ram, then does a mount of
+# the snapshotted device, then checks for existence of
+# overlay and .inuse files.
If the blockdev is read-only, do we really need to snapshot it too?
The point is that when the blockdev is read-only, you just can't mount it. (I think. I'm pretty sure I even tried mounting ro as ext2 and that failed. But that seems so wrong, I wouldn't bet on it without trying first.)
I'll do more experimentation and things will become clearer.
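To make the really_mount_readonly() idea concrete, here is an untested outline. The helper snapshot_table, the mapping name ro_probe, and the chunk size of 8 sectors are all placeholders of mine, and the COW device is assumed to be a loop device backed by a file on tmpfs that the caller has already set up. The idea is that any writes the mount wants to do (e.g. journal replay) land in the RAM-backed COW device rather than on the real partition.

```shell
# Print a device-mapper table line for a non-persistent ("N") snapshot of
# $1 (origin) using $2 (COW device), $3 sectors long, chunk size $4 sectors.
snapshot_table() {
    local origin=$1 cow=$2 sectors=$3 chunk=${4:-8}
    echo "0 $sectors snapshot $origin $cow N $chunk"
}

# Untested outline; needs root and an already prepared RAM-backed COW device.
really_mount_readonly() {
    local dev=$1 mnt=$2 cow=$3 name=${4:-ro_probe}
    local sectors
    # Refuse writes at the block layer first.
    blockdev --setro "$dev" || return 1
    # Device size in 512-byte sectors.
    sectors=$(blockdev --getsz "$dev") || return 1
    # dmsetup reads the table from stdin when no table file is given.
    snapshot_table "$dev" "$cow" "$sectors" | dmsetup create "$name" || return 1
    mount -n "/dev/mapper/$name" "$mnt"
}
```

After the mount, the caller would look for the overlay and .inuse files under the mount point, then unmount and "dmsetup remove" the mapping.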
+# RELATED: Given the above function, if a persistence file is detected,
+# but the above above inuse/recent-mount-stamp test fails, give
+# the user a 30-60 second timeout option to force an fsck and mount
+# of the uncleanly mounted overlayfs, defaulting to not using it.
Probably fair. fsck in the initramfs might have fun around controlling terminals and sometimes wanting to drop to a shell, so needs some testing to make sure it's sane
+# TODO: All this multiple candidate code hasn't been tested recently (can't
+# remember if it ever really did work). Though I have tested the
+# typical auto case where one overlay is found and used.
Probably the most important one :)
Actually this was a relic. As I mentioned in the mail, I actually had tested this. And in fact, I learned, or relearned, a bit more about bash arrays, and the code doing this will look much cleaner soon.
+# TODO: verify that filesystems other than ext3 work. I know this will
+# probably mean some interesting special case code.
+
+# TODO: handle nfs/network(fuse-httpfs?) persistence devices. This will
+# require the ability to set up the network here, which is probably
+# not trivial.
This is one of the reasons I want to get rid of mayflower and build up mkinitrd; mkinitrd already has all kinds of network setup code for nfs/iscsi root and then we could take advantage of that. And fwiw, I spent a little bit of time getting a branch of mkinitrd started being able to do so, but then ran into a need for modprobe to do something more. Will get back there eventually. On the plus side, trying to make sure that we can do that switch without it mattering much for things like the overlay finding code (just have to do the little plug-in similar to the mayflower change)
Yeah, I had noticed the nfs root stuff, which is part of what made me think of network sorts of possibilities here.
+# TODO: handle fsck'ing the rootfs if need be correctly? Or does the right
+# thing just happen. I know that trying to use a persistence file
+# from something that got unmounted uncleanly, seems to cause problems
+# VERY quickly. This may be a fatal flaw... (or at least require
+# some work)
fsck of the combined fs should happen fine once we get into the normal userspace. And the ro rootfs shouldn't need fsck'ing. So I *think* we should be fine. Only needing to then worry about the case of a persistence file from an uncleanly mounted filesystem. Which maybe can be punted by saying you use ext3 (with journal, therefore no need for fsck usually) or vfat (unclean unmount is less disastrous)
As mentioned by the 'fatal flaw', my apprehension is based on seeing how _very quickly_ things seem to fall over dead when trying to use a persistence file that did not get cleanly shut down. (while trying to access the fsck binary even...?)
More experimentation again, will flush this issue out. Obviously if the whole system/mechanism cannot robustly deal with repeated yank-the-plug situations, then it isn't going to work for real users.
I think I can put together a much more testable-quality patch fairly soon.
I'm not entirely sure about merge worthy within a week... But we'll see. And I guess I can see something safe enough to merge within a week, given the safe default code paths (i.e. not default to auto for f8t2)
+ losetup /dev/loop119 /mnt/overlayfs/overlay
+ echo "overlayfs_dev=tmpfs" > /mnt/overlayfs/overlay.inuse
+ echo "overlayfs_fstype=tmpfs" >> /mnt/overlayfs/overlay.inuse
+ echo "overlayfs_path=/overlay" >> /mnt/overlayfs/overlay.inuse
+ echo "/mnt/overlayfs/overlay.inuse" > /overlay.info
Am I missing where this is used or is it just informational?
This isn't really necessary in the traditional tmpfs overlay case that you referenced here. I did it mainly for consistency. Also, as alluded to before, a userspace tool that could online grow the overlay file, would use this. As this /overlay.info file becomes the .inuse file which is visible later. (again, maybe unnecessary. We'll see if I actually find a real use for it)
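As an illustration of how a later userspace tool might consume these files, here is a small sketch. The function name overlay_info_get is made up, and it assumes only what the hunk above shows: /overlay.info holds the path of the .inuse file, and the .inuse file holds key=value lines.

```shell
# Hypothetical helper: print the value of key $2 from the .inuse file whose
# path is stored in the info file $1. Fails if the key is not present.
overlay_info_get() {
    local info=$1 key=$2 inuse line
    # The info file contains a single line: the path to the .inuse file.
    inuse=$(cat "$info") || return 1
    while IFS= read -r line; do
        case "$line" in
            "$key="*) echo "${line#*=}"; return 0 ;;
        esac
    done < "$inuse"
    return 1
}
```

For example, "overlay_info_get /overlay.info overlayfs_fstype" would print tmpfs in the plain ram overlay case.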
diff --git a/creator/mayflower b/creator/mayflower
index c1c5258..29cc8ec 100755
--- a/creator/mayflower
+++ b/creator/mayflower
@@ -268,6 +290,21 @@ for o in `cat /proc/cmdline` ; do
 live_locale=*) live_locale=${o#live_locale=} ;;
+ #
+ # dmc overlay: aesthetics, undecided about name persistence vs overlay
I actually kind of like overlay. But yeah, aesthetics :)
I agree. Persistence is perhaps a better description of the feature for end users. But overlay has the dual benefits of being easier to type, and exposes a fairly appropriate amount of information about how it is implemented.
+ # dmc overlay: if non-ram overlay searching is desired, do it,
+ # otherwise, create overlay in ram as usual
+ if [ "x${overlay}" != "x" ]; then
+     /sbin/findoverlay "\$overlay"
+ else
+     mkdir -p /mnt/overlayfs
+     mount -n -t tmpfs -o mode=0755 none /mnt/overlayfs
+     dd if=/dev/null of=/mnt/overlayfs/overlay bs=1024 count=1 seek=$((512*1024)) 2> /dev/null
+     losetup /dev/loop119 /mnt/overlayfs/overlay
+     echo "overlayfs_dev=tmpfs" > /mnt/overlayfs/overlay.inuse
+     echo "overlayfs_fstype=tmpfs" >> /mnt/overlayfs/overlay.inuse
+     echo "/mnt/overlayfs/overlay.inuse" > /overlay.info
+ fi
This looks good; though as we had previously discussed, once this is working, we probably want auto to be the default and to be able to have overlay=off or overlay=ram or something to go back to the current mode.
Agreed. But this may be a safe avenue if you really want to put code this immature in f8t2.
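The default question could be isolated in one small function, sketched below. The name overlay_mode and the second "default" argument are my own placeholders; the only values taken from the discussion are auto, off and ram. With the default left at ram the behaviour matches the safe path for f8t2, and flipping it to auto later is a one-word change.

```shell
# Hypothetical helper: print the overlay mode selected by the kernel command
# line in $1. $2 is the mode to use when no overlay= option is given
# (defaults to "ram", i.e. the current behaviour).
overlay_mode() {
    local mode=${2:-ram} o
    for o in $1; do
        case "$o" in
            overlay=*) mode=${o#overlay=} ;;
        esac
    done
    case "$mode" in
        # "off" and an empty value both fall back to the plain ram overlay.
        off|ram|"") echo ram ;;
        # Anything else (auto, or a specific device) would go to findoverlay.
        *) echo "$mode" ;;
    esac
}
```

In the init script this would be used as something like: mode=$(overlay_mode "$(cat /proc/cmdline)"), followed by calling /sbin/findoverlay only when the result is not ram.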
So yeah, overall, this is looking pretty spiffily good to me and I'm leaning towards starting to get it merged in so that we can start getting real use of it
We'll see where I'm at in another 24-48 hours, cleaning up the most obviously ugly things and perhaps making a more testable patch.
-dmc
On Tue, 2007-08-21 at 16:29 -0500, Douglas McClendon wrote:
Jeremy Katz wrote:
On Mon, 2007-08-20 at 05:23 -0500, Douglas McClendon wrote:
For the overlay info bit, we could potentially just stuff it in /sbin/halt.local for now I think.
I saw halt.local. I don't think you noticed how brutally ugly what I was doing was.
The goal of that code after halt.local is to get the overlayfs cleanly unmounted.
The way I currently accomplished that, was to YANK the snapshot overlay out of the root device. The only thing that makes this even remotely palatable, is the fact that the root device has been remounted read only. Which is the one thing that has happened between this code and the halt.local. (thus making halt.local not a workable place for this code)
Ah yeah, oh well. Was at least a thought. Although, continuing with just stupid thinking out loud, why yank it? We've mounted the root device ro, so we should also be able to make the overlay device read only at that point. Which would then lead to it being clean on the next boot and should be reliable. Unless I'm missing something, which I probably am since it's still early
Thinking about it, the way to make it less horrendously ugly, would be to copy the binaries used from the rootfs (dmsetup, losetup, rm, mount) to a tmpfs first, since after the yanking, there is really no guarantee that any data read from the rootfs can be relied on.
Alternately (and I've had this discussion before with someone, although I forget whom), we really want to be able to get back to running from the initramfs on shutdown. eg, that's the only way we'll ever be able to eject the CD for reboot. And at that point, we do have the binaries we care about and can rely on them and maybe could have this be cleaner.
+# IMPORTANT TODO: while mount scanning find a way to determine if the
+# filesystem was not cleanly unmounted. If so, IGNORE IT,
+# as it may be part of a hibernated OS !!!!!!!
Maybe instead of using cleanly unmounted vs not as the key, we should look at swaps to see if they have the SWSUSP signature? That's a pretty straight-forward thing to check, but I can't quite convince myself if it's as safe or not.
My worry about this- is things like *3* current hibernate implementations for linux. That means that you have many possible signatures to check, and there is no way to predict signature changes in future versions of hibernation.
There's no way to predict the future, period. Generally, as the world around the live image being created changes, things like initrds, etc have to evolve too. But, I'm not at all tied one way or another. I wonder if we could actually just take advantage of fsck to tell us if it's clean or dirty -- I guess only if we did a "don't change anything run" and not anything more programmatic.
[snip]
So yeah, overall, this is looking pretty spiffily good to me and I'm leaning towards starting to get it merged in so that we can start getting real use of it
We'll see where I'm at in another 24-48 hours, cleaning up the most obviously ugly things and perhaps making a more testable patch.
If we want to get to where it's available by default in F8, it'll definitely be good to have something for test2, even if some of the "pull the plug" corner cases aren't happy. The only reason I feel okay with making the auto case the default is that while it's automatic to use if set up, you still have to set up the overlay file. So unless you've already done the setup work to opt-in, you're not going to get hit that hard.
Jeremy
Jeremy Katz wrote:
On Tue, 2007-08-21 at 16:29 -0500, Douglas McClendon wrote:
Jeremy Katz wrote:
On Mon, 2007-08-20 at 05:23 -0500, Douglas McClendon wrote:
For the overlay info bit, we could potentially just stuff it in /sbin/halt.local for now I think.
I saw halt.local. I don't think you noticed how brutally ugly what I was doing was.
The goal of that code after halt.local is to get the overlayfs cleanly unmounted.
The way I currently accomplished that, was to YANK the snapshot overlay out of the root device. The only thing that makes this even remotely palatable, is the fact that the root device has been remounted read only. Which is the one thing that has happened between this code and the halt.local. (thus making halt.local not a workable place for this code)
Ah yeah, oh well. Was at least a thought. Although, continuing with just stupid thinking out loud, why yank it? We've mounted the root device ro, so we should also be able to make the overlay device read only at that point. Which would then lead to it being clean on the next boot and should be reliable. Unless I'm missing something, which I probably am since it's still early
No, I think you're right. The same thought hit me last night. I think the key (if it works) is using losetup -r to change the overlay loopback device to readonly before trying the unmount. I think not realizing that that is possible is what put me on the wrong track.
Thinking about it, the way to make it less horrendously ugly, would be to copy the binaries used from the rootfs (dmsetup, losetup, rm, mount) to a tmpfs first, since after the yanking, there is really no guarantee that any data read from the rootfs can be relied on.
Alternately (and I've had this discussion before with someone, although I forget whom), we really want to be able to get back to running from the initramfs on shutdown. eg, that's the only way we'll ever be able to eject the CD for reboot. And at that point, we do have the binaries we care about and can rely on them and maybe could have this be cleaner.
I also came to this conclusion. It is also quite logical in the sense of tearing things down in the reverse order of how they are built up. Also, if the ideas about putting gdm and other stuff into the initramfs go anywhere, that makes even more sense. Part of it would be doing something similar to what I did with /mnt/overlayfs to make the original initramfs mount point remain visible after the pivotroot. I'll give it a shot.
+# IMPORTANT TODO: while mount scanning find a way to determine if the
+# filesystem was not cleanly unmounted. If so, IGNORE IT,
+# as it may be part of a hibernated OS !!!!!!!
Maybe instead of using cleanly unmounted vs not as the key, we should look at swaps to see if they have the SWSUSP signature? That's a pretty straight-forward thing to check, but I can't quite convince myself if it's as safe or not.
My worry about this- is things like *3* current hibernate implementations for linux. That means that you have many possible signatures to check, and there is no way to predict signature changes in future versions of hibernation.
There's no way to predict the future, period. Generally, as the world around the live image being created changes, things like initrds, etc have to evolve too. But, I'm not at all tied one way or another. I wonder if we could actually just take advantage of fsck to tell us if it's clean or dirty -- I guess only if we did a "don't change anything run" and not anything more programmatic.
That sounds promising.
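A sketch of what the "don't change anything" run might look like. The fsck exit status is a bitmask (per the fsck man page: 4 means errors left uncorrected, 8 and above mean the check itself could not be done properly), so the part worth pinning down is how to read it. The function name fsck_status_class and the three labels are made up for illustration, and whether fsck -n is reliable enough on a journaled filesystem that merely needs replay is something that would have to be tested.

```shell
# Hypothetical helper: classify an fsck exit status as clean, dirty or error.
# Bits 8 (operational error), 16 (usage error), 32 (cancelled) and
# 128 (shared library error) mean the check itself did not work.
fsck_status_class() {
    local st=$1
    if [ $((st & 184)) -ne 0 ]; then
        echo error
    elif [ "$st" -ne 0 ]; then
        # With -n nothing is fixed, so any other nonzero status is treated
        # conservatively as a dirty filesystem.
        echo dirty
    else
        echo clean
    fi
}
```

Usage would be roughly: fsck -n "$dev" >/dev/null 2>&1; state=$(fsck_status_class $?), and then only consider the device as an overlay candidate when state is clean.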
[snip]
So yeah, overall, this is looking pretty spiffily good to me and I'm leaning towards starting to get it merged in so that we can start getting real use of it
We'll see where I'm at in another 24-48 hours, cleaning up the most obviously ugly things and perhaps making a more testable patch.
If we want to get to where it's available by default in F8, it'll definitely be good to have something for test2, even if some of the "pull the plug" corner cases aren't happy. The only reason I feel okay with making the auto case the default is that while it's automatic to use if set up, you still have to set up the overlay file. So unless you've already done the setup work to opt-in, you're not going to get hit that hard.
The reason why I would disagree (fairly strongly), is that even if you haven't set up the persistent device, all this code is going to be running at boot time, that is probing and looking at all the partitions on the system, in a drastically more invasive fashion than has historically been the case. I don't think that there is anything that could happen in a week or two that would make me feel comfortable enough about the solidness of the code that I would endorse that.
Now, only doing that if the f8t2 user reads the disclaimer in the docs before finding out they need to type "overlay=auto" at the boot prompt, that I have no problem with.
-dmc
Hey,
On Tue, 2007-08-21 at 16:18 -0400, Jeremy Katz wrote:
- It's good to get into the habit of doing git commits for each separate change. Then you can get a patch per change. And that would avoid having the addidir/addsdir stuff being in the same changes
FWIW, I don't think this approach is useful, unless you're the type of person that gets every change perfect first time :-)
Basically, git records a history of how you developed a set of changes, rather than allowing you to work on individual changes separately.
I've tried stgit (stacked git), but found it fairly cumbersome and confusing. I may try it again sometime, but right now I still use quilt for managing a set of patches ...
(And yes, quilt isn't perfect either - as demonstrated by me sending an older version of a patch yesterday)
Cheers, Mark.
On Tue, Sep 04, 2007 at 09:17:59AM +0100, Mark McLoughlin wrote:
Hey,
On Tue, 2007-08-21 at 16:18 -0400, Jeremy Katz wrote:
- It's good to get into the habit of doing git commits for each separate change. Then you can get a patch per change. And that would avoid having the addidir/addsdir stuff being in the same changes
FWIW, I don't think this approach is useful, unless you're the type of person that gets every change perfect first time :-)
Basically, git records a history of how you developed a set of changes, rather than allowing you to work on individual changes separately.
Branches are "free" in git. So, you can make a branch for what you want to work on, do it there with lots of commits, then when you're really happy with the change and want to apply it as one or more commits to your master branch, you can create a new branch from master; git diff master..yourdevbranch; apply the patches and commit as you like in your new branch; git push master. So you get the best of both worlds.
-Matt
On Tue, 2007-09-04 at 06:19 -0500, Matt Domsch wrote:
On Tue, Sep 04, 2007 at 09:17:59AM +0100, Mark McLoughlin wrote:
Hey,
On Tue, 2007-08-21 at 16:18 -0400, Jeremy Katz wrote:
- It's good to get into the habit of doing git commits for each separate change. Then you can get a patch per change. And that would avoid having the addidir/addsdir stuff being in the same changes
FWIW, I don't think this approach is useful, unless you're the type of person that gets every change perfect first time :-)
Basically, git records a history of how you developed a set of changes, rather than allowing you to work on individual changes separately.
Branches are "free" in git. So, you can make a branch for what you want to work on, do it there with lots of commits, then when you're really happy with the change and want to apply it as one or more commits to your master branch, you can create a new branch from master; git diff master..yourdevbranch; apply the patches and commit as you like in your new branch; git push master. So you get the best of both worlds.
Granted, but if you're working on a few interrelated patches, this gets seriously cumbersome. Yes, you could use branches of branches but ...
Point is that git doesn't do a good job of this right now - although stgit is promising - so I think git would ultimately just discourage people from the concept of splitting their patches into nice easy-to-review chunks.
It may not be terribly sophisticated, but at least quilt has the right model and is simple to use and understand. A stack of patches, each independently hackable.
Cheers, Mark.
livecd@lists.fedoraproject.org