Suggestion: bmap files and bmaptool

Fri Aug 16 08:46:26 UTC 2013

Hi Eric,

thanks for the question. Sorry my answers contain extra information, but
I assume that other people read this, and may benefit from the info.
After all, I am trying to get people interested, this is a good tool
IMO, and I would like to get more users and hopefully contributors. :-)

On Thu, 2013-08-15 at 12:34 -0500, Eric Sandeen wrote:
> On 8/13/13 8:58 AM, Artem Bityutskiy wrote:
> > # Make the image to be sparse
> > $ cp --sparse=always Fedora-x86_64-19-20130627-sda.raw Fedora-x86_64-19-20130627-sda.raw.sparse
> > 
> > # Generate the bmap file
> > $ bmaptool create Fedora-x86_64-19-20130627-sda.raw.sparse -o Fedora-x86_64-19-20130627-sda.raw.sparse.bmap
> 
> So this is the part that interests me . . .

Before going further, I want to quickly note that the Tizen image
generation software uses the BmapCreate library API directly, instead of
running the bmaptool command-line utility. The library comes with the
bmap-tools project.

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/BmapCreate.py

> There seem to be two issues here; how do we efficiently (compress and)
> transport sparse files while retaining sparseness, and how do we
> efficiently operate on files which are already sparse.

Yes, our assumption is that sparseness gets lost as soon as the
image is compressed or copied to the download server, or anywhere else.

The idea is that as soon as you generate the raw image on the build
server, you generate the bmap file right away, _before_ the sparseness
gets lost.

The sparseness information is then saved in the bmap file. Then you can
compress the image, copy it around, and lose the sparseness. The bmap
file preserves it.

And of course this means that you should not modify the image later on,
otherwise the bmap file becomes incorrect, and the checksums, which are
inside the bmap file, will probably mismatch.

> For the latter, you're using your bmap tool to map what is hopefully a
> static file (via fibmap or fiemap, I guess?).

Yes, we use FIEMAP. Here is the python module which does the job:

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/Fiemap.py

> I haven't looked at how you've done it, but you do need to be very
> careful that the file is stable & quiesced on disk.

Right. We generate the bmap file on the _build server_, inside the tool
which generates the raw image. At this point we do know we fully control
the image, and no one touches it while we generate the bmap.

But yes, this is a good point, maybe I need to put it in the man page.
Which, by the way, as I realize now, needs to be updated somewhat:

http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/docs/man1/bmaptool.1

>   Mapping it this way can be fraught with errors if the file is
> changing, or has delalloc blocks, etc.

Good point. The Tizen image generator fsync()'s the file before creating
the bmap, but I guess I have to do this in the BmapCreate library too,
to be safe.

Thanks!

>   And of course getting the mapping wrong means data corruption.

Right. But as I said, we have been using bmaptool for a year now, and
nothing which looks like a corruption has been reported so far.

But the importance of fsync() is a very good point, I'll improve the
library and make it explicitly fsync, and probably ignore the EROFS
error, in case the file is R/O.
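
To illustrate the fsync() step, here is a minimal Python sketch. It is
not the BmapCreate code; the helper name sync_before_mapping is
hypothetical, and ignoring EROFS simply follows the idea described
above.

```python
import errno
import os


def sync_before_mapping(path):
    """Flush a file's dirty data to disk before asking for its block map.

    Hypothetical helper, not part of bmap-tools. Returns True if fsync()
    succeeded, False if it failed with EROFS and the error was ignored.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
        return True
    except OSError as err:
        # Assumption taken from the text above: a read-only file may
        # fail with EROFS, which is harmless here. Anything else is real.
        if err.errno != errno.EROFS:
            raise
        return False
    finally:
        os.close(fd)
```

Syncing first matters because delayed-allocation blocks have no on-disk
location yet, so a map taken before the flush could miss them.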

>   If the file is known to be sparse, then going forward, using
> SEEK_HOLE / SEEK_DATA is probably the best approach.

Why are they better than FIEMAP?

I did consider them, actually, but they are very new, and build servers
tend to use older kernels, so I chose FIEMAP. I actually first used
FIBMAP, but it is too slow, so I switched to FIEMAP.
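
For comparison, a minimal sketch of the SEEK_DATA / SEEK_HOLE approach
in Python (os.SEEK_DATA and os.SEEK_HOLE exist since Python 3.3 on
platforms that support them). The function name data_extents is
hypothetical, and this is not how bmap-tools maps files. On a file
system without hole support the kernel is expected to report the whole
file as one data extent.

```python
import errno
import os


def data_extents(path):
    """Return a sorted list of (start, end) byte ranges that hold data.

    `end` is exclusive. Everything outside the returned ranges is a hole.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Jump to the next byte that belongs to a data region.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:
                    break  # nothing but a hole up to the end of the file
                raise
            # Jump to the end of that data region (EOF counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

The same caveat applies as with FIEMAP: the file must not change while
it is being mapped.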
> 
> But then there's the issue of transporting these sparse files around.
> We have had the same problem in the past with large e2image metadata
> image files, which may be terabytes in length, with only gigabytes or
> megabytes of real data.  e2image _itself_ creates a sparse file, but
> bzipping it or rsyncing it still processes terabytes of zeros, and
> loses all notion of sparseness.

Right, but the scenario I keep in mind is that the bmap file is created
at the _very_ beginning, and carried/published together with the image,
as a stand-alone file with the same basename and ".bmap" extension.

The zeroes in the image compress very well with xz, so people
download/copy a lot less than terabytes. And then people just run this
command to re-create the original sparse file:

$ bmaptool copy --bmap huge.img.bmap huge.img.xz a_sparse_copy.img

This will decompress huge.img.xz on-the-fly and put it to
a_sparse_copy.img. The a_sparse_copy.img file will be sparse.
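
The principle behind such a copy can be sketched in Python as follows.
This is an illustration only, not the bmaptool implementation: the
function names are hypothetical, and the block map is passed in as a
plain list of (first_block, last_block) pairs instead of being parsed
from a bmap file. The compressed stream is read sequentially, unmapped
areas are read and thrown away, and only mapped blocks are written, so
the destination keeps its holes.

```python
import lzma

CHUNK = 1024 * 1024


def _read_exact(src, count):
    """Yield `count` bytes from `src` in chunks, failing on a short read."""
    while count > 0:
        data = src.read(min(CHUNK, count))
        if not data:
            raise EOFError("image is shorter than the block map says")
        count -= len(data)
        yield data


def copy_mapped(src, dst_path, block_size, image_size, ranges):
    """Copy only the mapped block ranges of a sequential stream.

    `ranges` must be sorted and non-overlapping; block numbers are
    inclusive, as in "first-last".
    """
    with open(dst_path, "wb") as dst:
        dst.truncate(image_size)  # the destination starts as one big hole
        pos = 0
        for first, last in ranges:
            start = first * block_size
            end = min((last + 1) * block_size, image_size)
            for _ in _read_exact(src, start - pos):
                pass  # unmapped area: read and discard, write nothing
            dst.seek(start)
            for data in _read_exact(src, end - start):
                dst.write(data)
            pos = end


# Usage with hypothetical names and values:
#   with lzma.open("huge.img.xz", "rb") as src:
#       copy_mapped(src, "a_sparse_copy.img", 4096, image_size, ranges)
```

Whether the result really ends up sparse still depends on the
destination file system supporting holes.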

Note, bmaptool auto-discovers the bmap file if it has a common
basename with the image, and if it sits in the same directory, so this
command can instead be:

$ bmaptool copy huge.img.xz a_sparse_copy

(analogy to "cp from to").

And of course, "huge.img.xz" can be, say:

$ bmaptool copy http://my.server/x/y/huge.img.xz a_sparse_copy

When the target is a block device, bmaptool has some optimizations to
copy faster (e.g., switching to the noop I/O scheduler), trash RAM less,
react to Ctrl-C, and some more.

> xfs_metadump worked around this by creating its own compact format
> describing a sparse file's data & sparseness, which is "unpacked" into
> a normal sparse file by xfs_mdrestore.

Frankly, I know little about XFS so I do not really understand the
above.
> 
> More recently e2image gained something slightly similar, but used the
> existing qcow format to encode the sparseness.  qemu-image convert to
> "raw" type turns it back into a "normal" sparse file readable by
> e2fsprogs tools.

OK. So it could in theory use/generate a stand-alone bmapfile too.

> So I guess your solution requires 2 pieces of information; the
> existing file, and the mapping file.  

Right.

> Are there mechanisms to ensure that they are in sync?

There is a built-in SHA1 for every mapped area, you can look for example
here:

http://download.tizen.org/releases/milestone/tizen/ivi/latest/images/ivi-release-efi-i586/tizen_20130729.2_ivi-release-efi-i586-sdb.bmap

For unmapped areas we could check that they are all-zeros, but bmaptool
does not currently do this.

And this is another good idea, thanks!

So if sha1's for mapped areas match, and unmapped areas are all-zeros,
the integrity is fine.
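
A minimal Python sketch of this check, under stated assumptions: the
function names are hypothetical, the block map is given as a sorted
list of (first_block, last_block, sha1_hex) tuples rather than parsed
from a real bmap file, and each sha1 is assumed to cover exactly the
bytes of its range. The all-zeros test for unmapped areas is the part
that bmaptool does not currently do.

```python
import hashlib

CHUNK = 1024 * 1024


def _chunks(image, start, end):
    """Yield the bytes of image[start:end] in chunks."""
    image.seek(start)
    remaining = end - start
    while remaining > 0:
        data = image.read(min(CHUNK, remaining))
        if not data:
            break
        remaining -= len(data)
        yield data


def _all_zeros(image, start, end):
    return all(c.count(b"\0") == len(c) for c in _chunks(image, start, end))


def verify_image(path, block_size, image_size, ranges):
    """Return True if mapped ranges match their sha1 and the rest is zero."""
    with open(path, "rb") as image:
        pos = 0
        for first, last, expected in ranges:
            start = first * block_size
            end = min((last + 1) * block_size, image_size)
            if not _all_zeros(image, pos, start):
                return False  # data found where the map says "hole"
            digest = hashlib.sha1()
            for data in _chunks(image, start, end):
                digest.update(data)
            if digest.hexdigest() != expected:
                return False  # mapped area was modified or is truncated
            pos = end
        return _all_zeros(image, pos, image_size)
```

Reading the unmapped areas costs extra I/O, so a real tool would
probably make that part of the check optional.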

The bmap file also carries the sha1 of itself, which bmaptool verifies.

And yes, I already agreed that I should replace sha1 with sha256.

And yes, I agreed that bmaptool should allow for a GPG signature in the
bmap file.
> 
> Another approach which might (?) be more robust, is to somehow encode
> that sparseness in a single file format that can be
> transported/compressed/copied w/o losing the sparseness information,
> and another tool to operate efficiently on that format at the
> destination, either by unpacking it to a normal sparse file or piping
> it to some other process.

Err, not sure I fully understand, but it sounds like what the bmap-tools
project actually does.

Piping is not implemented, because sparseness cannot be easily passed
through a pipe.

> Just some thoughts...
> 
Thanks a lot for the feedback!

-- 
Best Regards,
Artem Bityutskiy