Suggestion: bmap files and bmaptool

Sun Aug 18 15:43:56 UTC 2013

On 08/16/2013 09:46 AM, Artem Bityutskiy wrote:
> Hi Eric,
> 
> thanks for the question. Sorry my answers contain extra information, but
> I assume that other people read this, and may benefit from the info.
> After all, I am trying to get people interested, this is a good tool
> IMO, and I would like to get more users and hopefully contributors. :-)
> 
> On Thu, 2013-08-15 at 12:34 -0500, Eric Sandeen wrote:
>> On 8/13/13 8:58 AM, Artem Bityutskiy wrote:
>>> # Make the image to be sparse
>>> $ cp --sparse=always Fedora-x86_64-19-20130627-sda.raw Fedora-x86_64-19-20130627-sda.raw.sparse
>>>
>>> # Generate the bmap file
>>> $ bmaptool create Fedora-x86_64-19-20130627-sda.raw.sparse -o Fedora-x86_64-19-20130627-sda.raw.sparse.bmap
>>
>> So this is the part that interests me . . .
> 
> Before going further, I want to quickly note that the Tizen image
> generation software uses the BmapCreate library API directly, instead of
> running the bmaptool command-line utility. The library comes with the
> bmap-tools project.
> 
> http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/BmapCreate.py
> 
>> There seem to be two issues here; how do we efficiently (compress and)
>> transport sparse files while retaining sparseness, and how do we
>> efficiently operate on files which are already sparse.
> 
> Yes, our assumption is that sparseness gets lost as soon as the
> image is compressed or copied to the download server, or anywhere else.
> 
> The idea is that as soon as you generate the raw image on the build
> server, you generate the bmap file right away, _before_ the sparseness
> gets lost.
> 
> The sparseness information is then saved in the bmap file. Then you can
> compress the image, copy it around, and lose the sparseness. The bmap
> file preserves it.
> 
> And of course this means that you should not modify the image later on,
> otherwise the bmap file becomes incorrect, and the checksums, which are
> inside the bmap file, will probably mismatch.
> 
>> For the latter, you're using your bmap tool to map what is hopefully a
>> static file (via fibmap or fiemap, I guess?).
> 
> Yes, we use FIEMAP. Here is the python module which does the job:
> 
> http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/bmaptools/Fiemap.py
> 
>> I haven't looked at how you've done it, but you do need to be very
>> careful that the file is stable & quiesced on disk.
> 
> Right. We generate the bmap file on the _build server_, inside the tool
> which generates the raw image. At this point we do know we fully control
> the image, and no one touches it while we generate the bmap.
> 
> But yes, this is a good point, maybe I need to put it in the man page.
> Which, by the way, as I realize now, needs to be somewhat updated:
> 
> http://git.infradead.org/users/dedekind/bmap-tools.git/blob/refs/heads/devel:/docs/man1/bmaptool.1
> 
>>   Mapping it this way can be fraught with errors if the file is
>> changing, or has delalloc blocks, etc.
> 
> Good point. Tizen image generator fsync()'s the file before creating
> bmap, but I guess I have to do this in the BmapCreate library too, to be
> safe.
> 
> Thanks!
> 
>>   And of course getting the mapping wrong means data corruption.
> 
> Right. But as I said, we have been using bmaptool for a year now, and
> nothing that looks like corruption has been reported so far.
> 
> But the importance of fsync() is a very good point, I'll improve the
> library and make it explicitly fsync, and probably ignore the EROFS
> error, in case the file is R/O.
> 
>>   If the file is known to be sparse, then going forward, using
>> SEEK_HOLE / SEEK_DATA is probably the best approach.
> 
> Why are they better than FIEMAP?
> 
> I did consider them, actually, but they are very new, and build servers
> tend to use older kernels, so I chose FIEMAP. I actually first used
> FIBMAP, but it is too slow, so I switched to FIEMAP.
>>
>> But then there's the issue of transporting these sparse files around.
>> We have had the same problem in the past with large e2image metadata
>> image files, which may be terabytes in length, with only gigabytes or
>> megabytes of real data.  e2image _itself_ creates a sparse file, but
>> bzipping it or rsyncing it still processes terabytes of zeros, and
>> loses all notion of sparseness.
> 
> Right, but the scenario I keep in mind is that the bmap file is created
> at the _very_ beginning, and carried/published together with the image,
> as a stand-alone file with the same basename and ".bmap" extension.
> 
> The zeroes in the image can be very well compressed with xz, so people
> download/copy a lot less than Terabytes. And then people just run this
> command to re-create the original sparse file:
> 
> $ bmaptool copy --bmap huge.img.bmap huge.img.xz a_sparse_copy.img
> 
> This will decompress huge.img.xz on-the-fly and put it to
> a_sparse_copy.img. The a_sparse_copy.img file will be sparse.
> 
> Note, bmaptool auto-discovers the bmap file if it has a common
> basename with the image, and if it sits in the same directory, so this
> command can instead be:
> 
> bmaptool copy huge.img.xz a_sparse_copy
> 
> (analogy to "cp from to").
> 
> And of course, "huge.img.xz" can be, say:
> 
> bmaptool copy http://my.server/x/y/huge.img.xz a_sparse_copy
> 
> When the target is a block device, bmaptool has some optimizations to
> copy faster (e.g., switching to the noop I/O scheduler), trash RAM less,
> react to Ctrl-C, and some more.
> 
>> xfs_metadump worked around this by creating its own compact format
>> describing a sparse file's data & sparseness, which is "unpacked" into
>> a normal sparse file by xfs_mdrestore.
> 
> Frankly, I know little about XFS so I do not really understand the
> above.
>>
>> More recently e2image gained something slightly similar, but used the
>> existing qcow format to encode the sparseness.  qemu-img convert to
>> "raw" type turns it back into a "normal" sparse file readable by
>> e2fsprogs tools.
> 
> OK. So it could in theory use/generate a stand-alone bmapfile too.
> 
>> So I guess your solution requires 2 pieces of information; the
>> existing file, and the mapping file.  
> 
> Right.
> 
>> Are there mechanisms to ensure that they are in sync?
> 
> There is a built-in SHA1 checksum for all mapped areas, you can look
> for example here:
> 
> http://download.tizen.org/releases/milestone/tizen/ivi/latest/images/ivi-release-efi-i586/tizen_20130729.2_ivi-release-efi-i586-sdb.bmap
> 
> For unmapped areas we could check that they are all-zeros, but bmaptool
> does not currently do this.
> 
> And this is another good idea, thanks!
> 
> So if sha1's for mapped areas match, and unmapped areas are all-zeros,
> the integrity is fine.

You definitely need the fsync before doing the fiemap.
We saw this on certain file systems including ext4 when adding
fiemap support (efficient reading of holes) to cp.
This is a bug in the fiemap interface IMHO in that it returns
fairly useless data unless FIEMAP_FLAG_SYNC is specified.
For a general utility like cp, we couldn't sync each file before copying
(even only large files), so we restrict fiemap usage to files that
have a different disk usage than apparent size and so probably contain holes.
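
The size check can be sketched roughly as follows. This is only a minimal
Python illustration of the idea, not the actual coreutils code (cp is
written in C); the function names are made up for the example, and it
assumes st_blocks is counted in 512-byte units, as it is on Linux.

```python
import os

# Assumption: st_blocks is reported in 512-byte units (true on Linux).
STAT_BLOCK_SIZE = 512


def looks_sparse(apparent_size, allocated_blocks):
    """Return True if fewer bytes are allocated than the apparent size."""
    return allocated_blocks * STAT_BLOCK_SIZE < apparent_size


def file_looks_sparse(path):
    """Hypothetical helper: apply the check to a file on disk."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

Only files for which file_looks_sparse() returns True would then be worth
an extent-map query. It is a heuristic: file systems that compress data
transparently may also report fewer blocks than the apparent size.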

cheers,
Pádraig.