Suggestion: bmap files and bmaptool

Thu Aug 15 17:34:26 UTC 2013

On 8/13/13 8:58 AM, Artem Bityutskiy wrote:
> # Make the image to be sparse
> $ cp --sparse=always Fedora-x86_64-19-20130627-sda.raw Fedora-x86_64-19-20130627-sda.raw.sparse
> 
> # Generate the bmap file
> $ bmaptool create Fedora-x86_64-19-20130627-sda.raw.sparse -o Fedora-x86_64-19-20130627-sda.raw.sparse.bmap

So this is the part that interests me . . . 

There seem to be two issues here: how do we efficiently (compress and) transport sparse files while retaining sparseness, and how do we efficiently operate on files which are already sparse?

For the latter, you're using your bmap tool to map what is hopefully a static file (via fibmap or fiemap, I guess?).

I haven't looked at how you've done it, but you do need to be very careful that the file is stable & quiesced on disk.  Mapping it this way can be fraught with errors if the file is changing, or has delalloc blocks, etc.  And of course getting the mapping wrong means data corruption.  If the file is known to be sparse, then going forward, using SEEK_HOLE / SEEK_DATA is probably the best approach.
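To make the SEEK_HOLE / SEEK_DATA suggestion concrete, here is a minimal sketch in Python of walking a file's data regions with lseek().  The function name data_extents is just something I made up for illustration, and it assumes a platform and filesystem where os.SEEK_DATA and os.SEEK_HOLE are available (a filesystem without hole support will simply report the whole file as one data region).

```python
import errno
import os


def data_extents(path):
    """Return a list of (offset, length) pairs covering the data regions of path.

    Hypothetical helper for illustration; it relies on lseek() with
    SEEK_DATA / SEEK_HOLE rather than on FIBMAP/FIEMAP block mappings.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Find the start of the next data region at or after offset.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # only a hole remains up to end of file
                raise
            # Find where that data region ends (EOF counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            offset = end
    finally:
        os.close(fd)
    return extents
```

The file still has to be stable while you walk it; the offsets can go stale if something else is writing to it in the meantime.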

But then there's the issue of transporting these sparse files around.  We have had the same problem in the past with large e2image metadata image files, which may be terabytes in length, with only gigabytes or megabytes of real data.  e2image _itself_ creates a sparse file, but bzipping it or rsyncing it still processes terabytes of zeros, and loses all notion of sparseness.

xfs_metadump worked around this by creating its own compact format describing a sparse file's data & sparseness, which is "unpacked" into a normal sparse file by xfs_mdrestore.

More recently e2image gained something slightly similar, but it uses the existing qcow2 format to encode the sparseness.  Running qemu-img convert with "raw" as the output format turns it back into a "normal" sparse file readable by e2fsprogs tools.

So I guess your solution requires two pieces of information: the existing file and the mapping file.  Are there mechanisms to ensure that they are in sync?

Another approach, which might (?) be more robust, is to somehow encode that sparseness in a single file format that can be transported/compressed/copied w/o losing the sparseness information, with another tool to operate efficiently on that format at the destination, either by unpacking it to a normal sparse file or by piping it to some other process.
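As a rough illustration of that single-file idea, here is a sketch in Python.  The container layout (the magic string, the header, the offset/length records and the zero-length terminator) is entirely hypothetical and made up for this example; it is not the xfs_metadump, qcow or bmap format.  It also assumes os.SEEK_DATA / os.SEEK_HOLE are available on the source side.

```python
import errno
import os
import struct

MAGIC = b"SPRS0001"              # hypothetical magic for this sketch
HEADER = struct.Struct("<8sQ")   # magic, logical file size
RECORD = struct.Struct("<QQ")    # data offset, data length (0 length = end)
CHUNK = 1 << 20                  # copy at most 1 MiB at a time


def pack(src_path, out):
    """Write only the data regions of src_path to the binary stream out."""
    fd = os.open(src_path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        out.write(HEADER.pack(MAGIC, size))
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # nothing but a hole up to end of file
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            out.write(RECORD.pack(start, end - start))
            pos = start
            while pos < end:
                buf = os.pread(fd, min(CHUNK, end - pos), pos)
                if not buf:
                    raise IOError("source file changed while packing")
                out.write(buf)
                pos += len(buf)
            offset = end
        out.write(RECORD.pack(0, 0))  # terminator, so truncation is detectable
    finally:
        os.close(fd)


def _read_exact(inp, n):
    """Read exactly n bytes from inp, or fail if the stream ends early."""
    parts = []
    while n:
        buf = inp.read(n)
        if not buf:
            raise EOFError("truncated container")
        parts.append(buf)
        n -= len(buf)
    return b"".join(parts)


def unpack(inp, dst_path):
    """Recreate the file at dst_path, seeking over holes instead of writing zeros."""
    magic, size = HEADER.unpack(_read_exact(inp, HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a container produced by pack()")
    with open(dst_path, "wb") as dst:
        dst.truncate(size)  # set the logical size up front; unwritten ranges stay holes
        while True:
            start, length = RECORD.unpack(_read_exact(inp, RECORD.size))
            if length == 0:
                break
            dst.seek(start)
            while length:
                buf = _read_exact(inp, min(CHUNK, length))
                dst.write(buf)
                length -= len(buf)
```

Since the map and the data travel in one stream, there is nothing to get out of sync, and the stream contains no runs of zeros, so it can be piped through a compressor or over the network as-is.  A real format would presumably also want checksums.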

Just some thoughts...

-Eric