Chris Murphy wrote on Wed, Jul 01, 2020:
This is called 'dup' profile in Btrfs. Two copies of a block group. It
can be set on metadata only, or both metadata and data block groups.
It is the default mkfs option for HDDs. It is not enabled by default
on SSDs because the two copies of metadata are written concurrently,
i.e. at essentially the exact same time, so they are likely to end up
in the same erase block, and typical corruptions affect the whole
block; it's therefore widely considered pointless to use dup on flash
media. You can use it anyway, either with mkfs, or by converting the
block group from the single profile to dup. This is a safe procedure.
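
For reference, the two routes mentioned above (mkfs, or converting an
existing filesystem with balance) can be sketched as below. This is only
a sketch: the helper names, /dev/sdX and /mnt are placeholders, and the
code only builds and prints the command lines rather than running them.

```python
import shlex


def mkfs_dup_cmd(device, data_dup=False):
    """Build the mkfs.btrfs command for dup metadata (optionally dup data too)."""
    cmd = ["mkfs.btrfs", "-m", "dup"]
    if data_dup:
        cmd += ["-d", "dup"]
    return cmd + [device]


def convert_to_dup_cmd(mountpoint, data_too=False):
    """Build the balance command converting existing block groups to dup."""
    cmd = ["btrfs", "balance", "start", "-mconvert=dup"]
    if data_too:
        cmd.append("-dconvert=dup")
    return cmd + [mountpoint]


if __name__ == "__main__":
    # Placeholder device and mount point; nothing is executed here.
    print(shlex.join(mkfs_dup_cmd("/dev/sdX")))
    print(shlex.join(convert_to_dup_cmd("/mnt")))
```

The conversion runs on a mounted filesystem; whether dup buys anything
on flash is exactly the question discussed here.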
Does anyone know if anything in the NVMe spec says that creating two
namespaces should or could prevent coalescing IO like this?
Perhaps if the block size is different?
(This doesn't really help with the default setup case, but it could
make sense to split the disk in two, with data single + metadata raid1
over an NVMe namespace, for people who can be bothered to create one.
Unfortunately NVMe namespaces are rather messy and I don't think
autopartitioning tools should mess with that, but having raid just for
metadata is one of btrfs' strengths, so it's a shame to pass on it...
Alternatively it would require something like async copyback of the
second metadata copy, but that in itself has a lot of other problems
and doesn't really look like an option.)
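
To make the layout above concrete, here is a minimal sketch of the mkfs
invocation it would need, assuming the disk has already been split into
two namespaces. The helper name and the device names /dev/nvme0n1 and
/dev/nvme0n2 are assumptions for illustration, and this says nothing
about whether the controller really keeps the two namespaces on
separate flash, which is the open question.

```python
import shlex


def mkfs_meta_raid1_cmd(dev_a, dev_b):
    """Build mkfs.btrfs for single data and raid1 metadata over two devices."""
    return ["mkfs.btrfs", "-d", "single", "-m", "raid1", dev_a, dev_b]


# Hypothetical namespace device names; the command is printed, not run.
print(shlex.join(mkfs_meta_raid1_cmd("/dev/nvme0n1", "/dev/nvme0n2")))
```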
--
Dominique