On Fri, Mar 26, 2021 at 9:41 AM Chris Murphy lists@colorremedies.com wrote:
On Thu, Mar 25, 2021 at 6:00 AM Richard Shaw hobbes1069@gmail.com wrote:
So how long do you wait until you consider the drive "good"? :)
I'm not in a hurry, so I could set up two of the drives in a RAID1 mirror
and copy my media over and just let it run for a while before I add disks 3 & 4.
We don't have much control over when and how bugs manifest. It could be true that drive firmware is improperly reordering 1 in 100 commits, or only for a particular sequence of flushes, or always. Such behavior is not a problem by itself; it takes a power failure or a crash at just the right time to expose it. The whole point of write ordering is to make sure the file system is consistent.
In the case of btrfs, the write order is simplistically: data->metadata->flush/fua->superblock->flush/fua
What makes this safe with copy on write is that no data or metadata is overwritten, so there's no in-between state. All the data writes in a commit are represented by metadata (the btrees) in that commit, and those trees aren't pointed to as current and valid until they are on stable media. That's the point of the first flush. Then comes the superblock, which is what points to the new trees.
The problem happens if the superblock is written pointing to new trees before all the metadata (or data) has arrived on stable media, and then you get a power failure or crash. Now the superblock is pointing to locations that don't have consistent tree state, and you get some variation on mount failure.
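To make the ordering argument concrete, here is a toy Python model. It is only a sketch under stated assumptions, not how btrfs or any real drive works: ToyDisk, commit, and mountable are made-up names, the superblock is assumed to be written with FUA, and the "bad" drive is assumed to acknowledge flushes without acting on them.

```python
class ToyDisk:
    """Toy drive: writes sit in a volatile cache until they reach stable media."""

    def __init__(self, honors_flush=True):
        self.honors_flush = honors_flush  # False models the hypothetical bad firmware
        self.cache = {}    # volatile, lost on power failure
        self.stable = {}   # survives power failure

    def write(self, key, value):
        self.cache[key] = value

    def flush(self):
        # A well-behaved drive persists everything written before the flush.
        # The assumed bad drive acknowledges the flush but persists nothing.
        if self.honors_flush:
            self.write_back()

    def write_fua(self, key, value):
        # FUA: this single write goes straight to stable media.
        self.stable[key] = value

    def write_back(self):
        # Eventual background write-back of whatever is still cached.
        self.stable.update(self.cache)
        self.cache.clear()

    def crash(self):
        # Power failure: the volatile cache is gone.
        self.cache.clear()


def commit(disk, generation):
    # Simplified order: data -> metadata -> flush -> superblock (FUA).
    disk.write(("data", generation), "file contents")
    disk.write(("tree", generation), "btree nodes")
    disk.flush()
    disk.write_fua("super", generation)


def mountable(disk):
    # The superblock must point at trees and data that are on stable media.
    gen = disk.stable.get("super")
    if gen is None:
        return False
    return ("tree", gen) in disk.stable and ("data", gen) in disk.stable


def crash_after_commit(honors_flush):
    disk = ToyDisk(honors_flush)
    commit(disk, generation=1)
    disk.crash()
    return mountable(disk)


def no_crash_after_commit(honors_flush):
    disk = ToyDisk(honors_flush)
    commit(disk, generation=1)
    disk.write_back()  # the cache drains before anything goes wrong
    return mountable(disk)
```

In this model a drive that honors the flush stays mountable after a crash, while the flush-ignoring drive ends up with a superblock pointing at trees that never reached stable media. Without a crash, the same bad drive looks perfectly fine, which is why a burn-in period can't prove much.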
Ok, so I'm struggling a bit here :)
I appreciate all the detailed responses, but at the same time the answers often seem bi-polar... You can do all these great things with BTRFS! But even if you test your RAID array multiple times, bad firmware may still eat all your data :)
I know you can't give absolute answers sometimes, but it feels like you're often a btrfs cheerleader and critic at the same time :)
It sounds like there needs to be an easy-to-find list of known good and known bad drives to use (be it for btrfs or another similar filesystem).
My plan is to use Seagate Terascale drives which as far as I can tell are CMR and not SMR at least.
Thanks, Richard