On Tue, Oct 11, 2016 at 4:17 PM, Gerald B. Cox <gbcox(a)bzb.us> wrote:
> On Tue, Oct 11, 2016 at 2:38 PM, Tomasz Kłoczko <kloczko.tomasz(a)gmail.com> wrote:
>> You still have a 4th option:
>> - create a separate boot environment (BE) from snapshots
>> - do the upgrade in the BE
>> - restart the system from the new BE
>> To be honest, only this method solves all the hazards which are not listed
>> in your 3 points, and additionally you will have the possibility to create
>> packages without any post-install/uninstall scripts, which are the biggest
>> risk factor of ANY upgrade.
> That's fine, but first of all - those aren't my points - they are from the
> link I included regarding Project Tracer.
> My comment was directed at folks who were concerned about running dnf from
> a terminal within a DE, and who were interested in some type of risk
> mitigation now. Your suggestion requires a bit more work.
Running an update in the DE on an out-of-tree snapshot of the file
system is fairly trivial. The harder parts are merging the changes that
happen between snapshot time and reboot time, and managing rollbacks.
> As far as BTRFS is concerned however, I believe that ship has sailed. I
> used it for 4 years, but after the recent news regarding RAID
The only news about Btrfs RAID I can think of that you're referring to
is the raid5 scrub bug that corrupts parity in a very specific
situation. It's a bad bug, but it's also really rare. First, it
requires a data strip to contain corruption. During scrub, the data
block fails its checksum, Btrfs does a good reconstruction from parity
and repairs the bad data strip, but then *sometimes* goes on to wrongly
recompute parity, overwriting the good parity with bad parity. So in
effect, it has silently transposed the corruption from a data strip to
the parity strip. In normal operation, you still get your data,
uncorrupted. If you lose a drive, and that bad parity is now needed, it
results in a bad reconstruction of the data, which results in EIO
because the bad data fails its checksum. So to get this form of data
loss you need an already bad data strip, a scrub that hits this
particular bug, and then the loss of a device. But you don't get
corrupt data propagated upward.
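To make the sequence above concrete, here's a toy model of that failure
chain, not real Btrfs code: a hypothetical stripe reduced to two data
strips plus one XOR parity strip, with a dict standing in for the csum
tree. It only illustrates how repairing data from parity and then
recomputing parity from stale data transposes the corruption.

```python
# Toy model of the raid5 scrub bug sequence (hypothetical
# simplification: 2 data strips + 1 XOR parity strip per stripe).

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Healthy stripe: parity = d0 XOR d1.
d0, d1 = b"AAAA", b"BBBB"
parity = xor(d0, d1)
checksums = {0: d0, 1: d1}      # stand-in for the Btrfs csum tree

# 1. A data strip goes bad on disk.
d0_on_disk = b"XXXX"

# 2. Scrub: d0 fails its checksum, so reconstruct it from parity.
rebuilt = xor(parity, d1)
assert rebuilt == checksums[0]  # good reconstruction, data repaired

# 3. The buggy path *sometimes* recomputes parity from the stale,
#    still-corrupt data instead of the repaired data:
parity = xor(d0_on_disk, d1)    # good parity overwritten with bad
d0_on_disk = rebuilt            # the data strip itself is now correct

# Normal reads are fine: d0 passes checksum, parity isn't consulted.
# 4. Now lose the drive holding d0 and reconstruct from parity:
recovered = xor(parity, d1)
print(recovered == checksums[0])  # False: csum mismatch, i.e. EIO
```

Note the last step: the reconstruction fails the checksum, so the read
errors out rather than handing corrupt data upward, matching the
behavior described above.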
It's uncertain how this manifests on raid6; I haven't tested it. My
expectation is that a failed csum from reconstruction using p-parity
will result in another attempt using q-parity, and then fixing up the
data and the p-parity if the reconstruction passes the data checksum.
Understand that in the identical situation with mdraid and lvm raid, a
scrub check would only report a mismatch; it wouldn't tell you which
copy was correct. And if you did a scrub repair, it would assume the
data strips are valid, and in this case that would *always* result in
the good parity being overwritten with bad parity.
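The same toy stripe shows the contrast with a checksum-less RAID5. This
is a hypothetical model, not md driver code: with no per-block
checksums, a check can only count a mismatch, and a repair has no
choice but to trust the data strips.

```python
# Toy model of checksum-less RAID5 scrub (mdraid/lvm raid style):
# 2 data strips + 1 XOR parity strip, no checksums anywhere.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"AAAA", b"BBBB"
parity = xor(d0, d1)

# Corruption lands in a data strip; parity is still the good copy.
d0 = b"XXXX"

# "check"-style scrub: XOR of everything is nonzero, so a mismatch is
# counted, but nothing says which strip is the wrong one.
mismatch = any(xor(xor(d0, d1), parity))
print(mismatch)                 # True: mismatch reported, not located

# "repair"-style scrub: the data strips are assumed valid, so the good
# parity is always overwritten with parity recomputed from bad data.
parity = xor(d0, d1)
```

The design point is simply that without checksums there is no third
piece of information to arbitrate between data and parity, which is why
the repair always sides with the data strips.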
Anyway, it's a bad bug. But it's not correct to attribute this bug to
the other three raid profiles, or to Btrfs in general, which has
numerous features, not all of which have the same maturity.
> I switched everything to XFS.
There are many good and valid reasons to use it.