On Thu, Dec 12, 2019 at 11:05 PM Damian Ivanov <damianatorrpm(a)gmail.com> wrote:
> Hello Chris,
> Thanks for the response.
> What counts as data loss or data corruption has to be carefully defined
> (whole partition, file based, etc.).
> As an indirect result of fstrim not being enabled I have experienced data loss:
> The performance was so degraded that I tried setting commit=120
> on all partitions, and after pulling the plug - since a simple copy operation may
> still result in a kernel oops (I have only seen that one with the MuQSS
> scheduler, which I tried once) or a complete freeze - 2 minutes of
> changes may be lost.
> (Yes, copying files that slowly results in a kernel oops or a permanently
> frozen system. At the longest I left the computer for hours and it was
> still intensively doing something, so the heat was rising
> and the system was under heavy load, not to mention hardware wear. I am still
> talking about a simple short copy operation here, which now takes less than
> five minutes. journalctl's last words for that boot were that
> libinput couldn't process events from the Razer mouse.)
> So as an indirect result of fstrim not being enabled I have
> experienced data loss.
> There can also be a distinction between the defaults in RHEL and Fedora,
> though I don't think that enterprise SSDs or VMs are affected by
> fstrim causing data issues.

I consider all of these examples to be buggy device behavior. TRIM is
supposed to be an optimization, not a requirement. And the
optimization shouldn't cause any corruption or data loss. The
manufacturer ought to offer a firmware update that fixes all of these
problems, including yours. What you're describing for your case is
that TRIM is mandatory, or else the experience is so bad that the device
is essentially unusable.
The problem with some devices hanging on TRIM does kinda bug me.
That's in something of a gray area, whether it's a bug or bad design,
or just a consequence of the technology at the time. I'd say the discard
mount option should not be considered, but fstrim.timer (once per
week) could be considered. That would mean instead of frequent small
hangs, there'd be a bigger one once a week - I have no way of
assessing how noticeable it would be, since that is make/model and
workload specific.
It might be worth doing a system-wide change proposal to enable
fstrim.timer by default so it has the proper visibility. There was a
feature a few releases ago to enable discard pass down with LUKS
encrypted volumes. It was approved and it is in place, but it's not
taken advantage of because neither fstrim.timer nor the discard mount
option is enabled by default.
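
For anyone wanting to see which of the two mechanisms their own system
is using, `systemctl is-enabled fstrim.timer` covers the timer. For the
discard mount option, here is a rough sketch in Python that scans
/proc/mounts-style text. The function name is made up for illustration,
and the assumption is that the options sit in the fourth
whitespace-separated field, as they do in /proc/mounts:

```python
def mounts_with_discard(mounts_text):
    """Return mount points whose options include 'discard' or 'discard=...'.

    Hypothetical helper: expects text in /proc/mounts layout
    (device, mount point, fstype, options, dump, pass).
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Match 'discard' and e.g. 'discard=async', but not 'nodiscard'.
        if any(opt == "discard" or opt.startswith("discard=") for opt in options):
            found.append(mount_point)
    return found
```

Feeding it the contents of /proc/mounts gives the list of file systems
doing online discard. An empty list together with a disabled
fstrim.timer would mean nothing is issuing TRIM at all.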
--
Chris Murphy