Re: BTRFS partition corrupted after deleting files in /home
by Sreyan Chakravarty
On Mon, Jan 4, 2021 at 10:14 PM Chris Murphy <lists(a)colorremedies.com> wrote:
> transid errors like this indicate out of order writes due to drive
> firmware not honoring file system write ordering and then getting a
> badly timed crash/powerfail/shutdown.
First of all thanks for your quick response.
So would I be correct assuming that the problem is in my firmware ? Or
is it too early to say anything like that ?
Is my firmware so outdated that it can't handle BTRFS ?
> You report that the file system went read only while using it. This
> suggests a dropped write and the file system went read-only to limit
> the damage. Ideally we'd get the log, if it made it to disk, to see
> what led up to this so we can determine what the problem is and get
> it fixed. What I can tell you is this is not user error but that's not
> much comfort.
>
Well it doesn't provide comfort but at least I can say that it wasn't
me who messed up my filesystem.
>
> Yeah that's bad. I think it's fixable. We need to get a metadata dump
> of the file system to see if fsck will fix it.
>
> btrfs-image -c9 -t4 /dev/sdXY /mnt/path/to/file
>
> That will include filenames but not any data. If you need to mask
> filenames, add -ss option to the above. (-s won't help here). And the
> path to file if you're on a live USB stick can just be something like
> ~/sreyan-btrfs.img and then put it up on the google drive.
I don't think there is any hope for my data, as I can't even create
the meta-data image:
# btrfs-image -c9 -t4 /dev/mapper/dm_crypt /run/media/liveuser/Backup\ Plus/btrfs_meta.img
parent transid verify failed on 55640064 wanted 44146 found 44438
parent transid verify failed on 55640064 wanted 44146 found 44438
parent transid verify failed on 55640064 wanted 44146 found 44438
Ignoring transid failure
parent transid verify failed on 55902208 wanted 44170 found 44438
Ignoring transid failure
parent transid verify failed on 56410112 wanted 44170 found 44439
Ignoring transid failure
parent transid verify failed on 58621952 wanted 44170 found 44439
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=178081497088 item=246 parent level=1 child level=2
ERROR: cannot go to next leaf -5
ERROR: create failed: -5
What do I do now ?
> I'm on irc.freenode.net as cmurf that's usually the easier way to get
> help, on #fedora channel.
>
Do I need to have a bouncer ? I am in India, and I believe you are in
the US, so when you are active, I am usually sleeping.
> Also, have you ever done a balance on this file system? (That is not a
> suggestion that you should or shouldn't have. Just a yes or no
> question to try and piece together some other data points.)
>
No never did anything like that.
--
Regards,
Sreyan Chakravarty
Re: BTRFS partition corrupted after deleting files in /home
by Chris Murphy
On Mon, Jan 4, 2021 at 11:32 AM Sreyan Chakravarty <sreyan32(a)gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 10:14 PM Chris Murphy <lists(a)colorremedies.com> wrote:
> > transid errors like this indicate out of order writes due to drive
> > firmware not honoring file system write ordering and then getting a
> > badly timed crash/powerfail/shutdown.
>
> First of all thanks for your quick response.
>
> So would I be correct assuming that the problem is in my firmware ? Or
> is it too early to say anything like that ?
Too early. The usual case of transid errors is drive firmware bugs
*and* ill timed shutdown. Since you don't have an ill timed shutdown,
it's less likely this is a drive firmware bug, but can't be ruled out.
i.e. I'm proposing there might be a software bug here and we just need
to figure it out. Bad memory usually shows up as bit flips and doesn't
result in damage like this - but it has to be considered whether a
bitflip can affect code.
It can also be a kernel bug - the storage stack has many layers, not
just Btrfs and dm-crypt. But no one wants to go blaming other people's
work without understanding the problem.
> Is my firmware so outdated that it can't handle BTRFS ?
No. It's a bit complicated.
Buggy drive firmware is common. But normally it doesn't matter, mainly
due to good luck. More than one thing has to go wrong to cause a
problem: (a) a firmware bug exists, (b) the firmware bug is triggered,
and (c) a crash/powerfail happens at the wrong moment. If any one of
those is not true, then it's not a problem.
There is also the transient hardware defect problem that can act like
a bug but it's just rotting the metadata or data. It's not obvious but
it is possible to piece together what's happened when we have enough
information.
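
As a rough aid for piecing that information together, here is a minimal
sketch that summarises the "parent transid verify failed" lines from a log.
The function name transid_summary is made up for illustration, and it
assumes each message sits on one unwrapped line (as in dmesg, not as
re-wrapped by a mail client). "wanted" is the generation the parent block
expects and "found" is the generation recorded in the block that was read;
the last column is simply found minus wanted.

```shell
# Summarise "parent transid verify failed" lines read from stdin.
# Prints: <bytenr> <wanted> <found> <found - wanted>, once per unique block.
# transid_summary is a hypothetical helper name, not part of btrfs-progs.
transid_summary() {
    awk '
        /parent transid verify failed/ {
            bytenr = wanted = found = ""
            for (i = 1; i < NF; i++) {
                if ($i == "on")     bytenr = $(i + 1)
                if ($i == "wanted") wanted = $(i + 1)
                if ($i == "found")  found  = $(i + 1)
            }
            # Skip lines that lost their numbers to line wrapping.
            if (bytenr == "" || wanted == "" || found == "") next
            key = bytenr " " wanted " " found
            if (!(key in seen)) {
                seen[key] = 1
                print bytenr, wanted, found, found - wanted
            }
        }
    '
}
```

Typical use would be something like "dmesg | transid_summary" or feeding it
a saved log file on stdin.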
> # btrfs-image -c9 -t4 /dev/mapper/dm_crypt /run/media/liveuser/Backup\ Plus/btrfs_meta.img
>
> parent transid verify failed on 55640064 wanted 44146 found 44438
> parent transid verify failed on 55640064 wanted 44146 found 44438
> parent transid verify failed on 55640064 wanted 44146 found 44438
> Ignoring transid failure
> parent transid verify failed on 55902208 wanted 44170 found 44438
> Ignoring transid failure
> parent transid verify failed on 56410112 wanted 44170 found 44439
> Ignoring transid failure
> parent transid verify failed on 58621952 wanted 44170 found 44439
> Ignoring transid failure
> ERROR: child eb corrupted: parent bytenr=178081497088 item=246 parent level=1 child level=2
> ERROR: cannot go to next leaf -5
> ERROR: create failed: -5
>
> What do I do now ?
Rats. Can you retry by adding -w option? In the meantime I'll report
back to upstream and see what they recommend next.
> > I'm on irc.freenode.net as cmurf that's usually the easier way to get
> > help, on #fedora channel.
> >
>
> Do I need to have a bouncer ? I am in India, and I believe you are in
> the US, so when you are active, I am usually sleeping.
An alternative is matrix. We have a matrix-irc bridge in #fedora and
pretty soon I think the plan is to switch mainly to matrix. So if you
know about matrix then you can join #fedora - but I don't know how to
explain it very well since I don't use matrix yet. I think it keeps
the history for you, unlike IRC (I use a bouncer so I will see your
messages later). I keep weird hours so it might overlap at some point.
--
Chris Murphy
Re: maybe OT
by George N. White III
On Fri, 18 Mar 2022 at 19:47, Paolo Galtieri <pgaltieri(a)gmail.com> wrote:
> I'm having issues with a VM.
>
It would be useful to mention the host OS. From the name, I guess your
VM is running Fedora 34.
>
> The VM was originally created under VMware and has worked fine for a
> while. Today when I booted it up instead of seeing the usual MATE login
> screen I get a login prompt:
>
> f34-01-vm:
>
> no matter what I enter, root or pgaltieri as login it never asks for
> password and immediately says login incorrect. While it's booting I see
> several [FAILED]... messages, e.g. [FAILED] to start CUPS Scheduler
>
> I booted the system again and this time it dropped into emergency mode.
> In emergency mode I see the following messages in dmesg:
>
> BTRFS info (device sda2): flagging fs with big metadata feature
> BTRFS info (device sda2): disk space caching is enabled
> BTRFS info (device sda2): has skinny extents
> BTRFS info (device sda2): start tree-log replay
> BTRFS info (device sda2): parent transid verify failed on 61849600 wanted 145639 found 145637
> BTRFS info (device sda2): parent transid verify failed on 61849600 wanted 145639 found 145637
> BTRFS: error (device sda2) in btrfs_replay_log:2423 errno=-5 IO failure (Failed to recover log tree)
> BTRFS error (device sda2) open_ctree failed
>
> I ran btrfs check in emergency mode and it came up with a lot of errors.
>
> How do I recover the partition(s) so I can boot the system, or at least
> mount them?
>
The underlying problem could be the physical disk that holds the VM's
virtual disk file, or a corrupt btrfs. Avoid doing anything that would
write to the virtual disk. Make a backup copy of the virtual disk. If the
physical drive is OK, use a separate VM to mount the Fedora 34 virtual
disk for repair attempts.
Try: https://btrfs.wiki.kernel.org/index.php/FAQ
How do I recover from a parent transid verify failed error?
At one time VirtualBox had issues with btrfs. You should check for similar
reports for VMware and btrfs.
--
George N. White III
Re: maybe OT
by Paolo Galtieri
The host OS is also F34.
On 3/20/22 08:14, George N. White III wrote:
> On Fri, 18 Mar 2022 at 19:47, Paolo Galtieri <pgaltieri(a)gmail.com> wrote:
>
> I'm having issues with a VM.
>
>
> It would be useful to mention the host OS. From the name, I guess your
> VM is running Fedora 34.
>
>
> The VM was originally created under VMware and has worked fine for a
> while. Today when I booted it up instead of seeing the usual MATE
> login screen I get a login prompt:
>
> f34-01-vm:
>
> no matter what I enter, root or pgaltieri as login it never asks for
> password and immediately says login incorrect. While it's booting
> I see several [FAILED]... messages, e.g. [FAILED] to start CUPS Scheduler
>
> I booted the system again and this time it dropped into emergency
> mode. In emergency mode I see the following messages in dmesg:
>
> BTRFS info (device sda2): flagging fs with big metadata feature
> BTRFS info (device sda2): disk space caching is enabled
> BTRFS info (device sda2): has skinny extents
> BTRFS info (device sda2): start tree-log replay
> BTRFS info (device sda2): parent transid verify failed on 61849600 wanted 145639 found 145637
> BTRFS info (device sda2): parent transid verify failed on 61849600 wanted 145639 found 145637
> BTRFS: error (device sda2) in btrfs_replay_log:2423 errno=-5 IO failure (Failed to recover log tree)
> BTRFS error (device sda2) open_ctree failed
>
> I ran btrfs check in emergency mode and it came up with a lot of
> errors.
>
> How do I recover the partition(s) so I can boot the system, or at
> least mount them?
>
>
> The underlying problem could be the physical disk that holds the VM's
> virtual disk file, or a corrupt btrfs. Avoid doing anything that
> would write to the virtual disk. Make a backup copy of the virtual
> disk. If the physical drive is OK, use a separate VM to mount the
> Fedora 34 virtual disk for repair attempts.
>
> Try: https://btrfs.wiki.kernel.org/index.php/FAQ
> How do I recover from a parent transid verify failed error?
>
> At one time VirtualBox had issues with btrfs. You should check for
> similar reports for VMware and btrfs.
>
> --
> George N. White III
>
>
> _______________________________________________
> users mailing list -- users(a)lists.fedoraproject.org
> To unsubscribe send an email to users-leave(a)lists.fedoraproject.org
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedora...
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
Re: BTRFS partition corrupted after deleting files in /home
by Chris Murphy
On Mon, Jan 4, 2021 at 6:59 AM Sreyan Chakravarty <sreyan32(a)gmail.com> wrote:
>
> On Mon, Jan 4, 2021 at 1:16 AM Chris Murphy <lists(a)colorremedies.com> wrote:
> >
> > Try to mount normally, then:
>
> I am unable to mount normally :
>
> # mount -t btrfs /dev/mapper/dm_crypt /mnt/
> mount: /mnt: wrong fs type, bad option, bad superblock on
> /dev/mapper/dm_crypt, missing codepage or helper program, or other
> error.
>
> >
> > dmesg
>
> This is what I get in dmesg:
>
> [29867.234062] BTRFS info (device dm-4): disk space caching is enabled
> [29867.234067] BTRFS info (device dm-4): has skinny extents
> [29867.317955] BTRFS error (device dm-4): parent transid verify failed on 55640064 wanted 44146 found 44438
> [29867.326701] BTRFS error (device dm-4): parent transid verify failed on 55640064 wanted 44146 found 44438
> [29867.326727] BTRFS warning (device dm-4): failed to read root (objectid=9): -5
> [29867.333668] BTRFS error (device dm-4): open_ctree failed
transid errors like this indicate out of order writes due to drive
firmware not honoring file system write ordering and then getting a
badly timed crash/powerfail/shutdown. However...
You report that the file system went read only while using it. This
suggests a dropped write and the file system went read-only to limit
the damage. Ideally we'd get the log, if it made it to disk, to see
what led up to this so we can determine what the problem is and get
it fixed. What I can tell you is this is not user error but that's not
much comfort.
>
> > btrfs check --readonly
>
> A lot of errors, could not even upload to pastebin.
>
> This is in my Google Drive:
> https://drive.google.com/file/d/1dpW7aftB3FuD8i1J7d4nRrzZHaGF4vuN/view?us...
Yeah that's bad. I think it's fixable. We need to get a metadata dump
of the file system to see if fsck will fix it.
btrfs-image -c9 -t4 /dev/sdXY /mnt/path/to/file
That will include filenames but not any data. If you need to mask
filenames, add -ss option to the above. (-s won't help here). And the
path to file if you're on a live USB stick can just be something like
~/sreyan-btrfs.img and then put it up on the google drive. By the way
I'm on irc.freenode.net as cmurf that's usually the easier way to get
help, on #fedora channel.
Also, have you ever done a balance on this file system? (That is not a
suggestion that you should or shouldn't have. Just a yes or no
question to try and piece together some other data points.)
--
Chris Murphy
Re: maybe OT
by Chris Murphy
On Fri, Mar 18, 2022 at 4:47 PM Paolo Galtieri <pgaltieri(a)gmail.com> wrote:
>
> I'm having issues with a VM.
>
> The VM was originally created under VMware and has worked fine for a
> while. Today when I booted it up instead of seeing the usual MATE login
> screen I get a login prompt:
>
> f34-01-vm:
>
> no matter what I enter, root or pgaltieri as login it never asks for
> password and immediately says login incorrect. While it's booting I see
> several [FAILED]... messages, e.g. [FAILED] to start CUPS Scheduler
>
> I booted the system again and this time it dropped into emergency mode.
> In emergency mode I see the following messages in dmesg:
>
> BTRFS info (device sda2): flagging fs with big metadata feature
> BTRFS info (device sda2): disk space caching is enabled
> BTRFS info (device sda2): has skinny extents
> BTRFS info (device sda2): start tree-log replay
> BTRFS info (device sda2): parent transid verify failed on 61849600 wanted 145639 found 145637
> BTRFS info (device sda2): parent transid verify failed on 61849600 wanted 145639 found 145637
> BTRFS: error (device sda2) in btrfs_replay_log:2423 errno=-5 IO failure (Failed to recover log tree)
> BTRFS error (device sda2) open_ctree failed
That's not good. The tree-log is used during fsync as an optimization
to avoid having to do full file system metadata updates. Since the
tree-log exists, we know this file system was undergoing some fsync
write operations which were then interrupted. Either the VM or host
crashed, or one of them was forced to shutdown, or there's a bug that
otherwise prevented the guest operations from completing. Further, the
parent transid verification failure messages indicate some out of
order writes, as if the virtual drive+controller+cache is occasionally
ignoring flush/FUA requests.
I regularly use qemu-kvm VM with cache mode "unsafe". The VM can crash
all day long and at most I lose ~30s of the most recent writes,
depending on the fsync policy of the application doing the writes. But
the file system mounts normally otherwise following the crash. However
if the host crashes while the guest is writing, that file system can
be irreparably damaged. This is expected. So you might want to check
the cache policy being used, make sure that the guest VM is really
shutting down properly before rebooting/shutting down the host.
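
For libvirt/qemu-kvm guests (the setup described above, not VMware), one
way to check the cache policy is to look at the cache attribute on each
disk's <driver> element in the domain XML. A minimal sketch, assuming the
XML comes from "virsh dumpxml" with libvirt's usual single-quoted
attributes; the function name disk_cache_modes is made up for
illustration. A disk without a cache attribute prints nothing, which means
the hypervisor default is in effect.

```shell
# Print the cache= value of every <driver> element in libvirt domain XML
# read from stdin. disk_cache_modes is a hypothetical helper name.
disk_cache_modes() {
    sed -n "s/.*<driver[^>]*cache='\([^']*\)'.*/\1/p"
}
```

Usage would look like "virsh dumpxml GUESTNAME | disk_cache_modes", where
GUESTNAME is a placeholder for the domain name.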
>
> I ran btrfs check in emergency mode and it came up with a lot of errors.
>
> How do I recover the partition(s) so I can boot the system, or at least
> mount them?
I'd start with
mount -o ro,nologreplay,rescue=usebackuproot
Followed by
mount -o ro,nologreplay,rescue=all
The second one is a bit of a heavy hammer but it's safe insofar as
it's mounting the fs read only and making no changes. It is also
disabling csum checking so any corrupt files still get copied out, and
without any corruption warnings. You can check man 5 btrfs to read a
bit more about the other options and vary the selection. This is
pretty much a recovery operation, i.e. get the important data out.
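
The two mount attempts above can be sketched as a small helper that tries
the gentler option set first and only falls back to rescue=all if that
fails. This is an illustration, not a tested recovery tool: the function
name try_rescue_mounts and the MOUNT_CMD override (there only so the logic
can be dry-run without touching a real device) are assumptions, and the
device and mount point are placeholders you must supply.

```shell
# Try the read-only rescue mounts in order of increasing bluntness.
# Usage: try_rescue_mounts DEVICE MOUNTPOINT
# MOUNT_CMD is a hypothetical override for dry runs; it defaults to mount.
try_rescue_mounts() {
    dev=$1
    mnt=$2
    for opts in "ro,nologreplay,rescue=usebackuproot" \
                "ro,nologreplay,rescue=all"; do
        if ${MOUNT_CMD:-mount} -o "$opts" "$dev" "$mnt"; then
            echo "mounted with: $opts"
            return 0
        fi
    done
    # Neither option set worked; the caller decides what to do next.
    return 1
}
```

For example, "try_rescue_mounts /dev/sdXY /mnt" run as root; both option
sets mount read-only, so nothing is written to the file system either way.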
The repair commands for this particular set of errors:
btrfs rescue zero-log
btrfs check --repair --init-extent-tree
btrfs check --repair
I have somewhat low confidence that these will repair the file system
rather than make things worse. So you should start out with the earlier
mount commands to get anything important out of the fs first. If those
don't work and there's important information to get out, you need to use
btrfs restore.
--
Chris Murphy
Re: BTRFS partition corrupted after deleting files in /home
by Sreyan Chakravarty
On Mon, Jan 4, 2021 at 1:16 AM Chris Murphy <lists(a)colorremedies.com> wrote:
>
> Try to mount normally, then:
I am unable to mount normally :
# mount -t btrfs /dev/mapper/dm_crypt /mnt/
mount: /mnt: wrong fs type, bad option, bad superblock on
/dev/mapper/dm_crypt, missing codepage or helper program, or other
error.
>
> dmesg
This is what I get in dmesg:
[29867.234062] BTRFS info (device dm-4): disk space caching is enabled
[29867.234067] BTRFS info (device dm-4): has skinny extents
[29867.317955] BTRFS error (device dm-4): parent transid verify failed on 55640064 wanted 44146 found 44438
[29867.326701] BTRFS error (device dm-4): parent transid verify failed on 55640064 wanted 44146 found 44438
[29867.326727] BTRFS warning (device dm-4): failed to read root (objectid=9): -5
[29867.333668] BTRFS error (device dm-4): open_ctree failed
> btrfs check --readonly
A lot of errors, could not even upload to pastebin.
This is in my Google Drive:
https://drive.google.com/file/d/1dpW7aftB3FuD8i1J7d4nRrzZHaGF4vuN/view?us...
Let me know if you are not able to download. It's compressed via gzip.
>
> mount -o ro,usebackuproot
>
mount -o ro,usebackuproot /dev/mapper/dm_crypt /mnt/
mount: /mnt: wrong fs type, bad option, bad superblock on
/dev/mapper/dm_crypt, missing codepage or helper program, or other
error.
Something is horribly wrong.
--
Regards,
Sreyan Chakravarty
maybe OT
by Paolo Galtieri
I'm having issues with a VM.
The VM was originally created under VMware and has worked fine for a
while. Today when I booted it up instead of seeing the usual MATE login
screen I get a login prompt:
f34-01-vm:
no matter what I enter, root or pgaltieri as login it never asks for
password and immediately says login incorrect. While it's booting I see
several [FAILED]... messages, e.g. [FAILED] to start CUPS Scheduler
I booted the system again and this time it dropped into emergency mode.
In emergency mode I see the following messages in dmesg:
BTRFS info (device sda2): flagging fs with big metadata feature
BTRFS info (device sda2): disk space caching is enabled
BTRFS info (device sda2): has skinny extents
BTRFS info (device sda2): start tree-log replay
BTRFS info (device sda2): parent transid verify failed on 61849600 wanted 145639 found 145637
BTRFS info (device sda2): parent transid verify failed on 61849600 wanted 145639 found 145637
BTRFS: error (device sda2) in btrfs_replay_log:2423 errno=-5 IO failure (Failed to recover log tree)
BTRFS error (device sda2) open_ctree failed
I ran btrfs check in emergency mode and it came up with a lot of errors.
How do I recover the partition(s) so I can boot the system, or at least
mount them?
Also in emergency mode:
vi /run/initramfs/rdsosreport.txt
results in:
/usr/bin/vi: line 23: /usr/libexec/vi: No such file or directory
/usr/bin/vi is a script:
if test -f /usr/bin/vim
then
exec /usr/bin/vim "$@"
fi
exec /usr/libexec/vi "$@"
neither /usr/bin/vim nor /usr/libexec/vi exist.
======================================================================================
I tried booting the vm under VirtualBox with the same result.
I converted the image:
qemu-img convert -O qcow ../VMware/VMs/f34-01-vm/f34-01-vm.vmdk f34-01-vm.qcow2
which worked without errors. I then ran virt-manager to try to boot the
image. This fails with this error:
Unable to complete install: 'internal error: process exited while
connecting to monitor: 2022-03-18T19:13:15.196710Z qemu-system-x86_64:
-blockdev
{"driver":"file","filename":"/run/media/pgaltieri/SDNVIRTLAB02/VirtualMachines/KVM/f34-01-vm.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}:
Could not open
'/run/media/pgaltieri/SDNVIRTLAB02/VirtualMachines/KVM/f34-01-vm.qcow2':
Permission denied'
Traceback (most recent call last):
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in
cb_wrapper
callback(asyncjob, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/createvm.py", line 2001, in
_do_async_install
installer.start_install(guest, meter=meter)
File "/usr/share/virt-manager/virtinst/install/installer.py", line
701, in start_install
domain = self._create_guest(
File "/usr/share/virt-manager/virtinst/install/installer.py", line
649, in _create_guest
domain = self.conn.createXML(install_xml or final_xml, 0)
File "/usr/lib64/python3.9/site-packages/libvirt.py", line 4366, in
createXML
raise libvirtError('virDomainCreateXML() failed')
libvirt.libvirtError: internal error: process exited while connecting to
monitor: 2022-03-18T19:13:15.196710Z qemu-system-x86_64: -blockdev
{"driver":"file","filename":"/run/media/pgaltieri/SDNVIRTLAB02/VirtualMachines/KVM/f34-01-vm.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}:
Could not open
'/run/media/pgaltieri/SDNVIRTLAB02/VirtualMachines/KVM/f34-01-vm.qcow2':
Permission denied
I added my user id to both the qemu and libvirt entries in /etc/group
and logged out and logged back in and I get the same error. I also get
SELinux alerts:
The first alert:
You need to change the label on f34-01-vm.qcow2'
# semanage fcontext -a -t virt_image_t '/run/media/pgaltieri/SDNVIRTLAB02/VirtualMachines/KVM/f34-01-vm.qcow2'
# restorecon -v '/run/media/pgaltieri/SDNVIRTLAB02/VirtualMachines/KVM/f34-01-vm.qcow2'
subsequent alerts tell me to run:
# /sbin/restorecon -v /run/media/pgaltieri/SDNVIRTLAB02/VirtualMachines/KVM/f34-01-vm.qcow2
I have run these commands, especially the restorecon, several times and
I still get the alerts.
One thing: the semanage command as shown fails with:
ValueError: File spec
/run/media/pgaltieri/SDNVIRTLAB02/VirtualMachines/KVM/f34-01-vm.qcow2
conflicts with equivalency rule '/run /var/run'; Try adding
'/var/run/media/pgaltieri/SDNVIRTLAB02/VirtualMachines/KVM/f34-01-vm.qcow2'
instead
If I add the /var then it works.
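
The path change that semanage asks for can be sketched as a one-line
rewrite from the /run form to the /var/run form. The helper name
to_semanage_spec is made up for illustration; it only rewrites a leading
/run/ and leaves every other path untouched.

```shell
# Rewrite a path under /run to its /var/run form, as the equivalency
# rule '/run /var/run' requires; other paths pass through unchanged.
# to_semanage_spec is a hypothetical helper name.
to_semanage_spec() {
    printf '%s\n' "$1" | sed 's|^/run/|/var/run/|'
}
```

It could then be used as
semanage fcontext -a -t virt_image_t "$(to_semanage_spec "$img")"
where img is a placeholder variable holding the image path.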
Here is the context of the file:
-rwxrwxrwx. 1 pgaltieri pgaltieri system_u:object_r:fusefs_t:s0 15041695744 Mar 18 11:46 f34-01-vm.qcow2*
So how the heck do I boot the image and get it running?
Paolo
Fw: mount fails for btrfs filesystem, need help please.
by George R Goffe
Hi,
I started getting I/O error messages accessing this filesystem so I rebooted the system. This might have been the wrong thing to do. The subsequent boot went to maintenance mode due to the filesystem's path being in /etc/fstab.
I need some help with this please. Here is what mount says:
mount /dev/sda6 /opt
kernel: BTRFS info (device sda6): flagging fs with big metadata feature
kernel: BTRFS info (device sda6): disk space caching is enabled
kernel: BTRFS info (device sda6): has skinny extents
kernel: BTRFS error (device sda6): parent transid verify failed on 148312850432 wanted 73476 found 73484
kernel: BTRFS error (device sda6): failed to read block groups: -5
kernel: BTRFS error (device sda6): open_ctree failed