People,
I know this is not strictly a Fedora issue but I only use Fedora so I am hoping people here can help - maybe we should have a separate mailing list or forum topic for this sort of hard disk stuff?
Just after a full backup (fortunately) the 7.2TB /home partition (/dev/sda5) on my email server somehow got corrupted. After I realised there was a problem, I unmounted the partition and tried:
e2fsck -y /dev/sda5
but the process hangs after “Clone multiply-claimed blocks<y>?” and the disk goes quiet - I could still break out with CTRL-C but I can't get past this point in the attempted fix process. So I thought I would just produce a list of the affected files and then just delete the inodes or just restore from backup but when I tried:
debugfs -R "ncheck 187536544" /dev/sda5
it took hours to find nothing but printed screenfuls of:
ncheck: "Directory block checksum" does not match directory block while calling ext2_dir_iterate
and there are 1069 inodes to check!
I am guessing that if I just try to delete each of the inodes with:
debugfs -R "clri <inode>" /dev/sda5
that it would take weeks! So unless someone can suggest a faster method of fixing the partition (mainly just as an exercise now) or at least just working out what is wrong with it, I guess I will just have to re-create the partition?
Thanks,
Phil.
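Editor's sketch, not part of the original post: one reason the per-inode approach looks so slow is that every "debugfs -R ncheck" call walks the directory tree again. debugfs's ncheck accepts several inode numbers in one request, and debugfs can read commands from a file with -f, so the 1069 inodes could in principle be handled in one or a few passes. The sketch below only prepares the requests. It assumes the e2fsck output was saved to a file (fsck.log and the other file names are hypothetical) and that its pass 1B lines have the form shown in the comment. Note also that clri modifies the filesystem, so debugfs has to be opened read-write with -w for it to work at all.

```shell
#!/bin/sh
# Sketch: batch the per-inode debugfs work into a single invocation.
# Assumption: saved e2fsck output contains pass 1B lines shaped like
#   "Multiply-claimed block(s) in inode 187536544: 1234 1235"

# Print the unique inode numbers named in pass 1B lines read from stdin.
dup_inodes() {
    sed -n 's/^Multiply-claimed block(s) in inode \([0-9][0-9]*\):.*/\1/p' | sort -un
}

# Turn inode numbers on stdin (one per line) into one "ncheck" request.
ncheck_request() {
    printf 'ncheck'
    while read -r ino; do
        printf ' %s' "$ino"
    done
    printf '\n'
}

# Turn inode numbers on stdin into a debugfs command file of clri lines.
clri_commands() {
    sed 's/.*/clri <&>/'
}

# Usage (hypothetical file names; the ncheck step is read-only):
#   dup_inodes < fsck.log > inodes.txt
#   debugfs -R "$(ncheck_request < inodes.txt)" /dev/sda5
#   clri_commands < inodes.txt > clri.cmds
#   debugfs -w -f clri.cmds /dev/sda5    # destructive: clears those inodes
```

With directory checksums already failing, ncheck may still be unable to resolve many names, so this only reduces the number of passes; it does not make the damage recoverable.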
On 8/30/20 12:07 AM, Philip Rhoades wrote:
that it would take weeks! So unless someone can suggest a faster method of fixing the partition (mainly just as an exercise now) or at least just working out what is wrong with it, I guess I will just have to re-create the partition?
Since you have a very recent full backup, I would recommend just reformatting the partition. If the damage is that extensive, it's not worth trying to fix it.
Samuel,
On 2020-08-30 17:12, Samuel Sieb wrote:
On 8/30/20 12:07 AM, Philip Rhoades wrote:
that it would take weeks! So unless someone can suggest a faster method of fixing the partition (mainly just as an exercise now) or at least just working out what is wrong with it, I guess I will just have to re-create the partition?
Since you have a very recent full backup, I would recommend just reformatting the partition. If the damage is that extensive, it's not worth trying to fix it.
You are probably right - it would be an interesting exercise if it could be done though . .
P.
On 8/30/20 3:01 AM, Philip Rhoades wrote:
Samuel,
On 2020-08-30 17:12, Samuel Sieb wrote:
On 8/30/20 12:07 AM, Philip Rhoades wrote:
that it would take weeks! So unless someone can suggest a faster method of fixing the partition (mainly just as an exercise now) or at least just working out what is wrong with it, I guess I will just have to re-create the partition?
Since you have a very recent full backup, I would recommend just reformatting the partition. If the damage is that extensive, it's not worth trying to fix it.
You are probably right - it would be an interesting exercise if it could be done though . .
There is no point in even trying. "Multiply claimed blocks" is an unrecoverable situation. Sure, each block can be cloned to give each file its own copy. The filesystem will now be consistent, but some files will have corrupted content. Only one of those files (probably the newest one) claiming the block will hold its correct data. It's the job of _fsck_ to make the filesystem consistent, and not necessarily to preserve user data in the process. Having a consistent filesystem with corrupted file content is arguably a worse situation than a filesystem with known, detectable corruption.
On Sun, 2020-08-30 at 08:10 -0500, Robert Nichols wrote:
On 8/30/20 3:01 AM, Philip Rhoades wrote:
Samuel,
On 2020-08-30 17:12, Samuel Sieb wrote:
On 8/30/20 12:07 AM, Philip Rhoades wrote:
that it would take weeks! So unless someone can suggest a faster method of fixing the partition (mainly just as an exercise now) or at least just working out what is wrong with it, I guess I will just have to re-create the partition?
Since you have a very recent full backup, I would recommend just reformatting the partition. If the damage is that extensive, it's not worth trying to fix it.
You are probably right - it would be an interesting exercise if it could be done though . .
There is no point in even trying. "Multiply claimed blocks" is an unrecoverable situation. Sure, each block can be cloned to give each file its own copy. The filesystem will now be consistent, but some files will have corrupted content. Only one of those files (probably the newest one) claiming the block will hold its correct data. It's the job of _fsck_ to make the filesystem consistent, and not necessarily to preserve user data in the process. Having a consistent filesystem with corrupted file content is arguably a worse situation than a filesystem with known, detectable corruption.
I always used to tell people that "an empty filesystem is consistent".
poc
On Sun, 30 Aug 2020 at 04:08, Philip Rhoades phil@pricom.com.au wrote:
People,
I know this is not strictly a Fedora issue but I only use Fedora so I am hoping people here can help - maybe we should have a separate mailing list or forum topic for this sort of hard disk stuff?
Just after a full backup (fortunately) the 7.2TB /home partition (/dev/sda5) on my email server somehow got corrupted.
Think about possible hardware issues including: overheating, bad cables, failed disk. smartmontools can tell you about problems with the drive and run the drive's built-in tests. Some vendors will issue a warranty return authorization on the strength of linux smartctl results.
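Editor's sketch, not part of the original reply: the smartmontools checks mentioned above might look like the following. The function only prints the commands so they can be reviewed and then run as root. It assumes the whole disk behind /dev/sda5 is /dev/sda, which the thread does not state explicitly.

```shell
#!/bin/sh
# Sketch: drive-level checks with smartmontools before any filesystem repair.
# Assumption: the disk holding /dev/sda5 is /dev/sda.

# Print the smartctl commands for a device, one per line, for review.
smart_plan() {
    dev=$1
    printf 'smartctl -H %s\n' "$dev"           # overall health self-assessment
    printf 'smartctl -A %s\n' "$dev"           # vendor attributes, e.g. reallocated sectors
    printf 'smartctl -t long %s\n' "$dev"      # start the drive's extended self-test
    printf 'smartctl -l selftest %s\n' "$dev"  # read the self-test log once it finishes
}

smart_plan /dev/sda
```

The extended self-test runs inside the drive and can take many hours on a disk of this size; the self-test log is only meaningful after it completes.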
After I realised there was a problem, I unmounted the partition and tried:
e2fsck -y /dev/sda5
but the process hangs after “Clone multiply-claimed blocks<y>?” and the disk goes quiet - I could still break out with CTRL-C but I can't get past this point in the attempted fix process. So I thought I would just produce a list of the affected files and then just delete the inodes or just restore from backup but when I tried:
debugfs -R "ncheck 187536544" /dev/sda5
it took hours to find nothing but printed screenfuls of:
ncheck: "Directory block checksum" does not match directory block while calling ext2_dir_iterate
and there are 1069 inodes to check!
I am guessing that if I just try to delete each of the inodes with:
debugfs -R "clri <inode>" /dev/sda5
that it would take weeks! So unless someone can suggest a faster method of fixing the partition (mainly just as an exercise now) or at least just working out what is wrong with it, I guess I will just have to re-create the partition?
I wouldn't spend any time on this drive until I had confidence in the hardware.
On Sun, Aug 30, 2020, 2:01 AM Philip Rhoades phil@pricom.com.au wrote:
Samuel,
On 2020-08-30 17:12, Samuel Sieb wrote:
On 8/30/20 12:07 AM, Philip Rhoades wrote:
that it would take weeks! So unless someone can suggest a faster method of fixing the partition (mainly just as an exercise now) or at least just working out what is wrong with it, I guess I will just have to re-create the partition?
Since you have a very recent full backup, I would recommend just reformatting the partition. If the damage is that extensive, it's not worth trying to fix it.
You are probably right - it would be an interesting exercise if it could be done though . .
It's pretty good.
You could file a bug against e2fsprogs in RHBZ. But I think you'll get a faster response just going direct to the linux-ext4 list.
Are there any kernel messages at the time of the original problem? Complete dmesg preferred.
smartctl -x /dev/ as well.
And let them know the e2fsprogs version.
-- Chris Murphy
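Editor's sketch, not part of the original reply: the information requested above could be gathered roughly as follows before writing to the linux-ext4 list. The function prints the commands rather than running them. The device names (/dev/sda for the disk, /dev/sda5 for the filesystem) and the output file names are assumptions for illustration, and the dumpe2fs line is an extra that was not asked for in the thread.

```shell
#!/bin/sh
# Sketch: list the commands whose output a bug report would likely want.
# Assumptions: disk is /dev/sda, filesystem is /dev/sda5, file names are made up.

# Print one collection command per line for the given disk and partition.
report_plan() {
    disk=$1
    part=$2
    printf 'dmesg > dmesg.txt\n'                             # complete kernel log
    printf 'smartctl -x %s > smartctl.txt\n' "$disk"         # full SMART report
    printf 'e2fsck -V > e2fsprogs-version.txt 2>&1\n'        # e2fsprogs version
    printf 'dumpe2fs -h %s > superblock.txt 2>&1\n' "$part"  # superblock summary (extra)
}

report_plan /dev/sda /dev/sda5
```

The kernel log is most useful if it still covers the time of the original failure; after a reboot, the previous boot's messages would have to come from the system journal instead.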