[SOLVED] Ext4 error?

fedora fedora at ayni.com
Sun Aug 1 06:28:38 UTC 2010


Hi Bill,
on July 5 you kindly replied to my message.
Over the last few days I believe I have solved the problem by phasing out
LVM. As you said, LVM adds considerable overhead and makes the system more
error prone. What was previously on a logical volume is now on a real
partition. Sequential disk access is now about twice as fast as with LVM,
as can be seen when resuming from hibernation: 116 MB/s as opposed to 60 MB/s.

suomi
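
A note on the figures above: 116 MB/s against 60 MB/s is a ratio of roughly 1.9. For anyone who wants to repeat a similar comparison outside of a hibernate/resume cycle, here is a minimal Python sketch. It is not how the numbers in this message were obtained; the function name and the chunk size are illustrative assumptions, and a file that is already in the page cache will read much faster than the disk itself can deliver.

```python
import time


def sequential_read_mb_per_s(path, chunk_size=1024 * 1024):
    """Read `path` from start to end and return throughput in MB/s (10**6 bytes/s)."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / elapsed / 1e6


def speedup(new_mb_per_s, old_mb_per_s):
    """Ratio between two throughput figures."""
    return new_mb_per_s / old_mb_per_s
```

Calling `speedup(116, 60)` gives the ratio for the two figures quoted above. For a meaningful disk measurement, the file read should be larger than RAM or the cache should be dropped first.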




fedora wrote:
 > Hi listers
 >
 > i got file system errors on a new machine (hw errors should therefore
 > not be an issue, also smartctl does not indicate any errors), which
 > holds two disks on SATA controllers. Both disks contain a fully fledged
 > Fedora 13, so that i can boot from either of them.
 >
 > i usually boot from the first disk, and i take care not to cross-mount
 > the second disk or to unmount cross-mounts before hibernating.
 >
 > [root at myws ~]# uname -a
 > Linux myws.lan 2.6.33.5-124.fc13.x86_64 #1 SMP Fri Jun 11 09:38:12 UTC
 > 2010 x86_64 x86_64 x86_64 GNU/Linux
 > [root at myws ~]#
 >
 > The complete log of a boot cycle follows in the next message.
 >
 > The file system error manifests itself as follows in /var/log/messages:
 >
 >
 > Jul  5 07:04:59 myws kernel: EXT4-fs error (device dm-0):
 > ext4_free_inode: bit already cleared for inode 136802
 > Jul  5 07:04:59 myws kernel: EXT4-fs error (device dm-0):
 > ext4_free_inode: bit already cleared for inode 136803
 >
 >
 > When this error occurs i can no longer do such simple things as
 >
 > touch /tmp/abcd.txt
 >
 > which at this time gives me "No such file or directory"
 >
 > to shut down the system, i usually use the hibernate function (i.e. save
 > to the swap space), i mostly do not reboot the system. But then, after
 > some resume/thaw cycles from the swap space, the above error happens,
 > and i have to reboot.
 >
 > when rebooting, the system goes through one or two fsck cycles with
 > "File System has been modified, reboot needed" and reboots itself.
 >
 > when the system comes up after that, the above error does not happen
 > anymore, but i am not sure, whether the system is in the same state as
 > before, i.e. i am not sure, whether i have lost data.
 >

 > As you can see from the boot-log, the system has 4 CPUs, which made me
 > think that this is a "write barriers" issue, but from kernel 2.6.31 on,
 > write barriers in multi processor systems should pose no problems any
 > more.
 >
 >
 > questions:
 > 1. is this a heavy issue, i.e. does this "error" corrupt my system with
 > time?
 >
 > 2. what can i do to avoid this ext4 error, if it were an error? going
 > back to ext3 is considered no solution.
 >
 > thanks for any hints.
 >
You have multiple boot drives, LVM, barriers with SMP, and repeated hibernate.
You didn't mention compiling with suspend2 patches (or whatever it's called
today), have you done that, too? I would start by not hibernating and seeing if
that's the issue, turn off barriers and see if that's the issue. Right now you
call this an ext4 problem, but I've been running TB of storage on ext4 since the
early FC9 days, and not having issues. But stock hibernate has issues on some of
my machines, barrier code is still changing and has had issues with SMP in the
past, and LVM is really not needed unless things are likely to change (and adds
overhead, and possibly has issues with barriers).

I have the feeling that you have an overly high ratio of solution to problem on
the complexity scale.

-- 
Bill Davidsen <davidsen at tmr.com>
    "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
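
When testing one change at a time as suggested in this thread (no hibernation, then no barriers, then no LVM), it helps to count how often the EXT4-fs errors show up after each change. The following is a minimal Python sketch under stated assumptions: the regular expression is modelled only on the two log lines quoted earlier in this thread (with the mail wrapping undone), the function name is made up for illustration, and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Pattern modelled on the two log lines quoted in this thread (an assumption;
# other kernels may word or lay out these messages differently).
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+): (?P<message>.*)"
)


def summarize_ext4_errors(lines):
    """Count EXT4-fs error lines per (device, reporting function)."""
    counts = Counter()
    for line in lines:
        match = EXT4_ERROR.search(line)
        if match:
            counts[(match["device"], match["function"])] += 1
    return counts


# The two lines from the original report, with the mail wrapping undone.
SAMPLE = [
    "Jul  5 07:04:59 myws kernel: EXT4-fs error (device dm-0): "
    "ext4_free_inode: bit already cleared for inode 136802",
    "Jul  5 07:04:59 myws kernel: EXT4-fs error (device dm-0): "
    "ext4_free_inode: bit already cleared for inode 136803",
]

if __name__ == "__main__":
    for (device, function), n in summarize_ext4_errors(SAMPLE).items():
        print(f"{device} {function}: {n} error(s)")
```

To run it against a real log, pass an open file object for /var/log/messages instead of `SAMPLE`; reading that file usually requires root.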


