Ext4 error?

Bill Davidsen davidsen at tmr.com
Mon Jul 5 19:25:34 UTC 2010


fedora wrote:
> Hi listers
> 
> I got file system errors on a new machine (hardware errors should therefore 
> not be an issue, and smartctl does not indicate any errors), which 
> holds two disks on SATA controllers. Both disks contain a fully fledged 
> Fedora 13, so that I can boot from either of them.
> 
> I usually boot from the first disk, and I take care not to cross-mount 
> the second disk, or to unmount any cross-mounts before hibernating.
> 
> [root@myws ~]# uname -a
> Linux myws.lan 2.6.33.5-124.fc13.x86_64 #1 SMP Fri Jun 11 09:38:12 UTC 
> 2010 x86_64 x86_64 x86_64 GNU/Linux
> [root@myws ~]#
> 
> The complete log of a boot cycle follows in the next message.
> 
> The file system error manifests itself as follows in /var/log/messages:
> 
> 
> Jul  5 07:04:59 myws kernel: EXT4-fs error (device dm-0): 
> ext4_free_inode: bit already cleared for inode 136802
> Jul  5 07:04:59 myws kernel: EXT4-fs error (device dm-0): 
> ext4_free_inode: bit already cleared for inode 136803
> 
> 
> When this error occurs I can no longer do such simple things as
> 
> touch /tmp/abcd.txt
> 
> which at this point gives me "No such file or directory".
> 
> To shut down the system, I usually use the hibernate function (i.e. save 
> to the swap space); I mostly do not reboot the system. But then, after 
> some resume/thaw cycles from the swap space, the above error happens, 
> and I have to reboot.
> 
> When rebooting, the system goes through one or two fsck cycles with 
> "File System has been modified, reboot needed" and reboots itself.
> 
> When the system comes up after that, the above error does not happen 
> anymore, but I am not sure whether the system is in the same state as 
> before, i.e. I am not sure whether I have lost data.
> 

> As you can see from the boot log, the system has 4 CPUs, which made me 
> think that this is a "write barriers" issue, but from kernel 2.6.31 on, 
> write barriers on multiprocessor systems should no longer pose problems.
> 
> 
> Questions:
> 1. Is this a serious issue, i.e. does this "error" corrupt my system over 
> time?
> 
> 2. What can I do to avoid this ext4 error, if it is an error? Going 
> back to ext3 is not considered a solution.
> 
> Thanks for any hints.
> 
You have multiple boot drives, LVM, barriers with SMP, and repeated hibernation.
You didn't mention compiling with the suspend2 patches (or whatever they're called 
today); have you done that, too? I would start by not hibernating and seeing if 
that's the issue, then turn off barriers and see if that's the issue. Right now you 
call this an ext4 problem, but I've been running terabytes of storage on ext4 since 
the early FC9 days without issues. Stock hibernate, on the other hand, has issues on 
some of my machines, the barrier code is still changing and has had issues with SMP 
in the past, and LVM is really not needed unless things are likely to change (it adds 
overhead, and possibly has issues with barriers).
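
For the barrier test, a rough sketch of how one might check and temporarily
change the setting follows. The helper name barrier_opts is made up for
illustration, and it assumes the kernel lists a barrier option in /proc/mounts
at all, which varies by kernel version. barrier=0 is the documented ext4 mount
option for turning barriers off; running without barriers risks corruption on
power loss, so treat it as a short diagnostic, not a fix.

```shell
# Print the barrier-related mount option(s) for a mount point.
# Usage: barrier_opts MOUNTPOINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts. If the mount point is listed more
# than once (e.g. rootfs plus the real root), the last entry wins.
barrier_opts() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            found = ""
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^(no)?barrier(=.*)?$/)
                    found = found (found != "" ? "," : "") opts[i]
            last = found
            seen = 1
        }
        END { if (seen) print (last != "" ? last : "(not listed)") }
    ' "${2:-/proc/mounts}"
}

# Example session, as root, for a one-off test without editing /etc/fstab:
#   barrier_opts /
#   mount -o remount,barrier=0 /
#   barrier_opts /
```

A remount like this does not survive a reboot, which is the point: it changes
one thing for one test run.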

I have the feeling that you have an overly high ratio of solution to problem on 
the complexity scale.
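
To narrow this down one variable at a time, it helps to have a quick way to
see whether the errors came back after each change. The sketch below counts
"EXT4-fs error" lines per device in a syslog-style file; the function name
ext4_error_counts is hypothetical, and it assumes the log format matches the
lines quoted above.

```shell
# Count EXT4-fs error lines per device in a syslog-style log file.
# Usage: ext4_error_counts [LOGFILE]   (LOGFILE defaults to /var/log/messages)
# Prints one "DEVICE COUNT" line per device, or nothing if no errors are found.
ext4_error_counts() {
    grep -F 'EXT4-fs error (device ' "${1:-/var/log/messages}" |
        sed -e 's/.*EXT4-fs error (device \([^)]*\)).*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Run it after a few resume cycles without barriers, and again after a few days
without hibernating at all, and compare.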

-- 
Bill Davidsen <davidsen at tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

