OT - Journaling File Systems?
Edwards, Scott (MED, Kelly IT Resouces)
James.Edwards at med.ge.com
Tue Apr 27 17:03:35 UTC 2004
Does anyone know of any comparisons of ext3, jfs, xfs and reiser for
reliability? I have googled for it and I found several comparisons for
how fast they are and how well they store large and small files, but I
haven't found any that really talk about how well they recover.
I have been testing them myself and the results have not been
what I expected. I used Fedora Core 2 Test 1 for testing (because I had
so much trouble getting FC2T2 to install). I tested it by having the
NFS client running four different processes accessing it. One was
running the fsstress test and the others were just writing the date into
a file at different rates.
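
For anyone wanting to reproduce the "write the date into a file" part of the load, a writer along these lines would do. This is only a sketch in Python, not the script used in the tests above: the function name, the sequence-number prefix and the fsync option are assumptions added for illustration. The sequence number makes it easy to see afterwards which lines survived a plug pull.

```python
import os
import time
from datetime import datetime, timezone


def write_dates(path, interval_s, count, sync=True):
    """Append `count` lines of the form '<seq> <UTC timestamp>' to `path`.

    Hypothetical helper: with sync=True every line is flushed and fsync'ed,
    so a line that was reported as written should survive a power cut.
    """
    with open(path, "a", encoding="ascii") as f:
        for seq in range(count):
            stamp = datetime.now(timezone.utc).isoformat()
            f.write(f"{seq} {stamp}\n")
            if sync:
                f.flush()              # push Python's buffer to the kernel
                os.fsync(f.fileno())   # ask the kernel to push it to disk
            if interval_s > 0:
                time.sleep(interval_s)
    return count
```

Running several of these with different `interval_s` values against the NFS mount gives the "different rates" load; with `sync=False` the test also exercises how much buffered data each file system loses.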
First I tested ext3. A normal bootup was about 21 seconds. A bootup
after a 'plug pull' was usually about 40 seconds, but a couple of times
it was only 22 seconds (this does not include the 5 second wait which I
have to remove - since in actual operation it won't have a keyboard or
screen). I never saw any error/warning messages or data corruption.
Other than the bootup time doubling most of the time, it was good.
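
"No data corruption" is easier to judge with a mechanical check after each reboot. The following is a minimal sketch, assuming a test file whose lines have the hypothetical form '<seq> <anything>' with sequence numbers counting up from 0; that format is an assumption for illustration, not something the tests above are known to have used.

```python
def check_log(path):
    """Return a list of problems found in a '<seq> <text>' log file."""
    problems = []
    with open(path, "rb") as f:
        lines = f.read().split(b"\n")
    # A complete file ends with a newline, so the last split element is empty.
    if lines[-1] != b"":
        problems.append("final line is truncated (no trailing newline)")
    expected = 0
    for lineno, line in enumerate(lines[:-1], start=1):
        if b"\x00" in line:
            problems.append(f"line {lineno}: contains NUL bytes")
            expected = None  # cannot tell what should come next
            continue
        try:
            seq = int(line.split(b" ", 1)[0])
        except ValueError:
            problems.append(f"line {lineno}: unparseable")
            expected = None
            continue
        if expected is not None and seq != expected:
            problems.append(f"line {lineno}: expected seq {expected}, got {seq}")
        expected = seq + 1
    return problems
```

An empty list means every line is intact and in order. A truncated last line alone is usually harmless (the write was in flight when power was lost), whereas gaps or NUL bytes in the middle point at real damage.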
(It was at this point that I discovered that Anaconda will only allow
formatting in ext2 or ext3. I was disappointed to see that it didn't
give one the option to use any of the other journalling file systems.
To test the others I had to boot from a Knoppix CD and reformat the
partitions by hand, then run the Fedora Install and select "do not
format the partition".)
Next I tried XFS. I was excited at first because a normal bootup was
only 18 seconds. The first reboot after a 'plug pull' was only 27
seconds (and I think that included the 5 second wait). I was very
excited to see this improvement over ext3. However, it was short lived.
After the second 'plug pull' it took 1 minute and 16 seconds to boot.
Worse, it reported corrupted metadata and a trashed superblock, and
could not even mount the partition. I found this comment in the
Gentoo installation instructions:
"We only recommend using this filesystem on Linux systems with high-end
SCSI and/or fibre channel storage and a uninterruptible power supply.
Because XFS aggressively caches in-transit data in RAM, improperly
designed programs (those that don't take proper precautions when writing
files to disk and there are quite a few of them) can lose a good deal of
data if the system goes down unexpectedly." So I assumed that this was
to be expected with XFS.
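
The "proper precautions" the Gentoo text mentions are not spelled out there. One common pattern for replacing a file safely is to write a temporary file, fsync it, rename it over the original, and then fsync the directory. The sketch below shows that pattern in Python; the function name is made up for illustration, and it assumes a POSIX system such as Linux.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) so that a crash leaves either
    the old contents or the new contents, never an empty or partial file."""
    dirpath = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same file system)
    # so that the final rename is atomic.
    fd, tmp = tempfile.mkstemp(dir=dirpath, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # data is on disk before the rename
        os.replace(tmp, path)      # atomic rename over the old file
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a power cut.
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Note that `mkstemp` creates the file with mode 0600, so a real tool would also copy the original file's permissions and ownership. A program that simply truncates and rewrites a file in place has a window in which a power cut can leave the file empty.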
Then I tried JFS. Anaconda didn't like the JFS partitions and got an
"unhandled exception" in the middle of the install. So I had to install
into ext3 partitions and then tar the whole thing, convert the
partitions to JFS and then restore everything. I modified fstab,
grub.conf, and initrd.img and it booted. A normal boot was 26 seconds.
The plug pull boots were around 36 seconds. But one time it rebooted
twice (it got partway into the booting and rebooted itself). One time
during booting it complained about corrupted data in some files. After
the fourth boot I realized I had forgotten to remove the line from
fstab for mounting /boot and it was complaining about that each time.
So I edited fstab and removed that line. I started up the NFS tests and
did the 'plug pull'. On reboot it complained about fstab and dropped me
into a maintenance mode command line. Sure enough the fstab was empty.
But it wouldn't let me fix anything because the file system was mounted
read-only. I couldn't seem to make it let me remount it read-write so I
rebooted into Knoppix again with the idea of repairing fstab. However,
when I tried to mount the partition in Knoppix it claimed the superblock
was trashed and I could not mount the file system at all.
Last, I tried ReiserFS. It seemed terribly slow. A normal bootup was
36 seconds. A bootup after a 'plug-pull' was 45 seconds. Then I
noticed a warning message saying that 'check' was enabled which made it
slow. So I recompiled the kernel with Reiser Check disabled. So it
really freaked me out when a normal bootup was 37 seconds!?! The bootup
after the 'plug-pull' was 44 seconds (not much of an improvement). The
second time I decided to try having it copy files when it lost power.
I started it copying /etc to /tmp and pulled the plug. This time when
it rebooted I got a "1 Fatal Corruption(s) in the root block" and it
dropped me back to the command line.
I'm completely confused now. I have been under the impression that
surviving a power loss was the main purpose of these (journaling) file
systems. I knew a guy
that worked on BeOS and he claimed that you could flick the power on and
off all day and it wouldn't lose data. Am I doing something wrong? Do
I need to set a different mode or something on these file systems so
that they can recover?
If anyone can offer any advice or point me to some other resources it
would be most appreciated.