cp on ext3 silent wrong behaviour (propose to ask for fsck after an unclean stop)

Fri Oct 5 20:08:31 UTC 2007

Excuse in advance for the long post....
Some time ago, during f5-testx phase, I asked why, after an unclean shutdown
(crash or poweroff, ecc), it was no more proposed to do an fsck (with
default of no if no answer in a due interval for example...). I found it at:

https://www.redhat.com/archives/fedora-devel-list/2005-December/msg01165.html

Now I come again, during test phase of upcoming f8, explaining a problem I
intercepted.
My system is x86_64 with a quad core cpu and both a rawhide and a f7 on
different partitions
Yesterday, while inside rawhide updated at 2-3 October, I was downloading
7.92 dvd iso but in the meanwhile I found screen  apparently frozen (f
bubbles stopped)  without no option to give input...
So I was forced to keep power off button pressed until stop.
At restart, always in rawhide, I got automatically recovery of journal for
the fs that were mounted:

/dev/sda4 my current rawhide / filesystem

/dev/sda3 my fc7 x86_64 / filesystem

No particular message; in /var/log/messages I got:
Oct  4 23:00:15 tekkaman kernel: EXT3-fs: INFO: recovery required on
readonly filesystem.
Oct  4 23:00:15 tekkaman kernel: EXT3-fs: write access will be enabled
during recovery.
Oct  4 23:00:15 tekkaman kernel: kjournald starting.  Commit interval 5
seconds
Oct  4 23:00:15 tekkaman kernel: EXT3-fs: sda4: orphan cleanup on readonly
fs
Oct  4 23:00:15 tekkaman kernel: EXT3-fs: sda4: 1 orphan inode deleted
Oct  4 23:00:15 tekkaman kernel: EXT3-fs: recovery complete.
Oct  4 23:00:15 tekkaman kernel: EXT3-fs: mounted filesystem with ordered
data mode.
...
Oct  4 23:00:15 tekkaman kernel: kjournald starting.  Commit interval 5
seconds
Oct  4 23:00:15 tekkaman kernel: EXT3 FS on sda3, internal journal
Oct  4 23:00:15 tekkaman kernel: EXT3-fs: mounted filesystem with ordered
data mode.

I deleted the firefox .part file and restarted  the download from the
beginning.
sha1sum of the file was ok
I then copied the 3.5Gb file to  the fc7 partition to be able to do a hard
disk installation of f8-t3, but for three times the resulting file was not
the same as the target, even if cp command exited normally......
both sha1sum and cksum gave different results:
[root at tekkaman testuser2]# cp -p
Fedora-7.92-x86_64-DVD.iso/fc7_x86_64/home/gcecchi/
[root at tekkaman testuser2]# sha1sum /fc7_x86_64/home/gcecchi/Fedora-
7.92-x86_64-DVD.iso
6e5ba85f7b0f48ae0a2ccbd206a6979426818ad0  /fc7_x86_64/home/gcecchi/Fedora-
7.92-x86_64-DVD.iso
[root at tekkaman testuser2]# cksum Fedora-7.92-x86_64-DVD.iso
1026324974 3543584768 Fedora-7.92-x86_64-DVD.iso
[root at tekkaman testuser2]# cksum /fc7_x86_64/home/gcecchi/Fedora-
7.92-x86_64-DVD.iso
3095081338 3543584768 /fc7_x86_64/home/gcecchi/Fedora-7.92-x86_64-DVD.iso

dmesg gave nothing particular
I tried with dd: also this command completed correctly but generating a
wrong file....
[root at tekkaman testuser2]# dd
if=Fedora-7.92-x86_64-DVD.isoof=/fc7_x86_64/home/gcecchi/Fedora-
7.92-x86
_64-DVD.iso bs=1024k
3379+1 records in
3379+1 records out
3543584768 bytes (3.5 GB) copied, 181.017 s, 19.6 MB/s

This time with dmesg I found this output (I have nvidia proprietary, so it
is tainted):

BUG journal_head (Tainted: P       ): Object padding overwritten
-----------------------------------------------------------------------------

INFO: 0xffff81007961ffbd-0xffff81007961ffbd. First byte 0x1a instead of 0x5a
INFO: Allocated in journal_add_journal_head+0x27/0x15b [jbd] age=15635 cpu=0
pid=7223
INFO: Slab 0xffff810003a85ba0 used=1 fp=0xffff81007961fe70
flags=0x78080000000083
INFO: Object 0xffff81007961ff18 @offset=3864 fp=0x0000000000000000

Bytes b4 0xffff81007961ff08:  c4 2f 93 01 01 00 00 00 5a 5a 5a 5a 5a 5a 5a
5a �/......ZZZZZZZZ
  Object 0xffff81007961ff18:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 ................
  Object 0xffff81007961ff28:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 ................
  Object 0xffff81007961ff38:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 ................
  Object 0xffff81007961ff48:  00 00 00 00 00 00 00 00 00 90 8f 20 00 81 ff
ff ..............��
  Object 0xffff81007961ff58:  d8 b9 c9 45 00 81 ff ff 00 00 00 00 00 00 00
00 ع�E..��........
  Object 0xffff81007961ff68:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 ................
 Redzone 0xffff81007961ff78:  cc cc cc cc cc cc cc
cc                         ��������
 Padding 0xffff81007961ffb8:  5a 5a 5a 5a 5a 1a 5a
5a                         ZZZZZ.ZZ

Call Trace:
 [<ffffffff8109c4c4>] check_bytes_and_report+0xa3/0xc9
 [<ffffffff88029ae5>] :jbd:journal_remove_journal_head+0x24/0x38
 [<ffffffff8109c819>] check_object+0x156/0x23a
 [<ffffffff8109d695>] __slab_free+0x1ca/0x2e6
 [<ffffffff88029ae5>] :jbd:journal_remove_journal_head+0x24/0x38
 [<ffffffff8109dee1>] kmem_cache_free+0xb9/0xe0
 [<ffffffff88029ae5>] :jbd:journal_remove_journal_head+0x24/0x38
 [<ffffffff88025827>] :jbd:journal_commit_transaction+0x4ef/0x10c0
 [<ffffffff88029920>] :jbd:kjournald+0xba/0x21d
 [<ffffffff8104a1ea>] autoremove_wake_function+0x0/0x2e
 [<ffffffff88029866>] :jbd:kjournald+0x0/0x21d
 [<ffffffff8104a0a8>] kthread+0x47/0x73
 [<ffffffff812692eb>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff8100ca88>] child_rip+0xa/0x12
 [<ffffffff81033bd2>] schedule_tail+0x6c/0xd4
 [<ffffffff8100c19c>] restore_args+0x0/0x30
 [<ffffffff8101bfda>] lapic_next_event+0x0/0xa
 [<ffffffff81049f31>] kthreadd+0x117/0x13c
 [<ffffffff8104a061>] kthread+0x0/0x73
 [<ffffffff8100ca7e>] child_rip+0x0/0x12

FIX journal_head: Restoring 0xffff81007961ffbd-0xffff81007961ffbd=0x5a

I don't know if this was related to source or target problems...
I tried to fsck the target:

[root at tekkaman testuser2]# fsck -fy /dev/sda3
fsck 1.40.2 (12-Jul-2007)
e2fsck 1.40.2 (12-Jul-2007)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/1: 326714/9768928 files (6.4% non-contiguous), 8406328/9767520 blocks

But nothing changed trying to do a copy.
So I did a
touch /forcefsck
and rebooted again in rawhide

The rawhide / filesystem was checked with no particular message....
Finally I was able to correctly copy the file....

So, if you had the patience:
wouldn't it be better to at least ask if one want to force a fsck after a
crash, with a timeout eventually, as in the old days?
Otherwise one is in need to touch the /forcefsck file and reboot again
without choice....
Or is this "only" a bug and these kind of things should never happen at all?

Thanks
Gianluca
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.fedoraproject.org/pipermail/test/attachments/20071005/629154c8/attachment.html