Hi, I have a Fedora 34 system that I'm using as a mail server, and for the past few weeks it has had a kernel crash at 6:30am every morning. Sometimes the server goes catatonic and unresponsive; other times it just reports the kernel crash and keeps running.
It looks like it may be caused by rsync and/or some crypt library?
I've let it run through memtest86, and it passed without any errors. I've also tried the previous three or four kernels over the last week or ten days, and it appears to happen with all of them.
Here's a bit of the kernel message from dmesg:

------------[ cut here ]------------
WARNING: CPU: 4 PID: 633983 at kernel/exit.c:739 do_exit+0x37/0xa90
general protection fault, probably for non-canonical address 0xcc2a8cfcb62a56a1: 0000 [#1] SMP PTI
CPU: 4 PID: 633983 Comm: rsync Not tainted 5.14.18-200.fc34.x86_64 #1
Hardware name: To be filled by O.E.M. To be filled by O.E.M./P8B-M Series, BIOS 6801 05/07/2018
RIP: 0010:__bio_crypt_clone+0x28/0x60
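To compare crashes from different mornings, it can help to pull the same few fields out of each oops: the process (Comm), PID, kernel version, taint state, and the faulting function. The following is a minimal Python sketch; its regular expressions are assumptions written against the excerpts posted in this thread, not a complete parser for every oops format, and summarize_oops() is a hypothetical helper name.

```python
import re

# Regexes written against the oops excerpts in this thread; other kernels or
# architectures may format these lines differently.
CPU_RE = re.compile(
    r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+) "
    r"(?P<taint>Not tainted|Tainted:[A-Z ]*?) ?(?P<kernel>\d\S*) #"
)
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
BUG_RE = re.compile(r"kernel BUG at (?P<where>\S+?)!")


def summarize_oops(text: str) -> dict:
    """Collect the fields that identify an oops: process, kernel, and where it died."""
    summary = {}
    cpu = CPU_RE.search(text)  # skips "WARNING: CPU: .. PID: .. at ..." lines (no Comm:)
    if cpu:
        summary.update(cpu.groupdict())
    rip = RIP_RE.search(text)
    if rip:
        summary["rip_function"] = rip.group("func")
    bug = BUG_RE.search(text)
    if bug:
        summary["bug_at"] = bug.group("where")
    return summary
```

For example, the previous boot's kernel log can be saved with journalctl -k -b -1 and its text passed to summarize_oops().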
abrt-cli list shows that it's not reportable.
I don't see any similar reports mentioning "general protection fault, probably for non-canonical" within the last year.
Anyone else experiencing similar problems with the latest kernels?
Hi,
On Sat, Feb 12, 2022 at 9:26 PM Joe Zeff joe@zeff.us wrote:
On 2/12/22 15:59, Alex wrote:
Anyone else experiencing similar problems with the latest kernels?
The fact that it happens at the same time every day makes me wonder if there's some job that's in process causing it.
Yes, there is an automated backup using rsync around that time, but there should be no userland process that could ever cause a kernel crash.
CPU: 4 PID: 633983 Comm: rsync Not tainted 5.14.18-200.fc34.x86_64 #1
Thanks, Alex
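Joe's hunch about a scheduled job running at crash time can be checked mechanically. Below is a minimal sketch, assuming classic five-field crontab syntax, that lists entries whose minute and hour fields fire at a given time (06:30 here). It only understands '*', plain numbers, ranges, '*/n' and 'a-b/n' steps, and comma lists; it ignores the day fields and skips @daily-style and environment lines. For the system-wide /etc/crontab the returned text starts with the user field. On Fedora, jobs can also be systemd timers, which systemctl list-timers shows. The function names are hypothetical.

```python
def field_matches(field: str, value: int, lo: int, hi: int) -> bool:
    """Match one cron time field: '*', 'N', 'A-B', '*/S', 'A-B/S', or comma lists."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def jobs_at(crontab_text: str, hour: int, minute: int) -> list:
    """Return commands whose minute and hour fields match hour:minute.

    Day-of-month, month and day-of-week are ignored.
    """
    hits = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # environment assignments and short lines
        try:
            if field_matches(fields[0], minute, 0, 59) and field_matches(fields[1], hour, 0, 23):
                hits.append(fields[5])
        except (ValueError, ZeroDivisionError):
            continue  # @reboot/@daily, named values, or other unsupported syntax
    return hits
```

For example, jobs_at(open("/etc/crontab").read(), 6, 30) would list the system crontab entries that fire at 06:30.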
On 2/12/22 14:59, Alex wrote:
Here's a bit of the kernel message from dmesg [...] RIP: 0010:__bio_crypt_clone+0x28/0x60
bio_crypt_clone suggests something wrong in an encrypted block device. Maybe corrupt data that rsync traverses during the backup?
What does the output of "lsblk" look like for your system? What about "lvs"?
On Sun, 13 Feb 2022 10:41:36 -0800 Gordon Messmer wrote:
bio_crypt_clone suggests something wrong in an encrypted block device. Maybe corrupt data that rsync traverses during the backup?
Perhaps run the same rsync command with a -v option in a terminal and see if the crash happens on the same file every time.
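Once a few verbose logs exist, checking whether the runs stopped on the same file can be automated. A minimal sketch, assuming each run's rsync -v output is saved to its own file and that a finished run ends with rsync's usual "total size is ..." summary line: it treats the last non-summary line of every unfinished log as the file in flight. Output redirected to a file may be buffered, and a hard crash can lose data not yet written to disk, so the result is a hint rather than proof. The prefix list and function names are assumptions.

```python
from typing import List, Optional

# Lines rsync -v prints that are not file names (assumed list; extend as needed).
NON_FILE_PREFIXES = (
    "sending incremental file list",
    "receiving incremental file list",
    "building file list",
    "sent ",
    "total size is",
    "rsync:",
    "rsync error",
)


def run_completed(log_text: str) -> bool:
    """A finished rsync -v run ends with a 'total size is ...' summary line."""
    return any(line.strip().startswith("total size is") for line in log_text.splitlines())


def last_file(log_text: str) -> Optional[str]:
    """Return the last file name printed in an rsync -v log, if any."""
    last = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line and not line.startswith(NON_FILE_PREFIXES):
            last = line
    return last


def common_last_file(logs: List[str]) -> Optional[str]:
    """If every interrupted run stopped on the same file, return that file."""
    interrupted = [log for log in logs if not run_completed(log)]
    lasts = {last_file(log) for log in interrupted}
    return lasts.pop() if len(lasts) == 1 else None
```

rsync's --log-file option is another way to get a per-run record that does not depend on redirecting stdout.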
Gordon Messmer writes:
bio_crypt_clone suggests something wrong in an encrypted block device. Maybe corrupt data that rsync traverses during the backup?
What does the output of "lsblk" look like for your system? What about "lvs"?
This is worth filing in Bugzilla against the "kernel" component, notwithstanding abrt's reluctance to deal with it.
Hi,
It has now gone two days without happening again, so I've rebooted and will watch it over the coming days. Here's the output from lsblk and an fsck run as well.
I've also added -v to the rsync backup script.
# fsck /dev/md2 -r -C
fsck from util-linux 2.36.2
e2fsck 1.45.6 (20-Mar-2020)
/dev/md2: clean, 26678858/274702336 files, 1534551536/2197600512 blocks
/dev/md2: status 0, rss 11532, real 2.183027, user 1.455968, sys 0.009868
# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda           8:0    0   2.7T  0 disk
└─sda1        8:1    0   2.7T  0 part
  └─md2       9:2    0   8.2T  0 raid5 /var/backup
sdb           8:16   0  55.9G  0 disk
├─sdb1        8:17   0  51.8G  0 part
│ └─md127     9:127  0  51.8G  0 raid1 /
├─sdb2        8:18   0   501M  0 part
│ └─md126     9:126  0 500.7M  0 raid1 /boot
├─sdb3        8:19   0    96M  0 part
│ └─md125     9:125  0  95.9M  0 raid1 /boot/efi
└─sdb4        8:20   0   3.5G  0 part  [SWAP]
sdc           8:32   0   2.7T  0 disk
└─sdc1        8:33   0   2.7T  0 part
  └─md2       9:2    0   8.2T  0 raid5 /var/backup
sdd           8:48   0  55.9G  0 disk
├─sdd1        8:49   0  51.8G  0 part
│ └─md127     9:127  0  51.8G  0 raid1 /
├─sdd2        8:50   0   501M  0 part
│ └─md126     9:126  0 500.7M  0 raid1 /boot
├─sdd3        8:51   0    96M  0 part
│ └─md125     9:125  0  95.9M  0 raid1 /boot/efi
└─sdd4        8:52   0   3.5G  0 part  [SWAP]
sde           8:64   0   2.7T  0 disk
└─sde1        8:65   0   2.7T  0 part
  └─md2       9:2    0   8.2T  0 raid5 /var/backup
sdf           8:80   0   3.6T  0 disk
└─sdf1        8:81   0   3.6T  0 part
  └─md2       9:2    0   8.2T  0 raid5 /var/backup
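Gordon asked for lsblk because __bio_crypt_clone points at an encrypted block device. The posted lsblk output lists no device of TYPE crypt, only disks, partitions and md RAID arrays. A minimal sketch that does the same check mechanically, assuming JSON from lsblk -J -o NAME,TYPE: it reports any crypt devices and groups the partitions under each md array. POSTED is a hand transcription of the posted output reduced to NAME and TYPE, and the helper names are hypothetical.

```python
import json
from collections import defaultdict


def walk(devices, parent=None):
    """Yield (device, parent) pairs from lsblk's nested JSON tree."""
    for dev in devices:
        yield dev, parent
        yield from walk(dev.get("children", []), dev)


def summarize_lsblk(lsblk_json: str):
    """Return (crypt device names, {md device: sorted member names})."""
    data = json.loads(lsblk_json)
    crypt = []
    members = defaultdict(set)
    for dev, parent in walk(data["blockdevices"]):
        dev_type = dev.get("type", "")
        if dev_type == "crypt":
            crypt.append(dev["name"])
        elif dev_type.startswith("raid") and parent is not None:
            # An md array appears once under each member partition.
            members[dev["name"]].add(parent["name"])
    return crypt, {name: sorted(parts) for name, parts in members.items()}


def node(name, dev_type, *children):
    """Build one lsblk-style entry with only the NAME and TYPE columns."""
    entry = {"name": name, "type": dev_type}
    if children:
        entry["children"] = list(children)
    return entry


# Hand-transcribed from the lsblk output in this thread (NAME and TYPE only),
# standing in for the text of `lsblk -J -o NAME,TYPE`.
POSTED = json.dumps({"blockdevices": [
    node("sda", "disk", node("sda1", "part", node("md2", "raid5"))),
    node("sdb", "disk",
         node("sdb1", "part", node("md127", "raid1")),
         node("sdb2", "part", node("md126", "raid1")),
         node("sdb3", "part", node("md125", "raid1")),
         node("sdb4", "part")),
    node("sdc", "disk", node("sdc1", "part", node("md2", "raid5"))),
    node("sdd", "disk",
         node("sdd1", "part", node("md127", "raid1")),
         node("sdd2", "part", node("md126", "raid1")),
         node("sdd3", "part", node("md125", "raid1")),
         node("sdd4", "part")),
    node("sde", "disk", node("sde1", "part", node("md2", "raid5"))),
    node("sdf", "disk", node("sdf1", "part", node("md2", "raid5"))),
]})
```

On a live system, the string passed in would be the captured output of lsblk -J -o NAME,TYPE instead of POSTED.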
Hi,
This morning I noticed it had crashed again, but with a different kernel message. I also discovered that rsync was copying a 224GB log file when the kernel crashed; the file was being backed up over the internet from a mail server with a misconfigured /etc/rsyslog.conf. I've since removed the huge log file and disabled that log entry in rsyslog.conf, and I'll continue to watch it, but it's still a legit kernel crash.
aops:ext4_da_aops ino:bca118c dentry name:"rsyslog.log"
flags: 0x17ffffc0060010(lru|mappedtodisk|reclaim|node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc0060010 ffffe613d17e0208 ffffe613d17e00c8 ffff8e452d1070d0
raw: 0000000002fd3d40 0000000000000000 00000001ffffffff ffff8e45031fd000
page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio))
------------[ cut here ]------------
kernel BUG at mm/filemap.c:1516!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 PID: 922 Comm: md2_raid5 Not tainted 5.16.8-100.fc34.x86_64 #1
Should I submit this to the Fedora Bugzilla or the main kernel.org Bugzilla?
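A runaway file like the 224GB rsyslog.log is easy to miss until it is already in the backup. As a hedged sketch, the following scans a directory tree on the source host and reports regular files at or above a size threshold, largest first; the threshold is the user's choice and find_large_files() is a hypothetical name. rsync's own --max-size=SIZE option, which skips files larger than SIZE, is an alternative if oversized files should simply be left out of the backup.

```python
import os
import stat


def find_large_files(root: str, threshold_bytes: int):
    """Return (size, path) for regular files under root at or above threshold_bytes, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: don't follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_size >= threshold_bytes:
                hits.append((info.st_size, path))
    hits.sort(reverse=True)
    return hits
```

For example, find_large_files("/var/log", 10 * 1024**3) would list files of 10 GiB or more under /var/log; that path and threshold are only illustrative.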
On 2/15/22 2:10 PM, Alex wrote:
This morning I noticed it crashed again, but with a different kernel message. I also discovered there was a 224GB log file being backed up over the internet from a mail server with a misconfigured /etc/rsyslog.conf that rsync was copying when the kernel crashed. I've since removed the huge log file and disabled the log entry in rsyslog.conf, so I'll now continue to watch it, but it's still a legit kernel crash.
Is the hardware OK? An rsync job with encryption and RAID could stress the system (thermally, too) and trigger instability.
Try testing your hardware in other ways (repeated gzip and md5 checks, or memtest86+) to be sure.
Regards.
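The repeated gzip-and-md5 check suggested in the last reply can be sketched in Python: compress and decompress the same random buffer many times and count how often the MD5 of the result differs from the original, counting a gzip CRC or stream error as a mismatch too. On stable hardware the count should stay at 0. This is an illustrative stand-in, not a standard tool: it exercises CPU and RAM but not the disks or the md RAID path, a single Python process loads only one core, and memtest86+ remains the dedicated memory test. gzip_md5_roundtrip() is a hypothetical name.

```python
import gzip
import hashlib
import os
import zlib


def gzip_md5_roundtrip(rounds: int, size: int) -> int:
    """Compress/decompress one random buffer `rounds` times; return the number of mismatches.

    A nonzero result suggests unreliable hardware (memory, CPU, or cooling).
    """
    data = os.urandom(size)
    expected = hashlib.md5(data).hexdigest()
    mismatches = 0
    for _ in range(rounds):
        try:
            restored = gzip.decompress(gzip.compress(data))
        except (OSError, EOFError, zlib.error):
            mismatches += 1  # gzip's CRC check or the deflate stream itself failed
            continue
        if hashlib.md5(restored).hexdigest() != expected:
            mismatches += 1
    return mismatches
```

A longer run might look like gzip_md5_roundtrip(500, 32 * 1024 * 1024), started once per CPU core; the round count and buffer size are arbitrary.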