Hi, I have a Fedora 35 system that uses rsync to operate as a backup server. It has an 8TB RAID5 array, and for the last few days it has crashed/segfaulted at what appears to be the same time that it starts to back up one particular remote host. This suggests to me that some spot on the disk holding this particular host's data may be triggering the crash.

When it happens, there is a segfault message on the console, but nothing related to it in the logs. The kernel does, however, log write failures just before the crash:

Aug 1 12:24:32 mail03 kernel: [2415225.412978] EXT4-fs warning (device md2): ext4_end_bio:343: I/O error 10 writing to inode 232141206 starting block 3033088)
Aug 1 12:24:32 mail03 kernel: [2415225.412987] Buffer I/O error on device md2, logical block 3033088
Aug 1 12:24:32 mail03 kernel: [2415225.413025] Buffer I/O error on device md2, logical block 3033089

...

Aug 1 12:24:32 mail03 kernel: [2415225.526007] JBD2: Detected IO errors while flushing file data on md2-8
Aug 1 12:24:35 mail03 kernel: [2415227.560338] JBD2: Detected IO errors while flushing file data on md2-8

How do I identify which of the four disks is causing these errors? I've run smartctl short self-tests on each disk in the array, but all four passed without error. What is md2-8?

From /proc/mdstat:

md2 : active raid5 sde1[4] sdc1[7] sda1[5] sdf1[6]
8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 0/22 pages [0KB], 65536KB chunk

You'll also notice the array reports itself as fully operational ([4/4] [UUUU]).
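
In case it helps anyone reproduce the per-disk checks, here is a minimal sketch of pulling the member partitions out of an mdstat line so they can be looped over. The helper name md_members is hypothetical, and it assumes the "mdX : active <level> member[role] ..." layout shown above (an inactive array is laid out differently and would need adjusting).

```shell
#!/bin/sh
# md_members (hypothetical helper): read mdstat-formatted text on stdin and
# print the member partitions of the array named in $1, one per line.
# Assumes the "mdX : active <level> dev[role] ..." layout, so the members
# start at field 5.
md_members() {
    awk -v md="$1" '$1 == md && $2 == ":" {
        for (i = 5; i <= NF; i++) {
            dev = $i
            sub(/\[.*$/, "", dev)   # drop the "[4]" role suffix and anything after it
            print dev
        }
    }'
}

# Example with the line quoted above rather than the live /proc/mdstat
printf '%s\n' 'md2 : active raid5 sde1[4] sdc1[7] sda1[5] sdf1[6]' | md_members md2
```

On the live system this would be fed with `md_members md2 < /proc/mdstat`, and each name (minus the partition number) could then be passed to something like `smartctl -t long /dev/sdX` to start a long self-test instead of the short one.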

I'm also now running a full fsck of the filesystem:

# fsck -Vfp -C0 /dev/md2
fsck from util-linux 2.37.4
[/usr/sbin/fsck.ext4 (1) -- /var/backup] fsck.ext4 -fp -C0 /dev/md2
/dev/md2: |=== | 5.7%

but it'll clearly take a while.

I also don't see any errors in the kernel log for any of the four individual disks.
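
For reference, that check amounts to roughly the following filter. This is only a sketch: disk_errors is a hypothetical helper name, and it assumes a failing disk would be mentioned by its bare device name (e.g. "dev sdc" or "[sdc]") in the kernel messages, which may not cover errors reported only at the ATA/SCSI host level.

```shell
#!/bin/sh
# disk_errors (hypothetical helper): read kernel log text on stdin and print
# only the lines that mention one of the given device names as a whole word.
disk_errors() {
    pattern=$(printf '%s|' "$@")   # "sda|sdc|sde|sdf|"
    pattern=${pattern%|}           # strip the trailing "|"
    grep -wE "$pattern"
}
```

It would be used as `journalctl -k | disk_errors sda sdc sde sdf`; the md2 lines quoted earlier do not match, since they name only the array and not a member disk.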