Hi,
My host logs these errors about once a week, and each time the guest (qemu KVM on libvirt) gets killed. Host CPU usage is very low at all times. Is this related to ext4 and the "performance will be poor" warning?
Thanks
kernel-2.6.32-279.11.1.el6.x86_64 libvirt-0.9.10-21.el6_3.5.x86_64 sanlock-2.3-1.el6.x86_64
Nov 18 20:42:51 appserver01 kernel: EXT4-fs (dm-0): Unaligned AIO/DIO on inode 48 by sanlock; performance will be poor.
Nov 19 20:42:53 appserver01 kernel: EXT4-fs (dm-0): Unaligned AIO/DIO on inode 48 by sanlock; performance will be poor.
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: s1 delta_renew reread mismatch
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader1 delta_renew_last error 0 lockspace __LIBVIRT__DISKS__ host_id 1
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader2 path /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ offset 0
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader3 m 12212010 v 30002 ss 512 nh 0 mh 1 oi 1 og 14 lv 0
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader4 sn __LIBVIRT__DISKS__ rn 18fefd6b-45fe-4d2b-8774-7c890826d722.appserver0 ts 1040519 cs b1df2fa4
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader1 delta_renew_read error 0 lockspace __LIBVIRT__DISKS__ host_id 1
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader2 path /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ offset 0
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader3 m 12212010 v 30002 ss 512 nh 0 mh 1 oi 1 og 14 lv 0
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader4 sn __LIBVIRT__DISKS__ rn 18fefd6b-45fe-4d2b-8774-7c890826d722.appserver0 ts 1040499 cs a4181d8a
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: s1 renewal error -261 delta_length 0 last_success 1040519
Nov 19 21:38:19 appserver01 sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [17941]: s1 check_our_lease corrupt -261
On Wed, Nov 21, 2012 at 10:10:49AM +0100, sysadmin@albasoft.com wrote:
> Nov 18 20:42:51 appserver01 kernel: EXT4-fs (dm-0): Unaligned AIO/DIO on inode 48 by sanlock; performance will be poor.
> Nov 19 20:42:53 appserver01 kernel: EXT4-fs (dm-0): Unaligned AIO/DIO on inode 48 by sanlock; performance will be poor.
First, it seems your systems are misconfigured; sanlock is supposed to run on nfs, not ext4. You're probably not mounting nfs early enough or it's failing to mount.
Second, I think libvirt is getting the ability to use nfs locks for this use case rather than sanlock. You might check to see if that's possible yet, because it would probably work better.
Third, I don't know whether sanlock was running on ext4 or nfs at the time of these errors, so it's hard to place the blame on something specific...
> sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader4 sn __LIBVIRT__DISKS__ rn 18fefd6b-45fe-4d2b-8774-7c890826d722.appserver0 ts 1040519 cs b1df2fa4
> sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader4 sn __LIBVIRT__DISKS__ rn 18fefd6b-45fe-4d2b-8774-7c890826d722.appserver0 ts 1040499 cs a4181d8a
At time 1040499, sanlock wrote timestamp 1040499.
At time 1040519, sanlock wrote timestamp 1040519.
At time 1040540, sanlock read what it last wrote and got back timestamp 1040499, instead of 1040519.
Either there's bad caching going on in the kernel, or some layer reported that a write was successful when it wasn't.
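The comparison above can be reproduced from the two leader4 log lines. The following is only a minimal sketch: the regular expression, the function names, and the reading of the first line as "last written" and the second as "reread" are assumptions based on the log excerpt in this thread, not on sanlock's documented log format.

```python
import re

# Assumed layout of a "leader4" line, taken from the excerpt in this thread:
#   leader4 sn <lockspace> rn <resource> ts <timestamp> cs <checksum>
LEADER4_RE = re.compile(
    r"leader4 sn (?P<sn>\S+) rn (?P<rn>\S+) ts (?P<ts>\d+) cs (?P<cs>[0-9a-f]+)"
)

# The two lines quoted above (syslog prefix omitted).
LAST_WRITTEN = ("leader4 sn __LIBVIRT__DISKS__ "
                "rn 18fefd6b-45fe-4d2b-8774-7c890826d722.appserver0 "
                "ts 1040519 cs b1df2fa4")
REREAD = ("leader4 sn __LIBVIRT__DISKS__ "
          "rn 18fefd6b-45fe-4d2b-8774-7c890826d722.appserver0 "
          "ts 1040499 cs a4181d8a")


def parse_leader4(line):
    """Return the timestamp and checksum of a leader4 line, or None."""
    match = LEADER4_RE.search(line)
    if match is None:
        return None
    return {"ts": int(match.group("ts")), "cs": match.group("cs")}


def staleness(last_written_line, reread_line):
    """Seconds by which the reread lease lags the last written one."""
    written = parse_leader4(last_written_line)
    reread = parse_leader4(reread_line)
    if written is None or reread is None:
        raise ValueError("not a leader4 line")
    return written["ts"] - reread["ts"]


if __name__ == "__main__":
    lag = staleness(LAST_WRITTEN, REREAD)
    if lag != 0:
        print("reread mismatch: disk returned a lease %d seconds old" % lag)
```

A non-zero result means the read returned an older lease than the one last written, which is the condition reported as "delta_renew reread mismatch" in the log.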
On 11/21/2012 04:38 PM, David Teigland wrote:
> On Wed, Nov 21, 2012 at 10:10:49AM +0100, sysadmin@albasoft.com wrote:
> > Nov 18 20:42:51 appserver01 kernel: EXT4-fs (dm-0): Unaligned AIO/DIO on inode 48 by sanlock; performance will be poor.
> > Nov 19 20:42:53 appserver01 kernel: EXT4-fs (dm-0): Unaligned AIO/DIO on inode 48 by sanlock; performance will be poor.
> First, it seems your systems are misconfigured; sanlock is supposed to run on nfs, not ext4. You're probably not mounting nfs early enough or it's failing to mount.
I didn't know nfs was mandatory. On this specific host it was running directly on ext4. I'll change it to nfs and report.
> Second, I think libvirt is getting the ability to use nfs locks for this use case rather than sanlock. You might check to see if that's possible yet, because it would probably work better.
> Third, I don't know whether sanlock was running on ext4 or nfs at the time of these errors, so it's hard to place the blame on something specific...
It was running on ext4 at that time.
> > sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader4 sn __LIBVIRT__DISKS__ rn 18fefd6b-45fe-4d2b-8774-7c890826d722.appserver0 ts 1040519 cs b1df2fa4
> > sanlock[17941]: 2012-11-19 21:38:19+0100 1040540 [18880]: leader4 sn __LIBVIRT__DISKS__ rn 18fefd6b-45fe-4d2b-8774-7c890826d722.appserver0 ts 1040499 cs a4181d8a
> At time 1040499, sanlock wrote timestamp 1040499.
> At time 1040519, sanlock wrote timestamp 1040519.
> At time 1040540, sanlock read what it last wrote and got back timestamp 1040499, instead of 1040519.
> Either there's bad caching going on in the kernel, or some layer reported that a write was successful when it wasn't.
Quite strange. That path is on an ext4 filesystem on top of an LVM2 logical volume on top of an md raid1 device. Does that tell you anything?
On Wed, Nov 21, 2012 at 05:17:27PM +0100, sysadmin@albasoft.com wrote:
> On this specific host it was running directly on ext4. I'll change it to nfs and report.
Aren't you using multiple hosts to run shared vm images? If not, then using sanlock is pointless.
> That path is on an ext4 filesystem on top of an LVM2 logical volume on top of an md raid1 device. Does that tell you anything?
ext4 or md are probably the cause of the problems; sanlock will not work properly with either.
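To confirm which filesystem actually backs the lease path (for example, when an NFS mount failed and the directory fell through to local ext4), one option is to look the path up in /proc/mounts. This is a rough sketch only: the function name and the sample mount table are hypothetical, and it ignores the octal escapes /proc/mounts uses for spaces in mount points.

```python
import os


def fstype_for_path(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts (device, mount point,
    type, options, ...). The longest matching mount point wins; for equal
    mount points the later entry wins, since it is mounted on top.
    """
    path = os.path.normpath(path)
    best = None  # (mount_point, fstype)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best[1] if best else None


# Hypothetical mount table for illustration; device and server names are made up.
SAMPLE_MOUNTS = """\
/dev/mapper/vg-root / ext4 rw,relatime 0 0
nfsserver:/export/sanlock /var/lib/libvirt/sanlock nfs rw,relatime 0 0
"""

LEASE_PATH = "/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__"
```

On a real host, pass the content of /proc/mounts as `mounts_text`, e.g. `fstype_for_path(LEASE_PATH, open("/proc/mounts").read())`. A result of `ext4` rather than `nfs` for the lease path would match the situation described in this thread.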
sanlock-devel@lists.fedorahosted.org