Hi,
Our database server is seeing very high response times, and I'm trying to understand whether the disk configuration is responsible.
It is a Fedora Core release 4 (Stentz) box with 4 GB RAM and two Intel Xeon 3.20 GHz CPUs (cache: 1024 KB).
Some typical iostat -x data:

cpu-moy: %user %nice %sys %iowait %idle
11,96 0,00 7,04 1,31 79,70
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0,00 30,44 0,81 3,43 14,52 270,97 7,26 135,48 67,43 0,19 46,05 7,29 3,08

cpu-moy: %user %nice %sys %iowait %idle
12,26 0,00 7,34 6,63 73,77
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0,00 32,26 0,00 4,41 0,00 293,39 0,00 146,69 66,55 1,51 342,32 25,14 11,08

cpu-moy: %user %nice %sys %iowait %idle
13,67 0,00 6,83 7,74 71,76
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0,00 31,99 0,00 3,82 0,00 286,52 0,00 143,26 74,95 2,93 767,58 55,84 21,35

cpu-moy: %user %nice %sys %iowait %idle
13,27 0,00 6,83 14,37 65,53
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0,00 31,73 0,00 4,02 0,00 285,94 0,00 142,97 71,20 2,73 680,40 49,25 19,78

cpu-moy: %user %nice %sys %iowait %idle
12,86 0,00 6,33 9,45 71,36
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0,00 30,92 0,00 3,41 0,00 274,70 0,00 137,35 80,47 2,33 681,35 57,53 19,64

cpu-moy: %user %nice %sys %iowait %idle
12,45 0,00 6,02 1,91 79,62
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0,00 33,00 0,60 4,23 6,44 297,79 3,22 148,89 63,00 0,41 85,29 20,00 9,66
Is it normal for the %util column to show a saturation of about 20% with only about 140 wkB/s?
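As a rough way to read these numbers, the sketch below recomputes %util and avgqu-sz from the other columns of the third sample above. It assumes the usual relationships between the iostat -x columns (busy time = requests per second × svctm, queue length = requests per second × await); the helper names `parse_sample` and `derived` are made up for this illustration and are not part of any tool.

```python
# Values copied from the third iostat sample in the post (decimal commas as printed).
HEADER = ("rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s "
          "avgrq-sz avgqu-sz await svctm %util")
SAMPLE = "0,00 31,99 0,00 3,82 0,00 286,52 0,00 143,26 74,95 2,93 767,58 55,84 21,35"


def parse_sample(header, sample):
    """Map iostat column names to floats, accepting decimal commas."""
    values = [float(v.replace(",", ".")) for v in sample.split()]
    return dict(zip(header.split(), values))


def derived(row):
    """Recompute a few columns from the others (assumed relationships)."""
    iops = row["r/s"] + row["w/s"]            # requests completed per second
    return {
        # milliseconds of service time per second of wall time, as a percentage
        "util_pct": iops * row["svctm"] / 10.0,
        # average number of requests in flight (Little's law, await in ms)
        "queue_len": iops * row["await"] / 1000.0,
        # average request size in kB
        "kb_per_req": (row["rkB/s"] + row["wkB/s"]) / iops,
    }


row = parse_sample(HEADER, SAMPLE)
d = derived(row)
print("recomputed %util    :", round(d["util_pct"], 2), "reported:", row["%util"])
print("recomputed avgqu-sz :", round(d["queue_len"], 2), "reported:", row["avgqu-sz"])
print("average kB/request  :", round(d["kb_per_req"], 1))
```

If these relationships hold, %util measures how much of each second the device spent busy, not how many kB/s it moved: fewer than 4 writes per second at roughly 56 ms each already account for about 21% of the time.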
How can I be sure if there is a hardware or software raid running over the disks? I think it is hardware because there is no mdadm process running, am I right?
Some dmesg data:

SCSI subsystem initialized
Fusion MPT base driver 3.01.20
Copyright (c) 1999-2004 LSI Logic Corporation
ACPI: PCI Interrupt 0000:02:04.0[A] -> GSI 42 (level, low) -> IRQ 185
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator,Target}
Fusion MPT SCSI Host driver 3.01.20
scsi0 : ioc0: LSI53C1030, FwRev=01032300h, Ports=1, MaxQ=255, IRQ=185
input: AT Translated Set 2 keyboard on isa0060/serio0
input: AT Translated Set 2 keyboard on isa0060/serio0
megaraid cmm: 2.20.2.5 (Release Date: Fri Jan 21 00:01:03 EST 2005)
megaraid: 2.20.4.5 (Release Date: Thu Feb 03 12:27:22 EST 2005)
megaraid: probe new device 0x1000:0x1960:0x1028:0x0520: bus 2:slot 5:func 0
ACPI: PCI Interrupt 0000:02:05.0[A] -> GSI 37 (level, low) -> IRQ 193
megaraid: fw version:[351S] bios version:[1.10]
scsi1 : LSI Logic MegaRAID driver
scsi[1]: scanning scsi channel 0 [Phy 0] for non-raid devices
Vendor: SDR Model: GEM318P Rev: 1
Type: Processor ANSI SCSI revision: 02
scsi[1]: scanning scsi channel 1 [virtual] for logical drives
Vendor: MegaRAID Model: LD 0 RAID1 69G Rev: 351S
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 143114240 512-byte hdwr sectors (73274 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
SCSI device sda: 143114240 512-byte hdwr sectors (73274 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
sda: sda1 sda2 sda3 sda4 < sda5 >
Attached scsi disk sda at scsi1, channel 1, id 0, lun 0
libata version 1.10 loaded.
ata_piix version 1.03
ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 18 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0xBC98 ctl 0xBC92 bmdma 0xBC60 irq 177
ata2: SATA max UDMA/133 cmd 0xBC80 ctl 0xBC7A bmdma 0xBC68 irq 177
ata1: SATA port has no device.
scsi2 : ata_piix
ata2: SATA port has no device.
scsi3 : ata_piix
isa bounce pool size: 16 pages
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: Disabled at runtime.
SELinux: Unregistering netfilter hooks
cfq: depth 4 reached, tagging now on
Attached scsi generic sg0 at scsi1, channel 0, id 6, lun 0, type 3
Attached scsi generic sg1 at scsi1, channel 1, id 0, lun 0, type 0
Floppy drive(s): fd0 is 1.44M
Thank you in advance!
Reimer
1:39pm Carlos H. Reimer said:
How can I be sure if there is a hardware or software raid running over the disks? I think it is hardware because there is no mdadm process running, am I right?
Wrong logic. Look at /proc/mdstat to see whether RAID is running in the kernel. mdadm is just for userland administration and/or monitoring.
scsi[1]: scanning scsi channel 1 [virtual] for logical drives
Vendor: MegaRAID Model: LD 0 RAID1 69G Rev: 351S
Type: Direct-Access ANSI SCSI revision: 02
There's your hardware raid.
../C
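The /proc/mdstat check suggested above can be scripted. The following is only a minimal sketch: the function names are invented here, and the `EXAMPLE` text is a hypothetical illustration of what a box with a software RAID1 array might show, not output from the server in this thread.

```python
import re


def md_arrays(mdstat_text):
    """Return (name, state) for each md array listed in /proc/mdstat-style text."""
    pattern = re.compile(r"^(md\S*)\s*:\s*(\S+)", re.MULTILINE)
    return pattern.findall(mdstat_text)


def read_mdstat(path="/proc/mdstat"):
    """Read the kernel's md status file; return '' if it does not exist."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""  # no such file: the md driver is probably not loaded


# Hypothetical sample, for illustration only.
EXAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    arrays = md_arrays(read_mdstat())
    if arrays:
        print("software RAID arrays:", arrays)
    else:
        print("no md arrays found in /proc/mdstat")
```

An empty result only says that no md software RAID is active; whether a hardware controller is doing RAID still has to come from something like the MegaRAID lines in dmesg.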
Regarding these weird messages in the dmesg output:
sda: asking for cache data failed
sda: assuming drive cache: write through
Does it mean that the SCSI controller does not have a cache?
And even if there is a cache in the controller, does the message mean that it will not be used?
Thank you!
Reimer
-----Original message-----
From: fedora-list-bounces@redhat.com [mailto:fedora-list-bounces@redhat.com] On behalf of Curtis Doty
Sent: Thursday, 8 February 2007 21:51
To: Fedora
Subject: Re: Disk saturation
Regarding these weird messages in the dmesg output:
sda: asking for cache data failed
sda: assuming drive cache: write through
Does it mean that the SCSI controller does not have a cache?
And even if there is a cache in the controller, does the message mean that it will not be used?
Sorry for the late reply.
I've seen exactly this message on some of my hardware RAIDs. For me, it does mean that the cache on the RAID controller will not be used. I believe it means the same for you.
That could be part of your latency issue. Write-through is supposed to be slower than cached (write-back) operation, as each write has to wait for the operation to finish. This could be faulty logic due to incomplete information, though...
> I've seen exactly this message on some of my hardware RAIDs. For me, it does mean that the cache on the RAID controller will not be used. I believe it means the same for you.
It means your RAID card or its driver is too daft to support the SCSI cache management features. In at least some cases (e.g. the AMI MegaRAID) this doesn't mean the card does not use the on-board battery-backed cache if fitted, merely that it does so without the OS being able to manage it directly.
Alan
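To see which cache mode the kernel assumed for each disk, the "assuming drive cache" lines can be pulled out of saved dmesg text. This is a small sketch with an invented function name, and the pattern is based only on the two messages quoted in this thread. As noted above, the result shows what the OS assumed, not whether the controller uses its own cache.

```python
import re


def assumed_cache_modes(dmesg_text):
    """Return {device: mode} for every 'assuming drive cache' kernel message."""
    pattern = re.compile(r"(\w+): assuming drive cache: (write \w+)")
    return {dev: mode for dev, mode in pattern.findall(dmesg_text)}


# The two messages quoted in the thread.
LOG = (
    "sda: asking for cache data failed\n"
    "sda: assuming drive cache: write through\n"
)

print(assumed_cache_modes(LOG))
```

The log text could come from a file written with `dmesg > dmesg.txt` and be passed in as a string in the same way.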