Disk IO issues

Mike McGrath mmcgrath at redhat.com
Wed Dec 31 20:42:27 UTC 2008


Let's pool some knowledge together, because at this point I'm missing
something.

I've been doing all measurements with sar, as bonnie, etc., causes builds
to time out.

Problem: We're seeing slower than normal disk IO.  At least I think we
are.  This is a PERC5/E and MD1000 array.

When I try to do a normal copy "cp -adv /mnt/koji/packages /tmp/" I get
around 4-6MBytes/s

When I do a cp of a large file "cp /mnt/koji/out /tmp/" I get
30-40MBytes/s.

When I "dd if=/dev/sde of=/dev/null" I get around 60-70 MBytes/s read.

If I "cat /dev/sde > /dev/null" I get between 225-300MBytes/s read.

The above tests are pretty consistent.  /dev/sde is a hardware RAID5
array.
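One hedged guess about the dd-vs-cat gap (an assumption on my part, not something verified on this box): dd reads in 512-byte blocks by default, while cat uses a much larger buffer, so dd issues orders of magnitude more read() syscalls for the same data.  Something like this, against a scratch file rather than /dev/sde, shows the effect:

```shell
# Scratch-file demo (safe to run anywhere; doesn't touch /dev/sde).
testfile=$(mktemp)
dd if=/dev/zero of="$testfile" bs=1M count=64 2>/dev/null

# 512-byte blocks: one read() per sector, like a bare "dd if=... of=/dev/null"
time dd if="$testfile" of=/dev/null bs=512 2>/dev/null

# 1M blocks: far fewer syscalls, closer to what cat does internally
time dd if="$testfile" of=/dev/null bs=1M 2>/dev/null

rm -f "$testfile"
```

If that's the culprit, "dd if=/dev/sde of=/dev/null bs=1M" should land much closer to the cat number.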

So my question here is: wtf?  I've been running a backup, which I would
expect to max out either network utilization or disk IO.  I'm not seeing
either.  Sar says the disks are 100% utilized, but I can cause major
increases in actual disk reads and writes just by running additional
commands.  Also, if the disks really were 100% utilized, I'd expect to
see a lot more iowait.  We're not, though; iowait on the box is only
0.06% today.
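For what it's worth, sar derives %util from the time the device has IO in flight, not from bandwidth, so 100% util plus low iowait is consistent with a seek-bound array churning through lots of tiny requests (which is roughly what copying /mnt/koji/packages looks like).  A rough sketch of the arithmetic, with made-up sample numbers standing in for two /proc/diskstats readings (fields 4, 6, and 13 are reads completed, sectors read, and ms doing IO), and helper names that are mine:

```shell
# Hypothetical helpers mirroring how average request size and %util
# fall out of two /proc/diskstats samples.
avg_read_size() {
    # $1=reads1 $2=sectors1 $3=reads2 $4=sectors2 -> bytes per read
    echo $(( ($4 - $2) * 512 / ($3 - $1) ))
}
util_pct() {
    # $1=io_ms1 $2=io_ms2 $3=interval_ms -> percent of time device was busy
    echo $(( ($2 - $1) * 100 / $3 ))
}

# Made-up numbers: 1000 reads totalling 8000 sectors, device busy
# 4900ms out of a 5000ms sampling interval.
avg_read_size 0 0 1000 8000   # 4096-byte reads: small, seeky IO
util_pct 0 4900 5000          # 98% "utilized" despite low throughput
```

If the real avgrq-sz on sde is down in the single-digit-KB range during the packages copy, the spindles are saturated by seeks, not throughput, and the big sequential numbers above would be exactly what you'd expect.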

So, long story short: we're seeing much better performance when just
reading or writing lots of data (though dd is many times slower than
cat), but with our real-world traffic we're just seeing crappy crappy IO.

Thoughts, theories or opinions?  Some of the sysadmin noc guys have access
to run diagnostic commands, if you want more info about a setting, let me
know.

I should also mention there's a lot going on with this box: it's hardware
RAID with LVM, and I've got Xen running on it (though the tests above
were not run in a Xen guest).

	-Mike
