Disk IO issues

Wed Dec 31 23:42:42 UTC 2008

On Wed, Dec 31, 2008 at 17:35, Mike McGrath <mmcgrath at redhat.com> wrote:

> On Wed, 31 Dec 2008, Corey Chandler wrote:
>
> > Mike McGrath wrote:
> > > Let's pool some knowledge together because at this point, I'm missing
> > > something.
> > >
> > > I've been doing all measurements with sar, as bonnie, etc., cause builds
> > > to time out.
> > >
> > > Problem: We're seeing slower than normal disk IO.  At least I think we
> > > are.  This is a PERC5/E and MD1000 array.
> > >
> >
> > 1. Are we sure the array hasn't lost a drive?
>
> I can't physically look at the drives (they're a couple hundred miles away)
> but we've seen no reports of it (via the drac anyway).  I'll have to get
> the raid software on there to be sure.  I'd think a degraded raid
> array would affect both direct block access and file level access.
>
> > 2. What's your scheduler set to?  CFQ tends to not work in many
> > applications where the deadline scheduler works better...
> >
>
> I'd tried other schedulers earlier but they didn't seem to make much of a
> difference.  Even so, I'll get deadline set up and take a look.
>
> At least we've got the dd and cat problem figured out.  Now to figure out
> why there's such a discrepancy between file level reads and block level
> reads.  Anyone else have an array of this type and size to run those tests
> on?  I'd be curious to see what others are getting.
>

We are working on a RHEL 3 to RHEL 5 migration at my job.  We have two
primary filesystems: one holds large database files and the other holds lots
of small documents.  While testing backup software for RHEL 5 we noticed a
60% decrease in speed moving from RHEL 3 to RHEL 5 with the same filesystem,
but only on the document filesystem; the database filesystem was perfectly
snappy.

After a lot of troubleshooting it was deemed to be related to the dir_index
btree hash.  The paths were too long before there was any difference in the
names of the files, making the index incredibly slow.  Removing dir_index
recovered a bit of the difference, but didn't resolve the issue.  A quick
rename of one of the base directories recovered almost the entire 60%.
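
A rough way to check whether a directory shows the same pattern is to
measure how many leading characters its entries have in common.  This is
only a sketch: the function names and the sample file names are made up for
illustration, and it measures the shared prefix only.  It does not show
whether dir_index is actually the bottleneck.

```python
import os


def shared_prefix_length(names):
    """Return how many leading characters all of the given names share."""
    # os.path.commonprefix compares character by character, not by path
    # component, which is what we want for plain file names.
    return len(os.path.commonprefix(list(names)))


def directory_prefix_report(directory):
    """Return (shared prefix length, shortest name length, entry count)."""
    names = os.listdir(directory)
    if not names:
        return 0, 0, 0
    return shared_prefix_length(names), min(len(n) for n in names), len(names)


if __name__ == "__main__":
    # Hypothetical names, not taken from the real filesystem.
    sample = [
        "customer_documents_archive_2008_invoice_000001.pdf",
        "customer_documents_archive_2008_invoice_000002.pdf",
        "customer_documents_archive_2008_invoice_000417.pdf",
    ]
    shared = shared_prefix_length(sample)
    shortest = min(len(n) for n in sample)
    print("shared prefix: %d of %d characters" % (shared, shortest))
```

Pointing directory_prefix_report at one of the slow directories and at a
fast one would at least show whether the names differ in the way described
above.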

Thought I'd at least throw it out there.  Although I'm not sure that it is
the exact issue here, it doesn't hurt to have it floating in the background.

-greg/xaeth