intel graphics driver slow (was: slightly OT - X performance benchmark)

Wed Jun 3 14:46:37 UTC 2009

On Tue, 2009-06-02 at 18:38 -0400, David Malcolm wrote:
> On Tue, 2009-06-02 at 15:32 -0700, David L wrote:
> > On Tue, Jun 2, 2009 at 2:44 PM, Kevin DeKorte wrote:
> > > On 06/02/2009 03:37 PM, David L wrote:
> > >> My f11 system seems extremely slow when running
> > >> some 2D gtk/cairo apps.  Is there a benchmark
> > >> suite that is yum installable for testing X performance?
> > <snip>
> > >>
> > >
> > > Personally I find that gtkperf is useful although not an exact science.
> > > I like to run it with the following options
> > >
> > > gtkperf -c 500 -a
> > >
> > Thanks Kevin,
> > 
> > That helped confirm my suspicions.  Here's my output with
> > the intel driver on my Intel 82865G:

KMS or UMS?

> [snip various results] 
> > GtkDrawingArea - Circles - time: 44.60
> > GtkDrawingArea - Text - time: 15.80
> [snip]
> 
> > Same computer using the vesa driver:
> > GtkPerf 0.40 - Starting testing: Tue Jun  2 15:19:09 2009
> 
> [snip]
> > GtkDrawingArea - Circles - time:  1.43
> > GtkDrawingArea - Text - time:  1.31
> [snip]
> 
> Looks like the circles/text tests are the most obvious differences,
> though most tests show marked differences.

As with everything, it helps to know what you're measuring.

Circles is the wide arc rendering path in the X server.  It's
essentially unused by gtk apps in general, but gtkperf does it anyway.
The arcs specified by the X protocol are insanely ugly (which is why
nobody uses them in real apps) and also not a hardware-accelerated
primitive.  We could break them down to spans inside the X server and
accelerate filling those spans, but we don't.

So they happen in software.  Which is also true for the vesa driver, but
there's a catch.  The vesa driver uses a trick called 'shadowfb', where
the whole screen is rendered in (cached) host memory and then the
updated regions are uploaded to the actual scanout memory.  This is
adequately fast, because it minimizes the number of memory cycles (read
cycles in particular) that you do to the framebuffer, which is typically
uncached.
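
To make the shadowfb idea concrete, here is a rough toy model in Python. It is only a sketch of the general technique, not the vesa driver's code: the class name, the single bounding-box damage tracking, and the byte-per-pixel buffers are all assumptions made for illustration. Rendering (including read-modify-write operations) touches only the shadow copy, and the "scanout" buffer only ever sees writes for the damaged region.

```python
class ShadowFramebuffer:
    """Toy model (hypothetical): draw in cached memory, upload only damage."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.shadow = bytearray(width * height)   # stands in for cached host memory
        self.scanout = bytearray(width * height)  # stands in for uncached scanout memory
        self.scanout_writes = 0                   # bytes written to scanout so far
        self.damage = None                        # (x0, y0, x1, y1), x1/y1 exclusive

    def put_pixel(self, x, y, value):
        """Write one pixel into the shadow and grow the damage box."""
        self.shadow[y * self.width + x] = value
        if self.damage is None:
            self.damage = (x, y, x + 1, y + 1)
        else:
            x0, y0, x1, y1 = self.damage
            self.damage = (min(x0, x), min(y0, y),
                           max(x1, x + 1), max(y1, y + 1))

    def blend_pixel(self, x, y, value):
        """Read-modify-write; the read hits the shadow, never the scanout."""
        old = self.shadow[y * self.width + x]
        self.put_pixel(x, y, (old + value) // 2)

    def flush(self):
        """Copy the damaged rectangle to scanout; return bytes uploaded."""
        if self.damage is None:
            return 0
        x0, y0, x1, y1 = self.damage
        for y in range(y0, y1):
            start = y * self.width + x0
            end = y * self.width + x1
            self.scanout[start:end] = self.shadow[start:end]
        written = (x1 - x0) * (y1 - y0)
        self.scanout_writes += written
        self.damage = None
        return written
```

In this model, blending ten pixels on one row and then calling flush() uploads ten bytes and performs no reads from the scanout buffer, which is the property that makes the approach tolerable on an uncached framebuffer.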

In the intel driver, it's a different story.  We don't keep a shadow, so
the software fallback happens either cached or uncached, depending how
we map the framebuffer.  If it's cached, you have to do a big cache
flush when you finish rendering so the bits actually make it out from
the CPU's cache to the framebuffer.  If it's uncached, you're hitting
main memory on every cycle, which is also not great.

I don't remember offhand whether the text test is using Render or the
old core font path.  If the latter, then the same scenario applies; it's
not accelerated (because it's actually rather hard to accelerate well),
and the software path can't help but suck.

For the other stuff, 865 and 855 appear to have a chipset bug where the
command buffer doesn't always flush to the GPU reliably, so we have to
flush the entire CPU cache on every acceleration command:

http://cvs.fedoraproject.org/viewvc/rpms/kernel/F-11/drm-intel-big-hammer.patch?revision=1.1&view=markup

We do try to mitigate this by batching up big sequences of commands
rather than lots of little ones, but it still hurts.
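
As a back-of-the-envelope illustration of why batching helps, assume (purely for the sake of the sketch) that every submission to the GPU costs exactly one full cache flush regardless of how many commands it carries. The function name and the fixed batch size are hypothetical; the real driver's batching is more involved.

```python
import math


def cache_flushes(num_commands, batch_size):
    """Number of full cache flushes under the one-flush-per-submission assumption."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_commands / batch_size)
```

Under that assumption, 1000 commands submitted one at a time cost 1000 flushes, while the same commands in batches of 64 cost 16. The flushes that remain are still full-cache flushes, which is why it still hurts.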

So, two things.  Compare gtkperf results with different resolutions on
the intel driver; tests which are hitting the software fallback path
will be faster at smaller resolutions, because there's less framebuffer
to clflush out.  Also, figure out whether you're using KMS or not, and
try the other way.
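
For comparing two runs (two resolutions, KMS versus UMS, or two drivers), something like the following Python sketch may save some squinting. It assumes gtkperf result lines look like the "name - time: seconds" lines quoted earlier in this thread; the function names are made up for this example, and the sample figures are the ones posted above.

```python
import re

# Matches lines such as "GtkDrawingArea - Circles - time: 44.60".
LINE_RE = re.compile(r"^(?P<name>.+?)\s+-\s+time:\s*(?P<secs>\d+(?:\.\d+)?)\s*$")


def parse_gtkperf(text):
    """Return {test name: seconds} for every recognisable result line."""
    results = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # tolerate mail quoting
        match = LINE_RE.match(line)
        if match:
            results[match.group("name")] = float(match.group("secs"))
    return results


def slowdown(baseline, candidate):
    """Ratio candidate/baseline for tests present in both runs."""
    return {
        name: candidate[name] / baseline[name]
        for name in baseline
        if name in candidate and baseline[name] > 0
    }


# Figures quoted in this thread (intel driver versus vesa driver).
INTEL_RUN = """GtkDrawingArea - Circles - time: 44.60
GtkDrawingArea - Text - time: 15.80"""
VESA_RUN = """GtkDrawingArea - Circles - time:  1.43
GtkDrawingArea - Text - time:  1.31"""

if __name__ == "__main__":
    ratios = slowdown(parse_gtkperf(VESA_RUN), parse_gtkperf(INTEL_RUN))
    for name, ratio in sorted(ratios.items()):
        print(f"{name}: {ratio:.1f}x slower")
```

Feeding it the output of the same gtkperf invocation at two resolutions should make it easier to spot which tests scale with framebuffer size.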

- ajax