Semi-OT: Profiling 10GbE devices... Help?

Gilboa Davara gilboad at gmail.com
Sat Jan 24 17:46:08 UTC 2009


Hello all,

I'm almost certain that this is not the right place to ask this question,
but if Red Hat's/Fedora's kernel engineers can't help me, I'm truly
screwed.

I'm using two Intel 10GbE (ixgbe) cards to passively monitor 10GbE
lines (under RHEL 5.2), either through the in-kernel dev_add_pack
interface (with the built-in ixgbe driver) or through a slightly
modified ixgbe driver (built around Intel's latest ixgbe release).
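
For reference, the dev_add_pack path is just the standard packet_type
hook; a minimal sketch of how the tap is wired up (the module
boilerplate and handler name below are mine and purely illustrative,
the real work happens inside the handler) looks roughly like this:

/* Minimal sketch of a passive tap via dev_add_pack(); illustrative only. */
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/if_ether.h>

static int tap_rcv(struct sk_buff *skb, struct net_device *dev,
                   struct packet_type *pt, struct net_device *orig_dev)
{
        /* Per-packet processing goes here; runs in softirq context. */
        kfree_skb(skb);                 /* drop our reference when done */
        return 0;
}

static struct packet_type tap_pt = {
        .type = __constant_htons(ETH_P_ALL),    /* tap every protocol */
        .func = tap_rcv,                /* .dev left NULL -> all devices */
};

static int __init tap_init(void)
{
        dev_add_pack(&tap_pt);
        return 0;
}

static void __exit tap_exit(void)
{
        dev_remove_pack(&tap_pt);
}

module_init(tap_init);
module_exit(tap_exit);
MODULE_LICENSE("GPL");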

However, I'm experiencing odd performance issues: once I configure the
driver to use MSI-X with multi-queue [MQ] (forcing pci=msi) and pin each
IRQ to one CPU core (IRQ CPU affinity), my software needs roughly 10x
more CPU cycles to process each packet (measured using rdtsc; compared
to multiple GbE links and/or with MSI-X/MQ disabled), causing massive
packet loss from missed IRQs (rx_missed_errors).
Looking at mpstat I can see that each CPU core is handling a fairly low
number of interrupts (200-1000) while spending most of its time in
softIRQ (>90%, most likely within my own code).
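
For completeness, the per-packet cycle counts come from a plain rdtsc
pair wrapped around the processing path, roughly like the user-space
flavored sketch below (process_packet() is just a stand-in for the real
code, and rdtsc of course only gives rough numbers on a pinned thread):

#include <stdint.h>
#include <stddef.h>

/* Stand-in for the real per-packet work (hypothetical). */
static void process_packet(const void *pkt, size_t len)
{
        (void)pkt;
        (void)len;
}

/* Read the TSC (x86). rdtsc is not serializing and per-core TSCs may
 * drift, so treat the result as an approximation. */
static inline uint64_t rdtsc(void)
{
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
}

uint64_t timed_process(const void *pkt, size_t len)
{
        uint64_t start = rdtsc();

        process_packet(pkt, len);
        return rdtsc() - start;         /* cycles spent on this packet */
}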

I decided to check newer kernels, so I installed F10 (24C Xeon-MP,
Intel S7000FC4U) and F9 (16C Opteron, DL585G5 *) on two machines, but
even with 2.6.27 kernels I'm experiencing the same performance issues.
Given that the same code is used to process packets regardless of the
link type, my first instinct was to look at the CPU cores themselves
(e.g. L1 & L2 dcache miss rates, TLB flushes, etc.).

I tried using oprofile, but I failed to make it work.
On one machine (Xeon-MP, F10), oprofile failed to identify the
Dunnington CPU (falling back to timer mode), and on the other (Barcelona
8354, F9), even though it was configured to report dcache statistics
[1,2], opreport returns empty reports.
In order to verify that oprofile indeed works on the Opteron machine, I
reconfigured it to report CPU usage [3], but even then, oprofile either
returns empty results or hard-locks the machine.

So:
A. Is anyone else seeing the same odd behavior once MSI-X/MQ is enabled
on Intel's 10G cards? (P.S. MQ cannot be enabled on either machine
unless I add pci=msi to the kernel's command line.)
B. Any idea why oprofile refuses to generate cache statistics, and/or
what I did wrong?
C. Before I dive into AMD's and Intel's MSR/PMC documentation and spend
the next five days trying to decipher which architectural /
non-architectural counters need to be set/used and how, do you have any
idea how I can access the performance counters without writing the code
myself? (See the sketch below for the kind of code I'd rather not write
and maintain.)
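
(For context, the sort of raw MSR poking I'd like to avoid looks roughly
like the sketch below: read a counter through the msr driver
(modprobe msr, needs root). The address used is what I understand
PERF_CTR0 to be on AMD K8/K10 per the BKDG; treat it as illustrative,
and the counter would still have to be programmed through the matching
PERF_CTL register first.)

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        int cpu = (argc > 1) ? atoi(argv[1]) : 0;
        uint64_t val;
        char path[64];
        int fd;

        snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
        fd = open(path, O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* The pread() offset selects the MSR address; 0xC0010004 should
         * be PERF_CTR0 on AMD K8/K10 (illustrative, check the BKDG). */
        if (pread(fd, &val, sizeof(val), (off_t)0xC0010004) != sizeof(val)) {
                perror("pread");
                close(fd);
                return 1;
        }
        printf("cpu%d PERF_CTR0 = %llu\n", cpu, (unsigned long long)val);
        close(fd);
        return 0;
}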

- Gilboa
[1] opcontrol --setup --vmlinux /usr/lib/debug/lib/modules/2.6.27.9-73.fc9.x86_64/vmlinux --event=DATA_CACHE_ACCESS:1000:0:1:1
[2] opcontrol --setup --vmlinux /usr/lib/debug/lib/modules/2.6.27.9-73.fc9.x86_64/vmlinux --event=L2_CACHE_MISS:1000:0:1:1
[3] opcontrol --setup --vmlinux /usr/lib/debug/lib/modules/2.6.27.9-73.fc9.x86_64/vmlinux --event=CPU_CLK_UNHALTED:10000000:0:1:1
* F10 seems to dislike the DL585G5; issue already reported against anaconda (#480638).
