On Fri, Jan 29, 2010 at 12:22:05PM +0200, Gilboa Davara wrote:
[...] which was throttling the CPUs to 1.6 GHz (from a maximum of 2.4 GHz). I attempted to remedy this by setting InterruptThrottleRate=0,0 in the e1000e driver, after which we had one full day of testing with zero rx_missed_errors, but the application still reported packet loss.
rx_missed_errors usually get triggered when the kernel is slow to handle incoming hardware interrupts. There's a trade-off here: increase the interrupt rate and you'll increase kernel CPU usage in exchange for lower latency; decrease the interrupt rate and you'll reduce CPU usage at the expense of a higher chance of hitting the RX queue limit. I'd suggest you try setting InterruptThrottleRate to 1000 while increasing the RX ring size to 4096. (/sbin/ethtool -G DEVICE rx 4096)
You could try enabling multi-queue by adding InterruptType=2, RSS=NUM_OF_QUEUES and MQ=1 to your modprobe.conf.d configuration.
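Something along these lines (untested sketch; eth0 and RSS=4 are stand-ins, and the multi-queue parameters come from Intel's out-of-tree driver builds, not necessarily the in-kernel e1000e):

  # /etc/modprobe.d/e1000e.conf -- one value per port
  options e1000e InterruptThrottleRate=1000,1000
  # Multi-queue, if your driver build supports it:
  # options e1000e InterruptType=2 RSS=4 MQ=1

  # Grow the RX descriptor ring:
  /sbin/ethtool -G eth0 rx 4096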
I'll try these suggestions later today. Note that I was able to disable interrupt throttling on the on-board 82574L NICs without seeing any rx_missed_errors.
Can you post the output of $ mpstat -P ALL 1 during peak load?
We run "mpstat -P 5 ALL" continuously; is this sufficient resolution? I've attached the mpstat output from the 09:30-10:30 yesterday, which is one of the busiest hours of the day for multicast traffic.
Also, here is the top of the output from powertop. Are you running with C-states enabled? It is somewhat troubling that more than half of the time is spent in the deepest power-saving state (C3), but I think this is averaged across all CPUs.
PowerTOP version 1.11 (C) 2007 Intel Corporation
Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        (15.2%)
polling           5.5ms ( 4.1%)
C1 halt           0.2ms (23.0%)
C2 mwait          0.2ms ( 4.6%)
C3 mwait          0.4ms (53.1%)

Wakeups-from-idle per second : 2833.7    interval: 10.0s
no ACPI power usage estimate available

Top causes for wakeups:
  47.7% (8416.6)  <interrupt> : lan1-TxRx-0
  25.5% (4498.9)  <kernel IPI> : Rescheduling interrupts
  13.2% (2324.3)  <kernel core> : hrtimer_start_range_ns (tick_sched_timer)
   5.7% (1000.9)  kipmi0 : __mod_timer (process_timeout)
   4.1% ( 721.9)  <interrupt> : lan0-TxRx-0
   2.3% ( 413.0)  <interrupt> : extra timer interrupt
   0.6% (  99.8)  <kernel module> : __mod_timer (smi_timeout)
   0.5% (  93.1)  <interrupt> : ata_piix, ata_piix, uhci_hcd:usb5, uhci_hcd:
   0.1% (  17.2)  <kernel core> : __mod_timer (neigh_periodic_timer)
   0.1% (  11.1)  <kernel core> : hrtimer_start (tick_sched_timer)
   0.1% (  10.4)  vconfig : __mod_timer (garp_join_timer)
   0.1% (  10.0)  <kernel module> : __mod_timer (ipmi_timeout)
  ...
Thanks, Kelvin
On Fri, 2010-01-29 at 14:59 -0500, Kelvin Ku wrote:
On Fri, Jan 29, 2010 at 12:22:05PM +0200, Gilboa Davara wrote:
[...] which was throttling the CPUs to 1.6 GHz (from a maximum of 2.4 GHz). I attempted to remedy this by setting InterruptThrottleRate=0,0 in the e1000e driver, after which we had one full day of testing with zero rx_missed_errors, but the application still reported packet loss.
rx_missed_errors usually get triggered when the kernel is slow to handle incoming hardware interrupts. There's a trade-off here: increase the interrupt rate and you'll increase kernel CPU usage in exchange for lower latency; decrease the interrupt rate and you'll reduce CPU usage at the expense of a higher chance of hitting the RX queue limit. I'd suggest you try setting InterruptThrottleRate to 1000 while increasing the RX ring size to 4096. (/sbin/ethtool -G DEVICE rx 4096)
You could try enabling multi-queue by adding InterruptType=2, RSS=NUM_OF_QUEUES and MQ=1 to your modprobe.conf.d configuration.
I'll try these suggestions later today. Note that I was able to disable interrupt throttling on the on-board 82574L NICs without seeing any rx_missed_errors.
Did it help?
Can you post the output of $ mpstat -P ALL 1 during peak load?
We run "mpstat -P 5 ALL" continuously; is this sufficient resolution? I've attached the mpstat output from the 09:30-10:30 yesterday, which is one of the busiest hours of the day for multicast traffic.
~15'000 interrupts/core seems rather high to me - especially considering that this is a 1GbE link. Reducing the InterruptThrottleRate to 1000/5000 while increasing the RX ring size (ethtool -G ... rx ...) should decrease it.
Also, here is the top of the output from powertop. Are you running with C-states enabled? It is somewhat troubling that more than half of the time is spent in the deepest power-saving state (C3), but I think this is averaged across all CPUs.
I usually disable power management. Be advised that we are using 10GbE cards and not 1GbE, so we are more vulnerable to scaling-the-core-down-right-when-the-card-starts-flooding-the-hell-out-of-it...
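For the record, "disable power management" here means something like the following sketch (boot parameters plus the cpufreq governor; adjust to your distribution):

  # Kernel command line: allow only C0/C1
  #   processor.max_cstate=1
  # (idle=poll is even more aggressive, at the cost of power and heat.)

  # Force the performance governor on every core:
  for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
      echo performance > "$g"
  done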
P.S. Please post your complete hardware configuration. (Board, CPU, which slot you put the NIC in, etc.)
- Gilboa
On Mon, Feb 01, 2010 at 07:35:11AM +0200, Gilboa Davara wrote:
On Fri, 2010-01-29 at 14:59 -0500, Kelvin Ku wrote:
On Fri, Jan 29, 2010 at 12:22:05PM +0200, Gilboa Davara wrote:
[...] which was throttling the CPUs to 1.6 GHz (from a maximum of 2.4 GHz). I attempted to remedy this by setting InterruptThrottleRate=0,0 in the e1000e driver, after which we had one full day of testing with zero rx_missed_errors, but the application still reported packet loss.
rx_missed_errors usually get triggered when the kernel is slow to handle incoming hardware interrupts. There's a trade-off here: increase the interrupt rate and you'll increase kernel CPU usage in exchange for lower latency; decrease the interrupt rate and you'll reduce CPU usage at the expense of a higher chance of hitting the RX queue limit. I'd suggest you try setting InterruptThrottleRate to 1000 while increasing the RX ring size to 4096. (/sbin/ethtool -G DEVICE rx 4096)
You could try enabling multi-queue by adding InterruptType=2, RSS=NUM_OF_QUEUES and MQ=1 to your modprobe.conf.d configuration.
I'll try these suggestions later today. Note that I was able to disable interrupt throttling on the on-board 82574L NICs without seeing any rx_missed_errors.
Did it help?
I switched to an 82576 NIC. The kernel igb driver (version 1.3.16-k2) has multiqueue enabled by default with 4 RX and 4 TX queues. I'm running with 4096 RX ring entries.
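In case it's useful, this is how I'm verifying that (eth0 stands in for the real interface name):

  # One TxRx vector per queue should show up here:
  grep eth0 /proc/interrupts

  # Current and maximum ring sizes as the driver reports them:
  ethtool -g eth0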
The target app is performing well on the igb NIC whereas on the e1000e NIC it was missing packets. I say "missing" rather than "dropped" because these packets don't show up on any error counters. However, in throughput testing, we can't receive faster than 905-910 Mbps whereas we can reliably receive at 950 Mbps on our older non-Nehalem machines.
Can you post the output of $ mpstat -P ALL 1 during peak load?
We run "mpstat -P 5 ALL" continuously; is this sufficient resolution? I've attached the mpstat output from the 09:30-10:30 yesterday, which is one of the busiest hours of the day for multicast traffic.
~15'000 interrupts/core seems rather high to me - especially considering that this is a 1GbE link. Reducing the InterruptThrottleRate to 1000/5000 while increasing the RX ring size (ethtool -G ... rx ...) should decrease it.
Also, here is the top of the output from powertop. Are you running with C-states enabled? It is somewhat troubling that more than half of the time is spent in the deepest power-saving state (C3), but I think this is averaged across all CPUs.
I usually disable power management. Be advised that we are using 10GbE cards and not 1GbE, so we are more vulnerable to scaling-the-core-down-right-when-the-card-starts-flooding-the-hell-out-of-it...
I notice that ASPM is enabled on the 82576 NIC and the PCIe ports. Have you disabled ASPM? Disabling C-states had no effect on throughput or app performance.
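For reference, the negotiated ASPM state is visible with lspci (01:00.0 is a placeholder for the NIC's bus address):

  lspci -vv -s 01:00.0 | grep -i aspm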
I'm going to test with ASPM disabled later today.
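Roughly this, assuming the kernel exposes these knobs (newer 2.6 kernels do):

  # Boot-time: add pcie_aspm=off to the kernel command line, or at run time:
  echo performance > /sys/module/pcie_aspm/parameters/policy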
P.S. Please post your complete hardware configuration. (Board, CPU, which slot you put the NIC in, etc.)
- Gilboa
I've attached dmidecode and lspci output. Here's a summary:
Motherboard: Supermicro X8DTL-iF
CPU: Single Xeon E5530
RAM: 3x1GB 1333MHz DDR3 ECC Registered
The 82576 NIC is inserted into a PCIe 2.0 x8 slot.
- Kelvin
On Mon, Feb 01, 2010 at 07:35:11AM +0200, Gilboa Davara wrote:
Can you post the output of $ mpstat -P ALL 1 during peak load?
We run "mpstat -P 5 ALL" continuously; is this sufficient resolution? I've attached the mpstat output from the 09:30-10:30 yesterday, which is one of the busiest hours of the day for multicast traffic.
~15'000 interrupts/core seems rather high to me - especially considering that this is a 1GbE link. Reducing the InterruptThrottleRate to 1000/5000 while increasing the RX ring size (ethtool -G ... rx ...) should decrease it.
Oops, forgot to respond to this. The interrupt rate is currently around 4500 intr/s in total (i.e. the sum of intr/s over all cores) under load, which seems normal.
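That's measured with vmstat; the "in" column is the system-wide interrupt rate:

  # One-second samples; "in" = total interrupts/s
  vmstat 1 5

  # Per-queue breakdown for the NIC (eth0 assumed):
  grep eth0 /proc/interrupts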
Also, I've attached the dmidecode and lspci output.
- Kelvin
Hello Kelvin,
I somehow missed your reply.
On Tue, 2010-02-16 at 15:02 -0500, Kelvin Ku wrote:
Did it help?
I switched to an 82576 NIC. The kernel igb driver (version 1.3.16-k2) has multiqueue enabled by default with 4 RX and 4 TX queues. I'm running with 4096 RX ring entries.
The target app is performing well on the igb NIC whereas on the e1000e NIC it was missing packets. I say "missing" rather than "dropped" because these packets don't show up on any error counters. However, in throughput testing, we can't receive faster than 905-910 Mbps whereas we can reliably receive at 950 Mbps on our older non-Nehalem machines.
You should see the missing packets using $ ethtool -S. (Most likely as rx_missed_errors.)
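Something like this, with eth0 as a stand-in for your interface:

  ethtool -S eth0 | grep -E 'miss|drop|error'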
I usually disable power management. Be advised that we are using 10GbE cards and not 1GbE, so we are more vulnerable to scaling-the-core-down-right-when-the-card-starts-flooding-the-hell-out-of-it...
I notice that ASPM is enabled on the 82576 NIC and the PCIe ports. Have you disabled ASPM? Disabling C-states had no effect on throughput or app performance.
At 1Gbps, it shouldn't. We have ASPM enabled on quad-1GbE and 10GbE without an issue.
I'm going to test with ASPM disabled later today.
P.S. Please post your complete hardware configuration. (Board, CPU, which slot you put the NIC in, etc.)
I've attached dmidecode and lspci output. Here's a summary:
Motherboard: Supermicro X8DTL-iF
CPU: Single Xeon E5530
RAM: 3x1GB 1333MHz DDR3 ECC Registered
The 82576 NIC is inserted into a PCIe 2.0 x8 slot.
The weird thing is that two PCI-E 1.0 lanes should be more than enough to drive a dual-port 1GbE card - let alone x8/2.0... I'm stumped.
Side question: Why did you buy an expensive dual-socket board and a dual-socket-capable CPU (Xeon 55xx) to run a single CPU? If you don't need the extra cores (or memory), a normal desktop X58/Core i5/i7 will be just as good and far less expensive.
- Gilboa