This is a weird bug, has anyone else seen or fixed it?
If you connect storage up to a pandaboard USB port (flash or sata), you get about 5 MB/sec throughput. Now, if you "ping -i 0.001" the pandaboard from another host, you can increase the *storage* performance to 22 MB/sec (for my disk, max 32 MB/sec on an x86 desktop). Note that the ethernet device is itself also on the same USB hub (on chip) as the device storage.
This tells me that something in the smsc95xx driver is either missing an interrupt, or not polling fast enough, but I couldn't find anything.
USB protocol analyzer shows that most of the time, the transfers are happening at the full speed, so it's just stopping every once in a while, killing performance.
Kernel is 2.6.38.5-23.fc13.armv7l.omap from the Fedora 15 development SRPMs.
On Sat, May 28, 2011 at 1:53 AM, DJ Delorie dj@redhat.com wrote:
This is a weird bug, has anyone else seen or fixed it?
If you connect storage up to a pandaboard USB port (flash or sata), you get about 5 MB/sec throughput. Now, if you "ping -i 0.001" the pandaboard from another host, you can increase the *storage* performance to 22 MB/sec (for my disk, max 32 MB/sec on an x86 desktop). Note that the ethernet device is itself also on the same USB hub (on chip) as the device storage.
In order to find out if the smsc95xx really is the cause, maybe you can do the same/similar test when the driver is not loaded?
If the driver is compiled in the kernel, you can unbind (similar to a disconnect) the device with something like:
# cd /sys/class/net/usb0/device/driver/ # ls bind 1-0:1.0 uevent unbind # echo 1-0:1.0 > unbind
-> do some testing
# echo 1-0:1.0 > bind
-> and possibly you need to restart the network service
This tells me that something in the smsc95xx driver is either missing an interrupt, or not polling fast enough, but I couldn't find anything.
This could be, but maybe there is something going on in the USB-chip or on the USB-bus itself.
USB protocol analyzer shows that most of the time, the transfers are happening at the full speed, so it's just stopping every once in a while, killing performance.
I have seen a 'broken' USB-device once, which was sending 0-size URB's to the host when there was no data. The host was getting an unexpected high load. If the firmware from the USB-device was sending a NAK instead, the handling would have been done in the USB-chip, not in the driver, which passed the empty data on to the actual module/handler. The 0-size packet that was send, was actually conforming the specifications, however the driver was programmed in such an unsuitable way, that there was no delay in requesting more data (which resulted in a new 0-size packet....).
It would be interesting to see if there is data going to/from the smsc95xx if there is nothing to do. If there is data from this device on the bus, it might affect the bandwidth. Sending a ping would slow the smsc95xx down while processes the ping, and this might free some bandwidth for your disk.
Kernel is 2.6.38.5-23.fc13.armv7l.omap from the Fedora 15 development SRPMs.
I don't have access to a PandaBoard, so I can only review code and give some thoughts. Maybe you can post some of the USB-captures and a /proc/bus/usb/devices for review? I also don't know of you're familiar with systemtap (if it works on ARM at all, I haven't tried yet) and have a -debuginfo package. If so, you might want to try to find out what functions are being called.
HTH, Niels
On Tue, May 31, 2011 at 7:16 PM, DJ Delorie dj@redhat.com wrote:
In order to find out if the smsc95xx really is the cause, maybe you can do the same/similar test when the driver is not loaded?
The ethernet is *on* the smsc9514 chip, as a USB device. Disable the hub, no ethernet.
Uh, I think you understood my intention wrong. Obviously, the smsc9514 should provide usb0 as ethernet device as the smsc95xx module is a network driver.
I was interested to find out if this USB-device is conflicting with the storage. Therefore you might be able to disable smsc95xx and see if your storage performs much better. When the storage is still not performing as it should, there is likely an issue with the USB-controller or the (assuming internal) USB-hub and not necessarily with the smsc95xx.
Maybe it is clearer this way?
I was interested to find out if this USB-device is conflicting with the storage. Therefore you might be able to disable smsc95xx and see if your storage performs much better.
The storage is *also* connected via the smsc9514 chip. It's a USB docking port for sata disks.
So, ethernet and storage are going through the same physical USB hub (the smsc9514).
On Tue, May 31, 2011 at 7:55 PM, DJ Delorie dj@redhat.com wrote:
I was interested to find out if this USB-device is conflicting with the storage. Therefore you might be able to disable smsc95xx and see if your storage performs much better.
The storage is *also* connected via the smsc9514 chip. It's a USB docking port for sata disks.
So, ethernet and storage are going through the same physical USB hub (the smsc9514).
This is confusing me now... I can only find linux-2.6/drivers/net/usb/smsc95xx.c, which definitely is not providing a usb-hub (like linux-2.6/drivers/usb/core/hub.c) and also not a USB-storage interface.
The different USB-devices can be on the same chip, but they will still be different devices in the USB-stack. If you can attach or copy&paste your /proc/bus/usb/devices, I should be able to tell you what bus/port the smsc95xx is connected to, and how you can 'disconnect' it without unloading the smsc95xx-module. Using unbind/bind over sysfs you can logically disconnect the driver from the device, which would allow you to do some tests without side-effects caused by the smsc95xx driver.
This is confusing me now... I can only find linux-2.6/drivers/net/usb/smsc95xx.c, which definitely is not providing a usb-hub (like linux-2.6/drivers/usb/core/hub.c) and also not a USB-storage interface.
http://www.smsc.com/media/Downloads_Public/Data_Sheets/9514.pdf
See page 6 for a block diagram of the SMSC LAN9514 chip, if that helps understand the hardware.
On Tue, May 31, 2011 at 9:38 PM, DJ Delorie dj@redhat.com wrote:
This is confusing me now... I can only find linux-2.6/drivers/net/usb/smsc95xx.c, which definitely is not providing a usb-hub (like linux-2.6/drivers/usb/core/hub.c) and also not a USB-storage interface.
http://www.smsc.com/media/Downloads_Public/Data_Sheets/9514.pdf
See page 6 for a block diagram of the SMSC LAN9514 chip, if that helps understand the hardware.
So, that seems to be one chip, which provides at least presents two USB-devices (one hub with 4-ports for other devices and one ethernet-port). The information can be found in Chapters 1.1.2, 1.1.3 and Chapter 3 which describes the descriptors that can be programmed in the EEPROM.
It is possible to unbind the smsc95xx from the ethernet-device (via sysfs). Obviously you could unload the module already (it was does not seem to be compiled in your kernel). Unloading a driver is similar to unbinding the port/device/driver.
From my understanding in an other reply you sent to the list,
unloading the smsc95xx module does not make a difference, you seem to be able to reproduce the behavior with different devices (serial-port and usb-mouse).
It seems that the smsc95xx module can not be kept responsible for the bad performance of your drive. Other points that might investigation, include the USB-controller on your PandaBoard (ehci, ohci, ...) or possibly the USB-storage-stack. It might well be that there is an issue around the scheduling of the interrupt-handling done by the USB-controller...
Note that USB-mouses tend to be USB-1.x and USB-storage is usually USB-2.x. Different USB-versions are normally handled by different USB-controllers and therefore have their own drivers (for example, ehci = USB-2, uhci = USB-1). /proc/interrupts might point you to some more ideas. Possibly the USB2-controller uses a different interrupt than the USB1-controller and the serial-port. Depending the driver used for your USB2-controller, you might have some module-parameters that influence interrupt-handling:
$ modinfo ehci-hcd filename: /lib/modules/2.6.38.3-kw/kernel/drivers/usb/host/ehci-hcd.ko alias: platform:orion-ehci license: GPL author: David Brownell description: USB 2.0 'Enhanced' Host Controller (EHCI) Driver alias: pci:v*d*sv*sd*bc0Csc03i20* depends: usbcore vermagic: 2.6.38.3-kw preempt mod_unload ARMv5 parm: log2_irq_thresh:log2 IRQ latency, 1-64 microframes (int) parm: park:park setting; 1-3 back-to-back async packets (uint) parm: ignore_oc:ignore bogus hardware overcurrent indications (bool) parm: hird:host initiated resume duration, +1 for each 75us (int)
HTH, Niels
It seems that the smsc95xx module can not be kept responsible for the bad performance of your drive.
The opposite seems to be true - activity in the smsc95xx module *improves* performance. It still smells like a missing interrupt (or poll) problem, but I still don't know where...
Other points that might investigation, include the USB-controller on your PandaBoard (ehci, ohci, ...) or
[ 1833.619598] usb 1-1.3: new full speed USB device using ehci-omap and address 6 [ 2318.736450] usb 1-1.3: new high speed USB device using ehci-omap and address 7 [ 2695.448394] usb 1-1.3: new full speed USB device using ehci-omap and address 8
ehci-omap is built in to the kernel.
On Tue, May 31, 2011 at 10:53 PM, DJ Delorie dj@redhat.com wrote:
It seems that the smsc95xx module can not be kept responsible for the bad performance of your drive.
The opposite seems to be true - activity in the smsc95xx module *improves* performance. It still smells like a missing interrupt (or poll) problem, but I still don't know where...
Other points that might investigation, include the USB-controller on your PandaBoard (ehci, ohci, ...) or
[ 1833.619598] usb 1-1.3: new full speed USB device using ehci-omap and address 6 [ 2318.736450] usb 1-1.3: new high speed USB device using ehci-omap and address 7 [ 2695.448394] usb 1-1.3: new full speed USB device using ehci-omap and address 8
ehci-omap is built in to the kernel.
This driver seems to use all the ehci-hcd functions. You can set the parameters on the kernel-command line like "ehci_hcd.ignore_oc=1" (to ignore over-currents, which you probably don't have). It's late here, and I can not concentrate the effects that the options describe in ehci-hcd.c at the moment. You might find some hints in linux-2.6/Documentation/usb/ehci.txt as it describes some debugging options.
Good luck!
Quoting DJ Delorie dj@redhat.com:
I was interested to find out if this USB-device is conflicting with the storage. Therefore you might be able to disable smsc95xx and see if your storage performs much better.
The storage is *also* connected via the smsc9514 chip. It's a USB docking port for sata disks.
So, ethernet and storage are going through the same physical USB hub (the smsc9514).
Can you put the ethernet and storage on different hubs? (it may not be..)
Does the issues replicate itself if you are using a usb drive connection instead of a sata one? (with the ethernet running through the hub still)
Or do you see similar crappy performance with the nic adapter unplugged and unused?
Can you plug the docking hubs into another hub before it reaches the device?
Can you tell if the nic adapter is going into a sleep mode?
Can you add power to the hub? (maybe it is drawing too much power if something is bus-powered..)
Can you put the ethernet and storage on different hubs? (it may not be..)
No, it's all on-board. No other hubs, all board-powered.
Or do you see similar crappy performance with the nic adapter unplugged and unused?
I see the same crappy performance with the smsc95xx driver unloaded, net down, cable unplugged.
That gave me an idea, though... I unloaded the smsc95xx driver, and plugged in a serial port connected to a 9600 baud character generator, and got good performance. Likewise with a usb mouse, as long as I wiggled the mouse during the test.
Note: I get crappy performance with the network *up*, as long as there's no traffic.
If I plug in *two* usb-storage devices (er, flash stick and sata dock), I get the same crappy performance for both of them (i.e. same per-drive throughput as the single-drive crappy performance).
On 05/31/2011 09:36 PM, Somebody in the thread at some point said:
Can you put the ethernet and storage on different hubs? (it may not be..)
No, it's all on-board. No other hubs, all board-powered.
Or do you see similar crappy performance with the nic adapter unplugged and unused?
I see the same crappy performance with the smsc95xx driver unloaded, net down, cable unplugged.
That gave me an idea, though... I unloaded the smsc95xx driver, and plugged in a serial port connected to a 9600 baud character generator, and got good performance. Likewise with a usb mouse, as long as I wiggled the mouse during the test.
Note: I get crappy performance with the network *up*, as long as there's no traffic.
If I plug in *two* usb-storage devices (er, flash stick and sata dock), I get the same crappy performance for both of them (i.e. same per-drive throughput as the single-drive crappy performance).
cc-ing Felipe Balbi (whole thread here http://comments.gmane.org/gmane.linux.redhat.fedora.arm/1266 ) in case he has an idea.
I had a go a reproducing this also on a 2.6.38-based kernel on Panda. I was thinking it sounds a bit like USB suspend (not the same as kernel suspend) because evidently there is deadtime after idle from device. You showed that traffic to the device doesn't count with your "two usb storage device" test.
First two goes are with ethernet idle, then two goes with pingflood
root@linaro:~# dd if=/dev/zero of=/media/FAD8-72F0/dump bs=4096 count=10240 10240+0 records in 10240+0 records out 41943040 bytes (42 MB) copied, 17.4684 s, 2.4 MB/s root@linaro:~# dd if=/dev/zero of=/media/FAD8-72F0/dump bs=4096 count=10240 10240+0 records in 10240+0 records out 41943040 bytes (42 MB) copied, 17.219 s, 2.4 MB/s root@linaro:~# dd if=/dev/zero of=/media/FAD8-72F0/dump bs=4096 count=10240 10240+0 records in 10240+0 records out 41943040 bytes (42 MB) copied, 11.4112 s, 3.7 MB/s root@linaro:~# dd if=/dev/zero of=/media/FAD8-72F0/dump bs=4096 count=10240 10240+0 records in 10240+0 records out 41943040 bytes (42 MB) copied, 11.8075 s, 3.6 MB/s
This is the same tests done with CONFIG_USB_SUSPEND disabled, no real difference.
root@linaro:~# dd if=/dev/zero of=/media/FAD8-72F0/dump bs=4096 count=10240 10240+0 records in 10240+0 records out 41943040 bytes (42 MB) copied, 15.8717 s, 2.6 MB/s root@linaro:~# dd if=/dev/zero of=/media/FAD8-72F0/dump bs=4096 count=10240 10240+0 records in 10240+0 records out 41943040 bytes (42 MB) copied, 17.0267 s, 2.5 MB/s root@linaro:~# dd if=/dev/zero of=/media/FAD8-72F0/dump bs=4096 count=10240 10240+0 records in 10240+0 records out 41943040 bytes (42 MB) copied, 10.6266 s, 3.9 MB/s root@linaro:~# dd if=/dev/zero of=/media/FAD8-72F0/dump bs=4096 count=10240 10240+0 records in 10240+0 records out 41943040 bytes (42 MB) copied, 11.9173 s, 3.5 MB/s
I had a look at the smsc lan9514 and ULPI PHY datasheets
http://www.smsc.com/media/Downloads_Public/Data_Sheets/3320.pdf
to see if there was something obvious to hack suspend disabled but didn't see anything useful.
-Andy
On 2011-05-27 19:53, DJ Delorie wrote:
This is a weird bug, has anyone else seen or fixed it?
If you connect storage up to a pandaboard USB port (flash or sata), you get about 5 MB/sec throughput. Now, if you "ping -i 0.001" the pandaboard from another host, you can increase the *storage* performance to 22 MB/sec (for my disk, max 32 MB/sec on an x86 desktop). Note that the ethernet device is itself also on the same USB hub (on chip) as the device storage.
I recently ran some bonnie++ benchmarks against both an SD card and a USB 2.0 hard drive and I was seeing about 10 MB/sec on the USB drive. http://jeffbastian.blogspot.com/2011/05/storage-speed-on-pandaboard.html
This was with the 2.6.35-g6d019da-dirty kernel.
I'll have to try this again with the pings and see if that improves the performance.
Jeff
On 2011-05-31 10:00, Jeffrey Bastian wrote:
I recently ran some bonnie++ benchmarks against both an SD card and a USB 2.0 hard drive and I was seeing about 10 MB/sec on the USB drive. http://jeffbastian.blogspot.com/2011/05/storage-speed-on-pandaboard.html
This was with the 2.6.35-g6d019da-dirty kernel.
I'll have to try this again with the pings and see if that improves the performance.
I ran bonnie++ again on my PandaBoard with a USB hard drive and I also saw the performance double when I was pinging the system. I'm now getting 18 MB/sec writes and 24 MB/sec reads.
I posted the full bonnie++ results at: http://jeffbastian.blogspot.com/2011/06/storage-speed-on-pandaboard-revisite...
Jeff
Hi guys,
On Thu, Jun 2, 2011 at 5:50 PM, Jeffrey Bastian jbastian@redhat.com wrote:
On 2011-05-31 10:00, Jeffrey Bastian wrote:
I recently ran some bonnie++ benchmarks against both an SD card and a USB 2.0 hard drive and I was seeing about 10 MB/sec on the USB drive. http://jeffbastian.blogspot.com/2011/05/storage-speed-on-pandaboard.html
This was with the 2.6.35-g6d019da-dirty kernel.
I'll have to try this again with the pings and see if that improves the performance.
I ran bonnie++ again on my PandaBoard with a USB hard drive and I also saw the performance double when I was pinging the system. I'm now getting 18 MB/sec writes and 24 MB/sec reads.
I posted the full bonnie++ results at: http://jeffbastian.blogspot.com/2011/06/storage-speed-on-pandaboard-revisite...
Just wondering if someone got on with this. Maybe a tcpdump from the USB-traffic helps. I'd suggest to compare the following
a) tcpdump from the USB-storage without pinging b) tcpdump from the USB-storage with pinging (or other usage)
tcpdump can capture the USB-traffic (on a similar level as Ethernet), with a command like this:
# tcpdump -w /tmp/my-usb-traffic.pcap -s 256 -i usbmon0
usbmon0 captures all USB-traffic, where usbmon<N> captures the traffic from USB-bus <N>. The USB-bus where the device is connected can be found in /proc/bus/usb/devices.
tshark (or wireshark) can read the generated .pcap file. In order to make sense out of the bus-numbers and device-IDs, the /proc/bus/usb/devices is very valuable. In case someone decides to make traces available, please provide that file as well.
Good luck, Niels
I put tcpdump traces here:
http://www.delorie.com/tmp/usb-yes-pings.gz http://www.delorie.com/tmp/usb-no-pings.gz http://www.delorie.com/tmp/lsusb.txt
(my panda doesn't have /proc/bus/usb for some reason)
Note: I also have a USB protocol analyzer (www.totalphase.com) which I can do wire-level USB dumps from, if that would help. They're *big* though.
Today Salman Zafar and I observed something that we hadn't noticed before (not sure if it didn't happen, or we didn't notice it) -- the PandaBoards in the build farm have high latency, with occasional packet drops, when pinged at 1-second intervals (the default for the 'ping' command).
Oddly, when flood-pinged, they don't drop any packets, and the latency improves. This sounds a lot like the storage performance bug, without the storage :-) Perhaps an IRQ is being missed, and a subsequent IRQ (or polling event or something else) is causing the driver to pick up the data late? (Just speculating)
Regular ping ...
------------------------------------------- $ ping -c20 5-1 PING cdot-panda-5-1 (192.168.1.151) 56(84) bytes of data. 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=1 ttl=64 time=0.753 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=2 ttl=64 time=49.8 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=3 ttl=64 time=128 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=4 ttl=64 time=8.21 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=5 ttl=64 time=233 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=6 ttl=64 time=646 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=7 ttl=64 time=0.609 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=8 ttl=64 time=248 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=9 ttl=64 time=318 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=10 ttl=64 time=267 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=11 ttl=64 time=267 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=12 ttl=64 time=268 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=13 ttl=64 time=266 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=14 ttl=64 time=265 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=15 ttl=64 time=153 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=16 ttl=64 time=0.632 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=17 ttl=64 time=61.6 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=18 ttl=64 time=0.628 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=19 ttl=64 time=75.9 ms 64 bytes from cdot-panda-5-1 (192.168.1.151): icmp_req=20 ttl=64 time=23.6 ms
--- cdot-panda-5-1 ping statistics --- 20 packets transmitted, 20 received, 0% packet loss, time 19021ms rtt min/avg/max/mdev = 0.609/164.416/646.835/159.032 ms -------------------------------------------
Notice the 164 mS average latency. No packet loss in this particular sample, though.
Now a flood ping:
------------------------------------------- # ping -f -c20 5-1 PING cdot-panda-5-1 (192.168.1.151) 56(84) bytes of data.
--- cdot-panda-5-1 ping statistics --- 20 packets transmitted, 20 received, 0% packet loss, time 92ms rtt min/avg/max/mdev = 0.373/4.666/19.813/6.597 ms, pipe 2, ipg/ewma 4.883/3.134 ms -------------------------------------------
Results vary but are generally repeatable.
This kernel (which is a 2.6.35.3) doesn't appear to be built with the appropriate /sys/kernel/debug/usbmon bits for tracing with tcpdump. (Anyone know offhand the kernel flags for that?)
-Chris
Wow! Mine does the same thing, with 2.6.38.5-23.fc13.armv7l.omap. Even when flooding, some of the responses take longer than the ping interval...
$ ping -c20 panda-1 PING panda-1.delorie.com (172.31.6.2) 56(84) bytes of data. 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=1 ttl=64 time= 0.477 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=2 ttl=64 time= 0.710 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=3 ttl=64 time= 0.527 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=4 ttl=64 time= 0.610 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=5 ttl=64 time=184 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=6 ttl=64 time= 0.681 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=7 ttl=64 time=183 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=8 ttl=64 time=182 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=9 ttl=64 time=181 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=10 ttl=64 time= 0.439 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=11 ttl=64 time= 0.421 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=12 ttl=64 time= 0.516 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=13 ttl=64 time= 0.485 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=14 ttl=64 time=181 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=15 ttl=64 time= 0.475 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=16 ttl=64 time=181 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=17 ttl=64 time=180 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=18 ttl=64 time=179 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=19 ttl=64 time=178 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=20 ttl=64 time=177 ms --- panda-1.delorie.com ping statistics --- 20 packets transmitted, 20 received, 0% packet loss, time 19012ms rtt min/avg/max/mdev = 0.421/90.797/184.668/90.275 ms
$ sudo ping -i 0.1 -c20 panda-1 PING panda-1.delorie.com (172.31.6.2) 56(84) bytes of data. 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=1 ttl=64 time= 0.474 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=2 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=3 ttl=64 time= 0.563 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=4 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=5 ttl=64 time= 0.567 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=6 ttl=64 time= 57.2 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=7 ttl=64 time= 81.2 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=8 ttl=64 time=109 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=9 ttl=64 time= 0.504 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=10 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=11 ttl=64 time= 0.574 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=12 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=13 ttl=64 time= 0.473 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=14 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=15 ttl=64 time= 0.468 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=16 ttl=64 time= 0.655 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=17 ttl=64 time= 41.5 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=18 ttl=64 time=109 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=19 ttl=64 time= 10.2 ms --- panda-1.delorie.com ping statistics --- 20 packets transmitted, 19 received, 5% packet loss, time 1969ms rtt min/avg/max/mdev = 0.468/50.825/110.450/50.031 ms, pipe 2
$ sudo ping -i 0.01 -c20 panda-1 PING panda-1.delorie.com (172.31.6.2) 56(84) bytes of data. 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=1 ttl=64 time= 0.629 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=2 ttl=64 time= 0.506 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=3 ttl=64 time=20.2 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=4 ttl=64 time= 0.503 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=5 ttl=64 time= 0.476 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=6 ttl=64 time=20.3 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=7 ttl=64 time= 0.549 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=8 ttl=64 time= 0.510 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=9 ttl=64 time= 0.525 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=10 ttl=64 time= 0.520 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=11 ttl=64 time= 0.528 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=12 ttl=64 time=10.3 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=13 ttl=64 time= 0.434 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=14 ttl=64 time= 0.409 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=15 ttl=64 time= 0.409 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=16 ttl=64 time=20.4 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=17 ttl=64 time= 9.78 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=18 ttl=64 time=15.4 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=19 ttl=64 time=15.5 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=20 ttl=64 time=15.5 ms --- panda-1.delorie.com ping statistics --- 20 packets transmitted, 20 received, 0% packet loss, time 230ms rtt min/avg/max/mdev = 0.409/6.679/20.404/7.978 ms, pipe 2
It is interesting that seq 1-9, and 10-18 are identical patterns.. (19 and 20 break the pattern..)
Quoting DJ Delorie dj@redhat.com:
Wow! Mine does the same thing, with 2.6.38.5-23.fc13.armv7l.omap. Even when flooding, some of the responses take longer than the ping interval...
$ ping -c20 panda-1 PING panda-1.delorie.com (172.31.6.2) 56(84) bytes of data. 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=1 ttl=64 time= 0.477 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=2 ttl=64 time= 0.710 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=3 ttl=64 time= 0.527 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=4 ttl=64 time= 0.610 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=5 ttl=64 time=184 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=6 ttl=64 time= 0.681 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=7 ttl=64 time=183 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=8 ttl=64 time=182 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=9 ttl=64 time=181 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=10 ttl=64 time= 0.439 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=11 ttl=64 time= 0.421 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=12 ttl=64 time= 0.516 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=13 ttl=64 time= 0.485 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=14 ttl=64 time=181 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=15 ttl=64 time= 0.475 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=16 ttl=64 time=181 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=17 ttl=64 time=180 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=18 ttl=64 time=179 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=19 ttl=64 time=178 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=20 ttl=64 time=177 ms --- panda-1.delorie.com ping statistics --- 20 packets transmitted, 20 received, 0% packet loss, time 19012ms rtt min/avg/max/mdev = 0.421/90.797/184.668/90.275 ms
$ sudo ping -i 0.1 -c20 panda-1 PING panda-1.delorie.com (172.31.6.2) 56(84) bytes of data. 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=1 ttl=64 time= 0.474 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=2 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=3 ttl=64 time= 0.563 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=4 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=5 ttl=64 time= 0.567 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=6 ttl=64 time= 57.2 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=7 ttl=64 time= 81.2 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=8 ttl=64 time=109 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=9 ttl=64 time= 0.504 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=10 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=11 ttl=64 time= 0.574 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=12 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=13 ttl=64 time= 0.473 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=14 ttl=64 time=110 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=15 ttl=64 time= 0.468 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=16 ttl=64 time= 0.655 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=17 ttl=64 time= 41.5 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=18 ttl=64 time=109 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=19 ttl=64 time= 10.2 ms --- panda-1.delorie.com ping statistics --- 20 packets transmitted, 19 received, 5% packet loss, time 1969ms rtt min/avg/max/mdev = 0.468/50.825/110.450/50.031 ms, pipe 2
$ sudo ping -i 0.01 -c20 panda-1 PING panda-1.delorie.com (172.31.6.2) 56(84) bytes of data. 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=1 ttl=64 time= 0.629 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=2 ttl=64 time= 0.506 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=3 ttl=64 time=20.2 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=4 ttl=64 time= 0.503 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=5 ttl=64 time= 0.476 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=6 ttl=64 time=20.3 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=7 ttl=64 time= 0.549 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=8 ttl=64 time= 0.510 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=9 ttl=64 time= 0.525 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=10 ttl=64 time= 0.520 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=11 ttl=64 time= 0.528 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=12 ttl=64 time=10.3 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=13 ttl=64 time= 0.434 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=14 ttl=64 time= 0.409 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=15 ttl=64 time= 0.409 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=16 ttl=64 time=20.4 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=17 ttl=64 time= 9.78 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=18 ttl=64 time=15.4 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=19 ttl=64 time=15.5 ms 64 bytes from panda-1.31.172.in-addr.arpa (172.31.6.2): icmp_req=20 ttl=64 time=15.5 ms --- panda-1.delorie.com ping statistics --- 20 packets transmitted, 20 received, 0% packet loss, time 230ms rtt min/avg/max/mdev = 0.409/6.679/20.404/7.978 ms, pipe 2 _______________________________________________ arm mailing list arm@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/arm
I would say this points to a FIFO alarm level being not to your favor inside the hub. If pinging makes storage faster, that seems to me that the USB hub is buffering bulk packets until it hits an alarm condition, and then sending them as a sequence of transfers. Just because there is a hub + lan chip doesn't mean they are independent. I am sure the SMSC parts are tightly combined and sharing buffers somewhere.
As I've said before, CONFIG_USB_EHCI_TT_NEWSCHED might change the behavior if there is any Transaction Translator in use on the hub (implies a USB 1.1 device like a mouse or keyboard). The turbo_mode option to the SMSC driver may well also change the ethernet behavior.. it may not make it better, but it could be that it makes it much more stable (evenly slow across the board). By disabling both you can be sure that whatever data is going into the chip is slowly but surely, no magic batching up of transactions. If there is a complete lack of performance improvement then you can basically put that down to FIFO inside the chip not hitting the alarm level required for the chip to empty it on the other side..
On 06/06/2011 02:37:20 PM, DJ Delorie wrote:
how's it going with this?
I put tcpdump traces here:
http://www.delorie.com/tmp/usb-yes-pings.gz http://www.delorie.com/tmp/usb-no-pings.gz http://www.delorie.com/tmp/lsusb.txt
(my panda doesn't have /proc/bus/usb for some reason)
it's deprecated but very handy (usbview uses it) you enable it in the kernel config in the usb section
Note: I also have a USB protocol analyzer (www.totalphase.com) which I can do wire-level USB dumps from, if that would help. They're *big* though.
did you ever look at /proc/interrupts and see if the numbers are increasing?
i ask because i'm chasing a problem now of the kernel disabling interrupts on my pcie usb card (i7 not arm) because the kernel abruptly gets alot (100K) of interrupts and can't find a driver to admit to owning them so it disables that interrupt and sets up a polling timer. the default timer rate is HZ/10 which is 100msec on my system and results in horrible performance. when i change it to a 10 msec poll i can barely tell there's a problem.
anyway the upshot is check your logs for "kernel: irq NN: nobody cared". this means that this interrupt has been switched off and replaced with polling. i'm pretty sure the number in /proc/interrupts stops changing (i can't look now because it hasn't failed today yet)
hth
On 2011-06-20 09:16, Andrew Burgess wrote:
On 06/06/2011 02:37:20 PM, DJ Delorie wrote:
how's it going with this?
A comment left on my blog [1] pointed me to a Google Groups discussion [2] which pointed to a problem with gcc-4.6.0 [3]. Fedora 13 had gcc-4.4.5, though, and the kernel I'm using was compiled with gcc-4.3.3, so this may not be related.
Jeff
[1] http://jeffbastian.blogspot.com/2011/06/storage-speed-on-pandaboard-revisite... [2] http://groups.google.com/group/pandaboard/browse_thread/thread/6dc03bf589769... [3] https://lists.yoctoproject.org/pipermail/poky/2011-June/006649.html
A comment left on my blog [1] pointed me to a Google Groups discussion [2] which pointed to a problem with gcc-4.6.0 [3]. Fedora 13 had gcc-4.4.5, though, and the kernel I'm using was compiled with gcc-4.3.3, so this may not be related.
I saw that, but my kernels have the "good" generated code, yet still have the slow performance. It seems to me that the "bad" generated code results in a USB that just doesn't work at all.