Poojan's Tech Blog / Debugging USB 3.0 Interrupts

I’m seeing interrupt storm errors. I’ve also noticed that my USB 3.0 GoFlex 2TB drive (which I have to brag I bought for $67 back in 2009 or so) seems to be intermittently absent from my ZFS pool.

Here’s a vmstat:
root@server ~ >vmstat -i
interrupt                          total       rate
irq1: atkbd0                        3163          0
irq16: ehci0 xhci1            4657985391       7471
irq18: xhci0                   305433357        489
irq19: atapci0++                37971283         60
irq23: ehci1                     1281657          2
cpu0:timer                    3193627253       5122
irq264: ahci0                    9319576         14
irq265: hdac0                        100          0
irq266: em0:rx 0               165298952        265
irq267: em0:tx 0                29987834         48
irq268: em0:link                       9          0
cpu1:timer                     483764624        775
cpu7:timer                     546385426        876
cpu5:timer                     533264630        855
cpu4:timer                     577926662        927
cpu3:timer                     663537088       1064
cpu6:timer                     534384348        857
cpu2:timer                     487291455        781
Total                        12227462808      19613

Notice the crazy rate of more than 7000 interrupts per second on irq16, which ehci0 and xhci1 share. One downside of vmstat -i is that the rate column is an average over the entire uptime of the computer; it does no near-term windowing of the data.
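Since vmstat -i only reports an average since boot, one way to get a near-term rate is to take two snapshots and divide the difference in the totals by the time between them. Here is a minimal sketch in Python, assuming the name/total/rate column layout shown above. The function names (parse_vmstat_i, windowed_rates, sample_live) are made up for illustration and are not part of any tool.

```python
import re
import subprocess
import time

# One data row of `vmstat -i`: a name (which may contain spaces),
# followed by the cumulative total and the since-boot rate.
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s*$")


def parse_vmstat_i(text):
    """Return {interrupt name: cumulative total} from `vmstat -i` output."""
    totals = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m.group("name").strip() != "Total":
            totals[m.group("name").strip()] = int(m.group("total"))
    return totals


def windowed_rates(before, after, seconds):
    """Interrupts per second for each source over the sampling window."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }


def sample_live(window=10):
    """Hypothetical helper: sample a live FreeBSD system over `window` seconds."""
    def snapshot():
        out = subprocess.run(["vmstat", "-i"], capture_output=True,
                             text=True, check=True).stdout
        return parse_vmstat_i(out)

    first = snapshot()
    time.sleep(window)
    second = snapshot()
    return windowed_rates(first, second, window)
```

Calling sample_live(10) on the server and sorting the result by rate would show whether irq16 is still storming right now or whether the 7471 figure is left over from an earlier burst. For scale, irq16's total of 4657985391 divided by its average rate of 7471 per second works out to roughly 7.2 days of uptime.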

So, I decided to switch the port it’s on, just to see whether the problem was the controller not liking the drive or the drive itself. I was surprised to find that this drive was connected to the built-in Asmedia controller, and not the add-on NEC/Renesas controller. (I would have thought that the built-in controller would have supported MSI rather than falling back to a shared legacy IRQ line.)

I’ll check on this tomorrow and see how it’s doing. Probably won’t be scientific, since I don’t know exactly what usage pattern might exercise the interrupts.

Update 1

Oh, it looks like xhci0 is indeed the add-on card:



xhci0: <NEC uPD720200 USB 3.0 controller> mem 0xf7a00000-0xf7a01fff irq 18 at device 0.0 on pci3
xhci0: 32 byte context size.
usbus1 on xhci0
pcib4: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0
pci4: <ACPI PCI bus> on pcib4
xhci1: <XHCI (generic) USB 3.0 controller> mem 0xf7900000-0xf7907fff irq 16 at device 0.0 on pci4
xhci1: 32 byte context size.
usbus2 on xhci1
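Matching IRQ numbers to controllers by eye is easy to get wrong, so here is a small sketch that pulls the mapping out of boot messages. It assumes attach lines of the form "name: <description> ... irq N ..." as in the excerpt above; the function name devices_by_irq and the DMESG constant are only for illustration.

```python
import re
from collections import defaultdict

# Attach lines look like:
#   xhci1: <XHCI (generic) USB 3.0 controller> mem ... irq 16 at device 0.0 on pci4
ATTACH = re.compile(r"^(?P<dev>\w+): <(?P<desc>[^>]+)>.*\birq (?P<irq>\d+)\b")

# The excerpt from this post, used as sample input.
DMESG = """\
xhci0: <NEC uPD720200 USB 3.0 controller> mem 0xf7a00000-0xf7a01fff irq 18 at device 0.0 on pci3
xhci0: 32 byte context size.
usbus1 on xhci0
pcib4: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0
pci4: <ACPI PCI bus> on pcib4
xhci1: <XHCI (generic) USB 3.0 controller> mem 0xf7900000-0xf7907fff irq 16 at device 0.0 on pci4
xhci1: 32 byte context size.
usbus2 on xhci1
"""


def devices_by_irq(dmesg_text):
    """Return {irq number: [(device, description), ...]} from attach lines."""
    irqs = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = ATTACH.match(line.strip())
        if m:
            irqs[int(m.group("irq"))].append((m.group("dev"), m.group("desc")))
    return dict(irqs)


for irq, devices in sorted(devices_by_irq(DMESG).items()):
    names = ", ".join(f"{dev} ({desc})" for dev, desc in devices)
    print(f"irq {irq}: {names}")
```

On this excerpt the NEC card (xhci0) lands on irq 18, while the generic controller (xhci1) and its PCI bridge both report irq 16.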

So, I may have moved the wrong hard drive (it was the GoFlex that I moved rather than the Backup Plus). However, it is the GoFlex that’s giving the checksum errors reported by the pool:



server# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 0h13m with 0 errors on Fri Oct 25 22:31:32 2013
config:

        NAME                 STATE     READ WRITE CKSUM
        tank                 ONLINE       0     0     0
          raidz1-0           ONLINE       0     0     0
            gpt/STBPDESK3TB  ONLINE       0     0     0
            gpt/WD20EARS     ONLINE       0     0     0
            gpt/GOFLEX2TB    ONLINE       0     0     3
          mirror-2           ONLINE       0     0     0
            gpt/WD15EARS     ONLINE       0     0     0
            gpt/ST1500       ONLINE       0     0     0
        cache
          gpt/tank_cache0    ONLINE       0     0     0

errors: No known data errors
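To keep an eye on whether the GoFlex keeps accumulating errors, the READ/WRITE/CKSUM columns can be checked by a script. This is a rough sketch that assumes the five-column config table shown above; devices_with_errors is a made-up name, and the counters are kept as strings because zpool may abbreviate large values.

```python
def devices_with_errors(status_text):
    """Return [(name, read, write, cksum)] for rows with a nonzero counter."""
    flagged = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The table starts after the "NAME STATE READ WRITE CKSUM" header.
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        if not in_config or len(fields) < 5:
            continue
        name, _state, *counters = fields[:5]
        # Skip non-table lines such as "errors: No known data errors".
        if not all(c[0].isdigit() for c in counters):
            continue
        if any(c != "0" for c in counters):
            flagged.append((name, *counters))
    return flagged


# A shortened copy of the table from this post, used as sample input.
SAMPLE = """\
config:

        NAME                 STATE     READ WRITE CKSUM
        tank                 ONLINE       0     0     0
          raidz1-0           ONLINE       0     0     0
            gpt/WD20EARS     ONLINE       0     0     0
            gpt/GOFLEX2TB    ONLINE       0     0     3
        cache
          gpt/tank_cache0    ONLINE       0     0     0

errors: No known data errors
"""

for name, read, write, cksum in devices_with_errors(SAMPLE):
    print(f"{name}: read={read} write={write} cksum={cksum}")
```

Fed the output of zpool status tank after each scrub, this would flag only gpt/GOFLEX2TB for the pool state shown here.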

It should also be noted that I’ve been testing a Mushkin USB 3.0 flash drive connected to a port on xhci0, which might explain the high (average) interrupt rate on xhci0: I’ve been banging away at it for days.

Update 2

Oh: it looks like irq16 belongs to xhci1 (shared with ehci0), which is the built-in controller and the one the GoFlex 2TB drive was attached to. So the interrupt storm has nothing to do with testing the Mushkin flash drive, and it does point to a possible failure of the GoFlex drive.
