db-koji01 slowness

Tue Apr 14 17:23:25 UTC 2015

So, last weekend we rebooted bvirthost09 and db-koji01. 

It helped somewhat.
Database dumps are back to a reasonable few hours. 

However, the load on it is still high, it occasionally alerts, and now
it's sometimes causing builders to stop talking to the hub. (They
time out and just stop checking in.)

I've asked the NetApp folks to check for any problems with the iSCSI
LUN that guest is on, but they say they are not aware of any issues.

I do see some packet dropping on db-koji01: 

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:06:90:a4 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    65269562146 369724151 0       128685  0       0      
    TX: bytes  packets  errors  dropped carrier collsns 
    395224051163 377221287 0       0       0       0    
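
To put that dropped count in proportion, here is a minimal Python
sketch that parses the counters pasted above and computes the drop
rate. The function names (parse_counters, drop_rate) are made up for
illustration, and it assumes the two-line header/value layout shown
in the output above.

```python
# Sketch only: parse the RX/TX counter lines from the interface
# statistics pasted above. Helper names are hypothetical.

IP_S_LINK_OUTPUT = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 52:54:00:06:90:a4 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    65269562146 369724151 0       128685  0       0
    TX: bytes  packets  errors  dropped carrier collsns
    395224051163 377221287 0       0       0       0
"""


def parse_counters(text):
    """Return {"RX": {...}, "TX": {...}} from the header/value line pairs."""
    counters = {}
    lines = text.splitlines()
    for i, line in enumerate(lines):
        fields = line.split()
        # A header line looks like "RX: bytes packets errors dropped ..."
        if fields and fields[0] in ("RX:", "TX:") and i + 1 < len(lines):
            names = fields[1:]
            values = [int(v) for v in lines[i + 1].split()]
            counters[fields[0].rstrip(":")] = dict(zip(names, values))
    return counters


def drop_rate(direction):
    """Fraction of packets dropped for one direction (0.0 if no packets)."""
    packets = direction["packets"]
    return direction["dropped"] / packets if packets else 0.0


if __name__ == "__main__":
    stats = parse_counters(IP_S_LINK_OUTPUT)
    for name in ("RX", "TX"):
        print(f"{name}: {drop_rate(stats[name]):.4%} dropped")
```

With the numbers above, the RX drop rate works out to roughly 0.035%
of received packets. These counters are cumulative, so sampling them
twice and comparing the difference would show whether the drops line
up with the times the builders stop checking in.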

My only ideas at this point: 

a) Run another PostgreSQL VACUUM ANALYZE. Perhaps the first one made
some poor choices and another one would make things happier. In any
case, that shouldn't make things any worse.

b) Switch the network card on db-koji01 to e1000 instead of virtio-net.
This really shouldn't be needed, but perhaps we are hitting some weird
virtio-net bug. This would require a short outage. 

c) Some other brilliant idea. ;) 
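
For option (b), the change amounts to editing the interface's model
element in the guest's libvirt definition. A rough Python sketch of
that edit follows. The sample XML is an assumption, not the real
db-koji01 definition: the bridge name br0 and the overall layout are
invented, and only the MAC address comes from the output above.
set_nic_model is a hypothetical helper.

```python
# Sketch only: flip the NIC model in a libvirt-style domain XML.
# SAMPLE_DOMAIN_XML is an invented example, not the real guest config.
import xml.etree.ElementTree as ET

SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>db-koji01</name>
  <devices>
    <interface type='bridge'>
      <mac address='52:54:00:06:90:a4'/>
      <source bridge='br0'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""


def set_nic_model(domain_xml, mac, model):
    """Return domain XML with the NIC matching `mac` set to `model`."""
    root = ET.fromstring(domain_xml.strip())
    changed = 0
    for iface in root.iter("interface"):
        mac_el = iface.find("mac")
        if mac_el is None or mac_el.get("address", "").lower() != mac.lower():
            continue
        model_el = iface.find("model")
        if model_el is None:
            # No explicit model set yet, so add one.
            model_el = ET.SubElement(iface, "model")
        model_el.set("type", model)
        changed += 1
    if changed == 0:
        raise ValueError(f"no interface with MAC {mac} found")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_nic_model(SAMPLE_DOMAIN_XML, "52:54:00:06:90:a4", "e1000"))
```

The guest would still need to be shut down and started again with
the new definition, which is the short outage mentioned in (b).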

kevin