I've hit this bug a few times on f20 with a fairly boring config:
http://lists.xen.org/archives/html/xen-devel/2014-12/msg01045.html
currently:
xen-4.3.3-5.fc20.x86_64 3.17.3-200.fc20.x86_64
One f20 DomU (16GB), one el6 DomU (2GB), i7 desktop w/ 32GB RAM (12GB Dom0). The only thing not completely bog-standard is that my 'physical' interface is on a vlan tag on each bridge, but Dom0 network continues to hum along at the time, so I don't suppose that's a factor.
Symptom is the DomU network just goes away. It appears to be load related (I was doing video encoding over NFS). dmesg says:
[76819.472975] vif vif-2-0 vif2.0: txreq.offset: 8ee, size: 3858, end: 6144
[76819.473012] vif vif-2-0 vif2.0: fatal error; disabling device
[76819.482474] brbfc: port 2(vif2.0) entered disabled state
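The numbers in the first dmesg line can be checked with a little arithmetic. This is only a sketch of one reading of the message: it assumes the offset is printed in hex (8ee), that size and end are decimal, and that the page size is 4096 bytes as on x86_64. Under those assumptions offset + size equals the reported end, and that end lies past a single page, which would be consistent with the backend rejecting a transmit request that crosses a page boundary. The helper name parse_txreq is made up for the example.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86_64

# First line of the dmesg output quoted above.
LOG_LINE = ("[76819.472975] vif vif-2-0 vif2.0: "
            "txreq.offset: 8ee, size: 3858, end: 6144")


def parse_txreq(line):
    """Return (offset, size, end) from a txreq error line.

    Assumes offset is hex and size/end are decimal (hypothetical helper).
    """
    m = re.search(r"txreq\.offset: ([0-9a-fA-F]+), size: (\d+), end: (\d+)", line)
    if m is None:
        raise ValueError("not a txreq error line")
    return int(m.group(1), 16), int(m.group(2)), int(m.group(3))


offset, size, end = parse_txreq(LOG_LINE)
crosses_page = offset + size > PAGE_SIZE

print("offset =", offset, "size =", size, "end =", end)
print("offset + size == end:", offset + size == end)
print("runs past one page:", crosses_page)
```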
A workaround is to xl save the domU to a checkpoint file (have to use -c and destroy it), then restore it, and things continue happily. I wasn't able to figure out a way to tell Xen to just restart the network device (it appears to be attached and up after Xen decides it's failed).
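The save/destroy/restore sequence can be scripted. The following is a minimal sketch, not a tested recovery tool: the domain name and checkpoint path are placeholders, and the function names are invented for the example. It only strings together the three xl subcommands mentioned above (save with -c, destroy, restore) and defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def workaround_commands(domain, checkpoint):
    """Build the three xl invocations for the save/destroy/restore workaround."""
    return [
        ["xl", "save", "-c", domain, checkpoint],  # -c: leave the domain running after the save
        ["xl", "destroy", domain],                 # tear down the domain with the dead vif
        ["xl", "restore", checkpoint],             # bring it back from the checkpoint file
    ]


def run_workaround(domain, checkpoint, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for argv in workaround_commands(domain, checkpoint):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder domain name and checkpoint path; substitute real ones.
    run_workaround("f20-guest", "/var/tmp/f20-guest.chk", dry_run=True)
```

Pass dry_run=False to actually run the commands; that needs root in Dom0 and enough free disk space for the checkpoint file, which is roughly the size of the guest's RAM.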
I'll be applying the 3-line kernel patch here; do we stand any chance of getting something like this cherry picked into the Fedora kernel? It's not upstream as of 3.18:
http://lxr.free-electrons.com/source/drivers/net/xen-netfront.c#L628
I can advocate on xen-devel if needed.
-Bill
On 12/11/14 13:28, Bill McGonigle wrote:
[76819.472975] vif vif-2-0 vif2.0: txreq.offset: 8ee, size: 3858, end: 6144
[76819.473012] vif vif-2-0 vif2.0: fatal error; disabling device
[76819.482474] brbfc: port 2(vif2.0) entered disabled state
Followup: a few days later, this patch hit lkml:
https://lkml.org/lkml/2014/12/14/147
The Fedora 3.17.8 kernel definitely has it (in patch-3.17.8.xz).
The problem is in the DomU kernel, so that's where the updates need to happen. 3.17.3 has really been giving me headaches lately as a DomU, to the point that I'd have to reboot and hope I could 'xl console' in and yum could download the update before the port got disabled. I don't know why the problem got so much more severe this week; the DomU and Dom0 kernels haven't been updated for a few weeks. I rebooted to deal with some bum memory, so apparently something other than the kernels is different.
Anyway, after updating the DomU kernel to 3.17.8 the networking has been stable so far.
-Bill