Thanks for your quick reply. After days of investigation, I finally managed
to reproduce the issue reliably.
It seems that once the TX-OK count on the teamed interface exceeds
~2,157,000,000, TX-DRP starts appearing on that interface. Resetting the
teamed interface (ifdown/ifup, which sets TX-OK back to 0) seems to resolve
the issue temporarily. I've been able to recreate the issue multiple times
on different machines (all RHEL 7.2, libteam 1.17), and on every machine the
TX-DRP is seen only on the teamed interface, never on the underlying
interfaces.
Please see the following `netstat -i` results I collected from different machines:
Iface     MTU  RX-OK      RX-ERR RX-DRP RX-OVR TX-OK      TX-ERR TX-DRP TX-OVR Flg
team0@ip1 1500 2132803208 0      0      0      2159140094 0      67630  0      BMRU
team0@ip2 1500 2143255069 0      0      0      2157767058 0      49719  0      BMRU
team0@ip3 1500 2131552843 0      0      0      2157853127 0      1754   0      BMRU
team0@ip4 1500 2137758098 0      0      0      2158602342 0      1027   0      BMRU
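
In case it helps anyone else watching for this, below is a rough monitoring
sketch (not a fix). It reads the team interface's tx_packets counter from
sysfs, which corresponds to the TX-OK column above, and warns when the count
approaches the range where I start seeing TX-DRP. The interface name
("team0"), the 2,150,000,000 threshold and the check interval are assumptions
from my own setup, so adjust them as needed; the ifdown/ifup reset itself is
disruptive and only a temporary workaround.

#!/usr/bin/env python
# Rough sketch: warn when the team interface's TX-OK counter gets close to
# the range where TX-DRP started appearing in my tests. Not a fix.
import sys
import time

IFACE = "team0"            # assumed team interface name; adjust as needed
THRESHOLD = 2150000000     # just below where I started to see TX-DRP
CHECK_INTERVAL = 600       # seconds between checks

def tx_packets(iface):
    # /sys/class/net/<iface>/statistics/tx_packets matches netstat's TX-OK
    with open("/sys/class/net/%s/statistics/tx_packets" % iface) as f:
        return int(f.read().strip())

while True:
    count = tx_packets(IFACE)
    if count >= THRESHOLD:
        sys.stderr.write("%s: TX-OK=%d is close to the problem range; "
                         "consider an ifdown/ifup in a maintenance window "
                         "to reset the counters\n" % (IFACE, count))
        break
    time.sleep(CHECK_INTERVAL)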
On 8 September 2016 at 04:32, <> wrote:
> Wed, Sep 07, 2016 at 01:52:22PM CEST, alpha.roc(a)gmail.com wrote:
> >> Thu, Nov 06, 2014 at 05:44:37PM CET, ingo.brand(a)webbilling.de wrote:
> >> I'll think about it overnight and provide you with a test script
> >> tomorrow.
> >Hi Jiri,
> > Just wondering if there are any updates on this issue? Currently I also have
> > a machine with exactly the same issue - both underlying eth interfaces have
> > NO TX drops, but the teamed interface is seeing around 25% TX drops.
> > It's an HP server running RHEL 7.2. I'm happy to provide more info if you
> > are still interested in this issue.
> You should contact RH support.
> But yeah, please provide more info.
> > Liang
Thanks for the information. I had a 2-node cluster running RHEL 7.0 with a
teamd round-robin interconnect. The cluster broke several times due to huge
packet loss; I didn't know what had happened, and it was a mystery to me.
It never happened again after I upgraded to 7.1 and changed the teaming mode
from round-robin to failover, but I was worried it was some kind of hardware
issue that would come back.
Now I understand what happened. Thanks again for your testing!