Hi Jiri,
Thanks for your quick reply. After days of investigation, I finally managed to reproduce the issue in a repeatable manner.
It seems that once the TX-OK count exceeds ~2,157,000,000 on the teamed interface, TX-DRP starts occurring on that interface. Resetting the teamed interface (ifdown/ifup, which sets TX-OK back to 0) seems to resolve the issue temporarily. I've been able to recreate the same issue multiple times on different machines (all RHEL 7.2, libteam 1.17). On every machine, TX-DRP is observed only on the teamed interface; there are no TX-DRP on the underlying interfaces.
Please see the following `netstat -i` results I collected from different machines:
Iface      MTU   RX-OK       ERR DRP OVR TX-OK       ERR DRP   OVR Flg
team0@ip1  1500  2132803208  0   0   0   2159140094  0   67630 0   BMRU
team0@ip2  1500  2143255069  0   0   0   2157767058  0   49719 0   BMRU
team0@ip3  1500  2131552843  0   0   0   2157853127  0   1754  0   BMRU
team0@ip4  1500  2137758098  0   0   0   2158602342  0   1027  0   BMRU
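In case it helps anyone reproduce or watch this, below is a minimal polling sketch (the interface names are placeholders for my setup, and it assumes a team device plus two member ports). It only reads the kernel counters under /sys/class/net/<iface>/statistics/, which are the same values netstat -i reports as TX-OK and TX-DRP:

#!/usr/bin/env python
# Poll the TX packet and TX drop counters for the team device and its member
# ports, so the onset of TX-DRP can be correlated with the TX-OK value.
# Interface names below are placeholders; adjust them to the actual setup.
import time

IFACES = ["team0", "eno1", "eno2"]  # team device plus member ports (assumed names)
STATS = "/sys/class/net/{}/statistics/{}"

def read_counter(iface, name):
    # tx_packets / tx_dropped map to the TX-OK / TX-DRP columns of netstat -i.
    with open(STATS.format(iface, name)) as f:
        return int(f.read().strip())

while True:
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for iface in IFACES:
        print("{} {}: tx_packets={} tx_dropped={}".format(
            stamp, iface,
            read_counter(iface, "tx_packets"),
            read_counter(iface, "tx_dropped")))
    time.sleep(60)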
Regards, Liang
On 8 September 2016 at 04:32, <> wrote:
Wed, Sep 07, 2016 at 01:52:22PM CEST, alpha.roc@gmail.com wrote:
Thu, Nov 06, 2014 at 05:44:37PM CET, ingo.brand@webbilling.de wrote:
I'll think about it overnight and provide you with a test script tomorrow.
Hi Jiri,
Just wondering if there is any update on this issue? Currently I also have
a machine with exactly the same issue: both underlying eth interfaces have NO Tx drops, but the teamed interface is showing around 25% Tx drops.
It's an HP server running RHEL 7.2. I'm happy to provide more info if you
are still interested in this issue.
You should contact RH support.
But yeah, please provide more info.
Thanks, Liang
2016-09-15 22:13 GMT+08:00 Liang Zhao alpha.roc@gmail.com:
> Thanks for your quick reply. After days of investigation, I finally managed to reproduce the issue in a repeatable manner.
> [...]
Hi, thanks for the information. I had a 2-node cluster running RHEL 7.0 with a teamd round-robin interconnect. The cluster broke several times due to huge packet loss; I didn't know what had happened, and it was a mystery to me.
It never happened again after I upgraded to 7.1 and changed the runner from round-robin to failover, but I was worried it was some kind of hardware issue that would come back.
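For anyone else wanting to make the same change, it boils down to switching the teamd runner in the team's JSON config. A rough before/after sketch (the device and port names are placeholders for my setup; "roundrobin" and "activebackup" are the teamd runner names, activebackup being the failover mode):

Before (round-robin):

{
    "device": "team0",
    "runner": { "name": "roundrobin" },
    "ports": { "eno1": {}, "eno2": {} }
}

After (failover):

{
    "device": "team0",
    "runner": { "name": "activebackup" },
    "link_watch": { "name": "ethtool" },
    "ports": { "eno1": {}, "eno2": {} }
}

teamd has to be restarted (or the connection re-activated) for the runner change to take effect.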
Now I understand what happened. Thanks again for your testing!!
Regards, tbskyd