Sat, Dec 15, 2018 at 11:45:25AM CET, yanmiaobest@gmail.com wrote:
Hi Jiri
This is Miao. I am currently working on a project to implement a new team policy based on the kernel team support. Basically, I have lots of 'ports', each of which has an ID. I'd like to use the ID to calculate a hash and distribute the traffic across the team slave ports, similar to round-robin.
And when looking into team code, I see the team kernel driver has the following comments:
/*
 * Enable/disable port by adding to enabled port hashlist and setting
 * port->index (Might be racy so reader could see incorrect ifindex when
 * processing a flying packet, but that is not a problem). Write guarded
 * by team->lock.
 */
The team datapath runs lockless, and libteam can enable/disable ports on the fly while traffic is flowing. For example, I have 4 ports and I want port0->slave0, port1->slave1, port2->slave2 and port3->slave3. When the slave1 device is in link-down state, the mapping becomes port0->slave0, port1->slave2, port2->slave3, port3->slave0, using ID%num as the hash function.
But it seems that if the datapath runs lockless, it could see the wrong ifindex during configuration, as stated in the comment, because the team driver needs to reconstruct the active ports list. So my question is: why is it not a problem when the datapath sees a wrong ifindex? Wouldn't that break the defined team policy? Thank you very much for your help.
Yes, it would. For a packet from time to time. Is it a problem?
Regards, Miao
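The ID%num mapping described in the question can be sketched in a few lines of userspace Python. This only illustrates the arithmetic from the example above; it is not the kernel code, and the function and variable names are made up for this sketch.

```python
def map_ports_to_slaves(port_ids, enabled_slaves):
    """Map each port ID to a slave using ID % (number of enabled slaves)."""
    if not enabled_slaves:
        raise ValueError("no enabled slaves to transmit on")
    num = len(enabled_slaves)
    return {pid: enabled_slaves[pid % num] for pid in port_ids}


port_ids = [0, 1, 2, 3]

# All four slaves enabled: port N maps to slave N.
all_up = map_ports_to_slaves(port_ids, ["slave0", "slave1", "slave2", "slave3"])

# slave1 is link down, so the enabled list shrinks to three entries
# and the modulus changes from 4 to 3.
slave1_down = map_ports_to_slaves(port_ids, ["slave0", "slave2", "slave3"])

print(all_up)
print(slave1_down)
```

While the enabled list is being rebuilt, a packet already in flight may still be steered using the old list, which is the occasional mis-steered packet discussed in this thread.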
On Tue, 15 Jan 2019 at 05:47, Jiri Pirko jiri@resnulli.us wrote:
Yes, it would. For a packet from time to time. Is it a problem?
Hi Jiri,
Recently one of our customers found that packets were sent before a port was enabled with LACP. Do you think this is related to the lockless datapath (although I doubt it)?
Here is the debug message, you can see eth0 up at 10:17:49, enabled at 10:17:51
Jan 8 10:16:54 localhost teamd: eth1: lacp info state: 0x3D.
Jan 8 10:17:20 localhost teamd: eth0: lacp info state: 0x3D.
Jan 8 10:17:24 localhost teamd: eth1: lacp info state: 0x3D.
Jan 8 10:17:27 localhost teamd: eth0: Setting periodic timer to "fast".
Jan 8 10:17:27 localhost teamd: eth0: Disabling port
Jan 8 10:17:27 localhost teamd: eth0: Changed port state: "current" -> "expired"
Jan 8 10:17:27 localhost teamd: eth0: lacp info state: 0x8D.
Jan 8 10:17:27 localhost teamd: eth0: lacp info state: 0x8D.
Jan 8 10:17:27 localhost teamd: <port_list>
Jan 8 10:17:27 localhost teamd: 3: eth1: up 10000Mbit FD
Jan 8 10:17:27 localhost teamd: *2: eth0: down 0Mbit HD
Jan 8 10:17:27 localhost teamd: </port_list>
Jan 8 10:17:27 localhost teamd: eth0: Setting periodic timer to "slow".
Jan 8 10:17:27 localhost teamd: eth0: Changed port state: "expired" -> "disabled"
Jan 8 10:17:27 localhost teamd: eth0: Unselecting LACP port
Jan 8 10:17:27 localhost teamd: eth0: LACP port unselected from aggregator 3
Jan 8 10:17:27 localhost teamd: eth0: lacp info state: 0x05.
Jan 8 10:17:27 localhost teamd: eth0: ethtool-link went down.
Jan 8 10:17:49 localhost teamd: <port_list>
Jan 8 10:17:49 localhost teamd: 3: eth1: up 10000Mbit FD
Jan 8 10:17:49 localhost teamd: *2: eth0: up 10000Mbit FD
Jan 8 10:17:49 localhost teamd: </port_list>
Jan 8 10:17:49 localhost teamd: eth0: Setting periodic timer to "fast".
Jan 8 10:17:49 localhost teamd: eth0: Changed port state: "disabled" -> "expired"
Jan 8 10:17:49 localhost teamd: eth0: lacp info state: 0x85.
Jan 8 10:17:49 localhost teamd: eth0: lacp info state: 0x85.
Jan 8 10:17:50 localhost teamd: eth0: lacp info state: 0x85.
Jan 8 10:17:51 localhost teamd: eth0: lacp info state: 0x85.
Jan 8 10:17:51 localhost teamd: eth0: Changed port state: "expired" -> "current"
Jan 8 10:17:51 localhost teamd: eth0: Selecting LACP port
Jan 8 10:17:51 localhost teamd: eth0: LACP port selected into aggregator 3
Jan 8 10:17:51 localhost teamd: eth0: Enabling port
Jan 8 10:17:51 localhost teamd: eth0: lacp info state: 0x3D.
Jan 8 10:17:52 localhost teamd: eth0: ethtool-link went up.
Jan 8 10:17:52 localhost teamd: eth0: lacp info state: 0x3D.
Jan 8 10:17:52 localhost teamd: eth0: Setting periodic timer to "slow".
Jan 8 10:17:52 localhost teamd: eth0: lacp info state: 0x3D.
Jan 8 10:17:54 localhost teamd: eth1: lacp info state: 0x3D.
Jan 8 10:18:22 localhost teamd: eth0: lacp info state: 0x3D.
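For readers following the "lacp info state" values in the log, here is a small Python sketch that decodes them. It assumes the logged byte uses the standard LACP state octet layout from IEEE 802.1AX; the helper name is hypothetical and not part of teamd.

```python
# Bit positions of the LACP state octet as defined by IEEE 802.1AX,
# listed from least significant to most significant bit.
LACP_STATE_BITS = [
    (0x01, "Activity"),
    (0x02, "Timeout"),
    (0x04, "Aggregation"),
    (0x08, "Synchronization"),
    (0x10, "Collecting"),
    (0x20, "Distributing"),
    (0x40, "Defaulted"),
    (0x80, "Expired"),
]


def decode_lacp_state(state):
    """Return the names of the flags set in an LACP state octet."""
    return [name for mask, name in LACP_STATE_BITS if state & mask]


# The four distinct values that appear in the teamd log.
for value in (0x3D, 0x8D, 0x85, 0x05):
    print(f"0x{value:02X}: {', '.join(decode_lacp_state(value))}")
```

Under that assumption, 0x85 (seen from 10:17:49 to 10:17:51) has Expired set and neither Collecting nor Distributing, while 0x3D (from 10:17:51 on) has Synchronization, Collecting and Distributing set.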
In the pcap file, however, eth0 started sending echo replies immediately after link up, before it was enabled (10:17:50 - 10:17:51 in the pcap file), which is weird.
Note: the libteam that the customer used has my lacp patch applied (posted to the libteam list), which enables the port only once the partner enters the Sync state.
10:17:25.221928 172.25.200.130 → 172.25.200.133 ICMP 102 Echo (ping) request id=0x79c4, seq=11/2816, ttl=64
10:17:25.221944 172.25.200.133 → 172.25.200.130 ICMP 98 Echo (ping) reply id=0x79c4, seq=11/2816, ttl=64 (request in 27)
10:17:26.222944 172.25.200.130 → 172.25.200.133 ICMP 102 Echo (ping) request id=0x79c4, seq=12/3072, ttl=64
10:17:26.222961 172.25.200.133 → 172.25.200.130 ICMP 98 Echo (ping) reply id=0x79c4, seq=12/3072, ttl=64 (request in 29)
10:17:27.143002 50:2f:a8:1b:4c:7c → 01:80:c2:00:00:02 LACP 128 v1 ACTOR 00:23:04:ee:be:68 P: 16644 K: 32773 *****GSA PARTNER cc:16:7e:91:0b:6c P: 2 K: 0 **DCSG*A
10:17:27.143783 cc:16:7e:91:0b:6c → 01:80:c2:00:00:02 LACP 124 v1 ACTOR cc:16:7e:91:0b:6c P: 2 K: 0 E***SG*A PARTNER 00:23:04:ee:be:68 P: 16644 K: 32773 *****GSA
10:17:27.226957 172.25.200.133 → 172.25.200.130 ICMP 98 Echo (ping) reply id=0x79c4, seq=13/3328, ttl=64
10:17:27.262630 172.25.200.133 → 172.25.200.178 ICMP 42 Echo (ping) reply id=0x3cb0, seq=0/0, ttl=64
10:17:49.766487 cc:16:7e:91:0b:6c → 01:80:c2:00:00:02 LACP 124 v1 ACTOR cc:16:7e:91:0b:6c P: 2 K: 0 E****G*A PARTNER 00:00:00:00:00:00 P: 0 K: 0 ******S*
10:17:49.766828 50:2f:a8:1b:4c:7c → 01:80:c2:00:00:02 LACP 128 v1 ACTOR 00:23:04:ee:be:68 P: 16644 K: 32773 *****GSA PARTNER cc:16:7e:91:0b:6c P: 2 K: 0 E****G*A
10:17:50.263981 172.25.200.133 → 172.25.200.130 ICMP 98 Echo (ping) reply id=0x79c4, seq=36/9216, ttl=64
10:17:50.766452 cc:16:7e:91:0b:6c → 01:80:c2:00:00:02 LACP 124 v1 ACTOR cc:16:7e:91:0b:6c P: 2 K: 0 E****G*A PARTNER 00:23:04:ee:be:68 P: 16644 K: 32773 *****GSA
10:17:51.265952 172.25.200.133 → 172.25.200.130 ICMP 98 Echo (ping) reply id=0x79c4, seq=37/9472, ttl=64
10:17:51.766699 cc:16:7e:91:0b:6c → 01:80:c2:00:00:02 LACP 124 v1 ACTOR cc:16:7e:91:0b:6c P: 2 K: 0 E****G*A PARTNER 00:23:04:ee:be:68 P: 16644 K: 32773 *****GSA
10:17:51.767298 50:2f:a8:1b:4c:7c → 01:80:c2:00:00:02 LACP 128 v1 ACTOR 00:23:04:ee:be:68 P: 16644 K: 32773 ****SGSA PARTNER cc:16:7e:91:0b:6c P: 2 K: 0 E****G*A
10:17:52.265954 172.25.200.133 → 172.25.200.130 ICMP 98 Echo (ping) reply id=0x79c4, seq=38/9728, ttl=64
10:17:52.766459 cc:16:7e:91:0b:6c → 01:80:c2:00:00:02 LACP 124 v1 ACTOR cc:16:7e:91:0b:6c P: 2 K: 0 **DCSG*A PARTNER 00:23:04:ee:be:68 P: 16644 K: 32773 ****SGSA
10:17:52.814804 50:2f:a8:1b:4c:7c → 01:80:c2:00:00:02 LACP 128 v1 ACTOR 00:23:04:ee:be:68 P: 16644 K: 32773 ***CSGSA PARTNER cc:16:7e:91:0b:6c P: 2 K: 0 **DCSG*A
10:17:52.822757 50:2f:a8:1b:4c:7c → 01:80:c2:00:00:02 LACP 128 v1 ACTOR 00:23:04:ee:be:68 P: 16644 K: 32773 **DCSG*A PARTNER cc:16:7e:91:0b:6c P: 2 K: 0 **DCSG*A
10:17:52.822814 cc:16:7e:91:0b:6c → 01:80:c2:00:00:02 LACP 124 v1 ACTOR cc:16:7e:91:0b:6c P: 2 K: 0 **DCSG*A PARTNER 00:23:04:ee:be:68 P: 16644 K: 32773 **DCSG*A
10:17:53.265957 172.25.200.133 → 172.25.200.130 ICMP 98 Echo (ping) reply id=0x79c4, seq=39/9984, ttl=64
10:17:54.265952 172.25.200.130 → 172.25.200.133 ICMP 102 Echo (ping) request id=0x79c4, seq=40/10240, ttl=64
10:17:54.265968 172.25.200.133 → 172.25.200.130 ICMP 98 Echo (ping) reply id=0x79c4, seq=40/10240, ttl=64 (request in 48)
10:17:55.270934 172.25.200.130 → 172.25.200.133 ICMP 102 Echo (ping) request id=0x79c4, seq=41/10496, ttl=64
10:17:55.270950 172.25.200.133 → 172.25.200.130 ICMP 98 Echo (ping) reply id=0x79c4, seq=41/10496, ttl=64 (request in 50)
The only possibility we came up with is that it may be caused by team_queue_override_transmit(), so I gave the customer a patch:
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -811,8 +811,11 @@ static bool team_queue_override_transmit(struct team *team, struct sk_buff *skb)
 		return false;
 	qom_list = __team_get_qom_list(team, skb->queue_mapping);
 	list_for_each_entry_rcu(port, qom_list, qom_list) {
-		if (!team_dev_queue_xmit(team, port, skb))
+		if (team_port_enabled(port) &&
+		    !team_dev_queue_xmit(team, port, skb))
 			return true;
 	}
 	return false;
 }
But I don't have much confidence in it. Do you have any other ideas?
Thanks Hangbin
On Fri, 1 Feb 2019 at 21:44, Hangbin Liu liuhangbin@gmail.com wrote:
Just confirmed that the issue is not caused by this code. I will give an update after I get more info. Sorry to bother you.
Thanks Hangbin
Update: This issue was caused by teaming PFs and VFs.
The customer has a topology like this:
eth0 (PF link that's running LACP) --> vf0 (VF that uses loadbalance mode)
eth1 (physical link that's running LACP) --> vf1 (VF that uses loadbalance mode)
After eth0/vf0 go down and come back up, eth0 enters LACP negotiation and is not active yet, but vf0 is up immediately. vf0 therefore becomes active (because it uses loadbalance mode) and starts sending packets. In the end, these packets are sent out through eth0, since eth0 is vf0's PF device.
To keep vf0 from becoming active before eth0 finishes LACP negotiation, the customer uses 'teamnl -p eth0 team0 getoption enabled' to detect eth0's state and sets up vf0 manually. This method works, but I'm not sure whether there is a more graceful or automatic way.
Thanks Hangbin
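The manual workaround described above could be scripted roughly as follows. This is only a sketch in Python: the teamnl command is the one quoted in this message, but the function names are hypothetical, and it assumes teamnl prints "true" or "false" for the boolean "enabled" option.

```python
import subprocess
import time


def port_enabled(team, port, run=subprocess.run):
    """Return True if teamnl reports the port's 'enabled' option as true."""
    # Same command line as quoted in the thread.
    cmd = ["teamnl", "-p", port, team, "getoption", "enabled"]
    result = run(cmd, capture_output=True, text=True, check=True)
    # Assumption: teamnl prints "true" or "false" for boolean options.
    return result.stdout.strip() == "true"


def wait_for_port(team, port, timeout=30.0, interval=0.5,
                  run=subprocess.run, sleep=time.sleep, clock=time.monotonic):
    """Poll until the port is enabled; return False if the timeout expires."""
    deadline = clock() + timeout
    while True:
        if port_enabled(team, port, run=run):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would run wait_for_port("team0", "eth0") and only set up vf0 once it returns True. How vf0 is set up is not spelled out in this thread, so that step is left to the caller.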
libteam@lists.fedorahosted.org