Re: [PATCH] teamd: Disregard current state when considering port enablement
by Jiri Pirko
Wed, Nov 13, 2019 at 02:26:47PM CET, petrm(a)mellanox.com wrote:
>On systems where carrier is gained very quickly, there is a race between
>teamd and the kernel that sometimes leads to all team slaves being stuck in
>enabled=false state.
>
>When a port is enslaved to a team device, the kernel sends a netlink
>message marking the port as enabled. teamd's lb_event_watch_port_added()
>calls team_set_port_enabled(false), because link is down at that point. The
>kernel responds with a message marking the port as disabled. At this point,
>there are two outstanding messages: the initial one marking port as
>enabled, and the second one marking it as disabled. teamd has not processed
>either of these.
>
>Next teamd gets the netlink message that sets enabled=true, and updates its
>internal cache accordingly. If at this point ethtool link-watch wakes up,
>teamd considers (in teamd_port_check_enable()) enabling the port. After
>consulting the cache, it concludes the port is already up, and neglects to
>do so. Only then does teamd get the netlink message informing it of setting
>enabled=false.
>
>The problem is that the teamd cache is not synchronous with respect to the
>kernel state. If the carrier takes a while to come up (as is normally the
>case), this is not a problem, because teamd catches up quickly enough. But
>this may not always be the case, and particularly on a simulated system,
>the carrier is gained almost immediately.
>
>Fix this by not suppressing the enablement message.
>
>Signed-off-by: Petr Machata <petrm(a)mellanox.com>
applied. Thanks!
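
For readers following the thread, the ordering described in the commit message can be replayed with a small stand-alone model. This is only a sketch under stated assumptions: struct sim, kernel_set_enabled(), daemon_process_one(), daemon_check_enable() and replay_race() are hypothetical names invented for illustration, not teamd or libteam functions, and the netlink traffic is reduced to a FIFO of booleans. The trust_cache flag stands in for the old behaviour of consulting the cache before enabling the port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not teamd code: the kernel's real port state plus a
 * FIFO of notifications that the daemon has not processed yet. */
struct sim {
	bool kernel_enabled;	/* authoritative state held by the kernel */
	bool cache_enabled;	/* daemon's cached copy, fed from the queue */
	bool queue[8];		/* pending "enabled=<value>" notifications */
	size_t head, tail;
};

/* Kernel side: apply a state change and queue a notification for it. */
static void kernel_set_enabled(struct sim *s, bool enabled)
{
	assert(s->tail < sizeof(s->queue) / sizeof(s->queue[0]));
	s->kernel_enabled = enabled;
	s->queue[s->tail++] = enabled;
}

/* Daemon side: process one pending notification, if there is one. */
static void daemon_process_one(struct sim *s)
{
	if (s->head < s->tail)
		s->cache_enabled = s->queue[s->head++];
}

/* Daemon side: link-watch reports carrier, so consider enabling the port. */
static void daemon_check_enable(struct sim *s, bool trust_cache)
{
	if (trust_cache && s->cache_enabled)
		return;	/* cache claims the port is enabled: request suppressed */
	kernel_set_enabled(s, true);
}

/* Replay the sequence from the commit message; return the final kernel state. */
static bool replay_race(bool trust_cache)
{
	struct sim s = { 0 };

	kernel_set_enabled(&s, true);	/* port enslaved: kernel enables it */
	kernel_set_enabled(&s, false);	/* daemon disables it, link is down */
	daemon_process_one(&s);		/* daemon has only seen enabled=true */
	daemon_check_enable(&s, trust_cache);	/* carrier comes up */
	while (s.head < s.tail)		/* remaining notifications arrive late */
		daemon_process_one(&s);
	return s.kernel_enabled;
}
```

In this model replay_race(true) leaves the port disabled, because the stale cache suppresses the enable request, while replay_race(false) leaves it enabled, which is the effect of not suppressing the enablement message.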
[libteam PATCH] teamd/lacp: silence ignore none LACP frames
by Hangbin Liu
According to IEEE 802.3, Annex 43B, section 4, the Slow Protocols link
type is also used by protocols other than LACP, such as 0x02 for LAMP
and 0x03 for OAM. So only validate LACP frames, and silently ignore
frames that belong to non-LACP protocols.
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
teamd/teamd_runner_lacp.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/teamd/teamd_runner_lacp.c b/teamd/teamd_runner_lacp.c
index 11d02f1..9437f05 100644
--- a/teamd/teamd_runner_lacp.c
+++ b/teamd/teamd_runner_lacp.c
@@ -96,17 +96,27 @@ static void lacpdu_init(struct lacpdu *lacpdu)
 static bool lacpdu_check(struct lacpdu *lacpdu)
 {
+	/*
+	 * According to Annex 43B, section 4, aside from LACP, the Slow
+	 * Protocol linktype is also to be used by other protocols, like
+	 * 0x02 for LAMP, 0x03 for OAM. So for none LACP protocols, just
+	 * silence ignore.
+	 */
+	if (lacpdu->subtype != 0x01)
+		return false;
+
 	/*
 	 * According to 43.4.12 version_number, tlv_type and reserved fields
 	 * should not be checked.
 	 */
-	if (lacpdu->subtype != 0x01 ||
-	    lacpdu->actor_info_len != 0x14 ||
+	if (lacpdu->actor_info_len != 0x14 ||
 	    lacpdu->partner_info_len != 0x14 ||
 	    lacpdu->collector_info_len != 0x10 ||
-	    lacpdu->terminator_info_len != 0x00)
+	    lacpdu->terminator_info_len != 0x00) {
+		teamd_log_warn("malformed LACP PDU came.");
 		return false;
+	}
 	return true;
 }
@@ -1088,10 +1098,8 @@ static int lacpdu_recv(struct lacp_port *lacp_port)
 	if (!teamd_port_present(lacp_port->ctx, lacp_port->tdport))
 		return 0;
-	if (!lacpdu_check(&lacpdu)) {
-		teamd_log_warn("malformed LACP PDU came.");
+	if (!lacpdu_check(&lacpdu))
 		return 0;
-	}
 	/* Check if we have correct info about the other side */
 	if (memcmp(&lacpdu.actor, &lacp_port->partner,
--
2.25.4
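
The three outcomes the patch distinguishes (frame is not LACP, frame is LACP but malformed, frame is valid) can be sketched in isolation as follows. This is an illustrative stand-alone version, not the code in teamd_runner_lacp.c: struct lacpdu_lens, enum pdu_verdict and classify_pdu() are hypothetical names, and the struct carries only the fields the check reads. The subtype and length constants are the ones used in the diff above.

```c
#include <stdint.h>

/* Hypothetical reduced view of a PDU: only the fields the check reads.
 * The real struct lacpdu has more members. */
struct lacpdu_lens {
	uint8_t subtype;
	uint8_t actor_info_len;
	uint8_t partner_info_len;
	uint8_t collector_info_len;
	uint8_t terminator_info_len;
};

/* Hypothetical verdicts; the patch itself returns a bool and logs inline. */
enum pdu_verdict {
	PDU_NOT_LACP,	/* other Slow Protocols subtype: drop without a warning */
	PDU_MALFORMED,	/* LACP subtype but wrong TLV lengths: worth a warning */
	PDU_OK,
};

static enum pdu_verdict classify_pdu(const struct lacpdu_lens *p)
{
	/* Subtype 0x01 is LACP; anything else is not ours to validate. */
	if (p->subtype != 0x01)
		return PDU_NOT_LACP;

	/* Length checks apply only once the frame is known to be LACP. */
	if (p->actor_info_len != 0x14 ||
	    p->partner_info_len != 0x14 ||
	    p->collector_info_len != 0x10 ||
	    p->terminator_info_len != 0x00)
		return PDU_MALFORMED;

	return PDU_OK;
}
```

A caller would log only on PDU_MALFORMED, which mirrors moving teamd_log_warn() from lacpdu_recv() into the length check so that frames with subtype 0x02 or 0x03 no longer trigger the warning.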