LiLiang found an issue that after adding two ports into a team device with lb mode their enabled option sometimes is false.
It was caused by the unexpected events order:
0. team_port_add() in kernel. 1. port_change event A1 sent to userspace. 2. option_change event B1 sent to userspace. 3. port_change event A2 sent to userspace IF port is up now. 4. process port_change event A1 and set port's enabled option 'false'. 5. option_change event B2 sent to userspace. 6. process option_change event B1 and sync enabled option (value = 1). 7. process port_change event A2 and do nothing as enabled option is 1. 8. process option_change event B2 and sync enabled option (value = 0).
In kernel, when the port is still down after dev_open(), which happens more often since it changed to use netif_oper_up() to check the state instead of netif_carrier_ok(), the event A2 in Step 3 can be sent at any moment. When it's ahead of Step 4, Step 7 won't set enabled option to 1 as Step 8 comes late.
As the port up event can be triggered by dev_watchdog at anytime in kernel, the port_change and option_change events order can not be controlled. What can only be done here is to correct it at Step 8, to sync enabled option to link_up.
So this patch is to add enabled option_changed for lb mode to do this sync.
Reported-by: LiLiang liali@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com --- teamd/teamd_runner_loadbalance.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/teamd/teamd_runner_loadbalance.c b/teamd/teamd_runner_loadbalance.c index b9bfc13..a581472 100644 --- a/teamd/teamd_runner_loadbalance.c +++ b/teamd/teamd_runner_loadbalance.c @@ -109,12 +109,27 @@ static int lb_event_watch_port_hwaddr_changed(struct teamd_context *ctx, return err; }
+static int lb_event_watch_enabled_option_changed(struct teamd_context *ctx, + struct team_option *option, + void *priv) +{ + struct teamd_port *tdport; + + tdport = teamd_get_port(ctx, team_get_option_port_ifindex(option)); + if (!tdport) + return 0; + + return lb_event_watch_port_link_changed(ctx, tdport, priv); +} + static const struct teamd_event_watch_ops lb_port_watch_ops = { .hwaddr_changed = lb_event_watch_hwaddr_changed, .port_hwaddr_changed = lb_event_watch_port_hwaddr_changed, .port_added = lb_event_watch_port_added, .port_removed = lb_event_watch_port_removed, .port_link_changed = lb_event_watch_port_link_changed, + .option_changed = lb_event_watch_enabled_option_changed, + .option_changed_match_name = "enabled", };
static int lb_init(struct teamd_context *ctx, void *priv)
Mon, Apr 15, 2019 at 10:56:35AM CEST, lucien.xin@gmail.com wrote:
LiLiang found an issue that after adding two ports into a team device with lb mode their enabled option sometimes is false.
It was caused by the unexpected events order:
- team_port_add() in kernel.
- port_change event A1 sent to userspace.
- option_change event B1 sent to userspace.
- port_change event A2 sent to userspace IF port is up now.
- process port_change event A1 and set port's enabled option 'false'.
- option_change event B2 sent to userspace.
- process option_change event B1 and sync enabled option (value = 1).
- process port_change event A2 and do nothing as enabled option is 1.
- process option_change event B2 and sync enabled option (value = 0).
In kernel, when the port is still down after dev_open(), which happens more often since it changed to use netif_oper_up() to check the state instead of netif_carrier_ok(), the event A2 in Step 3 can be sent at any moment. When it's ahead of Step 4, Step 7 won't set enabled option to 1 as Step 8 comes late.
As the port up event can be triggered by dev_watchdog at anytime in kernel, the port_change and option_change events order can not be controlled. What can only be done here is to correct it at Step 8, to sync enabled option to link_up.
So this patch is to add enabled option_changed for lb mode to do this sync.
Reported-by: LiLiang liali@redhat.com Signed-off-by: Xin Long lucien.xin@gmail.com
applied, thanks!
libteam@lists.fedorahosted.org