[ISSUE] Failed to find "enabled" option.
by Xin Long
I found this err in "loadbalance" and "lacp" runner when adding ports.
It's caused by trying to set "enabled" option in .port_link_changed()
or .port_changed().
When a new port is added, the first 'port changed event' process is
earlier than CMD TEAM_CMD_OPTIONS_GET, in this CMD, all
the options are synchronized from kernel.
It means there's no 'enabled' option yet when calling port_link_changed
in the first 'port changed event' process. In lb_event_watch_port_link_changed
and lacp_event_watch_port_changed, they call teamd_port_check_enable
to set 'enabled' option. this err is triggered.
I'm not sure why teamd_port_check_enable needs to check if
'enabled' option exists. I checked the ab's .port_link_changed(),
it just sets it by calling team_set_port_enabled(), instead of
checking 'enabled' option first.
can we just use team_set_port_enabled to set it directly in
.port(_link)_changed OR improve teamd_port_check_enable
to avoid this err ?
Thanks.
4 years, 2 months
[PATCH] teamd: lacp: update port state according to partner's sync bit
by Hangbin Liu
1) According to 6.4.12 of IEEE 802.1AX-2008, Figure 6-18, if the partner's sync
bit is not set, the lacp port state should be set to EXPIRED.
Function lacpdu_recv() makes the state of the lacp port to PORT_STATE_CURRENT
when the teamd receives any valid lacp packet. Fix it by checking the parter's
sync bit.
2) According to 6.4.15 of IEEE 802.1AX-2008, Figure 6-22, the state that the
port is selected moves MUX state from DETACHED to ATTACHED.
But ATTACHED sate does not mean that the port can send and receive user frames.
COLLECTING_DISTRIBUTION state is the sate that the port can send and receive
user frames. To move MUX state from ATTACHED to COLLECTING_DISTRIBUTION, the
partner state should be sync as well as the port selected.
In function lacp_port_actor_update(), only INFO_STATE_SYNCHRONIZATION should
be set to the actor.state when the port is selected. INFO_STATE_COLLECTING and
INFO_STATE_DISTRIBUTING should be set to false with ATTACHED mode and set to
true when INFO_STATE_SYNCHRONIZATION of partner.state is set.
In function lacp_port_should_be_{enabled, disabled}(), we also need to check
the INFO_STATE_SYNCHRONIZATION bit of partner.state.
Reported-by: Cisco Systems
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
teamd/teamd_runner_lacp.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/teamd/teamd_runner_lacp.c b/teamd/teamd_runner_lacp.c
index 7b8f0a7..5e763ad 100644
--- a/teamd/teamd_runner_lacp.c
+++ b/teamd/teamd_runner_lacp.c
@@ -333,7 +333,8 @@ static int lacp_port_should_be_enabled(struct lacp_port *lacp_port)
struct lacp *lacp = lacp_port->lacp;
if (lacp_port_selected(lacp_port) &&
- lacp_port->agg_lead == lacp->selected_agg_lead)
+ lacp_port->agg_lead == lacp->selected_agg_lead &&
+ lacp_port->partner.state & INFO_STATE_SYNCHRONIZATION)
return true;
return false;
}
@@ -343,7 +344,8 @@ static int lacp_port_should_be_disabled(struct lacp_port *lacp_port)
struct lacp *lacp = lacp_port->lacp;
if (!lacp_port_selected(lacp_port) ||
- lacp_port->agg_lead != lacp->selected_agg_lead)
+ lacp_port->agg_lead != lacp->selected_agg_lead ||
+ !(lacp_port->partner.state & INFO_STATE_SYNCHRONIZATION))
return true;
return false;
}
@@ -914,9 +916,13 @@ static void lacp_port_actor_update(struct lacp_port *lacp_port)
if (lacp_port->lacp->cfg.fast_rate)
state |= INFO_STATE_LACP_TIMEOUT;
if (lacp_port_selected(lacp_port) &&
- lacp_port_agg_selected(lacp_port))
- state |= INFO_STATE_SYNCHRONIZATION |
- INFO_STATE_COLLECTING | INFO_STATE_DISTRIBUTING;
+ lacp_port_agg_selected(lacp_port)) {
+ state |= INFO_STATE_SYNCHRONIZATION;
+ state &= ~(INFO_STATE_COLLECTING | INFO_STATE_DISTRIBUTING);
+ if (lacp_port->partner.state & INFO_STATE_SYNCHRONIZATION)
+ state |= INFO_STATE_COLLECTING |
+ INFO_STATE_DISTRIBUTING;
+ }
if (lacp_port->state == PORT_STATE_EXPIRED)
state |= INFO_STATE_EXPIRED;
if (lacp_port->state == PORT_STATE_DEFAULTED)
@@ -1095,7 +1101,10 @@ static int lacpdu_recv(struct lacp_port *lacp_port)
return err;
}
- err = lacp_port_set_state(lacp_port, PORT_STATE_CURRENT);
+ if (lacp_port->partner.state & INFO_STATE_SYNCHRONIZATION)
+ err = lacp_port_set_state(lacp_port, PORT_STATE_CURRENT);
+ else
+ err = lacp_port_set_state(lacp_port, PORT_STATE_EXPIRED);
if (err)
return err;
--
2.19.2
4 years, 9 months
Re: A question about team port enable/disable
by Jiri Pirko
Sat, Dec 15, 2018 at 11:45:25AM CET, yanmiaobest(a)gmail.com wrote:
>Hi Jiri
>
>This is Miao, I am currently working on a project to implement a new team policy
>based on the kernel team support. Basically, I have lots of 'ports',
>each of them
>has a ID. I'd like to use the ID to calculate the hash and distribute
>the traffic to
>different team slave ports, similar to round-robin.
>
>And when looking into team code, I see the team kernel driver has
>the following comments:
>
>/*
> * Enable/disable port by adding to enabled port hashlist and setting
> * port->index (Might be racy so reader could see incorrect ifindex when
> * processing a flying packet, but that is not a problem). Write guarded
> * by team->lock.
> */
>
>The team datapath runs lockless, and libteam can enable/disble port
>one the fly when traffic happens. For exmaple, I have 4 port and I
>want, port0->slave0,
>port1->slave1, port2->slave2 and port3->slave3, when slave1 device is link
>down state, it becomes port0->slave0, port1->slave2, port2->slave3,
>port3->slave0,
>using the ID%num as the hash function.
>
>But it seems if datapath runs lockless, it could see the wrong ifindex during
>configuration as stated in the comment because the team drvier needs to
>reconstruct the active ports list. So my question is why this is not a problem
>when datapathsees a wrong ifindex, wouldn't that break the defined team
>policy ? Thank you very much for your help.
Yes, it would. For a packet from time to time. Is it a problem?
>
>Regards,
>Miao
4 years, 9 months
[patch libteam] teamd: lw: arp_ping: only check arp reply message
by Hangbin Liu
Currently we check both arp request and reply message for arp_ping link
watch. But if we enabled validate_active and validate_inactive at the
same time, we will receive the other slave's arp request as the switch
broadcasts arp request message. i.e. slave1 receives arp request from
slave2 and vice versa.
Then the arp check will pass even the target is unreachable. Fix it by
only check arp reply message.
Reported-by: LiLiang <liali(a)redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
teamd/teamd_lw_arp_ping.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/teamd/teamd_lw_arp_ping.c b/teamd/teamd_lw_arp_ping.c
index 01cd6e1..81806f0 100644
--- a/teamd/teamd_lw_arp_ping.c
+++ b/teamd/teamd_lw_arp_ping.c
@@ -336,7 +336,8 @@ static int lw_ap_receive(struct lw_psr_port_priv *psr_ppriv)
if (ap.ah.ar_hrd != htons(ll_my.sll_hatype) ||
ap.ah.ar_pro != htons(ETH_P_IP) ||
ap.ah.ar_hln != ll_my.sll_halen ||
- ap.ah.ar_pln != 4) {
+ ap.ah.ar_pln != 4 ||
+ ap.ah.ar_op != htons(ARPOP_REPLY)) {
return 0;
}
--
2.19.2
4 years, 10 months
[patch libteam] teamd: config: update local prio to kernel
by Hangbin Liu
Team port's priority will affect the active port selection. Update the
local config is not enough. We also need to update kernel configs.
Reported-by: Liang Li <liali(a)redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
teamd/teamd_config.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/teamd/teamd_config.c b/teamd/teamd_config.c
index 94158ce..69b25de 100644
--- a/teamd/teamd_config.c
+++ b/teamd/teamd_config.c
@@ -155,6 +155,31 @@ errout:
return err;
}
+static int teamd_config_port_set(struct teamd_context *ctx, const char *port_name,
+ json_t *port_obj)
+{
+ struct teamd_port *tdport;
+ json_t *config;
+ int tmp, err;
+
+ tdport = teamd_get_port_by_ifname(ctx, port_name);
+ if (!tdport)
+ return -ENODEV;
+
+ config = json_object_get(port_obj, "prio");
+ if (json_is_integer(config)) {
+ tmp = json_integer_value(config);
+ err = team_set_port_priority(ctx->th, tdport->ifindex, tmp);
+ if (err) {
+ teamd_log_err("%s: Failed to set \"priority\".",
+ tdport->ifname);
+ return err;
+ }
+ }
+
+ return 0;
+}
+
int teamd_config_port_update(struct teamd_context *ctx, const char *port_name,
const char *json_port_cfg_str)
{
@@ -184,6 +209,13 @@ int teamd_config_port_update(struct teamd_context *ctx, const char *port_name,
if (err)
teamd_log_err("%s: Failed to update existing config "
"port object", port_name);
+ else {
+ err = teamd_config_port_set(ctx, port_name, port_new_obj);
+ if (err)
+ teamd_log_err("%s: Failed to update config to kernel",
+ port_name);
+ }
+
new_port_decref:
json_decref(port_new_obj);
return err;
--
2.19.2
4 years, 10 months
[patch libteam] teamnl: update help message
by Hangbin Liu
Update help message so people could know we support port name directly.
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
utils/teamnl.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/utils/teamnl.c b/utils/teamnl.c
index e8de7e2..c53345d 100644
--- a/utils/teamnl.c
+++ b/utils/teamnl.c
@@ -521,7 +521,9 @@ static void print_help(const char *argv0) {
printf(
"%s [options] teamdevname command [command args]\n"
- "\t-h --help Show this help\n",
+ "\t-h --help Show this help\n"
+ "\t-p --port_name team slave port name\n"
+ "\t-a --array_index team option array index\n",
argv0);
printf("Commands:\n");
for (i = 0; i < CMD_TYPE_COUNT; i++) {
--
2.19.2
4 years, 10 months
Re: [patch libteam] Pass device ID through to super class init.
by Jiri Pirko
Thu, Jan 10, 2019 at 12:37:15PM CET, Paul.D.Smith(a)metaswitch.com wrote:
>Jiri,
>
>Just found some time to look at this again and the bug has existed since release:
>
>042fb5c8818510f18974d63c569699d76d66f64c
>rewrite python wrapper in more object nice style ...
>Jiri Pirko
>Jiri Pirko committed on Nov 21, 2011
>
>In this release you rewrote the file along nice Python class lines and made Team() a subclass of TeamNetDevice() but you failed to pass the teamdev through to the TeamNetDevice superclass __init__() method so this code only works if teamdev is zero (the default).
>
>I don't know how nobody else has hit this (I would hope we're not the first people to actually use this in anger with multiple 'teamdev's!) but it just doesn't work.
>
>There is no way that I will have the time to set up tooling to create patches in the way that you asked for earlier but I have told you the bug, where it is and the fix. FWIW the fix has been running successfully in our code since about October 2018 and we have quite complex networking requirements so we're very confident that it is the correct fix.
>
>Hopefully you will find time to look at this and fix it because it seems silly that users will have to keep applying their own patches for every release if they actually want to use this code.
>
>Sorry I can't be more helpful but 'time is money' of course and this is all the help I can manage to squeeze in.
Are you kidding? Instead of writing lenthty emails, you could fix the
patch and resend. I really don't understand, sorry.
+ I just found that your patch is wrong. You pass "teamdev" which is
ifindex to super - TeamNetDevice constructor which accepts ifindex -
integer.
When you call "Team" constructor, you should not pass ifindex but
interface name. Otherwise you endup in else branch of:
ifindex = self._conv.get_ifindex(teamdev) if teamdev else 0
So basically your patch tries to workaround your own bug.
>
>Regards,
>Paul D Smith.
>
>
>-----Original Message-----
>From: Jiri Pirko <jiri(a)resnulli.us>
>Sent: 14 August 2018 09:49
>To: Paul D. Smith <Paul.D.Smith(a)metaswitch.com>
>Subject: Re: [patch libteam] Pass device ID through to super class init.
>
>Tue, Aug 14, 2018 at 10:29:24AM CEST, Paul.D.Smith(a)metaswitch.com wrote:
>>Jiri,
>>
>>I'm afraid I can't install git-email because this requires an update to my core Git install and my employers like everyone to use the same level of git. However here are some notes on the fix that I sent earlier.
>>
>>Without the patch, we were hitting this bug:
>>
>> APPLICATION BUG: route/link.c:1233:rtnl_link_build_get_request: ifindex or name must be specified
>> python: route/link.c:1233: rtnl_link_build_get_request: Assertion `0' failed.
>> Aborted (core dumped)
>>
>>Once we pass through the correct ifindex as a parameter, all works fine.
>
>I have no doubts that the patch helps
>
>
>>
>>Looking at history via Github, the last change to the code we patched was over 6 years ago in commit 'a7702de3c6b2dfb0e2fc951ab4c6dbe0f5d1a983' and that change looks in no way related to this problem.
>
>So it has to be another patch. Please find it.
>
>>
>>We do have some concern as to how this bug might have been missed for so long and wonder whether we might be somehow misusing the interface but at present we can't see how - it looks like this fix is required and always has been.
>>
>>I hope this gives you enough to go on - sorry we can't bundle this all up in your preferred format.
>
>
>Please repost the patch with the proper description (now it is empty).
>Please fill up "Fixes:"
>If you can do "git send email", at least mimic it (No headers in the email body)
>
>>
>>Regards,
>>Paul D.Smith.
>>
>>-----Original Message-----
>>From: Jiri Pirko <jiri(a)resnulli.us>
>>Sent: 08 August 2018 10:07
>>To: Paul D. Smith <Paul.D.Smith(a)metaswitch.com>
>>Cc: libteam(a)lists.fedorahosted.org
>>Subject: Re: [patch libteam] Pass device ID through to super class init.
>>
>>Wed, Aug 08, 2018 at 11:04:13AM CEST, Paul.D.Smith(a)metaswitch.com wrote:
>>>Jiri,
>>>
>>>I following the instructions as far as, and including, using the 'git format-patch' command but my version of Git does not have the 'git send-email' command so I just cut-n-pasted the patch into an e-mail by hand.
>>
>>Okay. That's the problem. "git send-email" is usually distributed in a separate package. In Fedora it is in "git-email" package.
>>
>>
>>>
>>>The motivation for the change is that we found that the code just didn't work; my understanding is that regardless of which team device we were attempting to program, the superclass init() always attempted to program the '0' team device.
>>>
>>>I'll drop you more info next week when my colleague, who hit this when developing our code and can probably give you a more detailed answer, returns from holiday.
>>
>>Okay. So it is a bugfix. Yeah, please provide this info in the patch description. Also, if you find the offending commit tha actually caused this issue, it is fine to add line similar to this:
>>Fixes: d5a1c8ee9e36 ("utils: add bond2team conversion tool")
>>
>>Please see "git log" and find for "Fixes" to see how is it used.
>>
>>Thanks!
>>
>>
>>>
>>>Regards,
>>>Paul DS.
>>>
>>>
>>>-----Original Message-----
>>>From: Jiri Pirko <jiri(a)resnulli.us>
>>>Sent: 08 August 2018 09:39
>>>To: Paul D. Smith <Paul.D.Smith(a)metaswitch.com>
>>>Cc: libteam(a)lists.fedorahosted.org
>>>Subject: Re: [patch libteam] Pass device ID through to super class init.
>>>
>>>Mon, Aug 06, 2018 at 02:46:30PM CEST, Paul.D.Smith(a)metaswitch.com wrote:
>>>>From 34bccf0221af3f063686d58704d1255eab5f15b9 Mon Sep 17 00:00:00
>>>>2001
>>>>From: "Paul D.Smith" <paul.d.smith(a)metaswitch.com>
>>>>Date: Mon, 6 Aug 2018 13:41:28 +0100
>>>>Subject: [patch libteam] Pass device ID through to super class init.
>>>
>>>These 4 lines does not supposed to be there. Odd. Did you use "git format-patch" and "git send-email"?
>>>
>>>Also, please provide a bit of description of the change. At least some motivation for the change.
>>>
>>>Thanks!
>>>
>>>
>>>>
>>>>Signed-off-by: Paul D.Smith <paul.d.smith(a)metaswitch.com>
>>>>---
>>>>binding/python/team/core.py | 2 +-
>>>>1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>>diff --git a/binding/python/team/core.py
>>>>b/binding/python/team/core.py index 54161bf..e784930 100644
>>>>--- a/binding/python/team/core.py
>>>>+++ b/binding/python/team/core.py
>>>>@@ -348,7 +348,7 @@ class Team(TeamNetDevice):
>>>> if not th:
>>>> raise TeamLibError("Failed to allocate team handle.")
>>>>
>>>>- super(Team, self).__init__(th)
>>>>+ super(Team, self).__init__(th, teamdev)
>>>>
>>>> if isinstance(teamdev, str):
>>>> err = 0
>>>>--
>>>>2.9.2
4 years, 10 months