Re: [PATCH] teamd: Disregard current state when considering port
enablement
by Jiri Pirko
Wed, Nov 13, 2019 at 02:26:47PM CET, petrm(a)mellanox.com wrote:
>On systems where carrier is gained very quickly, there is a race between
>teamd and the kernel that sometimes leads to all team slaves being stuck in
>enabled=false state.
>
>When a port is enslaved to a team device, the kernel sends a netlink
>message marking the port as enabled. teamd's lb_event_watch_port_added()
>calls team_set_port_enabled(false), because link is down at that point. The
>kernel responds with a message marking the port as disabled. At this point,
>there are two outstanding messages: the initial one marking port as
>enabled, and the second one marking it as disabled. teamd has not processed
>either of these.
>
>Next teamd gets the netlink message that sets enabled=true, and updates its
>internal cache accordingly. If at this point ethtool link-watch wakes up,
>teamd considers (in teamd_port_check_enable()) enabling the port. After
>consulting the cache, it concludes the port is already up, and neglects to
>do so. Only then does teamd get the netlink message informing it of setting
>enabled=false.
>
>The problem is that the teamd cache is not synchronous with respect to the
>kernel state. If the carrier takes a while to come up (as is normally the
>case), this is not a problem, because teamd catches up quickly enough. But
>this may not always be the case, and particularly on a simulated system,
>the carrier is gained almost immediately.
>
>Fix this by not suppressing the enablement message.
>
>Signed-off-by: Petr Machata <petrm(a)mellanox.com>
applied. Thanks!
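The sequence described in the commit message can be replayed with a small model. This is only a sketch, not teamd code: `struct port_model`, `kernel_set_enabled()`, `daemon_process_one()`, `daemon_link_up()` and `replay_race()` are hypothetical names, and the netlink channel is reduced to a simple in-order queue of booleans. It assumes the fix amounts to sending the enable request regardless of what the cache says.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not teamd code: the kernel's view of one port, the
 * daemon's cached view, and the notifications still in flight. */
struct port_model {
	bool kernel_enabled;   /* authoritative state in the kernel */
	bool cached_enabled;   /* daemon's possibly stale copy */
	bool pending[8];       /* undelivered notifications, oldest first */
	size_t head, tail;
};

/* Kernel side: apply the change and queue a notification for the daemon. */
static void kernel_set_enabled(struct port_model *p, bool enabled)
{
	p->kernel_enabled = enabled;
	p->pending[p->tail++] = enabled;
}

/* Daemon side: consume one queued notification and update the cache. */
static void daemon_process_one(struct port_model *p)
{
	if (p->head < p->tail)
		p->cached_enabled = p->pending[p->head++];
}

/* Daemon side: link-watch reports carrier. With trust_cache set, the
 * request is suppressed when the cache already says "enabled". */
static void daemon_link_up(struct port_model *p, bool trust_cache)
{
	if (trust_cache && p->cached_enabled)
		return;
	kernel_set_enabled(p, true);
}

/* Replay the sequence from the commit message; return the final kernel state. */
static bool replay_race(bool trust_cache)
{
	struct port_model p = { 0 };

	kernel_set_enabled(&p, true);    /* enslave: kernel marks port enabled */
	kernel_set_enabled(&p, false);   /* daemon disables it: link is down */
	daemon_process_one(&p);          /* cache now holds the stale "enabled" */
	daemon_link_up(&p, trust_cache); /* carrier arrives before 2nd message */
	while (p.head < p.tail)
		daemon_process_one(&p);  /* drain the remaining notifications */
	return p.kernel_enabled;
}
```

In this model `replay_race(true)` leaves the port disabled in the kernel, while `replay_race(false)` ends with it enabled, which matches the stuck enabled=false state described above.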
2 years, 8 months
[PATCH] Don't return an error when timerfd socket return 0
by Pavel Shirshov
A read() on a timerfd descriptor can return 0 bytes
even though poll() reported the descriptor as
readable.
This behaviour has been observed on some hardware
platforms.
Solve this by treating a 0-byte read as normal
instead of returning an error.
Signed-off-by: Pavel Shirshov <pavel.contrib(a)gmail.com>
---
teamd/teamd.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/teamd/teamd.c b/teamd/teamd.c
index e035ac5..8cdc16d 100644
--- a/teamd/teamd.c
+++ b/teamd/teamd.c
@@ -265,6 +265,10 @@ static int handle_period_fd(int fd)
 		teamd_log_err("read() failed.");
 		return -errno;
 	}
+	if (ret == 0) {
+		teamd_log_warn("read() for timer_fd returned 0.");
+		return 0;
+	}
 	if (ret != sizeof(uint64_t)) {
 		teamd_log_err("read() returned unexpected number of bytes.");
 		return -EINVAL;
--
2.7.4
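The read handling in the hunk above can be sketched as a standalone helper. `read_expirations()` and its return convention are hypothetical, not part of teamd, and the EINTR/EAGAIN branch is an assumption about how a caller might want to treat retryable errors. The helper works on any descriptor that delivers a 64-bit counter, so it can be exercised with a pipe as well as a timerfd.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper modelled on the hunk above: read one expiration
 * counter from a timerfd-style descriptor.
 * Returns 1 if a counter was read, 0 if there was nothing to read,
 * or a negative errno value on failure. */
static int read_expirations(int fd, uint64_t *count)
{
	ssize_t ret = read(fd, count, sizeof(*count));

	if (ret == -1) {
		/* Assumption: retryable errors are not failures. */
		if (errno == EINTR || errno == EAGAIN)
			return 0;
		return -errno;
	}
	if (ret == 0)
		return 0;        /* spurious wakeup: treat as normal */
	if (ret != (ssize_t)sizeof(*count))
		return -EINVAL;  /* short read: unexpected for a timerfd */
	return 1;
}
```

A caller would invoke this after poll() reports the descriptor as readable and simply continue its loop when 0 is returned.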
3 years, 2 months
[PATCH] Fix ifinfo_link_with_port race condition with newlink
by Shuotian Cheng
When a member port is enslaved into a port channel
immediately after the port channel is created, the
ifinfo structure for the member port may not be
initialized yet because of a race condition.
The race occurs because the order of the following
events is not guaranteed:
- adding the member port to the port channel;
- creating the ifinfo structure for the member port.
The error message "Failed to link port with ifinfo" is
reported when an attempt is made to add a member port
to the team handler's port list before its ifinfo
structure has been initialized.
To fix this, ifinfo_link_with_port() uses
ifinfo_find_create() to look up the member port's
ifinfo structure.
Signed-off-by: Shuotian Cheng <shuche(a)microsoft.com>
Signed-off-by: Pavel Shirshov <pavel.contrib(a)gmail.com>
---
libteam/ifinfo.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/libteam/ifinfo.c b/libteam/ifinfo.c
index 46d56a2..a15788b 100644
--- a/libteam/ifinfo.c
+++ b/libteam/ifinfo.c
@@ -453,7 +453,10 @@ int ifinfo_link_with_port(struct team_handle *th, uint32_t ifindex,
 {
 	struct team_ifinfo *ifinfo;
 
-	ifinfo = ifinfo_find(th, ifindex);
+	if (port)
+		ifinfo = ifinfo_find_create(th, ifindex);
+	else
+		ifinfo = ifinfo_find(th, ifindex);
 	if (!ifinfo)
 		return -ENOENT;
 	if (ifinfo->linked)
--
2.7.4
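The difference between a plain lookup and a find-or-create lookup can be sketched on a simple linked list. This is an illustration only: `struct ifinfo_node`, `node_find()` and `node_find_create()` are hypothetical stand-ins and do not reproduce libteam's internal types or the real `ifinfo_find()` / `ifinfo_find_create()` implementations.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-interface record kept in a list. */
struct ifinfo_node {
	uint32_t ifindex;
	int linked;
	struct ifinfo_node *next;
};

/* Plain lookup: returns NULL when no record for ifindex exists yet. */
static struct ifinfo_node *node_find(struct ifinfo_node *head, uint32_t ifindex)
{
	for (; head; head = head->next)
		if (head->ifindex == ifindex)
			return head;
	return NULL;
}

/* Find-or-create lookup: allocates an empty record when none exists, so
 * the caller no longer depends on which event arrived first. */
static struct ifinfo_node *node_find_create(struct ifinfo_node **head,
					    uint32_t ifindex)
{
	struct ifinfo_node *node = node_find(*head, ifindex);

	if (node)
		return node;
	node = calloc(1, sizeof(*node));
	if (!node)
		return NULL;
	node->ifindex = ifindex;
	node->next = *head;
	*head = node;
	return node;
}
```

With the plain lookup, a port event that arrives before the new-link event finds nothing and fails. With find-or-create, the record exists after either event, and the later event fills in the rest.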
3 years, 2 months
[PATCH] Fix ifinfo_link_with_port race condition with newlink
by Shuotian Cheng
From: Pavel Shirshov <pavel.contrib(a)gmail.com>
When a member port is enslaved into a port channel
immediately after the port channel is created, the
ifinfo structure for the member port may not be
initialized yet because of a race condition.
The race occurs because the order of the following
events is not guaranteed:
- adding the member port to the port channel;
- creating the ifinfo structure for the member port.
The error message "Failed to link port with ifinfo" is
reported when an attempt is made to add a member port
to the team handler's port list before its ifinfo
structure has been initialized.
To fix this, ifinfo_link_with_port() uses
ifinfo_find_create() to look up the member port's
ifinfo structure.
Signed-off-by: Shuotian Cheng <shuche(a)microsoft.com>
Signed-off-by: Pavel Shirshov <pavelsh(a)microsoft.com>
---
libteam/ifinfo.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/libteam/ifinfo.c b/libteam/ifinfo.c
index 46d56a2..a15788b 100644
--- a/libteam/ifinfo.c
+++ b/libteam/ifinfo.c
@@ -453,7 +453,10 @@ int ifinfo_link_with_port(struct team_handle *th, uint32_t ifindex,
 {
 	struct team_ifinfo *ifinfo;
 
-	ifinfo = ifinfo_find(th, ifindex);
+	if (port)
+		ifinfo = ifinfo_find_create(th, ifindex);
+	else
+		ifinfo = ifinfo_find(th, ifindex);
 	if (!ifinfo)
 		return -ENOENT;
 	if (ifinfo->linked)
--
2.7.4
3 years, 2 months
[PATCH] Fix ifinfo_link_with_port race condition with newlink
by Pavel Shirshov
The race condition can happen as follows.
When an interface is enslaved into the port channel
immediately after the port channel is created, the
order of creating the ifinfo and linking the ifinfo to
the port is not guaranteed.
The team handler listens both to netlink new-link
messages, on which it allocates the ifinfo and adds it
to its linked list, and to team port change messages,
on which it links the new port with the ifinfo found in
that list. However, when the ifinfo has not been
created yet, the error message "Failed to link port
with ifinfo" is reported and the member port is not
added to the team handler's port list.
This fix adds a check for whether
ifinfo_link_with_port() is linking the ifinfo to a port
or to the team interface itself. For a port,
ifinfo_find_create() is used, which avoids the race
condition.
Signed-off-by: Shu0T1an ChenG <shuche(a)microsoft.com>
---
libteam/ifinfo.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/libteam/ifinfo.c b/libteam/ifinfo.c
index 46d56a2..a15788b 100644
--- a/libteam/ifinfo.c
+++ b/libteam/ifinfo.c
@@ -453,7 +453,10 @@ int ifinfo_link_with_port(struct team_handle *th, uint32_t ifindex,
 {
 	struct team_ifinfo *ifinfo;
 
-	ifinfo = ifinfo_find(th, ifindex);
+	if (port)
+		ifinfo = ifinfo_find_create(th, ifindex);
+	else
+		ifinfo = ifinfo_find(th, ifindex);
 	if (!ifinfo)
 		return -ENOENT;
 	if (ifinfo->linked)
--
2.7.4
3 years, 2 months
[PATCH] Don't fire an error when timerfd socket read returned 0
by Pavel Shirshov
A read() on a timerfd descriptor can return 0 bytes
even though poll() reported the descriptor as ready.
We saw this behaviour on some hardware platforms.
This patch treats that situation as normal.
Signed-off-by: Pavel Shirshov <pavel.contrib(a)gmail.com>
---
teamd/teamd.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/teamd/teamd.c b/teamd/teamd.c
index e035ac5..8cdc16d 100644
--- a/teamd/teamd.c
+++ b/teamd/teamd.c
@@ -265,6 +265,10 @@ static int handle_period_fd(int fd)
 		teamd_log_err("read() failed.");
 		return -errno;
 	}
+	if (ret == 0) {
+		teamd_log_warn("read() for timer_fd returned 0.");
+		return 0;
+	}
 	if (ret != sizeof(uint64_t)) {
 		teamd_log_err("read() returned unexpected number of bytes.");
 		return -EINVAL;
--
2.7.4
3 years, 2 months
[patch libteam] teamd/lacp: fix incorrect aggregator selection
by kon5t
No need to `ntohs` values when comparing bytes.
Signed-off-by: kon5t <konst.ru(a)gmail.com>
---
teamd/teamd_runner_lacp.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/teamd/teamd_runner_lacp.c b/teamd/teamd_runner_lacp.c
index ec01237..709008e 100644
--- a/teamd/teamd_runner_lacp.c
+++ b/teamd/teamd_runner_lacp.c
@@ -406,10 +406,7 @@ static void get_lacp_port_prio_info(struct lacp_port *lacp_port,
 	*prio_info = lacp_port->partner;
 	/* adjust values for further memcmp comparison */
-	prio_info->system_priority = ntohs(prio_info->system_priority);
 	prio_info->key = 0;
-	prio_info->port_priority = ntohs(prio_info->port_priority);
-	prio_info->port = ntohs(prio_info->port);
 	prio_info->state = 0;
 }
--
2.7.4
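Why byte-wise comparison wants the values left in network byte order can be shown with a small sketch. `struct prio_key`, `make_key()`, `key_cmp()` and `key_cmp_host_order()` are hypothetical and much simpler than the real LACP structures. The point is that memcmp() on big-endian unsigned fields orders them the same way a numeric comparison would, whereas converting with ntohs() first makes the result depend on the host's byte order.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-field key kept in network byte order, as on the wire. */
struct prio_key {
	uint16_t system_priority;
	uint16_t port_priority;
};

/* Build a key from host-order values, storing them in network order. */
static struct prio_key make_key(uint16_t sys, uint16_t port)
{
	struct prio_key k;

	k.system_priority = htons(sys);
	k.port_priority = htons(port);
	return k;
}

/* Byte-wise comparison of network-order keys: the most significant byte
 * comes first, so the sign of memcmp() matches numeric ordering. */
static int key_cmp(const struct prio_key *a, const struct prio_key *b)
{
	return memcmp(a, b, sizeof(*a));
}

/* Same comparison after ntohs(): on a little-endian host the least
 * significant byte now comes first, so the ordering can be wrong. */
static int key_cmp_host_order(const struct prio_key *a,
			      const struct prio_key *b)
{
	struct prio_key ha = { ntohs(a->system_priority),
			       ntohs(a->port_priority) };
	struct prio_key hb = { ntohs(b->system_priority),
			       ntohs(b->port_priority) };

	return memcmp(&ha, &hb, sizeof(ha));
}
```

For keys built from priorities 1 and 256, `key_cmp()` reports the first as smaller on any host. `key_cmp_host_order()` reports the opposite on a little-endian machine, because the bytes 01 00 compare greater than 00 01.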
3 years, 2 months