Re: [PATCH] teamd: Disregard current state when considering port enablement
by Jiri Pirko
Wed, Nov 13, 2019 at 02:26:47PM CET, petrm(a)mellanox.com wrote:
>On systems where carrier is gained very quickly, there is a race between
>teamd and the kernel that sometimes leads to all team slaves being stuck in
>enabled=false state.
>
>When a port is enslaved to a team device, the kernel sends a netlink
>message marking the port as enabled. teamd's lb_event_watch_port_added()
>calls team_set_port_enabled(false), because link is down at that point. The
>kernel responds with a message marking the port as disabled. At this point,
>there are two outstanding messages: the initial one marking port as
>enabled, and the second one marking it as disabled. teamd has not processed
>either of these.
>
>Next teamd gets the netlink message that sets enabled=true, and updates its
>internal cache accordingly. If at this point ethtool link-watch wakes up,
>teamd considers (in teamd_port_check_enable()) enabling the port. After
>consulting the cache, it concludes the port is already up, and neglects to
>do so. Only then does teamd get the netlink message informing it of setting
>enabled=false.
>
>The problem is that the teamd cache is not synchronous with respect to the
>kernel state. If the carrier takes a while to come up (as is normally the
>case), this is not a problem, because teamd catches up quickly enough. But
>this may not always be the case, and particularly on a simulated system,
>the carrier is gained almost immediately.
>
>Fix this by not suppressing the enablement message.
>
>Signed-off-by: Petr Machata <petrm(a)mellanox.com>
applied. Thanks!
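The ordering problem described in the commit message can be replayed with a small model. This is only a sketch under stated assumptions: struct sim, kernel_set_enabled(), daemon_process_one(), daemon_link_up() and replay_race() are hypothetical names invented for illustration, and none of them are teamd or libteam functions. The model keeps an authoritative "kernel" state, a queue of unprocessed notifications, and a daemon-side cache that lags behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: none of these names come from teamd or libteam. */
#define QUEUE_MAX 8

struct sim {
	bool kernel_enabled;    /* authoritative state in the "kernel" */
	bool cache_enabled;     /* daemon's view, updated only when it reads a message */
	bool queue[QUEUE_MAX];  /* notifications sent but not yet processed */
	size_t head, tail;
};

/* The "kernel" applies the change and queues a notification. */
static void kernel_set_enabled(struct sim *s, bool enabled)
{
	assert(s->tail < QUEUE_MAX);
	s->kernel_enabled = enabled;
	s->queue[s->tail++] = enabled;
}

/* The daemon processes one pending notification, if there is one. */
static void daemon_process_one(struct sim *s)
{
	if (s->head < s->tail)
		s->cache_enabled = s->queue[s->head++];
}

/*
 * The link watch reports carrier. With trust_cache set, the request is
 * suppressed whenever the (possibly stale) cache already says "enabled".
 */
static void daemon_link_up(struct sim *s, bool trust_cache)
{
	if (trust_cache && s->cache_enabled)
		return;
	kernel_set_enabled(s, true);
}

/* Replays the sequence from the commit message; returns the final kernel state. */
static bool replay_race(bool trust_cache)
{
	struct sim s = { 0 };

	kernel_set_enabled(&s, true);     /* port enslaved: kernel enables it */
	kernel_set_enabled(&s, false);    /* daemon disables it, link is still down */
	daemon_process_one(&s);           /* daemon reads the stale "enabled" message */
	daemon_link_up(&s, trust_cache);  /* carrier arrives before the queue is drained */
	while (s.head < s.tail)           /* daemon catches up with the rest */
		daemon_process_one(&s);
	return s.kernel_enabled;
}
```

In this model replay_race(true) leaves the port disabled, which matches the stuck state described above, while replay_race(false) sends the enable request regardless of the cache and leaves the port enabled.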
3 years
Re: [patch libteam] poll instead of select
by Jiri Pirko
Tue, Jan 28, 2020 at 01:11:00AM CET, jerome99(a)internet.lu wrote:
>The select function cannot be used in an application that already has
>more than 1024 open files. select will crash if a file descriptor
>greater than or equal to 1024 (FD_SETSIZE) is monitored.
Okay, how can we come close to that?
>
>Signed-off-by: Jerome Freilinger <jerome99(a)internet.lu>
>---
> libteamdctl/cli_usock.c | 25 +++++++++----------------
> 1 file changed, 9 insertions(+), 16 deletions(-)
>
>diff --git a/libteamdctl/cli_usock.c b/libteamdctl/cli_usock.c
>index 0dc97ae..431b12d 100644
>--- a/libteamdctl/cli_usock.c
>+++ b/libteamdctl/cli_usock.c
>@@ -25,6 +25,7 @@
> #include <sys/socket.h>
> #include <unistd.h>
> #include <teamdctl.h>
>+#include <poll.h>
> #include "teamdctl_private.h"
> #include "../teamd/teamd_usock_common.h"
>
>@@ -79,26 +80,18 @@ static int cli_usock_send(int sock, char *msg)
> return 0;
> }
>
>-#define WAIT_SEC (TEAMDCTL_REPLY_TIMEOUT / 1000)
>-#define WAIT_USEC (TEAMDCTL_REPLY_TIMEOUT % 1000 * 1000)
>-
> static int cli_usock_wait_recv(int sock)
> {
>- fd_set rfds;
>- int fdmax;
>- int ret;
>- struct timeval tv;
>+ struct pollfd fds[1];
>+
>+ fds[0].fd = sock;
>+ fds[0].events = POLLIN;
>+ fds[0].revents = 0;
>+ int ret = poll(fds, 1, TEAMDCTL_REPLY_TIMEOUT);
>
>- tv.tv_sec = WAIT_SEC;
>- tv.tv_usec = WAIT_USEC;
>- FD_ZERO(&rfds);
>- FD_SET(sock, &rfds);
>- fdmax = sock + 1;
>- ret = select(fdmax, &rfds, NULL, NULL, &tv);
>- if (ret == -1)
>- return -errno;
>- if (!FD_ISSET(sock, &rfds))
>+ if (ret == 0)
> return -ETIMEDOUT;
>+ else if (ret < 0)
>+ return -errno;
> return 0;
> }
>
>--
>2.20.1
>
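As a standalone illustration of the approach in the patch above, here is a minimal sketch of a poll()-based wait. wait_readable() and wait_readable_selfcheck() are hypothetical helpers and not the code from cli_usock.c. Unlike the patch, the sketch also retries on EINTR and inspects revents for POLLNVAL; both are additions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd is readable or timeout_ms elapses.
 * Returns 0 when readable, -ETIMEDOUT on timeout, -errno on failure.
 * Hypothetical helper, not the function from the patch.
 */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int ret;

	assert(fd >= 0);
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);  /* a retry restarts the full timeout */

	if (ret < 0)
		return -errno;
	if (ret == 0)
		return -ETIMEDOUT;
	if (pfd.revents & POLLNVAL)
		return -EBADF;  /* fd was not an open descriptor */
	return 0;  /* POLLIN, POLLHUP or POLLERR: a following read will not block */
}

/* Small self-check using a pipe: times out when empty, succeeds after a write. */
static int wait_readable_selfcheck(void)
{
	int p[2];
	int ok;

	if (pipe(p) < 0)
		return -errno;
	ok = wait_readable(p[0], 20) == -ETIMEDOUT;
	ok = ok && write(p[1], "x", 1) == 1;
	ok = ok && wait_readable(p[0], 20) == 0;
	close(p[0]);
	close(p[1]);
	return ok ? 0 : -EIO;
}
```

Because struct pollfd carries the descriptor as a plain int, the value of fd is not bounded by FD_SETSIZE the way an fd_set bitmap is.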
3 years, 8 months
[libteam PATCH] teamd/lacp: fix segfault due to NULL pointer dereference
by Hangbin Liu
If we set a team0 link down in lacp mode, the call chain looks like
- lacp_port_agg_unselect()
- lacp_switch_agg_lead()
- teamd_log_dbg()
Here new_agg_lead in lacp_switch_agg_lead() may be NULL, so we get a
NULL pointer dereference when new_agg_lead->ctx is evaluated in the
new teamd_log_dbg().
Fix it by using agg_lead->ctx, which is safe because agg_lead is already
dereferenced in lacp_switch_agg_lead().
Fixes: f32310b9a5cc ("libteam: wapper teamd_log_dbg with teamd_log_dbgx")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
teamd/teamd_runner_lacp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/teamd/teamd_runner_lacp.c b/teamd/teamd_runner_lacp.c
index 7d940b3..ec01237 100644
--- a/teamd/teamd_runner_lacp.c
+++ b/teamd/teamd_runner_lacp.c
@@ -634,7 +634,7 @@ static void lacp_switch_agg_lead(struct lacp_port *agg_lead,
struct teamd_port *tdport;
struct lacp_port *lacp_port;
- teamd_log_dbg(new_agg_lead->ctx, "Renaming aggregator %u to %u",
+ teamd_log_dbg(agg_lead->ctx, "Renaming aggregator %u to %u",
lacp_agg_id(agg_lead), lacp_agg_id(new_agg_lead));
if (lacp->selected_agg_lead == agg_lead)
lacp->selected_agg_lead = new_agg_lead;
--
2.19.2
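The pattern behind this fix can be shown in isolation: take the logging context from the pointer that is known to be valid, and let the id helper tolerate NULL. The sketch below uses hypothetical stand-ins (struct ctx, struct port, port_agg_id(), describe_switch()). It is not the teamd code, and it assumes the id helper returns 0 for a missing port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real structures live in teamd_runner_lacp.c. */
struct ctx { int debug; };
struct port { struct ctx *ctx; unsigned int agg_id; };

/* Id helper that tolerates a missing port (assumption: 0 means "none"). */
static unsigned int port_agg_id(const struct port *p)
{
	return p ? p->agg_id : 0;
}

/*
 * Formats the "renaming" debug message into buf.
 * Returns the snprintf() result, or 0 when debugging is off.
 */
static int describe_switch(const struct port *lead, const struct port *new_lead,
			   char *buf, size_t len)
{
	assert(lead != NULL);  /* the caller always has a current lead */

	/* Take the context from lead, never from new_lead, which may be NULL. */
	if (lead->ctx->debug < 1)
		return 0;
	return snprintf(buf, len, "Renaming aggregator %u to %u",
			port_agg_id(lead), port_agg_id(new_lead));
}
```

Calling describe_switch() with new_lead set to NULL formats the message with a target id of 0 and does not dereference the NULL pointer.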
3 years, 8 months
[libteam PATCH] teamd: fix build error in expansion of macro teamd_log_dbgx
by Hangbin Liu
With gcc 8.3 I got the following build error:
In file included from teamd_dbus.c:33:
teamd_dbus.c: In function 'teamd_dbus_init':
teamd.h:54:2: error: expected expression before 'if'
if (val <= ctx->debug) \
^~
teamd.h:57:37: note: in expansion of macro 'teamd_log_dbgx'
#define teamd_log_dbg(ctx, args...) teamd_log_dbgx(ctx, 1, ##args)
^~~~~~~~~~~~~~
teamd_dbus.c:507:2: note: in expansion of macro 'teamd_log_dbg'
teamd_log_dbg(ctx, "dbus: connected to %s with name %s", id,
^~~~~~~~~~~~~
Fix it by adding parentheses and braces around the content.
Fixes: f32310b9a5cc ("libteam: wapper teamd_log_dbg with teamd_log_dbgx")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
teamd/teamd.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/teamd/teamd.h b/teamd/teamd.h
index 469b769..fb2872e 100644
--- a/teamd/teamd.h
+++ b/teamd/teamd.h
@@ -51,8 +51,7 @@
#define teamd_log_info(args...) daemon_log(LOG_INFO, ##args)
#define teamd_log_dbgx(ctx, val, args...) \
- if (val <= ctx->debug) \
- daemon_log(LOG_DEBUG, ##args)
+ ({ if (val <= ctx->debug) daemon_log(LOG_DEBUG, ##args); })
#define teamd_log_dbg(ctx, args...) teamd_log_dbgx(ctx, 1, ##args)
--
2.19.2
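For comparison, here is a sketch of the portable do { ... } while (0) idiom for statement-like macros. This is a different technique from the patch above, which uses a ({ ... }) statement expression (a GNU C extension). The do/while form works only in statement position, so it would not be a drop-in replacement wherever the macro is used as an expression. All names here (struct log_ctx, log_emit(), log_dbgx(), demo()) are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the teamd context and logger. */
struct log_ctx { int debug; };
static int emitted;  /* counts messages that passed the level check */

static void log_emit(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	emitted++;
}

/*
 * Statement-like macro: arguments are parenthesized, and the do/while
 * wrapper keeps an unbraced if/else around the call well-formed.
 */
#define log_dbgx(ctx, val, ...)                 \
	do {                                    \
		if ((val) <= (ctx)->debug)      \
			log_emit(__VA_ARGS__);  \
	} while (0)

/* Uses the macro in both branches of an unbraced if/else. */
static int demo(struct log_ctx *ctx, int verbose)
{
	assert(ctx != NULL);
	emitted = 0;
	if (verbose)
		log_dbgx(ctx, 1, "level %d message", 1);
	else
		log_dbgx(ctx, 2, "level %d message", 2);
	return emitted;
}
```

With a bare if inside the macro, the else in demo() would attach to the macro's inner if rather than to if (verbose), which is the classic dangling-else hazard that both the wrapper shown here and the statement expression in the patch avoid.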
3 years, 8 months
Query regarding ports arp monitoring
by petr wozniak
Hello,
Please allow me to ask a question about ARP monitoring. My setup is as follows: on one side there is a Mikrotik router with two LTE interfaces, and on the other a Debian Stretch server running Strongswan. Two EoIP over IKEv2/IPsec tunnels run between the router and the server, and on both sides the EoIP interfaces are placed into a logical device in round-robin mode (a bond on the router side and a team on the server side). On the Debian server I built kernel 4.19.0-eoip with the EoIP driver from https://github.com/bbonev/eoip .
When both LTE interfaces on the router are up, everything works without problems:
root@eoip:/home/ipsec# cat /etc/team0.conf
{
    "device": "team0",
    "runner": {"name": "roundrobin"},
    "link_watch": {
        "name": "arp_ping",
        "interval": 100,
        "missed_max": 30,
        "source_host": "10.50.1.1",
        "target_host": "10.50.1.2"
    },
    "ports": {"eoip57": {}, "eoip58": {}}
}root@eoip:/home/ipsec#
root@eoip:/home/ipsec# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 10.57.10.1/32 brd 10.57.10.1 scope global lo
valid_lft forever preferred_lft forever
inet 10.58.10.1/32 brd 10.58.10.1 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:56:63:09 brd ff:ff:ff:ff:ff:ff
inet 10.17.1.55/24 brd 10.17.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe56:6309/64 scope link
valid_lft forever preferred_lft forever
3: eoip57@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master team0 state UNKNOWN group default qlen 1000
link/ether ea:49:65:b5:ca:ce brd ff:ff:ff:ff:ff:ff
4: eoip58@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master team0 state UNKNOWN group default qlen 1000
link/ether ea:49:65:b5:ca:ce brd ff:ff:ff:ff:ff:ff
5: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 46:86:7c:5b:7c:53 brd ff:ff:ff:ff:ff:ff
inet 10.50.1.1/24 scope global team0
valid_lft forever preferred_lft forever
inet6 fe80::4486:7cff:fe5b:7c53/64 scope link
valid_lft forever preferred_lft forever
root@eoip:/home/ipsec#
Security Associations (2 up, 0 connecting):
test-lte58[24]: ESTABLISHED 9 seconds ago, 10.17.1.55[10.17.1.55]...37.48.60.236[test-lte58.cz]
test-lte58{3}: INSTALLED, TUNNEL, reqid 3, ESP in UDP SPIs: cb2d2ce6_i 0815d80d_o
test-lte58{3}: 10.58.10.1/32 === 10.58.10.2/32
test-lte57[2]: ESTABLISHED 6 minutes ago, 10.17.1.55[10.17.1.55]...37.48.35.104[test-lte57.cz]
test-lte57{2}: INSTALLED, TUNNEL, reqid 2, ESP in UDP SPIs: cb1681bb_i 08519092_o
test-lte57{2}: 10.57.10.1/32 === 10.57.10.2/32
root@eoip:/home/ipsec# ping 10.50.1.2
PING 10.50.1.2 (10.50.1.2) 56(84) bytes of data.
64 bytes from 10.50.1.2: icmp_seq=1 ttl=64 time=39.8 ms
64 bytes from 10.50.1.2: icmp_seq=2 ttl=64 time=37.8 ms
64 bytes from 10.50.1.2: icmp_seq=3 ttl=64 time=41.7 ms
64 bytes from 10.50.1.2: icmp_seq=4 ttl=64 time=38.4 ms
64 bytes from 10.50.1.2: icmp_seq=5 ttl=64 time=45.2 ms
^C
--- 10.50.1.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4015ms
rtt min/avg/max/mdev = 37.808/40.609/45.252/2.689 ms
root@eoip:/home/ipsec#
When one of the LTE interfaces on the router side is disabled, one EoIP over IKEv2/IPsec tunnel goes down and arping over the disabled tunnel gets no response:
root@eoip:/home/ipsec# ipsec status
Security Associations (1 up, 0 connecting):
test-lte57[2]: ESTABLISHED 20 minutes ago, 10.17.1.55[10.17.1.55]...37.48.35.104[test-lte57.cz]
test-lte57{2}: INSTALLED, TUNNEL, reqid 2, ESP in UDP SPIs: cb1681bb_i 08519092_o
test-lte57{2}: 10.57.10.1/32 === 10.57.10.2/32
root@eoip:/home/ipsec# arping 10.50.1.2 -I eoip57
ARPING 10.50.1.2 from 10.57.10.1 eoip57
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 70.984ms
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 58.267ms
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 51.903ms
^CSent 4 probes (1 broadcast(s))
Received 3 response(s)
root@eoip:/home/ipsec# arping 10.50.1.2 -I eoip58
ARPING 10.50.1.2 from 10.57.10.1 eoip58
^CSent 12 probes (12 broadcast(s))
Received 0 response(s)
root@eoip:/home/ipsec# arping 10.50.1.2 -I team0
ARPING 10.50.1.2 from 10.50.1.1 team0
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 37.101ms
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 37.653ms
^CSent 5 probes (1 broadcast(s))
Received 2 response(s)
My problem is that both ports of the team0 interface stay up:
root@eoip:/home/ipsec# teamdctl team0 state view -v
setup:
runner: roundrobin
kernel team mode: roundrobin
D-BUS enabled: no
ZeroMQ enabled: no
debug level: 1
daemonized: yes
PID: 536
PID file: /var/run/teamd/team0.pid
ports:
eoip57
ifindex: 3
addr: 82:ae:24:87:56:9c
ethtool link: 0mbit/halfduplex/up
link watches:
link summary: up
instance[link_watch_0]:
name: arp_ping
link: up
down count: 0
source host: 10.50.1.1
target host: 10.50.1.2
interval: 100
missed packets: 0/30
validate_active: no
validate_inactive: no
send_always: no
initial wait: 0
eoip58
ifindex: 4
addr: 82:ae:24:87:56:9c
ethtool link: 0mbit/halfduplex/up
link watches:
link summary: up
instance[link_watch_0]:
name: arp_ping
link: up
down count: 0
source host: 10.50.1.1
target host: 10.50.1.2
interval: 100
missed packets: 1/30    <-- this is sometimes 1, sometimes 0
validate_active: no
validate_inactive: no
send_always: no
initial wait: 0
root@eoip:/home/ipsec#
root@eoip:/home/ipsec# ping 10.50.1.2
PING 10.50.1.2 (10.50.1.2) 56(84) bytes of data.
64 bytes from 10.50.1.2: icmp_seq=3 ttl=64 time=43.8 ms
64 bytes from 10.50.1.2: icmp_seq=5 ttl=64 time=37.2 ms
64 bytes from 10.50.1.2: icmp_seq=9 ttl=64 time=55.1 ms
64 bytes from 10.50.1.2: icmp_seq=11 ttl=64 time=47.7 ms
^C
--- 10.50.1.2 ping statistics ---
11 packets transmitted, 4 received, 63% packet loss, time 10145ms
rtt min/avg/max/mdev = 37.248/46.010/55.179/6.493 ms
root@eoip:/home/ipsec#
When the second LTE interface is also disabled, both ports go down:
root@eoip:/home/ipsec# teamdctl team0 state view -v
setup:
runner: roundrobin
kernel team mode: roundrobin
D-BUS enabled: no
ZeroMQ enabled: no
debug level: 1
daemonized: yes
PID: 535
PID file: /var/run/teamd/team0.pid
ports:
eoip57
ifindex: 3
addr: 2e:c3:04:11:56:04
ethtool link: 0mbit/halfduplex/up
link watches:
link summary: down
instance[link_watch_0]:
name: arp_ping
link: down
down count: 1
source host: 10.50.1.1
target host: 10.50.1.2
interval: 100
missed packets: 36/30
validate_active: no
validate_inactive: no
send_always: no
initial wait: 0
eoip58
ifindex: 4
addr: 2e:c3:04:11:56:04
ethtool link: 0mbit/halfduplex/up
link watches:
link summary: down
instance[link_watch_0]:
name: arp_ping
link: down
down count: 1
source host: 10.50.1.1
target host: 10.50.1.2
interval: 100
missed packets: 37/30
validate_active: no
validate_inactive: no
send_always: no
initial wait: 0
root@eoip:/home/ipsec#
Could you please help me configure ARP monitoring properly?
Thank you in advance.
Petr
3 years, 9 months