Re: [PATCH] teamd: Disregard current state when considering port enablement
by Jiri Pirko
Wed, Nov 13, 2019 at 02:26:47PM CET, petrm(a)mellanox.com wrote:
>On systems where carrier is gained very quickly, there is a race between
>teamd and the kernel that sometimes leads to all team slaves being stuck in
>enabled=false state.
>
>When a port is enslaved to a team device, the kernel sends a netlink
>message marking the port as enabled. teamd's lb_event_watch_port_added()
>calls team_set_port_enabled(false), because link is down at that point. The
>kernel responds with a message marking the port as disabled. At this point,
>there are two outstanding messages: the initial one marking port as
>enabled, and the second one marking it as disabled. teamd has not processed
>either of these.
>
>Next teamd gets the netlink message that sets enabled=true, and updates its
>internal cache accordingly. If at this point ethtool link-watch wakes up,
>teamd considers (in teamd_port_check_enable()) enabling the port. After
>consulting the cache, it concludes the port is already up, and neglects to
>do so. Only then does teamd get the netlink message informing it of setting
>enabled=false.
>
>The problem is that the teamd cache is not synchronous with respect to the
>kernel state. If the carrier takes a while to come up (as is normally the
>case), this is not a problem, because teamd catches up quickly enough. But
>this may not always be the case, and particularly on a simulated system,
>the carrier is gained almost immediately.
>
>Fix this by not suppressing the enablement message.
>
>Signed-off-by: Petr Machata <petrm(a)mellanox.com>
applied. Thanks!
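A minimal C sketch of the race described above, for illustration only. It is not teamd code: port_model, kernel_set(), teamd_process_one(), check_enable() and run_race() are hypothetical names, and the netlink socket is reduced to a FIFO of pending enabled=X notifications. consult_cache=true models the old cache check in teamd_port_check_enable(); consult_cache=false models the fix.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_MAX 8

/* Hypothetical model, not teamd code. "kernel" is the real port state;
 * "cache" is teamd's copy, updated only when teamd processes a queued
 * netlink notification. */
struct port_model {
	bool kernel;            /* real enabled state in the kernel */
	bool cache;             /* teamd's cached enabled state */
	bool queue[QUEUE_MAX];  /* pending "enabled=X" notifications */
	int head, tail;
};

/* The kernel changes state and queues a notification for teamd. */
static void kernel_set(struct port_model *p, bool enabled)
{
	assert(p->tail < QUEUE_MAX);
	p->kernel = enabled;
	p->queue[p->tail++] = enabled;
}

/* teamd processes the oldest pending notification, updating its cache. */
static void teamd_process_one(struct port_model *p)
{
	if (p->head < p->tail)
		p->cache = p->queue[p->head++];
}

/* Link came up: the old logic trusts the cache, the fix always asks. */
static void check_enable(struct port_model *p, bool consult_cache)
{
	if (consult_cache && p->cache)
		return;  /* cache says "already enabled": nothing is sent */
	kernel_set(p, true);
}

/* Replays the interleaving from the patch; returns final kernel state. */
static bool run_race(bool consult_cache)
{
	struct port_model p = { 0 };

	kernel_set(&p, true);            /* enslaved: kernel enables port */
	kernel_set(&p, false);           /* teamd disables it (link down) */
	teamd_process_one(&p);           /* teamd sees stale enabled=true */
	check_enable(&p, consult_cache); /* ethtool link-watch fires */
	while (p.head < p.tail)
		teamd_process_one(&p);   /* drain remaining notifications */
	return p.kernel;
}
```

Under these assumptions, run_race(true) leaves the port disabled in the kernel although the link is up, while run_race(false) leaves it enabled; in this model the fix costs at most one redundant enable request.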
[libteam PATCH] teamd/lacp: fix segfault due to NULL pointer dereference
by Hangbin Liu
If we set a team0 link down in lacp mode, the call chain is
- lacp_port_agg_unselect()
- lacp_switch_agg_lead()
- teamd_log_dbg()
where new_agg_lead in lacp_switch_agg_lead() may be NULL. We then hit a
NULL pointer dereference, because the converted teamd_log_dbg() call
passes new_agg_lead->ctx.
Fix it by using agg_lead->ctx instead, which is safe because agg_lead is
already referenced in lacp_switch_agg_lead().
Fixes: f32310b9a5cc ("libteam: wapper teamd_log_dbg with teamd_log_dbgx")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
teamd/teamd_runner_lacp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/teamd/teamd_runner_lacp.c b/teamd/teamd_runner_lacp.c
index 7d940b3..ec01237 100644
--- a/teamd/teamd_runner_lacp.c
+++ b/teamd/teamd_runner_lacp.c
@@ -634,7 +634,7 @@ static void lacp_switch_agg_lead(struct lacp_port *agg_lead,
struct teamd_port *tdport;
struct lacp_port *lacp_port;
- teamd_log_dbg(new_agg_lead->ctx, "Renaming aggregator %u to %u",
+ teamd_log_dbg(agg_lead->ctx, "Renaming aggregator %u to %u",
lacp_agg_id(agg_lead), lacp_agg_id(new_agg_lead));
if (lacp->selected_agg_lead == agg_lead)
lacp->selected_agg_lead = new_agg_lead;
--
2.19.2
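A hedged sketch of the pattern the fix relies on: take the logging context from the aggregator that is known to be valid, and pass the possibly-NULL one only to a NULL-tolerant helper. The types are trimmed-down stand-ins, and agg_id() and rename_msg() are hypothetical helpers; agg_id() assumes, as the unchanged lacp_agg_id(new_agg_lead) argument suggests, that the real lacp_agg_id() accepts NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-ins for teamd's types. */
struct teamd_context { int debug; };
struct lacp_port {
	struct teamd_context *ctx;
	unsigned int ifindex;
};

/* NULL-tolerant aggregator id (assumed; 0 means "no aggregator"). */
static unsigned int agg_id(const struct lacp_port *lead)
{
	return lead ? lead->ifindex : 0;
}

/*
 * Formats the "Renaming aggregator" message and returns the context to
 * log with. new_lead may be NULL, so the context comes from old_lead,
 * which the caller guarantees is valid.
 */
static struct teamd_context *rename_msg(const struct lacp_port *old_lead,
					const struct lacp_port *new_lead,
					char *buf, size_t len)
{
	assert(old_lead != NULL);
	snprintf(buf, len, "Renaming aggregator %u to %u",
		 agg_id(old_lead), agg_id(new_lead));
	return old_lead->ctx;  /* never touches new_lead->ctx */
}
```

With new_lead == NULL the message reads "Renaming aggregator 4 to 0" for an old lead with ifindex 4, and no NULL pointer is dereferenced.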
[libteam PATCH] teamd: fix build error in expansion of macro teamd_log_dbgx
by Hangbin Liu
With gcc 8.3 I got the following build error:
In file included from teamd_dbus.c:33:
teamd_dbus.c: In function 'teamd_dbus_init':
teamd.h:54:2: error: expected expression before 'if'
if (val <= ctx->debug) \
^~
teamd.h:57:37: note: in expansion of macro 'teamd_log_dbgx'
#define teamd_log_dbg(ctx, args...) teamd_log_dbgx(ctx, 1, ##args)
^~~~~~~~~~~~~~
teamd_dbus.c:507:2: note: in expansion of macro 'teamd_log_dbg'
teamd_log_dbg(ctx, "dbus: connected to %s with name %s", id,
^~~~~~~~~~~~~
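Fix it by wrapping the macro body in parentheses and braces, i.e. a GCC
statement expression ({ ... }), so the macro can be used where an
expression is expected.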
Fixes: f32310b9a5cc ("libteam: wapper teamd_log_dbg with teamd_log_dbgx")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
teamd/teamd.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/teamd/teamd.h b/teamd/teamd.h
index 469b769..fb2872e 100644
--- a/teamd/teamd.h
+++ b/teamd/teamd.h
@@ -51,8 +51,7 @@
#define teamd_log_info(args...) daemon_log(LOG_INFO, ##args)
#define teamd_log_dbgx(ctx, val, args...) \
- if (val <= ctx->debug) \
- daemon_log(LOG_DEBUG, ##args)
+ ({ if (val <= ctx->debug) daemon_log(LOG_DEBUG, ##args); })
#define teamd_log_dbg(ctx, args...) teamd_log_dbgx(ctx, 1, ##args)
--
2.19.2
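To illustrate why the statement expression helps, here is a self-contained sketch that uses the fixed macros in an expression context. daemon_log() is a local stub that records the message (not libdaemon's implementation), struct teamd_context is reduced to its debug field, and the comma expression in log_and_return() is an assumed example, since the patch does not show the code at teamd_dbus.c:507. Named variadic macros (args...) and ({ ... }) are GCC extensions that clang also accepts.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>   /* LOG_DEBUG */

/* Hypothetical stand-ins: a minimal context and a daemon_log() stub
 * that records the last message instead of calling libdaemon. */
struct teamd_context { int debug; };

static int dbg_calls;
static char last_msg[128];

static void daemon_log(int prio, const char *fmt, ...)
{
	va_list ap;

	(void)prio;
	va_start(ap, fmt);
	vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
	va_end(ap);
	dbg_calls++;
}

/* The fixed macros from the patch (GCC statement expression). */
#define teamd_log_dbgx(ctx, val, args...) \
	({ if (val <= ctx->debug) daemon_log(LOG_DEBUG, ##args); })
#define teamd_log_dbg(ctx, args...) teamd_log_dbgx(ctx, 1, ##args)

/*
 * Uses the macro as the left operand of a comma expression. With the
 * old bare-if body this fails with "expected expression before 'if'";
 * the statement expression turns it into a void expression.
 */
static int log_and_return(struct teamd_context *ctx, int rc)
{
	assert(ctx != NULL);
	return (teamd_log_dbg(ctx, "returning %d", rc), rc);
}
```

A context with debug = 0 suppresses the message, while debug = 1 logs "returning 7" for log_and_return(ctx, 7).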
Query regarding ports arp monitoring
by petr wozniak
Hello,
Please allow me to ask a question regarding ARP monitoring. I have the following configuration: on one side is a Mikrotik router with two LTE interfaces, and on the other side is a Debian Stretch server running Strongswan. Between the router and the server there are two EoIP over IKEv2/IPsec tunnels, and on both sides the EoIP interfaces are aggregated into a logical device in round-robin mode (a bond on the router's side and a team on the server's side). On the Debian server I built a 4.19.0-eoip kernel with the EoIP driver from https://github.com/bbonev/eoip.
When both LTE interfaces on the router are up, everything works without problems:
root@eoip:/home/ipsec# cat /etc/team0.conf
{
"device": "team0",
"runner": {"name": "roundrobin"},
"link_watch":{
"name": "arp_ping",
"interval": 100,
"missed_max": 30,
"source_host": "10.50.1.1",
"target_host": "10.50.1.2"
},
"ports": {"eoip57": {}, "eoip58": {}}
}root@eoip:/home/ipsec#
root@eoip:/home/ipsec# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 10.57.10.1/32 brd 10.57.10.1 scope global lo
valid_lft forever preferred_lft forever
inet 10.58.10.1/32 brd 10.58.10.1 scope global lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:56:63:09 brd ff:ff:ff:ff:ff:ff
inet 10.17.1.55/24 brd 10.17.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe56:6309/64 scope link
valid_lft forever preferred_lft forever
3: eoip57@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master team0 state UNKNOWN group default qlen 1000
link/ether ea:49:65:b5:ca:ce brd ff:ff:ff:ff:ff:ff
4: eoip58@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master team0 state UNKNOWN group default qlen 1000
link/ether ea:49:65:b5:ca:ce brd ff:ff:ff:ff:ff:ff
5: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 46:86:7c:5b:7c:53 brd ff:ff:ff:ff:ff:ff
inet 10.50.1.1/24 scope global team0
valid_lft forever preferred_lft forever
inet6 fe80::4486:7cff:fe5b:7c53/64 scope link
valid_lft forever preferred_lft forever
root@eoip:/home/ipsec#
Security Associations (2 up, 0 connecting):
test-lte58[24]: ESTABLISHED 9 seconds ago, 10.17.1.55[10.17.1.55]...37.48.60.236[test-lte58.cz]
test-lte58{3}: INSTALLED, TUNNEL, reqid 3, ESP in UDP SPIs: cb2d2ce6_i 0815d80d_o
test-lte58{3}: 10.58.10.1/32 === 10.58.10.2/32
test-lte57[2]: ESTABLISHED 6 minutes ago, 10.17.1.55[10.17.1.55]...37.48.35.104[test-lte57.cz]
test-lte57{2}: INSTALLED, TUNNEL, reqid 2, ESP in UDP SPIs: cb1681bb_i 08519092_o
test-lte57{2}: 10.57.10.1/32 === 10.57.10.2/32
root@eoip:/home/ipsec# ping 10.50.1.2
PING 10.50.1.2 (10.50.1.2) 56(84) bytes of data.
64 bytes from 10.50.1.2: icmp_seq=1 ttl=64 time=39.8 ms
64 bytes from 10.50.1.2: icmp_seq=2 ttl=64 time=37.8 ms
64 bytes from 10.50.1.2: icmp_seq=3 ttl=64 time=41.7 ms
64 bytes from 10.50.1.2: icmp_seq=4 ttl=64 time=38.4 ms
64 bytes from 10.50.1.2: icmp_seq=5 ttl=64 time=45.2 ms
^C
--- 10.50.1.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4015ms
rtt min/avg/max/mdev = 37.808/40.609/45.252/2.689 ms
root@eoip:/home/ipsec#
When one of the LTE interfaces on the router's side is disabled, one EoIP over IKEv2/IPsec tunnel goes down and arping on the disabled tunnel gets no response:
root@eoip:/home/ipsec# ipsec status
Security Associations (1 up, 0 connecting):
test-lte57[2]: ESTABLISHED 20 minutes ago, 10.17.1.55[10.17.1.55]...37.48.35.104[test-lte57.cz]
test-lte57{2}: INSTALLED, TUNNEL, reqid 2, ESP in UDP SPIs: cb1681bb_i 08519092_o
test-lte57{2}: 10.57.10.1/32 === 10.57.10.2/32
root@eoip:/home/ipsec# arping 10.50.1.2 -I eoip57
ARPING 10.50.1.2 from 10.57.10.1 eoip57
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 70.984ms
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 58.267ms
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 51.903ms
^CSent 4 probes (1 broadcast(s))
Received 3 response(s)
root@eoip:/home/ipsec# arping 10.50.1.2 -I eoip58
ARPING 10.50.1.2 from 10.57.10.1 eoip58
^CSent 12 probes (12 broadcast(s))
Received 0 response(s)
root@eoip:/home/ipsec# arping 10.50.1.2 -I team0
ARPING 10.50.1.2 from 10.50.1.1 team0
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 37.101ms
Unicast reply from 10.50.1.2 [02:8F:77:A3:CA:D8] 37.653ms
^CSent 5 probes (1 broadcast(s))
Received 2 response(s)
My problem is that both ports of the team0 interface stay up:
root@eoip:/home/ipsec# teamdctl team0 state view -v
setup:
runner: roundrobin
kernel team mode: roundrobin
D-BUS enabled: no
ZeroMQ enabled: no
debug level: 1
daemonized: yes
PID: 536
PID file: /var/run/teamd/team0.pid
ports:
eoip57
ifindex: 3
addr: 82:ae:24:87:56:9c
ethtool link: 0mbit/halfduplex/up
link watches:
link summary: up
instance[link_watch_0]:
name: arp_ping
link: up
down count: 0
source host: 10.50.1.1
target host: 10.50.1.2
interval: 100
missed packets: 0/30
validate_active: no
validate_inactive: no
send_always: no
initial wait: 0
eoip58
ifindex: 4
addr: 82:ae:24:87:56:9c
ethtool link: 0mbit/halfduplex/up
link watches:
link summary: up
instance[link_watch_0]:
name: arp_ping
link: up
down count: 0
source host: 10.50.1.1
target host: 10.50.1.2
interval: 100
missed packets: 1/30    (this value is sometimes 1, sometimes 0)
validate_active: no
validate_inactive: no
send_always: no
initial wait: 0
root@eoip:/home/ipsec#
root@eoip:/home/ipsec# ping 10.50.1.2
PING 10.50.1.2 (10.50.1.2) 56(84) bytes of data.
64 bytes from 10.50.1.2: icmp_seq=3 ttl=64 time=43.8 ms
64 bytes from 10.50.1.2: icmp_seq=5 ttl=64 time=37.2 ms
64 bytes from 10.50.1.2: icmp_seq=9 ttl=64 time=55.1 ms
64 bytes from 10.50.1.2: icmp_seq=11 ttl=64 time=47.7 ms
^C
--- 10.50.1.2 ping statistics ---
11 packets transmitted, 4 received, 63% packet loss, time 10145ms
rtt min/avg/max/mdev = 37.248/46.010/55.179/6.493 ms
root@eoip:/home/ipsec#
When the second LTE interface is also disabled, both ports go down:
root@eoip:/home/ipsec# teamdctl team0 state view -v
setup:
runner: roundrobin
kernel team mode: roundrobin
D-BUS enabled: no
ZeroMQ enabled: no
debug level: 1
daemonized: yes
PID: 535
PID file: /var/run/teamd/team0.pid
ports:
eoip57
ifindex: 3
addr: 2e:c3:04:11:56:04
ethtool link: 0mbit/halfduplex/up
link watches:
link summary: down
instance[link_watch_0]:
name: arp_ping
link: down
down count: 1
source host: 10.50.1.1
target host: 10.50.1.2
interval: 100
missed packets: 36/30
validate_active: no
validate_inactive: no
send_always: no
initial wait: 0
eoip58
ifindex: 4
addr: 2e:c3:04:11:56:04
ethtool link: 0mbit/halfduplex/up
link watches:
link summary: down
instance[link_watch_0]:
name: arp_ping
link: down
down count: 1
source host: 10.50.1.1
target host: 10.50.1.2
interval: 100
missed packets: 37/30
validate_active: no
validate_inactive: no
send_always: no
initial wait: 0
root@eoip:/home/ipsec#
Could you please help me configure ARP monitoring properly?
Thank you in advance.
Petr
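For reference, a simplified model of how the counters in the teamdctl output above behave. It is a sketch under stated assumptions, not teamd's implementation: the link is reported down once the missed count exceeds missed_max, each interval adds one probe, and, following the teamd.conf(5) description of validate_active/validate_inactive as we understand it, with validation disabled every incoming ARP packet is treated as a reply. With interval 100 and missed_max 30 from the config above, detection takes roughly (30 + 1) x 100 ms = 3.1 s. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of an arp_ping link watch (not teamd's
 * implementation). Field names mirror the config keys. */
struct arp_lw {
	unsigned int interval_ms; /* "interval": time between ARP probes */
	unsigned int missed_max;  /* "missed_max": tolerated missed replies */
	bool validate;            /* validate_active / validate_inactive */
	unsigned int missed;      /* probes without a counted reply so far */
	bool link_up;
};

/* A probe interval elapsed: count the probe as missed until a reply
 * resets the counter. */
static void arp_lw_tick(struct arp_lw *lw)
{
	lw->missed++;
	if (lw->missed > lw->missed_max)
		lw->link_up = false;
}

/* An ARP packet arrived on the port. Without validation, any ARP packet
 * counts as a reply; with it, only a reply to our own probe does. */
static void arp_lw_rx(struct arp_lw *lw, bool is_reply_to_probe)
{
	if (lw->validate && !is_reply_to_probe)
		return;
	lw->missed = 0;
	lw->link_up = true;
}

/* Rough worst-case time to report a dead link, in milliseconds. */
static unsigned int arp_lw_detect_ms(const struct arp_lw *lw)
{
	assert(lw->interval_ms > 0);
	return (lw->missed_max + 1) * lw->interval_ms;
}
```

In this model a port whose counter keeps flipping between 0 and 1 is still receiving something it counts as a reply, which is why it never reaches missed_max.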
[libteam PATCH] teamd: update ctx->hwaddr after setting ctx->ifindex to new hwaddr
by Hangbin Liu
When we add the first slave to a team device, we update the team device
(ctx->ifindex) with the slave's hwaddr in
teamd_event_watch_port_added()
- teamd_hwaddr_check_change().
But we did not update ctx->hwaddr, so the first added slave is later set
back to the team's initial hwaddr, e.g. in the following functions:
lacp_port_set_mac()
lb_event_watch_port_added()
ab_hwaddr_policy_same_all_port_added()
because they reset the tdport's hwaddr based on ctx->hwaddr. Fix it by
updating ctx->hwaddr when setting ctx->ifindex to the new hwaddr.
Note: teamd_set_hwaddr() is left unchanged, as it sets
ctx->hwaddr_explicit = true.
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
teamd/teamd.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/teamd/teamd.c b/teamd/teamd.c
index 6c47312..9622da1 100644
--- a/teamd/teamd.c
+++ b/teamd/teamd.c
@@ -867,7 +867,7 @@ static int teamd_add_ports(struct teamd_context *ctx)
static int teamd_hwaddr_check_change(struct teamd_context *ctx,
struct teamd_port *tdport)
{
- const char *hwaddr;
+ char *hwaddr;
unsigned char hwaddr_len;
int err;
@@ -885,6 +885,8 @@ static int teamd_hwaddr_check_change(struct teamd_context *ctx,
teamd_log_err("Failed to set team device hardware address.");
return err;
}
+ ctx->hwaddr = hwaddr;
+ ctx->hwaddr_len = hwaddr_len;
return 0;
}
--
2.19.2
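A minimal sketch of the bug and the fix, with hypothetical names: team_model holds the address set on the team device and teamd's cached copy (standing in for ctx->hwaddr), and runner_port_added() stands in for runner hooks such as lb_event_watch_port_added() that copy the cached address to a port. The patch assigns a pointer to ctx->hwaddr; this model copies bytes instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define HWADDR_LEN 6

/* Hypothetical model: the address on the team device and teamd's cached
 * copy, which the runners use to reset port addresses. */
struct team_model {
	unsigned char dev_hwaddr[HWADDR_LEN]; /* address set on the device */
	unsigned char ctx_hwaddr[HWADDR_LEN]; /* teamd's cached copy */
};

/* First port added: the team device adopts the port's address.
 * update_cache=false mirrors the bug, true mirrors the fix. */
static void hwaddr_check_change(struct team_model *t,
				const unsigned char *port_hwaddr,
				bool update_cache)
{
	assert(port_hwaddr != NULL);
	memcpy(t->dev_hwaddr, port_hwaddr, HWADDR_LEN);
	if (update_cache)
		memcpy(t->ctx_hwaddr, port_hwaddr, HWADDR_LEN);
}

/* Runner hook (cf. lb_event_watch_port_added()): the port gets the
 * cached team address. */
static void runner_port_added(const struct team_model *t,
			      unsigned char *port_hwaddr)
{
	assert(port_hwaddr != NULL);
	memcpy(port_hwaddr, t->ctx_hwaddr, HWADDR_LEN);
}
```

Without the cache update, the first port ends up with the team's initial address while the team device carries the port's original address; with it, both agree.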
[libteam PATCH 0/6] move all teamd_log_dbg to teamd_log_dbgx
by Hangbin Liu
Hi Jiri,
I'm not sure whether I should split the patch or just post a single one. Please
tell me if you feel the commit messages are repetitive and you want only one patch.
Recently some users reported that they started to see debug messages in their
syslogs even with daemon_verbosity_level = LOG_INFO and without the -g option.
Actually this issue has been there from the beginning: users would see the
debug messages if they ran teamd with the -d option. Most users did not
notice this because they use libteam via NetworkManager, and NetworkManager
runs teamd in the foreground.
But since commit e47d5db53873 ("teamd: add an option to force log
output to stdout, stderr or syslog"), NetworkManager sets
TEAM_LOG_OUTPUT=syslog in the environment, and libdaemon does not filter
log levels when logging to syslog (see daemon_logv() in libdaemon). As a
result, all these users suddenly see debug messages, which is annoying.
Here is the documentation of daemon_set_verbosity() from libdaemon/dlog.h:
"""
Allows to decide which messages to output on standard output/error
streams. All messages are logged to syslog and this setting does
not influence that.
"""
Since we should not limit how our users (NM) use libteam, and libdaemon
intentionally does not filter logs sent to syslog, we had better filter
debug messages ourselves, e.g. via the -g option. So I would prefer to
move all teamd_log_dbg calls to teamd_log_dbgx. After that, users can
decide for themselves whether to enable debug output with the -g option.
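A small model of the filtering argument above. It assumes, as the quoted dlog.h text says, that libdaemon's verbosity level only filters stdout/stderr output, and it assumes each -g raises ctx->debug by one; OUT_STDERR, OUT_SYSLOG and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <syslog.h>   /* LOG_INFO, LOG_DEBUG */

enum log_output { OUT_STDERR, OUT_SYSLOG };

/* Simplified model of libdaemon as quoted above: the verbosity level
 * filters stdout/stderr only, never syslog. */
static bool libdaemon_emits(enum log_output out, int prio, int verbosity)
{
	return out == OUT_SYSLOG || prio <= verbosity;
}

/* teamd_log_dbg() before the series: no filtering of its own. */
static bool dbg_emitted_before(enum log_output out, int verbosity)
{
	return libdaemon_emits(out, LOG_DEBUG, verbosity);
}

/* teamd_log_dbgx(ctx, val, ...): gated on ctx->debug (raised by -g,
 * assumed) before libdaemon sees the message. */
static bool dbg_emitted_after(enum log_output out, int verbosity,
			      int ctx_debug, int val)
{
	assert(val >= 1);
	return val <= ctx_debug && libdaemon_emits(out, LOG_DEBUG, verbosity);
}
```

With TEAM_LOG_OUTPUT=syslog and verbosity LOG_INFO, the old path emits debug messages unconditionally, while the new path emits them only when -g has raised ctx->debug to at least the message's level.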
Hangbin Liu (6):
teamd/teamd.c: move teamd_log_dbg to teamd_log_dbgx
teamd/teamd_runner_activebackup.c: move teamd_log_dbg to
teamd_log_dbgx
teamd/teamd_balancer.c: move teamd_log_dbg to teamd_log_dbgx
teamd/teamd_runner_lacp.c: move teamd_log_dbg to teamd_log_dbgx
teamd/teamd_link_watch.c: move teamd_log_dbg to teamd_log_dbgx
teamd: move teamd_log_dbg to teamd_log_dbgx
teamd/teamd.c | 40 ++++++++++++++--------------
teamd/teamd_balancer.c | 21 ++++++++-------
teamd/teamd_dbus.c | 6 ++---
teamd/teamd_hash_func.c | 2 +-
teamd/teamd_link_watch.c | 14 +++++-----
teamd/teamd_lw_arp_ping.c | 12 ++++-----
teamd/teamd_lw_ethtool.c | 4 +--
teamd/teamd_lw_nsna_ping.c | 2 +-
teamd/teamd_lw_psr.c | 12 ++++-----
teamd/teamd_lw_tipc.c | 8 +++---
teamd/teamd_per_port.c | 6 ++---
teamd/teamd_runner_activebackup.c | 18 ++++++-------
teamd/teamd_runner_lacp.c | 44 +++++++++++++++----------------
teamd/teamd_usock.c | 12 ++++-----
teamd/teamd_zmq.c | 10 +++----
15 files changed, 107 insertions(+), 104 deletions(-)
--
2.19.2