Branch: refs/heads/master
Home: https://github.com/jpirko/libteam
Commit: 046fb6ba0aec8246075b18d787daec43201566fa
https://github.com/jpirko/libteam/commit/046fb6ba0aec8246075b18d787daec43...
Author: Antti Tiainen <atiainen(a)forcepoint.com>
Date: 2017-02-06 (Mon, 06 Feb 2017)
Changed paths:
M libteam/ifinfo.c
M libteam/libteam.c
M libteam/team_private.h
Log Message:
-----------
libteam: resynchronize ifinfo after lost RTNLGRP_LINK notifications
When there is a large number of interfaces (e.g. vlans), teamd loses
link notifications because it cannot read them as fast as the kernel
broadcasts them. This often prevents teamd from starting properly if
it is started while other links are being set up. It can also fail
while it is up and running, especially when the team device itself
has a lot of vlans under it.
This can easily be reproduced with a simple example (on an SMP system)
by manually adding a team device with a bunch of vlans, bringing it up,
and starting teamd with the --take-over option:
root@debian:~# ip link add name team0 type team
root@debian:~# for i in `seq 100 150` ; do
ip link add link team0 name team0.$i type vlan id $i ; done
root@debian:~# ip link set team0 up
root@debian:~# cat teamd.conf
{
    "device": "team0",
    "runner": {
        "name": "activebackup"
    },
    "ports": {
        "eth1": {},
        "eth2": {}
    }
}
root@debian:~# teamd -o -N -f teamd.conf
At this point, teamd will not give any error messages or other
indication that something is wrong. But the state will not look healthy:
root@debian:~# teamdctl team0 state
setup:
  runner: activebackup
ports:
  eth1
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
Failed to parse JSON port dump.
command call failed (Invalid argument)
The state dump shows that port eth2 is missing its info.
Running strace on teamd reveals one recvmsg() call that
returned -1 with errno ENOBUFS. What happened in this example is
that when teamd started, all vlans got carrier up, and the kernel
flooded notifications faster than teamd could read them. teamd then
lost the events for port eth2 being enslaved and coming up.
The socket that joins the RTNLGRP_LINK group uses the default libnl
32k buffer size. Netlink messages are large (over 1k), so this buffer
fills up easily. The kernel neither knows nor cares whether
notification broadcasts were delivered. This cannot be fixed by simply
increasing the buffer size, as no size is guaranteed to work in every
use case, and several megabytes of buffer (well over the normal
rmem_max limit) can be required if there are hundreds of vlans.
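To illustrate why a larger buffer is not a reliable fix, here is a
minimal C sketch that is not part of this commit. The helper name
request_rcvbuf() is hypothetical. It relies only on the documented
Linux behaviour of SO_RCVBUF: the kernel doubles the requested value
for bookkeeping and silently caps it at net.core.rmem_max, so the size
read back can be far below a multi-megabyte request.

```c
#include <sys/socket.h>

/* Hypothetical helper: ask for `requested` bytes of receive buffer on
 * `fd` and report what the kernel actually granted.
 * Returns the effective size in bytes, or -1 on error. */
static int request_rcvbuf(int fd, int requested)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) < 0)
        return -1;

    /* Read the value back: it may have been doubled and/or capped. */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;

    return granted;
}
```

Comparing the return value with the request shows whether the system
limit was hit. Even when the full size is granted, a burst that is
large enough can still overflow it.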
The only way to recover from this is to refresh the whole ifinfo list,
as it is invalid at this point. This cannot easily be worked around by
refreshing just the team device and its ports, because the library
might not have the ports linked due to the missed events, and it does
not know about the teamd configuration.
The return value of nl_recvmsgs_default() is now checked for the event
socket. In case of ENOBUFS (which libnl nicely changes to ENOMEM), the
whole ifinfo list is refreshed. get_ifinfo_list() now also checks for
removed interfaces, in case a dellink event was missed. Until now, all
TEAM_IFINFO_CHANGE handlers processed events one by one, so this had to
be changed to support multiple ifinfo changes. For this, the ifinfo
changed flags are cleared and removed entries are destroyed only after
all handlers have been called.
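The resynchronization and deferred cleanup described above can be
sketched in self-contained C. This is an illustration of the idea under
simplifying assumptions, not the libteam code: struct ifinfo_entry,
resync(), dispatch_changes() and the two sample handlers are all
hypothetical names, and a fresh dump is modelled as a plain array of
ifindexes instead of a netlink dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IFACES 8

/* Hypothetical stand-in for one cached interface record. */
struct ifinfo_entry {
    int  ifindex;
    bool in_use;
    bool changed;
    bool removed;
};

/* A handler gets the whole table, so it can see every pending change. */
typedef void (*change_handler)(const struct ifinfo_entry *tbl, size_t n);

static struct ifinfo_entry *find_entry(struct ifinfo_entry *tbl, size_t n,
                                       int ifindex)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].in_use && tbl[i].ifindex == ifindex)
            return &tbl[i];
    return NULL;
}

/* Reconcile the cache with a fresh full dump of interface indexes. */
static void resync(struct ifinfo_entry *tbl, size_t n,
                   const int *fresh, size_t nfresh)
{
    assert(tbl != NULL);

    /* Cached but missing from the dump: a dellink event was lost. */
    for (size_t i = 0; i < n; i++) {
        bool present = false;

        if (!tbl[i].in_use)
            continue;
        for (size_t j = 0; j < nfresh; j++)
            if (fresh[j] == tbl[i].ifindex)
                present = true;
        if (!present)
            tbl[i].removed = tbl[i].changed = true;
    }

    /* In the dump but not cached: add it (dropped if the table is full). */
    for (size_t j = 0; j < nfresh; j++) {
        if (find_entry(tbl, n, fresh[j]))
            continue;
        for (size_t i = 0; i < n; i++) {
            if (!tbl[i].in_use) {
                tbl[i] = (struct ifinfo_entry){ fresh[j], true, true, false };
                break;
            }
        }
    }
}

/* Call every handler first; only then clear flags and drop removed entries. */
static void dispatch_changes(struct ifinfo_entry *tbl, size_t n,
                             const change_handler *handlers, size_t nhandlers)
{
    for (size_t h = 0; h < nhandlers; h++)
        handlers[h](tbl, n);

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].removed)
            tbl[i] = (struct ifinfo_entry){ 0 };
        tbl[i].changed = false;
    }
}

/* Sample handlers that just count what they are shown. */
static int seen_changed, seen_removed;

static void count_changed(const struct ifinfo_entry *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].in_use && tbl[i].changed)
            seen_changed++;
}

static void count_removed(const struct ifinfo_entry *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].in_use && tbl[i].removed)
            seen_removed++;
}
```

Because the flags are cleared only after the handler loop, the second
handler still sees the changes the first one saw. A caller would run
resync() when the event socket reports the overflow and then call
dispatch_changes() once for the whole batch.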
Also, the nl_cli.sock_event receive buffer is increased to 96k, like
all other sockets, and it can now be changed via an environment variable.
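One way such an override could be parsed is sketched below. This is an
assumption-laden illustration, not the code from this commit: the
function name parse_event_rcvbuf(), the macro DEFAULT_EVENT_RCVBUF and
the fallback policy are hypothetical, and the message above does not
name the actual environment variable.

```c
#include <errno.h>
#include <stdlib.h>

#define DEFAULT_EVENT_RCVBUF (96 * 1024)   /* the 96k default from the text */

/* Hypothetical parser: turn the value of an environment variable into a
 * buffer size. NULL, empty, non-numeric, non-positive or out-of-range
 * input falls back to the default. */
static long parse_event_rcvbuf(const char *val)
{
    char *end;
    long size;

    if (val == NULL || *val == '\0')
        return DEFAULT_EVENT_RCVBUF;

    errno = 0;
    size = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || size <= 0)
        return DEFAULT_EVENT_RCVBUF;

    return size;
}
```

A caller would pass the result of getenv() for whichever variable the
library reads, for example parse_event_rcvbuf(getenv("EXAMPLE_VAR"))
with EXAMPLE_VAR as a placeholder name. The kernel may still cap the
resulting size at rmem_max.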
Signed-off-by: Antti Tiainen <atiainen(a)forcepoint.com>
Signed-off-by: Jiri Pirko <jiri(a)mellanox.com>