[jpirko/libteam] 046fb6: libteam: resynchronize ifinfo after lost RTNLGRP_L...

Monday, 6 February 2017

  Branch: refs/heads/master
  Home:   https://github.com/jpirko/libteam
  Commit: 046fb6ba0aec8246075b18d787daec43201566fa
      https://github.com/jpirko/libteam/commit/046fb6ba0aec8246075b18d787daec43...
  Author: Antti Tiainen <atiainen(a)forcepoint.com>
  Date:   2017-02-06 (Mon, 06 Feb 2017)

  Changed paths:
    M libteam/ifinfo.c
    M libteam/libteam.c
    M libteam/team_private.h

  Log Message:
  -----------
  libteam: resynchronize ifinfo after lost RTNLGRP_LINK notifications

When there's a large number of interfaces (e.g. vlans), teamd loses
link notifications because it cannot read them as fast as the kernel
broadcasts them. This often prevents teamd from starting properly if
it is started while other links are being set up. It can also fail
when it's up and running, especially in cases where the team device
itself has a lot of vlans under it.

This can easily be reproduced with a simple example (on an SMP system)
by manually adding a team device with a bunch of vlans, bringing it up,
and starting teamd with the --take-over option:

  root@debian:~# ip link add name team0 type team
  root@debian:~# for i in `seq 100 150` ; do
  > ip link add link team0 name team0.$i type vlan id $i ; done
  root@debian:~# ip link set team0 up
  root@debian:~# cat teamd.conf
  {
    "device": "team0",
    "runner": {
      "name": "activebackup"
    },
    "ports": {
      "eth1": {},
      "eth2": {}
    }
  }
  root@debian:~# teamd -o -N -f teamd.conf

At this point, teamd gives no error messages or other indication that
something is wrong, but the state does not look healthy:

  root@debian:~# teamdctl team0 state
  setup:
    runner: activebackup
  ports:
    eth1
      link watches:
        link summary: up
        instance[link_watch_0]:
          name: ethtool
          link: up
          down count: 0
  Failed to parse JSON port dump.
  command call failed (Invalid argument)

The state dump shows that port eth2 is missing its info. Running
strace on teamd reveals a recvmsg() call that returned -1 with errno
ENOBUFS. What happened in this example is that when teamd started, all
vlans got carrier up, and the kernel flooded notifications faster than
teamd could read them. teamd then lost the events related to port eth2
getting enslaved and coming up.

The socket that joins RTNLGRP_LINK notifications uses the default libnl
32k buffer size. Netlink messages are large (over 1k), and this buffer
fills up easily. The kernel neither knows nor cares whether notification
broadcasts were delivered. This cannot be fixed by simply increasing the
buffer size, as there's no size that is guaranteed to work in every
use case, and this can require several megabytes of buffer (way over
the normal rmem_max limit) if there are hundreds of vlans.
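
As a rough illustration of the sizing argument above, the following
sketch computes how many messages of a given size fit in a receive
buffer and shows how a larger buffer can be requested with plain
setsockopt(). The helper names max_queued_msgs() and request_rcvbuf()
are hypothetical and not part of libteam or libnl. The count is an
optimistic upper bound, since the kernel charges per-message overhead
against the buffer as well.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: optimistic upper bound on how many messages of
 * msg_bytes each can sit unread in a buffer of rcvbuf_bytes. */
static size_t max_queued_msgs(size_t rcvbuf_bytes, size_t msg_bytes)
{
	assert(msg_bytes > 0);
	return rcvbuf_bytes / msg_bytes;
}

/* Hypothetical helper: ask for a receive buffer of `want` bytes on fd
 * and return the size the kernel reports afterwards, or -1 on error.
 * On Linux an unprivileged request is capped at net.core.rmem_max and
 * the reported value is doubled to cover bookkeeping overhead, so the
 * result generally differs from `want`. */
static int request_rcvbuf(int fd, int want)
{
	int got = 0;
	socklen_t len = sizeof(got);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) < 0)
		return -1;
	return got;
}
```

With a 32k buffer and 1k messages, max_queued_msgs(32 * 1024, 1024)
yields 32, which is already fewer than the 51 vlans created in the
reproduction steps, each of which sends at least one notification.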

The only way to recover from this is to refresh the whole ifinfo list,
as it's invalidated at this point. This cannot easily be worked around
by just refreshing the team device and its ports, because the library
side might not have the ports linked due to the missed events, and it
doesn't know about the teamd configuration.
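
One way to picture a full refresh is a mark-and-sweep pass over the
cached entries against a fresh dump of all links. The sketch below is
not the libteam implementation: struct link_entry and resync_cache()
are hypothetical, and interfaces that appear only in the fresh dump are
ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cached interface entry. */
struct link_entry {
	int ifindex;
	bool seen;     /* set while walking the fresh dump */
	bool removed;  /* set when the fresh dump no longer lists it */
};

/* Reconcile the cache with a fresh full dump of interface indexes.
 * Returns how many cached entries were newly marked as removed. */
static size_t resync_cache(struct link_entry *cache, size_t n_cache,
			   const int *dump, size_t n_dump)
{
	size_t i, j, removed = 0;

	assert(cache != NULL || n_cache == 0);

	/* Mark phase: nothing has been seen yet. */
	for (i = 0; i < n_cache; i++)
		cache[i].seen = false;

	/* Every interface in the fresh dump marks its cached entry. */
	for (j = 0; j < n_dump; j++)
		for (i = 0; i < n_cache; i++)
			if (cache[i].ifindex == dump[j])
				cache[i].seen = true;

	/* Sweep phase: unseen entries correspond to missed removals. */
	for (i = 0; i < n_cache; i++) {
		if (!cache[i].seen && !cache[i].removed) {
			cache[i].removed = true;
			removed++;
		}
	}
	return removed;
}
```

Because the pass compares against every link in the system, it does not
depend on the cache already knowing which links are team ports.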

The return value of nl_recvmsgs_default() is now checked for the event
socket. In case of ENOBUFS (which libnl nicely changes to ENOMEM), the
whole ifinfo list is refreshed. get_ifinfo_list() now also checks for
removed interfaces in case a dellink event was missed. Until now, all
TEAM_IFINFO_CHANGE handlers processed events one by one, so they had to
be changed to support multiple ifinfo changes. For this, the ifinfo
changed flags are cleared and removed entries are destroyed only after
all handlers have been called.
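
The ordering described above can be sketched as follows. This is a
simplified model, not the code from the commit: struct link_entry,
change_handler_t, count_changed(), lost_notifications() and
dispatch_changes() are all hypothetical names, and an array stands in
for the real list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an ifinfo entry. */
struct link_entry {
	int ifindex;
	bool changed;
	bool removed;
};

typedef void (*change_handler_t)(const struct link_entry *entries,
				 size_t n, void *priv);

/* Example handler: adds the number of changed entries to *priv. */
static void count_changed(const struct link_entry *entries, size_t n,
			  void *priv)
{
	size_t i, *total = priv;

	for (i = 0; i < n; i++)
		if (entries[i].changed)
			(*total)++;
}

/* True if a receive error (as a positive errno value) means that
 * notifications were dropped and a full refresh is needed. ENOMEM is
 * accepted too because the commit message notes that libnl reports
 * ENOBUFS as ENOMEM. */
static bool lost_notifications(int sys_errno)
{
	return sys_errno == ENOBUFS || sys_errno == ENOMEM;
}

/* Run every handler over the complete set of entries first, and only
 * then clear the changed flags and drop removed entries. Returns the
 * number of entries kept at the front of the array. */
static size_t dispatch_changes(struct link_entry *entries, size_t n,
			       const change_handler_t *handlers,
			       size_t n_handlers, void *priv)
{
	size_t i, kept = 0;

	for (i = 0; i < n_handlers; i++)
		handlers[i](entries, n, priv);

	for (i = 0; i < n; i++) {
		if (entries[i].removed)
			continue;
		entries[i].changed = false;
		entries[kept++] = entries[i];
	}
	return kept;
}
```

Clearing the flags inside the handler loop instead would hide changes
from every handler after the first, which is why the cleanup is
deferred until all handlers have run.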

Also, the nl_cli.sock_event receive buffer is increased to 96k, like
all other sockets. It is now possible to change this via an environment
variable.
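
A minimal sketch of such an override is shown below. The commit message
does not name the variable, so the caller passes the name in, and the
helpers parse_event_bufsize() and event_bufsize_from_env() are
hypothetical. The assumed behaviour is that any value that is not a
plain positive decimal number falls back to the 96k default.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define DEFAULT_EVENT_BUFSIZE (96 * 1024)

/* Hypothetical helper: parse a buffer size given as a decimal string.
 * NULL, empty, non-numeric, zero or oversized values yield the default. */
static unsigned int parse_event_bufsize(const char *val)
{
	char *end;
	unsigned long parsed;

	if (!val || *val == '\0')
		return DEFAULT_EVENT_BUFSIZE;

	errno = 0;
	parsed = strtoul(val, &end, 10);
	if (errno != 0 || *end != '\0' || parsed == 0 || parsed > INT_MAX)
		return DEFAULT_EVENT_BUFSIZE;
	return (unsigned int)parsed;
}

/* Hypothetical helper: look the size up in the environment variable
 * called var_name. */
static unsigned int event_bufsize_from_env(const char *var_name)
{
	assert(var_name != NULL);
	return parse_event_bufsize(getenv(var_name));
}
```

A value obtained this way is still subject to the kernel's rmem_max
limit when it is applied to the socket.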

Signed-off-by: Antti Tiainen <atiainen(a)forcepoint.com>
Signed-off-by: Jiri Pirko <jiri(a)mellanox.com>
