spammy log messages with loadbalance runner
by Daniel Arena
Hello,
I have a loadbalance runner for 2 interfaces setup through
NetworkManager on fedora 30. I noticed in the system journal the teamd
messages have been quite spammy with stats updates/rebalance
information from teamd. See example log lines below. I tried looking
at `man teamd.conf` but only found the debug_level setting, which is
defaulted to 0. Is there some other way to turn these messages off? I
tried adjusting NetworkManager's log levels also, but even after
shutting off all logging, the messages from teamd still get through.
Is this actually an issue for NetworkManager, maybe it should be
capturing these log messages and filtering them with NetworkManager's
log levels?
My teamd config looks like this:
{"runner": {
"name": "loadbalance",
"tx_hash": ["eth","ipv4", "ipv6"],
"tx_balancer": {"name": "basic"}
},
"ports":{"eno1": {}, "eno2": {}}
}
Jul 03 15:41:57 server1 teamd_nm-team[32311]: stats update for port
eno2: "3257841132330".
Jul 03 15:41:57 server1 teamd_nm-team[32311]: stats update for port
eno1: "6585330648".
Jul 03 15:41:57 server1 teamd_nm-team[32311]: stats update for hash
"21": "1242456011".
Jul 03 15:41:57 server1 teamd_nm-team[32311]: stats update for hash
"16": "1270449453".
Jul 03 15:41:57 server1 teamd_nm-team[32311]: stats update for hash
"9": "3761543798".
Jul 03 15:41:57 server1 teamd_nm-team[32311]: stats update for hash
"1": "626647946".
Jul 03 15:41:57 server1 teamd_nm-team[32311]: Remapped hash "21"
(delta 447647) to port eno2.
Jul 03 15:41:57 server1 teamd_nm-team[32311]: Remapped hash "16"
(delta 5301) to port eno1.
Jul 03 15:41:57 server1 teamd_nm-team[32311]: Port eno2 rebalanced,
delta: 447647
Jul 03 15:41:57 server1 teamd_nm-team[32311]: Port eno1 rebalanced, delta: 6205
Thanks,
Dan
3 years, 7 months
Team interface drops packets with kernel 4.14
by Yevhen Kondrashyn
Hi All,
After upgrading from 3.10.104 to kernel 4.14 I see constant drops
(around 10 packets/sec) on team interface in active-backup mode,
increasing txqueuelen on a physical port helps, however, drops still
occur from time to time.
I've added debug printk in team driver where dev_kfree_skb_any is
called, however, I do not see messages in dmesg until I reset physical
interface by ethtool, therefore packets are being dropped somewhere
else, however, for now, I have no clue how to find the exact cause.
Do you have any ideas/suggestions on how to debug this?
Thanks!
net7 Link encap:Ethernet HWaddr 00:1E:67:B5:7F:76
inet addr:192.168.221.203 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::21e:67ff:feb5:7f76/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:263 errors:0 dropped:151 overruns:0 frame:0
TX packets:83944096 errors:0 dropped:57692 overruns:0 carrier:0
collisions:0 txqueuelen:10000
RX bytes:27531 (26.8 KiB) TX bytes:113995992408 (106.1 GiB)
# teamnl net7 ports
5: eth6: up 1000Mbit FD
# ethtool -S eth6
NIC statistics:
rx_packets: 113
tx_packets: 86938454
rx_bytes: 22848
tx_bytes: 118410084622
rx_broadcast: 68
tx_broadcast: 6
rx_multicast: 45
tx_multicast: 86938448
multicast: 45
collisions: 0
rx_crc_errors: 0
rx_no_buffer_count: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 0
tx_tcp_seg_failed: 0
rx_flow_control_xon: 68
rx_flow_control_xoff: 96
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_long_byte_count: 22848
tx_dma_out_of_sync: 0
lro_aggregated: 0
lro_flushed: 0
tx_smbus: 0
rx_smbus: 0
dropped_smbus: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_hwtstamp_timeouts: 0
rx_hwtstamp_cleared: 0
rx_errors: 0
tx_errors: 0
tx_dropped: 0
rx_length_errors: 0
rx_over_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_queue_0_packets: 86938440
tx_queue_0_bytes: 118062311560
tx_queue_0_restart: 0
rx_queue_0_packets: 277
rx_queue_0_bytes: 32236
rx_queue_0_drops: 0
rx_queue_0_csum_err: 0
rx_queue_0_alloc_failed: 0
# teamdctl net7 s d
{
"ports": {
"eth6": {
"ifinfo": {
"dev_addr": "00:1e:67:b5:7f:76",
"dev_addr_len": 6,
"ifindex": 5,
"ifname": "eth6"
},
"link": {
"duplex": "full",
"speed": 1000,
"up": true
},
"link_watches": {
"list": {
"link_watch_0": {
"delay_down": 0,
"delay_up": 0,
"down_count": 1,
"name": "ethtool",
"up": true
}
},
"up": true
}
}
},
"runner": {
"active_port": "eth6"
},
"setup": {
"daemonized": true,
"dbus_enabled": false,
"debug_level": 0,
"kernel_team_mode_name": "activebackup",
"pid": 5533,
"pid_file": "/var/run/teamd/net7.pid",
"runner_name": "activebackup",
"zmq_enabled": false
},
"team_device": {
"ifinfo": {
"dev_addr": "00:1e:67:b5:7f:76",
"dev_addr_len": 6,
"ifindex": 13,
"ifname": "net7"
}
}
}
3 years, 7 months
[ISSUE] Failed to find "enabled" option.
by Xin Long
I found this err in "loadbalance" and "lacp" runner when adding ports.
It's caused by trying to set "enabled" option in .port_link_changed()
or .port_changed().
When a new port is added, the first 'port changed event' process is
earlier than CMD TEAM_CMD_OPTIONS_GET, in this CMD, all
the options are synchronized from kernel.
It means there's no 'enabled' option yet when calling port_link_changed
in the first 'port changed event' process. In lb_event_watch_port_link_changed
and lacp_event_watch_port_changed, they call teamd_port_check_enable
to set 'enabled' option. this err is triggered.
I'm not sure why teamd_port_check_enable needs to check if
'enabled' option exists. I checked the ab's .port_link_changed(),
it just sets it by calling team_set_port_enabled(), instead of
checking 'enabled' option first.
can we just use team_set_port_enabled to set it directly in
.port(_link)_changed OR improve teamd_port_check_enable
to avoid this err ?
Thanks.
3 years, 8 months
LACPDU not transmitted during actor state changes
by Meng Koon Lim
Current actor state:
State Flags [Activity, Timeout, Aggregation, Synchronization]
Transitioned to actor state:
State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
For the above transition, it looks like teamd does not send out an LACPDU to inform. This cause impact to traffic.
I am not sure how to fix this in the code. Can help?
Below shows the LACP packet capture
# switch MAC 2a:99:3a:9e:a1:7d
# server (running libteam) MAC 90:e2:ba:74:83:58
# SENT BY SERVER
20:39:54.810203 LACPv1, length 110
Actor Information TLV (0x01), length 20
System 90:e2:ba:74:83:58 (oui Unknown), System Priority 65535, Key 0, Port 4, Port Priority 255
State Flags [Activity, Timeout, Aggregation, Expired]
0x0000: ffff 90e2 ba74 8358 0000 00ff 0004 8700
0x0010: 0000
Partner Information TLV (0x02), length 20
System 00:00:00:00:00:00 (oui Ethernet), System Priority 0, Key 0, Port 0, Port Priority 0
State Flags [Timeout]
0x0000: 0000 0000 0000 0000 0000 0000 0000 0200
0x0010: 0000
Collector Information TLV (0x03), length 16
Max Delay 0
0x0000: 0000 0000 0000 0000 0000 0000 0000
Terminator TLV (0x00), length 0
# SENT BY SWITCH
20:39:56.307394 LACPv1, length 110
Actor Information TLV (0x01), length 20
System 2a:99:3a:9e:a1:7d (oui Unknown), System Priority 32768, Key 361, Port 306, Port Priority 8192
State Flags [Activity, Aggregation, Expired]
0x0000: 8000 2a99 3a9e a17d 0169 2000 0132 8500
0x0010: 0000
Partner Information TLV (0x02), length 20
System 90:e2:ba:74:83:58 (oui Unknown), System Priority 65535, Key 0, Port 4, Port Priority 255
State Flags [Activity, Timeout, Aggregation, Expired]
0x0000: ffff 90e2 ba74 8358 0000 00ff 0004 8700
0x0010: 0000
Collector Information TLV (0x03), length 16
Max Delay 0
0x0000: 0000 0000 0000 0000 0000 0000 0000
Terminator TLV (0x00), length 0
# SENT BY SERVER
20::39:56.307659 LACPv1, length 110
Actor Information TLV (0x01), length 20
System 90:e2:ba:74:83:58 (oui Unknown), System Priority 65535, Key 0, Port 4, Port Priority 255
State Flags [Activity, Timeout, Aggregation, Synchronization]
0x0000: ffff 90e2 ba74 8358 0000 00ff 0004 0f00
0x0010: 0000
Partner Information TLV (0x02), length 20
System 2a:99:3a:9e:a1:7d (oui Unknown), System Priority 32768, Key 361, Port 306, Port Priority 8192
State Flags [Activity, Aggregation, Expired]
0x0000: 8000 2a99 3a9e a17d 0169 2000 0132 8500
0x0010: 0000
Collector Information TLV (0x03), length 16
Max Delay 0
0x0000: 0000 0000 0000 0000 0000 0000 0000
Terminator TLV (0x00), length 0
# SENT BY SWITCH
20:39:56.827402 LACPv1, length 110
Actor Information TLV (0x01), length 20
System 2a:99:3a:9e:a1:7d (oui Unknown), System Priority 32768, Key 361, Port 306, Port Priority 8192
State Flags [Activity, Aggregation, Synchronization, Collecting]
0x0000: 8000 2a99 3a9e a17d 0169 2000 0132 1d00
0x0010: 0000
Partner Information TLV (0x02), length 20
System 90:e2:ba:74:83:58 (oui Unknown), System Priority 65535, Key 0, Port 4, Port Priority 255
State Flags [Activity, Timeout, Aggregation, Synchronization]
0x0000: ffff 90e2 ba74 8358 0000 00ff 0004 0f00
0x0010: 0000
Collector Information TLV (0x03), length 16
Max Delay 0
0x0000: 0000 0000 0000 0000 0000 0000 0000
Terminator TLV (0x00), length 0
# at this point, libteam log shows that the port moved from expired to current, but no LACPDU was sent by server
Sep 1 20:39:56 video-edge-748358.prg02.justin.tv teamd[1124]: enp4s0f0: Changed port state: "expired" -> "current", enp4s0f0: Changed port state: "expired" -> "current"
# SENT BY SERVER
30 seconds later, teamd sends out LACPDU but notice actor state already changed to Sync, Collect, Distributing
20:40:26.307627 LACPv1, length 110
Actor Information TLV (0x01), length 20
System 90:e2:ba:74:83:58 (oui Unknown), System Priority 65535, Key 0, Port 4, Port Priority 255
State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
0x0000: ffff 90e2 ba74 8358 0000 00ff 0004 3f00
0x0010: 0000
Partner Information TLV (0x02), length 20
System 2a:99:3a:9e:a1:7d (oui Unknown), System Priority 32768, Key 361, Port 306, Port Priority 8192
State Flags [Activity, Aggregation, Synchronization, Collecting]
0x0000: 8000 2a99 3a9e a17d 0169 2000 0132 1d00
0x0010: 0000
Collector Information TLV (0x03), length 16
Max Delay 0
0x0000: 0000 0000 0000 0000 0000 0000 0000
Terminator TLV (0x00), length 0
3 years, 9 months