Strange problems with CentOS 7 and roundrobin runner
by Ingo Brand
Hello,
I am currently setting up a highly available NFS server cluster with 2
nodes using pacemaker + corosync + drbd + flashcache.
The two nodes (media1 and media2) have four 1G copper NICs each.
I formed 2 teams:
team0: LACP with em1 and em2 connected to a switch for NFS services
team1: roundrobin with em3 and em4 and MTU 9000 directly connected with
copper patch cables for drbd sync
During my testing everything was working as expected. Then I started to
transfer about 20TB from an old server to the new cluster. The copy
process was fine for about 32 hours. Then corosync and drbd started to
report errors on the roundrobin team1 interface (see logs at the end of
this email).
The drbd sync and corosync started to flap between states for several
hours on team1.
I checked the error counters with ifconfig and found a huge number of
TX dropped packets on team1 of the drbd slave node media2.
No other error counters were rising; only team1 on media2 was affected.
To make this a bit clearer:
- team1 on media2 (the drbd slave) showed rising TX dropped packets.
- em3 and em4 on media2 (the physical interfaces of team1) did not show
any rising error counters.
- team1 on media1 (the drbd master) did not show any rising error counters.
- em3 and em4 on media1 (the physical interfaces of team1) did not show
any rising error counters.
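The comparison above can also be scripted by reading the per-interface
counters that Linux exposes under /sys/class/net/<iface>/statistics/.
This is only a minimal sketch: the interface names are the ones from this
setup, while the function names and the 10 second interval are made up
for illustration and are not part of teamd or its tooling.

```python
import time
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")  # standard Linux location of interface statistics


def read_tx_dropped(iface, base=SYSFS_NET):
    """Return the tx_dropped counter of one interface as an int."""
    counter_file = Path(base) / iface / "statistics" / "tx_dropped"
    return int(counter_file.read_text().strip())


def sample(ifaces, base=SYSFS_NET):
    """Take a snapshot of tx_dropped for a list of interfaces."""
    return {name: read_tx_dropped(name, base) for name in ifaces}


def rising_counters(before, after):
    """Return {iface: delta} for every interface whose counter increased."""
    return {
        name: after[name] - before[name]
        for name in before
        if after.get(name, before[name]) > before[name]
    }


def main():
    # Interface names taken from the setup described in this mail.
    ifaces = ["team1", "em3", "em4"]
    first = sample(ifaces)
    time.sleep(10)  # arbitrary sampling interval (assumption)
    second = sample(ifaces)
    print(rising_counters(first, second) or "no tx_dropped counter is rising")
```

Calling main() on each node would show whether only team1 or also one of
its ports is dropping packets in a given interval.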
I then thought maybe one of the two patch cables between the two
machines was not 100% ok. So I removed em4 from team1 on both machines:
"teamdctl team1 port remove em4"
The error counters stopped rising.
I thought: "Yeah! That cable must be broken!"
But just to make sure I really found the problem I re-enabled em4 on
both nodes:
"teamdctl team1 port add em4"
Ok, now both team1 interfaces had 2 links up again and the error counters
started to rise again.
I then stopped em3 on both nodes to force all traffic through em4:
"teamdctl team1 port remove em3"
And guess what? The error counters on team1 stopped rising!
So in short:
If I remove either one of the two physical interfaces from team1,
everything works without any errors. As soon as I enable both
physical interfaces, the error counters start to rise.
After these tests I rebooted the slave node (media2) and added both
ports to team1 again (Have you tried turning it off and on again...).
After doing this, I now see TX dropped packets on the team1 interface of
media1, which has not been rebooted yet.
Why do I only see these errors on the team1 interface and not on em3 and
em4?
Currently the copy process of 20TB is still running. I think that if I
reboot media1 everything will work as expected again for some time
because it worked during my initial tests.
But I do think that I hit a bug in the teaming driver and that the rising
error counters will come back.
Could anybody help?
Kind regards
Ingo
This is the network config used for team1:
media1:
cat /etc/sysconfig/network-scripts/ifcfg-team1_slave_0
# Generated by parse-kickstart
NAME=team1 slave 0
TEAM_MASTER=team1
DEVICETYPE=TeamPort
DEVICE=em3
ONBOOT=yes
UUID=b7026c5b-e9cd-457c-93d2-a2799361ed90
cat /etc/sysconfig/network-scripts/ifcfg-team1_slave_1
# Generated by parse-kickstart
NAME=team1 slave 1
TEAM_MASTER=team1
DEVICETYPE=TeamPort
DEVICE=em4
ONBOOT=yes
UUID=fbe5bc56-95f5-4f9e-a5ea-ab6b6f5e5b50
cat /etc/sysconfig/network-scripts/ifcfg-team1
# Generated by parse-kickstart
UUID=4e0090cd-32d3-4c5b-be23-9f53083da4dd
NAME="Team connection team1"
TEAM_CONFIG="{\"runner\": {\"name\": \"roundrobin\"}}"
GATEWAY=
IPV6_AUTOCONF=yes
BOOTPROTO=none
DEVICE=team1
MTU=9000
TYPE=Team
ONBOOT=yes
IPV6INIT=yes
DEVICETYPE=Team
IPADDR0=192.168.101.31
PREFIX0=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
media2:
cat /etc/sysconfig/network-scripts/ifcfg-team1_slave_0
# Generated by parse-kickstart
NAME=team1 slave 0
TEAM_MASTER=team1
DEVICETYPE=TeamPort
DEVICE=em3
ONBOOT=yes
UUID=bfd8e77e-7eb8-47ea-ab7e-95286e073202
cat /etc/sysconfig/network-scripts/ifcfg-team1_slave_1
# Generated by parse-kickstart
NAME=team1 slave 1
TEAM_MASTER=team1
DEVICETYPE=TeamPort
DEVICE=em4
ONBOOT=yes
UUID=93c5dfd8-3b81-4d1e-8ba9-c46f8daceb1a
cat /etc/sysconfig/network-scripts/ifcfg-team1
# Generated by parse-kickstart
UUID=79c843db-996c-41e4-9ee5-2a8f3da244ed
NAME="Team connection team1"
TEAM_CONFIG="{\"runner\": {\"name\": \"roundrobin\"}}"
GATEWAY=
IPV6_AUTOCONF=yes
BOOTPROTO=none
DEVICE=team1
MTU=9000
TYPE=Team
ONBOOT=yes
IPV6INIT=yes
DEVICETYPE=Team
IPADDR0=192.168.101.32
PREFIX0=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
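The TEAM_CONFIG value in the ifcfg-team1 files above is a shell-quoted
JSON document. To double-check which runner a node is configured with, it
could be decoded roughly like this. A sketch only: the function name is
hypothetical, and the parsing assumes the value is one double-quoted
string as shown above.

```python
import json
import shlex


def team_config_from_ifcfg(text):
    """Return the decoded TEAM_CONFIG JSON from ifcfg file contents, or None."""
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("TEAM_CONFIG="):
            continue
        # shlex undoes the shell quoting (\" becomes ") and yields one token.
        token = shlex.split(line)[0]
        _, _, value = token.partition("=")
        return json.loads(value)
    return None
```

For the files shown here, the "runner" entry of the returned dictionary
would carry the name "roundrobin" on both nodes.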
Here are some system logs from both machines:
============================================================
media1:
============================================================
Nov 6 02:31:27 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412041 iface 192.168.100.31 to [1 of 10]
Nov 6 02:31:29 media1 corosync[3657]: [TOTEM ] ring 0 active with no faults
Nov 6 02:31:29 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412047 iface 192.168.100.31 to [1 of 10]
Nov 6 02:31:31 media1 corosync[3657]: [TOTEM ] ring 0 active with no faults
Nov 6 02:32:51 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412315 iface 192.168.101.31 to [1 of 10]
Nov 6 02:32:53 media1 corosync[3657]: [TOTEM ] ring 1 active with no faults
Nov 6 02:32:53 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412319 iface 192.168.101.31 to [1 of 10]
Nov 6 02:32:54 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412321 iface 192.168.101.31 to [2 of 10]
Nov 6 02:32:55 media1 corosync[3657]: [TOTEM ] Decrementing problem
counter for iface 192.168.101.31 to [1 of 10]
Nov 6 02:32:56 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412325 iface 192.168.101.31 to [2 of 10]
Nov 6 02:32:57 media1 corosync[3657]: [TOTEM ] Decrementing problem
counter for iface 192.168.101.31 to [1 of 10]
Nov 6 02:32:58 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412327 iface 192.168.101.31 to [2 of 10]
Nov 6 02:32:59 media1 kernel: drbd r0: PingAck did not arrive in time.
Nov 6 02:32:59 media1 kernel: drbd r0: peer( Secondary -> Unknown )
conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Nov 6 02:32:59 media1 kernel: block drbd0: new current UUID
0C0308FA4F94F811:59E43E5FD7B26A4F:1963884285B385E6:1962884285B385E6
Nov 6 02:32:59 media1 kernel: drbd r0: asender terminated
Nov 6 02:32:59 media1 kernel: drbd r0: Terminating drbd_a_r0
Nov 6 02:32:59 media1 kernel: drbd r0: Connection closed
Nov 6 02:32:59 media1 kernel: drbd r0: conn( NetworkFailure ->
Unconnected )
Nov 6 02:32:59 media1 kernel: drbd r0: receiver terminated
Nov 6 02:32:59 media1 kernel: drbd r0: Restarting receiver thread
Nov 6 02:32:59 media1 kernel: drbd r0: receiver (re)started
Nov 6 02:32:59 media1 kernel: drbd r0: conn( Unconnected -> WFConnection )
Nov 6 02:32:59 media1 corosync[3657]: [TOTEM ] Decrementing problem
counter for iface 192.168.101.31 to [1 of 10]
Nov 6 02:32:59 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412329 iface 192.168.101.31 to [2 of 10]
Nov 6 02:33:01 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412333 iface 192.168.101.31 to [3 of 10]
Nov 6 02:33:01 media1 corosync[3657]: [TOTEM ] Decrementing problem
counter for iface 192.168.101.31 to [2 of 10]
Nov 6 02:33:01 media1 crmd[4960]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Nov 6 02:33:01 media1 pengine[4959]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Nov 6 02:33:01 media1 pengine[4959]: warning: unpack_rsc_op: Processing
failed op monitor for res_nfsserver_mediafiles on media1: unknown error
(1)
Nov 6 02:33:01 media1 pengine[4959]: warning: unpack_rsc_op: Processing
failed op monitor for res_nfsserver_mediafiles on media2: unknown error
(1)
Nov 6 02:33:01 media1 pengine[4959]: notice: process_pe_message:
Calculated Transition 144: /var/lib/pacemaker/pengine/pe-input-227.bz2
Nov 6 02:33:01 media1 crmd[4960]: notice: run_graph: Transition 144
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-227.bz2): Complete
Nov 6 02:33:01 media1 crmd[4960]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Nov 6 02:33:01 media1 kernel: drbd r0: Handshake successful: Agreed
network protocol version 101
Nov 6 02:33:01 media1 kernel: drbd r0: Agreed to support TRIM on
protocol level
Nov 6 02:33:01 media1 kernel: drbd r0: Peer authenticated using 20
bytes HMAC
Nov 6 02:33:01 media1 kernel: drbd r0: conn( WFConnection ->
WFReportParams )
Nov 6 02:33:01 media1 kernel: drbd r0: Starting asender thread (from
drbd_r_r0 [5999])
Nov 6 02:33:02 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412337 iface 192.168.101.31 to [3 of 10]
Nov 6 02:33:02 media1 kernel: block drbd0: drbd_sync_handshake:
Nov 6 02:33:02 media1 kernel: block drbd0: self
0C0308FA4F94F811:59E43E5FD7B26A4F:1963884285B385E6:1962884285B385E6
bits:192101 flags:0
Nov 6 02:33:02 media1 kernel: block drbd0: peer
59E43E5FD7B26A4E:0000000000000000:1963884285B385E6:1962884285B385E6
bits:0 flags:0
Nov 6 02:33:02 media1 kernel: block drbd0: uuid_compare()=1 by rule 70
Nov 6 02:33:02 media1 kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
Nov 6 02:33:02 media1 kernel: block drbd0: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 185(1), total 185; compression: 100.0%
Nov 6 02:33:02 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412339 iface 192.168.101.31 to [4 of 10]
Nov 6 02:33:03 media1 kernel: block drbd0: receive bitmap stats
[Bytes(packets)]: plain 0(0), RLE 185(1), total 185; compression: 100.0%
Nov 6 02:33:03 media1 kernel: block drbd0: helper command:
/sbin/drbdadm before-resync-source minor-0
Nov 6 02:33:03 media1 kernel: block drbd0: helper command:
/sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Nov 6 02:33:03 media1 kernel: block drbd0: conn( WFBitMapS ->
SyncSource ) pdsk( Consistent -> Inconsistent )
Nov 6 02:33:03 media1 kernel: block drbd0: Began resync as SyncSource
(will sync 919244 KB [229811 bits set]).
Nov 6 02:33:03 media1 kernel: block drbd0: updated sync UUID
0C0308FA4F94F811:59E53E5FD7B26A4F:59E43E5FD7B26A4F:1963884285B385E6
Nov 6 02:33:03 media1 corosync[3657]: [TOTEM ] Decrementing problem
counter for iface 192.168.101.31 to [3 of 10]
Nov 6 02:33:03 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412343 iface 192.168.101.31 to [4 of 10]
Nov 6 02:33:04 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412345 iface 192.168.101.31 to [5 of 10]
Nov 6 02:33:05 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412347 iface 192.168.101.31 to [6 of 10]
Nov 6 02:33:05 media1 corosync[3657]: [TOTEM ] Decrementing problem
counter for iface 192.168.101.31 to [5 of 10]
Nov 6 02:33:05 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412349 iface 192.168.101.31 to [6 of 10]
Nov 6 02:33:06 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412351 iface 192.168.101.31 to [7 of 10]
Nov 6 02:33:07 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412355 iface 192.168.101.31 to [8 of 10]
Nov 6 02:33:07 media1 corosync[3657]: [TOTEM ] Decrementing problem
counter for iface 192.168.101.31 to [7 of 10]
Nov 6 02:33:07 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412357 iface 192.168.101.31 to [8 of 10]
Nov 6 02:33:08 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412361 iface 192.168.101.31 to [9 of 10]
Nov 6 02:33:09 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412363 iface 192.168.101.31 to [10 of 10]
Nov 6 02:33:09 media1 corosync[3657]: [TOTEM ] Marking seqid 412363
ringid 1 interface 192.168.101.31 FAULTY
Nov 6 02:33:11 media1 crmd[4960]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Nov 6 02:33:11 media1 pengine[4959]: notice: unpack_config: On loss of
CCM Quorum: Ignore
Nov 6 02:33:11 media1 pengine[4959]: warning: unpack_rsc_op: Processing
failed op monitor for res_nfsserver_mediafiles on media1: unknown error
(1)
Nov 6 02:33:11 media1 pengine[4959]: warning: unpack_rsc_op: Processing
failed op monitor for res_nfsserver_mediafiles on media2: unknown error
(1)
Nov 6 02:33:11 media1 pengine[4959]: notice: process_pe_message:
Calculated Transition 145: /var/lib/pacemaker/pengine/pe-input-228.bz2
Nov 6 02:33:11 media1 crmd[4960]: notice: run_graph: Transition 145
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-228.bz2): Complete
Nov 6 02:33:11 media1 crmd[4960]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Nov 6 02:33:17 media1 corosync[3657]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:18 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412387 iface 192.168.101.31 to [1 of 10]
Nov 6 02:33:20 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412389 iface 192.168.101.31 to [2 of 10]
Nov 6 02:33:20 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412391 iface 192.168.101.31 to [3 of 10]
Nov 6 02:33:21 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412393 iface 192.168.101.31 to [4 of 10]
Nov 6 02:33:22 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412397 iface 192.168.101.31 to [5 of 10]
Nov 6 02:33:22 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412399 iface 192.168.101.31 to [6 of 10]
Nov 6 02:33:24 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412403 iface 192.168.101.31 to [7 of 10]
Nov 6 02:33:31 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412421 iface 192.168.101.31 to [8 of 10]
Nov 6 02:33:32 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412423 iface 192.168.101.31 to [9 of 10]
Nov 6 02:33:34 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412429 iface 192.168.101.31 to [10 of 10]
Nov 6 02:33:34 media1 corosync[3657]: [TOTEM ] Marking seqid 412429
ringid 1 interface 192.168.101.31 FAULTY
Nov 6 02:33:36 media1 corosync[3657]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:39 media1 corosync[3657]: [TOTEM ] Incrementing problem
counter for seqid 412437 iface 192.168.101.31 to [1 of 10]
============================================================
media2:
============================================================
Nov 6 02:32:59 media2 kernel: drbd r0: sock was shut down by peer
Nov 6 02:32:59 media2 kernel: drbd r0: peer( Primary -> Unknown ) conn(
Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Nov 6 02:32:59 media2 kernel: drbd r0: short read (expected size 16)
Nov 6 02:32:59 media2 kernel: drbd r0: asender terminated
Nov 6 02:32:59 media2 kernel: drbd r0: Terminating drbd_a_r0
Nov 6 02:32:59 media2 kernel: drbd r0: Connection closed
Nov 6 02:32:59 media2 kernel: drbd r0: conn( BrokenPipe -> Unconnected
)
Nov 6 02:32:59 media2 kernel: drbd r0: receiver terminated
Nov 6 02:32:59 media2 kernel: drbd r0: Restarting receiver thread
Nov 6 02:32:59 media2 kernel: drbd r0: receiver (re)started
Nov 6 02:32:59 media2 kernel: drbd r0: conn( Unconnected ->
WFConnection )
Nov 6 02:33:01 media2 attrd[4940]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_mediafiles (1000)
Nov 6 02:33:01 media2 attrd[4940]: notice: attrd_perform_update: Sent
update 16: master-res_drbd_mediafiles=1000
Nov 6 02:33:01 media2 kernel: drbd r0: Handshake successful: Agreed
network protocol version 101
Nov 6 02:33:01 media2 kernel: drbd r0: Agreed to support TRIM on
protocol level
Nov 6 02:33:01 media2 kernel: drbd r0: Peer authenticated using 20
bytes HMAC
Nov 6 02:33:01 media2 kernel: drbd r0: conn( WFConnection ->
WFReportParams )
Nov 6 02:33:01 media2 kernel: drbd r0: Starting asender thread (from
drbd_r_r0 [6367])
Nov 6 02:33:01 media2 kernel: block drbd0: drbd_sync_handshake:
Nov 6 02:33:01 media2 kernel: block drbd0: self
59E43E5FD7B26A4E:0000000000000000:1963884285B385E6:1962884285B385E6
bits:0 flags:0
Nov 6 02:33:01 media2 kernel: block drbd0: peer
0C0308FA4F94F811:59E43E5FD7B26A4F:1963884285B385E6:1962884285B385E6
bits:192101 flags:0
Nov 6 02:33:01 media2 kernel: block drbd0: uuid_compare()=-1 by rule 50
Nov 6 02:33:01 media2 kernel: block drbd0: peer( Unknown -> Primary )
conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk(
DUnknown -> UpToDate )
Nov 6 02:33:02 media2 kernel: block drbd0: receive bitmap stats
[Bytes(packets)]: plain 0(0), RLE 185(1), total 185; compression: 100.0%
Nov 6 02:33:03 media2 kernel: block drbd0: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 185(1), total 185; compression: 100.0%
Nov 6 02:33:03 media2 kernel: block drbd0: conn( WFBitMapT ->
WFSyncUUID )
Nov 6 02:33:03 media2 kernel: block drbd0: updated sync uuid
59E53E5FD7B26A4E:0000000000000000:1963884285B385E6:1962884285B385E6
Nov 6 02:33:03 media2 kernel: block drbd0: helper command:
/sbin/drbdadm before-resync-target minor-0
Nov 6 02:33:03 media2 kernel: block drbd0: helper command:
/sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Nov 6 02:33:03 media2 kernel: block drbd0: conn( WFSyncUUID ->
SyncTarget ) disk( Outdated -> Inconsistent )
Nov 6 02:33:03 media2 kernel: block drbd0: Began resync as SyncTarget
(will sync 919244 KB [229811 bits set]).
Nov 6 02:33:10 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412364 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:10 media2 corosync[3690]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:10 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412366 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:11 media2 corosync[3690]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:11 media2 attrd[4940]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-res_drbd_mediafiles (10)
Nov 6 02:33:11 media2 attrd[4940]: notice: attrd_perform_update: Sent
update 18: master-res_drbd_mediafiles=10
Nov 6 02:33:11 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412368 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:12 media2 corosync[3690]: [TOTEM ] ring 1 active with no
faults
Nov 6 02:33:12 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412370 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:12 media2 corosync[3690]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:12 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412372 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:13 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412374 iface 192.168.101.32 to [2 of 10]
Nov 6 02:33:14 media2 corosync[3690]: [TOTEM ] Decrementing problem
counter for iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:14 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412376 iface 192.168.101.32 to [2 of 10]
Nov 6 02:33:15 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412378 iface 192.168.101.32 to [3 of 10]
Nov 6 02:33:15 media2 corosync[3690]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:15 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412380 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:16 media2 corosync[3690]: [TOTEM ] ring 1 active with no
faults
Nov 6 02:33:16 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412382 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:16 media2 corosync[3690]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:17 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412384 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:17 media2 corosync[3690]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:17 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412386 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:18 media2 corosync[3690]: [TOTEM ] ring 1 active with no
faults
Nov 6 02:33:19 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412388 iface 192.168.100.32 to [1 of 10]
Nov 6 02:33:20 media2 corosync[3690]: [TOTEM ] ring 0 active with no
faults
Nov 6 02:33:35 media2 corosync[3690]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:36 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412430 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:36 media2 corosync[3690]: [TOTEM ] Automatically recovered
ring 1
Nov 6 02:33:37 media2 corosync[3690]: [TOTEM ] Incrementing problem
counter for seqid 412432 iface 192.168.101.32 to [1 of 10]
Nov 6 02:33:38 media2 corosync[3690]: [TOTEM ] ring 1 active with no
faults
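The corosync excerpts above are easier to compare between the two nodes
when the TOTEM events are tallied per interface. A rough sketch, assuming
the log text is available as one string and that long entries may be
wrapped across lines as they are in this mail; the function name is
illustrative only.

```python
import re
from collections import Counter

# \s+ tolerates the line wrapping seen in the excerpts above.
INCREMENT_RE = re.compile(
    r"Incrementing problem\s+counter for seqid \d+ iface (\S+)"
)
FAULTY_RE = re.compile(
    r"Marking seqid \d+\s+ringid \d+ interface (\S+) FAULTY"
)


def summarize_totem(log_text):
    """Count problem-counter increments and FAULTY markings per interface."""
    return {
        "increments": Counter(INCREMENT_RE.findall(log_text)),
        "faulty": Counter(FAULTY_RE.findall(log_text)),
    }
```

Running this over the full logs of both nodes should make it obvious
whether the problems are concentrated on the 192.168.101.x ring.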
6 years, 9 months
[PATCH] dbus: don't do <deny send_interface="..." /> in template dbus s. f.
by Pawel Wieczorkiewicz
Remove '<deny send_interface="..." />' directives from the dbus
config template, since they were removed from the main file as
well. The template file is used to replace the original dbus
config file with per user directives specified during compile
time.
This is a change originating from commit: cf24f96afb338d.
Signed-off-by: Pawel Wieczorkiewicz <pwieczorkiewicz(a)suse.de>
---
teamd/teamd.conf.in | 2 --
1 file changed, 2 deletions(-)
diff --git a/teamd/teamd.conf.in b/teamd/teamd.conf.in
index f4e3017..6ca3282 100644
--- a/teamd/teamd.conf.in
+++ b/teamd/teamd.conf.in
@@ -4,11 +4,9 @@
<busconfig>
<policy user="root">
<allow own_prefix="org.libteam.teamd"/>
- <allow send_interface="org.libteam.teamd"/>
</policy>
<policy user="@teamd_user@">
<allow own_prefix="org.libteam.teamd"/>
- <allow send_interface="org.libteam.teamd"/>
</policy>
<policy context="default">
<deny own_prefix="org.libteam.teamd"/>
--
2.6.2
7 years, 6 months
[patch libteam] libteam: retry on NLE_DUMP_INTR error
by Chris Card
Fix the get_ifinfo_list() function to retry if nl_recvmsgs fails with
error -NLE_DUMP_INTR.
Add some extra error reporting to get_ifinfo_list().
This is a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1273052
Signed-off-by: Chris Card <ctcard1(a)gmail.com>
---
libteam/ifinfo.c | 49 ++++++++++++++++++++++++++++++++-----------------
1 file changed, 32 insertions(+), 17 deletions(-)
diff --git a/libteam/ifinfo.c b/libteam/ifinfo.c
index 484ac1b..df3d49a 100644
--- a/libteam/ifinfo.c
+++ b/libteam/ifinfo.c
@@ -332,24 +332,39 @@ int get_ifinfo_list(struct team_handle *th)
 		.rtgen_family = AF_UNSPEC,
 	};
 	int ret;
-
-	ret = nl_send_simple(th->nl_cli.sock, RTM_GETLINK, NLM_F_DUMP,
-			     &rt_hdr, sizeof(rt_hdr));
-	if (ret < 0)
-		return -nl2syserr(ret);
-	orig_cb = nl_socket_get_cb(th->nl_cli.sock);
-	cb = nl_cb_clone(orig_cb);
-	nl_cb_put(orig_cb);
-	if (!cb)
-		return -ENOMEM;
-
-	nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, valid_handler, th);
-
-	ret = nl_recvmsgs(th->nl_cli.sock, cb);
-	nl_cb_put(cb);
+	int retry = 1;
+
+	while (retry) {
+		retry = 0;
+		ret = nl_send_simple(th->nl_cli.sock, RTM_GETLINK, NLM_F_DUMP,
+				     &rt_hdr, sizeof(rt_hdr));
+		if (ret < 0) {
+			err(th, "get_ifinfo_list: nl_send_simple failed");
+			return -nl2syserr(ret);
+		}
+		orig_cb = nl_socket_get_cb(th->nl_cli.sock);
+		cb = nl_cb_clone(orig_cb);
+		nl_cb_put(orig_cb);
+		if (!cb) {
+			err(th, "get_ifinfo_list: nl_cb_clone failed");
+			return -ENOMEM;
+		}
+
+		nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, valid_handler, th);
+
+		ret = nl_recvmsgs(th->nl_cli.sock, cb);
+		nl_cb_put(cb);
+		if (ret < 0) {
+			err(th, "get_ifinfo_list: nl_recvmsgs failed");
+			if (ret != -NLE_DUMP_INTR)
+				return -nl2syserr(ret);
+			retry = 1;
+		}
+	}
+	ret = check_call_change_handlers(th, TEAM_IFINFO_CHANGE);
 	if (ret < 0)
-		return -nl2syserr(ret);
-	return check_call_change_handlers(th, TEAM_IFINFO_CHANGE);
+		err(th, "get_ifinfo_list: check_call_change_handers failed");
+	return ret;
 }

 int ifinfo_list_init(struct team_handle *th)
--
2.4.3
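The control flow of the patch above (repeat the whole dump request when
the receive step reports an interrupted dump, fail on any other error)
can be illustrated outside of libnl as follows. This is a sketch, not
libteam code: DumpInterrupted is a hypothetical stand-in for the
NLE_DUMP_INTR result, dump_once is any callable that performs one
send/receive round, and the max_attempts limit is an addition that the
patch itself does not have.

```python
class DumpInterrupted(Exception):
    """Hypothetical stand-in for libnl's NLE_DUMP_INTR result."""


def dump_with_retry(dump_once, max_attempts=5):
    """Call dump_once() until it succeeds, retrying only on DumpInterrupted.

    Unlike the loop in the patch, this sketch gives up after max_attempts
    and re-raises the last DumpInterrupted. Any other exception propagates
    immediately, mirroring the early return on other errors.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return dump_once()
        except DumpInterrupted:
            if attempt == max_attempts:
                raise
```

Bounding the number of attempts is a design choice of this sketch; it
avoids spinning forever if the interface list keeps changing during every
dump.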
7 years, 6 months
[patch libteam] retry on NLE_DUMP_INTR error
by Chris Card
Fix the get_ifinfo_list() function to retry if nl_recvmsgs fails with error -NLE_DUMP_INTR.
Add some extra error reporting to get_ifinfo_list().
This is a fix for https://bugzilla.redhat.com/show_bug.cgi?id=1273052
Signed-off-by: Chris Card <ctcard1(a)gmail.com>
---
libteam/ifinfo.c | 49 ++++++++++++++++++++++++++++++++-----------------
1 file changed, 32 insertions(+), 17 deletions(-)
diff --git a/libteam/ifinfo.c b/libteam/ifinfo.c
index 484ac1b..df3d49a 100644
--- a/libteam/ifinfo.c
+++ b/libteam/ifinfo.c
@@ -332,24 +332,39 @@ int get_ifinfo_list(struct team_handle *th)
 		.rtgen_family = AF_UNSPEC,
 	};
 	int ret;
-
-	ret = nl_send_simple(th->nl_cli.sock, RTM_GETLINK, NLM_F_DUMP,
-			     &rt_hdr, sizeof(rt_hdr));
-	if (ret < 0)
-		return -nl2syserr(ret);
-	orig_cb = nl_socket_get_cb(th->nl_cli.sock);
-	cb = nl_cb_clone(orig_cb);
-	nl_cb_put(orig_cb);
-	if (!cb)
-		return -ENOMEM;
-
-	nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, valid_handler, th);
-
-	ret = nl_recvmsgs(th->nl_cli.sock, cb);
-	nl_cb_put(cb);
+	int retry = 1;
+
+	while (retry) {
+		retry = 0;
+		ret = nl_send_simple(th->nl_cli.sock, RTM_GETLINK, NLM_F_DUMP,
+				     &rt_hdr, sizeof(rt_hdr));
+		if (ret < 0) {
+			err(th, "get_ifinfo_list: nl_send_simple failed");
+			return -nl2syserr(ret);
+		}
+		orig_cb = nl_socket_get_cb(th->nl_cli.sock);
+		cb = nl_cb_clone(orig_cb);
+		nl_cb_put(orig_cb);
+		if (!cb) {
+			err(th, "get_ifinfo_list: nl_cb_clone failed");
+			return -ENOMEM;
+		}
+
+		nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, valid_handler, th);
+
+		ret = nl_recvmsgs(th->nl_cli.sock, cb);
+		nl_cb_put(cb);
+		if (ret < 0) {
+			err(th, "get_ifinfo_list: nl_recvmsgs failed");
+			if (ret != -NLE_DUMP_INTR)
+				return -nl2syserr(ret);
+			retry = 1;
+		}
+	}
+	ret = check_call_change_handlers(th, TEAM_IFINFO_CHANGE);
 	if (ret < 0)
-		return -nl2syserr(ret);
-	return check_call_change_handlers(th, TEAM_IFINFO_CHANGE);
+		err(th, "get_ifinfo_list: check_call_change_handers failed");
+	return ret;
 }

 int ifinfo_list_init(struct team_handle *th)
--
2.4.3
7 years, 6 months
[PATCH] retry on NLE_DUMP_INTR error
by Chris Card
---
libteam/ifinfo.c | 55 +++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 39 insertions(+), 16 deletions(-)
diff --git a/libteam/ifinfo.c b/libteam/ifinfo.c
index 484ac1b..0ad24b4 100644
--- a/libteam/ifinfo.c
+++ b/libteam/ifinfo.c
@@ -333,23 +333,46 @@ int get_ifinfo_list(struct team_handle *th)
};
int ret;
- ret = nl_send_simple(th->nl_cli.sock, RTM_GETLINK, NLM_F_DUMP,
- &rt_hdr, sizeof(rt_hdr));
- if (ret < 0)
- return -nl2syserr(ret);
- orig_cb = nl_socket_get_cb(th->nl_cli.sock);
- cb = nl_cb_clone(orig_cb);
- nl_cb_put(orig_cb);
- if (!cb)
- return -ENOMEM;
-
- nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, valid_handler, th);
-
- ret = nl_recvmsgs(th->nl_cli.sock, cb);
- nl_cb_put(cb);
+ int retry = 1;
+ while (retry)
+ {
+ retry = 0;
+ ret = nl_send_simple(th->nl_cli.sock, RTM_GETLINK, NLM_F_DUMP,
+ &rt_hdr, sizeof(rt_hdr));
+ if (ret < 0)
+ {
+ err(th, "get_ifinfo_list: nl_send_simple failed: ret = %d", ret);
+ return -nl2syserr(ret);
+ }
+ orig_cb = nl_socket_get_cb(th->nl_cli.sock);
+ cb = nl_cb_clone(orig_cb);
+ nl_cb_put(orig_cb);
+ if (!cb)
+ {
+ err(th, "get_ifinfo_list: nl_cb_clone failed");
+ return -ENOMEM;
+ }
+
+ nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, valid_handler, th);
+
+ ret = nl_recvmsgs(th->nl_cli.sock, cb);
+ nl_cb_put(cb);
+ if (ret < 0)
+ {
+ err(th, "get_ifinfo_list: nl_recvmsgs failed: ret = %d", ret);
+ if (ret != -NLE_DUMP_INTR)
+ {
+ return -nl2syserr(ret);
+ }
+ retry = 1;
+ }
+ }
+ ret = check_call_change_handlers(th, TEAM_IFINFO_CHANGE);
if (ret < 0)
- return -nl2syserr(ret);
- return check_call_change_handlers(th, TEAM_IFINFO_CHANGE);
+ {
+ err(th, "get_ifinfo_list: check_call_change_handers failed: ret = %d", ret);
+ }
+ return ret;
}
int ifinfo_list_init(struct team_handle *th)
--
2.4.3
7 years, 6 months
Re: [patch libteam] Log more detailed error messages from
get_ifinfo_list() and retry if nl_recvmsgs() returns NLE_DUMP_INTR
Signed-off-by: Chris Card <ctcard@hotmail.com>
by Jiri Pirko
Wed, Nov 25, 2015 at 03:21:20PM CET, ctcard(a)hotmail.com wrote:
>
>
>> Date: Wed, 25 Nov 2015 15:07:15 +0100
>> From: jiri(a)resnulli.us
>> To: ctcard(a)hotmail.com
>> CC: libteam(a)lists.fedorahosted.org
>> Subject: Re: [patch libteam] Log more detailed error messages from get_ifinfo_list() and retry if nl_recvmsgs() returns NLE_DUMP_INTR Signed-off-by: Chris Card <ctcard(a)hotmail.com>
>>
>> Chris, you are kidding, right?
>> Please take 5 minutes and think about what is wrong with this
>> submission. Next time, please do it before you send a patch.
>>
>> Thanks!
>>
>> Jiri
>>
>Hi Jiri,
>
>I'm trying to be helpful by supplying you with a (working) patch. I created the patch a while ago using git format-patch, and finally got round to investigating how to use git send-email today.
>I read the man page for git-send-email and followed the instructions.
>I ran "git send-email --to libteam(a)lists.fedorahosted.org patch" to send it via Outlook smtp.
>I've no idea what the problem might be, but I've given you the patch in 3 different ways now. If it's not in the correct format, I'm sure you can extract the patch from what I've sent you.
ccing mailing list.
Just please look at the subject line, looks like all text is squashed
there. Signed off is there. It should be in message body. Also, please
provide some patch description in message body (not subject). Subject
should just briefly tell what is going on.
Also, when you look at the patch, your editor is screwing up indentation.
Please set your editor correctly in order to not screw up indentation.
After that, it would be much easier to review, and also possible to
apply.
I wonder how you don't see this...
Please see other patches in git history using "git show". They might help
you to format correctly.
I don't want anything abnormal. I just want a sane patch. It is in fact
nothing hard. It is in fact very easy.
7 years, 6 months
[PATCH] dbus: don't do <deny send_interface="..." /> in template dbus s. f.
by Pawel Wieczorkiewicz
This is a change originating from commit: cf24f96afb338d
Signed-off-by: Pawel Wieczorkiewicz <pwieczorkiewicz(a)suse.de>
---
teamd/teamd.conf.in | 2 --
1 file changed, 2 deletions(-)
diff --git a/teamd/teamd.conf.in b/teamd/teamd.conf.in
index f4e3017..6ca3282 100644
--- a/teamd/teamd.conf.in
+++ b/teamd/teamd.conf.in
@@ -4,11 +4,9 @@
<busconfig>
<policy user="root">
<allow own_prefix="org.libteam.teamd"/>
- <allow send_interface="org.libteam.teamd"/>
</policy>
<policy user="@teamd_user@">
<allow own_prefix="org.libteam.teamd"/>
- <allow send_interface="org.libteam.teamd"/>
</policy>
<policy context="default">
<deny own_prefix="org.libteam.teamd"/>
--
2.6.2
7 years, 6 months