Hi,
we are experiencing sudden stops of replication with this message in the logs:
[27/Mar/2023:21:01:12.262842335+0200] - WARN - agmt="cn=IP"(xx-sso5:636) - clcache_load_buffer - bulk load cursor (64063f010000006a0000) is lower than starting csn 6421e16e0000006a0000. Ending session.
The relevant code seems to be here: https://fossies.org/linux/389-ds-base/ldap/servers/plugins/replication/cl5_c...
    /* safety checking, we do not want to (re)start replication before
     * the inital computed starting point
     */
    if (initial_starting_csn) {
        csn_init_by_string(&limit_csn, initial_starting_csn);
        if (csn_compare(&limit_csn, buf->buf_current_csn) > 0) {
            char curr[CSN_STRSIZE];
            int loglevel = SLAPI_LOG_REPL;

            if (csn_time_difference(&limit_csn, buf->buf_current_csn) > (24 * 60 * 60)) {
                /* This is a big jump (more than a day) behind the
                 * initial starting csn. Log a warning before ending
                 * the session
                 */
                loglevel = SLAPI_LOG_WARNING;
            }
            csn_as_string(buf->buf_current_csn, 0, curr);
            slapi_log_err(loglevel, buf->buf_agmt_name,
                          "clcache_load_buffer - bulk load cursor (%s) is lower than starting csn %s. Ending session.\n",
                          curr, initial_starting_csn);
            /* it just end the session with UPDATE_NO_MORE_UPDATES */
            rc = CLC_STATE_DONE;
        }
    }
The loglevel is only set to SLAPI_LOG_WARNING if the replica is more than one day (!) behind... However, that is not the case, and there is also no time glitch between the servers. Do you have any idea what is causing the problem?
br
Harald
Hi Harald,
What version of 389ds are you running?
This message was introduced with #4492 [1] and caught a weird condition where a replication agreement was loading changes that are older than the starting point of the replication. Finally it appeared that #4492 was likely related to #4644 [2], which was identified a bit later.
One of the conditions to hit #4644 [2] is that some changes are large (several MB). Do you know if that applies to your deployment?
[1] https://github.com/389ds/389-ds-base/issues/4492
[2] https://github.com/389ds/389-ds-base/issues/4644
regards thierry
On 3/28/23 13:56, Harald Strack wrote:
389-devel mailing list -- 389-devel@lists.fedoraproject.org
To unsubscribe send an email to 389-devel-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject....
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
Hi Thierry,
thanks for digging this out! We use version 1.4.4.11-2 on Debian bullseye:
https://packages.debian.org/bullseye/389-ds
apparently the commits from #4644 are missing there, by just one month or so :-(
https://salsa.debian.org/freeipa-team/389-ds-base/-/commits/debian/1.4.4.11-...
It seems that we have to upgrade and build our own packages... or run it all in Podman with Rocky Linux. Any suggestions?
br
Harald
On 28.03.23 14:25, Thierry Bordaz wrote:
Hi Harald,
I agree a probable cause of your message is #4644. The impact of the bug varies depending on the size of the changelog and how frequently you have large updates. The bug slows down replication, but eventually replication should succeed. If large updates are rare, then you should not hit it that frequently.
regards thierry
On 3/28/23 15:40, Harald Strack wrote:
Hi Thierry,
we built a new version based on this commit, which includes the patch:
https://salsa.debian.org/freeipa-team/389-ds-base/-/commits/debian/1.4.4.16-...
However, when we enable MMR and update passwords on a supplier, the server segfaults:
audit log:
dn: uid=testuser@test.de,ou=People,dc=test,dc=de
result: 0
changetype: modify
replace: userPassword
userPassword:: e0JDUllQVH0kMmIkMTYkLi4uLi4uLi4uLkEzTnRlREwxNi4uLkMuYlE4SWhsYXl
 OZ1BaeENGdUxHNnd0LlMuTUova3k=
-
replace: modifiersname
modifiersname: uid=testuser@test.de,ou=people,dc=test,dc=de
-
replace: modifytimestamp
modifytimestamp: 20230403221428Z
-

time: 20230404001431
dn: uid=testuser@test.de,ou=People,dc=test,dc=de
result: 0
changetype: modify
replace: shadowLastChange
shadowLastChange: 19450
-

time: 20230404001437
dn: cn=uniqueid generator,cn=config
result: 0
changetype: modify
replace: nsState
nsState:: gESj62zS7QHVNtk3vH099gAAAAAAAAAA
Just before the server segfaults:
[04/Apr/2023:00:14:34.906956463 +0200] - ERR - NSMMReplicationPlugin - IsValidOperation - NULL operation CSN
[04/Apr/2023:00:14:35.078898697 +0200] - ERR - NSMMReplicationPlugin - write_changelog_and_ruv - Can't add a change for uid=testuser@test.de,ou=People,dc=test,dc=de (uniqid: (null), optype: 8) to changelog csn
When the server is running with
$ /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-ldap02 -i /run/dirsrv/slapd-ldap02.pid -d 8200
[04/Apr/2023:01:57:46.794635411 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 137r
[04/Apr/2023:01:57:46.798242424 +0200] - DEBUG - handle_pr_read_ready - read activity on 137
[04/Apr/2023:01:57:46.801677697 +0200] - DEBUG - NSMMReplicationPlugin - release_replica - agmt="cn=168.119.133.209" (168:636): Successfully released consumer
[04/Apr/2023:01:57:46.826126621 +0200] - DEBUG - NSMMReplicationPlugin - conn_start_linger - agmt="cn=168.119.133.209" (168:636) - Beginning linger on the connection
[04/Apr/2023:01:57:46.829122095 +0200] - DEBUG - connection_check_activity_level - conn 136 activity level = 0
[04/Apr/2023:01:57:46.832174669 +0200] - DEBUG - handle_pr_read_ready - read activity on 135
[04/Apr/2023:01:57:46.835644233 +0200] - DEBUG - connection_read_operation - connection 137 waited 1 times for read to be ready
[04/Apr/2023:01:57:46.842275810 +0200] - DEBUG - NSMMReplicationPlugin - conn_start_linger - agmt="cn=168.119.133.208" (168:636) - Beginning linger on the connection
[04/Apr/2023:01:57:46.845508214 +0200] - DEBUG - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=168.119.133.209" (168:636): State: sending_updates -> wait_for_changes
[04/Apr/2023:01:57:46.848642908 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 136 fd=84
[04/Apr/2023:01:57:46.851710732 +0200] - DEBUG - clear_signal - Listener got signaled
[04/Apr/2023:01:57:46.872921861 +0200] - DEBUG - connection_read_operation - connection 135 read 128 bytes
[04/Apr/2023:01:57:46.876223625 +0200] - DEBUG - connection_threadmain - conn 135 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 4 refcnt 2 flags 1
[04/Apr/2023:01:57:46.878961280 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 135 fd=86
[04/Apr/2023:01:57:46.882027474 +0200] - DEBUG - connection_threadmain - conn 136 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:46.895755108 +0200] - DEBUG - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=168.119.133.208" (168:636): State: sending_updates -> wait_for_changes
[04/Apr/2023:01:57:46.898725492 +0200] - DEBUG - connection_read_operation - connection 137 waited 2 times for read to be ready
[04/Apr/2023:01:57:46.901590277 +0200] - DEBUG - connection_threadmain - conn 135 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:46.904535831 +0200] - DEBUG - connection_read_operation - connection 137 read 69 bytes
[04/Apr/2023:01:57:46.908875673 +0200] - DEBUG - connection_threadmain - conn 137 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 1 refcnt 2 flags 1
[04/Apr/2023:01:57:46.937355189 +0200] - DEBUG - clear_signal - Listener got signaled
[04/Apr/2023:01:57:46.940642472 +0200] - DEBUG - connection_check_activity_level - conn 137 activity level = 0
[04/Apr/2023:01:57:46.943520957 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 63r
[04/Apr/2023:01:57:46.946436111 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 137 fd=91
[04/Apr/2023:01:57:46.949539905 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 136r
[04/Apr/2023:01:57:46.971823503 +0200] - DEBUG - connection_threadmain - conn 137 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:46.976851863 +0200] - DEBUG - handle_pr_read_ready - read activity on 136
[04/Apr/2023:01:57:46.980319317 +0200] - DEBUG - handle_pr_read_ready - read activity on 63
[04/Apr/2023:01:57:46.983570170 +0200] - DEBUG - connection_read_operation - connection 136 read 7 bytes
[04/Apr/2023:01:57:46.992242074 +0200] - DEBUG - connection_read_operation - connection 63 read 157 bytes
[04/Apr/2023:01:57:46.995318158 +0200] - DEBUG - connection_threadmain - conn 136 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 2 refcnt 2 flags 1
[04/Apr/2023:01:57:46.998488642 +0200] - DEBUG - connection_threadmain - conn 63 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 25 refcnt 2 flags 1
[04/Apr/2023:01:57:47.001869195 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 137r
[04/Apr/2023:01:57:47.022358186 +0200] - DEBUG - disconnect_server_nomutex_ext - Setting conn 136 fd=84 to be disconnected: reason -4991
[04/Apr/2023:01:57:47.026239889 +0200] - DEBUG - connection_check_activity_level - conn 63 activity level = 18
[04/Apr/2023:01:57:47.031562919 +0200] - DEBUG - handle_pr_read_ready - read activity on 137
[04/Apr/2023:01:57:47.056184893 +0200] - DEBUG - connection_threadmain - conn 136 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:47.060182805 +0200] - DEBUG - connection_enter_leave_turbo - conn 63 turbo rank = 0 out of 28 conns
[04/Apr/2023:01:57:47.068125600 +0200] - DEBUG - connection_read_operation - connection 137 read 77 bytes
[04/Apr/2023:01:57:47.072979011 +0200] - DEBUG - clear_signal - Listener got signaled
[04/Apr/2023:01:57:47.077046753 +0200] - DEBUG - connection_enter_leave_turbo - conn 63 entering turbo mode
[04/Apr/2023:01:57:47.091249546 +0200] - DEBUG - connection_threadmain - conn 137 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 2 refcnt 2 flags 1
[04/Apr/2023:01:57:47.107753434 +0200] - DEBUG - connection_table_move_connection_out_of_active_list - Moved conn 84 out of active list and freed
[04/Apr/2023:01:57:47.113208364 +0200] - DEBUG - connection_threadmain - conn 63 check more_data 0 thread_turbo_flag 1 repl_conn_bef 583204874, repl_conn_now 583204874
[04/Apr/2023:01:57:47.128344235 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 137 fd=91
[04/Apr/2023:01:57:47.131891528 +0200] - DEBUG - connection_read_operation - connection 63 read 125 bytes
[04/Apr/2023:01:57:47.135508682 +0200] - DEBUG - connection_threadmain - conn 137 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:47.153368147 +0200] - DEBUG - connection_threadmain - conn 63 read operation successfully - thread_turbo_flag 1 more_data 0 ops_initiated 26 refcnt 2 flags 1
[04/Apr/2023:01:57:47.171031014 +0200] - DEBUG - clear_signal - Listener got signaled
[04/Apr/2023:01:57:47.200732177 +0200] - DEBUG - NSMMReplicationPlugin - purge_entry_state_information - From entry uid=testuser@test.de,ou=People,dc=test,dc=de up to CSN 64222d7c0001006a0000
[04/Apr/2023:01:57:47.208218783 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 137r
[04/Apr/2023:01:57:47.211990356 +0200] - DEBUG - handle_pr_read_ready - read activity on 137
[04/Apr/2023:01:57:47.220433720 +0200] - DEBUG - NSMMReplicationPlugin - write_changelog_and_ruv - Writing change for uid=testuser@test.de,ou=People,dc=test,dc=de (uniqid: (null), optype: 8) to changelog csn
[04/Apr/2023:01:57:47.230227401 +0200] - DEBUG - connection_read_operation - connection 137 read 77 bytes
[04/Apr/2023:01:57:47.242081278 +0200] - DEBUG - connection_threadmain - conn 137 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 3 refcnt 2 flags 1
[04/Apr/2023:01:57:47.250856202 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 137 fd=91
[04/Apr/2023:01:57:47.255230563 +0200] - DEBUG - replication - multimaster_mmr_postop - error -1 for operation 561.
[04/Apr/2023:01:57:47.263485397 +0200] - DEBUG - clear_signal - Listener got signaled
Segmentation fault
Without replication it just runs fine... Any idea what is causing the
NSMMReplicationPlugin - write_changelog_and_ruv - Writing change for uid=testuser@test.de,ou=People,dc=test,dc=de (uniqid: (null), optype: 8) to changelog csn
?
br
Harald
On 28.03.23 15:49, Thierry Bordaz wrote:
Hi *,
we dug deeper into the issue, and the problem is a NULL pointer here, during replication:
#0  csn_get_replicaid (csn=csn@entry=0x0) at ../ldap/servers/slapd/csn.c:167
#1  0x00007ffff42b2f09 in ruv_cancel_csn_inprogress (repl=repl@entry=0x5555578a2950, ruv=0x55555789dcc0, csn=csn@entry=0x0, local_rid=local_rid@entry=104) at ../ldap/servers/plugins/replication/repl5_ruv.c:1594
#2  0x00007ffff429cb10 in cancel_opcsn (pb=pb@entry=0x7fff70000b60) at ../ldap/servers/plugins/replication/repl5_plugins.c:1322
#3  0x00007ffff429d5a8 in write_changelog_and_ruv (pb=0x7fff70000b60) at ../ldap/servers/plugins/replication/repl5_plugins.c:983
#4  0x00007ffff429e84d in multimaster_mmr_postop (pb=0x7fff70000b60, flags=561) at ../ldap/servers/plugins/replication/repl5_plugins.c:618
#5  0x00007ffff7f2820d in plugin_call_mmr_plugin_postop (pb=pb@entry=0x7fff70000b60, e=e@entry=0x0, flags=flags@entry=561) at ../ldap/servers/slapd/plugin_mmr.c:65
#6  0x00007ffff438450c in ldbm_back_modify (pb=0x7fff70000b60) at ../ldap/servers/slapd/back-ldbm/ldbm_modify.c:1089
#7  0x00007ffff7f108ca in op_shared_modify (pb=pb@entry=0x7fff70000b60, pw_change=pw_change@entry=0, old_pw=0x0) at ../ldap/servers/slapd/modify.c:1022
#8  0x00007ffff7f117bd in do_modify (pb=pb@entry=0x7fff70000b60) at ../ldap/servers/slapd/modify.c:377
#9  0x000055555556de9b in connection_dispatch_operation (pb=0x7fff70000b60, op=<optimized out>, conn=<optimized out>) at ../ldap/servers/slapd/connection.c:635
#10 connection_threadmain () at ../ldap/servers/slapd/connection.c:1785
#11 0x00007ffff7c11941 in ??? () at /lib/x86_64-linux-gnu/libnspr4.so
and we could reproduce the problem whenever we used a user with an ACI that had an IP restriction.
Finally the commit message of this bug
https://github.com/389ds/389-ds-base/pull/4799
"Issue 4797 - ACL IP ADDRESS evaluation may corrupt c_isreplication_session connection flags (#4799)"
gave us the hint to fix this. We patched the build for debian
https://salsa.debian.org/freeipa-team/389-ds-base/-/commits/debian/1.4.4.16-...
with the commit
https://github.com/389ds/389-ds-base/pull/4799/commits/3e022260de8072459e073...
and now everything works as expected.
Thanks for your work!
On 04.04.23 02:31, Harald Strack wrote:
--
Harald Strack
Geschäftsführer

ssystems GmbH
Kastanienallee 74
10435 Berlin
Nice catch !!
I remember this patch now; it fixed a regression I introduced myself. Indeed, it can be challenging to cherry-pick a specific PR, as there may be other dependencies.
regards thierry
On 4/4/23 21:25, Harald Strack wrote:
Hi *,
we digged deeper into the issue: and the problem is a null pointer here, during replication:
#0 csn_get_replicaid (csn=csn@entry=0x0) at ../ldap/servers/slapd/csn.c:167 #1 0x00007ffff42b2f09 in ruv_cancel_csn_inprogress (repl=repl@entry=0x5555578a2950, ruv=0x55555789dcc0, csn=csn@entry=0x0, local_rid=local_rid@entry=104) at ../ldap/servers/plugins/replication/repl5_ruv.c:1594 #2 0x00007ffff429cb10 in cancel_opcsn (pb=pb@entry=0x7fff70000b60) at ../ldap/servers/plugins/replication/repl5_plugins.c:1322 #3 0x00007ffff429d5a8 in write_changelog_and_ruv (pb=0x7fff70000b60) at ../ldap/servers/plugins/replication/repl5_plugins.c:983 #4 0x00007ffff429e84d in multimaster_mmr_postop (pb=0x7fff70000b60, flags=561) at ../ldap/servers/plugins/replication/repl5_plugins.c:618 #5 0x00007ffff7f2820d in plugin_call_mmr_plugin_postop (pb=pb@entry=0x7fff70000b60, e=e@entry=0x0, flags=flags@entry=561) at ../ldap/servers/slapd/plugin_mmr.c:65 #6 0x00007ffff438450c in ldbm_back_modify (pb=0x7fff70000b60) at ../ldap/servers/slapd/back-ldbm/ldbm_modify.c:1089 #7 0x00007ffff7f108ca in op_shared_modify (pb=pb@entry=0x7fff70000b60, pw_change=pw_change@entry=0, old_pw=0x0) at ../ldap/servers/slapd/modify.c:1022 #8 0x00007ffff7f117bd in do_modify (pb=pb@entry=0x7fff70000b60) at ../ldap/servers/slapd/modify.c:377 #9 0x000055555556de9b in connection_dispatch_operation (pb=0x7fff70000b60, op=<optimized out>, conn=<optimized out>) at ../ldap/servers/slapd/connection.c:635 #10 connection_threadmain () at ../ldap/servers/slapd/connection.c:1785 #11 0x00007ffff7c11941 in ??? () at /lib/x86_64-linux-gnu/libnspr4.so
and we could reproduce the problem whenever we used a user with an ACI that had an IP restriction.
Finally, the commit message of this fix
https://github.com/389ds/389-ds-base/pull/4799
/Issue 4797 - ACL IP ADDRESS evaluation may corrupt c_isreplication_session connection flags (#4799)/
gave us the hint to fix this. We patched the Debian build
https://salsa.debian.org/freeipa-team/389-ds-base/-/commits/debian/1.4.4.16-...
with the commit
https://github.com/389ds/389-ds-base/pull/4799/commits/3e022260de8072459e073...
and now everything works as expected.
Thanks for your work!
On 04.04.23 02:31, Harald Strack wrote:
Hi Thierry,
we built a new Version based on this commit, that includes the patch:
https://salsa.debian.org/freeipa-team/389-ds-base/-/commits/debian/1.4.4.16-...
However, when we enable MMR and update passwords on a supplier, the server segfaults:
audit log:
dn: uid=testuser@test.de,ou=People,dc=test,dc=de
result: 0
changetype: modify
replace: userPassword
userPassword:: e0JDUllQVH0kMmIkMTYkLi4uLi4uLi4uLkEzTnRlREwxNi4uLkMuYlE4SWhsYXlOZ1BaeENGdUxHNnd0LlMuTUova3k=
replace: modifiersname
modifiersname: uid=testuser@test.de,ou=people,dc=test,dc=de
replace: modifytimestamp
modifytimestamp: 20230403221428Z

time: 20230404001431
dn: uid=testuser@test.de,ou=People,dc=test,dc=de
result: 0
changetype: modify
replace: shadowLastChange
shadowLastChange: 19450

time: 20230404001437
dn: cn=uniqueid generator,cn=config
result: 0
changetype: modify
replace: nsState
nsState:: gESj62zS7QHVNtk3vH099gAAAAAAAAAA
Just before the server segfaults:
[04/Apr/2023:00:14:34.906956463 +0200] - ERR - NSMMReplicationPlugin - IsValidOperation - NULL operation CSN
[04/Apr/2023:00:14:35.078898697 +0200] - ERR - NSMMReplicationPlugin - write_changelog_and_ruv - Can't add a change for uid=testuser@test.de,ou=People,dc=test,dc=de (uniqid: (null), optype: 8) to changelog csn
When the server is running with
$ /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-ldap02 -i /run/dirsrv/slapd-ldap02.pid -d 8200
[04/Apr/2023:01:57:46.794635411 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 137r
[04/Apr/2023:01:57:46.798242424 +0200] - DEBUG - handle_pr_read_ready - read activity on 137
[04/Apr/2023:01:57:46.801677697 +0200] - DEBUG - NSMMReplicationPlugin - release_replica - agmt="cn=168.119.133.209" (168:636): Successfully released consumer
[04/Apr/2023:01:57:46.826126621 +0200] - DEBUG - NSMMReplicationPlugin - conn_start_linger - agmt="cn=168.119.133.209" (168:636) - Beginning linger on the connection
[04/Apr/2023:01:57:46.829122095 +0200] - DEBUG - connection_check_activity_level - conn 136 activity level = 0
[04/Apr/2023:01:57:46.832174669 +0200] - DEBUG - handle_pr_read_ready - read activity on 135
[04/Apr/2023:01:57:46.835644233 +0200] - DEBUG - connection_read_operation - connection 137 waited 1 times for read to be ready
[04/Apr/2023:01:57:46.842275810 +0200] - DEBUG - NSMMReplicationPlugin - conn_start_linger - agmt="cn=168.119.133.208" (168:636) - Beginning linger on the connection
[04/Apr/2023:01:57:46.845508214 +0200] - DEBUG - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=168.119.133.209" (168:636): State: sending_updates -> wait_for_changes
[04/Apr/2023:01:57:46.848642908 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 136 fd=84
[04/Apr/2023:01:57:46.851710732 +0200] - DEBUG - clear_signal - Listener got signaled
[04/Apr/2023:01:57:46.872921861 +0200] - DEBUG - connection_read_operation - connection 135 read 128 bytes
[04/Apr/2023:01:57:46.876223625 +0200] - DEBUG - connection_threadmain - conn 135 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 4 refcnt 2 flags 1
[04/Apr/2023:01:57:46.878961280 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 135 fd=86
[04/Apr/2023:01:57:46.882027474 +0200] - DEBUG - connection_threadmain - conn 136 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:46.895755108 +0200] - DEBUG - NSMMReplicationPlugin - repl5_inc_run - agmt="cn=168.119.133.208" (168:636): State: sending_updates -> wait_for_changes
[04/Apr/2023:01:57:46.898725492 +0200] - DEBUG - connection_read_operation - connection 137 waited 2 times for read to be ready
[04/Apr/2023:01:57:46.901590277 +0200] - DEBUG - connection_threadmain - conn 135 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:46.904535831 +0200] - DEBUG - connection_read_operation - connection 137 read 69 bytes
[04/Apr/2023:01:57:46.908875673 +0200] - DEBUG - connection_threadmain - conn 137 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 1 refcnt 2 flags 1
[04/Apr/2023:01:57:46.937355189 +0200] - DEBUG - clear_signal - Listener got signaled
[04/Apr/2023:01:57:46.940642472 +0200] - DEBUG - connection_check_activity_level - conn 137 activity level = 0
[04/Apr/2023:01:57:46.943520957 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 63r
[04/Apr/2023:01:57:46.946436111 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 137 fd=91
[04/Apr/2023:01:57:46.949539905 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 136r
[04/Apr/2023:01:57:46.971823503 +0200] - DEBUG - connection_threadmain - conn 137 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:46.976851863 +0200] - DEBUG - handle_pr_read_ready - read activity on 136
[04/Apr/2023:01:57:46.980319317 +0200] - DEBUG - handle_pr_read_ready - read activity on 63
[04/Apr/2023:01:57:46.983570170 +0200] - DEBUG - connection_read_operation - connection 136 read 7 bytes
[04/Apr/2023:01:57:46.992242074 +0200] - DEBUG - connection_read_operation - connection 63 read 157 bytes
[04/Apr/2023:01:57:46.995318158 +0200] - DEBUG - connection_threadmain - conn 136 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 2 refcnt 2 flags 1
[04/Apr/2023:01:57:46.998488642 +0200] - DEBUG - connection_threadmain - conn 63 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 25 refcnt 2 flags 1
[04/Apr/2023:01:57:47.001869195 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 137r
[04/Apr/2023:01:57:47.022358186 +0200] - DEBUG - disconnect_server_nomutex_ext - Setting conn 136 fd=84 to be disconnected: reason -4991
[04/Apr/2023:01:57:47.026239889 +0200] - DEBUG - connection_check_activity_level - conn 63 activity level = 18
[04/Apr/2023:01:57:47.031562919 +0200] - DEBUG - handle_pr_read_ready - read activity on 137
[04/Apr/2023:01:57:47.056184893 +0200] - DEBUG - connection_threadmain - conn 136 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:47.060182805 +0200] - DEBUG - connection_enter_leave_turbo - conn 63 turbo rank = 0 out of 28 conns
[04/Apr/2023:01:57:47.068125600 +0200] - DEBUG - connection_read_operation - connection 137 read 77 bytes
[04/Apr/2023:01:57:47.072979011 +0200] - DEBUG - clear_signal - Listener got signaled
[04/Apr/2023:01:57:47.077046753 +0200] - DEBUG - connection_enter_leave_turbo - conn 63 entering turbo mode
[04/Apr/2023:01:57:47.091249546 +0200] - DEBUG - connection_threadmain - conn 137 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 2 refcnt 2 flags 1
[04/Apr/2023:01:57:47.107753434 +0200] - DEBUG - connection_table_move_connection_out_of_active_list - Moved conn 84 out of active list and freed
[04/Apr/2023:01:57:47.113208364 +0200] - DEBUG - connection_threadmain - conn 63 check more_data 0 thread_turbo_flag 1 repl_conn_bef 583204874, repl_conn_now 583204874
[04/Apr/2023:01:57:47.128344235 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 137 fd=91
[04/Apr/2023:01:57:47.131891528 +0200] - DEBUG - connection_read_operation - connection 63 read 125 bytes
[04/Apr/2023:01:57:47.135508682 +0200] - DEBUG - connection_threadmain - conn 137 check more_data 0 thread_turbo_flag 0 repl_conn_bef 0, repl_conn_now 0
[04/Apr/2023:01:57:47.153368147 +0200] - DEBUG - connection_threadmain - conn 63 read operation successfully - thread_turbo_flag 1 more_data 0 ops_initiated 26 refcnt 2 flags 1
[04/Apr/2023:01:57:47.171031014 +0200] - DEBUG - clear_signal - Listener got signaled
[04/Apr/2023:01:57:47.200732177 +0200] - DEBUG - NSMMReplicationPlugin - purge_entry_state_information - From entry uid=testuser@test.de,ou=People,dc=test,dc=de up to CSN 64222d7c0001006a0000
[04/Apr/2023:01:57:47.208218783 +0200] - DEBUG - connection_table_dump_activity_to_errors_log - activity on 137r
[04/Apr/2023:01:57:47.211990356 +0200] - DEBUG - handle_pr_read_ready - read activity on 137
[04/Apr/2023:01:57:47.220433720 +0200] - DEBUG - NSMMReplicationPlugin - write_changelog_and_ruv - Writing change for uid=testuser@test.de,ou=People,dc=test,dc=de (uniqid: (null), optype: 8) to changelog csn
[04/Apr/2023:01:57:47.230227401 +0200] - DEBUG - connection_read_operation - connection 137 read 77 bytes
[04/Apr/2023:01:57:47.242081278 +0200] - DEBUG - connection_threadmain - conn 137 read operation successfully - thread_turbo_flag 0 more_data 0 ops_initiated 3 refcnt 2 flags 1
[04/Apr/2023:01:57:47.250856202 +0200] - DEBUG - connection_make_readable_nolock - making readable conn 137 fd=91
[04/Apr/2023:01:57:47.255230563 +0200] - DEBUG - replication - multimaster_mmr_postop - error -1 for operation 561.
[04/Apr/2023:01:57:47.263485397 +0200] - DEBUG - clear_signal - Listener got signaled
Segmentation fault
Without replication it runs just fine... Any idea what is causing the
NSMMReplicationPlugin - write_changelog_and_ruv - Writing change for uid=testuser@test.de,ou=People,dc=test,dc=de (uniqid: (null), optype: 8) to changelog csn
?
br
Harald
On 28.03.23 15:49, Thierry Bordaz wrote:
Hi Harald,
I agree, a probable cause of your message is #4644. The impact of the bug varies depending on the size of the changelog and how frequently you have large updates. The bug slows down replication, but eventually replication should succeed. If large updates are rare, you should not hit it that frequently.
regards thierry
On 3/28/23 15:40, Harald Strack wrote:
Hi Thierry,
thanks for digging this out! We use Version 1.4.4.11-2 on debian bullseye:
https://packages.debian.org/bullseye/389-ds
apparently the commits from #4644 are missing there, by just one month or so :-(
https://salsa.debian.org/freeipa-team/389-ds-base/-/commits/debian/1.4.4.11-...
Seems that we have to upgrade and build our own packages... or run it all in Podman with Rocky Linux. Any suggestions?
br
Harald
On 28.03.23 14:25, Thierry Bordaz wrote:
Hi Harald,
What version of 389ds are you running ?
This message was introduced with #4492 [1] and caught a weird condition where a replication agreement was loading changes that are older than the starting point of the replication. Finally it appeared that #4492 was likely related to #4644 [2], which was identified a bit later.
One of the conditions to hit #4644 [2] is that some changes are large (several MB). Do you know if that applies to your deployment?
[1] https://github.com/389ds/389-ds-base/issues/4492 [2] https://github.com/389ds/389-ds-base/issues/4644
regards thierry
-- Harald Strack Geschäftsführer
ssystems GmbH Kastanienallee 74 10435 Berlin