On 13-05-2021 08:32, Thierry Bordaz wrote:
On 5/12/21 8:41 PM, Kees Bakker wrote:
> On 12-05-2021 19:44, Thierry Bordaz wrote:
>> On 5/12/21 4:55 PM, Kees Bakker wrote:
>>> Hi Thierry,
>>>
>>> Just to be clear, changelogmaxage was changed to -1 by me after the
>>> upgrade and I've
>>> confirmed it is now set to -1.
>>>
>>> The reason for me to change the value was because of the deadlock.
>>> Apparently, it did not make much of a difference. It still gets into a
>>> deadlock
>>> with the value -1.
>>
>>
>> Did you set nsslapd-changelogmaxage=-1 in the retroCL config entry ?
>
> I've used this input file for ldapmodify
>
> [root@linge ~]# cat change-nsslapd-changelogmaxage.txt
> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
> changetype: modify
> replace: nsslapd-changelogmaxage
> nsslapd-changelogmaxage: -1
>
> After that
>
> [root@linge ~]# ldapsearch -H
> ldapi://%2fvar%2frun%2fslapd-GHS-NL.socket -LLL -o ldif-wrap=no -b
> 'cn=Retro Changelog Plugin,cn=plugins,cn=config'
> SASL/EXTERNAL authentication started
> SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
> SASL SSF: 0
> dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
> cn: Retro Changelog Plugin
> nsslapd-attribute: nsuniqueid:targetUniqueId
> nsslapd-changelogmaxage: -1
> nsslapd-include-suffix: cn=dns,dc=ghs,dc=nl
> nsslapd-plugin-depends-on-named: Class of Service
> nsslapd-plugin-depends-on-type: database
> nsslapd-pluginDescription: Retrocl Plugin
> nsslapd-pluginEnabled: on
> nsslapd-pluginId: retrocl
> nsslapd-pluginInitfunc: retrocl_plugin_init
> nsslapd-pluginPath: libretrocl-plugin
> nsslapd-pluginType: object
> nsslapd-pluginVendor: 389 Project
> nsslapd-pluginVersion: 1.3.10.2
> nsslapd-pluginbetxn: on
> nsslapd-pluginprecedence: 25
> objectClass: top
> objectClass: nsSlapdPlugin
> objectClass: extensibleObject
You did the right setting. I do not understand why thread 2 is trimming
records with that setting.
>
>
>>
>>
>>>
>>> BTW. I've sent the whole stacktrace directly to you to avoid the 4000+
>>> lines to
>>> this mailing list.
>>>
>>> Here is the stack trace of one of the threads. The one that hangs in
>>> trim_changelog
>>
>>
>> Something weird is that the same hang occurs
>>
https://bugzilla.redhat.com/show_bug.cgi?id=1751295.
>> If I am correct it is between thread 14th and 2nd.
>>
>> As you are running with a fixed version (> slapi-nis-0.56.5), that means
>> the fix is incomplete. It would require to debug a dump core to know why
>> the fix fails.
>
> [root@linge ~]# rpm -qa slapi\*
> slapi-nis-0.56.5-3.el7_9.x86_64
>
> So, that is not > 0.56.5, but it is the newest available for CentOS7. No?
reading the 'fixed version' in the bug says it if fixed starting 0.56.5
I do not know if it match centOS7 versions
>
> At the moment I killed it and restarted. No deadlock yet, but the
> system is extreemly
> slow. And there is just ns-slapd using CPU cycles.
Would you give some precision what is slow ? (response time, replication
time,..)
If it is eating CPU, I would recommend 'top -H -p <pid> -b'. If it is
always the same thread(s) eating the CPU, a pstack will tell you what
it/they is/are doing. If this is various different threads likely it is
due to processed requests.
I don't know what numbers to give to show the slowness. The load of
the system is almost 4, while the other master (identical system) has
a load of almost zero. (And the latter is used as the main LDAP server)
Right now thread 2 and thread 43 are the most active ones. Here are their
stacktraces.
Thread 43 (Thread 0x7f4baa1d0700 (LWP 3583)):
#0 0x00007f4bcc200a33 in select () at /lib64/libc.so.6
#1 0x00007f4bcf44ab09 in DS_Sleep (ticks=ticks@entry=250) at
ldap/servers/slapd/util.c:1086
mSecs = <optimized out>
tm = {tv_sec = 0, tv_usec = 26398}
#2 0x00007f4bbe587f77 in trickle_threadmain (param=<optimized out>) at
ldap/servers/slapd/back-ldbm/dblayer.c:4551
interval = 250
rval = <optimized out>
priv = 0x55f87ff28b00
li = <optimized out>
debug_checkpointing = 0
#3 0x00007f4bcd1bdc5b in _pt_root () at /lib64/libnspr4.so
#4 0x00007f4bccb5dea5 in start_thread () at /lib64/libpthread.so.0
#5 0x00007f4bcc2099fd in clone () at /lib64/libc.so.6
Thread 2 (Thread 0x7f4b93d39700 (LWP 15086)):
#0 0x00007f4bce211ef5 in tc_malloc () at /lib64/libtcmalloc.so.4
#1 0x00007f4bcf3c15a3 in slapi_ch_malloc (size=35) at ldap/servers/slapd/ch_malloc.c:95
newmem = <optimized out>
#2 0x00007f4bc57809d8 in __os_umalloc () at /lib64/libdb-5.3.so
#3 0x00007f4bc5742136 in __db_retcopy () at /lib64/libdb-5.3.so
#4 0x00007f4bc571a2b3 in __dbc_iget () at /lib64/libdb-5.3.so
#5 0x00007f4bc5728cb2 in __dbc_get_pp () at /lib64/libdb-5.3.so
#6 0x00007f4bbe5c1953 in entryrdn_lookup_dn (be=be@entry=0x55f8801ef040,
rdn=<optimized out>, id=id@entry=2092284, dn=dn@entry=0x7f4b93d34188,
psrdn=psrdn@entry=0x7f4b93d34190, txn=txn@entry=0x7f4b93d342d0) at
ldap/servers/slapd/back-ldbm/ldbm_entryrdn.c:1188
rc = <optimized out>
ai = 0x55f880232540
db = 0x55f881edaa00
cursor = 0x55f881efe240
db_txn = 0x0
key = {data = 0x55f9026082a0, size = 9, ulen = 9, dlen = 0, doff = 0, app_data =
0x0, flags = 2056}
data = {data = 0x0, size = 0, ulen = 0, dlen = 0, doff = 0, app_data = 0x0, flags
= 16}
keybuf = 0x55f9026082a0 "P2092284"
srdn = 0x55f8b9b11500
orignrdn = 0x0
nrdn = 0x55f93b909f80 "changenumber=2092283"
nrdn_len = 20
workid = 2092284
elem = <optimized out>
maybesuffix = 0
db_retry = <optimized out>
#7 0x00007f4bbe5929e7 in id2entry (be=<optimized out>, id=id@entry=2092284,
txn=txn@entry=0x7f4b93d342d0, err=err@entry=0x7f4b93d342c4) at
ldap/servers/slapd/back-ldbm/id2entry.c:346
sdn = 0x0
normdn = 0x0
srdn = 0x0
bdn = 0x0
rdn = 0x55f93efd4f40 "changenumber=2092283"
rc = <optimized out>
inst = 0x55f88004a680
db = 0x55f881ed8600
db_txn = <optimized out>
key = {data = 0x7f4b93d34200, size = 4, ulen = 0, dlen = 0, doff = 0, app_data =
0x0, flags = 0}
data = {data = 0x55f92c680000, size = 985, ulen = 0, dlen = 0, doff = 0, app_data
= 0x0, flags = 16}
e = 0x0
ee = <optimized out>
temp_id = "\000\037\354", <incomplete sequence \374>
#8 0x00007f4bbe5ceb72 in ldbm_back_next_search_entry_ext (pb=0x55f95513cd80,
use_extension=0) at ldap/servers/slapd/back-ldbm/ldbm_search.c:1627
be = 0x55f8801ef040
inst = 0x55f88004a680
li = 0x55f87ffbf7c0
scope = 2
managedsait = 0
attr = 0x0
filter = 0x55f8a42e8300
sr = 0x55f9557687c0
id = 2092284
e = 0x55f88b819960
nentries = 399583
expire_time = {tv_sec = 0, tv_nsec = 0}
tlimit = -1
llimit = -1
slimit = -1
isroot = 1
urls = 0x0
err = 0
basesdn = 0x55f8e973bbc0
target_uniqueid = 0x0
rc = 0
estimate = 0
txn = {back_txn_txn = 0x0}
pr_idx = -1
conn = 0x0
op = 0x55f8a2ce14a0
reverse_list = 0
#9 0x00007f4bcf405ff6 in iterate (send_result=1, be=0x55f8801ef040,
pr_statp=0x7f4b93d344f4, pagesize=<optimized out>, pnentries=0x7f4b93d34588,
pb=0x55f95513cd80) at ldap/servers/slapd/opshared.c:1292
ger_template_entry = 0x0
operation = 0x55f8a2ce14a0
done = 0
e = 0x55f8a82c0180
pr_stat = 0
pr_idx = -1
rc = <optimized out>
rval = 1
attrsonly = 0
attrs = 0x0
be = 0x55f8801ef040
rc = <optimized out>
#10 0x00007f4bcf405ff6 in send_results_ext (pb=pb@entry=0x55f95513cd80,
nentries=nentries@entry=0x7f4b93d34588, pagesize=-1, pr_stat=pr_stat@entry=0x7f4b93d344f4,
send_result=1) at ldap/servers/slapd/opshared.c:1645
be = 0x55f8801ef040
rc = <optimized out>
#11 0x00007f4bcf408159 in op_shared_search (pb=pb@entry=0x55f95513cd80,
send_result=send_result@entry=1) at ldap/servers/slapd/opshared.c:847
be_suffix = <optimized out>
err = 0
next_be = 0x0
base = 0x7f4bc394e4e2 "cn=changelog"
normbase = <optimized out>
fstr = 0x55f8904cc540
"(&(changenumber>=-1)(targetuniqueid=1e89eb86-e20111e8-8820f96e-fc0c60a4))"
scope = 2
be = 0x55f8801ef040
be_single = 0x0
be_list = {0x55f8801ef040, 0x0, 0x0, 0x0, 0x55f8801da740, 0x55f800000001,
0x55f900000000, 0x0 <repeats 994 times>}
referral_list = {0x0 <repeats 611 times>, 0x7f4bcdd788e3
<ber_sockbuf_ctrl+387>, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7f4bcdd788e3
<ber_sockbuf_ctrl+387>, 0x0 <repeats 186 times>, 0x61e7bfe702042a36,
0x40658d47b4a41ce1, 0xad61e3a2024df476, 0xde7dadc70e0b5fd8, 0xf73159a52a48ea2c,
0x4f70cb70764a2fab, 0x674523017f57e936, 0x98badcfeefcdab89, 0xc3d2e1f010325476,
0x7f4b93d37f50, 0x683f974357510ca2, 0x559e6a9b5a8a133, 0xce1fa8c242f162a9,
0xa6643132814dc94e, 0xadde7596feaa7041, 0xca0e6a6d6158d8b, 0xeb6b73d137578f64,
0x5a6da1000aeb278e, 0x24faa792f298903b, 0xf1b60146abc162e4, 0xf55006a5ea03353b,
0x7f4b93d37f50, 0x6e2c2f26e077f9de, 0x71a715d3942421dd, 0x8000000058200514, 0x0, 0x0, 0x0,
0x0, 0x2a000000000, 0x0, 0xa12a74c04e49e800, 0x0, 0x7f4b93d37f50, 0x2, 0x7f4b93d38020,
0x7f4b93d380e0, 0x2, 0x55f9217d5e00, 0x7f4bcee55d9d <k5_sha1_hash+125>,
0xfea223f37bbd812, 0xf8ec1ab4f83d9a2b, 0x2a0e21f8c65, 0xe077f9de00000000,
0x942421dd6e2c2f26, 0x5820051471a715d3, 0x80000000, 0x0, 0x0, 0x0, 0x0,
0x2a0, 0xfffffffffffffffe, 0xa12a74c04e49e800, 0x7f4bcf07fd60, 0x55f9038e6620,
0x55f8a467c400, 0x55f8cb6cc1e0, 0x0, 0x7f4bcee713e2 <krb5int_hmac_keyblock+418>,
0x40, 0x7f4b93d38170, 0x7f4b93d380e0, 0x5, 0x14970ea702, 0x55f9038e6620, 0x2,
0x40970ea702, 0x55f9217d5e00, 0x1d99616e00000002, 0x14970ea702, 0x55f9038e6620, 0x14,
0xa12a74c04e49e800, 0x20, 0x4, 0x7f4b93d38170, 0xa12a74c04e49e800, 0x4, 0x7f4b93d38170,
0x7f4bcf07ff00 <krb5int_enc_aes256>, 0x7f4bcee6a8f6 <krb5int_dk_decrypt+406>,
0x20, 0xc00000003, 0x55f90dde5e30, 0x7f4b93d380e0, 0x55f9038e6ca0, 0x0, 0x55f8cb6cd3e0,
0x55f8cb6cc1e0, 0x1400000000, 0x55f9038e6ca0, 0x5516000000, 0xa12a74c04e49e800, 0x20,
0x7f4bcf080b00, 0x1e, 0x7f4b93d38290, 0x41ed982026117e4f, 0xb3c887fe91a43017,
0x720d385d3bc2cd56, 0x50a61fb007a7a9ca, 0xa5d3e14dc70e09bd, 0xb6ead04df0228653,
0x674523017e49db9f, 0x98badcfeefcdab89, 0xc3d2e1f010325476, 0x7f4b93d38260,
0x7234e4c832eb2e6b, 0xbab56daaa29a3c86, 0xfda5dc82799faefa, 0x7b24e8909274e0fa,
0xe1bbebd5d6ad6668, 0x6e8fa927ce38a9b, 0x816585922b509df, 0xd2e805e35f84d38d,
0xdf7f8b184e7dfaca, 0x59585d6f1926cba8, 0x59d1d26715819ef8, 0x7f4b93d38260,
0x80a891fe5f5cfd30, 0x705ca35758499ae9, 0x80000000c5f084c6, 0x0, 0x0, 0x0, 0x0,
0x2a000000000, 0x7f4b93d38320, 0xa12a74c04e49e800, 0x55f977b74fc0, 0x7f4b93d38260, 0x2,
0x7f4b93d38330, 0x7f4b93d38400, 0x2, 0x55f94e35f680, 0xa12a74c04e49e800,
0xd7c43895b01e211d, 0x7f4bcee65b20 <k5_iov_cursor_put+176>, 0x2a0f3cb812d,
0xa12a74c04e49e800, 0x58499ae980a891fe, 0x7f4b93d38340, 0x55f89b92fad0, 0x7f4b93d38350,
0x7f4b93d38330, 0x7f4b93d382e0, 0x55f901527130, 0x7f4bcee547cd
<krb5int_aes_encrypt+813>, 0x20, 0xa7ff7ad44e49e800, 0x0, 0x7f4b93d382e0,
0x7f4b93d38470, 0x4, 0x10, 0x7f4b00000000, 0x4, 0x0, 0x4, 0x0, 0x21cafd39017982c4,
0x6e273f72669afee0, 0x530d2dd27b61963b, 0xff29c39b99a71313, 0x9ecaf11f4ef579a1,
0x8cb433b09c5edc92, 0x21cafd39017982c4, 0x6e273f72669afee0, 0x0, 0xa12a74c04e49e800,
0x7f4b93d38470...}
attrlistbuf =
"\300\207ӓK\177\000\000\301\207ӓK\177\000\000\000\000\000\000K\177\000\000\034_\032\315K\177\000\000;\232F\317K\177\000\000\020\211ӓK\177\000\000\000\000\000\000\000\000\000\000P\232F\317K\177\000\000Q\232F\317K\177\000\000\356a\032\315K\177\000\000\020\210ӓK\177",
'\000' <repeats 18 times>,
"\357\206ӓK\177\000\000\360\206ӓK\177\000\000\006\000\000\000K\177\000\000@\210ӓK\177\000\000A\210ӓK\177\000\000\000\000\000\000K\177\000\000\005\214D\317\000\000\000\000\000\207ӓK\177\000\000P\204x\006\371U\000
_\232F\317K\177", '\000' <repeats 19 times>,
"\350IN\300t*\241\000"...
attrliststr = <optimized out>
attrs = 0x0
rc = 0
internal_op = <optimized out>
basesdn = 0x55f8d6a316b0
sdn = 0x55f8e973bbc0
operation = 0x55f8a2ce14a0
referral = 0x0
proxydn = 0x0
proxystr = 0x0
proxy_err = <optimized out>
errtext = 0x0
nentries = 0
pnentries = 399583
flag_search_base_found = 1
flag_no_such_object = 0
flag_referral = 0
flag_psearch = <optimized out>
err_code = 0
ctrlp = 0x0
ctl_value = 0x0
iscritical = 0
be_name = <optimized out>
index = -1
sent_result = 0
pr_stat = 0
pb_conn = 0x0
pagesize = -1
estimate = 0
curr_search_count = <optimized out>
pr_be = <optimized out>
pr_search_result = <optimized out>
pr_idx = -1
orig_sdn = 0x0
free_sdn = 1
vattr_lock_acquired = 1
#12 0x00007f4bcf419697 in search_internal_callback_pb (pb=pb@entry=0x55f95513cd80,
callback_data=callback_data@entry=0x7f4b93d38b60, prc=prc@entry=0x7f4bcf418770
<internal_plugin_result_callback>, psec=psec@entry=0x7f4bcf4187c0
<internal_plugin_search_entry_callback>, prec=prec@entry=0x7f4bcf418780
<internal_plugin_search_referral_callback>) at
ldap/servers/slapd/plugin_internal_op.c:727
controls = 0x0
op = 0x55f8a2ce14a0
filter = 0x55f8aad34480
fstr = 0x55f8a3cd7180
"(&(changenumber>=-1)(targetuniqueid=1e89eb86-e20111e8-8820f96e-fc0c60a4))"
callback_handler_data = {p_res_callback = 0x7f4bcf418770
<internal_plugin_result_callback>, p_srch_entry_callback = 0x7f4bcf4187c0
<internal_plugin_search_entry_callback>, p_ref_entry_callback = 0x7f4bcf418780
<internal_plugin_search_referral_callback>, callback_data = 0x7f4b93d38b60}
scope = 2
ifstr = 0x55f8904cc540
"(&(changenumber>=-1)(targetuniqueid=1e89eb86-e20111e8-8820f96e-fc0c60a4))"
opresult = -817789072
rc = 0
tmp_attrs = 0x0
#13 0x00007f4bcf4198e8 in search_internal_pb (pb=pb@entry=0x55f95513cd80) at
ldap/servers/slapd/plugin_internal_op.c:592
psid = {rc = -1, num_entries = 399583, num_referrals = 0, entry_list_head =
0x55f8e74c92d0, referral_list_head = 0x0}
iterator = <optimized out>
tmp = 0x7f4b93d38b60
ref_iterator = <optimized out>
ref_tmp = 0x7f4bcf6af240 <global_server_id_plg>
i = <optimized out>
opresult = 0
pb_search_entries = 0x0
pb_search_referrals = 0x0
#14 0x00007f4bcf419b33 in slapi_search_internal_pb (pb=pb@entry=0x55f95513cd80) at
ldap/servers/slapd/plugin_internal_op.c:502
#15 0x00007f4bc394bd13 in sync_cookie_get_change_number (uniqueid=<optimized out>,
lastnr=<optimized out>) at ldap/servers/plugins/sync/sync_util.c:429
srch_pb = 0x55f95513cd80
cl_entry = <optimized out>
rv = 0
filter = 0x55f8904cc540
"(&(changenumber>=-1)(targetuniqueid=1e89eb86-e20111e8-8820f96e-fc0c60a4))"
entries = 0x7f4b93d38c6c
newnr = 18446744073709551615
uniqueid = <optimized out>
attr = 0x55f96ad77e60
val = 0x55f964c56860
#16 0x00007f4bc394bd13 in sync_cookie_update (sc=0x55f8c1753040,
ec=ec@entry=0x55f9565852c0) at ldap/servers/plugins/sync/sync_util.c:514
uniqueid = <optimized out>
attr = 0x55f96ad77e60
val = 0x55f964c56860
#17 0x00007f4bc394d66d in sync_send_results (arg=<optimized out>) at
ldap/servers/plugins/sync/sync_persist.c:643
noattrs = 0x0
ec = 0x55f9565852c0
attrsonly = 0
attrs = 0x0
ectrls = 0x55f906788450
chg_type = 2
req = 0x55f9704795c0
qnode = 0x55f97ea3d320
qnodenext = <optimized out>
conn_acq_flag = <optimized out>
conn = 0x55f882282ba8
op = 0x55f880953680
rc = <optimized out>
connid = 5052
opid = 0
#18 0x00007f4bcd1bdc5b in _pt_root () at /lib64/libnspr4.so
#19 0x00007f4bccb5dea5 in start_thread () at /lib64/libpthread.so.0
#20 0x00007f4bcc2099fd in clone () at /lib64/libc.so.
>
>>
>> regards
>> thierry
>> [...]