On Fri, Sep 18, 2020 at 05:02:39PM -0000, Boris Sukhinin via FreeIPA-users wrote:
> Would you mind having a look through the DS error and access logs on
> the affected system, to see if there are any clues about why the VLV
> index became inconsistent?
It seems there are no records of VLV-related errors in the DS logs.
The only messages in the error log that contain the term 'vlv' are either
backup-related (dblayer_copyfile, dblayer_copy_directory) or about
building the VLV index (ldbm_back_ldbm2index), which I initiated myself.
I didn't find any clues in the access logs either; the records look like
regular LDAP queries to me:
conn=130 op=11 SRCH base="ou=ca,ou=requests,o=ipaca" scope=1 filter="(requestState=*)" attrs=ALL
conn=130 op=11 SORT requestId
conn=130 op=11 VLV 5:0:0819990000 2:2 (0)
conn=130 op=11 RESULT err=0 tag=101 nentries=2 etime=0.0001728512
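To make records like the ones above easier to pick out of a large access log, here is a minimal Python sketch that groups records by connection and operation number and keeps only operations that carry a VLV record. The function name `vlv_operations` and the regular expression are illustrative assumptions, not part of any DS tooling, and the sketch makes no attempt to interpret the individual VLV fields.

```python
import re
from collections import defaultdict

# Matches "conn=N op=N KEYWORD rest". Anything before "conn=" (for example
# a timestamp) is ignored. This pattern is an assumption based on the
# excerpt above, not a complete grammar of the access log.
OP_RE = re.compile(r"conn=(\d+) op=(\d+) (\w+)\s*(.*)")


def vlv_operations(lines):
    """Group records by (conn, op) and return only operations with a VLV record."""
    ops = defaultdict(dict)
    for line in lines:
        m = OP_RE.search(line)
        if m:
            conn, op, keyword, rest = m.groups()
            ops[(int(conn), int(op))][keyword] = rest.strip()
    return {key: rec for key, rec in ops.items() if "VLV" in rec}


# The records quoted above, one per line.
sample = [
    'conn=130 op=11 SRCH base="ou=ca,ou=requests,o=ipaca" scope=1 '
    'filter="(requestState=*)" attrs=ALL',
    "conn=130 op=11 SORT requestId",
    "conn=130 op=11 VLV 5:0:0819990000 2:2 (0)",
    "conn=130 op=11 RESULT err=0 tag=101 nentries=2 etime=0.0001728512",
]

for (conn, op), rec in vlv_operations(sample).items():
    print(conn, op, rec["VLV"], "->", rec.get("RESULT"))
```

A listing like this only shows which searches used VLV and what result they returned; as noted, a search against an inconsistent index can still report err=0.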
My only guess is that the VLV index was damaged some time ago when BDB ran
out of file descriptors and panicked (which was caused by the default value
of nsslapd-maxdescriptors=1024 in cn=config being too low for our setup):
ERR - libdb - BDB2520 /var/lib/dirsrv/slapd-LOCAL-DOMAIN/db/log.0000000242: log file unreadable: Too many open files
ERR - libdb - BDB0061 PANIC: Too many open files
ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery
ERR - idl_new_fetch - idl_new.c (1); server stopping as database recovery needed.
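Since the index damage itself left no trace, one thing that can be watched for is the panic that may have preceded it. The following is a minimal Python sketch that scans error-log lines for the messages quoted above. The pattern list is taken only from those four lines and is certainly not exhaustive, and the name `find_db_panics` is hypothetical.

```python
import re

# Patterns derived from the error-log lines quoted above. This is an
# assumption about what is worth alerting on, not a complete list of
# BDB failure messages.
FATAL_PATTERNS = [
    re.compile(r"libdb - BDB\d+ PANIC"),
    re.compile(r"Too many open files"),
    re.compile(r"database recovery needed"),
]


def find_db_panics(lines):
    """Return (line_number, text) for every line matching any pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(p.search(line) for p in FATAL_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


# The four messages from the excerpt, one per line.
sample = [
    "ERR - libdb - BDB2520 /var/lib/dirsrv/slapd-LOCAL-DOMAIN/db/log.0000000242: "
    "log file unreadable: Too many open files",
    "ERR - libdb - BDB0061 PANIC: Too many open files",
    "ERR - libdb - BDB0060 PANIC: fatal region error detected; run recovery",
    "ERR - idl_new_fetch - idl_new.c (1); server stopping as database recovery needed.",
]

for number, text in find_db_panics(sample):
    print(number, text)
```

In practice the lines would come from the instance's error log file rather than a list. A hit does not prove that any index is damaged; it only flags an event after which checking the indexes would be reasonable.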
Boris, thank you for the info. Adding Thierry (DS engineer) -
do you think it could be related?
Cheers,
Fraser
I've restored the domain database from another replica but didn't do
anything about the CA database, which was probably a mistake.
It feels a little worrying that the logs show no signs of an inconsistent
VLV index, because that means we're unable to monitor and fix the issue
before it becomes a problem, as happened in our case.
> I also wrote a blog post about this scenario:
>
> https://frasertweedale.github.io/blog-redhat/posts/2020-09-17-dogtag-vlv-...
That's a great post, thank you!
Regards,
Boris