Hello All,
I hope you can forgive a request that I am sure doesn't contain enough
information. Please let me know what else I can add if you might
be able to help.
I have a problem with our installation of RHDS9 and practically
nothing in the logs to suggest where to look.
We have a multi-master pair, with DNS round robin to load balance.
Because of the problem I have updated DNS to point all traffic at the
working server, so I hope I can get this working again without
impacting the users. But until I know the cause, I'm concerned the same
failure may occur on the working server and prevent all logins. :(
We first noticed that replication was not working; now it seems that I
can't get slapd to start on one of the pair.
I have restarted dirsrv on both machines and rebooted both servers.
There is woefully little in the log files, but if there is a way to
increase logging levels I haven't found it yet. If there is, please
advise and I'll do that and post.
This is the info I have gathered so far. Please let me know what else
might help.
/usr/sbin/ns-slapd -v
389 Project
389-Directory/1.2.11.15 B2013.211.1952
dirsrv dir01 is stopped
There is no:
/var/run/dirsrv/slapd-dir01.pid
# service dirsrv start
*** Error: 1 instance(s) failed to start
The start-up runs the wait loop and finally exits, with the message above.
The errors log includes the message:
[01/Oct/2013:12:14:47 +0100] - 389-Directory/1.2.11.15 B2013.211.1952
starting up
[01/Oct/2013:12:14:47 +0100] - WARNING: userRoot: entry cache size
10485760B is less than db size 10739712B; We recommend to increase the
entry cache size nsslapd-cachememsize.
The start-up process leaves one slapd running:
# ps -ef |grep slapd
dsuser 12560 1 0 09:51 ? 00:00:03 /usr/sbin/ns-slapd -D
/etc/dirsrv/slapd-dir01 -i /var/run/dirsrv/slapd-dir01.pid -w
/var/run/dirsrv/slapd-dir01.startpid
but no working ns-slapd.
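
To record the state of those two files in one go, here is a minimal
Python sketch. The paths are the ones passed to ns-slapd with -i and -w
in the ps output above; the function name instance_run_state and its
exists parameter are hypothetical helpers of mine, and the sketch only
reports whether each file is present, without assuming what that means.

```python
import os


def instance_run_state(instance, run_dir="/var/run/dirsrv", exists=os.path.exists):
    """Report which of the instance's pid files are present.

    `exists` is injectable so the check can be exercised without
    touching the real filesystem.
    """
    # File names follow the -i and -w arguments seen in the ps output.
    pid_file = f"{run_dir}/slapd-{instance}.pid"
    startpid_file = f"{run_dir}/slapd-{instance}.startpid"
    return {
        "pid_file": pid_file,
        "pid_present": exists(pid_file),
        "startpid_file": startpid_file,
        "startpid_present": exists(startpid_file),
    }


if __name__ == "__main__":
    # "dir01" is the instance name from this post.
    for key, value in instance_run_state("dir01").items():
        print(f"{key}: {value}")
```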
I recognise that we need to tune the cache, but don't believe that it
will cause the start-up failure, just a performance hit. To tune via
the console I suspect I have to get it running first!
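
To put numbers on that cache warning, here is a minimal Python sketch.
It assumes the warning keeps the exact wording shown in the errors log
above; the function name cache_shortfall and the regular expression are
my own for illustration and are not part of 389 Directory Server.

```python
import re

# Matches the two byte counts in the entry cache warning as it appears
# in the errors log above. The pattern is an assumption based on that
# single line.
WARNING_RE = re.compile(
    r"entry cache size\s+(\d+)B\s+is less than db size\s+(\d+)B"
)


def cache_shortfall(log_text):
    """Return (cache_bytes, db_bytes, shortfall_bytes), or None if no match."""
    # The mail client wrapped the log line, so collapse whitespace first.
    flat = " ".join(log_text.split())
    match = WARNING_RE.search(flat)
    if match is None:
        return None
    cache_bytes = int(match.group(1))
    db_bytes = int(match.group(2))
    return cache_bytes, db_bytes, db_bytes - cache_bytes


if __name__ == "__main__":
    sample = (
        "[01/Oct/2013:12:14:47 +0100] - WARNING: userRoot: entry cache size\n"
        "10485760B is less than db size 10739712B; We recommend to increase the\n"
        "entry cache size nsslapd-cachememsize."
    )
    cache, db, gap = cache_shortfall(sample)
    print(f"cache={cache} db={db} shortfall={gap} bytes")
```

On the figures above the cache is 253952 bytes (248 KiB) short of the
database size, which fits my view that this is a tuning note rather
than the start-up failure itself.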
The working server shows the same error, along with:
[01/Oct/2013:12:16:26 +0100] slapi_ldap_bind - Error: could not send
bind request for id [cn=repman,cn=config] mech [SIMPLE]: error -1
(Can't contact LDAP server) 0 (unknown) 107 (Transport endpoint is not
connected)
Which makes sense.
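
For anyone comparing several of these lines, a minimal Python sketch
that splits the bind error into its parts. The regular expression is an
assumption built from the one line above, and the names
parse_bind_error, middle_code and middle_text are mine; I have not
checked what the second code/message pair represents, so it is only
carried through as-is.

```python
import re

# Pattern derived from the single slapi_ldap_bind line quoted above.
BIND_ERROR_RE = re.compile(
    r"could not send bind request for id \[(?P<bind_dn>[^\]]*)\] "
    r"mech \[(?P<mech>[^\]]*)\]: "
    r"error (?P<ldap_code>-?\d+) \((?P<ldap_text>[^)]*)\) "
    r"(?P<middle_code>-?\d+) \((?P<middle_text>[^)]*)\) "
    r"(?P<os_errno>\d+) \((?P<os_text>[^)]*)\)"
)


def parse_bind_error(log_text):
    """Return the fields of a bind error line as a dict, or None if no match."""
    # Collapse the mail client's line wrapping before matching.
    flat = " ".join(log_text.split())
    match = BIND_ERROR_RE.search(flat)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the three numeric codes from strings to integers.
    for key in ("ldap_code", "middle_code", "os_errno"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    sample = (
        "[01/Oct/2013:12:16:26 +0100] slapi_ldap_bind - Error: could not send\n"
        "bind request for id [cn=repman,cn=config] mech [SIMPLE]: error -1\n"
        "(Can't contact LDAP server) 0 (unknown) 107 (Transport endpoint is not\n"
        "connected)"
    )
    for key, value in parse_bind_error(sample).items():
        print(f"{key}: {value}")
```

Here the -1 (Can't contact LDAP server) and the 107 (Transport endpoint
is not connected) both point at the connection to the peer rather than
at the repman credentials, which is what I would expect while the other
master is down.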
The errors and access logs contain nothing else at all, so there is
nothing to indicate what is failing.
Any ideas on where I might start would be very welcome.
Many thanks, Ric.