[389-users] Problem starting and replicating RHDS9

Justin Edmands shockwavecs at gmail.com
Tue Oct 1 14:41:51 UTC 2013


On Tue, Oct 1, 2013 at 8:19 AM, Ric <389-users-list at vorticity.org> wrote:

> Hello All,
>
> I hope you can forgive a request which I am sure doesn't have enough
> information in it, please let me know what else I can add if you might
> be able to help.
>
> I have a problem with our installation of RHDS9 and practically
> nothing in the logs to suggest where to look.
>
> We have a multi-master pair, with DNS round robin to load balance.
> Due to the problem I have updated DNS to point all traffic to the
> working server so I hope I can get this working again without
> impacting the users. But while I don't know the reason I'm concerned
> it may occur on the working server and prevent all logins. :(
>
> We first noticed that replication was not working, now it seems that I
> can't get slapd to start on one of the pair.
> Have restarted both dirsrv and both servers.
>
> There is woefully little in the log files, but if there is a way to
> increase logging levels I haven't found it yet. If there is, please
> advise and I'll do that and post.
>
> This is the info I have gathered so far. Please let me know what else
> might help.
>
>
> /usr/sbin/ns-slapd -v
> 389 Project
> 389-Directory/1.2.11.15 B2013.211.1952
>
> dirsrv dir01 is stopped
> There is no:
> /var/run/dirsrv/slapd-dir01.pid
>
> # service dirsrv start
>   *** Error: 1 instance(s) failed to start
>
> The start-up runs the wait loop and finally exits, with the message above.
> The errors log includes the message:
>
> [01/Oct/2013:12:14:47 +0100] - 389-Directory/1.2.11.15 B2013.211.1952
> starting up
> [01/Oct/2013:12:14:47 +0100] - WARNING: userRoot: entry cache size
> 10485760B is less than db size 10739712B; We recommend to increase the
> entry cache size nsslapd-cachememsize.
>
>
> The start-up process leaves one slapd running:
> # ps -ef |grep slapd
> dsuser   12560     1  0 09:51 ?        00:00:03 /usr/sbin/ns-slapd -D
> /etc/dirsrv/slapd-dir01 -i /var/run/dirsrv/slapd-dir01.pid -w
> /var/run/dirsrv/slapd-dir01.startpid
>
> but no working ns-slapd.
>
> I recognise that we need to tune the cache, but don't believe that it
> will cause the start-up failure, just a performance hit. To tune via
> the console I suspect I have to get it running first!
> The working server shows the same error, along with:
>
> [01/Oct/2013:12:16:26 +0100] slapi_ldap_bind - Error: could not send
> bind request for id [cn=repman,cn=config] mech [SIMPLE]: error -1
> (Can't contact LDAP server) 0 (unknown) 107 (Transport endpoint is not
> connected)
>
> Which makes sense.
>
> The logs errors and access provide no other content at all, so nothing
> to indicate what is failing.
>
> Any ideas where I might start will be greatly welcomed.
>
> Many thanks, Ric.
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users


I'm surprised that the failing node logs nothing useful about the startup
failure. Try changing the ownership of the /var/run/dirsrv directory to
root:nobody, and then to nobody:nobody. Also remove any stale PID files from
within that directory.
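
Before removing a PID file it may be worth confirming that it is really
stale. The following is only a rough sketch, not a tested procedure: the
function name pidfile_state is made up for illustration, the two paths are
the ones from your post, and the /proc check assumes Linux.

```shell
#!/bin/sh
# Print "missing", "stale", or "live" for a given PID file.
# pidfile_state is a hypothetical helper name, not part of 389 DS.
pidfile_state() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 0
    fi
    pid=$(tr -d '[:space:]' < "$pidfile")
    # An empty or non-numeric PID file cannot refer to a live process.
    case $pid in
        ''|*[!0-9]*) echo "stale"; return 0 ;;
    esac
    # kill -0 only probes for existence; the /proc test (Linux-only) covers
    # processes owned by another user, where kill -0 is refused.
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        echo "live"
    else
        echo "stale"
    fi
}

# Paths taken from the original post; adjust to your instance name.
for f in /var/run/dirsrv/slapd-dir01.pid /var/run/dirsrv/slapd-dir01.startpid; do
    printf '%s: %s\n' "$f" "$(pidfile_state "$f")"
done
```

A file reported as "stale" should be safe to remove; a "live" one points at a
process that is still around, such as the leftover ns-slapd in your ps output.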

A few things to start with:
 - Check for differences between the dse.ldif files on the two nodes.
Node-specific entries such as replication agreements will differ normally;
look for anything else that changed on the non-starting node. Which logs are
you looking at?
 - Check the ownership of the files and directories that Directory Server
uses. For 389 DS it should be nobody:nobody.
 - Check the location and status of the PID files, such as
/var/run/dirsrv/admin-serv.pid and /var/run/dirsrv/slapd-dirsrv1.pid.
 - Check the logs of the working node around the time of the initial failure.
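
For the dse.ldif comparison, a plain diff tends to be noisy because LDIF
folds long lines and the timestamps always differ. Here is a minimal sketch
of one way to cut that noise down. The function names and the
/tmp/dse.working.ldif path are hypothetical, and the choice to ignore only
modifyTimestamp and createTimestamp is my assumption; extend the filter as
needed.

```shell
#!/bin/sh
# Join LDIF continuation lines: a line starting with a single space
# continues the previous line.
unfold_ldif() {
    awk '
        /^ / { buf = buf substr($0, 2); next }
        { if (NR > 1) print buf; buf = $0 }
        END { if (NR > 0) print buf }
    ' "$1"
}

# Diff two dse.ldif files after unfolding and dropping timestamp attributes.
# Returns diff's exit status: 0 if nothing else differs, 1 if something does.
compare_dse() {
    a=$(mktemp) && b=$(mktemp) || return 2
    unfold_ldif "$1" | grep -v -i -E '^(modifyTimestamp|createTimestamp):' > "$a"
    unfold_ldif "$2" | grep -v -i -E '^(modifyTimestamp|createTimestamp):' > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Example (assumes the working node's file was copied to /tmp first):
#   compare_dse /etc/dirsrv/slapd-dir01/dse.ldif /tmp/dse.working.ldif
```

Replication agreements and other node-specific entries will still show up in
the output; what you are looking for is anything beyond those.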

A few for the hopeful:
- Do you have backups? Mine are in "/var/lib/dirsrv/slapd-baldirsrv1/bak".
- Can you build a new node and join it to the multi-master setup? I think it
supports 20+ masters now. Adding more is fairly easy once you have worked
out the kinks.