[389-users] importing large subtree crashes ns-slapd
Rich Megginson
rmeggins at redhat.com
Thu Mar 4 03:30:19 UTC 2010
Christopher Wood wrote:
> I'm just getting started with 389 Directory Server (at work), and I've run into an issue that I'm not certain how to troubleshoot. I would greatly appreciate any assistance or tips you could offer, especially on where to look to see what's failing.
>
> Also, I apologize in advance for changing strings related to my employer's directory names and such, as I'm not comfortable with leaking that level of information to a public list.
>
As well you should be - you should always obscure sensitive information
like this.
>
> Overview:
>
> Initializing a large subtree from NDS 6.2 crashes ns-slapd, but other subtrees are fine.
>
>
> Top-Level Questions:
>
> 1) How do I stop ns-slapd from crashing?
>
Good question.
> 2) How do I figure out what precisely is causing the crash? (With various levels of debug logging I get the same log entry.)
>
You've already used the TRACE level (1) for logging - that's as verbose
as it gets for this particular operation. Next step would be to try to
get a core file.
> 3) Is it possible to simply import my initialization ldif without duplication checks?
>
No.
>
> Background:
>
> At work we have NDS 6.2 (single master on a physical server, virtual machine slaves), and would like to move our directories intact to a 389 2.6 installation via replication.
>
What platform/OS? 32-bit or 64-bit? By NDS 6.2 I'm assuming you mean
Netscape Directory Server - by 2.6 I'm assuming you mean 1.2.6.a1 (a2
should be hitting the mirrors tomorrow).
> I already have replicated several of our NDS 6.2 subtrees to 389 2.6 with no difficulties.
>
> I compiled our 389 installation from the source packages downloaded from http://directory.fedoraproject.org/wiki/Source.
Did you grab 389-ds-base 1.2.6.a1 or 1.2.6.a2?
What compiler flags did you use?
Do you have a core file? If so, try using gdb:
    gdb /path/to/ns-slapd /path/to/core.pid
Once in gdb, type the "where" command:
    (gdb) where
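If you would rather capture the stack trace non-interactively (for example, to paste it into a reply), the same gdb session can be scripted. The following is only a sketch: the function names are hypothetical, the paths are the placeholders used above, and it assumes a gdb that accepts the -batch and -ex options.

```python
import subprocess


def gdb_backtrace_command(binary, core):
    """Build a gdb command line that prints the stack trace and then exits.

    -batch makes gdb quit once its commands have run; -ex runs one command,
    here the same "where" you would type at the (gdb) prompt.
    """
    return ["gdb", "-batch", "-ex", "where", binary, core]


def get_backtrace(binary, core):
    """Run gdb against the core file and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling get_backtrace("/path/to/ns-slapd", "/path/to/core.pid") should return the same text that "where" prints interactively, provided gdb is installed and the core file matches the binary.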
> The underlying platform is:
>
> $ uname -a
> Linux cwlab-02.mycompany.com 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux
> $ cat /etc/redhat-release
> CentOS release 5.4 (Final)
>
> $ free
>              total       used       free     shared    buffers     cached
> Mem:       3894000    1336012    2557988          0     144944    1004716
> -/+ buffers/cache:     186352    3707648
> Swap:      2031608          0    2031608
>
>
> Procedure To Crash 389's ns-slapd:
>
> a) In the NDS 6.2 admin console, create a new replication agreement for the "o=This Big Net" subtree, and choose to "Create consumer initialization file".
>
> b) Copy the file to the 389 server.
>
> c) In the 389 2.6 admin console for the Directory Server, in the Configuration tab (Data -> o=This Big Net -> dbRoot), right-click and choose "Initialize Database". Use the ldif file copied over.
>
> The ns-slapd process crashes, and I always get this in /opt/dirsrv/var/log/dirsrv/slapd-cwlab-02/errors as the last two lines:
>
> [03/Mar/2010:12:50:04 -0500] - import ldapAuthRoot: Processing file "/home/cwood/tbn.ldif"
> [03/Mar/2010:12:50:04 -0500] - => str2entry_dupcheck
>
>
> Other Details:
>
>
> I found two bugs with the str2entry_dupcheck string in it, but they don't seem pertinent:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=548115
> https://bugzilla.redhat.com/show_bug.cgi?id=243488
>
>
> This says that str2entry_dupcheck could be about two things:
>
> http://docs.sun.com/source/816-6699-10/ax_errcd.html
>
> "While attempting to convert a string entry to an LDAP entry, the server found that the entry has no DN."
>
> "The server failed to add a value to the value tree."
>
> (But this is an exported database from NDS 6.2, and I'm fairly sure, without reading them all, that every entry will have a DN.)
>
The log message
[03/Mar/2010:12:50:04 -0500] - => str2entry_dupcheck
is just trace information, not a report of a problem or error.
Does the crash happen almost immediately? Or does it take a while? If
the problem happens quickly, it would be worthwhile to scan the first
couple of dozen entries looking for things like
- entries without a DN
- attributes without a value
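A scan like that can be done by eye, but a small script makes it repeatable. This is a minimal sketch, not a full LDIF parser: the function names are made up for illustration, it only understands blank-line record separators, "#" comments, and single-space continuation lines, and it flags any "attr:" line with nothing after the colon even though an empty value can be legal LDIF.

```python
from itertools import islice


def iter_ldif_entries(lines):
    """Yield (first_line_number, logical_lines) for each record in an LDIF file."""
    entry, start, in_comment = [], None, False
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip("\r\n")
        if not line:                       # a blank line ends the current record
            if entry:
                yield start, entry
            entry, start, in_comment = [], None, False
            continue
        if line.startswith(" "):           # continuation of the previous line
            if not in_comment and entry:
                entry[-1] += line[1:]
            continue
        in_comment = line.startswith("#")
        if in_comment:                     # skip comment lines
            continue
        if start is None:
            start = lineno
        entry.append(line)
    if entry:
        yield start, entry


def check_entry(lines):
    """Return a list of problem descriptions for one record."""
    problems = []
    # A "version:" line is allowed at the very top of an LDIF file.
    if lines and lines[0].lower().startswith("version:"):
        lines = lines[1:]
    if not lines:
        return problems
    if not lines[0].lower().startswith("dn:"):
        problems.append("entry has no DN")
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            problems.append(f"malformed line: {line!r}")
        elif value.lstrip(":<").strip() == "":   # handles "attr:", "attr::", "attr:<"
            problems.append(f"attribute {name!r} has no value")
    return problems


def scan_ldif(lines, limit=24):
    """Check the first `limit` records and return (line_number, problem) pairs."""
    report = []
    for start, entry in islice(iter_ldif_entries(lines), limit):
        for problem in check_entry(entry):
            report.append((start, problem))
    return report
```

Because it reads lazily and stops after `limit` records, something like scan_ldif(open("/home/cwood/tbn.ldif")) should return quickly even on a 755M file. Each reported line number is the first line of the offending record.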
>
> If 389 is trying to check for duplicate entries, perhaps there are simply too many DNs?
>
> $ grep '^dn:' tbn.ldif | wc -l
> 636985
> $ ls -lh tbn.ldif
> -rw-r--r-- 1 cwood cwood 755M Mar 3 11:24 tbn.ldif
>
No. The server should be able to handle this much data easily. And it
must check for duplicate entries.
>
> Per the instructions here:
>
> http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting
>
> I set my debug logging first to 24579:
>
> 1 Trace function calls
> 2 Debug packet handling
> 8192 Replication debugging
> 16384 Critical messages
>
> Then for the next try at reading logs I set it to 90115, the above plus:
>
> 65536 Plug-in debugging
>
> However, every time the log ended with the same set of lines noted above.
>
Level 1 (Trace) is really the best setting for this particular problem,
and as you have found, even that tells you very little here.
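For reference, the log level is just the sum (bitwise OR) of the individual flags, which is how 24579 and 90115 come about. The sketch below only knows the five flags quoted in this thread; the server defines others, and the helper names are hypothetical.

```python
# Error log level flags as quoted in this thread (not the complete set).
LOG_LEVELS = {
    1: "Trace function calls",
    2: "Debug packet handling",
    8192: "Replication debugging",
    16384: "Critical messages",
    65536: "Plug-in debugging",
}


def combine(*flags):
    """OR the given flags together into a single log level value."""
    level = 0
    for flag in flags:
        level |= flag
    return level


def decode(level):
    """List the names of the known flags that are set in a log level value."""
    return [name for bit, name in sorted(LOG_LEVELS.items()) if level & bit]
```

Here combine(1, 2, 8192, 16384) gives 24579, and adding 65536 gives 90115. Since 1 (Trace) is included in both, neither setting would be expected to log more about the import than Trace alone.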
I think the next step would be to build the server with full debugging
information (use -g and omit -O2 or any other -Ox) and get a stack trace
with full debug information.