Hello,
We recently started using the 389 Directory Server and are seeing an odd problem where searches start returning the wrong DN for a small number of entries in our directory. For example, our users are in ou=People,dc=acs,dc=albany,dc=edu but for a few user entries the server starts returning the DN as "uid=(username),dc=acs,dc=albany,dc=edu," "uid=(username),ou=Group,dc=acs,dc=albany,dc=edu," or sometimes even something like "uid=(username),=albany,,dc=acs,dc=albany,dc=edu." The problem seems to be in memory, as restarting the directory server fixes the problem temporarily but then it will start happening again with a different set of entries. A db2ldif extract made while the server is in this state does not contain any of the bad DNs.
I tried upgrading to 1.2.6.1-2. Since the upgrade, this has not happened for any user entries, but has happened for group entries.
Has anyone else run into this problem? We are running 389 Directory Server on Oracle Linux 5.5 and using the x86_64 389 DS packages from EPEL. We have about 125,000 entries in our directory, most of which are in ou=People. We recently migrated from the Sun Directory Server 5.2.
Thanks, Eric
Eric Torgersen Assistant Director ITS Systems Management & Operations 518-437-3665
Hello, Eric.
Is it possible to share any sample DNs with us? (They don't need to be the real values; you could replace a real uid with something else. But if there are any special characters in the string, we'd like to know about them.)
And if you could tell us your use cases, it'd be a big help. For instance, do you modify the entries or rename them or just search them?
Thank you for your help, --noriko
-- 389 users mailing list 389-users@lists.fedoraproject.org https://admin.fedoraproject.org/mailman/listinfo/389-users
Noriko,
I did not notice any special characters, but I have not been checking specifically for non-printable characters; the next time I run into this problem I will check for that. I am monitoring for this problem by periodically writing the output of a search for objectclass=* (run as Directory Manager) to a file, and then using grep to identify DNs that are not in the expected OU. We first discovered the problem when a user reported not being able to log in to some applications. It turned out that the user's DN had "shifted" up one level out of ou=People, so applications that are configured to look only in ou=People for users could not find them.
Our base DN is dc=acs,dc=albany,dc=edu. Here are some sample DNs, with the uid substituted:
user examples:
dn: uid=ab123456,dc=acs,dc=albany,dc=edu
dn: uid=da123456,ou=People,ou=Netgroup,dc=acs,dc=albany,dc=edu
group examples:
dn: cn=wwwcarey,=albany,,dc=acs,dc=albany,dc=edu
dn: cn=mongrp,dc=acs,dc=albany,dc=edu
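For readers who want to run a similar check, here is a minimal sketch of the grep-style monitoring described above. It is not the script used in this thread. The function name check_user_dns is hypothetical, the parent DN is taken from the examples above, and the input is assumed to be unfolded LDIF (one "dn: " line per entry) on stdin. RDN values containing escaped commas are not handled.

```shell
#!/bin/sh
# check_user_dns (hypothetical helper): read unfolded LDIF on stdin and
# print every uid= DN whose parent is not the expected People container.
# Assumption: user entries always sit directly under EXPECTED_PARENT.
EXPECTED_PARENT="ou=People,dc=acs,dc=albany,dc=edu"

check_user_dns() {
    awk -v parent="$EXPECTED_PARENT" '
        /^dn: uid=/ {
            dn = substr($0, 5)          # drop the leading "dn: "
            p = dn
            sub(/^[^,]*,/, "", p)       # strip the first RDN to get the parent DN
            # DNs are compared case-insensitively.
            if (tolower(p) != tolower(parent)) print dn
        }'
}
```

Piping a saved search result through it (for example, check_user_dns < search-output.ldif, where the file name is a placeholder) prints only the user DNs that have drifted out of ou=People.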
Here is an example of what one of our typical user entries looks like, with attributes:
dn: uid=et123456,ou=People,dc=acs,dc=albany,dc=edu
sambaPwdLastSet: 1286193602
albanyEduPersonAffiliation: CEMP
albanyEduPersonAffiliation: EMPA
eduPersonPrimaryAffiliation: staff
eduPersonScopedAffiliation: staff@albany.edu
eduPersonAffiliation: staff
employeeNumber: 999999999
primaryDeptUADContact: Doe,John
secondDeptCSSContact: Doe,John
primaryDeptCSSContact: Smith, Bob
campusAddress: Some Building Rm 5
albanyEduPersonPrimarySubdivision: Systems Management & Operation
albanyEduPersonPrimaryDivision: Information Technology Srvcs
telephoneNumber: +1 518 4373665
mail: eric@albany.edu
albanyEduPersonPrimaryDept: 99999
eduPersonPrincipalName: et123456@albany.edu
preferredDisplayName: Eric Torgersen
eduPersonNickName: Eric
albanyEduPersonMiddleInitial: A
givenName: Eric
sn: Torgersen
cn: Eric Torgersen
displayName: Eric Torgersen
ITSLIHomeDirectory: /home/et123456
loginShell: /bin/csh
artmac-homeDirectory: /Users/LDAP/et123456
apple-generateduid: 008A0AAE-3C41-197D-CAA7-A5C0475D749D
authAuthority: ;basic;
sambaSID: S-1-5-21-1234567890-3016994024-4563145123-1234
albanyEduPersonNopwrset:
gidNumber: 1001
RITHomeDirectory: /network/rit/home/et123456
albanyEduPersonMboxStatus: active
albanyEduPersonNetworkAccess:
uid: et123456
objectClass: account
objectClass: posixAccount
objectClass: top
objectClass: shadowAccount
objectClass: inetorgperson
objectClass: person
objectClass: organizationalPerson
objectClass: albanyeduperson
objectClass: albanyedulinuxaccount
objectClass: eduperson
objectClass: sunyperson
objectClass: sambaSamAccount
objectClass: apple-user
objectClass: artmac-user
objectClass: splunkUser
uidNumber: 1002
homeDirectory: /home2/c/e/et123456
gecos: Eric Torgersen,,,
We have a number of web applications that use LDAP for authentication, by searching for the user and then binding. We also use our directory as a naming service for Solaris, Linux and Mac hosts. In addition, we have FreeRADIUS servers using our directory to authenticate users on our wireless network. As for changes to the directory information, we rename entries very rarely, and have not done any renames since the migration to 389. Most of the modify operations we see are password changes. We have the Samba PAM module on our main Solaris login host, which tends to be quite chatty: it deletes and then adds back the sambaPwdLastSet attribute for users each time they log in there, even if there is no corresponding change to the sambaNTPassword attribute.
When I have come across the incorrect DNs, I have checked the audit log and not found any correlation between the affected DNs and recent modify operations.
We have a single master and two read-only replicas. I have seen some of the incorrect DNs replicate over while performing a consumer initialization, but otherwise have not seen this problem on the replicas.
Thanks in advance for any help you can provide.
Eric
Eric Torgersen Assistant Director ITS Systems Management & Operations 518-437-3665
Eric,
Thanks for your input. It contains lots of useful information. Can I ask for some more details about this section? Is the corrupted DN problem observed only on a replica after a consumer initialization, or is it observed on the master as well? When incorrect DNs are detected during consumer initialization, are they rejected as invalid or just passed through? Were the events logged in the error log? Did you have a chance to search for the entry with the corrupted DN (both the corrupted and the original DN) on the master at that time?
Thanks! --noriko
Eric Torgersen wrote:
We have a single master and two read-only replicas. I have seen some of the incorrect DNs replicate over while performing a consumer initialization, but otherwise have not seen this problem on the replicas.
Noriko,
Please see my comments below.
On Thu, 14 Oct 2010, Noriko Hosoi wrote:
Eric,
Thanks for your input. It contains lots of useful information. Can I ask for some more details about this section? Is the corrupted DN problem observed only on a replica after a consumer initialization, or is it observed on the master as well?
It is mainly observed on the master. I think I only observed it on the replica because I happened to be doing an initialization at a time when the master had some of the corrupted DNs in memory.
On the master, the corrupted DNs can be cleared by a restart; they seem to be in memory only. To fix the replica, I had to reinitialize it again after restarting the master, because the entries with corrupt DNs had been written to the replica's disk. I think the source of the error on the replica was just that it was passed bad information from the master.
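The in-memory versus on-disk comparison described here (live search results against a db2ldif export) can be sketched roughly as follows. This is an illustration, not a procedure used in the thread: the function name dns_only_in_search is hypothetical, and both arguments are assumed to be unfolded LDIF files containing plain "dn: " lines.

```shell
#!/bin/sh
# dns_only_in_search (hypothetical helper):
#   $1 = LDIF captured from a live search
#   $2 = LDIF produced by an export such as db2ldif
# Prints the (lowercased) dn lines that appear in $1 but not in $2.
dns_only_in_search() {
    search_dns=$(mktemp) || return 1
    export_dns=$(mktemp) || { rm -f "$search_dns"; return 1; }

    # Normalise case and sort so that comm can compare the two lists.
    grep -i '^dn: ' "$1" | LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u > "$search_dns"
    grep -i '^dn: ' "$2" | LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u > "$export_dns"

    # -23 suppresses lines unique to the export and lines common to both.
    LC_ALL=C comm -23 "$search_dns" "$export_dns"

    rm -f "$search_dns" "$export_dns"
}
```

Any DN it prints is one the running server returns but the export does not contain, which is the symptom reported in this thread.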
When incorrect DNs are detected during consumer initialization, are they rejected as invalid or just passed through?
Many were just passed through because they were valid but incorrect DNs, as in the case where the ou=Group part was dropped from the DN:
dn: cn=mongrp,dc=acs,dc=albany,dc=edu
A few were rejected because the DN was invalid, as in the case of cn=wwwcarey,=albany,,dc=acs,dc=albany,dc=edu
Were the events logged in the error log?
For the rejected DNs, yes:
[14/Oct/2010:10:35:35 -0400] NSMMReplicationPlugin - multimaster_be_state_change : replica dc=acs,dc=albany,dc=edu is going offline; disabling replication
[14/Oct/2010:10:35:35 -0400] - WARNING: Import is running with nsslapd-db-private-import-mem on; No other process is allowed to access the database
[14/Oct/2010:10:35:56 -0400] - import userRoot: Processed 50368 entries -- average rate 2398.5/sec, recent rate 2398.4/sec, hit ratio 0%
[14/Oct/2010:10:36:00 -0400] - import userRoot: WARNING: Skipping entry "cn=wwwchsr,ou=Group,ou=Group,dc=acs,dc=albany,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
[14/Oct/2010:10:36:01 -0400] - import userRoot: WARNING: bad entry: ID 57588
...
[14/Oct/2010:10:36:08 -0400] - import userRoot: WARNING: Skipping entry "cn=wwwtmu,ou=Group,=albany,dc=edu,dc=acs,dc=albany,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
[14/Oct/2010:10:36:09 -0400] - import userRoot: WARNING: bad entry: ID 72233
...
[14/Oct/2010:10:36:39 -0400] - import userRoot: Processed 107643 entries -- average rate 1708.6/sec, recent rate 1363.7/sec, hit ratio 95%
[14/Oct/2010:10:36:47 -0400] - import userRoot: WARNING: Skipping entry "cn=wwwcarey,=albany,,dc=acs,dc=albany,dc=edu" which has no parent, ending at line 0 of file "(bulk import)"
[14/Oct/2010:10:36:48 -0400] - import userRoot: WARNING: bad entry: ID 116490
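To pull just the rejected DNs out of an error log like the one above, something along these lines may help. It is only a sketch: the function name skipped_import_dns is hypothetical, it keys on the literal text Skipping entry "..." seen in this log, and it relies on grep -o, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
# skipped_import_dns (hypothetical helper): read error-log text on stdin
# and print the DN from every 'Skipping entry "<dn>"' warning, one per line.
skipped_import_dns() {
    # grep -o emits each match separately, even if several share a line.
    grep -o 'Skipping entry "[^"]*"' \
      | sed -e 's/^Skipping entry "//' -e 's/"$//'
}
```

The resulting list can then be compared against the DNs that searches on the master return at the same moment.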
Did you have a chance to search for the entry with the corrupted DN (both the corrupted and the original DN) on the master at that time?
Yes. The same DNs showed up corrupted on the master, until I restarted it. Then they appeared fine.
So far, the corrupted DNs seem to be happening less frequently with 1.2.6.1, as compared to 1.2.6. We have been on 1.2.6.1 since yesterday evening, and only had this happen once so far with some of the group entries. On 1.2.6, this was usually happening multiple times per day, and affecting user entries.
Thanks, Eric
Eric,
Thank you for your response. Just to make sure your db is not broken, could you run these commands and look for any corrupted DIT links while the DN corruption is being observed? The output will be large, so I recommend redirecting it to a file. I think we are only interested in the area around "ou=People,dc=acs,dc=albany,dc=edu" and "ou=Group,dc=acs,dc=albany,dc=edu". Since restarting the server fixes the problem, I'm hoping you won't see any corruption at this level.
$ dbscan -f /var/lib/dirsrv/slapd-YOURID/db/YOURBACKEND/id2entry.db4 | egrep "dn:|entryid:|parentid:"
rdn: dc=acs,dc=albany,dc=edu
entryid: 1
rdn: ou=People
parentid: 1
entryid: 2
[...]
$ dbscan -f /var/lib/dirsrv/slapd-YOURID/db/YOURBACKEND/entryrdn.db4 -k "ou=People,dc=acs,dc=albany,dc=edu"
ou=People,dc=acs,dc=albany,dc=edu
ID: #; RDN: "ou=People,dc=acs,dc=albany,dc=edu"; NRDN: "ou=people,dc=acs,dc=albany,dc=edu"
[...]
$ dbscan -f /var/lib/dirsrv/slapd-YOURID/db/YOURBACKEND/entryrdn.db4 -k "ou=Group,dc=acs,dc=albany,dc=edu"
ou=Group,dc=acs,dc=albany,dc=edu
ID: #; RDN: "ou=Group,dc=acs,dc=albany,dc=edu"; NRDN: "ou=group,dc=acs,dc=albany,dc=edu"
[...]
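Because the filtered id2entry output is large, one way to scan it for a broken parent link is sketched below. This is an assumption-laden illustration rather than a dbscan feature: the function name check_parent_links is hypothetical, and the input is assumed to be the "entryid:" and "parentid:" lines in the shape shown in the first example above.

```shell
#!/bin/sh
# check_parent_links (hypothetical helper): read filtered dbscan output on
# stdin and report every parentid that does not match any entryid.
# Exits non-zero if at least one dangling parent link is found.
check_parent_links() {
    awk '
        BEGIN { bad = 0 }
        $1 == "entryid:"  { seen[$2] = 1 }      # IDs that exist
        $1 == "parentid:" { parents[$2] = 1 }   # IDs referenced as a parent
        END {
            for (p in parents)
                if (!(p in seen)) {
                    print "parentid " p " has no matching entryid"
                    bad = 1
                }
            exit bad
        }'
}
```

It can be fed the redirected output file, for example check_parent_links < id2entry.txt, where the file name is a placeholder. No output means every parentid in the dump points at an entry that exists.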
Thanks! --noriko