I am looking for help isolating a recurring problem with our 389 DS version 1.2.8.3 installation. We use the directory server to authenticate users on our website. Every couple of days our website login application starts reporting that user authentication timed out, and the timeouts become more frequent as time passes. Our current workaround is to restart the directory server, which clears the problem for a couple of days before the timeouts return. I traced one application timeout back to the DS access logs and found the following entry at the same time:

 

[14/Mar/2012:10:23:01 -0500] conn=14730 op=-1 fd=1093 closed error 104 (Connection reset by peer) - TCP connection reset by peer.

 

I looked through the older logs, and the only other time this conn/fd was used was two days earlier. Here are the access log entries:

 

[12/Mar/2012:14:33:06 -0500] conn=14730 fd=1093 slot=1093 connection from 10.1.xx.xx to 10.1.xx.xx

[12/Mar/2012:14:33:06 -0500] conn=14730 op=0 BIND dn="uid,dc=domain,dc=com" method=128 version=3

[12/Mar/2012:14:33:06 -0500] conn=14730 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=theManager,dc=domain,dc=com"

[12/Mar/2012:14:33:06 -0500] conn=14730 op=1 SRCH base="ou=users,ou=external,dc=domain,dc=com" scope=2 filter="(&(uid=xxxxx)(objectClass=inetUser))" attrs="1.1"

[12/Mar/2012:14:33:06 -0500] conn=14730 op=1 RESULT err=0 tag=101 nentries=1 etime=0

[12/Mar/2012:14:33:06 -0500] conn=14730 op=2 BIND dn="uid=xxxxx,ou=users,ou=external,dc=domain,dc=com" method=128 version=3

[12/Mar/2012:14:33:06 -0500] conn=14730 op=2 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=xxxxx,ou=users,ou=external,dc=domain,dc=com"

[12/Mar/2012:14:33:06 -0500] conn=14730 op=3 BIND dn="uid,dc=domain,dc=com" method=128 version=3

[12/Mar/2012:14:33:06 -0500] conn=14730 op=3 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=theManager,dc= domain,dc=com"

[12/Mar/2012:14:35:20 -0500] conn=14730 op=4 SRCH base="ou=groups,ou=external,dc= domain,dc=com" scope=2 filter="(&(cn=domain)(|(objectClass=groupOfURLs)(objectClass=groupOfNames)))" attrs="1.1"

[12/Mar/2012:14:35:20 -0500] conn=14730 op=4 RESULT err=0 tag=101 nentries=1 etime=0

[12/Mar/2012:14:35:20 -0500] conn=14730 op=5 SRCH base="ou=groups,ou=external,dc= domain,dc=com" scope=2 filter="(&(member=cn=domain,ou=groups,ou=external,dc=domain,dc=com)(objectClass=groupOfNames))" attrs="cn"

[12/Mar/2012:14:35:20 -0500] conn=14730 op=5 RESULT err=0 tag=101 nentries=0 etime=0

[12/Mar/2012:14:36:50 -0500] conn=14730 op=6 SRCH base="ou=users,ou=external,dc=domain,dc=com" scope=2 filter="(&(uid=xxxxxx)(objectClass=inetUser))" attrs="1.1"

[12/Mar/2012:14:36:50 -0500] conn=14730 op=6 RESULT err=0 tag=101 nentries=1 etime=0

[12/Mar/2012:14:36:50 -0500] conn=14730 op=7 BIND dn="uid=xxxxxx,ou=users,ou=external,dc=domain,dc=com" method=128 version=3

[12/Mar/2012:14:36:50 -0500] conn=14730 op=7 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=xxxxxx,ou=users,ou=external,dc=domain,dc=com"

[12/Mar/2012:14:36:50 -0500] conn=14730 op=8 BIND dn="uid=theManager,dc=domain,dc=com" method=128 version=3

[12/Mar/2012:14:36:50 -0500] conn=14730 op=8 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=theManager,dc=domain,dc=com"

[12/Mar/2012:14:37:02 -0500] conn=14730 op=9 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(cn=domain)(|(objectClass=groupOfURLs)(objectClass=groupOfNames)))" attrs="1.1"

[12/Mar/2012:14:37:02 -0500] conn=14730 op=9 RESULT err=0 tag=101 nentries=1 etime=0

[12/Mar/2012:14:37:02 -0500] conn=14730 op=10 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(member=cn=domain,ou=groups,ou=external,dc=domain,dc=com)(objectClass=groupOfNames))" attrs="cn"

[12/Mar/2012:14:37:02 -0500] conn=14730 op=10 RESULT err=0 tag=101 nentries=0 etime=0

[12/Mar/2012:14:39:35 -0500] conn=14730 op=11 SRCH base="ou=users,ou=external,dc=domain,dc=com" scope=2 filter="(&(uid=xxxxxx)(objectClass=inetUser))" attrs="1.1"

[12/Mar/2012:14:39:35 -0500] conn=14730 op=11 RESULT err=0 tag=101 nentries=1 etime=0

[12/Mar/2012:14:39:35 -0500] conn=14730 op=12 BIND dn="uid=xxxxxx,ou=users,ou=external,dc=domain,dc=com" method=128 version=3

[12/Mar/2012:14:39:35 -0500] conn=14730 op=12 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=xxxxxx,ou=users,ou=external,dc=domain,dc=com"

[12/Mar/2012:14:39:35 -0500] conn=14730 op=13 BIND dn="uid=theManager,dc=domain,dc=com" method=128 version=3

[12/Mar/2012:14:39:35 -0500] conn=14730 op=13 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=theManager,dc=domain,dc=com"

[12/Mar/2012:14:40:23 -0500] conn=14730 op=14 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(cn=domain)(|(objectClass=groupOfURLs)(objectClass=groupOfNames)))" attrs="1.1"

[12/Mar/2012:14:40:23 -0500] conn=14730 op=14 RESULT err=0 tag=101 nentries=1 etime=0

[12/Mar/2012:14:40:23 -0500] conn=14730 op=15 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(member=cn=domain,ou=groups,ou=external,dc=domain,dc=com)(objectClass=groupOfNames))" attrs="cn"

[12/Mar/2012:14:40:23 -0500] conn=14730 op=15 RESULT err=0 tag=101 nentries=0 etime=0
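
In case it helps anyone reproduce the check, here is a rough Python sketch of how the access log could be scanned for connections like this one, i.e. connections that go quiet and are closed much later, or never closed at all. The line format is assumed from the entries above, and the function names (summarize_connections, idle_before_close, never_closed) are made up for illustration; they are not part of 389 DS.

```python
import re
from datetime import datetime

# Assumed line shape, based on the entries quoted above:
#   [12/Mar/2012:14:33:06 -0500] conn=14730 op=0 BIND ...
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+conn=(?P<conn>\d+)\s+(?P<rest>.*)$")
CLOSED_RE = re.compile(r"\bfd=\d+ closed\b")
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def summarize_connections(lines):
    """Group access log lines by connection id.

    Assumes the lines are in chronological order, as they are in the log.
    """
    conns = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank line or a format this sketch does not know
        ts = datetime.strptime(match.group("ts"), TS_FORMAT)
        info = conns.setdefault(
            int(match.group("conn")),
            {"opened": None, "last_activity": None, "closed": None, "results": 0},
        )
        rest = match.group("rest")
        if CLOSED_RE.search(rest):
            info["closed"] = ts
            continue  # the close itself does not count as activity
        if "connection from" in rest:
            info["opened"] = ts
        if " RESULT " in f" {rest} ":
            info["results"] += 1
        info["last_activity"] = ts
    return conns


def idle_before_close(info):
    """Time between the last logged activity and the close, if both are known."""
    if info["closed"] is None or info["last_activity"] is None:
        return None
    return info["closed"] - info["last_activity"]


def never_closed(conns):
    """Connection ids that have no 'closed' line in the scanned lines."""
    return sorted(cid for cid, info in conns.items() if info["closed"] is None)
```

Feeding it the lines of an access log (for example via open(path) on whichever log file is being examined) and sorting by idle_before_close should surface the worst offenders. For conn=14730 the gap between the last RESULT and the reset is a little under two days.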

 

The scenario seems to be that the DS works fine after a restart until it runs out of unused connections and/or file descriptors (max FDs = 8192). Once it starts recycling connections and/or file descriptors, the 104 errors appear more often in the access logs and we get more authentication errors. We suspect that the original connection was never terminated correctly, but we don’t know whether the application or a DS setting is at fault.

 

Our servers have been tuned according to the wiki doc at http://directory.fedoraproject.org/wiki/Performance_Tuning#Linux

We have set our idle “timeout” to 60 seconds and search “timelimit” to 120 seconds with no change in behavior.

 

Watching netstat -nap | grep slapd shows established connections that never drop off; the count just keeps growing.
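
To put numbers on that growth, something like the following Python sketch could tally saved netstat -nap output by TCP state and by client address. It assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); the function name tally_netstat is made up for illustration.

```python
from collections import Counter


def tally_netstat(output, process="slapd"):
    """Count TCP sockets owned by `process`, by state and by remote host.

    `output` is the text printed by `netstat -nap`. Lines that are not TCP
    sockets, or that belong to another program, are skipped.
    """
    by_state = Counter()
    by_peer = Counter()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # header, UDP, or UNIX socket line
        if process not in fields[6]:
            continue  # socket owned by some other program
        state = fields[5]
        by_state[state] += 1
        if state == "ESTABLISHED":
            # Foreign address is host:port; keep only the host part.
            by_peer[fields[4].rsplit(":", 1)[0]] += 1
    return by_state, by_peer
```

Running this against snapshots taken a few hours apart should show whether the ESTABLISHED count only ever rises, and whether one client host (for example the web application server) accounts for most of it.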

 

Any help would be greatly appreciated.

 

Nicholas J Alther

Sr. Software Developer/Analyst

Black Hills Corporation

Phone: 605.721.2158

Cell:     605.593.1899

 

 

 
