[389-users] Troublesome Connection reset by peer errors

Alther, Nicholas Nicholas.Alther at blackhillscorp.com
Wed Mar 14 23:58:29 UTC 2012


I am seeking assistance in isolating a recurring problem with our 389 DS version 1.2.8.3 installation. We use the directory server for user authentication on our website. Every couple of days our website login application starts reporting that user authentications timed out, and the timeouts become more frequent as time passes. Our current workaround is to restart the directory server, which clears the problem for a couple of days before the timeouts start again. I traced one application timeout back to the DS access logs and found the following entry at the same time:

[14/Mar/2012:10:23:01 -0500] conn=14730 op=-1 fd=1093 closed error 104 (Connection reset by peer) - TCP connection reset by peer.

I looked through the older logs and the only time this conn/fd was used was two days ago. Here are the access log entries:

[12/Mar/2012:14:33:06 -0500] conn=14730 fd=1093 slot=1093 connection from 10.1.xx.xx to 10.1.xx.xx
[12/Mar/2012:14:33:06 -0500] conn=14730 op=0 BIND dn="uid,dc=domain,dc=com" method=128 version=3
[12/Mar/2012:14:33:06 -0500] conn=14730 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=theManager,dc=domain,dc=com"
[12/Mar/2012:14:33:06 -0500] conn=14730 op=1 SRCH base="ou=users,ou=external,dc=domain,dc=com" scope=2 filter="(&(uid=xxxxx)(objectClass=inetUser))" attrs="1.1"
[12/Mar/2012:14:33:06 -0500] conn=14730 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[12/Mar/2012:14:33:06 -0500] conn=14730 op=2 BIND dn="uid=xxxxx,ou=users,ou=external,dc=domain,dc=com" method=128 version=3
[12/Mar/2012:14:33:06 -0500] conn=14730 op=2 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=xxxxx,ou=users,ou=external,dc=domain,dc=com"
[12/Mar/2012:14:33:06 -0500] conn=14730 op=3 BIND dn="uid,dc=domain,dc=com" method=128 version=3
[12/Mar/2012:14:33:06 -0500] conn=14730 op=3 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=theManager,dc=domain,dc=com"
[12/Mar/2012:14:35:20 -0500] conn=14730 op=4 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(cn=domain)(|(objectClass=groupOfURLs)(objectClass=groupOfNames)))" attrs="1.1"
[12/Mar/2012:14:35:20 -0500] conn=14730 op=4 RESULT err=0 tag=101 nentries=1 etime=0
[12/Mar/2012:14:35:20 -0500] conn=14730 op=5 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(member=cn=domain,ou=groups,ou=external,dc=domain,dc=com)(objectClass=groupOfNames))" attrs="cn"
[12/Mar/2012:14:35:20 -0500] conn=14730 op=5 RESULT err=0 tag=101 nentries=0 etime=0
[12/Mar/2012:14:36:50 -0500] conn=14730 op=6 SRCH base="ou=users,ou=external,dc=domain,dc=com" scope=2 filter="(&(uid=xxxxxx)(objectClass=inetUser))" attrs="1.1"
[12/Mar/2012:14:36:50 -0500] conn=14730 op=6 RESULT err=0 tag=101 nentries=1 etime=0
[12/Mar/2012:14:36:50 -0500] conn=14730 op=7 BIND dn="uid=xxxxxx,ou=users,ou=external,dc=domain,dc=com" method=128 version=3
[12/Mar/2012:14:36:50 -0500] conn=14730 op=7 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=xxxxxx,ou=users,ou=external,dc=domain,dc=com"
[12/Mar/2012:14:36:50 -0500] conn=14730 op=8 BIND dn="uid=theManager,dc=domain,dc=com" method=128 version=3
[12/Mar/2012:14:36:50 -0500] conn=14730 op=8 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=theManager,dc=domain,dc=com"
[12/Mar/2012:14:37:02 -0500] conn=14730 op=9 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(cn=domain)(|(objectClass=groupOfURLs)(objectClass=groupOfNames)))" attrs="1.1"
[12/Mar/2012:14:37:02 -0500] conn=14730 op=9 RESULT err=0 tag=101 nentries=1 etime=0
[12/Mar/2012:14:37:02 -0500] conn=14730 op=10 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(member=cn=domain,ou=groups,ou=external,dc=domain,dc=com)(objectClass=groupOfNames))" attrs="cn"
[12/Mar/2012:14:37:02 -0500] conn=14730 op=10 RESULT err=0 tag=101 nentries=0 etime=0
[12/Mar/2012:14:39:35 -0500] conn=14730 op=11 SRCH base="ou=users,ou=external,dc=domain,dc=com" scope=2 filter="(&(uid=xxxxxx)(objectClass=inetUser))" attrs="1.1"
[12/Mar/2012:14:39:35 -0500] conn=14730 op=11 RESULT err=0 tag=101 nentries=1 etime=0
[12/Mar/2012:14:39:35 -0500] conn=14730 op=12 BIND dn="uid=xxxxxx,ou=users,ou=external,dc=domain,dc=com" method=128 version=3
[12/Mar/2012:14:39:35 -0500] conn=14730 op=12 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=xxxxxx,ou=users,ou=external,dc=domain,dc=com"
[12/Mar/2012:14:39:35 -0500] conn=14730 op=13 BIND dn="uid=theManager,dc=domain,dc=com" method=128 version=3
[12/Mar/2012:14:39:35 -0500] conn=14730 op=13 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=theManager,dc=domain,dc=com"
[12/Mar/2012:14:40:23 -0500] conn=14730 op=14 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(cn=domain)(|(objectClass=groupOfURLs)(objectClass=groupOfNames)))" attrs="1.1"
[12/Mar/2012:14:40:23 -0500] conn=14730 op=14 RESULT err=0 tag=101 nentries=1 etime=0
[12/Mar/2012:14:40:23 -0500] conn=14730 op=15 SRCH base="ou=groups,ou=external,dc=domain,dc=com" scope=2 filter="(&(member=cn=domain,ou=groups,ou=external,dc=domain,dc=com)(objectClass=groupOfNames))" attrs="cn"
[12/Mar/2012:14:40:23 -0500] conn=14730 op=15 RESULT err=0 tag=101 nentries=0 etime=0
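
For anyone who wants to repeat this kind of trace across a whole access log, here is a minimal Python sketch. It assumes the line layout shown in the entries above (timestamp in brackets, then conn=N) and an English month abbreviation in the timestamp; the function name idle_before_reset is only illustrative and is not part of any 389 DS tooling.

```python
import re
from datetime import datetime

# Timestamp and connection number at the start of an access log line,
# in the layout shown in the entries above (an assumption of this sketch).
LINE_RE = re.compile(r'^\[(?P<ts>[^\]]+)\] conn=(?P<conn>\d+) (?P<rest>.*)$')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def idle_before_reset(lines):
    """Map conn number -> seconds between its last logged activity
    and the line that closed it with error 104."""
    last_seen = {}  # conn -> timestamp of the most recent non-close line
    idle = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        ts = datetime.strptime(match.group("ts"), TS_FORMAT)
        conn = int(match.group("conn"))
        if "closed error 104" in match.group("rest"):
            if conn in last_seen:
                idle[conn] = (ts - last_seen[conn]).total_seconds()
        else:
            last_seen[conn] = ts
    return idle
```

Fed the entries above plus the error 104 line, this reports conn=14730 as idle for 157358 seconds (about 43 hours 42 minutes) between its last RESULT and the reset.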

The scenario seems to be that the DS works fine after a restart until it runs out of unused connections and/or file descriptors (max FDs = 8192). Once it starts recycling connections and/or file descriptors, the 104 errors appear more often in the access logs and we get more authentication errors. We suspect that the original connection was never terminated correctly, but we don't know whether the application or a DS setting is at fault.

Our servers have been tuned according to the wiki doc at http://directory.fedoraproject.org/wiki/Performance_Tuning#Linux
We have set our idle "timeout" to 60 seconds and search "timelimit" to 120 seconds with no change in behavior.

Watching netstat -nap | grep slapd shows established connections that never drop off; the count just keeps growing.
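
To see which client is holding those connections, the netstat output can be tallied per remote address. The following is a rough sketch, assuming the usual Linux net-tools column layout for netstat -nap (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); the function name established_by_peer is hypothetical.

```python
from collections import Counter


def established_by_peer(netstat_output):
    """Count ESTABLISHED TCP connections owned by slapd, per remote host.

    Expects the text printed by `netstat -nap` with the assumed
    seven-column layout; other lines are ignored.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # header lines, UNIX sockets, rows without a program
        state, program = fields[5], fields[6]
        if state != "ESTABLISHED" or "slapd" not in program:
            continue
        # Foreign Address is host:port; drop the port to group by host.
        peer_host = fields[4].rsplit(":", 1)[0]
        counts[peer_host] += 1
    return counts
```

Running it on snapshots taken a few minutes apart and comparing the counts would show whether a single application host accounts for the growth.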

Any help would be greatly appreciated.

Nicholas J Alther
Sr. Software Developer/Analyst
Black Hills Corporation
Phone: 605.721.2158
Cell:     605.593.1899