On 11/11/2014 05:01 PM, Orion Poplawski wrote:
On 11/05/2014 08:16 PM, Orion Poplawski wrote:
> Just recently we're seeing some very strange behavior on our system.
> Periodically we will see a sssd process start to have an ever greater number
> of connections to our ldap server until the server runs out of file
> descriptors. This seems to be happening with a particular user, who is having
> trouble logging in at times, particularly with email (dovecot). We see
> entries like the following on our sever:
>
> [05/Nov/2014:17:14:51 -0700] conn=1786153 op=0 EXT
> oid="1.3.6.1.4.1.1466.20037" name="startTLS"
> [05/Nov/2014:17:14:51 -0700] conn=1786153 op=0 RESULT err=0 tag=120 nentries=0
> etime=0
> [05/Nov/2014:17:14:51 -0700] conn=1786153 SSL 128-bit AES
> [05/Nov/2014:17:14:51 -0700] conn=1786153 op=1 BIND
> dn="uid=user,ou=People,dc=domain,dc=com" method=128 version=3
> [05/Nov/2014:17:14:56 -0700] conn=1786153 op=2 ABANDON targetop=NOTFOUND msgid=2
> [05/Nov/2014:17:14:56 -0700] conn=1786153 op=3 UNBIND
> [05/Nov/2014:17:14:56 -0700] conn=1786153 op=3 fd=1022 closed - U1
>
> I don't yet have debug info from the sssd process. Any ideas from the above?
>
> Restarting the sssd process seems to clear things up for a while.
>
> - Orion
>
I tried turning on some error log tracing to see if that helps:
....
[11/Nov/2014:16:33:55 -0700] - <= find_entry_internal
[11/Nov/2014:16:33:55 -0700] - => slapi_pw_find value: "PASSWORD"
[11/Nov/2014:16:33:55 -0700] - <= slapi_pw_find matched "PASSWORD_HASH"
using
scheme "SSHA"
[11/Nov/2014:16:33:55 -0700] - => reslimit_update_from_entry()
conn=0x9465f290, entry=0x48011630
[11/Nov/2014:16:33:55 -0700] - reslimit_update_from_entry(): setting limit for
handle 0 (based on nsLookThroughLimit)
[11/Nov/2014:16:33:55 -0700] - reslimit_update_from_entry(): setting limit for
handle 1 (based on nsIDListScanLimit)
[11/Nov/2014:16:33:55 -0700] - reslimit_update_from_entry(): setting limit for
handle 2 (based on nsPagedLookThroughLimit)
[11/Nov/2014:16:33:55 -0700] - reslimit_update_from_entry(): setting limit for
handle 3 (based on nsPagedIDListScanLimit)
[11/Nov/2014:16:33:55 -0700] - reslimit_update_from_entry(): setting limit for
handle 4 (based on nsRangeSearchLookThroughLimit)
[11/Nov/2014:16:33:55 -0700] - reslimit_update_from_entry(): setting limit for
handle 5 (based on nsSizeLimit)
[11/Nov/2014:16:33:55 -0700] - reslimit_update_from_entry(): setting limit for
handle 6 (based on nsTimeLimit)
[11/Nov/2014:16:33:55 -0700] - reslimit_update_from_entry(): setting limit for
handle 7 (based on nsPagedSizeLimit)
[11/Nov/2014:16:33:55 -0700] - reslimit_update_from_entry(): setting limit for
handle 8 (based on nsIdleTimeout)
[11/Nov/2014:16:33:55 -0700] - <= reslimit_update_from_entry() returning status 0
[11/Nov/2014:16:33:55 -0700] - => slapi_reslimit_get_integer_limit()
conn=0x9465f290, handle=8
[11/Nov/2014:16:33:55 -0700] - <= slapi_reslimit_get_integer_limit() returning
NO VALUE
So, on my good ldap server at this point I get:
[12/Nov/2014:10:54:20 -0700] - => send_ldap_result 0::
[12/Nov/2014:10:54:20 -0700] - <= send_ldap_result
but on my bad one instead I get:
[11/Nov/2014:16:33:55 -0700] - =>
slapi_pwpolicy_make_response_control[11/Nov/2014:16:33:55 -0700] - <=
slapi_pwpolicy_make_response_control[11/Nov/2014:16:33:55 -0700] - =>
slapi_add_pwd_control
which I don't see at all on the good one.
[11/Nov/2014:16:33:55 -0700] - Calling plugin 'Multimaster
replication
postoperation plugin' #5 type 501
[11/Nov/2014:16:33:55 -0700] - --> pagedresults_is_timedout
[11/Nov/2014:16:33:55 -0700] - <-- pagedresults_is_timedout: -
[11/Nov/2014:16:33:55 -0700] - --> pagedresults_is_timedout
[11/Nov/2014:16:33:55 -0700] - <-- pagedresults_is_timedout: -
[11/Nov/2014:16:33:55 -0700] - --> pagedresults_is_timedout
[11/Nov/2014:16:33:55 -0700] - <-- pagedresults_is_timedout: -
[11/Nov/2014:16:33:55 -0700] - --> pagedresults_is_timedout
[11/Nov/2014:16:33:55 -0700] - <-- pagedresults_is_timedout: -
[11/Nov/2014:16:33:55 -0700] - --> pagedresults_is_timedout
[11/Nov/2014:16:33:55 -0700] - <-- pagedresults_is_timedout: -
[11/Nov/2014:16:33:55 -0700] - --> pagedresults_is_timedout
[11/Nov/2014:16:33:55 -0700] - <-- pagedresults_is_timedout: -
[11/Nov/2014:16:33:56 -0700] - conn 2836 leaving turbo mode due to 4
[11/Nov/2014:16:33:56 -0700] - listener got signaled
--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane orion(a)nwra.com
Boulder, CO 80301
http://www.nwra.com