Hi, I've recently been trying to hunt down some odd performance problems with our installation of 389 LDAP (currently 1.3.2.19, though we've been following recent Debian unstable). We've been seeing long delays (tens of seconds at times) handling even the simplest new bind()s while the server otherwise has idle worker threads (and other non-idle worker threads servicing existing connections).
Upon grabbing some userland thread stacks during these "hangs", when no new external connections could be established, I saw what looked to be the thread associated with slapd_daemon() in ldap/servers/slapd/daemon.c hung up in setup_pr_read_pds(), walking the list of active connections and acquiring each connection lock (c->c_mutex) sequentially in the process. I stuck some calls to clock_gettime() around the PR_Lock(c->c_mutex) call at or about ldap/servers/slapd/daemon.c:1690 and logged a warning whenever we waited for more than a set duration:
[22/Jul/2014:17:37:05 +0000] - setup_pr_read_pds: (fd=192) waited 995.375473 msecs for lock
[22/Jul/2014:17:37:08 +0000] - setup_pr_read_pds: (fd=202) waited 3003.548263 msecs for lock
[22/Jul/2014:17:37:10 +0000] - setup_pr_read_pds: (fd=181) waited 1997.828897 msecs for lock
<up to 20-30 seconds in some extreme cases>
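For illustration, the instrumentation described above could look roughly like the sketch below. It is not the actual patch: a POSIX mutex stands in for PR_Lock(c->c_mutex), and the names timed_lock(), elapsed_msecs() and LOCK_WAIT_WARN_MSECS (including its 500 msec value) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical threshold: warn when a lock wait exceeds this many msecs. */
#define LOCK_WAIT_WARN_MSECS 500.0

/* Elapsed milliseconds between two clock samples. */
static double elapsed_msecs(const struct timespec *start,
                            const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_nsec - start->tv_nsec) / 1e6;
}

/*
 * Acquire the mutex and return how long the caller blocked, in msecs.
 * The pthread mutex is a stand-in for the connection lock; fd is only
 * used to label the warning.
 */
static double timed_lock(pthread_mutex_t *m, int fd)
{
    struct timespec start, end;
    double waited;
    int rc;

    rc = clock_gettime(CLOCK_MONOTONIC, &start);
    assert(rc == 0);
    pthread_mutex_lock(m);
    rc = clock_gettime(CLOCK_MONOTONIC, &end);
    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG is defined */

    waited = elapsed_msecs(&start, &end);
    if (waited > LOCK_WAIT_WARN_MSECS) {
        fprintf(stderr,
                "setup_pr_read_pds: (fd=%d) waited %f msecs for lock\n",
                fd, waited);
    }
    return waited;
}
```

CLOCK_MONOTONIC is used here so that wall-clock adjustments cannot distort the measured wait; the caller still has to release the mutex afterwards.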
It looks like this could hang for up to CONN_TURBO_TIMEOUT_INTERVAL (default 1 second) per thread in turbo mode (up to 50% of the worker pool by default). While stuck there, it isn't calling handle_listeners() to pull new connections off of the well-known port.
Perhaps handle_listeners() should run off in its own thread, away from this connection maintenance? (Or, if it must stay there, use a non-blocking PRP_TryLock() or somesuch?)
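To make the try-lock idea concrete, here is a minimal sketch of a connection walk that skips busy connections instead of blocking on them. It is only an assumption about how such a loop might look, not 389's code: pthread_mutex_trylock() stands in for a non-blocking NSPR lock call, and struct conn and poll_setup_pass() are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry in the server's connection table. */
struct conn {
    pthread_mutex_t lock; /* stands in for c->c_mutex */
    int fd;
    int skipped;          /* set when the lock was busy during this pass;
                             written only by the thread doing the walk */
};

/*
 * Walk the table once without ever blocking on a connection lock.
 * Returns the number of connections that were examined; busy ones are
 * marked as skipped so a later pass can pick them up.
 */
static size_t poll_setup_pass(struct conn *table, size_t n)
{
    size_t examined = 0;

    for (size_t i = 0; i < n; i++) {
        struct conn *c = &table[i];

        if (pthread_mutex_trylock(&c->lock) != 0) {
            /* Lock is held elsewhere (e.g. a worker in turbo mode). */
            c->skipped = 1;
            continue;
        }
        c->skipped = 0;
        /* ... the connection's fd would be added to the poll set here ... */
        examined++;
        pthread_mutex_unlock(&c->lock);
    }
    return examined;
}
```

With a loop of this shape, the time spent per pass no longer depends on how long workers hold their connection locks, so the listener handling that follows would not be delayed by them. The trade-off is that a skipped connection is not polled until a later pass.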
TIA
Thomas