[389-users] Crashing
Rich Megginson
rmeggins at redhat.com
Mon Aug 8 15:13:36 UTC 2011
On 08/05/2011 10:46 AM, Wendt, Trevor wrote:
>
> Hello all,
>
> Need some help with tuning and crash debugging. We're running
> Fedora-Directory/1.0.4 B2006.312.1539. The problem is on our
> "Dedicated Consumer" machine running on RHEL 5. We have over ~150,000
> users authenticating against our FDS systems. System resources are not
> a problem (~.39 load, free memory, 92k swap)
>
> For months, the system is solid without any issues then we seem to get
> a large spike in traffic and FDS crashes. I run Monit so the service
> is restarted automatically but I cannot figure out why the service
> keeps crashing.
>
> FDS was set up and tuned based on:
> http://directory.fedoraproject.org/wiki/Performance_Tuning#Linux
>
> I have reviewed
> http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes as well,
> but some of that is over my head.
>
Unfortunately these directions are for 1.1.x and later. Most of the
paths/filenames have changed since 1.0.4 in the move to the FHS style
layout, and there is no debuginfo package. But we may still be able to
get a core file and some stack information:

    sysctl -w fs.suid_dumpable=1

Edit /opt/fedora-ds/slapd-YOURINSTANCE/start-slapd and, somewhere near
the top, add the line

    ulimit -c unlimited

Restart the directory server:

    /opt/fedora-ds/slapd-YOURINSTANCE/restart-slapd

If you get a crash, you should have a core file in
/opt/fedora-ds/slapd-YOURINSTANCE/logs

After that, install gdb and follow the instructions at
http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
except:

    cd /opt/fedora-ds/slapd-YOURINSTANCE/logs
    gdb ../../bin/slapd/server/ns-slapd core.PID
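
If it is easier, the gdb step can be run non-interactively so that the
whole trace lands in a file you can post to the list. This is only a
sketch: write_gdb_cmds, slapd-bt.gdb and stacktrace.txt are names I made
up for illustration, and core.PID stands for whatever the core file is
actually called.

```shell
#!/bin/sh
# Sketch only: helper and file names are illustrative, not part of FDS.

# Write a gdb command file that dumps a full backtrace of every thread.
write_gdb_cmds() {
    cat > "$1" <<'EOF'
thread apply all bt full
EOF
}

cmds="${TMPDIR:-/tmp}/slapd-bt.gdb"
core="${1:-core.PID}"    # pass the real core file name as the first argument

write_gdb_cmds "$cmds"

# Run from /opt/fedora-ds/slapd-YOURINSTANCE/logs so the relative path resolves.
if [ -f "$core" ] && command -v gdb >/dev/null 2>&1; then
    gdb -batch -x "$cmds" ../../bin/slapd/server/ns-slapd "$core" > stacktrace.txt 2>&1
else
    echo "no core file named $core here (or gdb is not installed)" >&2
fi
```

Without the debuginfo package the trace will be missing symbols, but the
function names in the crashing thread are usually enough to start with.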
> I have turned buffering off and increased the logging level in the
> LDAP config.
>
What is the last operation in the access log before a crash? Any
corresponding errors in the errors log?
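
Something like the following would pull the tail of both logs. It is a
rough sketch: last_lines and LOGDIR are names I made up, and I am
assuming the errors log sits next to the access log in the logs
directory. Since Monit restarts the server right away, run it soon after
a crash, or search around the crash time instead, otherwise the tail
will show post-restart traffic.

```shell
#!/bin/sh
# Sketch only: last_lines is an illustrative helper, and the "errors" file
# name next to "access" is an assumption about the 1.0.4 layout.

# Print the last N lines (default 20) of a log file.
last_lines() {
    tail -n "${2:-20}" "$1"
}

logdir="${LOGDIR:-/opt/fedora-ds/slapd-YOURINSTANCE/logs}"
for f in access errors; do
    if [ -r "$logdir/$f" ]; then
        echo "== last lines of $logdir/$f =="
        last_lines "$logdir/$f"
    fi
done
```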
> Here is our "monitor" script output:
>
> version: 1
> dn: cn=monitor
> objectClass: top
> objectClass: extensibleObject
> cn: monitor
> version: Fedora-Directory/1.0.4 B2006.312.1539
> threads: 30
> currentconnections: 19
> totalconnections: 11918
> dtablesize: 8192
> readwaiters: 0
> opsinitiated: 43703
> opscompleted: 43702
> entriessent: 16086
> bytessent: 2911011
> currenttime: 20110805164243Z
> starttime: 20110805114053Z
> nbackends: 2
>
So about 8700 ops/hour. Not a heavy load.
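
For reference, that figure is just opsinitiated divided by the time
between starttime and currenttime (a little over 5 hours here). A small
sketch of the arithmetic, where ops_per_hour is a helper name I made up
and the timestamps are assumed to be in the YYYYMMDDhhmmssZ form shown
in the monitor output:

```shell
#!/bin/sh
# Sketch only: ops_per_hour is an illustrative helper, not a Directory
# Server tool. Arguments: opsinitiated, starttime, currenttime.
ops_per_hour() {
    awk -v ops="$1" -v start="$2" -v now="$3" '
    # Days since 1970-01-01 for a Gregorian date (year 0 or later).
    function days(y, m, d,    era, yoe, doy, doe) {
        if (m <= 2) y -= 1
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    # Seconds since 1970-01-01 for a YYYYMMDDhhmmssZ timestamp.
    function epoch(ts,    secs) {
        secs = days(substr(ts, 1, 4) + 0, substr(ts, 5, 2) + 0, substr(ts, 7, 2) + 0) * 86400
        secs += substr(ts, 9, 2) * 3600 + substr(ts, 11, 2) * 60 + substr(ts, 13, 2)
        return secs
    }
    BEGIN {
        elapsed = epoch(now) - epoch(start)
        if (elapsed <= 0) exit 1
        # Whole operations per hour, truncated.
        printf "%d\n", ops * 3600 / elapsed
    }'
}

# Values from the monitor output above.
ops_per_hour 43703 20110805114053Z 20110805164243Z
```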
>
> Here is our "Access Log Analyzer" summary for a 24 hour period:
>
> ---------------------------------------------------------------
> Access Log Analyzer 6.0
>
> Filename                          Total Lines   Lines processed
> ---------------------------------------------------------------
> /opt/fedora-ds/slapd/logs/access  298225        298231
>
> ----------- Access Log Output ------------
>
> Restarts: 6
> Total Connections: 39720
> Peak Concurrent Connections: 84
> Total Operations: 95471
> Total Results: 95393
> Overall Performance: 99.9%
>
> Searches: 48215
> Modifications: 167
> Adds: 551
> Deletes: 2
> Mod RDNs: 0
>
> 6.x Stats
> Persistent Searches: 0
> Internal Operations: 0
> Entry Operations: 0
> Extended Operations: 845
> Abandoned Requests: 0
> Smart Referrals Received: 0
>
> VLV Operations: 0
> VLV Unindexed Searches: 0
> SORT Operations: 0
> SSL Connections: 0
>
> Entire Search Base Queries: 0
> Unindexed Searches: 6
>
> FDs Taken: 39720
> FDs Returned: 39657
> Highest FD Taken: 93
>
> Broken Pipes: 0
> Connections Reset By Peer: 0
> Resource Unavailable: 10872
>   - 10872 (T1) Idle Timeout Exceeded
>
> Binds: 45691
> Unbinds: 27987
>
> LDAP v2 Binds: 15694
> LDAP v3 Binds: 29997
> SSL Client Binds: 0
> Failed SSL Client Binds: 0
> SASL Binds: 0
>
> Directory Manager Binds: 0
> Anonymous Binds: 16346
> Other Binds: 29345
>
> ---------------------------------------------------------------
>
> In FDS console:
>
> -- Configuration > Performance tab: Size Limit: 2000, Time Limit:
> 3600, Idle Timeout: 60, Max file descriptors: 8192.
>
The idle timeout is 1 minute - could be too low for some of your
clients, which is why you're seeing a lot of (T1) Idle Timeout Exceeded
connection closes.
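
If you raise the idle timeout, you can check whether it helps by counting
these closes in the access log before and after the change. A minimal
sketch, assuming the access log marks them with "closed - T1" (that is
how I remember the format, so check a sample line first);
count_idle_closes and ACCESS_LOG are names I made up:

```shell
#!/bin/sh
# Sketch only: count_idle_closes is an illustrative helper. It assumes
# idle-timeout closes are logged with the text "closed - T1".
count_idle_closes() {
    grep -c 'closed - T1' "$1"
}

log="${ACCESS_LOG:-/opt/fedora-ds/slapd-YOURINSTANCE/logs/access}"
if [ -r "$log" ]; then
    echo "idle timeout closes in $log: $(count_idle_closes "$log")"
fi
```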
>
> -- Configuration > Data > Database Link Settings > Connection
> Management: Max TCP Connections: 10, Bind timeout: 20, Max binds per
> connection: 20, Timeout before abandon: 10, Max LDAP Connections: 20,
> Max bind retries: 3, Max operations per connection: 5, connection
> life: 60.
>
Are you using database links?
I also suggest looking at your database cache tuning - see
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/8.2/html-single/Administration_Guide/index.html#Monitoring_Server_and_Database_Activity-Monitoring_Database_Activity
>
> We have talked about moving to the latest 389 Directory packages and I
> have a migration process tested out so it's a matter of getting the
> OK and time but I doubt the upgrade will solve our crashing problem.
>
I can't say for sure, but 1.0.4 is very old, and since then we have
fixed many issues which have caused crashes.
>
> It seems to me we are hitting some limits that just haven't been
> accounted for yet and that is where I need help.
>
Let's start with analyzing the crash data - if we can get a core file
and a stack trace, then we can work from there to figure out why it's
crashing.
>
> Any suggestions on how to stop these crashes are welcome! Thanks for
> reading.
>
> *Trevor*