[389-users] Crashing

Rich Megginson rmeggins at redhat.com
Mon Aug 8 15:13:36 UTC 2011


On 08/05/2011 10:46 AM, Wendt, Trevor wrote:
>
> Hello all,
>
> Need some help with tuning and crash debugging. We're running 
> Fedora-Directory/1.0.4 B2006.312.1539. The problem is on our 
> "Dedicated Consumer" machine running on RHEL 5. We have over ~150,000 
> users authenticating against our FDS systems. System resources are not 
> a problem (~.39 load, free memory, 92k swap)
>
> For months, the system is solid without any issues then we seem to get 
> a large spike in traffic and FDS crashes. I run Monit so the service 
> is restarted automatically but I cannot figure out why the service 
> keeps crashing.
>
> FDS was set up and tuned based on: 
> http://directory.fedoraproject.org/wiki/Performance_Tuning#Linux
>
> I have reviewed 
> http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes as well, 
> but some of that is over my head.
>
Unfortunately these directions are for 1.1.x and later.  Most of the 
paths/filenames have changed since 1.0.4 in the move to the FHS style 
layout, and there is no debuginfo package.  But we may still be able to 
get a core file and some stack information:

sysctl -w fs.suid_dumpable=1

Edit /opt/fedora-ds/slapd-YOURINSTANCE/start-slapd and, somewhere near 
the top, add the line:
ulimit -c unlimited

Restart the directory server:
/opt/fedora-ds/slapd-YOURINSTANCE/restart-slapd

If you get a crash, you should have a core file in 
/opt/fedora-ds/slapd-YOURINSTANCE/logs

After that, install gdb and follow the instructions at 
http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
except use these paths:
cd /opt/fedora-ds/slapd-YOURINSTANCE/logs
gdb ../../bin/slapd/server/ns-slapd core.PID
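
The gdb step above can also be run non-interactively, so that the stack traces land in a file that is easy to attach to a reply. What follows is only a sketch under the 1.0.4 paths given above: `newest_core` and `dump_stacks` are hypothetical helper names, and the output file name `stacktrace.txt` is an arbitrary choice. Without a debuginfo package the traces will lack symbols for many frames, but the function names that do appear can still be useful.

```shell
#!/bin/sh
# Sketch only: helper names and the output file name are illustrative.

# Print the most recently modified core file in directory $1 (nothing if none).
newest_core() {
    ls -t "$1"/core* 2>/dev/null | head -n 1
}

# Write full backtraces of all threads to file $3,
# using server binary $1 and core file $2.
dump_stacks() {
    gdb -batch -ex "thread apply all bt full" "$1" "$2" > "$3" 2>&1
}

# Example (1.0.4 layout as described above; adjust YOURINSTANCE):
#   cd /opt/fedora-ds/slapd-YOURINSTANCE/logs
#   dump_stacks ../../bin/slapd/server/ns-slapd "$(newest_core .)" stacktrace.txt
```

Both functions only read the core file; nothing here touches the running server.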

> I have turned buffering off and increased the logging level in the 
> LDAP config.
>
What is the last operation in the access log before a crash?  Any 
corresponding errors in the errors log?


> Here is our "monitor" script output:
>
> version: 1
> dn: cn=monitor
> objectClass: top
> objectClass: extensibleObject
> cn: monitor
> version: Fedora-Directory/1.0.4 B2006.312.1539
> threads: 30
> currentconnections: 19
> totalconnections: 11918
> dtablesize: 8192
> readwaiters: 0
> opsinitiated: 43703
> opscompleted: 43702
> entriessent: 16086
> bytessent: 2911011
> currenttime: 20110805164243Z
> starttime: 20110805114053Z
> nbackends: 2
>
So about 8700 ops/hour.  Not a heavy load.
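
The ops/hour figure comes from dividing opsinitiated by the time elapsed between starttime and currenttime in the monitor output. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and it assumes GNU date (for `-d`), as shipped on RHEL.

```shell
#!/bin/sh
# Sketch only: function names are illustrative; requires GNU date (-d).

# Convert a cn=monitor timestamp (YYYYMMDDhhmmssZ, UTC) to epoch seconds.
monitor_epoch() {
    stamp=$(echo "$1" | sed -E 's/^(....)(..)(..)(..)(..)(..)Z$/\1-\2-\3 \4:\5:\6/')
    date -u -d "$stamp" +%s
}

# Average operations per hour.
# Arguments: opsinitiated, starttime, currenttime.
ops_per_hour() {
    start=$(monitor_epoch "$2")
    now=$(monitor_epoch "$3")
    echo $(( $1 * 3600 / ($now - $start) ))
}

# Values taken from the monitor output quoted above.
ops_per_hour 43703 20110805114053Z 20110805164243Z   # prints 8687
```

The server had been up for 18110 seconds (just over five hours), so 43703 operations works out to roughly 8700 per hour. This is an average over the whole uptime and says nothing about short spikes.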
>
> Here is our "Access Log Analyzer" summary for a 24 hour period:
>
> ---------------------------------------------------------------
> Access Log Analyzer 6.0
>
> Filename                          Total Lines   Lines processed
> ---------------------------------------------------------------
> /opt/fedora-ds/slapd/logs/access  298225        298231
>
> ----------- Access Log Output ------------
>
> Restarts:                     6
> Total Connections:            39720
> Peak Concurrent Connections:  84
> Total Operations:             95471
> Total Results:                95393
> Overall Performance:          99.9%
>
> Searches:                     48215
> Modifications:                167
> Adds:                         551
> Deletes:                      2
> Mod RDNs:                     0
>
> 6.x Stats
> Persistent Searches:          0
> Internal Operations:          0
> Entry Operations:             0
> Extended Operations:          845
> Abandoned Requests:           0
> Smart Referrals Received:     0
>
> VLV Operations:               0
> VLV Unindexed Searches:       0
> SORT Operations:              0
> SSL Connections:              0
>
> Entire Search Base Queries:   0
> Unindexed Searches:           6
>
> FDs Taken:                    39720
> FDs Returned:                 39657
> Highest FD Taken:             93
>
> Broken Pipes:                 0
> Connections Reset By Peer:    0
> Resource Unavailable:         10872
>      -  10872 (T1) Idle Timeout Exceeded
>
> Binds:                        45691
> Unbinds:                      27987
>
> LDAP v2 Binds:                15694
> LDAP v3 Binds:                29997
> SSL Client Binds:             0
> Failed SSL Client Binds:      0
> SASL Binds:                   0
>
> Directory Manager Binds:      0
> Anonymous Binds:              16346
> Other Binds:                  29345
>
> ---------------------------------------------------------------
>
> In FDS console:
>
> -- Configuration > Performance tab: Size Limit: 2000, Time Limit: 
> 3600, Idle Timeout: 60, Max file descriptors: 8192.
>
The idle timeout is 1 minute - could be too low for some of your 
clients, which is why you're seeing a lot of (T1) Idle Timeout Exceeded 
connection closes.
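
To see how the close reasons break down before and after changing the idle timeout, the access log can be tallied directly. This is a rough sketch rather than a replacement for the Access Log Analyzer: `count_close_codes` is a hypothetical name, and the code assumes that connection-close lines in the access log end with `closed - CODE` (for example `closed - T1`), which should be checked against your own log.

```shell
#!/bin/sh
# Sketch only: assumes connection-close lines end with "closed - CODE".

# Print "COUNT CODE" for each close code in access log $1, most frequent first.
count_close_codes() {
    awk '/ closed - / { n[$NF]++ } END { for (c in n) print n[c], c }' "$1" | sort -rn
}

# Example, using the log path shown in the analyzer summary:
#   count_close_codes /opt/fedora-ds/slapd/logs/access
```

If raising the idle timeout helps, the T1 count should drop relative to the other codes on a comparable day of traffic.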
>
> -- Configuration > Data > Database Link Settings > Connection 
> Management: Max TCP Connections: 10, Bind timeout: 20, Max binds per 
> connection: 20, Timeout before abandon: 10, Max LDAP Connections: 20, 
> Max bind retries: 3, Max operations per connection: 5, connection 
> life: 60.
>
Are you using database links?

I also suggest looking at your database cache tuning - see 
http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/8.2/html-single/Administration_Guide/index.html#Monitoring_Server_and_Database_Activity-Monitoring_Database_Activity
>
> We have talked about moving to the latest 389 Directory packages, and I 
> have a migration process tested out, so it's a matter of getting the 
> OK and the time, but I doubt the upgrade will solve our crashing problem.
>
I can't say for sure, but 1.0.4 is very old, and since then we have 
fixed many issues which have caused crashes.
>
> It seems to me we are hitting some limits that just haven't been 
> accounted for yet and that is where I need help.
>
Let's start with analyzing the crash data - if we can get a core file 
and a stack trace, then we can work from there to figure out why it's 
crashing.
>
> Any suggestions on how to stop these crashes are welcome! Thanks for 
> reading.
>
> *Trevor*