[389-users] Multi-Threading writes to the same 389 Master Server

Rich Megginson rmeggins at redhat.com
Thu Aug 29 22:33:40 UTC 2013


On 08/29/2013 04:22 PM, Jeffrey Dunham wrote:
> So, following your advice, I was able to get some stack traces while the
> server was hanging or slow to respond. This is from one of our search hosts.
> I have shortened it here considerably because it contains customer data;
> I can do more scrubbing later if that would help.

I would like to have the full stack trace, all the way up to
connection_threadmain. If you need to elide or obscure customer
information, please do, but please include the full stack trace.
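Before posting, one quick way to check that nothing was cut off is to split the gdb output into per-thread stacks and list the threads whose stack never reaches connection_threadmain. Below is a minimal Python sketch. It assumes the dump is the plain-text output of gdb's `thread apply all bt full`, possibly still carrying `> ` mail quoting. The helper names split_threads and threads_missing are hypothetical. Threads that are not connection workers legitimately never reach connection_threadmain, so the result is a list of stacks to eyeball, not a verdict.

```python
import re
from typing import Dict, List, Optional

# gdb prints a header such as "Thread 3 (Thread 0x2aef51f20940 (LWP 2569)):"
# before each thread's backtrace.
THREAD_HEADER = re.compile(r"^Thread (\d+) \(")
# A frame line: "#9  0x00002aeeadf289ed in idl_new_fetch (be=..." or, when gdb
# omits the address, "#0  some_function (args) at file.c:12".
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def split_threads(dump: str) -> Dict[int, List[str]]:
    """Map each gdb thread number to its frame function names, innermost first."""
    threads: Dict[int, List[str]] = {}
    current: Optional[int] = None
    for raw in dump.splitlines():
        # Strip e-mail quoting ("> ") and indentation before matching.
        line = raw.lstrip("> \t").rstrip()
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
    return threads


def threads_missing(dump: str, outer_frame: str = "connection_threadmain") -> List[int]:
    """Thread numbers whose captured stack never reaches outer_frame.

    These are either truncated worker stacks or threads that are not
    connection workers at all (other server threads).
    """
    return [tid for tid, frames in split_threads(dump).items()
            if outer_frame not in frames]
```

For example, `threads_missing(open("stacks.txt").read())` returns the numbers of the threads to double-check; `stacks.txt` is a placeholder file name.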

> It seems to me to revolve around indexes. I know we increased our
> allidslimit pretty high, to 500000; I'm wondering if that has anything
> to do with it.

Looks like the unindexed searches are hogging all of the resources and 
locking pages needed by updates.
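One way to check this across all 30 workers is to tally each stack by the innermost frame that shows what the thread is waiting on. Many threads blocked in Berkeley DB lock calls point to lock contention. Most workers parked somewhere else would point to a starved thread pool. Below is a minimal Python sketch. It assumes the stacks have already been reduced to lists of function names, innermost first. The WAIT_SIGNATURES table and its labels are hypothetical choices based only on the frames quoted in this message.

```python
from collections import Counter
from typing import Dict, List

# Illustrative mapping from a frame name to what the thread is doing. The
# frame names come from the traces in this thread; the labels and the
# choice of frames are assumptions, not an official classification.
WAIT_SIGNATURES: Dict[str, str] = {
    "__lock_get_internal": "blocked on a Berkeley DB lock",
    "cache_lock_entry": "waiting on an entry-cache lock",
    "__memp_fget": "inside the Berkeley DB buffer pool",
    "pread64": "reading database pages from disk",
}


def classify(frames: List[str]) -> str:
    """Label a stack by the innermost frame that appears in WAIT_SIGNATURES."""
    for name in frames:  # frames are ordered innermost (#0) first
        if name in WAIT_SIGNATURES:
            return WAIT_SIGNATURES[name]
    return "other"


def summarize(stacks: Dict[int, List[str]]) -> Counter:
    """Count threads per label, e.g. to see whether most workers sit on DB locks."""
    return Counter(classify(frames) for frames in stacks.values())


# Frame names abbreviated from Threads 3 and 8 quoted below.
example = {
    3: ["pthread_cond_wait@@GLIBC_2.3.2", "__db_pthread_mutex_lock",
        "__lock_get_internal", "__lock_vec", "__db_lget", "idl_new_fetch"],
    8: ["pread64", "__os_io", "__memp_pgread", "__memp_fget", "id2entry"],
}
print(summarize(example))
```

Frames not in the table land in "other". With the full traces, it would be worth adding whatever frames show up there before drawing conclusions.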

>
>
> Out of the 30 worker threads, 28 are in a state like this:
> Thread 3 (Thread 0x2aef51f20940 (LWP 2569)):
> #0  0x000000328800b019 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x00002aeeae1ba4f6 in __db_pthread_mutex_lock () from /lib64/libdb-4.3.so
> No symbol table info available.
> #2  0x00002aeeae242619 in __lock_get_internal () from /lib64/libdb-4.3.so
> No symbol table info available.
> #3  0x00002aeeae242b7f in __lock_vec () from /lib64/libdb-4.3.so
> No symbol table info available.
> #4  0x00002aeeae222d30 in __db_lget () from /lib64/libdb-4.3.so
> No symbol table info available.
> #5  0x00002aeeae1cac72 in __bam_search () from /lib64/libdb-4.3.so
> No symbol table info available.
> #6  0x00002aeeae1bd8d7 in ?? () from /lib64/libdb-4.3.so
> No symbol table info available.
> #7  0x00002aeeae1bea4f in ?? () from /lib64/libdb-4.3.so
> No symbol table info available.
> #8  0x00002aeeae218829 in __db_c_get () from /lib64/libdb-4.3.so
> No symbol table info available.
> #9  0x00002aeeadf289ed in idl_new_fetch (be=0x1dd03130, db=<value optimized out>, inkey=0x2aef51f10760, txn=<value optimized out>, a=0x1dd44940, flag_err=0x2aef51f175bc, allidslimit=500000) at ldap/servers/slapd/back-ldbm/idl_new.c:223
>
> There is a large unindexed query running on one of the other threads
> [base: o=example.com, filter: (&(objectclass=posixaccount)(uid=*))]:
> Thread 8 (Thread 0x2aef4ed1b940 (LWP 2564)):
> #0  0x000000328800e5c8 in pread64 () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x00002aeeae25c5dd in __os_io () from /lib64/libdb-4.3.so
> No symbol table info available.
> #2  0x00002aeeae25168b in __memp_pgread () from /lib64/libdb-4.3.so
> No symbol table info available.
> #3  0x00002aeeae2527dd in __memp_fget () from /lib64/libdb-4.3.so
> No symbol table info available.
> #4  0x00002aeeae1ca938 in __bam_search () from /lib64/libdb-4.3.so
> No symbol table info available.
> #5  0x00002aeeae1bd8d7 in ?? () from /lib64/libdb-4.3.so
> No symbol table info available.
> #6  0x00002aeeae1bea4f in ?? () from /lib64/libdb-4.3.so
> No symbol table info available.
> #7  0x00002aeeae218829 in __db_c_get () from /lib64/libdb-4.3.so
> No symbol table info available.
> #8  0x00002aeeae220fe6 in __db_get () from /lib64/libdb-4.3.so
> No symbol table info available.
> #9  0x00002aeeae22115a in __db_get_pp () from /lib64/libdb-4.3.so
> No symbol table info available.
> #10 0x00002aeeadf24266 in id2entry (be=0x1dd03130, id=7630577, txn=0x2aef4ed104e0, err=0x2aef4ed10544) at ldap/servers/slapd/back-ldbm/id2entry.c:315
>         inst = (ldbm_instance *) 0x1dc8d180
>         db = (DB *) 0x1dd01080
>         db_txn = (DB_TXN *) 0x0
>         key = {data = 0x2aef4ed10450, size = 4, ulen = 0, dlen = 0, doff = 0, flags = 0}
>         data = {data = 0x0, size = 0, ulen = 0, dlen = 0, doff = 0, flags = 4}
>         e = (struct backentry *) 0x0
>         ee = <value optimized out>
>         temp_id = "\000tnñ"
>
> And another locked worker thread:
> #0  0x000000328800d654 in __lll_lock_wait () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x0000003288008f4a in _L_lock_1034 () from /lib64/libpthread.so.0
> No symbol table info available.
> #2  0x0000003288008e0c in pthread_mutex_lock () from /lib64/libpthread.so.0
> No symbol table info available.
> #3  0x00002aeeae1ba54c in __db_pthread_mutex_lock () from /lib64/libdb-4.3.so
> No symbol table info available.
> #4  0x00002aeeae252a51 in __memp_fget () from /lib64/libdb-4.3.so
> No symbol table info available.
> #5  0x00002aeeae218d73 in __db_c_get () from /lib64/libdb-4.3.so
> No symbol table info available.
> #6  0x00002aeeadf28b63 in idl_new_fetch (be=0x1dd03130, db=<value optimized out>, inkey=0x735755, txn=<value optimized out>, a=0x1dd421f0, flag_err=0x2aef4e3115bc, allidslimit=500000) at ldap/servers/slapd/back-ldbm/idl_new.c:298
>
> And the replication thread appears to be locked as well:
>
> #0  0x000000328800d654 in __lll_lock_wait () from /lib64/libpthread.so.0
> No symbol table info available.
> #1  0x0000003288008f80 in _L_lock_1233 () from /lib64/libpthread.so.0
> No symbol table info available.
> #2  0x0000003288008f03 in pthread_mutex_lock () from /lib64/libpthread.so.0
> No symbol table info available.
> #3  0x000000328ac23289 in PR_Lock () from /usr/lib64/libnspr4.so
> No symbol table info available.
> #4  0x000000328ac234cb in PR_EnterMonitor () from /usr/lib64/libnspr4.so
> No symbol table info available.
> #5  0x00002aeeadf1496c in cache_lock_entry (cache=0x1dc8d208, e=0x2af02d468c00) at ldap/servers/slapd/back-ldbm/cache.c:1455
> No locals.
> #6  0x00002aeeadf23b31 in find_entry_internal (pb=0x2af022054ca0, be=0x1dd03130, addr=<value optimized out>, lock=1, txn=0x2aef3ddf9cb0, flags=0) at ldap/servers/slapd/back-ldbm/findentry.c:237
> No locals.
> #7  0x00002aeeadf4df1a in ldbm_back_modify (pb=0x2af022054ca0) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:269
>
>
> On Wed, Aug 21, 2013 at 9:14 AM, Rich Megginson <rmeggins at redhat.com> wrote:
>
>     On 08/21/2013 09:53 AM, David Boreham wrote:
>
>
>         Another thing you might try:
>
>         While the server is under stress, run the "pstack" command a
>         few times and save the output.
>
>
>     gdb will give much more detail
>     http://port389.org/wiki/FAQ#Debugging_Hangs
>
>
>         If you post the thread stacks here, someone familiar with the
>         code can say with more accuracy what's going on. For example
>         it will be obvious whether you have starved out the thread
>         pool, or you have threads mostly waiting on page locks in the
>         DB, etc.
>
>
>         -- 
>         389 users mailing list
>         389-users at lists.fedoraproject.org
>         https://admin.fedoraproject.org/mailman/listinfo/389-users
>
