[389-users] Skipped request ...

Reinhard Nappert rnappert at juniper.net
Fri May 14 12:38:45 UTC 2010


No, the client is at the local box. What I did was that I "replay" the access file, where the initial problem occurred. This may be just a matter of replaying it too fast, but then it occurred already after 257 searches, which is not a lot.....

-Reinhard

-----Original Message-----
From: 389-users-bounces at lists.fedoraproject.org [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich Megginson
Sent: Thursday, May 13, 2010 10:58 PM
To: General discussion list for the 389 Directory server project.
Subject: Re: [389-users] Skipped request ...

Reinhard Nappert wrote:
>  
> Rich,
>
> Let's see if this give you a hint. Hopefully, it does ....
>   
Looks really strange, as if the stack trace is corrupted e.g.

#7  0x00002b5577b32fbb in plugin_call_plugins (pb=0xa8ad30, whichfunction=410)
    at ../ldap/servers/slapd/plugin.c:326
#8  0x00002b5577b3e0fb in flush_ber (pb=0x1, conn=0x2b55789faa49, op=0xa8ad30, 
    ber=0x2, type=2) at ../ldap/servers/slapd/result.c:1500

flush_ber() line 1500 is a call to PR_Lock().  It looks like many, many threads attempting to write to the client, but the client is simply not reading the reply, or there is some sort of network problem (are you using some sort of IP failover/load balancing device?).

What is your ioblocktimeout?
> Thanks,
> -Reinhard
>
>
> -----Original Message-----
> From: 389-users-bounces at lists.fedoraproject.org 
> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich 
> Megginson
> Sent: Thursday, May 13, 2010 6:48 PM
> To: General discussion list for the 389 Directory server project.
> Subject: Re: [389-users] Skipped request ...
>
> Reinhard Nappert wrote:
>   
>> How can I make ns-slapd to produce a core.
>>   
>>     
> You first have to start ns-slapd in an environment that you have done 
> ulimit -c unlimited then kill -11 <ns-slapd pid>
>   
>> I got it in the same state again, and I did a gdb 
>> /opt/UMC/jdb/sbin/ns-slapd 16712
>>
>> 0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6
>> (gdb) where
>> #0  0x00002b8e908889a2 in poll () from /lib64/tls/libc.so.6
>> #1  0x00002b8e906b6a5f in PR_Poll () from 
>> /opt/UMC/jdb/lib/dirsrv/libnspr4.so
>> #2  0x0000000000415ae7 in slapd_daemon (ports=0x7fff1b44cf50)
>>     at ../ldap/servers/slapd/daemon.c:662
>> #3  0x000000000041c0b3 in main (argc=7, argv=0x7fff1b44d098)
>>     at ../ldap/servers/slapd/main.c:1162
>>
>> Does this help?
>>   
>>     
> Try this:
> (gdb) set logging file /tmp/slapd.txt
> (gdb) set logging on
> (gdb) thread apply all bt
> # hit return as many times as needed
> (gdb) quit
>
> Then send me /tmp/slapd.txt
>   
>> -Reinhard
>>
>> -----Original Message-----
>> From: 389-users-bounces at lists.fedoraproject.org
>> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich 
>> Megginson
>> Sent: Thursday, May 13, 2010 6:04 PM
>> To: General discussion list for the 389 Directory server project.
>> Subject: Re: [389-users] Skipped request ...
>>
>> Reinhard Nappert wrote:
>>   
>>     
>>> Hi Rick,
>>>
>>> I attached access and error file with debug level 8. The server does not respond to any requests anymore. If you kill the client, it responds afterwards.
>>>
>>> Let me know, what you see.
>>>   
>>>     
>>>       
>> I don't see anything obvious.  One thing I do know is that this code has been improved since 1.1.2 (especially the debugging, which not very usefully prints the file descriptor addresses in int format :P)  I don't suppose you could try to reproduce this with 1.2.5?
>>   
>>     
>>> Thanks,
>>> -Reinhard
>>>
>>> -----Original Message-----
>>> From: 389-users-bounces at lists.fedoraproject.org
>>> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of Rich 
>>> Megginson
>>> Sent: Thursday, May 13, 2010 1:10 PM
>>> To: General discussion list for the 389 Directory server project.
>>> Subject: Re: [389-users] Skipped request ...
>>>
>>> Reinhard Nappert wrote:
>>>   
>>>     
>>>       
>>>> Rich, which debugging level do you suggest? Apparently, I tried to much, because it would crash the server constantly.
>>>>     
>>>>       
>>>>         
>>> Debugging levels should not crash the server - can provide more information about the crash?
>>>   
>>>     
>>>       
>>>> For now, I go just with 8 (Connection Management). Seeing the problem, what would you enable?
>>>>   
>>>>     
>>>>       
>>>>         
>>> Yes, start with 8.
>>>   
>>>     
>>>       
>>>> Thanks,
>>>> -Reinhard
>>>>
>>>> -----Original Message-----
>>>> From: 389-users-bounces at lists.fedoraproject.org
>>>> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of 
>>>> Rich Megginson
>>>> Sent: Wednesday, May 12, 2010 6:50 PM
>>>> To: General discussion list for the 389 Directory server project.
>>>> Subject: Re: [389-users] Skipped request ...
>>>>
>>>> Reinhard Nappert wrote:
>>>>   
>>>>     
>>>>       
>>>>         
>>>>> Hi Rich,
>>>>>
>>>>> I ran some further tests. This entire thing looks kind of weird. I have a kind of monitoring tool, I use to figure out if the server still responds in a timely manner. This tool performs an anonymous bind and reads a specific object, every 30 seconds.
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>> Does it perform an unbind operation?  Does it disconnect the socket?
>>>>   
>>>>     
>>>>       
>>>>         
>>>>>  What I see is that the server responds to the incoming request and it performs about 500 requests within those 30 seconds. Then, I see, when the next monitoring connection request comes is, but I never see the bind. Since this times out, the monitoring tool restarts the server after a while (about 10 seconds).
>>>>>
>>>>> Here are the logs in access:
>>>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from
>>>>> 127.0.0.1 to 127.0.0.1
>>>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection 
>>>>> from
>>>>> 10.227.6.45 to 10.227.6.53
>>>>>
>>>>> So, you see the server does not respond to any requests after 
>>>>> [11/May/2010:22:12:20 -0400] conn=94 fd=83 slot=83 connection from
>>>>> 127.0.0.1 to 127.0.0.1
>>>>>
>>>>> And start responding, once it was restarted:
>>>>> [11/May/2010:22:13:24 -0400] conn=0 fd=64 slot=64 SSL connection 
>>>>> from
>>>>> 10.227.6.45 to 10.227.6.53
>>>>>
>>>>> I was wondering , if we could get somehow some debugging out of ns-slapd, once it is in this state (truss or something else).
>>>>>   
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>> http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting
>>>> If that produces too much error log output, or kills the 
>>>> performance, you can also try replacing the error log with a named
>>>> pipe+script -
>>>> http://directory.fedoraproject.org/wiki/Named_Pipe_Log_Script
>>>> man ds-logpipe.py
>>>>   
>>>>     
>>>>       
>>>>         
>>>>> Any help is appreciated.
>>>>>
>>>>> Thanks,
>>>>> -Reinhard
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: 389-users-bounces at lists.fedoraproject.org
>>>>> [mailto:389-users-bounces at lists.fedoraproject.org] On Behalf Of 
>>>>> Rich Megginson
>>>>> Sent: Tuesday, May 11, 2010 5:21 PM
>>>>> To: General discussion list for the 389 Directory server project.
>>>>> Subject: Re: [389-users] Skipped request ...
>>>>>
>>>>> Reinhard Nappert wrote:
>>>>>   
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>>>> Hi all,
>>>>>>  
>>>>>> I have seen a weird behavior of my DS (1.1.2). It has a very 
>>>>>> small database (only about 2300 objects). A client performed a 
>>>>>> one-level search retrieving the children. The server find 114 
>>>>>> objects, but the search was very slow:
>>>>>>  
>>>>>> [06/May/2010:12:23:11 +0000] conn=127 op=149 SRCH base=<base>
>>>>>> scope=1 filter="(&(&(objectClass=<xyz>)(<att1>=value))(!(<att2>=TRUE)))"
>>>>>>  
>>>>>> yes, the filter is a bit complex, but both attribute types <att1> 
>>>>>> and <att2> are indexed. This search usually is fast. It looks to 
>>>>>> me that the server is already in a funny state.
>>>>>> ...
>>>>>> [06/May/2010:12:23:17 +0000] conn=127 op=149 RESULT err=3 tag=101
>>>>>> nentries=114 etime=7
>>>>>>     
>>>>>>       
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> err=3 is TIMELIMIT_EXCEEDED - that's probably why you aren't getting all of the results you expect, and could be why it's skipping the op.
>>>>>   
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>>>>  
>>>>>> When the client gets the results, it iterates over those and gets 
>>>>>> its children, like:
>>>>>>  
>>>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 SRCH base=<dn of 
>>>>>> result from previous SRCH> scope=1 
>>>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
>>>>>> Those searches are quick:
>>>>>> [06/May/2010:12:23:17 +0000] conn=127 op=150 RESULT err=0 tag=101
>>>>>> nentries=1 etime=0
>>>>>>  
>>>>>> but somehow the server does not process on of the requests, when 
>>>>>> the client iterates over the results:
>>>>>>  
>>>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 SRCH base=<dn of 
>>>>>> result from previous SRCH> scope=1 
>>>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
>>>>>> [06/May/2010:12:23:18 +0000] conn=127 op=263 RESULT err=0 tag=101
>>>>>> nentries=1 etime=0
>>>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 SRCH base=<dn of 
>>>>>> result from previous SRCH> scope=1 
>>>>>> filter="(&(&(objectClass=<uvw>)(<attr3>=*))(!(<attr2>=TRUE)))" attrs=ALL.
>>>>>> [06/May/2010:12:23:26 +0000] conn=127 op=265 RESULT err=0 tag=101 
>>>>>> nentries=0 etime=0 You can see that the server skipped op=264. It 
>>>>>> looks to me that the request came in, but somehow the server 
>>>>>> joked up, before it could log the request in access.
>>>>>>  
>>>>>> Has anybody seen such a behavior before?
>>>>>>  
>>>>>> Thanks,
>>>>>> -Reinhard
>>>>>>
>>>>>>  
>>>>>> -----------------------------------------------------------------
>>>>>> -
>>>>>> -
>>>>>> -
>>>>>> --
>>>>>> --
>>>>>>
>>>>>> --
>>>>>> 389 users mailing list
>>>>>> 389-users at lists.fedoraproject.org 
>>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>>     
>>>>>>       
>>>>>>         
>>>>>>           
>>>>>>             
>>>>> --
>>>>> 389 users mailing list
>>>>> 389-users at lists.fedoraproject.org
>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>> --
>>>>> 389 users mailing list
>>>>> 389-users at lists.fedoraproject.org
>>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>>   
>>>>>     
>>>>>       
>>>>>         
>>>>>           
>>>> --
>>>> 389 users mailing list
>>>> 389-users at lists.fedoraproject.org
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>> --
>>>> 389 users mailing list
>>>> 389-users at lists.fedoraproject.org
>>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>>   
>>>>     
>>>>       
>>>>         
>>> --
>>> 389 users mailing list
>>> 389-users at lists.fedoraproject.org
>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>   
>>> --------------------------------------------------------------------
>>> -
>>> -
>>> --
>>>
>>> --
>>> 389 users mailing list
>>> 389-users at lists.fedoraproject.org
>>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>>     
>>>       
>> --
>> 389 users mailing list
>> 389-users at lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>> --
>> 389 users mailing list
>> 389-users at lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/389-users
>>   
>>     
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users
>   
> ----------------------------------------------------------------------
> --
>
> --
> 389 users mailing list
> 389-users at lists.fedoraproject.org
> https://admin.fedoraproject.org/mailman/listinfo/389-users

--
389 users mailing list
389-users at lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users



More information about the 389-users mailing list