[Fedora-directory-users] FDS 1.1 Transport endpoint is not connected

Rich Megginson rmeggins at redhat.com
Wed Feb 20 23:39:27 UTC 2008


Richard Hesse wrote:
> Yeah, we're using SSL and TLS so ethereal/tcpdump isn't going to yield much
> info.
It would give us the TCP/IP protocol data, so we could see which clients
and servers are sending the FIN and RST.  It's not so much the LDAP data
I care about, although ssltap might be useful for that.
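As an illustration of what such a reset looks like from the writing side,
here is a minimal Python sketch. It is not taken from the server code. It
assumes loopback sockets are available, and the function name provoke_reset
is made up for this example. One end closes with SO_LINGER set to zero, which
makes the kernel send an RST instead of a FIN, and the other end then tries
to write.

```python
import errno
import socket
import struct
import time


def provoke_reset():
    """Return the errno seen when writing to a peer that closed with an RST."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # SO_LINGER enabled with a zero timeout: close() aborts the connection
    # with an RST rather than the usual FIN handshake.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()
    time.sleep(0.2)  # give the RST time to arrive on loopback

    try:
        server_side.send(b"x")
        return None  # the write unexpectedly succeeded
    except OSError as exc:
        return exc.errno
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    code = provoke_reset()
    print(code, errno.errorcode.get(code))
```

On Linux I would expect the write to fail with ECONNRESET, matching the
ber_flush error quoted below; other platforms may report EPIPE instead.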
> The process hung again and strace didn't provide too much information
> other than this:
>
> futex(0x20b9260, FUTEX_WAIT, 2, NULL)
>
> Would that give you a place to start looking?
>   
That does suggest a possible deadlock.
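For anyone unfamiliar with the pattern, the sketch below shows the classic
lock-ordering deadlock in Python. It is only an illustration, not a claim
about where ns-slapd is stuck; the names lock_a, lock_b and
demo_lock_order_deadlock are made up. Timeouts are used so the demo reports
the stall instead of hanging forever.

```python
import threading


def demo_lock_order_deadlock(timeout=0.5):
    """Two threads take the same two locks in opposite order.

    Returns a dict saying whether each thread managed to get its second lock.
    """
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    both_holding = threading.Barrier(2)  # both threads hold their first lock
    both_tried = threading.Barrier(2)    # both threads finished their attempt
    got_second = {}

    def worker(name, first, second):
        with first:
            both_holding.wait()
            # The other thread holds `second` and is waiting for `first`.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            # Keep holding `first` until the other thread has also given up,
            # so neither attempt can succeed by accident.
            both_tried.wait()
            if acquired:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


if __name__ == "__main__":
    print(demo_lock_order_deadlock())
```

Without the timeout, both threads would wait forever. On Linux a thread
stuck this way typically shows up in strace as a futex(..., FUTEX_WAIT, ...)
call that never returns, which is why the thread stacks are the next thing
to look at.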
> -richard
>
>
> On 2/19/08 4:04 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>
>   
>> Richard Hesse wrote:
>>     
>>> Not much new to report. The server hung again and the only thing in the
>>> error log with connection tracing is this:
>>>
>>> [18/Feb/2008:13:14:03 +0000] - PR_Write(41818752) Netscape Portable Runtime
>>> error -5961 (TCP connection reset by peer.)
>>> [18/Feb/2008:13:14:03 +0000] - ber_flush failed, error 104 (Connection reset
>>> by peer)
>>>
>>> Which doesn't look like much.
>>>       
>> Well, it tells me that the server was attempting to write to a socket,
>> and got an error.  -5961 is PR_CONNECT_RESET_ERROR which can occur if
>> the system call returns either EPIPE or ECONNRESET.  And error 104 is
>> indeed ECONNRESET.
>> /usr/include/asm-generic/errno.h:#define        ECONNRESET      104
>> /* Connection reset by peer */
>>
>> AFAICT, this can happen if the client shuts down the socket (for any
>> number of reasons) but the server is still attempting to send data.  In
>> this case, the client will respond with a TCP RST.  I'm not sure how or
>> why this could happen.  I'm open to other causes for ECONNRESET.
>> What would be really, really interesting is if we could narrow this down
>> to a particular client application and run ethereal on the connection.
>>
>> Are you using SSL?
>>     
>>> As for network tuning, it's already been done.
>>>
>>> Max descriptors is set to 32768.
>>>
>>> Are there any gdb commands I can run while the server is in a hung state?
>>>
>>>       
>> Sure.  For whatever the cause of the ECONNRESET, it should not cause the
>> server to hang, and it would be interesting to find out what it's
>> doing.  You'll have to install the fedora-ds-base-debuginfo package.
>> Attach to the process - gdb /usr/sbin/ns-slapd <pid of process>
>> Then, dump the thread stacks -
>>
>> (gdb) thread apply all bt
>>
>> If you want the output to go to a file, point gdb logging at a file and
>> turn it on before doing the thread apply, e.g.
>>
>> (gdb) set logging file stack.txt
>> (gdb) set logging on
>>
>>
>>     
>>> I'm going to try running strace while the process is working, and hope for a
>>> hang. Maybe that will give us some more info.
>>>
>>> -richard
>>>
>>> On 2/19/08 10:23 AM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>
>>>
>>>       
>>>> Richard Hesse wrote:
>>>>
>>>>         
>>>>> Yes, every host (except the ldap hosts) runs nscd. The ldap servers are not
>>>>> configured to use directory data for anything.
>>>>>
>>>>>
>>>>>           
>>>> I just don't know.  I've not seen this before.  I suppose you could try
>>>> checking your kernel TCP/IP settings, and increasing the number of file
>>>> descriptors used -
>>>> http://directory.fedoraproject.org/wiki/Performance_Tuning
>>>>
>>>>         
>>>>> -richard
>>>>>
>>>>>
>>>>> On 2/15/08 2:11 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> Richard Hesse wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> nsswitch posix users/groups,
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> Are you using nscd?
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> ssh, sudo, puppet (config management), and
>>>>>>> internally written applications.
>>>>>>>
>>>>>>> -richard
>>>>>>>
>>>>>>> On 2/15/08 12:53 PM, "Rich Megginson" <rmeggins at redhat.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> What is the application which is generating this load?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> --
>>>>>>> Fedora-directory-users mailing list
>>>>>>> Fedora-directory-users at redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/fedora-directory-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               

