Mark,
See attached. The break in the log is where it hung and then came back to life.
I’m also going to follow up with our networking and security folks to see if we can find
anything there. These hosts are all on the same subnet, for what it’s worth.
Thanks for the help.
-morgan
On Aug 23, 2017, at 12:35 PM, Mark Reynolds <mareynol(a)redhat.com> wrote:
On 08/23/2017 12:31 PM, Morgan Jones wrote:
>> On Aug 23, 2017, at 12:17 PM, Mark Reynolds <mareynol(a)redhat.com> wrote:
>>
>>
>>> [pid 27442] recvmsg(14, 0x7f3880ef74d0, 0) = -1 EAGAIN (Resource temporarily unavailable)
>>> [pid 27442] recvmsg(14, 0x7f3880ef74d0, 0) = -1 EAGAIN (Resource temporarily unavailable)
>>> [pid 27442] poll([{fd=14, events=POLLRDNORM}, {fd=15, events=POLLRDNORM}], 2, 500 <unfinished ...>
>>> [pid 27440] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
>>> [pid 27440] futex(0x7f38940cfd28, FUTEX_WAKE_PRIVATE, 1) = 0
>> Sorry forgot to comment on this...
>>
>> This explains the "hang" - connections to the remote server(s) are
>> timing out.
>>
>> Can you look at the DS access logs on a remote server during the hang?
>> Note that the access log has a 30 second buffer. Perhaps just tail
>> the access log, reproduce the hang, wait 30 seconds, and provide
>> the complete tail output.
> Aha, ok, thanks. Which access log do you want or do you want all of them?
Just the access log from the server that triggers the initial hang.
>
> -morgan
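
For reference, the capture step Mark describes (tail the access log, reproduce the hang, wait out the 30 second buffer) could be sketched roughly as below. This is only an illustration, not a DS tool: the `follow` helper and its parameters are hypothetical names, and the log path has to be supplied by the caller (on many installs it is something like /var/log/dirsrv/slapd-INSTANCE/access, but that is an assumption to check locally).

```python
import time


def follow(path, duration, poll_interval=0.5):
    """Collect complete lines appended to `path` during `duration` seconds.

    Hypothetical helper, roughly equivalent to running `tail -f` for a
    fixed time. Only lines written after the call starts are returned.
    """
    lines = []
    pending = ""  # holds a partially written line until its newline arrives
    deadline = time.monotonic() + duration
    with open(path, "r", errors="replace") as f:
        f.seek(0, 2)  # jump to the current end of the file
        while time.monotonic() < deadline:
            chunk = f.readline()
            if not chunk:
                # Nothing new yet; the server may still be holding
                # entries in its log buffer.
                time.sleep(poll_interval)
                continue
            pending += chunk
            if pending.endswith("\n"):
                lines.append(pending)
                pending = ""
    return lines
```

Start it on the server that triggers the initial hang with a duration comfortably longer than the hang plus the 30 second buffer (60 seconds or more), reproduce the hang while it runs, and then send the returned lines.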