On Mon, August 20, 2012 12:05, Jakub Hrozek wrote:
On Mon, Aug 20, 2012 at 08:33:47AM +0200, Sigbjorn Lie wrote:
Hi,
When I arrived at the office this morning, our Nagios server was displaying a lot of alarms.
The "sssd_pam" process was consuming 100% CPU, and I was unable to log on to the box as anything other than root.
2310 root 20 0 219m 44m 2176 R 99.6 0.3 2883:27 sssd_pam
In /var/log/sssd/sssd_pam.log, the following error message was repeated:
[sssd[pam]] [accept_fd_handler] (0x0020): Accept failed [Too many open files]
This being our Nagios server, the maximum number of concurrently open files has been increased from the default of 1024 to 4096 for all users.
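For reference, raising that limit is usually done with entries along these lines in /etc/security/limits.conf; the exact lines below are an assumption, not a copy of our config:

    # raise the open-file limit for all users
    *    soft    nofile    4096
    *    hard    nofile    4096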
This is RHEL 6.3 with sssd-1.8.0-32.
What can I do to prevent this from happening in the future?
Regards, Siggi
In SSSD 1.8 the file descriptor limit was raised to either 8k or the hard limit from limits.conf, whichever is lower.
That would be 4k for me then.
There is also a new option fd_limit that can be used to set the limit and in cases where the SSSD has the CAP_SYS_RESOURCE capability, even override the hard limit from limits.conf [1]
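A minimal sssd.conf sketch, assuming the option is set in the responder section you want to tune (the value shown is only an example, not a recommendation):

    [pam]
    # cap the number of file descriptors the sssd_pam responder may hold open
    fd_limit = 8192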
When is the appropriate time to use this? I presume what I need is more file descriptors, not fewer?
I'd like to ask for some more info to tell if the server was simply busy or if we were really leaking a file descriptor:
Do you know how many files were open at the time?
No, recovery of the services was of a higher priority than collecting info at the time.
Were there many concurrent logins happening to that server?
Yes, Nagios spawns a lot of ssh sessions to other hosts. There were also other processes that had been spawned from cron which were hung.
Did you have a chance to run lsof to check what file
descriptors were open?
I'm sorry no.
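Next time it happens I could capture the state with something like the following (a sketch; 2310 is the PID from the top output above, and would need to be replaced with the current sssd_pam PID):

    # count the descriptors currently held by sssd_pam
    ls /proc/2310/fd | wc -l
    # save the full descriptor list for later analysis
    lsof -n -p 2310 > /tmp/sssd_pam_lsof.txt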
We did increase the system-wide nproc value in limits.conf from 1024 to 4096 four days ago due to too many Nagios checks running at the same time. If that is the issue, then we will see the SSSD issue happening again in a few days.
Is there anything I should change in SSSD's config, or should I just wait for this to happen again and collect more information?
We are not running SELinux on this box.
Are there any other existing known bugs I should be aware of?
Rgds, Siggi