On Tue, May 03, 2016 at 05:30:23PM +0200, Jakub Hrozek wrote:
On Tue, May 03, 2016 at 03:52:03PM +0100, Patrick Coleman wrote:
> On 1 May 2016 at 17:04, Jakub Hrozek <jhrozek(a)redhat.com> wrote:
> >> On 30 Apr 2016, at 10:28, Patrick Coleman <patrick.coleman(a)meraki.com> wrote:
> >> On 29 Apr 2016 9:10 pm, "Lukas Slebodnik" <lslebodn(a)redhat.com> wrote:
> >> >
> >> > Do you mean IO-related load or CPU-related load?
> >>
> >> Lots of both, but we're typically IO bound more of the time.
> >>
> >> > If there is an issue with CPU then you can mount the sssd cache on tmpfs
> >> > to avoid such issues. (There are plans to improve it in 1.14.)
> >>
> >> Cool, I'll give that a go.
> >
> > Alternatively, increase the 'timeout' option in sssd's sections.
>
> I appreciate the advice, thank you. I've put /var/lib/sss onto a tmpfs
> filesystem on a couple of loaded machines and seen what I believe to
> be improvements - it's a little too early to say, but I'll report back
> once I have a wider deployment.
>
> I did want to feed back a little of our research into this issue. If
> we strace the sssd_be subprocess on a loaded machine, we see it
> sitting in msync() and fdatasync() for periods of up to 7.3 seconds in
> one test. This is perhaps expected, given the machine is under heavy
> IO load, but sssd makes a *lot* of these calls.
Yes, every cache update does 4 of these. This is a known issue I'm
working on right now:
https://fedorahosted.org/sssd/ticket/2602
In particular:
https://fedorahosted.org/sssd/wiki/DesignDocs/OneFourteenPerformanceImpro...
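For anyone who wants a rough feel for how expensive each sync call is on a given filesystem, here is a minimal sketch that times fdatasync() after small writes. It is only an illustration and not how sssd or its cache library issues these calls; the function name time_fdatasync, the 4 KiB payload and the default of 4 writes are assumptions picked for the example, and os.fdatasync is only available on platforms that provide it (for example Linux).

```python
import os
import tempfile
import time


def time_fdatasync(directory, writes=4, payload=b"x" * 4096):
    """Write `payload` `writes` times, calling fdatasync after each write.

    Returns a list with the duration of each fdatasync call in seconds.
    The defaults are illustrative and do not model sssd's real I/O pattern.
    """
    durations = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            os.write(fd, payload)
            start = time.monotonic()
            os.fdatasync(fd)  # block until the file data reaches storage
            durations.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return durations


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the filesystem
    # you want to measure (a disk-backed one versus a tmpfs, say).
    target = tempfile.gettempdir()
    samples = time_fdatasync(target)
    print("fdatasync calls: %d, total: %.3f ms, worst: %.3f ms"
          % (len(samples), sum(samples) * 1000, max(samples) * 1000))
```

Running it once against a disk-backed directory and once against a tmpfs mount on a loaded machine should give a rough idea of how much of the stall comes from the sync calls themselves.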
By the way, some comparison from my WIP branch. Without the patches,
updating a user who is a member of several hundred large groups with 'id'
takes the following:
    Total run time of id was: 19415 ms
    Number of zero-level cache transactions: 283
        --> Time spent in level-0 sysdb transactions: 7694 ms
        --> Time spent writing to LDB: 2958 ms
    Number of LDAP searches: 562
    Time spent waiting for LDAP: 4548 ms
With the patches to avoid cache writes:
    Total run time of id was: 9482 ms
    Number of zero-level cache transactions: 283
        --> Time spent in level-0 sysdb transactions: 1074 ms
        --> Time spent writing to LDB: 38 ms
    Number of LDAP searches: 562
    Time spent waiting for LDAP: 4792 ms
So I think this already shows a nice improvement, although there is
still quite a bit to work on.
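To put the two runs side by side, a small sketch like the following computes the relative reduction for each timing. The numbers are copied from the two 'id' runs quoted in this message; the dictionary keys and the helper names reduction_percent and summarize are made up for the illustration.

```python
# Timings in milliseconds, copied from the two 'id' runs in this message.
before = {"total": 19415, "sysdb": 7694, "ldb": 2958, "ldap": 4548}
after = {"total": 9482, "sysdb": 1074, "ldb": 38, "ldap": 4792}


def reduction_percent(old, new):
    """Return how much `new` is below `old`, as a percentage of `old`.

    A negative result means the value went up.
    """
    return (old - new) / old * 100.0


def summarize(old_run, new_run):
    """Map each timing name to its reduction in percent, rounded to 0.1."""
    return {key: round(reduction_percent(old_run[key], new_run[key]), 1)
            for key in old_run}


if __name__ == "__main__":
    for name, pct in summarize(before, after).items():
        print("%-6s %6.1f%% less time" % (name, pct))
```

This works out to roughly half the total run time and about 99% less time writing to LDB, while the time spent waiting for LDAP is essentially unchanged (slightly higher in the second run), which fits with the number of searches being the same in both.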