Design stub: performance enhancements for 1.14
by Jakub Hrozek
Hi,
I would like to get some opinions on where I'm heading with the
performance enhancements for 1.14. Please note this is /not/ a complete
design page. The goal is just to identify some blockers before I spend
more time working on this feature, even though I have already discussed
the page with some developers (thanks!).
If we agree this is the way to go, I will polish the design page as I
work on the feature.
I've started the design page here:
https://fedorahosted.org/sssd/wiki/DesignDocs/OneFourteenPerformanceImpro...
For your convenience, I've included the text below as well:
= Feature Name =
SSSD Performance enhancements for the 1.14 release
Related ticket(s):
* https://fedorahosted.org/sssd/ticket/2602
* https://fedorahosted.org/sssd/ticket/2062
=== Problem statement ===
At the moment SSSD doesn't perform well in large environments. Most of
the reported use-cases revolve around logins of users who are members of
large groups or of a large number of groups. Another reported use-case
was the time it takes to resolve a large group.
While workarounds are available for some of the issues (such as using
`ignore_group_members` for resolution of large groups), our goal is to be
able to perform well without these workarounds.
=== Use cases ===
* User who is a member of a large number of AD groups logs in to a Linux server that is a member of the AD domain.
* User who is a member of a large number of AD or IPA groups logs in to a Linux server that is a member of an IPA domain with a trust relationship to an AD domain.
* Administrator of a Linux server runs "ls -l" in a directory where files are owned by a large group. An example would be a group called "students" in a university setup.
=== Overview of the solution ===
During performance analysis with systemtap, we found that the biggest
delay happens when SSSD writes an entry to the cache. We can't skip cache
writes completely, even if no attributes changed, because we also store the
expiration timestamps in the cache. Moreover, even if only a single attribute
(like the timestamp) changes, ldb needs to unpack the whole entry, change
the record, pack it back and then write the whole blob.
To mitigate the costly cache writes, we should avoid writing the whole
cache entry on every cache update.
To achieve this, we will split the monolithic ldb file representing the
sysdb cache into two ldb files. One would contain the entries themselves and
would be fully synchronous. The other (new one) would contain only the
timestamps and would be opened with the `LDB_FLG_NOSYNC` flag to avoid
synchronous cache writes.
This would have two advantages:
1. If we detect that the entry hasn't changed on the LDAP server at all, we could avoid writing to the main ldb cache, which would still be costly.
1. Writes to the new async ldb cache would be much faster, because the entry is smaller and because the writes wouldn't call `fsync()` due to the async flag, but would instead rely on the underlying filesystem to sync the data to disk.
On SSSD shutdown, we would write a canary to the cache to denote a graceful
shutdown. On SSSD startup, if the canary wasn't found, we would simply ditch
the timestamp cache, which would result in a refresh and a write of each entry
on the next lookup.
Other minor performance enhancements might include:
* using syncrepl in server mode for HBAC rules and external groups in refreshAndPersist mode. This would provide a performance benefit for legacy clients that rely on the server's HBAC rules for access control.
* using syncrepl in server mode for external groups in refreshAndPersist mode. This would mainly simplify the external-groups handling rather than improve performance.
* A lot of time is spent looking up attributes in the `sysdb_attrs` array. This is something we might want to optimize after we're done with the cache writes.
* We might even consider offering syncrepl in refreshOnly mode as a client-side option for enumeration. However, this would have to be opt-in, because every refresh causes the server to walk the changelog since the last refresh operation. Enabling this option on all clients would trash the server's performance.
The basic idea is to use a combination of the operational `modifyTimestamp`
attribute and a check of the entry itself to see whether the entry changed
at all, and if not, avoid writing to the cache.
=== Implementation details ===
Details TBD, but so far we were thinking along the lines of:
* using `modifyTimestamp` to detect whether the entry changed at all. We would have to be smart when switching to a new server, because the new server might be out of sync and the timestamps might differ between replicas.
* using `modifyTimestamp` alone wouldn't work well for users, because (at least with IPA) every authentication is a write operation due to updating the `krbLastSuccessfulAuth` attribute. Therefore, we also need to compare the cached entry's attributes with what we read from LDAP. We might also need to store additional attributes such as `originalModifyTimestamp` or `entryUSN`.
RFC: Remove conditional build for sudo?
by Pavel Březina
I'm just reviewing Petr's sudo rule invalidation patches and when I hit
> + case TYPE_SUDO_RULE:
> + type_string = "sudo_rule";
> +#ifdef BUILD_SUDO
> + ret = sysdb_search_sudo_rules(ctx, dinfo,
> + filter, attrs, &msg_count, &msgs);
> +#else /* BUILD_SUDO */
> + ret = ENOSYS;
> +#endif /* BUILD_SUDO */
> + break;
I thought: why not remove it anyway? Sudo has grown from an experimental
feature into a heavily used one, so I think we can simply remove the
conditional build. What do you think?
[PATCH] Use refcount to keep track of server structures returned from failover
by Jakub Hrozek
Hi,
the attached patches are my proposal to fix
https://fedorahosted.org/sssd/ticket/2829
I haven't tested them past make check yet, because I'm not sure I like
them myself :) but at the same time I can't see a better way to keep
track of the servers and let callers set the state of servers.
The ugliest thing so far, IMO, is the fo_internal_owner member. I would
prefer to instead have a fo_server_wrap structure that would be used for
the server_list, but I didn't want to make a large change before we agree
that the refcount is a good idea at all.
The other ugly side-effect is that we need to make sure that nobody calls
talloc_free on the fo_server structure. Instead, only the parent context
can be freed (that's also what the first patch is about).
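For readers following along without the patches, the refcounting idea can be sketched in plain C. This is not the actual patch (which builds on talloc and the real fo_server from src/providers/fail_over.c); `fo_server_new`, `fo_server_get` and `fo_server_put` are made-up names used only to illustrate why no single caller may free the structure:

```c
/* Conceptual sketch: a refcounted failover server structure, so that
 * the structure stays alive as long as any caller still holds a
 * reference, and is freed exactly once when the last one drops it. */
#include <stdio.h>
#include <stdlib.h>

struct fo_server {
    int refcount;
    char name[64];
};

static struct fo_server *fo_server_new(const char *name)
{
    struct fo_server *srv = calloc(1, sizeof(*srv));
    if (srv == NULL) {
        return NULL;
    }
    srv->refcount = 1;
    snprintf(srv->name, sizeof(srv->name), "%s", name);
    return srv;
}

/* Take an additional reference; the caller must balance it with
 * fo_server_put(). */
static struct fo_server *fo_server_get(struct fo_server *srv)
{
    srv->refcount++;
    return srv;
}

/* Drop a reference; returns the count left. The structure is freed
 * only when the last reference goes away, so callers never free()
 * it directly. */
static int fo_server_put(struct fo_server *srv)
{
    int left = --srv->refcount;
    if (left == 0) {
        free(srv);
    }
    return left;
}
```

A usage example: `srv = fo_server_new("srv1"); ref = fo_server_get(srv);` then `fo_server_put(srv)` returns 1 (still held via `ref`) and the final `fo_server_put(ref)` returns 0 and frees the structure.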
[PATCH] FO: Use tevent_req_defer_callback() when notifying callers
by Jakub Hrozek
Hi,
I found this potential crash while trying to track down another issue in
the failover code. To reproduce, just revert the changes to
src/providers/fail_over.c and run make check; you should see either a
crash or at least an error if you run under valgrind.