I would like to get some opinions on where I'm heading with the
performance enhancements for 1.14. Please note this is /not/ a complete
design page. The goal is to just identify some blockers first before I
spend more time working on this feature, even though I already discussed
the page with some developers (thanks!).
If we agree this is the way to go, I will polish the design page as I
work on the feature.
I've started the design page here:
For your convenience, I've included the text below as well:
= Feature Name =
SSSD Performance enhancements for the 1.14 release
=== Problem statement ===
At the moment SSSD doesn't perform well in large environments. Most of
the use-cases we've had reported revolved around logins of users who are
members of large groups or a large number of groups. Another reported
use-case was the time it takes to resolve a large group.
While workarounds are available for some of the issues (such as using
`ignore_group_members` for resolution of large groups), our goal is to be
able to perform well without these workarounds.
=== Use cases ===
* User who is a member of a large number of AD groups logs in to a Linux server that is a member of the AD domain.
* User who is a member of a large number of AD or IPA groups logs in to a Linux server that is a member of an IPA domain with a trust relationship to an AD domain
* Administrator of a Linux server runs "ls -l" in a directory where files are owned by a large group. An example would be a group called "students" in a university setup
=== Overview of the solution ===
During performance analysis with systemtap, we found out that the biggest
delay happens when SSSD writes an entry to the cache. We can't skip cache
writes completely, even if no attributes changed, because we also store the
expiration timestamps in the cache. Also, even if a single attribute (like
the timestamp) changes, ldb would need to unpack the whole entry, change
the record, pack it back and then write the whole blob.
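To illustrate why a timestamp-only update is expensive for a monolithic entry, here is a minimal Python sketch. It uses JSON as a stand-in for ldb's packed format (an assumption made purely for illustration; the real on-disk format differs), and the entry layout and the `update_expiry` helper are hypothetical:

```python
import json


def pack(entry: dict) -> bytes:
    """Stand-in for ldb packing: serialize the whole entry into one blob."""
    return json.dumps(entry, sort_keys=True).encode("utf-8")


def unpack(blob: bytes) -> dict:
    """Stand-in for ldb unpacking: parse the whole blob back into an entry."""
    return json.loads(blob.decode("utf-8"))


def update_expiry(blob: bytes, new_expiry: int) -> bytes:
    """Change one attribute: unpack everything, modify, pack everything again."""
    entry = unpack(blob)
    entry["dataExpireTimestamp"] = new_expiry
    return pack(entry)


# A large group: thousands of members, one expiration timestamp.
group = {
    "name": "students",
    "member": [f"uid=user{i},cn=users,dc=example,dc=com" for i in range(5000)],
    "dataExpireTimestamp": 1000,
}

blob = pack(group)
new_blob = update_expiry(blob, 2000)
print(f"bytes rewritten for a timestamp-only change: {len(new_blob)}")
```

Only a few bytes of the entry actually change, yet the whole blob for the large group is rebuilt and written again.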
In order to mitigate the costly cache writes, we should avoid writing the
whole cache entry on every cache update.
To achieve this, we will split the monolithic ldb cache representing the
sysdb cache into two ldb files. One would contain the entry itself and would
be fully synchronous. The other (new one) would only contain the timestamps
and would be opened with the `LDB_FLG_NOSYNC` flag to avoid synchronous cache writes.
This would have two advantages:
1. If we detect that the entry hasn't changed on the LDAP server at all, we could avoid writing into the main ldb cache which would still be costly.
1. The writes to the new async ldb cache would be much faster, because the entry is smaller and because the writes wouldn't call `fsync()` due to using the async flag, but rather rely on the underlying filesystem to sync the data to the disk.
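The two advantages above can be sketched as follows. This is only a model of the idea, not the planned implementation: plain dictionaries stand in for the two ldb files, and the `SplitCache` class and its counters are hypothetical names used to show which store each update touches:

```python
class SplitCache:
    """Toy model of the split cache: one synchronous store, one async store."""

    def __init__(self):
        self.main = {}         # stands in for the fully synchronous ldb file
        self.timestamps = {}   # stands in for the file opened with LDB_FLG_NOSYNC
        self.sync_writes = 0   # expensive writes (would fsync in the real cache)
        self.async_writes = 0  # cheap writes (left to the filesystem to flush)

    def store(self, dn, attrs, expires_at):
        # Touch the main store only when the entry content really changed.
        if self.main.get(dn) != attrs:
            self.main[dn] = dict(attrs)
            self.sync_writes += 1
        # The expiration timestamp is always refreshed, but cheaply.
        self.timestamps[dn] = expires_at
        self.async_writes += 1


cache = SplitCache()
entry = {"name": "students", "gidNumber": "5000"}
cache.store("name=students,cn=groups", entry, expires_at=1000)
cache.store("name=students,cn=groups", entry, expires_at=2000)  # unchanged entry
print(cache.sync_writes, cache.async_writes)
```

In this model the second update of an unchanged entry costs only a write to the timestamp store.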
On SSSD shutdown, we would write a canary to the cache, denoting graceful
shutdown. On SSSD startup, if the canary wasn't found, we would just ditch
the timestamp cache, which would result in refresh and write of the entry
on the next lookup.
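A minimal sketch of the canary handling, again with a dictionary standing in for the timestamp cache. The `CANARY_KEY` name and both helpers are hypothetical. Removing the canary at startup is an assumption of this sketch; without it a later crash could not be told apart from a graceful shutdown:

```python
CANARY_KEY = "cn=canary"  # hypothetical record name marking a graceful shutdown


def on_shutdown(ts_cache: dict) -> dict:
    """Write the canary as the last step of a graceful shutdown."""
    result = dict(ts_cache)
    result[CANARY_KEY] = True
    return result


def on_startup(ts_cache: dict) -> dict:
    """Keep the timestamp cache only if the previous shutdown was graceful."""
    if CANARY_KEY not in ts_cache:
        # No canary: the async writes may not have reached the disk, so
        # drop all timestamps. Every entry then looks expired and is
        # refreshed and written on its next lookup.
        return {}
    result = dict(ts_cache)
    del result[CANARY_KEY]  # assumption: consume the canary on startup
    return result


timestamps = {"name=students,cn=groups": 2000}
after_clean_restart = on_startup(on_shutdown(timestamps))
after_crash = on_startup(timestamps)
```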
Other minor performance enhancements might include:
* using syncrepl in the server mode for HBAC rules and external groups in refreshAndPersistMode. This would provide a performance benefit for legacy clients that rely on the server's HBAC rules for access control.
* using syncrepl in the server mode for external groups in refreshAndPersistMode. This would mainly simplify the external groups handling, rather than improve performance
* A lot of time is spent looking up attributes in the `sysdb_attrs` array. This is something we might want to optimize after we're done with the cache writes.
* We might even consider offering syncrepl in refreshOnly mode as a client-side option for enumeration. However, this would have to be an opt-in because every refresh causes the server to walk the changelog since the last refresh operation. Enabling this option on all clients would trash the server performance.
The basic idea is to use a combination of the operational `modifyTimestamp`
attribute and checking the entry itself to see if the entry changed at
all and if not, avoid writing to the cache.
=== Implementation details ===
Details TBD, but so far we were thinking along the lines of:
* using `modifyTimestamp` to detect if the entry changed at all. We would have to be smart when switching to a new server, because the new server might be out-of-sync and the timestamps might differ between replicas
* using `modifyTimestamp` wouldn't work well for users, because (at least with IPA), every authentication is a write operation, due to updating the `krbLastSuccessfulAuth` attribute. Therefore, we also need to compare the cached entry's attributes with what we read from LDAP. We might also need to store additional attributes such as `originalModifyTimestamp` or `entryUSN`.
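The two checks above could be combined roughly as follows. This is a sketch under stated assumptions, not the final design: entries are modeled as dictionaries mapping attribute names to lists of values, the `needs_cache_write` helper and its parameters are hypothetical, and the server comparison is the simplest possible way to avoid trusting timestamps across replicas:

```python
def needs_cache_write(cached, fresh, cached_server, current_server):
    """Decide whether the main cache entry has to be rewritten.

    cached / fresh: dict of attribute name -> list of values, holding only
    the attributes SSSD requests (so krbLastSuccessfulAuth is not present).
    """
    # Fast path: trust modifyTimestamp only when talking to the same server,
    # because timestamps might differ between replicas.
    if (cached_server == current_server
            and cached.get("modifyTimestamp") == fresh.get("modifyTimestamp")):
        return False

    # Slow path: compare the attributes themselves, ignoring value order.
    names = (set(cached) | set(fresh)) - {"modifyTimestamp"}
    return any(
        sorted(cached.get(name, [])) != sorted(fresh.get(name, []))
        for name in names
    )


cached = {"modifyTimestamp": ["20160101000000Z"],
          "uid": ["alice"], "memberOf": ["g1", "g2"]}
# The user authenticated: the timestamp moved, the requested attributes did not.
after_login = {"modifyTimestamp": ["20160102000000Z"],
               "uid": ["alice"], "memberOf": ["g2", "g1"]}
# The user was added to a group.
after_change = {"modifyTimestamp": ["20160103000000Z"],
                "uid": ["alice"], "memberOf": ["g1", "g2", "g3"]}

print(needs_cache_write(cached, after_login, "ldap1", "ldap1"))
print(needs_cache_write(cached, after_change, "ldap1", "ldap1"))
```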
the attached patches are my proposal to fix
I haven't tested them past make check yet, because I'm not sure I like
them myself :) but at the same time I can't see a better way to keep
track of the servers and let callers set state of servers.
The ugliest thing so far IMO is the fo_internal_owner member. I would
prefer to instead have a fo_server_wrap structure that would be used for
the server_list, but I didn't want to do a large change before we agree
the refcount is a good idea at all.
The other ugly side-effect is that we need to be sure that nobody calls
talloc_free on the fo_server structure. Instead, only the parent context
can be freed (that's also what the first patch is about).
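For readers who want the refcount idea in isolation, here is a language-neutral sketch in Python. It does not mirror the attached patches: `FoServer`, `server_ref` and `server_unref` are hypothetical names, and the `released` flag merely stands in for the memory actually being freed:

```python
class FoServer:
    """Hypothetical stand-in for a failover server record."""

    def __init__(self, name):
        self.name = name
        self.status = "unknown"
        self.refcount = 1      # the reference held by the server list
        self.released = False  # stands in for the memory being freed


def server_ref(server):
    """A caller that keeps a pointer to the server takes a reference."""
    server.refcount += 1
    return server


def server_unref(server):
    """Drop one reference; the record goes away only with the last one."""
    if server.refcount <= 0:
        raise RuntimeError("unref on an already released server")
    server.refcount -= 1
    if server.refcount == 0:
        server.released = True


server_list = [FoServer("ldap1.example.com")]
in_flight = server_ref(server_list[0])  # a request remembers its server

server_unref(server_list.pop())         # the list drops the server meanwhile
in_flight.status = "working"            # the caller can still set the state
still_alive = not in_flight.released

server_unref(in_flight)                 # last reference gone, record released
```

The point of the sketch is the ordering: nothing frees the record directly, so a caller that still holds a reference never touches released memory.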
I found this potential crash when trying to find another issue in the
failover code. To reproduce, just revert the changes to
src/providers/fail_over.c and run make check, you should see either a
crash or at least an error if you use valgrind.
I decided to share this design document although it is still a work in progress. The attached patches are just a proof of concept and are very much a work in progress. So far the patches also differ from the design in the order in which secondary slices are generated.
Thanks for feedback on this early state of effort.
Hi, FreeIPA and SSSD communities!
I am working on adding URI to HBAC as my thesis. The goal is to
control access not only based on (user, host, service), but on (user,
host, service, resource's URI).
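To make the extended tuple concrete, here is a small sketch of how a URI-aware rule might be evaluated. It is not taken from the patches: the `Rule` class and helper names are hypothetical, and both the shell-style pattern matching of URIs and the choice that a rule without URI patterns matches any URI are assumptions of this example:

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    users: frozenset
    hosts: frozenset
    services: frozenset
    uri_patterns: tuple = ()  # assumption: empty means "any URI"


def rule_matches(rule, user, host, service, uri):
    """Classic (user, host, service) check, plus the URI as a fourth element."""
    if user not in rule.users or host not in rule.hosts:
        return False
    if service not in rule.services:
        return False
    if not rule.uri_patterns:
        return True
    return any(fnmatchcase(uri, pattern) for pattern in rule.uri_patterns)


def is_allowed(rules, user, host, service, uri):
    """Access is granted if at least one rule matches the whole tuple."""
    return any(rule_matches(r, user, host, service, uri) for r in rules)


rules = [Rule(frozenset({"alice"}), frozenset({"web.example.com"}),
              frozenset({"httpd"}), ("/wiki/*",))]

print(is_allowed(rules, "alice", "web.example.com", "httpd", "/wiki/Home"))
print(is_allowed(rules, "alice", "web.example.com", "httpd", "/admin"))
```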
I created a patch for FreeIPA so it is capable of storing URI as
part of HBAC rule. I created a patch for SSSD so it is able to get
this URI from FreeIPA and use it in HBAC evaluation.
I still need to develop a part of SSSD receiving URI-aware requests. It
will either be an enhancement of Infopipe or I will use the PAM responder.
I wanted to kindly ask you for review and your opinions on the patches
and generally on my approach. This would be my first contribution to
FreeIPA and SSSD so there might be bugs. What do you think?
Btw, is there some better place to share patches than a pasting tool?
Maybe some form of pull request?
Thanks for your opinions!
[PATCH 1/2] GPO: Add Cockpit to the Remote Interactive defaults
The Cockpit Project is an administrative console that is gaining in
popularity and is a default component on some operating systems (such
as Fedora Server). Since it is becoming more common, we should ensure
that it is part of the standard mapping.
[PATCH 2/2] GPO: Add other display managers to interactive logon
Gone are the days when all systems used GDM or KDM. We need to support
other display managers in the default configuration to avoid issues
when enrolled in AD domains.
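For context, services missing from the defaults have to be added by hand with the `ad_gpo_map_remote_interactive` and `ad_gpo_map_interactive` options, which take a comma-separated list of PAM service names prefixed with `+` or `-`. The sketch below models how such a list modifies a default set. The default sets shown and the `lightdm` service name are illustrative assumptions, not the lists from the patches, and `apply_service_map` is a hypothetical helper rather than SSSD code:

```python
def apply_service_map(defaults, option_value):
    """Apply a '+service, -service' list on top of a default service set."""
    services = set(defaults)
    for token in option_value.split(","):
        token = token.strip()
        if not token:
            continue
        if token[0] == "+":
            services.add(token[1:].strip())
        elif token[0] == "-":
            services.discard(token[1:].strip())
        else:
            raise ValueError(f"service must be prefixed with + or -: {token!r}")
    return services


# Illustrative defaults only; the real lists live in the SSSD sources.
REMOTE_INTERACTIVE_DEFAULTS = {"sshd"}
INTERACTIVE_DEFAULTS = {"login", "gdm-password", "kdm"}

# What an administrator would otherwise configure manually, e.g.
#   ad_gpo_map_remote_interactive = +cockpit
remote = apply_service_map(REMOTE_INTERACTIVE_DEFAULTS, "+cockpit")
interactive = apply_service_map(INTERACTIVE_DEFAULTS, "+lightdm, -kdm")
print(sorted(remote), sorted(interactive))
```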