On 06/04/2013 05:27 PM, Jakub Hrozek wrote:
I'm wondering whether we should be indexing the ghost users
attribute. Currently we are not.
In general, the ghost attribute is quite similar to the memberuid
attribute, and I'm trying to see whether the speed benefit of having
the attribute indexed is worth the cost of indexing.
The ghost attribute is used in a couple of places. The most prominent
are:
* nss responder - when the responder is gathering the list of users
  who are members of a group, the members are the combined values of
  the memberUID and ghost attributes. Here we just check an element of
  the group object; no search that includes the "ghost" attribute is
  performed.
* LDAP provider - whenever a user is saved or deleted, the sysdb is
  searched for any "ghost" entries with a value equal to the user's
  name. This means saving or deleting a user triggers a search that
  includes the ghost attribute.
* LDAP provider - when a group is saved and its members are not
  resolved yet, a ghost entry is saved instead. I don't see us
  searching using the ghost attribute there.
I suspect that indexing would only speed up the situation where we
refer to the ghost attribute in a search filter, right?
Then I suspect we wouldn't gain much by indexing the attribute, but
I wanted to check with the list anyway.
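To make the trade-off concrete, here is a toy Python model of the two
access patterns described above. It is only a sketch: it is not sysdb
or ldb code, and ToyCache, save_group and groups_with_ghost are
made-up names. It shows that an index only helps the lookup done on
user save/delete, while every group save has to pay for keeping the
index up to date.

```python
from collections import defaultdict


class ToyCache:
    """Toy stand-in for the cache: group name -> set of ghost user names."""

    def __init__(self, indexed):
        self.groups = {}
        # Inverted index: ghost name -> group names. Only kept if indexed.
        self.index = defaultdict(set) if indexed else None

    def save_group(self, name, ghosts):
        """Store a group; with indexing, each ghost value costs an index update."""
        ghosts = set(ghosts)
        if self.index is not None:
            # Drop index entries for ghosts that are no longer in the group.
            for old in self.groups.get(name, set()) - ghosts:
                self.index[old].discard(name)
            for ghost in ghosts:
                self.index[ghost].add(name)
        self.groups[name] = ghosts

    def groups_with_ghost(self, user):
        """The lookup done when a user is saved or deleted."""
        if self.index is not None:
            # Indexed: one dictionary lookup.
            return set(self.index.get(user, ()))
        # Unindexed: linear scan over every cached group.
        return {g for g, ghosts in self.groups.items() if user in ghosts}
```

Both variants return the same answers; they differ only in where the
work is done, which is why the ratio of group saves to user
saves/deletes decides which one wins.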
This is the sort of question that probably needs answering via
experimentation rather than analysis. Ultimately, it depends on the
workload (how often are we saving or deleting users vs. how often are
we saving groups with unresolved users?). I suspect the answer is going
to be that we save groups with unresolved users *far* more often than
we access the users in those groups.
On private laptops, for example, there will generally only be full
user entries in the cache for the primary user and possibly for users
of any shared files that they have viewed. Of course, there's also the
one-time cost of running 'ls -l /home' where /home is an NFS share
with one file for every user on the system. In this specific case, the
lack of indexing on the search is going to be painful. At the same
time, we'd also be paying the initial indexing cost throughout the same
operation, so the net result could go in either direction. It's
unclear to me whether maintaining a progressively larger index would
cost more or less than just running the linear searches.
I suspect that this would be the easiest way to test this
experimentally: Create an LDAP server with 500 users and a directory
with one file owned by each UID, then time an 'ls -l' with a purged
(not just expired) cache and see how it performs with and without
indexing. This will give us a good idea about our worst-case performance.
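A rough sketch of the client-side half of that test, in Python. The
helper names (make_owned_files, time_listing) are made up for
illustration, and the UID range is whatever the test LDAP server
actually hands out. Setting up the server and purging the cache
between runs are not covered here.

```python
import os
import subprocess
import time


def make_owned_files(directory, uids, chown=os.chown):
    """Create one empty file per UID and hand ownership to that UID."""
    paths = []
    for uid in uids:
        path = os.path.join(directory, f"file-{uid}")
        with open(path, "w"):
            pass
        # -1 leaves the group unchanged; the real os.chown needs root here.
        chown(path, uid, -1)
        paths.append(path)
    return paths


def time_listing(directory):
    """Return wall-clock seconds for one 'ls -l' of the directory."""
    start = time.monotonic()
    subprocess.run(["ls", "-l", directory], check=True,
                   stdout=subprocess.DEVNULL)
    return time.monotonic() - start
```

The idea would be to call make_owned_files() once as root with the 500
test UIDs, then call time_listing() after each cache purge, once with
the ghost attribute indexed and once without, and compare the numbers
over several runs.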