Simo Sorce wrote:
during this month I have been slowly working on a set of patches to move
from storing information in 2 different formats (legacy and
member/memberOf based) to just one format (member/memberOf based).
While doing this I had to address some problems that come up when you
want to store a group whose members have not been stored themselves yet,
and similar cases.
All the while I have been testing doing enumerations against a server
that has more than 3k users and 3k groups.
This is a medium sized database, and yet getting groups from scratch
(startup after deleting the .ldb database) could take up to a minute;
granted the operation is quite a bit faster if the database just needs
updating and not creation from scratch, but I still think it's too much.
I've been thinking hard about how to address this problem and solve the
few hacks we have in the code when it comes to enumeration caching and
retrieval. We always said that enumerations are evil (and they are
indeed) and in fact we even have options that disable enumerations by
default. Yet I think this is not necessarily the right way to go.
I think we have 2 major problems in our current architecture when it
comes to enumerations.
1) we try to hit the wire when an enumeration request comes in from a
process and the (small) timeout for the previous enumeration has
expired.
Maybe then, as a quick fix, we should have a separate timeout for
enumerations.
2) We run the enumeration in a single transaction (and yes I have
recently introduced this), which means any other operation is blocked
until the enumeration is finished.
Can we create a special back end for enumerations and separate it from
the main one?
The problem I actually see is that user space apps may have to wait
too much, and this *will* turn out to be a problem.
Even if we give the
option to turn off enumeration, I think that for apps that need it the
penalty has become simply too big. Also I think the way we have to
perform updates using this model is largely inefficient, as we basically
perform a full new search potentially every few minutes.
Agreed, though I think that we can separate these enhancements and do
them as a separate effort (maybe later, after F12, if we do not have time).
After some hard thinking I wrote down a few points I'd like your
opinion on. If people agree I will start acting on them.
* stop performing enumerations on demand, and perform them in background
if enumerations are activated (change the enumeration parameter from a
bitfield to a true/false boolean)
Maybe in a separate back end?
* perform a full user+group enumeration at startup (possibly using a
paged or vlv search)
Yes, I agree, but there is a concern (see below).
* when possible request the modifyTimestamp attribute and save the
highest modifyTimestamp into the domain entry as originalMaxTimestamp
* on a tunable interval, run a task that refreshes all users and groups
in the background, using a search filter that includes the saved
originalMaxTimestamp as a lower bound on modifyTimestamp
Okay... Steven explained it in more detail, since I was concerned about
the timestamp also being updated
on individual refreshes, but he said that this timestamp will be touched
only by enumerations.
Hm, I see how that would work.
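Roughly, the timestamp-tracking idea could be sketched like this (illustrative Python, not SSSD code; the entry dicts and helper names are made up). LDAP generalized time strings of the same form compare correctly as plain strings, which is what makes a simple max() and a `>=` filter work:

```python
# Hypothetical sketch: track the highest modifyTimestamp seen during a
# full enumeration, then build an incremental refresh filter from it.
# All names here are illustrative, not real SSSD or LDAP library APIs.

def max_modify_timestamp(entries):
    """Return the highest modifyTimestamp among enumerated entries.

    Generalized time strings (YYYYMMDDHHMMSSZ) sort correctly as text."""
    return max(e["modifyTimestamp"] for e in entries)

def build_refresh_filter(base_filter, last_max):
    """AND the normal user/group filter with a timestamp bound so the
    periodic refresh only pulls entries changed since the last run."""
    return "(&%s(modifyTimestamp>=%s))" % (base_filter, last_max)

entries = [
    {"uid": "alice", "modifyTimestamp": "20090610120000Z"},
    {"uid": "bob",   "modifyTimestamp": "20090611093000Z"},
]
last_max = max_modify_timestamp(entries)   # saved as originalMaxTimestamp
flt = build_refresh_filter("(objectClass=posixAccount)", last_max)
```

The saved value would then be stored in the domain entry and substituted into the filter on the next background refresh.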
* still do a full refresh every X minutes/hours
* disable using a single huge transaction for enumerations (we might be
ok doing a transaction for each page search if pages are small,
otherwise just revert to the previous behavior of having a transaction
per stored object)
Can you do page, individual request, page, individual request?
If yes then it makes sense to have transaction per page.
If you have to do pages one after another and can't do other requests in
the middle I do not see how changing transaction scope would help.
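To make the interleaving concrete, here is a minimal sketch using sqlite3 as a stand-in for the ldb cache (the schema and page layout are invented for illustration): each page of enumeration results is committed independently, so a concurrent writer or reader only waits for one page, not the whole enumeration.

```python
import sqlite3

# Sketch of "one transaction per page": commit the cache after each LDAP
# page instead of wrapping the whole enumeration in a single transaction.
# sqlite3 stands in for ldb; the users table is purely illustrative.

def store_enumeration(db, pages):
    for page in pages:
        with db:  # one transaction per page; commits on exit
            db.executemany(
                "INSERT OR REPLACE INTO users(name, uid) VALUES (?, ?)",
                page)
        # Between iterations no transaction is open, so an individual
        # request (e.g. a getpwnam()-driven update) can be served here.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users(name TEXT PRIMARY KEY, uid INTEGER)")
pages = [[("alice", 1000), ("bob", 1001)], [("carol", 1002)]]
store_enumeration(db, pages)
count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

So the answer to the question above matters: if the back end can process other requests between page fetches, per-page transactions shrink the window during which the cache is locked.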
* Every time we update an entry, we store an originalModifyTimestamp
attribute as a copy of the remote modifyTimestamp. This allows us to
know if we actually need to touch the cached entry at all upon refresh
(like when a getpwnam() is called), speeding up operations for entries
that need no refresh (we will avoid data transformation and writing to ldb).
Is this only for enumerations, or for individual updates too?
How is it related to the cache logic currently being designed and
implemented? Can you please explain how this would affect the cache logic?
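The skip-the-write check itself would be trivial; something along these lines (illustrative only, with plain dicts standing in for ldb and LDAP entries):

```python
# Sketch of the originalModifyTimestamp comparison: on refresh, compare
# the remote modifyTimestamp with the copy saved in the cache and skip
# the (relatively expensive) transform-and-write when they match.
# The dicts and the function name are invented for illustration.

def needs_refresh(cached_entry, remote_entry):
    """True only if the remote entry changed since we last stored it."""
    return (cached_entry.get("originalModifyTimestamp")
            != remote_entry["modifyTimestamp"])

cached = {"name": "alice", "originalModifyTimestamp": "20090610120000Z"}
remote_unchanged = {"name": "alice", "modifyTimestamp": "20090610120000Z"}
remote_changed   = {"name": "alice", "modifyTimestamp": "20090612080000Z"}

skip  = not needs_refresh(cached, remote_unchanged)  # avoid the ldb write
write = needs_refresh(cached, remote_changed)        # update the cache
```

The saving is not in the comparison but in everything it lets us avoid: attribute mapping, memberOf recomputation, and the ldb write.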
* Every time we run the general refresh task or we save a changed entry,
we store a LastUpdatedTimestamp
* When the refresh task is completed successfully we run another cleanup
task that searches our LDB for any entry that has a too old
LastUpdatedTimestamp. If any is found, we double check against the
remote server if the entry still exists (and update it if it does), and
otherwise we delete it.
Makes sense. It actually makes me think that, with out-of-band periodic
enumerations and cleanups, it becomes more and more appealing
to have this in a separate back end.
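For completeness, the cleanup pass could look roughly like this (illustrative Python; the cache dict, the remote_names set standing in for a server-side existence check, and the max_age knob are all made up for the sketch):

```python
from datetime import datetime, timedelta

# Sketch of the cleanup task: after a successful refresh, find cached
# entries whose LastUpdatedTimestamp is older than a cutoff. Each stale
# entry is double-checked against the server: if it still exists it is
# refreshed, otherwise it is deleted from the cache.

def cleanup(cache, remote_names, max_age, now):
    cutoff = now - max_age
    removed = []
    for name, entry in list(cache.items()):
        if entry["LastUpdatedTimestamp"] < cutoff:
            if name in remote_names:
                # Still exists remotely: refresh it and stamp it anew.
                entry["LastUpdatedTimestamp"] = now
            else:
                # Gone on the server: drop it from the cache.
                del cache[name]
                removed.append(name)
    return removed

now = datetime(2009, 6, 15)
cache = {
    "alice": {"LastUpdatedTimestamp": now - timedelta(days=2)},
    "bob":   {"LastUpdatedTimestamp": now - timedelta(minutes=5)},
}
removed = cleanup(cache, remote_names={"bob"},
                  max_age=timedelta(days=1), now=now)
```

Here "alice" is stale and no longer on the server, so she is removed, while "bob" was updated recently and is left alone.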
NOTE: this means that until the first background enumeration is
complete, a getent passwd or a getent group call may return incomplete
results. I think this is acceptable as it will really happen only at
startup, when the daemon caches are empty.
Here is my concern I mentioned above: How many services and daemons at
startup rely on the enumeration?
Do we know? Is there any way to know? Should we ask the communities
about those to make sure we meet the expectations?
What about things like NetworkManager, HAL, the system bus, auditd, GDE
and many other processes that start at boot?
If we block them it might be a show stopper; if we provide partial data
it might be OK, or it might not.
I guess we need more input on the matter. Do you agree?
NOTE2: Of course the scheduled refresh and cleanup tasks are
rescheduled if we are offline or if a fatal error occurs during the
operation.
IMO this is yet another reason to have a separate back end.
NOTE3: I am proposing to change only the way enumerations are performed;
single user or group lookups will remain unchanged for now and will be
dealt with later if needed.
Sure. I think the only impact is that the entry might be refreshed by
the enumeration, and we need to factor the new timestamp
into the cache refresh logic. Other than that I do not see any impact.
Please provide comments or questions if you think there is anything
unclear with the proposed items, or if you think I forgot to take some
important aspect into account.