Hello all, during this month I have been slowly working on a set of patches to move from storing information in 2 different formats (legacy and member/memberOf based) to just one format (member/memberOf based). While doing this I had to address some problems that come up when you want to store a group whose members have not been stored yet, and similar cases. All the while I have been testing enumerations against a server that has more than 3k users and 3k groups. This is a medium-sized database, and yet getting groups from scratch (startup after deleting the .ldb database) could take up to a minute; granted, the operation is quite a bit faster if the database just needs updating rather than creation from scratch, but I still think it's too much.
I've been thinking hard about how to address this problem and solve the few hacks we have in the code when it comes to enumeration caching and retrieval. We always said that enumerations are evil (and they are indeed) and in fact we even have options that disable enumerations by default. Yet I think this is not necessarily the right way to go.
I think we have 2 major problems in our current architecture when it comes to enumerations: 1) we try to hit the wire when an enumeration request comes in from a process and a (small) timeout for the previous enumeration has expired; 2) we run the enumeration in a single transaction (and yes, I have recently introduced this), which means any other operation is blocked until the enumeration is finished.
The problem I actually see is that user space apps may have to wait just too long, and this *will* turn out to be a problem. Even if we give the option to turn off enumeration, I think that for apps that need it the penalty has become simply too big. Also, I think the way we have to perform updates using this model is largely inefficient, as we basically perform a full new search potentially every few minutes.
After some hard thinking I wrote down a few points I'd like the list's opinion on. If people agree I will start acting on them.
- stop performing enumerations on demand, and perform them in the background if enumerations are activated (change the enumeration parameter from a bitfield to a true/false boolean)
- perform a full user+group enumeration at startup (possibly using a paged or VLV search)
- when possible, request the modifyTimestamp attribute and save the highest modifyTimestamp into the domain entry as originalMaxTimestamp
- on a tunable interval, run a task that refreshes all users and groups in the background using a search filter that includes &(modifyTimestamp>$originalMaxTimestamp)
- still do a full refresh every X minutes/hours
- stop using a single huge transaction for enumerations (we might be OK doing a transaction for each page search if pages are small; otherwise just revert to the previous behavior of having a transaction per stored object)
- every time we update an entry, store the originalModifyTimestamp on it as a copy of the remote modifyTimestamp; this allows us to know whether we actually need to touch the cached entry at all upon refresh (e.g. when getpwnam() is called), speeding up operations for entries that need no refresh (we avoid data transformation and a write to ldb)
- every time we run the general refresh task or save a changed entry, store a LastUpdatedTimestamp
- when the refresh task completes successfully, run another cleanup task that searches our LDB for any entry with a too-old LastUpdatedTimestamp; if any is found, we double-check against the remote server whether the entry still exists (and update it if it does), otherwise we delete it
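The incremental-refresh filter from the proposal could be sketched as follows. The helper name, the base filter, and the fallback-to-full-search behavior are my own illustration; the proposal only specifies the &(modifyTimestamp>$originalMaxTimestamp) clause. Note that LDAP filters have no strict greater-than matching rule, so a real implementation would use >=:

```python
# Sketch (not SSSD code) of building the incremental refresh filter.
# originalMaxTimestamp is the value saved in the domain entry as proposed.

def build_refresh_filter(base_filter, original_max_timestamp=None):
    """Build an LDAP filter that fetches only entries modified after the
    last enumeration, falling back to a full search when no timestamp has
    been recorded yet (first startup)."""
    if original_max_timestamp is None:
        return base_filter
    # modifyTimestamp uses LDAP generalized time, e.g. 20090813062300Z.
    # ">=" is used because LDAP has no strict ">" matching rule.
    return "(&%s(modifyTimestamp>=%s))" % (base_filter, original_max_timestamp)

print(build_refresh_filter("(objectClass=posixAccount)", "20090813062300Z"))
```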
NOTE: this means that until the first background enumeration is complete, a getent passwd or a getent group call may return incomplete results. I think this is acceptable as it will really happen only at startup, when the daemon caches are empty.
NOTE2: Of course the scheduled refreshes and cleanup tasks are always rescheduled if we are offline or if a fatal error occurs during the task.
NOTE3: I am proposing to change only the way enumerations are handled, single user or group lookups will remain unchanged for now and will be dealt with later if needed.
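As an illustration of the cleanup task in the last point of the proposal, here is a rough sketch; the entry shape, the callbacks, and the numeric timestamps are all hypothetical stand-ins, not SSSD code:

```python
# Rough sketch of the post-refresh cleanup task: entries whose
# LastUpdatedTimestamp is too old are re-checked against the remote server
# and dropped if they no longer exist there. All names are illustrative.

def cleanup_stale_entries(ldb_entries, remote_exists, cutoff, refresh, delete):
    for entry in ldb_entries:
        if entry["LastUpdatedTimestamp"] >= cutoff:
            continue  # touched by a recent refresh, nothing to do
        if remote_exists(entry["name"]):
            refresh(entry["name"])   # still on the server: update the copy
        else:
            delete(entry["name"])    # gone remotely: purge from the cache

deleted = []
cleanup_stale_entries(
    [{"name": "alice", "LastUpdatedTimestamp": 90},
     {"name": "bob", "LastUpdatedTimestamp": 10}],
    remote_exists=lambda name: False,
    cutoff=50,
    refresh=lambda name: None,
    delete=deleted.append,
)
print(deleted)  # ['bob']
```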
Please provide comments or questions if you think there is anything unclear in the proposed items or if you think I forgot to take some important aspect into account.
Simo.
On 08/13/2009 06:23 AM, Simo Sorce wrote:
Hello all, during this month I have been slowly working on a set of patches to move from storing information in 2 different formats (legacy and member/memberOf based) to just one format (member/memberOf based). While doing this I had to address some problems that come up when you want to store a group whose members have not been stored yet, and similar cases. All the while I have been testing enumerations against a server that has more than 3k users and 3k groups. This is a medium-sized database, and yet getting groups from scratch (startup after deleting the .ldb database) could take up to a minute; granted, the operation is quite a bit faster if the database just needs updating rather than creation from scratch, but I still think it's too much.
I've been thinking hard about how to address this problem and solve the few hacks we have in the code when it comes to enumeration caching and retrieval. We always said that enumerations are evil (and they are indeed) and in fact we even have options that disable enumerations by default. Yet I think this is not necessarily the right way to go.
I think we have 2 major problems in our current architecture when it comes to enumerations.
- we try to hit the wire when an enumeration request comes in from a process and a (small) timeout for the previous enumeration has expired
- we run the enumeration in a single transaction (and yes, I have recently introduced this), which means any other operation is blocked until the enumeration is finished
The problem I actually see is that user space apps may have to wait just too long, and this *will* turn out to be a problem. Even if we give the option to turn off enumeration, I think that for apps that need it the penalty has become simply too big. Also, I think the way we have to perform updates using this model is largely inefficient, as we basically perform a full new search potentially every few minutes.
One potential idea would be to have the SSSD automatically start an enumeration at startup time if the cache is stale. Then, instead of blocking updates waiting for subsequent enumerations, we could just go immediately to the cache until the enumeration was complete.
After some hard thinking I wrote down a few points I'd like the list's opinion on. If people agree I will start acting on them.
- stop performing enumerations on demand, and perform them in background
if enumerations are activated (change the enumeration parameter from a bitfield to a true/false boolean)
- perform a full user+group enumeration at startup (possibly using a
paged or vlv search)
- when possible request the modifyTimestamp attribute and save the
highest modifyTimestamp into the domain entry as originalMaxTimestamp
- on a tunable interval run a task that refreshes all users and groups
in the background using a search filter that includes &(modifyTimestamp>$originalMaxTimestamp)
- still do a full refresh every X minutes/hours
- disable using a single huge transaction for enumerations (we might be
ok doing a transaction for each page search if pages are small, otherwise just revert to the previous behavior of having a transaction per stored object)
- Every time we update an entry we store the originalModifyTimestamp on
it as a copy of the remote modifyTimestamp; this allows us to know whether we actually need to touch the cached entry at all upon refresh (e.g. when getpwnam() is called), speeding up operations for entries that need no refresh (we avoid data transformation and a write to ldb).
- Every time we run the general refresh task or we save a changed entry
we store a LastUpdatedTimestamp
- When the refresh task is completed successfully we run another cleanup
task that searches our LDB for any entry that has a too old LastUpdatedTimestamp. If any is found, we double check against the remote server if the entry still exists (and update it if it does), and otherwise we delete it.
I do like the idea of updating only the entries on the remote server that have been updated since the previous enumeration.
NOTE: this means that until the first background enumeration is complete, a getent passwd or a getent group call may return incomplete results. I think this is acceptable as it will really happen only at startup, when the daemon caches are empty.
I disagree. If we're going to have a startup enumeration, then we should simply not enable handling NSS requests until that first enumeration is complete. Incomplete results can be worse than no results. I assume NSS has a return code for temporary failure?
NOTE2: Of course the scheduled refreshes and cleanup tasks are always rescheduled if we are offline or if a fatal error occurs during the task.
NOTE3: I am proposing to change only the way enumerations are handled, single user or group lookups will remain unchanged for now and will be dealt with later if needed.
Please provide comments or questions if you think there is anything unclear in the proposed items or if you think I forgot to take some important aspect into account.
Simo.
sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/sssd-devel
-- Stephen Gallagher RHCE 804006346421761
Looking to carve out IT costs? www.redhat.com/carveoutcosts/
On Thu, 2009-08-13 at 08:54 -0400, Stephen Gallagher wrote:
One potential idea would be to have the SSSD automatically start an enumeration at startup time if the cache is stale. Then, instead of blocking updates waiting for subsequent enumerations, we could just go immediately to the cache until the enumeration was complete.
See below, this is what I propose :)
After some hard thinking I wrote down a few points I'd like the list's opinion on. If people agree I will start acting on them.
- stop performing enumerations on demand, and perform them in background
if enumerations are activated (change the enumeration parameter from a bitfield to a true/false boolean)
- perform a full user+group enumeration at startup (possibly using a
paged or vlv search)
- when possible request the modifyTimestamp attribute and save the
highest modifyTimestamp into the domain entry as originalMaxTimestamp
- on a tunable interval run a task that refreshes all users and groups
in the background using a search filter that includes &(modifyTimestamp>$originalMaxTimestamp)
- still do a full refresh every X minutes/hours
- disable using a single huge transaction for enumerations (we might be
ok doing a transaction for each page search if pages are small, otherwise just revert to the previous behavior of having a transaction per stored object)
- Every time we update an entry we store the originalModifyTimestamp on
it as a copy of the remote modifyTimestamp; this allows us to know whether we actually need to touch the cached entry at all upon refresh (e.g. when getpwnam() is called), speeding up operations for entries that need no refresh (we avoid data transformation and a write to ldb).
- Every time we run the general refresh task or we save a changed entry
we store a LastUpdatedTimestamp
- When the refresh task is completed successfully we run another cleanup
task that searches our LDB for any entry that has a too old LastUpdatedTimestamp. If any is found, we double check against the remote server if the entry still exists (and update it if it does), and otherwise we delete it.
I do like the idea of updating only the entries on the remote server that have been updated since the previous enumeration.
Yes, hopefully all servers allow this kind of query. I guess we need to experiment and have options to fall back to full searches if this is not supported.
NOTE: this means that until the first background enumeration is complete, a getent passwd or a getent group call may return incomplete results. I think this is acceptable as it will really happen only at startup, when the daemon caches are empty.
I disagree. If we're going to have a startup enumeration, then we should simply not enable handling NSS requests until that first enumeration is complete. Incomplete results can be worse than no results. I assume NSS has a return code for temporary failure?
Internally, yes, but all it does is return no results to user space. Not returning results == returning partial results. So I see no difference here.
Note that this will happen only on the first startup when caches are empty as we otherwise always return whatever is in the cache.
On 08/13/2009 08:38 AM, Simo Sorce wrote:
On Thu, 2009-08-13 at 08:54 -0400, Stephen Gallagher wrote:
NOTE: this means that until the first background enumeration is complete, a getent passwd or a getent group call may return incomplete results. I think this is acceptable as it will really happen only at startup, when the daemon caches are empty.
I disagree. If we're going to have a startup enumeration, then we should simply not enable handling NSS requests until that first enumeration is complete. Incomplete results can be worse than no results. I assume NSS has a return code for temporary failure?
Internally, yes, but all it does is return no results to user space. Not returning results == returning partial results. So I see no difference here.
I was referring to having our NSS client-side component return TRYAGAIN or UNAVAIL instead of zero results, since the nsswitch.conf file can be configured to handle these appropriately.
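For context, the glibc nsswitch.conf action syntax Stephen is referring to looks like this (illustrative only; whether a fall-through to files is the desired behavior here is exactly the open question in this thread):

```
# act on sss's status before falling through to the next service
passwd: sss [TRYAGAIN=continue] files
group:  sss [TRYAGAIN=continue] files
```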
On Thu, 2009-08-13 at 09:27 -0400, Stephen Gallagher wrote:
On 08/13/2009 08:38 AM, Simo Sorce wrote:
On Thu, 2009-08-13 at 08:54 -0400, Stephen Gallagher wrote:
NOTE: this means that until the first background enumeration is complete, a getent passwd or a getent group call may return incomplete results. I think this is acceptable as it will really happen only at startup, when the daemon caches are empty.
I disagree. If we're going to have a startup enumeration, then we should simply not enable handling NSS requests until that first enumeration is complete. Incomplete results can be worse than no results. I assume NSS has a return code for temporary failure?
Internally, yes, but all it does is return no results to user space. Not returning results == returning partial results. So I see no difference here.
I was referring to having our NSS client-side component return TRYAGAIN or UNAVAIL instead of zero results, since the nsswitch.conf file can be configured to handle these appropriately.
We could do that, but how is it going to really make any difference for getent passwd ?
Simo.
Simo Sorce wrote:
Hello all, during this month I have been slowly working on a set of patches to move from storing information in 2 different formats (legacy and member/memberOf based) to just one format (member/memberOf based). While doing this I had to address some problems that come up when you want to store a group whose members have not been stored yet, and similar cases. All the while I have been testing enumerations against a server that has more than 3k users and 3k groups. This is a medium-sized database, and yet getting groups from scratch (startup after deleting the .ldb database) could take up to a minute; granted, the operation is quite a bit faster if the database just needs updating rather than creation from scratch, but I still think it's too much.
I've been thinking hard about how to address this problem and solve the few hacks we have in the code when it comes to enumeration caching and retrieval. We always said that enumerations are evil (and they are indeed) and in fact we even have options that disable enumerations by default. Yet I think this is not necessarily the right way to go.
I think we have 2 major problems in our current architecture when it comes to enumerations.
- we try to hit the wire when an enumeration request comes in from a
process and a (small) timeout for the previous enumeration has expired.
Maybe then we should, as a quick fix, have a separate timeout for the enumerations?
- We run the enumeration in a single transaction (and yes I have
recently introduced this), which means any other operation is blocked until the enumeration is finished.
Can we create a special back end for enumerations and separate it from individual operations?
The problem I actually see is that user space apps may have to wait just too much, and this *will* turn out to be a problem.
Agree.
Even if we give the option to turn off enumeration, I think that for apps that need it the penalty has become simply too big. Also, I think the way we have to perform updates using this model is largely inefficient, as we basically perform a full new search potentially every few minutes.
Agree, though I think that we can separate these enhancements and do them as a separate effort (maybe later, after F12, if we do not have time).
After some hard thinking I wrote down a few points I'd like the list's opinion on. If people agree I will start acting on them.
- stop performing enumerations on demand, and perform them in background
if enumerations are activated (change the enumeration parameter from a bitfield to a true/false boolean)
In a separate back end, maybe?
- perform a full user+group enumeration at startup (possibly using a
paged or vlv search)
Yes, I agree, but there is a concern (see below).
- when possible request the modifyTimestamp attribute and save the
highest modifyTimestamp into the domain entry as originalMaxTimestamp
- on a tunable interval run a task that refreshes all users and groups
in the background using a search filter that includes &(modifyTimestamp>$originalMaxTimestamp)
Okay... Stephen explained it in more detail; I was concerned about the timestamp also being updated on individual refreshes, but he said this timestamp will be touched only by enumerations. Hm, I see how that would work.
- still do a full refresh every X minutes/hours
Configurable. Sure.
- disable using a single huge transaction for enumerations (we might be
ok doing a transaction for each page search if pages are small, otherwise just revert to the previous behavior of having a transaction per stored object)
Can you do page, individual request, page, individual request? If yes, then it makes sense to have a transaction per page. If you have to do pages one after another and can't do other requests in the middle, I do not see how changing the transaction scope would help.
- Every time we update an entry we store the originalModifyTimestamp on
it as a copy of the remote modifyTimestamp; this allows us to know whether we actually need to touch the cached entry at all upon refresh (e.g. when getpwnam() is called), speeding up operations for entries that need no refresh (we avoid data transformation and a write to ldb).
Is this only for enumerations, or on individual updates too? How is it related to the cache logic currently designed and being implemented? Can you please explain how this would affect the cache logic?
- Every time we run the general refresh task or we save a changed entry
we store a LastUpdatedTimestamp
- When the refresh task is completed successfully we run another cleanup
task that searches our LDB for any entry that has a too old LastUpdatedTimestamp. If any is found, we double check against the remote server if the entry still exists (and update it if it does), and otherwise we delete it.
Makes sense. It actually makes me think that with out-of-band periodic enumerations and cleanups it becomes more and more appealing to have this in a separate back end.
NOTE: this means that until the first background enumeration is complete, a getent passwd or a getent group call may return incomplete results. I think this is acceptable as it will really happen only at startup, when the daemon caches are empty.
Here is my concern I mentioned above: how many services and daemons rely on enumeration at startup? Do we know? Is there any way to know? Should we ask the communities about those to make sure we meet the expectations? What about things like NetworkManager, HAL, the system bus, auditd, GDE and many other processes that start at boot? If we block them it might be a show stopper; if we provide partial data it might be OK, might be not. I guess we need more input on the matter. Do you agree?
NOTE2: Of course the scheduled refreshes and cleanup tasks are always rescheduled if we are offline or if a fatal error occurs during the task.
IMO this is yet another reason to have a separate back end.
NOTE3: I am proposing to change only the way enumerations are handled, single user or group lookups will remain unchanged for now and will be dealt with later if needed.
Sure. I think the only impact is that the entry might be refreshed by the enumeration, and we need to factor a new timestamp into the cache refresh logic. Other than that I do not see any impact.
Please provide comments or questions if you think there is anything unclear in the proposed items or if you think I forgot to take some important aspect into account.
Simo.
On Thu, 2009-08-13 at 13:17 -0400, Dmitri Pal wrote:
Simo Sorce wrote:
Hello all, during this month I have been slowly working on a set of patches to move from storing information in 2 different formats (legacy and member/memberOf based) to just one format (member/memberOf based). While doing this I had to address some problems that come up when you want to store a group whose members have not been stored yet, and similar cases. All the while I have been testing enumerations against a server that has more than 3k users and 3k groups. This is a medium-sized database, and yet getting groups from scratch (startup after deleting the .ldb database) could take up to a minute; granted, the operation is quite a bit faster if the database just needs updating rather than creation from scratch, but I still think it's too much.
I've been thinking hard about how to address this problem and solve the few hacks we have in the code when it comes to enumeration caching and retrieval. We always said that enumerations are evil (and they are indeed) and in fact we even have options that disable enumerations by default. Yet I think this is not necessarily the right way to go.
I think we have 2 major problems in our current architecture when it comes to enumerations.
- we try to hit the wire when an enumeration request comes in from a
process and a (small) timeout for the previous enumeration has expired.
Maybe then we should, as a quick fix, have a separate timeout for the enumerations?
We already have, but a timeout is not going to make any difference to the call that comes in when it is expired.
- We run the enumeration in a single transaction (and yes I have
recently introduced this), which means any other operation is blocked until the enumeration is finished.
Can we create a special back end for enumerations and separate it from individual operations?
Not sure what you mean here.
The problem I actually see is that user space apps may have to wait just too much, and this *will* turn out to be a problem.
Agree.
Even if we give the option to turn off enumeration, I think that for apps that need it the penalty has become simply too big. Also, I think the way we have to perform updates using this model is largely inefficient, as we basically perform a full new search potentially every few minutes.
Agree, though I think that we can separate these enhancements and do them as a separate effort (may be later, after F12 if we do not have time).
It has structural implications for the driver, but I am not forcing it up the schedule, I am interested in the technical discussion at the moment.
After some hard thinking I wrote down a few points I'd like the list's opinion on. If people agree I will start acting on them.
- stop performing enumerations on demand, and perform them in background
if enumerations are activated (change the enumeration parameter from a bitfield to a true/false boolean)
Is a separate back end may be?
Don't see any help in having a separate backend, just a lot more (duplicated) code.
- perform a full user+group enumeration at startup (possibly using a
paged or vlv search)
Yes, I agree, but there is a concern (see below).
- when possible request the modifyTimestamp attribute and save the
highest modifyTimestamp into the domain entry as originalMaxTimestamp
- on a tunable interval run a task that refreshes all users and groups
in the background using a search filter that includes &(modifyTimestamp>$originalMaxTimestamp)
Okay... Stephen explained it in more detail; I was concerned about the timestamp also being updated on individual refreshes, but he said this timestamp will be touched only by enumerations.
The originalMaxTimestamp can only be updated on enumeration refreshes, not when resolving individual entries, or we could miss changes.
Hm I see how that would work.
- still do a full refresh every X minutes/hours
Configurable. Sure.
yes
- disable using a single huge transaction for enumerations (we might be
ok doing a transaction for each page search if pages are small, otherwise just revert to the previous behavior of having a transaction per stored object)
Can you do page, individual request, page, individual request?
That's my hope; the LDAP protocol allows it, although I am not sure all LDAP servers actually do. I have a backup plan where an option makes the LDAP driver use a separate connection for the enumeration refreshes if the remote server does not cope well with multiple requests being sent at the same time.
If yes, then it makes sense to have a transaction per page.
When you use pages, each page request is actually a new LDAP search with a cookie attached. I don't think there is any problem with intermixing regular operations between each search request in a paged search, but see the backup plan above if it turns out some LDAP server has problems with that.
If you have to do pages one after another and can't do other requests in the middle, I do not see how changing the transaction scope would help.
Still, using the trick above I could open 2 connections and perform enumerations and other searches in parallel, but the single transaction would block every write access to the db until the slowest request finished. That is why I think I will break the enumeration again into a transaction per page. A transaction per entry is also possible; it is more expensive (lots of fsyncs), but if enumerations are done in the background that's a bit less critical in terms of waiting time.
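A minimal sketch of the transaction-per-page idea, using sqlite3 purely as a stand-in for the LDB writes (the table layout and the page source are invented for illustration; the point is the commit scope, not the storage):

```python
import sqlite3

def store_enumeration(conn, pages):
    # One paged-search round trip per iteration; each page gets its own
    # transaction, so other readers/writers wait only for one small commit
    # instead of the whole enumeration.
    for page in pages:
        with conn:  # BEGIN ... COMMIT scoped to this page
            conn.executemany(
                "INSERT OR REPLACE INTO users (name, uid) VALUES (?, ?)",
                page,
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, uid INTEGER)")
pages = [[("alice", 1000), ("bob", 1001)], [("carol", 1002)]]
store_enumeration(conn, pages)
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 3
```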
- Every time we update an entry we store the originalModifyTimestamp on
it as a copy of the remote modifyTimestamp; this allows us to know whether we actually need to touch the cached entry at all upon refresh (e.g. when getpwnam() is called), speeding up operations for entries that need no refresh (we avoid data transformation and a write to ldb).
Is this only for enumerations, or on individual updates too?
It can be used to test whether the retrieved entry is newer or not, so that we can save on storing it to disk again (writes are expensive). But it is primarily for enumerations.
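The check Simo describes could look roughly like this. The entry shapes and the helper name are assumptions; the attribute names follow the proposal, and it relies on LDAP generalized-time strings comparing correctly as plain strings:

```python
# Hypothetical sketch of the originalModifyTimestamp comparison: skip the
# transform-and-write path when the remote entry is not newer than the
# cached copy. Attribute names follow the proposal; the rest is invented.

def needs_cache_write(cached_entry, remote_entry):
    cached_ts = cached_entry.get("originalModifyTimestamp")
    remote_ts = remote_entry.get("modifyTimestamp")
    if cached_ts is None or remote_ts is None:
        return True  # no basis for comparison: store to be safe
    # LDAP generalized time (YYYYMMDDHHMMSSZ) sorts correctly as a string
    return remote_ts > cached_ts

cached = {"originalModifyTimestamp": "20090813062300Z"}
print(needs_cache_write(cached, {"modifyTimestamp": "20090813062300Z"}))  # False
print(needs_cache_write(cached, {"modifyTimestamp": "20090814000000Z"}))  # True
```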
How it is related to the currently designed and being implemented cache logic?
This is on the backend side.
Can you please explain how this would affect the cache logic?
It depends on what cache you are referring to; if you mean the refreshes performed by the frontend, this may make them unnecessary for the LDAP driver.
- Every time we run the general refresh task or we save a changed entry
we store a LastUpdatedTimestamp
- When the refresh task is completed successfully we run another cleanup
task that searches our LDB for any entry that has a too old LastUpdatedTimestamp. If any is found, we double check against the remote server if the entry still exists (and update it if it does), and otherwise we delete it.
Makes sense. It actually makes me think that with out-of-band periodic enumerations and cleanups it becomes more and more appealing to have this in a separate back end.
Not sure what you mean by that or why it would be appealing; can you explain?
NOTE: this means that until the first background enumeration is complete, a getent passwd or a getent group call may return incomplete results. I think this is acceptable as it will really happen only at startup, when the daemon caches are empty.
Here is my concern I mentioned above: how many services and daemons rely on enumeration at startup?
I think none; only very bad apps really rely on enumeration.
Do we know? Is there any way to know? Should we ask the communities about those to make sure we meet the expectations?
In my experience, generally if LDAP or NIS are not available the machine still comes up fine. POSIX even says explicitly that an enumeration request can return nothing, IIRC. Also, experience with Samba's winbindd with enumeration turned off tells me this is not going to be a problem.
What about things like NetworkManager, HAL, the system bus, auditd, GDE and many other processes that start at boot?
They all need at most system accounts (which are local); they rightly couldn't care less about other users.
If we block them it might be a show stopper; if we provide partial data it might be OK, might be not. I guess we need more input on the matter. Do you agree?
No, I think I have enough knowledge to establish that enumerations are not critical at all (and that's why we have them disabled by default in the current released code).
NOTE2: Of course the scheduled refreshes and cleanup tasks are always rescheduled if we are offline or if a fatal error occurs during the task.
IMO this is yet another reason to have a separate back end.
And I still don't get what's the point :)
NOTE3: I am proposing to change only the way enumerations are handled, single user or group lookups will remain unchanged for now and will be dealt with later if needed.
Sure. I think the only impact is that the entry might be refreshed by the enumeration, and we need to factor a new timestamp into the cache refresh logic. Other than that I do not see any impact.
We always store the timestamp when saving entries, so that bit will change also for single user/group lookups, of course.
Simo.
- We run the enumeration in a single transaction (and yes I have
recently introduced this), which means any other operation is blocked until the enumeration is finished.
Can we create a special back end for enumerations and separate it from individual operations?
Not sure what you mean here.
I am just saying that it might make sense to have the identity back end split into two back ends: one responsible for the individual operations and another for enumerations. If enumerations are disabled, the enumeration BE is just not started. The enumeration BE would do periodic updates, probably using small pages, and would not in any way interfere with the other BE. Seems like a very logical separation of duties and transaction scoping. And you do not need to worry about the server supporting a page-request-page-request sequence. I know it can be done from one process using async processing, but two separate independent processes make more sense to me.
I do not see code duplication. Common code should be bundled in libraries and reused.
[...]
- Every time we update an entry we store the originalModifyTimestamp on
it as a copy of the remote modifyTimestamp; this allows us to know whether we actually need to touch the cached entry at all upon refresh (e.g. when getpwnam() is called), speeding up operations for entries that need no refresh (we avoid data transformation and a write to ldb).
Is this only for enumerations or on individual updates too?
It can be used to test whether the retrieved entry is newer or not, so that we can save on storing it on disk again (writes are expensive). But it is primarily for enumerations.
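A minimal sketch of that check, assuming generalized-time strings like those LDAP returns (function and parameter names are illustrative, not the actual sssd API):

```c
#include <stdbool.h>
#include <string.h>

/* Decide whether a cached entry needs to be rewritten to ldb.
 * "orig_modify_ts" is the originalModifyTimestamp stored on the
 * cached entry; "remote_ts" is the modifyTimestamp just fetched
 * from the LDAP server.  Both are generalized-time strings such
 * as "20090814123045Z". */
static bool entry_needs_update(const char *orig_modify_ts,
                               const char *remote_ts)
{
    /* No cached timestamp: the entry is new (or was saved before
     * we started recording it), so write it out. */
    if (orig_modify_ts == NULL || remote_ts == NULL) {
        return true;
    }

    /* Timestamps match: the remote entry has not been modified
     * since we cached it, so skip data transformation and the
     * expensive ldb write. */
    return strcmp(orig_modify_ts, remote_ts) != 0;
}
```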
We will go for the entry if we do not see an entry or it is expiring in the cache. I guess I might not understand the details of the cache implementation. I was under the impression that the cache expiration time stamp is part of the entry in LDB, and that each time we get a record (new or not) we at least need to update the time stamp. Am I right? If so, we might save by not updating the other attributes of the entry if it has not changed, but it would not eliminate the write operation completely. There would be a gain, but a really minor one, so I am not sure it is worth it.
How it is related to the currently designed and being implemented cache logic?
This is on the backend side.
Yes, but it (the BE) changes the time stamp that indicates when the entry was last retrieved, so that the front end can make its own decision to request a refresh.
Can you please explain how this would affect the cache logic?
It depends on what cache you are referring to, if you are referring to refreshes performed by the frontend this may make them unnecessary for the ldap driver.
Are there more than one caches? You lost me with the second part of the sentence.
[...]
NOTE: this means that until the first background enumeration is complete, a getent passwd or a getent group call may return incomplete results. I think this is acceptable as it will really happen only at startup, when the daemon caches are empty.
Here is my concern I mentioned above: How many services and daemons at startup rely on the enumeration?
I think none, only very bad apps really rely on enumeration.
Do we know? Is there any way to know? Should we ask the communities about those to make sure we meet the expectations?
In my experience, generally if ldap or nis are not available the machine still comes up fine. POSIX even says explicitly that an enumeration request can return nothing, IIRC. Also, experience with samba's winbindd with enumeration turned off tells me this is not going to be a problem.
We seem to make some assumptions based on one data point. Maybe you are right and things are not that bad for us, but such an assumption makes me uneasy. [...]
Dmitri
Simo.
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://fedorahosted.org/mailman/listinfo/sssd-devel
On Thu, 2009-08-13 at 17:45 -0400, Dmitri Pal wrote:
I am just saying that it might make sense to have the identity back end split into two back ends. One is responsible for the individual operations and another for enumerations.
No, it would just be a lot of duplication for no gain. It's simple to change code behavior with options.
I do not see code duplication. Common code should be bundled in libraries and reused.
You have to split interfaces, have new initialization routines, make it more complicated to share connections. Really not worth it.
[..]
We will go for the entry if we do not see an entry or it is expiring in the cache. I guess I might not understand the details of the cache implementation. I was under the impression that the cache expiration time stamp is part of the entry in LDB, and that each time we get a record (new or not) we at least need to update the time stamp. Am I right? If so, we might save by not updating the other attributes of the entry if it has not changed, but it would not eliminate the write operation completely. There would be a gain, but a really minor one, so I am not sure it is worth it.
Yeah, you are probably right, let's just remove this point.
Yes, but it (the BE) changes the time stamp that indicates when the entry was last retrieved, so that the front end can make its own decision to request a refresh.
Yes this is how it works now.
Can you please explain how this would affect the cache logic?
It depends on what cache you are referring to, if you are referring to refreshes performed by the frontend this may make them unnecessary for the ldap driver.
Are there more than one caches?
We have an in-memory negative cache in the nss frontend, and I am still considering adopting a shared memory approach to talk with the clients to speed up lookups like nscd does, so that would add a new cache.
In my experience, generally if ldap or nis are not available the machine still comes up fine. POSIX even says explicitly that an enumeration request can return nothing, IIRC. Also, experience with samba's winbindd with enumeration turned off tells me this is not going to be a problem.
We seem to make some assumptions based on one data point. Maybe you are right and things are not that bad for us, but such an assumption makes me uneasy.
Not assumptions; the data model is quite clear and there are multiple examples of machines coming up without remote backend data w/o any problem. In fact, there are options in nss_ldap to explicitly ignore requests for certain users, to avoid triggering timeouts at startup when the remote ldap server is still not available.
I really see no problem here, let's move on.
Simo.
I am still considering adopting a shared memory approach to talk with the clients to speed up lookups like nscd does, so that would add a new cache.
Let us talk about this one some more. All the rest I agree with. Can you explain this point in more detail?
On Fri, 2009-08-14 at 09:58 -0400, Dmitri Pal wrote:
I am still considering adopting a shared memory approach to talk with the clients to speed up lookups like nscd does, so that would add a new cache.
Let us talk about this one some more. All the rest I agree with. Can you explain this point in more detail?
Very briefly: nscd uses a shared memory technique to allow clients to read data, and uses memory barriers and garbage collection to update it. The memory is read-only for clients, but is much faster than sending a request and waiting for the kernel to schedule sssd_nss, let it process the request, do a search on ldb, and then return the results.
Of course if the data in the shared memory is old (we'd use timestamps so the client will check if the data is too old) or missing, the client will still revert to the usual communication over the pipe.
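The client-side decision described above could be sketched roughly like this: a record in the read-only shared-memory segment carries the time it was stored, and the client only trusts it while it is fresh. The struct layout, field names, and TTL value are all hypothetical, not the actual nscd or sssd format.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical record in the read-only shared-memory segment. */
struct shm_record {
    bool valid;          /* slot contains a usable entry        */
    time_t stored_at;    /* when the daemon wrote the record    */
    /* ... packed passwd/group payload would follow here ...    */
};

#define SHM_TTL 300      /* seconds a shared-memory record stays valid */

/* Returns true if the record can be served straight from shared
 * memory; false means the client must fall back to the usual
 * request over the pipe to sssd_nss. */
static bool shm_record_usable(const struct shm_record *rec, time_t now)
{
    if (rec == NULL || !rec->valid) {
        return false;    /* missing entry: ask the daemon */
    }
    if (now - rec->stored_at > SHM_TTL) {
        return false;    /* too old: ask the daemon       */
    }
    return true;         /* fresh: no round trip needed   */
}
```

The fallback path is what keeps the scheme safe: a stale or missing record never produces wrong data, it only costs the client the normal pipe round trip.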
Also, in nscd all enumerations always go through the pipe, not sure if we want to do the same or not.
Simo.