Hi,
please find below the design document for the enhanced NSS API which makes e.g. the client-side timeouts, which were recently refactored, available to callers.
A more visually friendly version can be found at: https://pagure.io/fork/sbose/SSSD/docs/blob/07514ce52845d47fe6bb327f782a865b...
bye, Sumit
Enhanced NSS (Name Service Switch) API
======================================
Related ticket(s):
------------------
https://pagure.io/SSSD/sssd/issue/2478
Problem statement
-----------------
The system-wide NSS API provided by glibc with calls like ``getpwnam`` etc. might be a bottleneck for applications which are only interested in the data provided by the SSSD backends.
It would be possible to load SSSD's NSS plugin ``libnss_sss.so`` with ``dlopen`` and call the provided functions directly. But besides being a bit cumbersome, the API defined by NSS is limited and does not offer options to control the behavior of the plugin. Hence a new API provided by SSSD to look up objects managed by SSSD would be beneficial.
Use cases
---------
The first users of the new API would be the FreeIPA ``extdom`` and ``slapi-nis`` plugins. Both are used to make information about users and groups from trusted domains available and hence are only interested in results from SSSD.
Both plugins run in threads of the 389ds directory server, and to work efficiently the current mutex-based locking of the connection to SSSD should be improved.
Additionally ``slapi-nis`` caches results for performance reasons. Being a part of 389ds it will be notified if 389ds objects like e.g. id-overrides are modified and can drop the cached object. But since SSSD caches results as well, a new lookup of the modified object via ``slapi-nis`` might still return the old result, which will then stay in the cache of ``slapi-nis`` because no new modification is detected. As a result ``slapi-nis`` should be able to invalidate objects in the caches of SSSD to allow a refresh. The ``sss_cache`` utility is only of limited use here because it requires root privileges and currently invalidates the whole memory cache.
Overview of the solution
------------------------
The use cases require that the new API offers a timeout option so that an individual thread does not have to wait until the request is completely handled by SSSD but can abandon the request and maybe check later if a result is available.
Additionally an option to invalidate the cached entries of an individual object is needed. It would be possible to use a set of new calls for this, but since most of the processing of such a request is similar to the lookup requests it would be more straightforward to use a flag to indicate that the cached entry of the found object should be invalidated.
This gives us an API with two new options, a timeout and a bit-field for flags. The flags might be useful for future use cases as well. And the flag to invalidate cached entries of individual objects might be used in the future by the ``sss_cache`` utility to make the code simpler and to avoid completely removing the memory cache.
Implementation details
----------------------
Client side
~~~~~~~~~~~
Some of the flags might have influence on the client behavior. E.g. the flag to invalidate the cached entries of an object would tell the client to bypass the memory cache. This is needed because the memory cache will only contain the data of objects which were directly requested. But the SSSD on-disk cache might already contain data of objects which were looked up indirectly, e.g. groups during an initgroups request or users while looking up group members. So a missing entry in the memory cache does not indicate that the entry is missing in the on-disk cache as well. Additionally the memory cache and the on-disk cache have different timeouts, which would require the request to go directly to the SSSD responder as well.
For the timeout it has to be kept in mind that in a threaded environment there are two major steps where time might be spent. First is of course the communication with the SSSD responder. But the second is waiting on the mutex. Currently SSSD's NSS plugin ``libnss_sss.so`` uses ``pthread_mutex_lock``, which waits without limit until the mutex can be locked. The reason is that ``libnss_sss.so`` should restrict itself to only use glibc and not libpthread, and currently glibc only implements ``pthread_mutex_lock`` (see the discussion in https://bugzilla.redhat.com/show_bug.cgi?id=1369130 for details about this requirement).
To wait only a limited amount of time, libpthread offers ``pthread_mutex_timedlock``, but due to the restriction mentioned above this should be implemented independently of the NSS plugin code. ``libsss_nss_idmap`` looks like the most suitable place for this.
New calls
~~~~~~~~~
The API of the new calls will be defined in ``sss_nss_idmap.h``::
    /**
     * Flags to control the behavior and the results for sss_*_ex() calls
     */
    #define SSS_NSS_EX_FLAG_NO_FLAGS 0

    /** Always request data from the server side, client must be privileged to
     * do so, see nss_trusted_users option in man sssd.conf for details.
     * This flag cannot be used together with SSS_NSS_EX_FLAG_INVALIDATE_CACHE */
    #define SSS_NSS_EX_FLAG_NO_CACHE (1 << 0)

    /** Invalidate the data in the caches, client must be privileged to do
     * so, see nss_trusted_users option in man sssd.conf for details.
     * This flag cannot be used together with SSS_NSS_EX_FLAG_NO_CACHE */
    #define SSS_NSS_EX_FLAG_INVALIDATE_CACHE (1 << 1)

    /**
     * @brief Return user information based on the user name
     *
     * @param[in]  name    same as for getpwnam_r(3)
     * @param[in]  pwd     same as for getpwnam_r(3)
     * @param[in]  buffer  same as for getpwnam_r(3)
     * @param[in]  buflen  same as for getpwnam_r(3)
     * @param[out] result  same as for getpwnam_r(3)
     * @param[in]  flags   flags to control the behavior and the results of
     *                     the call
     * @param[in]  timeout timeout in milliseconds
     *
     * @return
     *  - 0:         success
     *  - ENOENT:    no user with the given name found
     *  - ERANGE:    insufficient buffer space supplied
     *  - ETIME:     request timed out but was sent to SSSD
     *  - ETIMEDOUT: request timed out but was not sent to SSSD
     */
    int sss_nss_getpwnam_timeout(const char *name, struct passwd *pwd,
                                 char *buffer, size_t buflen,
                                 struct passwd **result,
                                 uint32_t flags, unsigned int timeout);

    /**
     * @brief Return user information based on the user uid
     *
     * @param[in]  uid     same as for getpwuid_r(3)
     * @param[in]  pwd     same as for getpwuid_r(3)
     * @param[in]  buffer  same as for getpwuid_r(3)
     * @param[in]  buflen  same as for getpwuid_r(3)
     * @param[out] result  same as for getpwuid_r(3)
     * @param[in]  flags   flags to control the behavior and the results of
     *                     the call
     * @param[in]  timeout timeout in milliseconds
     *
     * @return
     *  - 0:         success
     *  - ENOENT:    no user with the given uid found
     *  - ERANGE:    insufficient buffer space supplied
     *  - ETIME:     request timed out but was sent to SSSD
     *  - ETIMEDOUT: request timed out but was not sent to SSSD
     */
    int sss_nss_getpwuid_timeout(uid_t uid, struct passwd *pwd,
                                 char *buffer, size_t buflen,
                                 struct passwd **result,
                                 uint32_t flags, unsigned int timeout);

    /**
     * @brief Return group information based on the group name
     *
     * @param[in]  name    same as for getgrnam_r(3)
     * @param[in]  grp     same as for getgrnam_r(3)
     * @param[in]  buffer  same as for getgrnam_r(3)
     * @param[in]  buflen  same as for getgrnam_r(3)
     * @param[out] result  same as for getgrnam_r(3)
     * @param[in]  flags   flags to control the behavior and the results of
     *                     the call
     * @param[in]  timeout timeout in milliseconds
     *
     * @return
     *  - 0:         success
     *  - ENOENT:    no group with the given name found
     *  - ERANGE:    insufficient buffer space supplied
     *  - ETIME:     request timed out but was sent to SSSD
     *  - ETIMEDOUT: request timed out but was not sent to SSSD
     */
    int sss_nss_getgrnam_timeout(const char *name, struct group *grp,
                                 char *buffer, size_t buflen,
                                 struct group **result,
                                 uint32_t flags, unsigned int timeout);

    /**
     * @brief Return group information based on the group gid
     *
     * @param[in]  gid     same as for getgrgid_r(3)
     * @param[in]  grp     same as for getgrgid_r(3)
     * @param[in]  buffer  same as for getgrgid_r(3)
     * @param[in]  buflen  same as for getgrgid_r(3)
     * @param[out] result  same as for getgrgid_r(3)
     * @param[in]  flags   flags to control the behavior and the results of
     *                     the call
     * @param[in]  timeout timeout in milliseconds
     *
     * @return
     *  - 0:         success
     *  - ENOENT:    no group with the given gid found
     *  - ERANGE:    insufficient buffer space supplied
     *  - ETIME:     request timed out but was sent to SSSD
     *  - ETIMEDOUT: request timed out but was not sent to SSSD
     */
    int sss_nss_getgrgid_timeout(gid_t gid, struct group *grp,
                                 char *buffer, size_t buflen,
                                 struct group **result,
                                 uint32_t flags, unsigned int timeout);

    /**
     * @brief Return a list of groups to which a user belongs
     *
     * @param[in]     name    name of the user
     * @param[in]     group   same as second argument of getgrouplist(3)
     * @param[in]     groups  array of gid_t of size ngroups, will be filled
     *                        with GIDs of groups the user belongs to
     * @param[in,out] ngroups size of the groups array on input. On output it
     *                        will contain the actual number of groups the
     *                        user belongs to. With a return value of 0 the
     *                        groups array was large enough to hold all
     *                        groups. With a return value of ERANGE the array
     *                        was not large enough and ngroups will have the
     *                        needed size.
     * @param[in]     flags   flags to control the behavior and the results
     *                        of the call
     * @param[in]     timeout timeout in milliseconds
     *
     * @return
     *  - 0:         success
     *  - ENOENT:    no user with the given name found
     *  - ERANGE:    insufficient buffer space supplied
     *  - ETIME:     request timed out but was sent to SSSD
     *  - ETIMEDOUT: request timed out but was not sent to SSSD
     */
    int sss_nss_getgrouplist_timeout(const char *name, gid_t group,
                                     gid_t *groups, int *ngroups,
                                     uint32_t flags, unsigned int timeout);
SSSD responder side
~~~~~~~~~~~~~~~~~~~
The SSSD NSS responder has to be prepared to accept the flags in the request. Currently only a name or an ID is expected. To handle this, new request types:
* SSS_NSS_GETPWNAM_EX
* SSS_NSS_GETPWUID_EX
* SSS_NSS_GETGRNAM_EX
* SSS_NSS_GETGRGID_EX
* SSS_NSS_INITGR_EX
will be added. If no flags are set they will behave like the non-EX calls.
If ``SSS_NSS_EX_FLAG_NO_CACHE`` is set, ``cache_req_data_set_bypass_cache()`` will be called so that the cache-request framework will directly request new data from the backend.
If ``SSS_NSS_EX_FLAG_INVALIDATE_CACHE`` is set, ``cache_req_data_set_bypass_dp()`` (which will be implemented with this feature) will be called to only search the requested object in the cache. If it was found, the cached entry will be invalidated in both the on-disk and the memory cache. If ``SSS_NSS_EX_FLAG_INVALIDATE_CACHE`` was sent with ``SSS_NSS_INITGR_EX``, both the group-membership data and the plain user data will be invalidated.
The flags ``SSS_NSS_EX_FLAG_NO_CACHE`` and ``SSS_NSS_EX_FLAG_INVALIDATE_CACHE`` cannot be used at the same time.
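A minimal sanity check for this constraint might look like the following sketch; the ``check_flags`` helper and the use of ``EINVAL`` are assumptions for illustration, not part of the proposal:

```c
#include <errno.h>
#include <stdint.h>

#define SSS_NSS_EX_FLAG_NO_CACHE         (1 << 0)
#define SSS_NSS_EX_FLAG_INVALIDATE_CACHE (1 << 1)

/* Reject requests that set both mutually exclusive flags.
 * EINVAL is an assumed error code for this sketch. */
static int check_flags(uint32_t flags)
{
    if ((flags & SSS_NSS_EX_FLAG_NO_CACHE)
            && (flags & SSS_NSS_EX_FLAG_INVALIDATE_CACHE)) {
        return EINVAL;
    }
    return 0;
}
```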
Since both flags might have an impact on performance, not all callers should be allowed to use them. To configure which users are allowed, a new option ``nss_trusted_users`` will be used which will follow the same rules as the already existing option ``pam_trusted_users``.
Configuration changes
---------------------
A new option ``nss_trusted_users`` will be introduced. A man page entry might look like::
    nss_trusted_users (string)
        Specifies the comma-separated list of UID values or user names that
        are allowed to run privileged operations in the NSS responder, e.g.
        bypass the cache and refresh an entry directly. Users not included
        in this list will get a permission error. User names are resolved
        to UIDs at startup.

        Default: Only UID 0

        Please note that UID 0 is always allowed to run privileged
        operations in the NSS responder even in case it is not in the
        nss_trusted_users list.
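As an illustration, an ``[nss]`` section granting the flags to a hypothetical service user (the user name ``dirsrv`` is an example, not a default) might look like:

```ini
[nss]
# Allow the directory-server service user to use SSS_NSS_EX_FLAG_NO_CACHE
# and SSS_NSS_EX_FLAG_INVALIDATE_CACHE; UID 0 is always allowed implicitly.
nss_trusted_users = dirsrv
```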
How To Test
-----------
To test, the new calls should be used in C programs. If flags are used and the calling user is not root, ``nss_trusted_users`` must be set accordingly.
How To Debug
------------
``libsss_nss_idmap`` currently does not have any logging infrastructure, so only the debug logs of the SSSD responder are available.
If ``SSS_NSS_EX_FLAG_NO_CACHE`` or ``SSS_NSS_EX_FLAG_INVALIDATE_CACHE`` is used on the client side, ``strace`` can be used to see if the client skips the memory cache as expected.
Authors
-------
* Sumit Bose <sbose@redhat.com>
On Wed, Oct 25, 2017 at 05:39:21PM +0200, Sumit Bose wrote:
LGTM!
On Thu, 2017-10-26 at 12:16 +0200, Jakub Hrozek wrote:
LGTM!
Looking at this I have some questions: if you are going to create a new library and just need to set a timeout, it seems it would be a much better interface to use a context handle you allocate and pass into each call, and then have getters/setters to set timeouts or any other flags that should influence the whole behavior. This would also allow you to control how many concurrent connections you want to have against sssd, as each new context will create a new socket connection and all.
In the original libnss_sss.so we could not do that because the glibc interface does not offer any way to hold a context, but there is no reason to continue along that line in a *new* API. And not using contexts in threaded applications is generally a bad idea, as you end up *requiring* the use of mutexes when that is really not always needed (separate threads can open separate connections and not share any data that requires mutexes).
On the responder side I also do not see why new calls are being created. You clearly want a client-wide behavior, introduce ONE new call that sets flags for the rest of the connection, and just reuse the usual commands otherwise.
I do not understand what the point of nss_trusted_users is; why is a force reload a privileged operation?
I guess DNLSGTM ?
Simo.
On Thu, Oct 26, 2017 at 02:43:29PM -0400, Simo Sorce wrote:
Looking at this I have some questions, if you are going to create a new library and just need to set a timeout it seem it would be a much better interface to use a context handle you allocate and pass into each call, and then have getters setters to set timeouts or any other flags that should influence the whole behavior. This will allow you also to control how many concurrent connections you want to have against sssd, as each new context will create a new socket connection and all.
In the original libnss_sss.so we could not do that because the glibc interface does not offer any way to hold a context, but there is no reason to continue along that line in a *new* API. And not using contexts in threaded applications is generally a bad idea, as you end up *requiring* te use of mutexes when that is really not always a need (separated threads can open separate connections and not share any data that require mutexes).
This sounds like a good 2.0 feature; are you interested in creating a more detailed design page for this? Currently my goal was to reuse as much of the current trusted code we already have.
On the responder side I also do not see why new calls are being created. You clearly want a client-wide behavior, introduce ONE new call that sets flags for the rest of the connection, and just reuse the usual commands otherwise.
The current flags, like invalidating a cached entry, are per-request; only the single object addressed by the current request should be invalidated, not all objects which are requested on the same connection.
I do not understand what is the point of nss_truste_users why a force reload is a privileged operation ?
Since it can force expensive operations on the backends which will hit servers I think not everybody should be allowed to do this.
I guess DNLSGTM ?
A test build with a simplified version of the related patches solved bottleneck issues with the 389ds IPA modules. So at least it would help without introducing other issues.
bye, Sumit
Simo.
-- Simo Sorce Sr. Principal Software Engineer Red Hat, Inc _______________________________________________ sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org To unsubscribe send an email to sssd-devel-leave@lists.fedorahosted.org
On Thu, 2017-10-26 at 22:14 +0200, Sumit Bose wrote:
On Thu, Oct 26, 2017 at 02:43:29PM -0400, Simo Sorce wrote:
Looking at this I have some questions, if you are going to create a new library and just need to set a timeout it seem it would be a much better interface to use a context handle you allocate and pass into each call, and then have getters setters to set timeouts or any other flags that should influence the whole behavior. This will allow you also to control how many concurrent connections you want to have against sssd, as each new context will create a new socket connection and all.
In the original libnss_sss.so we could not do that because the glibc interface does not offer any way to hold a context, but there is no reason to continue along that line in a *new* API. And not using contexts in threaded applications is generally a bad idea, as you end up *requiring* te use of mutexes when that is really not always a need (separated threads can open separate connections and not share any data that require mutexes).
This sounds like a good 2.0 feature, are you interested in creating a more detailed design page for this?
Sure.
Currently my goal was to reuse as much or the current trusted code we already have.
On the responder side I also do not see why new calls are being created. You clearly want a client-wide behavior, introduce ONE new call that sets flags for the rest of the connection, and just reuse the usual commands otherwise.
The current flags, like invalidating a cached entry are per-request, only the single object address by the current request should be invalidate not all object which are requested on the same connection.
I would probably add a command to explicitly invalidate individual caches, this would avoid having special paths on every other call, resulting in cleaner code, at the cost of one more roundtrip though, so I guess it is a matter of figuring out what is the right balance here.
I do not understand what is the point of nss_truste_users why a force reload is a privileged operation ?
Since it can force expensive operations on the backends which will hit servers I think not everybody should be allowed to do this.
You can already force this by requesting nonexistent users/groups; I am not convinced this necessarily needs to be a privileged operation, as there are already ways to cause work for SSSD. I would rather drop it. If we really want to deal with potential abuse we should introduce rate-limiting per uid, and basically slow down to a halt abusing clients by giving weights and possibly quotas; doling out privileges is cumbersome anyway and does not really prevent a malicious client from causing harm ATM.
IMHO nss_trusted_users gets a NACK as a concept and should be dropped.
I guess DNLSGTM ?
A test-builds with a simplified version of the related patches solved bottleneck issues with 389ds IPA modules. So at least it would help without introducing other issues.
I did not know we already had code, and I am not against the goal of improving the bottlenecks at all, but we should definitely improve the design here as this design is way too conservative and introduces APIs that we probably want to toss with a better design. So please mark those functions "private/tech preview/whatever" by putting them behind a guarding #ifdef in the header so that people really need to know what they are doing to use them and we can more easily deprecate and remove them quickly.
Simo.
bye, Sumit
Simo.
On Fri, Oct 27, 2017 at 08:43:28AM -0400, Simo Sorce wrote:
On Thu, 2017-10-26 at 22:14 +0200, Sumit Bose wrote:
Looking at this I have some questions, if you are going to create a new library and just need to set a timeout it seem it would be a much better interface to use a context handle you allocate and pass into each call, and then have getters setters to set timeouts or any other flags that should influence the whole behavior. This will allow you also to control how many concurrent connections you want to have against sssd, as each new context will create a new socket connection and all.
In the original libnss_sss.so we could not do that because the glibc interface does not offer any way to hold a context, but there is no reason to continue along that line in a *new* API. And not using contexts in threaded applications is generally a bad idea, as you end up *requiring* te use of mutexes when that is really not always a need (separated threads can open separate connections and not share any data that require mutexes).
This sounds like a good 2.0 feature, are you interested in creating a more detailed design page for this?
Sure.
Currently my goal was to reuse as much or the current trusted code we already have.
On the responder side I also do not see why new calls are being created. You clearly want a client-wide behavior, introduce ONE new call that sets flags for the rest of the connection, and just reuse the usual commands otherwise.
The current flags, like invalidating a cached entry are per-request, only the single object address by the current request should be invalidate not all object which are requested on the same connection.
I would probably add a command to explicitly invalidate individual caches, this would avoid having special paths on every other call, resulting in cleaner code, at the cost of one more roundtrip though, so I guess it is a matter of figuring out what is the right balance here.
I do not understand what is the point of nss_truste_users why a force reload is a privileged operation ?
Since it can force expensive operations on the backends which will hit servers I think not everybody should be allowed to do this.
You can already force this by requesting unexisting users/groups, I am not convinced this necessarily needs to be a privileged operation as there are already ways to cause work for SSSD. I would rather drop it. If we really want to deal with potential abuse we should introduce rate-limiting per uid, and basically slow down to a halt abusing clients by giving weights and possibly quotas, doling out privileges is cumbersome anyway and does not really prevent a malicious client to cause hard ATM.
IMHO nss_trusted_users gets a NACK as a concept and should be dropped.
Of course I can remove it, but since removing it later is easier than adding it later, I'd like to try to explain again why I think it would be useful.
You are right that it is already possible to send requests to the servers via SSSD. But as you said these are "only" searches for nonexistent users and groups, which should be handled by the indexes on the server quite efficiently and cause no disk I/O on the client. Additionally we try to avoid accidental misuse of this with the negative cache.
The flags might trigger operations like looking up all groups a user is a member of, or looking up a group with all members. While only the first might cause a more expensive operation on the server, both might cause a lot of disk I/O on the client.
Rate limiting would help to mitigate misuse as well. But a typical SSSD client would have no use for the flags, so why allow it to use them?
That's why I think nss_trusted_users is a good way to avoid accidental misuse in a similar way as the negative cache. But if you prefer I'll drop it.
I guess DNLSGTM ?
A test-builds with a simplified version of the related patches solved bottleneck issues with 389ds IPA modules. So at least it would help without introducing other issues.
I did not know we already had code, and I am not against the goal of improving the bottlenecks at all, but we should definitely improve the design here as this design is way too conservative and introduces APIs that we probably want to toss with a better design. So please mark those functions "private/tech preview/whatever" by putting them behind a guarding #ifdef in the header so that people really need to know what they are doing to use them and we can more easily deprecate and remove them quickly.
ok, done.
bye, Sumit
Simo.
bye, Sumit
Simo.
On Thu, 2017-11-02 at 13:14 +0100, Sumit Bose wrote:
On Fri, Oct 27, 2017 at 08:43:28AM -0400, Simo Sorce wrote:
On Thu, 2017-10-26 at 22:14 +0200, Sumit Bose wrote:
On Thu, Oct 26, 2017 at 02:43:29PM -0400, Simo Sorce wrote:
Looking at this I have some questions, if you are going to create a new library and just need to set a timeout it seem it would be a much better interface to use a context handle you allocate and pass into each call, and then have getters setters to set timeouts or any other flags that should influence the whole behavior. This will allow you also to control how many concurrent connections you want to have against sssd, as each new context will create a new socket connection and all.
In the original libnss_sss.so we could not do that because the glibc interface does not offer any way to hold a context, but there is no reason to continue along that line in a *new* API. And not using contexts in threaded applications is generally a bad idea, as you end up *requiring* te use of mutexes when that is really not always a need (separated threads can open separate connections and not share any data that require mutexes).
This sounds like a good 2.0 feature, are you interested in creating a more detailed design page for this?
Sure.
Currently my goal was to reuse as much or the current trusted code we already have.
On the responder side I also do not see why new calls are being created. You clearly want a client-wide behavior, introduce ONE new call that sets flags for the rest of the connection, and just reuse the usual commands otherwise.
The current flags, like invalidating a cached entry are per- request, only the single object address by the current request should be invalidate not all object which are requested on the same connection.
I would probably add a command to explicitly invalidate individual caches, this would avoid having special paths on every other call, resulting in cleaner code, at the cost of one more roundtrip though, so I guess it is a matter of figuring out what is the right balance here.
I do not understand what is the point of nss_truste_users why a force reload is a privileged operation ?
Since it can force expensive operations on the backends which will hit servers I think not everybody should be allowed to do this.
You can already force this by requesting unexisting users/groups, I am not convinced this necessarily needs to be a privileged operation as there are already ways to cause work for SSSD. I would rather drop it. If we really want to deal with potential abuse we should introduce rate-limiting per uid, and basically slow down to a halt abusing clients by giving weights and possibly quotas, doling out privileges is cumbersome anyway and does not really prevent a malicious client to cause hard ATM.
IMHO nss_trusted_users gets a NACK as a concept and should be dropped.
Of course I can remove it, but since removing it later is easier than adding it later I'd like to try to explain again why I think it would be useful.
You are right that it is already possible to send requests to the servers via SSSD. But as you said this are "only" searches for unexisting user and groups which should be handled by the indexes on the server quite efficiently and causes no disk-I/O on the client. Additionally we try to avoid accidental misuse of this with the negative cache.
You can also call up existing users; especially in large domains, if you have groups big enough you can cause fast cache thrashing very easily, and this is already a big deal for the client. Client disk load is uninteresting because a process can simply write/fsync locally to cause disk I/O issues, and traffic towards the server is also uninteresting because a client can simply contact the server directly to cause load.
So what we are left with is uniquely the fact a process might keep the sssd_be process busier and cause other processes on the system to get slower responses. But this is easy to deal with by simply throttling bad behaving users (and the admin kicking them out).
The flags might trigger operations like looking up all groups a user is a member of or looking up a group with all members. While only the first might cause a more expensive operation on the server both might cause a lot of disk-I/O on the client.
Rate limitations would help to mitigate misuse as well. But a typical SSSD client would have no use for the flags so why allow it to use them?
They may have a need to ensure a specific user/group is refreshed; tools would be able to use this instead of having to be root and deleting the whole sssd cache just to refresh a user you know has been changed on the server. We wouldn't want people to abuse this of course, but it's not a bad idea.
That's why I think nss_trusted_users is a good way to avoid accidental misuse, in a similar way as the negative cache. But if you prefer, I'll drop it.
I do not know that you can easily change it later, and I would rather use file permissions than explicit checks. That would mean exposing a "privileged" socket that only users who are part of a group of "trusted" services can access, and this is clearly a lot more work, which I am not advocating.
I am really torn on the need for this. It will make using this feature really cumbersome, as you have to explicitly modify a configuration file and list specific users (groups?), and this is to be balanced against a vague chance of a user causing a local DoS/slowdown but not a lot more.
To me it sounds like a very big hammer for a very small fly.
Simo.
I guess DNLSGTM ?
A test build with a simplified version of the related patches solved bottleneck issues with the 389ds IPA modules. So at least it would help without introducing other issues.
I did not know we already had code, and I am not against the goal of improving the bottlenecks at all, but we should definitely improve the design here, as this design is way too conservative and introduces APIs that we will probably want to toss with a better design. So please mark those functions "private/tech preview/whatever" by putting them behind a guarding #ifdef in the header, so that people really need to know what they are doing to use them, and we can more easily deprecate and remove them quickly.
ok, done.
bye, Sumit
Simo.
bye, Sumit
Simo.
-- Simo Sorce, Sr. Principal Software Engineer, Red Hat, Inc
_______________________________________________
sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
To unsubscribe send an email to sssd-devel-leave@lists.fedorahosted.org
On Thu, 02 Nov 2017, Simo Sorce wrote:
On Thu, 2017-11-02 at 13:14 +0100, Sumit Bose wrote:
On Fri, Oct 27, 2017 at 08:43:28AM -0400, Simo Sorce wrote:
On Thu, 2017-10-26 at 22:14 +0200, Sumit Bose wrote:
On Thu, Oct 26, 2017 at 02:43:29PM -0400, Simo Sorce wrote:
On Thu, 2017-10-26 at 12:16 +0200, Jakub Hrozek wrote:
On Wed, Oct 25, 2017 at 05:39:21PM +0200, Sumit Bose wrote:
> Hi,
> please find below the design document for the enhanced NSS API which makes e.g. the client side timeouts which were recently refactored available to callers.
> A more visually friendly version can be found at:
> https://pagure.io/fork/sbose/SSSD/docs/blob/07514ce52845d47fe6bb327f782a865bfa75628a/f/design_pages/enhanced_nss_api.rst
> bye, Sumit
LGTM!
Looking at this I have some questions. If you are going to create a new library and just need to set a timeout, it seems it would be a much better interface to use a context handle that you allocate and pass into each call, and then have getters/setters to set timeouts or any other flags that should influence the whole behavior. This would also allow you to control how many concurrent connections you want to have against SSSD, as each new context would create a new socket connection and all.
In the original libnss_sss.so we could not do that because the glibc interface does not offer any way to hold a context, but there is no reason to continue along that line in a *new* API. And not using contexts in threaded applications is generally a bad idea, as you end up *requiring* the use of mutexes when that is really not always needed (separate threads can open separate connections and not share any data that requires mutexes).
This sounds like a good 2.0 feature, are you interested in creating a more detailed design page for this?
Sure.
Currently my goal was to reuse as much of the current trusted code we already have.
On the responder side I also do not see why new calls are being created. You clearly want client-wide behavior: introduce ONE new call that sets flags for the rest of the connection, and just reuse the usual commands otherwise.
The current flags, like invalidating a cached entry, are per-request: only the single object addressed by the current request should be invalidated, not all objects requested on the same connection.
I would probably add a command to explicitly invalidate individual caches. This would avoid having special paths on every other call, resulting in cleaner code, at the cost of one more roundtrip though, so I guess it is a matter of figuring out what the right balance is here.
I do not understand what the point of nss_trusted_users is. Why is a force reload a privileged operation?
Since it can force expensive operations on the backends which will hit the servers, I think not everybody should be allowed to do this.
Right now the only user for this is dirsrv on IPA masters. If you throttle the dirsrv plugins, you are preventing SSSD clients on IPA client machines from getting actual results. Throttling, thus, would be the wrong measure here.
On Thu, 2017-11-02 at 14:53 +0200, Alexander Bokovoy wrote:
On to, 02 marras 2017, Simo Sorce wrote:
On Thu, 2017-11-02 at 13:14 +0100, Sumit Bose wrote:
On Fri, Oct 27, 2017 at 08:43:28AM -0400, Simo Sorce wrote:
On Thu, 2017-10-26 at 22:14 +0200, Sumit Bose wrote:
On Thu, Oct 26, 2017 at 02:43:29PM -0400, Simo Sorce wrote:
On Thu, 2017-10-26 at 12:16 +0200, Jakub Hrozek wrote: > On Wed, Oct 25, 2017 at 05:39:21PM +0200, Sumit Bose > wrote: > > Hi, > > > > please find below the design document for the enhanced > > NSS > > API > > which > > makes e.g. the client side timeouts which where > > recently > > refactored > > available to callers. > > > > A more visual friendly version can be found at: > > https://pagure.io/fork/sbose/SSSD/docs/blob/07514ce5284 > > 5d47 > > fe6b > > b327 > > f782a865bfa75628a/f/design_pages/enhanced_nss_api.rst > > > > bye, > > Sumit > > LGTM! >
Looking at this I have some questions, if you are going to create a new library and just need to set a timeout it seem it would be a much better interface to use a context handle you allocate and pass into each call, and then have getters setters to set timeouts or any other flags that should influence the whole behavior. This will allow you also to control how many concurrent connections you want to have against sssd, as each new context will create a new socket connection and all.
In the original libnss_sss.so we could not do that because the glibc interface does not offer any way to hold a context, but there is no reason to continue along that line in a *new* API. And not using contexts in threaded applications is generally a bad idea, as you end up *requiring* te use of mutexes when that is really not always a need (separated threads can open separate connections and not share any data that require mutexes).
This sounds like a good 2.0 feature, are you interested in creating a more detailed design page for this?
Sure.
Currently my goal was to reuse as much or the current trusted code we already have.
On the responder side I also do not see why new calls are being created. You clearly want a client-wide behavior, introduce ONE new call that sets flags for the rest of the connection, and just reuse the usual commands otherwise.
The current flags, like invalidating a cached entry are per- request, only the single object address by the current request should be invalidate not all object which are requested on the same connection.
I would probably add a command to explicitly invalidate individual caches, this would avoid having special paths on every other call, resulting in cleaner code, at the cost of one more roundtrip though, so I guess it is a matter of figuring out what is the right balance here.
I do not understand what is the point of nss_truste_users why a force reload is a privileged operation ?
Since it can force expensive operations on the backends which will hit servers I think not everybody should be allowed to do this.
You can already force this by requesting unexisting users/groups, I am not convinced this necessarily needs to be a privileged operation as there are already ways to cause work for SSSD. I would rather drop it. If we really want to deal with potential abuse we should introduce rate-limiting per uid, and basically slow down to a halt abusing clients by giving weights and possibly quotas, doling out privileges is cumbersome anyway and does not really prevent a malicious client to cause hard ATM.
IMHO nss_trusted_users gets a NACK as a concept and should be dropped.
Of course I can remove it, but since removing it later is easier than adding it later I'd like to try to explain again why I think it would be useful.
You are right that it is already possible to send requests to the servers via SSSD. But as you said this are "only" searches for unexisting user and groups which should be handled by the indexes on the server quite efficiently and causes no disk-I/O on the client. Additionally we try to avoid accidental misuse of this with the negative cache.
You can also call out existing users, especially in large domains if you have groups big enough you can cause fast cache thrashing very easily, this is already a big deal for the client. Client disk load is uninteresting because a process can simply write/fsync locally to cause disk I/O issues, and traffic towards the server is also uninteresting because a client can simply directly contact the server directly to cause load.
So what we are left with is uniquely the fact a process might keep the sssd_be process busier and cause other processes on the system to get slower responses. But this is easy to deal with by simply throttling bad behaving users (and the admin kicking them out).
The flags might trigger operations like looking up all groups a user is a member of or looking up a group with all members. While only the first might cause a more expensive operation on the server both might cause a lot of disk-I/O on the client.
Rate limitations would help to mitigate misuse as well. But a typical SSSD client would have no use for the flags so why allow it to use them?
They may have a need to insure a specific user/group is refreshed, tools would be able to use this instead of having to be be root and deleting the whole sssd cache just to refresh a user you know has been changed on the server. We wouldn't want people to abuse this of course, but it's not a bad idea.
That's why I think nss_trusted_users is a good way to avoid accidental misuse in a similar way as the negative cache. But if you prefer I'll drop it.
I do not know that you can easily change it later, and I would rather use file permissions than explicit checks, that would mean exposing a "privileged" socket that only users that are part of a group of "trusted" service can access, and this is clearly a lot more work, which I am not advocating.
I am really torn on the need for this, it will make using this feature really cumbersome as you have to explicitly modify a configuration file and list specific users (groups ?), and this is to balance against a vague chance of a user causing a local DoS/slowdown but not a lot more.
To me it sounds like a very big hammer for a very small fly.
Right now the only user for this is dirsrv on IPA masters. If you throttle dirsrv plugins, you are denying SSSD clients from IPA clients from getting actual results. Throttling, thus, would be a wrong measure here.
Note that throttling is a possible future measure if it turns out to be needed; I am not advocating implementing throttling now.
Simo.
On 10/25/2017 05:39 PM, Sumit Bose wrote:
Have you considered a D-Bus API instead of extending NSS? Is it too slow for this use case?
On Mon, 30 Oct 2017, Pavel Březina wrote:
We considered using a D-Bus API originally, when we worked on integrating SSSD and the plugins in 389-ds for the trust-to-AD feature in FreeIPA a few years ago. We decided against using a D-Bus API at that time; it was too fragile to implement.
The current API proposal is a direct evolution of existing code, to avoid breaking things.