On Mon, Jun 10, 2013 at 10:03:09AM +0200, Pavel Březina wrote:
>On 06/09/2013 07:53 PM, Dmitri Pal wrote:
>On 06/09/2013 08:05 AM, Jakub Hrozek wrote:
>>On Mon, Jun 03, 2013 at 10:15:49PM +0200, Pavel Březina wrote:
>>>On 06/03/2013 12:17 AM, Jakub Hrozek wrote:
>>>>On Thu, 2013-05-23 at 11:46 +0200, Pavel Březina wrote:
>>>>>On 05/21/2013 12:06 PM, Jakub Hrozek wrote:
>>>>>>In general it's not clear to me from the code who is responsible
>>>>>>for freeing the task -- is it the caller, but only on success? I
>>>>>>think this should always behave the same.
>>>>>After discussion with Jakub we decided that the caller is
>>>>>responsible for freeing the task.
>>>>>
>>>>>When an internal error occurs (unable to create a tevent_timer),
>>>>>the task is now disabled.
>>>>>
>>>>>ERR_STOP_PERIODIC_TASK was removed. The request can disable the task
>>>>>instead.
>>>>>
>>>>>>be_ptask_schedule(): The DEBUG message uses %d but passes in a
>>>>>>string.
>>>>>Fixed.
>>>>>
>>>>>>[PATCH 2/4] back end: periodical refresh of expired records API
>>>>>>The "enum be_refresh" name is too generic. Maybe
>>>>>>"enum be_refresh_type"?
>>>>>Renamed.
>>>>>
>>>>>>In be_refresh_get_names(), can you use sysdb_attrs_to_list() to
>>>>>>gather the names?
>>>>>Done.
>>>>>
>>>>>>Is it wise to assume that all objects have names and name the
>>>>>>getter get_names()? Some objects might not have names at all, but
>>>>>>for instance UUIDs.. Also in be_refresh_step() you call get_names
>>>>>>but get back a "dn".
>>>>>Fixed.
>>>>>
>>>>>>Please file tickets to add unit tests for these two modules and
>>>>>>make the tickets block upstream #1923.
>>>>>https://fedorahosted.org/sssd/ticket/1939 be_ptask
>>>>>https://fedorahosted.org/sssd/ticket/1940 be_refresh
>>>>>
>>>>>>Nitpick:
>>>>>>>+ filter = talloc_asprintf(tmp_ctx, "(&(%s<=%lld))",
>>>>>>>+                          SYSDB_CACHE_EXPIRE, (long long)now);
>>>>>>          ^^^
>>>>>>          missing space
>>>>>Fixed.
>>>>>
>>>>>>[PATCH 3/4] back end: add refresh expired records periodic task
>>>>>>>--- a/src/config/SSSDConfig/__init__.py.in
>>>>>>>+++ b/src/config/SSSDConfig/__init__.py.in
>>>>>>>@@ -125,6 +125,7 @@ option_strings = {
>>>>>>>     'entry_cache_service_timeout' : _('Entry cache timeout length (seconds)'),
>>>>>>>     'entry_cache_autofs_timeout' : _('Entry cache timeout length (seconds)'),
>>>>>>>     'entry_cache_sudo_timeout' : _('Entry cache timeout length (seconds)'),
>>>>>>>+    'refresh_expired_interval' : _('How often should expired rules be refreshed in background'),
>>>>>>We should use "objects" or "entries" instead of rules.
>>>>>Fixed.
>>>>>
>>>>>>Can you explain the "circular dependency" here? I think you
>>>>>>already told me in person, but I forgot..
>>>>>dp_backend.h defines struct be_ctx.
>>>>>
>>>>>dp_ptask.h uses struct be_ctx.
>>>>>dp_refresh.h uses struct be_ctx.
>>>>>
>>>>>
>>>>> dp_backend.h <---- dp_ptask.h
>>>>> | ^ ^
>>>>> | | |
>>>>> v | |
>>>>> dp_refresh.h -----------
>>>>>
>>>>>data_provider_be.c includes all three of them (it needs to create a
>>>>>ptask for be_refresh and remember it in be_ctx).
>>>>>
>>>>>>>--- a/src/providers/dp_ptask.h
>>>>>>>+++ b/src/providers/dp_ptask.h
>>>>>>>@@ -27,6 +27,9 @@
>>>>>>>
>>>>>>> #include "providers/dp_backend.h"
>>>>>>>
>>>>>>>+/* solve circular dependency */
>>>>>>>+struct be_ctx;
>>>>>>>+
>>>>>>> struct be_ptask;
>>>>>>>
>>>>>>> /**
>>>>>>>diff --git a/src/providers/dp_refresh.h b/src/providers/dp_refresh.h
>>>>>>>index 54c6aac99866415d4f33f61ac05d487f71b8c49c..d93a932637804dfbf743c395af3f8e640f79c12f 100644
>>>>>>>--- a/src/providers/dp_refresh.h
>>>>>>>+++ b/src/providers/dp_refresh.h
>>>>>>>@@ -27,6 +27,9 @@
>>>>>>> #include "providers/dp_backend.h"
>>>>>>> #include "providers/dp_ptask.h"
>>>>>>>
>>>>>>>+/* solve circular dependency */
>>>>>>>+struct be_ctx;
>>>>>>>+
>>>>>>>>[PATCH 4/4] providers: refresh expired netgroups
>>>>>>>>I don't see one important part implemented -- what if all
>>>>>>>>netgroups expire at once, then they are all refreshed at once,
>>>>>>>>right? Can we add some throttling and refresh them in batches
>>>>>>>>with some delay to avoid starving the back end?
>>>>>>>I thought we agreed in person that we will first push a basic
>>>>>>>stupid solution and then fine-tune it if necessary - hopefully
>>>>>>>based on a real environment.
>>>>>>OK, I thought there was going to be some proof of concept that we
>>>>>>would just amend based on testing. Please file a ticket about
>>>>>>this, we'll get the throttling implemented post-beta.
>>>>>https://fedorahosted.org/sssd/ticket/1941 throttling
>>>>>
>>>>>https://fedorahosted.org/sssd/ticket/1942 convert enumeration
>>>>>https://fedorahosted.org/sssd/ticket/1943 convert sudo
>>>>>https://fedorahosted.org/sssd/ticket/1944 convert dyndns
>>>>>
>>>>>I also fixed the make check problems.
>>>>Thank you, I like the new code. I re-read the patches again and actually
>>>>tested them quite a lot this time.
>>>>
>>>>Here are some more comments:
>>>>
>>>>[PATCH 1/4] back end: periodic task API
>>>>Here I only have minor comments. I think it would be defensive to add a
>>>>NULL check for "task" to be_ptask_online_cb() and maybe even
>>>>be_ptask_destructor(). The reason is that the task is now (correctly)
>>>>owned by the caller, but especially the online callback might be called
>>>>based on an external event.
>>>Done.
>>>
>>>>In be_ptask_create() I think assigning to "_task" should be moved to
>>>>when "EOK" is assigned to "ret" to only modify the external variable
>>>>when the function succeeds.
>>>Of course.
>>>Done.
>>>
>>>>[PATCH 2/4] back end: periodical refresh of expired records API
>>>>In be_refresh_add_cb() I agree with checking the parameters as it's
>>>>quite a generic API, but you could also check "enum be_refresh_type
>>>>type" for being less than BE_REFRESH_TYPE_SENTINEL and return EINVAL
>>>>(or a new error code) otherwise.
>>>Done.
>>>
>>>>I wonder if it would be cleaner to move the "talloc_steal(subreq,
>>>>values);" after creating the subreq and before setting the callback
>>>>to make sure the values are always owned by subreq? It probably
>>>>doesn't matter that much from a functional point of view as values
>>>>is otherwise owned by state, but it would make the code more
>>>>consistent to me.
>>>Done.
>>>
>>>>I think that EAGAIN is mishandled in be_refresh_done(). The done
>>>>label errors out in any case, but EAGAIN would be returned in case
>>>>there were more entities to return:
>>>Thanks. Done.
>>>
>>>>[PATCH 3/4] back end: add refresh expired records periodic task
>>>>The patch looks good to me, but I'm wondering about the defaults of
>>>>be_ptask_create. I would suggest making them shorter, maybe even half of
>>>>the current values.
>>>OK. First delay is now 30, enable delay is 5.
>>>
>>>>[PATCH 4/4] providers: refresh expired netgroups
>>>>EAGAIN is mishandled in this patch, too. I did the following change to
>>>>make the refresh work correctly:
>>>Fixed.
>>>
>>>>I also need to take another look at the responder caching, because in my
>>>>tests with about 1000 netgroups, I was still getting the data provider
>>>contacted even if I set a relatively long netgroup timeout and a short
>>>>background refresh. But with the above changes, the background refresh
>>>>seems to have been working fine.
>>>Did the background refresh finish before you tried to log in (or
>>>whatever test you performed)?
>>>
>>I created a very complex netgroups structure and ran getent netgroup
>><top-netgroup>. This is exactly what the customer was doing.
>>
>>Sometimes the entries were returned from cache, sometimes I hit the
>>midpoint refresh, so after a couple of tries, the expiry timestamps were
>>different for different netgroups. Then after a couple of tries, I hit a
>>window where the entry was already expired but the task didn't pick it
>>up yet.
>>
>>So here is a proposal -- instead of refreshing expired entries of a
>>certain objectclass, what about refreshing all entries of that
>>objectclass periodically regardless of their expiration time? The
>>timeouts could be configured using the expiry timeouts anyway.
>>
>>Alternatively we could make the task configurable (maybe with another
>>option or with a sane default) so that it picks entries N percent into their
>>expiration timeout so that we can be sure no one ever hits an expired
>>entry. But this sounds like v 2.0 of this feature to me.
>>
>>What do you think?
>
>Maybe have a flag netgroups_refresh_mode that would force SSSD to always
>return whatever is available and force the refresh in the background.
>So here is how it would work:
>netgroups_refresh_mode = fetch | nowait | background
>
>fetch - if you come across an expired entry fetch an update before
>returning data (current behavior)
>nowait - return whatever is there and rely on the refresh job to refresh
>the data - this can return stale data
>background - if you come across the expired data return it immediately
>but expedite the refresh - this can return stale data too.
>Hi,
>I don't think creating more options is necessary. How about refreshing
>those records that are expired or are about to expire before the next
>refresh is triggered?
>This is only a simple filter change and it should guarantee that
>netgroups never expire (if the period is lower than the expiration
>time) while also keeping the traffic distributed over time.
I tested this locally as a modification of Pavel's patches and I confirm
this modification had the desired effect in my testing. After I switched
off midpoint refresh completely to rely on the background refresh task
only, I saw no more delay in requesting the netgroups.
Can you send the modified version, Pavel? I think it will be good to go.