On Mon, Jun 10, 2013 at 07:04:51PM +0200, Jakub Hrozek wrote:
On Mon, Jun 10, 2013 at 12:02:10PM +0200, Pavel Březina wrote:
> On 06/10/2013 11:25 AM, Jakub Hrozek wrote:
> >On Mon, Jun 10, 2013 at 10:03:09AM +0200, Pavel Březina wrote:
> >>On 06/09/2013 07:53 PM, Dmitri Pal wrote:
> >>>On 06/09/2013 08:05 AM, Jakub Hrozek wrote:
> >>>>On Mon, Jun 03, 2013 at 10:15:49PM +0200, Pavel Březina wrote:
> >>>>>On 06/03/2013 12:17 AM, Jakub Hrozek wrote:
> >>>>>>On Thu, 2013-05-23 at 11:46 +0200, Pavel Březina wrote:
> >>>>>>>On 05/21/2013 12:06 PM, Jakub Hrozek wrote:
> >>>>>>>>In general it's not clear to me from the code who is responsible
> >>>>>>>>for freeing the task -- is it the caller, but only on success?
> >>>>>>>>I think this should always behave the same.
> >>>>>>>After discussion with Jakub we decided that the caller is
> >>>>>>>responsible for freeing the task.
> >>>>>>>
> >>>>>>>When an internal error occurs (unable to create tevent_timer),
> >>>>>>>the task is now disabled.
> >>>>>>>
> >>>>>>>ERR_STOP_PERIODIC_TASK was removed. The request can disable the
> >>>>>>>task instead.
> >>>>>>>
> >>>>>>>>be_ptask_schedule(): The DEBUG message uses %d but passes in a
> >>>>>>>>string.
> >>>>>>>Fixed.
> >>>>>>>
> >>>>>>>>[PATCH 2/4] back end: periodical refresh of expired records API
> >>>>>>>>The "enum be_refresh" name is too generic. Maybe "enum
> >>>>>>>>be_refresh_type"?
> >>>>>>>Renamed.
> >>>>>>>
> >>>>>>>>In be_refresh_get_names(), can you use sysdb_attrs_to_list() to
> >>>>>>>>gather the names?
> >>>>>>>Done.
> >>>>>>>
> >>>>>>>>Is it wise to assume that all objects have names and name the
> >>>>>>>>getter get_names()? Some objects might not have names at all,
> >>>>>>>>but for instance UUIDs. Also in be_refresh_step() you call
> >>>>>>>>get_names() but get back a "dn".
> >>>>>>>Fixed.
> >>>>>>>
> >>>>>>>>Please file tickets to add unit tests for these two modules and
> >>>>>>>>make the tickets block upstream #1923.
> >>>>>>>https://fedorahosted.org/sssd/ticket/1939 be_ptask
> >>>>>>>https://fedorahosted.org/sssd/ticket/1940 be_refresh
> >>>>>>>
> >>>>>>>>Nitpick:
> >>>>>>>>>+        filter = talloc_asprintf(tmp_ctx, "(&(%s<=%lld))",
> >>>>>>>>>+                SYSDB_CACHE_EXPIRE, (long long)now);
> >>>>>>>>            ^^^
> >>>>>>>>missing space
> >>>>>>>Fixed.
> >>>>>>>
> >>>>>>>>[PATCH 3/4] back end: add refresh expired records periodic task
> >>>>>>>>>--- a/src/config/SSSDConfig/__init__.py.in
> >>>>>>>>>+++ b/src/config/SSSDConfig/__init__.py.in
> >>>>>>>>>@@ -125,6 +125,7 @@ option_strings = {
> >>>>>>>>>     'entry_cache_service_timeout' : _('Entry cache timeout length (seconds)'),
> >>>>>>>>>     'entry_cache_autofs_timeout' : _('Entry cache timeout length (seconds)'),
> >>>>>>>>>     'entry_cache_sudo_timeout' : _('Entry cache timeout length (seconds)'),
> >>>>>>>>>+    'refresh_expired_interval' : _('How often should expired rules be refreshed in background'),
> >>>>>>>>We should use "objects" or "entries" instead of rules.
> >>>>>>>Fixed.
> >>>>>>>
> >>>>>>>>Can you explain the "circular dependency" here? I think you
> >>>>>>>>already told me in person, but I forgot..
> >>>>>>>dp_backend.h defines struct be_ctx.
> >>>>>>>
> >>>>>>>dp_ptask.h uses struct be_ctx.
> >>>>>>>dp_refresh.h uses struct be_ctx.
> >>>>>>>
> >>>>>>>
> >>>>>>> dp_backend.h <---- dp_ptask.h
> >>>>>>> | ^ ^
> >>>>>>> | | |
> >>>>>>> v | |
> >>>>>>> dp_refresh.h -----------
> >>>>>>>
> >>>>>>>data_provider_be.c includes all three of them (it needs to create
> >>>>>>>a ptask for be_refresh and remember it in be_ctx).
> >>>>>>>
> >>>>>>>>>--- a/src/providers/dp_ptask.h
> >>>>>>>>>+++ b/src/providers/dp_ptask.h
> >>>>>>>>>@@ -27,6 +27,9 @@
> >>>>>>>>>
> >>>>>>>>> #include "providers/dp_backend.h"
> >>>>>>>>>
> >>>>>>>>>+/* solve circular dependency */
> >>>>>>>>>+struct be_ctx;
> >>>>>>>>>+
> >>>>>>>>> struct be_ptask;
> >>>>>>>>>
> >>>>>>>>> /**
> >>>>>>>>>diff --git a/src/providers/dp_refresh.h b/src/providers/dp_refresh.h
> >>>>>>>>>index 54c6aac99866415d4f33f61ac05d487f71b8c49c..d93a932637804dfbf743c395af3f8e640f79c12f 100644
> >>>>>>>>>--- a/src/providers/dp_refresh.h
> >>>>>>>>>+++ b/src/providers/dp_refresh.h
> >>>>>>>>>@@ -27,6 +27,9 @@
> >>>>>>>>> #include "providers/dp_backend.h"
> >>>>>>>>> #include "providers/dp_ptask.h"
> >>>>>>>>>
> >>>>>>>>>+/* solve circular dependency */
> >>>>>>>>>+struct be_ctx;
> >>>>>>>>>+
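[Editor's note: the forward-declaration pattern in the diff above can be illustrated with a self-contained sketch. Only the type names (be_ctx, be_ptask) come from the thread; the member layout and the init helper are hypothetical.]

```c
/* Sketch of the forward-declaration pattern from the diff above.
 * As long as a header only handles "struct be_ctx *" pointers, an
 * incomplete type is enough and the header does not need to include
 * dp_backend.h, which is what breaks the include cycle. */

struct be_ctx;                     /* forward declaration: incomplete type */

struct be_ptask {
    struct be_ctx *be_ctx;         /* pointers to incomplete types are OK */
    int period;
};

/* The full definition lives in exactly one header (dp_backend.h). */
struct be_ctx {
    struct be_ptask *refresh_task; /* the back reference that caused the cycle */
};

/* Hypothetical helper, just to show both types being used together. */
void be_ptask_sketch_init(struct be_ptask *task, struct be_ctx *ctx, int period)
{
    task->be_ctx = ctx;
    task->period = period;
    ctx->refresh_task = task;
}
```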
> >>>>>>>>>>[PATCH 4/4] providers: refresh expired netgroups
> >>>>>>>>>>I don't see one important part implemented -- what if all
> >>>>>>>>>>netgroups expire at once, then they are all refreshed at once,
> >>>>>>>>>>right? Can we add some throttling and refresh them in batches
> >>>>>>>>>>with some delay to avoid starving the back end?
> >>>>>>>>>I thought we agreed in person that we will first push a basic
> >>>>>>>>>stupid solution and then fine-tune it if necessary - hopefully
> >>>>>>>>>based on a real environment.
> >>>>>>>>OK, I thought there was going to be some proof of concept that
> >>>>>>>>we would just amend based on testing. Please file a ticket about
> >>>>>>>>this, we'll get the throttling implemented post-beta.
> >>>>>>>
> >>>>>>>https://fedorahosted.org/sssd/ticket/1942 convert enumeration
> >>>>>>>https://fedorahosted.org/sssd/ticket/1943 convert sudo
> >>>>>>>https://fedorahosted.org/sssd/ticket/1944 convert dyndns
> >>>>>>>
> >>>>>>>Also fix make check problems.
> >>>>>>Thank you, I like the new code. I re-read the patches again and
> >>>>>>actually tested them quite a lot this time.
> >>>>>>
> >>>>>>Here are some more comments:
> >>>>>>
> >>>>>>[PATCH 1/4] back end: periodic task API
> >>>>>>Here I only have minor comments. I think it would be defensive to
> >>>>>>add a NULL check for "task" to be_ptask_online_cb() and maybe even
> >>>>>>be_ptask_destructor(). The reason is that the task is now
> >>>>>>(correctly) owned by the caller, but especially the online callback
> >>>>>>might be called based on an external event.
> >>>>>Done.
> >>>>>
> >>>>>>In be_ptask_create() I think assigning to "_task" should be moved
> >>>>>>to when "EOK" is assigned to "ret", to only modify the external
> >>>>>>variable when the function succeeds.
> >>>>>Of course.
> >>>>>Done.
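[Editor's note: the "assign the out parameter only on success" convention discussed above can be sketched as follows. Every name here is made up; this is not the actual be_ptask_create() code.]

```c
#include <stdlib.h>
#include <errno.h>

#define EOK 0

/* Sketch of the convention: the caller's variable is written exactly
 * once, on the success path, so any failure leaves it untouched. */
struct task {
    int period;
};

int task_create(int period, struct task **_task)
{
    struct task *task;

    if (period <= 0) {
        return EINVAL;            /* _task is deliberately not modified */
    }

    task = malloc(sizeof(*task));
    if (task == NULL) {
        return ENOMEM;
    }
    task->period = period;

    *_task = task;                /* assigned only together with EOK */
    return EOK;
}
```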
> >>>>>
> >>>>>>[PATCH 2/4] back end: periodical refresh of expired records API
> >>>>>>In be_refresh_add_cb() I agree with checking the parameters as it's
> >>>>>>quite a generic API, but you could also check "enum be_refresh_type
> >>>>>>type" for being less than BE_REFRESH_TYPE_SENTINEL and return
> >>>>>>EINVAL (or a new error code) otherwise.
> >>>>>Done.
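[Editor's note: the sentinel check suggested above can look roughly like this. BE_REFRESH_TYPE_SENTINEL mirrors the thread; the other enum value and the helper are invented for the sketch.]

```c
#include <errno.h>

#define EOK 0

/* Keeping the sentinel as the last enumerator lets the range check
 * survive newly added types without modification. */
enum be_refresh_type {
    BE_REFRESH_TYPE_NETGROUPS,
    BE_REFRESH_TYPE_SENTINEL      /* must stay last */
};

int be_refresh_check_type(enum be_refresh_type type)
{
    if ((int)type < 0 || type >= BE_REFRESH_TYPE_SENTINEL) {
        return EINVAL;
    }
    return EOK;
}
```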
> >>>>>
> >>>>>>I wonder if it would be cleaner to move the "talloc_steal(subreq,
> >>>>>>values);" after creating the subreq and before setting the callback
> >>>>>>to make sure the values are always owned by subreq? It probably
> >>>>>>doesn't matter that much from a functional point of view, as values
> >>>>>>is otherwise owned by state, but it would make the code more
> >>>>>>consistent to me.
> >>>>>Done.
> >>>>>
> >>>>>>I think that EAGAIN is mishandled in be_refresh_done(). The done
> >>>>>>label errors out in any case, but EAGAIN would be returned in case
> >>>>>>there were more entities to return:
> >>>>>Thanks. Done.
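[Editor's note: the EAGAIN convention at issue can be sketched outside of tevent like this. The real be_refresh code runs inside tevent_req callbacks; the names below are invented plain-C stand-ins.]

```c
#include <errno.h>

#define EOK 0

/* A step function returns EAGAIN while more entities remain and EOK
 * once everything is processed, so the caller must treat EAGAIN as
 * "continue", never as an error. */
static int refresh_step(int *next, int total, int batch)
{
    int end = *next + batch;

    if (end > total) {
        end = total;
    }
    /* ... refresh entities in [*next, end) here ... */
    *next = end;

    return (*next < total) ? EAGAIN : EOK;
}

int refresh_all(int total, int batch)
{
    int next = 0;
    int ret;

    do {
        ret = refresh_step(&next, total, batch);
        if (ret != EOK && ret != EAGAIN) {
            return ret;           /* only real errors take the error path */
        }
    } while (ret == EAGAIN);      /* EAGAIN means another step is pending */

    return EOK;
}
```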
> >>>>>
> >>>>>>[PATCH 3/4] back end: add refresh expired records periodic task
> >>>>>>The patch looks good to me, but I'm wondering about the defaults
> >>>>>>of be_ptask_create. I would suggest making them shorter, maybe
> >>>>>>even half of the current values.
> >>>>>OK. First delay is now 30, enable delay is 5.
> >>>>>
> >>>>>>[PATCH 4/4] providers: refresh expired netgroups
> >>>>>>EAGAIN is mishandled in this patch, too. I did the following change
> >>>>>>to make the refresh work correctly:
> >>>>>Fixed.
> >>>>>
> >>>>>>I also need to take another look at the responder caching, because
> >>>>>>in my tests with about 1000 netgroups, I was still getting the data
> >>>>>>provider contacted even if I set a relatively long netgroup timeout
> >>>>>>and a short background refresh. But with the above changes, the
> >>>>>>background refresh seems to have been working fine.
> >>>>>Did the background refresh finish before you tried to log in (or
> >>>>>whatever test you performed)?
> >>>>>
> >>>>I created a very complex netgroups structure and ran getent netgroup
> >>>><top-netgroup>. This is exactly what the customer was doing.
> >>>>
> >>>>Sometimes the entries were returned from cache, sometimes I hit the
> >>>>midpoint refresh, so after a couple of tries, the expiry timestamps
> >>>>were different for different netgroups. Then after a couple of tries,
> >>>>I hit a window where the entry was already expired but the task
> >>>>didn't pick it up yet.
> >>>>
> >>>>So here is a proposal -- instead of refreshing expired entries of a
> >>>>certain objectclass, what about refreshing all entries of that
> >>>>objectclass periodically, regardless of their expiration time? The
> >>>>timeouts could be configured using the expiry timeouts anyway.
> >>>>
> >>>>Alternatively we could make the task configurable (maybe with
> >>>>another option or with a sane default) so that it picks entries N
> >>>>percent into their expiration timeout, so that we can be sure no one
> >>>>ever hits an expired entry. But this sounds like v 2.0 of this
> >>>>feature to me.
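[Editor's note: the "N percent into the expiration timeout" idea above amounts to a simple cutoff computation. All names here are hypothetical.]

```c
#include <time.h>

/* An entry that expires at `expire` with lifetime `timeout` was cached
 * at (expire - timeout). It becomes a refresh candidate once `percent`
 * percent of its lifetime has passed. */
time_t refresh_candidate_after(time_t expire, time_t timeout, int percent)
{
    time_t cached_at = expire - timeout;

    return cached_at + (timeout * percent) / 100;
}
```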
> >>>>
> >>>>What do you think?
> >>>
> >>>Maybe have a flag netgroups_refresh_mode that would force SSSD to
> >>>always return whatever is available and force the refresh in the
> >>>background. So here is how it would work:
> >>>netgroups_refresh_mode = fetch | nowait | background
> >>>
> >>>fetch - if you come across an expired entry, fetch an update before
> >>>returning data (current behavior)
> >>>nowait - return whatever is there and rely on the refresh job to
> >>>refresh the data - this can return stale data
> >>>background - if you come across the expired data, return it
> >>>immediately but expedite the refresh - this can return stale data too.
> >>
> >>Hi,
> >>I don't think creating more options is necessary. How about refreshing
> >>those records that are expired or are about to expire before the next
> >>refresh is triggered?
> >>
> >>This is only a simple filter change and should guarantee that netgroups
> >>will never expire (if the period is lower than the expiration time),
> >>but it also keeps the traffic distributed over time.
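[Editor's note: the filter change Pavel describes can be sketched like this. Instead of matching only entries that are already expired (attr <= now), the filter matches entries that will expire before the next task run (attr <= now + period). The attribute string is assumed to match SSSD's SYSDB_CACHE_EXPIRE; the helper itself is illustrative, not the actual patch.]

```c
#include <stdio.h>
#include <time.h>

/* Assumption: SYSDB_CACHE_EXPIRE expands to this attribute name in SSSD. */
#define SYSDB_CACHE_EXPIRE "dataExpireTimestamp"

/* Build a filter matching entries expired now or expiring within the
 * next refresh period. */
int build_refresh_filter(char *buf, size_t len, time_t now, time_t period)
{
    return snprintf(buf, len, "(&(%s<=%lld))",
                    SYSDB_CACHE_EXPIRE, (long long)(now + period));
}
```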
> >
> >I tested this locally as a modification of Pavel's patches and I confirm
> >this modification had the desired effect in my testing. After I switched
> >off midpoint refresh completely to rely on the background refresh task
> >only, I saw no more delay in requesting the netgroups.
> >
> >Can you send the modified version, Pavel? I think it will be good to go.
>
> Here is rebased + modified version.
>
> To make the review easier I'm also attaching the diff patch. The changes
> were squashed into the first and the fourth patch.
Thank you, these patches work for me.
ACK.
Pushed to master.
Can you also prepare a 1.9 version of these patches?