On Mon, Jun 10, 2013 at 12:02:10PM +0200, Pavel Březina wrote:
On 06/10/2013 11:25 AM, Jakub Hrozek wrote:
>On Mon, Jun 10, 2013 at 10:03:09AM +0200, Pavel Březina wrote:
>>On 06/09/2013 07:53 PM, Dmitri Pal wrote:
>>>On 06/09/2013 08:05 AM, Jakub Hrozek wrote:
>>>>On Mon, Jun 03, 2013 at 10:15:49PM +0200, Pavel Březina wrote:
>>>>>On 06/03/2013 12:17 AM, Jakub Hrozek wrote:
>>>>>>On Thu, 2013-05-23 at 11:46 +0200, Pavel Březina wrote:
>>>>>>>On 05/21/2013 12:06 PM, Jakub Hrozek wrote:
>>>>>>>>In general it's not clear to me from the code who is responsible for
>>>>>>>>freeing the task -- is it the caller, but only on success? I think this
>>>>>>>>should always behave the same.
>>>>>>>After discussion with Jakub we decided that the caller is responsible
>>>>>>>for freeing the task.
>>>>>>>
>>>>>>>When an internal error occurs (unable to create a tevent_timer), the task
>>>>>>>is now disabled.
>>>>>>>
>>>>>>>ERR_STOP_PERIODIC_TASK was removed. The request can disable the task
>>>>>>>instead.
>>>>>>>
>>>>>>>>be_ptask_schedule(): The DEBUG message uses %d but passes in a string.
>>>>>>>Fixed.
>>>>>>>
>>>>>>>>[PATCH 2/4] back end: periodical refresh of expired records API
>>>>>>>>The "enum be_refresh" name is too generic. Maybe "enum be_refresh_type"?
>>>>>>>Renamed.
>>>>>>>
>>>>>>>>In be_refresh_get_names(), can you use sysdb_attrs_to_list() to gather
>>>>>>>>the names?
>>>>>>>Done.
>>>>>>>
>>>>>>>>Is it wise to assume that all objects have names and to name the getter
>>>>>>>>get_names()? Some objects might not have names at all, but for instance
>>>>>>>>UUIDs. Also in be_refresh_step() you call get_names but get back a
>>>>>>>>"dn".
>>>>>>>Fixed.
>>>>>>>
>>>>>>>>Please file tickets to add unit tests for these two modules and make the
>>>>>>>>tickets block upstream #1923.
>>>>>>>https://fedorahosted.org/sssd/ticket/1939 be_ptask
>>>>>>>https://fedorahosted.org/sssd/ticket/1940 be_refresh
>>>>>>>
>>>>>>>>Nitpick:
>>>>>>>>>+        filter = talloc_asprintf(tmp_ctx, "(&(%s<=%lld))",
>>>>>>>>>+                                 SYSDB_CACHE_EXPIRE, (long long)now);
>>>>>>>>                                                       ^^^
>>>>>>>>missing space
>>>>>>>Fixed.
>>>>>>>
>>>>>>>>[PATCH 3/4] back end: add refresh expired records periodic task
>>>>>>>>>--- a/src/config/SSSDConfig/__init__.py.in
>>>>>>>>>+++ b/src/config/SSSDConfig/__init__.py.in
>>>>>>>>>@@ -125,6 +125,7 @@ option_strings = {
>>>>>>>>>     'entry_cache_service_timeout' : _('Entry cache timeout length (seconds)'),
>>>>>>>>>     'entry_cache_autofs_timeout' : _('Entry cache timeout length (seconds)'),
>>>>>>>>>     'entry_cache_sudo_timeout' : _('Entry cache timeout length (seconds)'),
>>>>>>>>>+    'refresh_expired_interval' : _('How often should expired rules be refreshed in background'),
>>>>>>>>We should use "objects" or "entries" instead of rules.
>>>>>>>Fixed.
>>>>>>>
>>>>>>>>Can you explain the "circular dependency" here? I think you already told
>>>>>>>>me in person, but I forgot.
>>>>>>>dp_backend.h defines struct be_ctx.
>>>>>>>
>>>>>>>dp_ptask.h uses struct be_ctx.
>>>>>>>dp_refresh.h uses struct be_ctx.
>>>>>>>
>>>>>>>
>>>>>>>    dp_backend.h <---- dp_ptask.h
>>>>>>>      |    ^               ^
>>>>>>>      |    |               |
>>>>>>>      v    |               |
>>>>>>>    dp_refresh.h ----------+
>>>>>>>
>>>>>>>data_provider_be.c includes all three of them (it needs to create the
>>>>>>>ptask for be_refresh and remember it in be_ctx).
>>>>>>>
>>>>>>>>>--- a/src/providers/dp_ptask.h
>>>>>>>>>+++ b/src/providers/dp_ptask.h
>>>>>>>>>@@ -27,6 +27,9 @@
>>>>>>>>>
>>>>>>>>> #include "providers/dp_backend.h"
>>>>>>>>>
>>>>>>>>>+/* solve circular dependency */
>>>>>>>>>+struct be_ctx;
>>>>>>>>>+
>>>>>>>>> struct be_ptask;
>>>>>>>>>
>>>>>>>>> /**
>>>>>>>>>diff --git a/src/providers/dp_refresh.h b/src/providers/dp_refresh.h
>>>>>>>>>index 54c6aac99866415d4f33f61ac05d487f71b8c49c..d93a932637804dfbf743c395af3f8e640f79c12f 100644
>>>>>>>>>--- a/src/providers/dp_refresh.h
>>>>>>>>>+++ b/src/providers/dp_refresh.h
>>>>>>>>>@@ -27,6 +27,9 @@
>>>>>>>>> #include "providers/dp_backend.h"
>>>>>>>>> #include "providers/dp_ptask.h"
>>>>>>>>>
>>>>>>>>>+/* solve circular dependency */
>>>>>>>>>+struct be_ctx;
>>>>>>>>>+
>>>>>>>>>>[PATCH 4/4] providers: refresh expired netgroups
>>>>>>>>>>I don't see one important part implemented -- what if all netgroups
>>>>>>>>>>expire at once, then they are all refreshed at once, right? Can we add
>>>>>>>>>>some throttling and refresh them in batches with some delay to avoid
>>>>>>>>>>starving the back end?
>>>>>>>>>I thought we agreed in person that we would first push a basic, stupid
>>>>>>>>>solution and then fine-tune it if necessary - hopefully based on a real
>>>>>>>>>environment.
>>>>>>>>OK, I thought there was going to be some proof of concept that we would
>>>>>>>>just amend based on testing. Please file a ticket about this, we'll get
>>>>>>>>the throttling implemented post-beta.
>>>>>>>https://fedorahosted.org/sssd/ticket/1941 throttling
>>>>>>>
>>>>>>>https://fedorahosted.org/sssd/ticket/1942 convert enumeration
>>>>>>>https://fedorahosted.org/sssd/ticket/1943 convert sudo
>>>>>>>https://fedorahosted.org/sssd/ticket/1944 convert dyndns
>>>>>>>
>>>>>>>Also fixed the make check problems.
>>>>>>Thank you, I like the new code. I re-read the patches again and actually
>>>>>>tested them quite a lot this time.
>>>>>>
>>>>>>Here are some more comments:
>>>>>>
>>>>>>[PATCH 1/4] back end: periodic task API
>>>>>>Here I only have minor comments. I think it would be defensive to add a
>>>>>>NULL check for "task" to be_ptask_online_cb() and maybe even
>>>>>>be_ptask_destructor(). The reason is that the task is now (correctly)
>>>>>>owned by the caller, but especially the online callback might be called
>>>>>>based on an external event.
>>>>>Done.
>>>>>
>>>>>>In be_ptask_create() I think assigning to "_task" should be moved to
>>>>>>where "EOK" is assigned to "ret", to only modify the external variable
>>>>>>when the function succeeds.
>>>>>Of course.
>>>>>Done.
>>>>>
>>>>>>[PATCH 2/4] back end: periodical refresh of expired records API
>>>>>>In be_refresh_add_cb() I agree with checking the parameters as it's
>>>>>>quite a generic API, but you could also check "enum be_refresh_type type"
>>>>>>for being less than BE_REFRESH_TYPE_SENTINEL and return EINVAL
>>>>>>(or a new error code) otherwise.
>>>>>Done.
>>>>>
>>>>>>I wonder if it would be cleaner to move the "talloc_steal(subreq, values);"
>>>>>>after creating the subreq and before setting the callback, to make sure
>>>>>>the values are always owned by subreq? It probably doesn't matter that
>>>>>>much from a functional point of view, as values is otherwise owned by
>>>>>>state, but it would make the code more consistent to me.
>>>>>Done.
>>>>>
>>>>>>I think that EAGAIN is mishandled in be_refresh_done(). The done label
>>>>>>errors out in any case, but EAGAIN would be returned in case there were
>>>>>>more entities to return:
>>>>>Thanks. Done.
>>>>>
>>>>>>[PATCH 3/4] back end: add refresh expired records periodic task
>>>>>>The patch looks good to me, but I'm wondering about the defaults of
>>>>>>be_ptask_create. I would suggest making them shorter, maybe even half of
>>>>>>the current values.
>>>>>OK. First delay is now 30, enable delay is 5.
>>>>>
>>>>>>[PATCH 4/4] providers: refresh expired netgroups
>>>>>>EAGAIN is mishandled in this patch, too. I did the following change to
>>>>>>make the refresh work correctly:
>>>>>Fixed.
>>>>>
>>>>>>I also need to take another look at the responder caching, because in my
>>>>>>tests with about 1000 netgroups, I was still getting the data provider
>>>>>>contacted even if I set a relatively long netgroup timeout and a short
>>>>>>background refresh. But with the above changes, the background refresh
>>>>>>seems to have been working fine.
>>>>>Did the background refresh finish before you tried to log in (or whatever
>>>>>test you performed)?
>>>>>
>>>>I created a very complex netgroups structure and ran getent netgroup
>>>><top-netgroup>. This is exactly what the customer was doing.
>>>>
>>>>Sometimes the entries were returned from the cache, sometimes I hit the
>>>>midpoint refresh, so after a couple of tries, the expiry timestamps were
>>>>different for different netgroups. Then after a couple more tries, I hit a
>>>>window where the entry was already expired but the task hadn't picked it
>>>>up yet.
>>>>
>>>>So here is a proposal -- instead of refreshing expired entries of a
>>>>certain objectclass, what about refreshing all entries of that objectclass
>>>>periodically, regardless of their expiration time? The timeouts could be
>>>>configured using the expiry timeouts anyway.
>>>>
>>>>Alternatively, we could make the task configurable (maybe with another
>>>>option or with a sane default) so that it picks entries N percent into
>>>>their expiration timeout, so that we can be sure no one ever hits an
>>>>expired entry. But this sounds like v2.0 of this feature to me.
>>>>
>>>>What do you think?
>>>
>>>Maybe have a flag netgroups_refresh_mode that would force SSSD to always
>>>return whatever is available and force the refresh in the background.
>>>So here is how it would work:
>>>netgroups_refresh_mode = fetch | nowait | background
>>>
>>>fetch - if you come across an expired entry, fetch an update before
>>>returning data (current behavior)
>>>nowait - return whatever is there and rely on the refresh job to refresh
>>>the data - this can return stale data
>>>background - if you come across expired data, return it immediately
>>>but expedite the refresh - this can return stale data too.
>>
>>Hi,
>>I don't think creating more options is necessary. How about refreshing those
>>records that are expired or are about to expire before the next refresh is
>>triggered?
>>
>>This is only a simple filter change and should guarantee that netgroups will
>>never expire (if the period is lower than the expiration time) but also keeps
>>the traffic distribution over time.
>
>I tested this locally as a modification of Pavel's patches and I confirm
>this modification had the desired effect in my testing. After I switched
>off midpoint refresh completely to rely on the background refresh task
>only, I saw no more delay in requesting the netgroups.
>
>Can you send the modified version, Pavel? I think it will be good to go.
Here is the rebased + modified version.
To make the review easier, I'm also attaching the diff patch. The changes
were squashed into the first and the fourth patch.
Thank you, these patches work for me.
ACK.