On Mon, Oct 10, 2016 at 10:04:30AM +0200, Victor Tapia wrote:
Hi list,
I've faced a race condition when SSSD boots on a machine with a big clock drift. This is what I see:
1. SSSD starts before the network is up, queries the LDAP server without success and sets a retry timer (~60 secs).
2. NTP starts and corrects the clock, 1 hour back for example.
3. SSSD takes ~60 secs + the drift correction (1 hour) to retry the connection.
In this particular scenario the credentials cache is disabled, so the wait time to log in is noticeable. How feasible would it be to use a monotonic clock for this kind of timed event?
I haven't actually tried this, and I don't know tevent internals well enough to say whether it would work, but I wonder if using clock_gettime() to construct the struct timeval, instead of calling tevent_timeval_current_ofs(), could solve this particular issue.
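To make the clock_gettime() idea concrete, here is a rough, untested-against-SSSD sketch of building a struct timeval from CLOCK_MONOTONIC. The helper names monotonic_timeval_current_ofs() and monotonic_deadline_passed() are made up for illustration and are not part of tevent or SSSD. It only shows the time arithmetic; it assumes whoever checks the deadline also reads the monotonic clock, and whether tevent's own timer handling would do that is exactly the part I'm unsure about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper (not a tevent or SSSD function): returns
 * "now + secs" as measured on CLOCK_MONOTONIC, packed into a
 * struct timeval. A zeroed timeval signals failure in this sketch. */
static struct timeval monotonic_timeval_current_ofs(time_t secs)
{
    struct timespec ts;
    struct timeval tv = { 0, 0 };

    assert(secs >= 0); /* negative offsets are out of scope here */

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return tv;
    }

    tv.tv_sec = ts.tv_sec + secs;
    tv.tv_usec = ts.tv_nsec / 1000; /* nanoseconds to microseconds */
    return tv;
}

/* Hypothetical helper: returns 1 if the monotonic clock has reached
 * the deadline, 0 if not, -1 if the clock could not be read. */
static int monotonic_deadline_passed(struct timeval deadline)
{
    struct timespec ts;
    long now_usec;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1;
    }

    now_usec = ts.tv_nsec / 1000;
    if (ts.tv_sec != deadline.tv_sec) {
        return ts.tv_sec > deadline.tv_sec;
    }
    return now_usec >= deadline.tv_usec;
}
```

Since CLOCK_MONOTONIC is not affected by NTP stepping the wall clock, a 60 second deadline computed this way should still expire about 60 seconds later, even if the wall clock is set back by an hour in the meantime.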
On the other hand, this is a pattern we use throughout the SSSD code for timed events, and we're just not well equipped to handle time drifts.
Did you investigate why sssd doesn't detect the networking change from libnl messages or from resolv.conf being touched?