We were incorrectly checking for DBUS_ERROR_TIMEOUT here. The correct
behaviour is to check for DBUS_ERROR_NO_REPLY. This way the three-try
logic in tasks_check_handler() is handled properly.
D-BUS is rather confusing with these error codes:
  DBUS_ERROR_NO_REPLY: No reply to a message expecting one, usually
    means a timeout occurred.
  DBUS_ERROR_TIMEOUT: Certain timeout errors, possibly ETIMEDOUT on a
    socket.
And just for added confusion, there's also:
  DBUS_ERROR_TIMED_OUT: Certain timeout errors, e.g. while starting a
    service.
DBUS_ERROR_NO_REPLY is the only correct one for our usage. This explains
the intermittent bug we were seeing where the monitor lost communication
with its services (usually the data providers). Because of this loss of
communication, the monitor was unable to notify the providers of changes
to the routing table or resolv.conf, leaving SSSD stuck offline
until it was restarted.
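As a rough illustration of the distinction (not the actual monitor code),
the check can be sketched as below. The helper names, the ping_state
struct and MAX_FAILED_PINGS are hypothetical. The error-name strings are
the values from dbus/dbus-protocol.h, repeated only so the sketch builds
without the libdbus headers. In real code the name would presumably come
from dbus_message_get_error_name() on the error reply.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Error names as defined in dbus/dbus-protocol.h; repeated here only so
 * that this sketch compiles without the libdbus headers. */
#ifndef DBUS_ERROR_NO_REPLY
#define DBUS_ERROR_NO_REPLY  "org.freedesktop.DBus.Error.NoReply"
#endif
#ifndef DBUS_ERROR_TIMEOUT
#define DBUS_ERROR_TIMEOUT   "org.freedesktop.DBus.Error.Timeout"
#endif
#ifndef DBUS_ERROR_TIMED_OUT
#define DBUS_ERROR_TIMED_OUT "org.freedesktop.DBus.Error.TimedOut"
#endif

/* Hypothetical limit, mirroring the "three tries" described above. */
#define MAX_FAILED_PINGS 3

/* Hypothetical per-service bookkeeping. */
struct ping_state {
    int failed_pings;
};

/* Hypothetical helper: true only for the error reported when a message
 * that expected a reply did not get one in time. The two similarly
 * named timeout errors deliberately do not match. */
bool is_reply_timeout(const char *error_name)
{
    return error_name != NULL
        && strcmp(error_name, DBUS_ERROR_NO_REPLY) == 0;
}

/* Hypothetical helper: count a missed reply and report whether the
 * service has now missed MAX_FAILED_PINGS replies. Other errors leave
 * the counter untouched. */
bool record_ping_error(struct ping_state *st, const char *error_name)
{
    if (is_reply_timeout(error_name)) {
        st->failed_pings++;
    }
    return st->failed_pings >= MAX_FAILED_PINGS;
}
```

With a check against DBUS_ERROR_TIMEOUT instead, a missed reply would
never match, so a counter like this one would not advance.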
This is probably the root cause of
https://bugzilla.redhat.com/show_bug.cgi?id=728343