Hello, Justin!
"I cannot confirm if the link you provided has the correct
steps"
I have tested these steps on several other servers, and everything works there.
The problem occurs on only one server (I wrote about this earlier).
That is the strangest part of this situation.
"I would search for 'mark_offline' in the domain log
file and look just above this to get an idea of what causes the backend to be set offline."
Here's what I found in the domain log:
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [fo_set_port_status] (0x0100):
Marking port 389 of server 'msk-dc01.holding.com' as 'not working'
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [ad_user_data_cmp] (0x1000):
Comparing LDAP with LDAP
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [fo_set_port_status] (0x0400):
Marking port 389 of duplicate server 'msk-dc01.holding.com' as 'not working'
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [ad_user_data_cmp] (0x1000):
Comparing LDAP with LDAP
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [sdap_handle_release] (0x2000):
Trace: sh[0x7f590093da10], connected[1], ops[(nil)], ldap[0x7f5900927d10],
destructor_lock[0], release_memory[0]
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [remove_connection_callback]
(0x4000): Successfully removed connection callback.
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_mark_offline] (0x2000): Going
offline!
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_mark_offline] (0x2000):
Initialize check_if_online_ptask.
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_ptask_create] (0x0400): Periodic
task [Check if online (periodic)] was created
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_ptask_schedule] (0x0400): Task
[Check if online (periodic)]: scheduling task 82 seconds from now [1476885227]
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_run_offline_cb] (0x0080): Going
offline. Running callbacks.
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [sdap_id_op_connect_done] (0x4000):
notify offline to op #1
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_mark_dom_offline] (0x1000):
Marking subdomain holding.com offline
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_mark_subdom_offline] (0x1000):
Marking subdomain holding.com as inactive
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [ad_subdomains_root_conn_done]
(0x0040): Failed to connect to AD server: [11](Resource temporarily unavailable)
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [get_subdomains_callback] (0x0400):
Backend returned: (1, 11, <NULL>) [Provider is Offline (Have exhausted maximum
number of retries for service)]
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_queue_next_request] (0x4000):
Queued request filed successfully.
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [sdap_id_op_connect_done] (0x4000):
notify offline to op #2
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_mark_dom_offline] (0x1000):
Marking subdomain holding.com offline
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_mark_subdom_offline] (0x4000):
Subdomain already inactive
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [ad_subdomains_root_conn_done]
(0x0040): Failed to connect to AD server: [11](Resource temporarily unavailable)
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [sdap_id_release_conn_data]
(0x4000): releasing unused connection
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [get_subdomains_callback] (0x0400):
Backend returned: (0, 0, <NULL>) [Success (Success)]
(Wed Oct 19 16:52:25 2016) [sssd[be[ad.holding.com]]] [be_queue_next_request] (0x4000):
Request queue is empty.
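
For anyone who wants to repeat the search suggested in the quote, here is a minimal Python sketch that finds every line containing 'mark_offline' and shows the few lines just above it. The function name `context_before`, the window size, and the abbreviated sample lines (shortened from the excerpt above) are illustrative assumptions, not part of SSSD.

```python
from collections import deque


def context_before(lines, needle="mark_offline", before=3):
    """Return (matching_line, preceding_lines) for each line containing needle."""
    window = deque(maxlen=before)  # keeps only the last `before` lines seen
    results = []
    for raw in lines:
        line = raw.rstrip("\n")
        if needle in line:
            results.append((line, list(window)))
        window.append(line)
    return results


# Abbreviated sample lines taken from the log excerpt (timestamps removed).
sample = [
    "[fo_set_port_status] (0x0100): Marking port 389 of server "
    "'msk-dc01.holding.com' as 'not working'",
    "[ad_user_data_cmp] (0x1000): Comparing LDAP with LDAP",
    "[be_mark_offline] (0x2000): Going offline!",
]

for match, ctx in context_before(sample):
    for line in ctx:
        print("    " + line)
    print(">>> " + match)
```

To run it against a real log, pass an open file object instead of `sample`. The domain log is typically something like /var/log/sssd/sssd_<domain>.log, but the exact path on a given system is an assumption to verify.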
Why is SSSD trying to access the root domain controller (msk-dc01.holding.com)?
In our case, the root domain controllers have access restrictions. This is a normal
situation for large domains.
My sssd.conf explicitly specifies the domain controllers that SSSD should use for
authorization:
[domain/ad.holding.com]
ad_server = kom-dc01.ad.holding.com, kom-dc02.ad.holding.com
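
To make the mismatch concrete, here is a rough Python sketch that pulls the servers marked 'not working' out of the log text and checks them against the ad_server list. The helper names `failed_servers` and `parse_ad_server` are hypothetical, the regular expression is only my guess at a pattern that fits the lines shown above, and the log text is a shortened copy of the excerpt.

```python
import re

# Matches the fo_set_port_status messages in the excerpt; \s+ tolerates
# line wraps introduced by the mail client (e.g. "'not\nworking'").
NOT_WORKING = re.compile(
    r"Marking\s+port\s+(\d+)\s+of\s+(?:duplicate\s+)?server\s+"
    r"'([^']+)'\s+as\s+'not\s+working'"
)


def failed_servers(log_text):
    """Return the set of (host, port) pairs marked as 'not working'."""
    return {(host, int(port)) for port, host in NOT_WORKING.findall(log_text)}


def parse_ad_server(value):
    """Split a comma-separated ad_server value into a set of host names."""
    return {h.strip() for h in value.split(",") if h.strip()}


# Shortened copy of the log excerpt, including one wrapped line.
log_text = """\
[fo_set_port_status] (0x0100):
Marking port 389 of server 'msk-dc01.holding.com' as 'not working'
[fo_set_port_status] (0x0400):
Marking port 389 of duplicate server 'msk-dc01.holding.com' as 'not
working'
"""

configured = parse_ad_server("kom-dc01.ad.holding.com, kom-dc02.ad.holding.com")

for host, port in sorted(failed_servers(log_text)):
    status = "listed in ad_server" if host in configured else "NOT listed in ad_server"
    print(f"{host}:{port} marked not working ({status})")
```

For the excerpt above, the only server reported is msk-dc01.holding.com, which does not appear in ad_server. That is exactly the behaviour I am asking about.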