Guest OS: Ubuntu 16.04.4 LTS (kernel versions 4.4.0-108 through the current -116)

Virtualization env: VMware ESXi 6.0

Host hardware: Dell R720


We use SSSD to bind Linux servers to the AD domain for authentication. This worked fine right up to 4.4.0-104. After the update to -108, -109, -112, or -116, if sssd is enabled at boot, or if it is disabled, started manually after a successful boot, and you then perform a lookup (e.g. id some_domain_user), the entire system freezes and you have to force a reboot. There's even a blip in the syslog when it happens.


I raised the debug level to 9 in every section of sssd.conf, then started SSSD and tried to look up a user. The most meaningful things I've been able to pull out are:

sssd.log:

(Fri Mar 16 13:06:03 2018) [sssd] [service_send_ping] (0x2000): Pinging ad.domain.com
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_add_timeout] (0x2000): 0x88a9d0
(Fri Mar 16 13:06:03 2018) [sssd] [service_send_ping] (0x2000): Pinging nss
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_add_timeout] (0x2000): 0x8904c0
(Fri Mar 16 13:06:03 2018) [sssd] [service_send_ping] (0x2000): Pinging pam
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_add_timeout] (0x2000): 0x88ede0
(Fri Mar 16 13:06:03 2018) [sssd] [service_send_ping] (0x2000): Pinging ssh
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_add_timeout] (0x2000): 0x889710
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_remove_timeout] (0x2000): 0x88a9d0
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0x888c10
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Fri Mar 16 13:06:03 2018) [sssd] [ping_check] (0x2000): Service ad.domain.com replied to ping
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_remove_timeout] (0x2000): 0x88ede0
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0x890e00
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Fri Mar 16 13:06:03 2018) [sssd] [ping_check] (0x2000): Service pam replied to ping
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_remove_timeout] (0x2000): 0x889710
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0x88d5f0
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Fri Mar 16 13:06:03 2018) [sssd] [ping_check] (0x2000): Service ssh replied to ping
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_remove_timeout] (0x2000): 0x8904c0
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0x88eae0
(Fri Mar 16 13:06:03 2018) [sssd] [sbus_dispatch] (0x4000): Dispatching.
(Fri Mar 16 13:06:03 2018) [sssd] [ping_check] (0x2000): Service nss replied to ping

sssd_nss.log:

(Fri Mar 16 13:05:06 2018) [sssd[nss]] [id_callback] (0x0010): The Monitor returned an error [org.freedesktop.DBus.Error.NoReply]

sssd_ad.domain.com.log:

(Fri Mar 16 13:05:13 2018) [sssd[be[ad.domain.com]]] [sbus_message_handler] (0x2000): Received SBUS method org.freedesktop.sssd.service.ping on path /org/freedesktop/sssd/service
(Fri Mar 16 13:05:13 2018) [sssd[be[ad.domain.com]]] [sbus_get_sender_id_send] (0x2000): Not a sysbus message, quit
(Fri Mar 16 13:05:23 2018) [sssd[be[ad.domain.com]]] [sbus_dispatch] (0x4000): dbus conn: 0xf1e840
(Fri Mar 16 13:05:23 2018) [sssd[be[ad.domain.com]]] [sbus_dispatch] (0x4000): Dispatching.

sssd_ssh.log:

(Fri Mar 16 13:06:53 2018) [sssd[ssh]] [sbus_dispatch] (0x4000): dbus conn: 0x1817880
(Fri Mar 16 13:06:53 2018) [sssd[ssh]] [sbus_dispatch] (0x4000): Dispatching.
(Fri Mar 16 13:06:53 2018) [sssd[ssh]] [sbus_message_handler] (0x2000): Received SBUS method org.freedesktop.sssd.service.ping on path /org/freedesktop/sssd/service
(Fri Mar 16 13:06:53 2018) [sssd[ssh]] [sbus_get_sender_id_send] (0x2000): Not a sysbus message, quit
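
For anyone sifting through level-9 logs like the ones quoted above, a small filter can help surface only the failure-level lines. This is a minimal sketch, not an SSSD tool: the regex is inferred purely from the log lines in this post, and parse_line and failures are hypothetical helper names. The mask assumes the debug_level bit values described in the sssd.conf man page (0x0010 fatal, 0x0020 critical, 0x0040 serious).

```python
import re

# Pattern inferred from the quoted log lines (an assumption, not an SSSD spec):
# (timestamp) [process] [function] (0xLEVEL): message
LINE_RE = re.compile(
    r"^\((?P<ts>[^)]+)\) \[(?P<proc>.+?)\] \[(?P<func>\w+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): (?P<msg>.*)$"
)

# Failure bits: fatal, critical, serious
FAILURE_MASK = 0x0010 | 0x0020 | 0x0040


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["level"] = int(rec["level"], 16)
    return rec


def failures(lines):
    """Yield parsed records whose level has any failure bit set."""
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec["level"] & FAILURE_MASK:
            yield rec


# Sample lines copied from the logs in this post
sample = [
    "(Fri Mar 16 13:06:03 2018) [sssd] [service_send_ping] (0x2000): Pinging nss",
    "(Fri Mar 16 13:05:06 2018) [sssd[nss]] [id_callback] (0x0010): "
    "The Monitor returned an error [org.freedesktop.DBus.Error.NoReply]",
    "(Fri Mar 16 13:05:13 2018) [sssd[be[ad.domain.com]]] [sbus_get_sender_id_send] "
    "(0x2000): Not a sysbus message, quit",
]

for rec in failures(sample):
    print(rec["ts"], rec["proc"], rec["func"], hex(rec["level"]), rec["msg"])
```

Run against the excerpts here, only the id_callback NoReply line from sssd_nss.log passes the filter; the rest is 0x2000/0x4000 trace output.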


sssd.conf:

[sssd]
domains = ad.domain.com
config_file_version = 2
services = nss, pam, ssh
# default_domain_suffix = ad.domain.com

[ssh]

[domain/ad.domain.com]
ad_domain = ad.domain.com
krb5_realm = AD.DOMAIN.COM
realmd_tags = manages-system joined-with-samba
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
fallback_homedir = /home/%d/%u
access_provider = ad
# use_fully_qualified_names = True
ad_gpo_access_control = permissive
ad_access_filter = ad.domain.com:(memberOf=CN=Information Technology,CN=Users,DC=ad,DC=domain,DC=com)
ldap_sasl_authid = ServicePrincipalName | userPrincipalName
ldap_user_ssh_public_key = sshPublicKeys
ldap_user_certificate = noSuchAttribute
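
Since the debug settings are not shown in the config above, here is a rough way to double-check that the debug level really is set everywhere it needs to be. It is only a sketch under assumptions: debug_level_gaps is a hypothetical helper, the sample text is a trimmed copy of this config with debug_level lines added for illustration, and it assumes Python's configparser reads sssd.conf closely enough for this check. It reports sections without a debug_level option, and services listed in "services" that have no section of their own (and so have nowhere to carry a debug_level).

```python
import configparser


def debug_level_gaps(conf_text):
    """Return (sections lacking debug_level, listed services with no section)."""
    # interpolation=None so values such as /home/%d/%u are left untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    missing = [s for s in parser.sections()
               if not parser.has_option(s, "debug_level")]

    raw_services = parser.get("sssd", "services", fallback="")
    services = [svc.strip() for svc in raw_services.split(",") if svc.strip()]
    no_section = [svc for svc in services if not parser.has_section(svc)]
    return missing, no_section


# Trimmed copy of the config above; the debug_level lines are illustrative
sample_conf = """\
[sssd]
domains = ad.domain.com
config_file_version = 2
services = nss, pam, ssh
debug_level = 9

[ssh]

[domain/ad.domain.com]
id_provider = ad
fallback_homedir = /home/%d/%u
debug_level = 9
"""

missing, no_section = debug_level_gaps(sample_conf)
print("sections without debug_level:", missing)
print("services without a section:", no_section)
```

On a real system the text would come from /etc/sssd/sssd.conf, which is normally readable only by root.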

Trying to capture boot-time logs (with sssd enabled in systemd) wasn't successful; there is just a big gap in kernel.log and syslog. Watching the boot process shows that it hangs at:


Started D-Bus System Messaging Bus


This appears to corroborate the problems indicated in the sssd logs: some kind of problem with sssd sending D-Bus messages, right? If I revert to -104, all good. Anything above that version, sadness.
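
To keep track of which guests are on which side of that line, the kernel release string can be checked against the versions reported in this post. This is a small sketch only: abi_number and classify are hypothetical helpers, and the "working" and "freezing" sets simply encode what is described here (-104 and earlier fine; -108, -109, -112, -116 freezing). Anything in between is labeled untested rather than guessed at.

```python
import platform
import re

KNOWN_GOOD_MAX = 104                  # last ABI reported working in this post
REPORTED_BAD = {108, 109, 112, 116}   # ABIs reported to freeze in this post


def abi_number(release):
    """Extract the ABI number from a release such as '4.4.0-116-generic'."""
    m = re.match(r"^4\.4\.0-(\d+)", release)
    return int(m.group(1)) if m else None


def classify(release):
    """Map a kernel release string to the status reported in this post."""
    abi = abi_number(release)
    if abi is None:
        return "not a 4.4.0 kernel"
    if abi <= KNOWN_GOOD_MAX:
        return "reported working"
    if abi in REPORTED_BAD:
        return "reported freezing"
    return "untested"


# platform.release() returns the same string as `uname -r`
current = platform.release()
print(current, "->", classify(current))
```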


What is weird is that I do not have these problems on a test system running the same Ubuntu 16.04.4 LTS in VirtualBox on my MacBook Pro. I'm wondering if this might be a Meltdown-related issue, since it first appeared with the kernel version that was initially released to address Meltdown (-108). We applied updates to our physical hosts and ESXi, as well as the fixes for the guest OS. I'm afraid it is a weird confluence of host + VMware + guest OS problems.


Already tried making sssd.service the last service to start. No change in behavior, other than that it hangs later in the boot process.


Any insight or help appreciated! Thanks!


Dave