On Wed, Aug 25, 2021 at 10:32:58AM -0500, Spike White wrote:
*Short summary:* How can we troubleshoot sssd’s ‘Automatic Kerberos Host
Keytab Renewal’ process? About 0.4% of our Linux servers drop off the AD
domain each month.
Over the past two years, we have onboarded sssd as our Linux AD
integration component, largely displacing a former commercial product that
did the same job.
We have about 20K Linux servers that are sssd-enabled: a mix of RHEL6,
RHEL7, RHEL8, and Oracle Linux 6, 7 and 8. About 7K Linux servers are still
on the old commercial product. (For certain edge-case scenarios, such as
DMZs, the commercial product works better.)
Our AD environment is a single forest with 4 regional child domains, all
with transitive trust. sssd auto-discovers the parent domain and all 4 child
domains without problems whenever it’s adcli-joined to its regional local
Why am I writing this?
Because we are researching an ongoing problem reported by L1 server ops:
about 70 to 80 sssd-enabled Linux servers per month drop off the domain.
Out of our current sssd-enabled population of ~20K servers, that’s not
horrible, but it should still be better. (Our former commercial product
The problem is not limited to one particular OS, OS version, build location
or region. According to our survey, it seems to occur randomly across all
OS versions, regions and locations.
To be clear, it’s extremely likely that this behavior arises from some
subtle misconfiguration on our part, not from any sssd, adcli or Kerberos
bug. We are pursuing a couple of configuration improvements (for instance,
a mismatch between the Kerberos maximum ticket lifetime in AD and in the
/etc/krb5.conf file).
We use sssd’s default settings for ad_maximum_machine_account_password_age
(30 days) and ad_machine_account_password_renewal_opts (a daily check). So
once the machine account password is 30 days old, sssd attempts daily to
renew it and update the host Kerberos keytab file, and it should retry
every day until the renewal succeeds. By company policy, our AD disables
any machine account that has not renewed its credentials in 40 days. So the
servers we find dropped off the domain are those that have not renewed
their AD machine accounts in 40 days, i.e. roughly 10 consecutive daily
renewal attempts either failed or never ran.
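To see how close each server is to the 40-day cutoff, the renewal window can be checked against the machine account's last password change. The sketch below is hypothetical and rests on two assumptions: that the relevant timestamp is AD's pwdLastSet attribute (a Windows FILETIME, i.e. 100-ns intervals since 1601-01-01 UTC), and that the 30-day renewal age and 40-day disable policy are the values described above. It is not part of sssd or adcli.

```python
"""Sketch: where does a machine account sit in the 30/40-day window?

Assumptions (hypothetical, not taken from sssd or AD tooling):
  * the last password change is read from AD's pwdLastSet attribute,
    a Windows FILETIME (100-ns ticks since 1601-01-01 UTC);
  * renewal starts at 30 days and AD disables the account at 40 days.
"""
from __future__ import annotations

from datetime import datetime, timedelta, timezone

RENEWAL_AGE_DAYS = 30   # sssd starts daily renewal attempts at this age
DISABLE_AGE_DAYS = 40   # company policy: AD disables the account at this age

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME (100-ns intervals) to an aware datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def datetime_to_filetime(dt: datetime) -> int:
    """Inverse of filetime_to_datetime (microsecond precision)."""
    return (dt - FILETIME_EPOCH) // timedelta(microseconds=1) * 10


def renewal_status(pwd_last_set: int, now: datetime) -> tuple[str, int]:
    """Return (status, whole days left before the policy disables it)."""
    age_days = (now - filetime_to_datetime(pwd_last_set)).days
    days_left = max(DISABLE_AGE_DAYS - age_days, 0)
    if age_days < RENEWAL_AGE_DAYS:
        return "ok", days_left
    if age_days < DISABLE_AGE_DAYS:
        # sssd should have tried (and failed) roughly once a day since day 30
        return "renewal overdue", days_left
    return "disabled by policy", 0
```

For example, an account whose pwdLastSet is 35 days old is reported as "renewal overdue" with 5 days left, which is the population worth inspecting before the policy disables it.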
If this happens again, it would be good to check the highest KVNO in the
local keytab and the one stored in AD for the affected computer. The LDAP
attribute in AD is called 'msDS-KeyVersionNumber' and can be looked up
with 'adcli show-computer'.
As long as your disabling mechanism in AD does not reset the KVNO, this
would help to tell two cases apart. Either SSSD/adcli just didn't update
the key, in which case the KVNOs should be the same. Or an earlier update
failed and, as a result, the client wasn't able to update the key again,
in which case the AD KVNO should be 1 higher than the one from the local
keytab.
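The comparison described above can be sketched as a small helper. It assumes the local keytab listing comes from MIT `klist -k` (a header followed by "KVNO Principal" rows) and that the AD value is the integer read from msDS-KeyVersionNumber, e.g. via 'adcli show-computer'. The helper names and the short labels are hypothetical, not part of sssd or adcli.

```python
"""Sketch: interpret local vs. AD KVNO for a machine that left the domain.

Assumptions: the local listing is MIT `klist -k` output, and the AD value
is the integer from the msDS-KeyVersionNumber attribute. Both helpers are
hypothetical illustrations.
"""
import re

# A keytab entry row looks like "   3 host/name.example.com@EXAMPLE.COM"
_ENTRY = re.compile(r"^\s*(\d+)\s+(\S+)")


def highest_local_kvno(klist_output: str) -> int:
    """Return the highest KVNO among the entry rows of `klist -k` output."""
    kvnos = [int(m.group(1)) for line in klist_output.splitlines()
             if (m := _ENTRY.match(line))]
    if not kvnos:
        raise ValueError("no keytab entries found")
    return max(kvnos)


def interpret_kvno(local: int, ad: int) -> str:
    """Classify the KVNO pair, assuming AD never reset the KVNO."""
    if ad == local:
        return "not updated"      # SSSD/adcli never changed the key
    if ad == local + 1:
        return "failed update"    # AD has the new key, the keytab does not
    return "unexpected"           # e.g. the KVNO was reset or changed elsewhere
```

Header lines such as "Keytab name: ..." and the "KVNO Principal" caption do not start with a number, so only entry rows are counted.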
Would users notice if SSSD on the computer were offline, i.e. if they did
not get a fresh Kerberos ticket when logging in? I'm asking because
'adcli update' is not called while SSSD is offline.
We have had SRs open with our OS vendors (Red Hat and Oracle, respectively)
for months now, to little avail. (They gave a few suggestions, but none
We thought we were hitting this bug:
But packet captures proved that adcli update uses TCP on RHEL7/8, so this
could only be a problem on RHEL6. (By the way, ‘udp_preference_limit = 0’
doesn’t force the use of TCP for the kpasswd invocation on RHEL6; it still
uses UDP. So the recommended workaround for this bug doesn’t work there.)
So that isn’t our underlying problem.
We’re at a loss now; as you can see, we’re grasping at straws.
How can we troubleshoot sssd’s ‘automatic Kerberos host keytab renewal’
process? Whenever we inspect a particular server, renewal works. We can’t
run all sssd clients at debug level 9; that fills up the /var/log
filesystem within a few days. We want to troubleshoot that one particular
sssd task on all clients, not all parts of sssd.
Apart from a steep learning curve (on our part), obscure situations (like
DMZ auto-discovery of AD controllers) and exotic scenarios (like the one
above), we’re quite happy with our two-year journey of direct AD
integration with sssd. Admittedly, the troubleshooting tools on RHEL6 are
very minimal, but overall the quality of sssd on RHEL7/8 is excellent. AD
integration has innumerable devils in the details; I’m amazed that sssd
performs as well as it does against our multi-domain forest.
PS: The problem with sssd auto-discovery of AD controllers in DMZs has been
fixed in a recent sssd release, which implements a better discovery
algorithm, the same one used by Windows clients and commercial products.
It’s just that this recent sssd version is not available on RHEL7 or 8.
sssd-users mailing list -- sssd-users(a)lists.fedorahosted.org