Sumit,

Thanks.   BTW, this occurs primarily on RHEL7, RHEL8, Oracle Linux 7 and 8.  (We have very few RHEL6 servers left, but some of them are sssd enabled.)

We verified (via tcpdump and Wireshark) that RHEL/OL 7 and 8 use TCP exclusively for this kpasswd operation, since we're aware of the UDP problem.  So this is a new password change problem, not the old UDP-related problem.

BTW, we also verified that RHEL6 uses UDP exclusively for this kpasswd operation, regardless of the setting of udp_preference_limit in /etc/krb5.conf.  That setting apparently applies only to the Kerberos port (TCP/UDP 88), not to the kpasswd port (464).

Yes, we're very anxious to hear what our AD admins will tell us from their AD DC logs.

Spike
 

On Wed, Sep 29, 2021 at 5:13 AM Sumit Bose <sbose@redhat.com> wrote:
On Tue, Sep 28, 2021 at 03:18:06PM -0500, Spike White wrote:
> All,
>
> We took Sumit’s advice and enabled sssd’s debug level 7 in the “domain”
> section of sssd.conf on about 2300 non-prod Linux servers.
>
> FYI – beware if you do this!  We found cases where the
> sssd_amer.company.com_log file was 8 GB after 24 hrs.  So you’ll likely have
> to fine-tune your sssd logrotate file or take even more drastic action.
>
> RECAP:  On a seemingly random 0.24% of our Linux servers, sssd drops off the
> AD domain after 30 days.  We find this occurs during the automatic Kerberos
> host keytab renewal.  Afterwards, the KVNO in AD is one more than the latest
> KVNO in the /etc/krb5.keytab file.
>
> Due to sssd debug level 7, we now have verbose ‘adcli update’ output in
> our sssd_<domain>.company.com_log files for two such culprits.  Both show
> the same error condition.  Here is example output:
>
> (2021-09-28  3:44:23): [be[amer.company.com]]
> [ad_machine_account_password_renewal_done] (0x1000): --- adcli output
> start---
>
>  * Found realm in keytab: AMER.COMPANY.COM
>
>  * Found computer name in keytab: KEWNLR2CU2APP01
>
>  * Found service principal in keytab: host/kewnlr2cu2app01.amer.company.com
>
>  * Found host qualified name in keytab: kewnlr2cu2app01.amer.company.com
>
>  * Found service principal in keytab: host/KEWNLR2CU2APP01
>
>  * Found service principal in keytab: RestrictedKrbHost/KEWNLR2CU2APP01
>
>  * Found service principal in keytab: RestrictedKrbHost/kewnlr2cu2app01.amer.company.com
>
>  * Using fully qualified name: kewnlr2cu2app01.amer.company.com
>
>  * Using domain name: amer.company.com
>
>  * Using computer account name: KEWNLR2CU2APP01
>
>  * Using domain realm: amer.company.com
>
>  * Sending NetLogon ping to domain controller:
> AUSDC16AMER23.amer.company.com
>
>  * Received NetLogon info from: AUSDC16AMER23.amer.company.com
>
>  * Wrote out krb5.conf snippet to
> /tmp/adcli-krb5-HRsQ9K/krb5.d/adcli-krb5-conf-yBNrRI
>
>  * Authenticated as default/reset computer account: KEWNLR2CU2APP01
>
>  * Using GSS-SPNEGO for SASL bind
>
>  * Looked up short domain name: AMERICAS
>
>  * Looked up domain SID: S-1-5-21-1802859667-647903414-1863928812
>
>  * Using fully qualified name: kewnlr2cu2app01.amer.company.com
>
>  * Using domain name: amer.company.com
>
>  * Using computer account name: KEWNLR2CU2APP01
>
>  * Using domain realm: amer.company.com
>
>  * Using fully qualified name: kewnlr2cu2app01.amer.company.com
>
>  * Enrolling computer name: KEWNLR2CU2APP01
>
>  * Generated 120 character computer password
>
>  * Using keytab: FILE:/etc/krb5.keytab
>
>  * Found computer account for KEWNLR2CU2APP01$ at:
> CN=KEWNLR2CU2APP01,OU=Servers,OU=UNIX,DC=amer,DC=company,DC=com
>
>  * Retrieved kvno '17' for computer account in directory:
> CN=KEWNLR2CU2APP01,OU=Servers,OU=UNIX,DC=amer,DC=company,DC=com
>
>  * Sending NetLogon ping to domain controller:
> AUSDC16AMER23.amer.company.com
>
>  * Received NetLogon info from: AUSDC16AMER23.amer.company.com
>
>  ! Cannot change computer password: Authentication error
>
> adcli: updating membership with domain amer.company.com failed: Cannot
> change computer password: Authentication error
>
> ---adcli output end---
>
>
>
> Within 1.5 mins of the above, we receive errors in /var/log/messages as
> below:
>
> Sep 28 03:45:51 kewnlr2cu2app01 sssd[ldap_child[288005]][288005]: Failed to
> initialize credentials using keytab [MEMORY:/etc/krb5.keytab]:
> Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
>
> We verified in the /etc/krb5.keytab file that the latest KVNO is still 17,
> while in AD the KVNO is now 18.  Also, the time of the last password change
> in AD corresponds exactly to the adcli run above:
>
> PS C:\Users\spike_white> get-adcomputer kewnlr2cu2app01 -Property
> 'PasswordLastSet'
>
>
>
>
>
> DistinguishedName :
> CN=KEWNLR2CU2APP01,OU=Servers,OU=UNIX,DC=amer,DC=company,DC=com
>
> DNSHostName       : kewnlr2cu2app01.amer.company.com
>
> ...
>
> Name              : KEWNLR2CU2APP01
>
> ObjectClass       : computer
>
> ...
>
> PasswordLastSet   : 9/28/2021 3:44:23 AM
>
> SamAccountName    : KEWNLR2CU2APP01$
>
> ...
>
> UserPrincipalName : host/kewnlr2cu2app01.amer.company.com@AMER.COMPANY.COM
>
>
>
> PS C:\Users\spike_white> get-adcomputer kewnlr2cu2app01 -property
> msDS-KeyVersionNumber
>
>
>
>
>
> DistinguishedName     :
> CN=KEWNLR2CU2APP01,OU=Servers,OU=UNIX,DC=amer,DC=company,DC=com
>
> DNSHostName           : kewnlr2cu2app01.amer.company.com
>
> ...
>
> msDS-KeyVersionNumber : 18
>
>
>
> Of course, after this, the adcli output in the sssd_<domain>.company.com_log
> file continues to show Kerberos pre-authentication errors, because adcli
> update is now using the old machine account password while AD has the new
> machine account password:
>
>
>
> (2021-09-28  4:13:42): [be[amer.company.com]]
> [ad_machine_account_password_renewal_done] (0x1000): --- adcli output
> start---
>
>  * Found realm in keytab: AMER.COMPANY.COM
>
>  * Found computer name in keytab: KEWNLR2CU2APP01
>
>  * Found service principal in keytab: host/kewnlr2cu2app01.amer.company.com
>
>  * Found host qualified name in keytab: kewnlr2cu2app01.amer.company.com
>
>  * Found service principal in keytab: host/KEWNLR2CU2APP01
>
>  * Found service principal in keytab: RestrictedKrbHost/KEWNLR2CU2APP01
>
>  * Found service principal in keytab: RestrictedKrbHost/kewnlr2cu2app01.amer.company.com
>
>  * Using fully qualified name: kewnlr2cu2app01.amer.company.com
>
>  * Using domain name: amer.company.com
>
>  * Using computer account name: KEWNLR2CU2APP01
>
>  * Using domain realm: amer.company.com
>
>  * Discovering domain controllers: _ldap._tcp.amer.company.com
>
>  * Sending NetLogon ping to domain controller:
> RDUDC16AMER04.amer.company.com
>
>  * Received NetLogon info from: RDUDC16AMER04.amer.company.com
>
>  * Discovering site domain controllers:
> _ldap._tcp.AMERAustin._sites.dc._msdcs.amer.company.com
>
>  * Sending NetLogon ping to domain controller:
> AUSDC16AMER34.amer.company.com
>
>  * Received NetLogon info from: AUSDC16AMER34.amer.company.com
>
>  * Wrote out krb5.conf snippet to
> /tmp/adcli-krb5-i7P6zR/krb5.d/adcli-krb5-conf-vkBoqT
>
>  ! Couldn't authenticate as machine account: KEWNLR2CU2APP01:
> Preauthentication failed
>
> adcli: couldn't connect to amer.company.com domain: Couldn't authenticate
> as machine account: KEWNLR2CU2APP01: Preauthentication failed
>
> ---adcli output end---
>
>
>
> In summary, for some reason adcli update, after attempting to set the
> machine account password, thinks that it has received a Kerberos
> authentication error.  In fact, it has successfully set the machine
> account password in AD.  Because it thinks the update failed, it does not
> update the entries in the local /etc/krb5.keytab file.
>

Hi,

thank you for this extensive analysis.
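
The keytab/AD KVNO mismatch described in the summary above could be spotted with something like the following. This is only a rough sketch: it assumes the usual `klist -k` table layout (a KVNO column followed by the principal), and the function names are made up for illustration. The AD value would come from msDS-KeyVersionNumber as shown in the PowerShell output.

```python
import re
from typing import Optional


def max_keytab_kvno(klist_output: str) -> Optional[int]:
    """Return the highest KVNO listed in `klist -k` text output, or None.

    Assumption: entry lines start with a number (the KVNO) and contain a
    principal of the form name@REALM; header lines do not match.
    """
    kvnos = []
    for line in klist_output.splitlines():
        match = re.match(r"^\s*(\d+)\s+.*@", line)
        if match:
            kvnos.append(int(match.group(1)))
    return max(kvnos) if kvnos else None


def keytab_is_stale(klist_output: str, ad_kvno: int) -> bool:
    """True if the keytab has no entries or its newest KVNO is behind AD."""
    local_kvno = max_keytab_kvno(klist_output)
    return local_kvno is None or local_kvno < ad_kvno
```

With the numbers from this thread (17 in the keytab, 18 in AD), `keytab_is_stale` would report the keytab as stale.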

>
>
> We have our AD admins examining the AD domain controller logs now (since we
> have an exact DC name, exact time and exact client FQDN above).
>

The 'Authentication error' is coming from the password change
operation itself. According to the related RFC
(https://datatracker.ietf.org/doc/html/rfc3244) it means 'request fails
due to an error in authentication processing'. Please let me know if
your AD admins can find anything odd in the logs at this time.
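
For reference, the result codes defined in RFC 3244 can be written down as a small lookup table. This is only an illustrative sketch: the helper name is made up, and mapping adcli's 'Authentication error' string to code 0x0003 is an assumption based on the matching RFC wording.

```python
# Result codes of the change/set password reply as listed in RFC 3244.
KPASSWD_RESULT_CODES = {
    0x0000: ("KRB5_KPASSWD_SUCCESS", "request succeeds"),
    0x0001: ("KRB5_KPASSWD_MALFORMED", "request fails due to being malformed"),
    0x0002: ("KRB5_KPASSWD_HARDERROR",
             'request fails due to "hard" error in processing the request'),
    0x0003: ("KRB5_KPASSWD_AUTHERROR",
             "request fails due to an error in authentication processing"),
    0x0004: ("KRB5_KPASSWD_SOFTERROR",
             'request fails due to a "soft" error in processing the request'),
    0x0005: ("KRB5_KPASSWD_ACCESSDENIED", "requestor not authorized"),
    0x0006: ("KRB5_KPASSWD_BAD_VERSION", "protocol version unsupported"),
    0x0007: ("KRB5_KPASSWD_INITIAL_FLAG_NEEDED", "initial flag required"),
}


def describe_kpasswd_result(code: int) -> str:
    """Hypothetical helper: render a result code as 'NAME (0x....): meaning'."""
    # The RFC says 0xFFFF is returned if the request fails for some other
    # reason; any code not in the table is reported the same way here.
    name, meaning = KPASSWD_RESULT_CODES.get(
        code, ("UNKNOWN", "request fails for some other reason")
    )
    return f"{name} (0x{code:04x}): {meaning}"
```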

>
>
> At this point, we’re unsure whether this is an adcli problem or an AD
> problem.
>
>
>
> Does adcli update attempt to authenticate back to the same AD DC with the
> new password?  Or does it randomly pick an AD DC to authenticate back to
> with the new password?

No, adcli does not try to authenticate back with the new password. But
this might be a way out of this issue. If AD returns an error when
adcli tries to update a machine account password, adcli could try to
authenticate with the new password to see whether it is accepted. But
there still might be a race condition. With the old UDP-related error,
adcli got the error code back before the AD DC had updated the password.
So even when talking to the same DC to avoid replication issues, it might
be possible that the new password does not work immediately but only
after a delay. So checking whether the new password is accepted after an
error might be a workaround, but it might not work in all cases.
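
To make the idea concrete, here is a rough sketch of such a check with a few retries. It is not how adcli works today. The `try_auth` callable is a hypothetical stand-in for "authenticate as the machine account with the new password", and the attempt count and delay are arbitrary assumptions.

```python
import time
from typing import Callable


def new_password_accepted(
    try_auth: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as try_auth() succeeds, False if every attempt fails.

    try_auth is a hypothetical callback; the retries are meant to cover the
    case where the DC accepts the new password only after some delay.
    """
    for attempt in range(attempts):
        if try_auth():
            return True
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

If this returns True after the password change reported an error, the new keys could still be written to the keytab. If it returns False, the outcome stays uncertain, which is the remaining race mentioned above.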

bye,
Sumit

>
>
>
> Spike White
>
> On Wed, Aug 25, 2021 at 10:32 AM Spike White <spikewhitetx@gmail.com> wrote: