So I think you just want:
kinit -R -k $HOSTNAME$
On 8/30/21 11:28 AM, Mote, Todd wrote:
I wrote that from memory. What's needed is the shortname and a $, but thinking more about it now, that's needed at the end of the shortname not the beginning.
-----Original Message-----
From: Patrick Goetz pgoetz@math.utexas.edu
Sent: Monday, August 30, 2021 11:25 AM
To: sssd-users@lists.fedorahosted.org
Subject: [SSSD-users] Re: Trouble-shooting sssd’s ‘Automatic Kerberos Host Keytab Renewal’ with AD back-end….
Todd,
On 8/27/21 9:41 AM, Mote, Todd wrote:
We ultimately decided to deploy a cron job with the install that ran periodically (less than the renewal period) to keep the keytab fresh (kinit -R -k $($hostname -s)). We haven't had computers falling off the domain since we implemented that.
Are you sure about this syntax? Adding a -s flag in $( ) containing a bash variable doesn't do anything.
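A minimal sketch of the principal name the messages above converge on: the short hostname with a literal `$` appended. The helper name `machine_principal` is made up for illustration, the AD computer account is assumed to match the short hostname, and the name is upper-cased only to match how machine principals are listed in the keytab later in this thread. The `kinit` line is left commented out; `kinit -k` obtains a ticket from the existing keytab, and `-R` (renew an existing ticket) is omitted from the sketch.

```shell
#!/bin/sh
# Build the AD machine-account principal: short hostname, upper-cased, trailing '$'.
# Assumption: the computer account name equals the short hostname. Names longer
# than 15 characters may be truncated by AD and are not handled here.
machine_principal() {
    short=${1%%.*}    # drop everything after the first dot, if any
    printf '%s$\n' "$(printf '%s' "$short" | tr '[:lower:]' '[:upper:]')"
}

# Example (not run here): request a ticket from the host keytab.
# kinit -k "$(machine_principal "$(hostname)")"
```

Quoting the result matters: the trailing `$` has to reach kinit literally rather than being treated as the start of a shell variable.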
sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.o... Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
All,
OK, we have a query that we run in AD for machine account passwords of a certain age. In today's run, that was 31 - 32 days. Then we verify that each host is pingable.
We have found one such suspicious candidate today (two actually, but the other Linux server is quite sick). So we have one good research candidate. According to both AD and the /etc/krb5.keytab file, the machine account password was last set on 7/29. Today is 8/31, so that would be 32 days. This 'automatic machine account keytab renewal' background task should trigger again today.
The sssd service was last started 2 weeks ago and, by all appearances, is healthy. sssctl domain-status <domain> shows online, connected to AD servers (both domain and GC servers). All logins and group enumerations are working as expected.
Just now, we dynamically set the debug level to 9 with 'sssctl debug-level 9'. This particular server is Oracle Linux 8.4, running sssd-*-2.4.0-9.0.1.el8_4.1.x86_64. Installed July 13th, 2021. So -- very recent sssd version. (This problem occurs with both RHEL & OL 6/7/8; it's just that today's candidate happens to be OL8.)
We can't keep debug level 9 up for a great many days; it swamps the /var/log filesystem. But we can leave it up for a few days. We purposely did not restart the sssd service, as we know that would trigger a machine account renewal.
Speaking of that -- from Sumit's sssd source code in ad_provider/ad_machine_pw_renewal.c, it appears that sssd is creating a back-end task to call external program /usr/sbin/adcli with certain args. What string can I look for in which sssd log file (now that I have debug level 9 enabled) to tell me when this 'adcli update' task (aka 'automatic machine account keytab renewal') is triggered?
Now that we've surveyed our env, I'm less certain that this background 'adcli update' task is the reason behind 70 - 80 servers / month dropping off the domain. It might be a slight contributor, but I find only a very few pingable servers with a machine account last renewal date between 30 and 40 days ago.
Yes, I can disable this default 30 day automatic update and roll my own 'adcli update' cron. But that's a mass deployment, to fix what might not be the problem. I want to verify this is the actual culprit before I take those drastic steps.
Spike
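For anyone weighing the "disable and roll my own cron" route mentioned above, it might look roughly like this. It is a sketch under assumptions, not a tested deployment: `build_update_cmd` is a hypothetical helper that only prints the command line, and the sssd.conf setting is shown as a comment. `ad_maximum_machine_account_password_age` is the sssd-ad option behind the 30-day default, and a value of 0 disables the renewal attempt.

```shell
#!/bin/sh
# In /etc/sssd/sssd.conf, inside the [domain/<your domain>] section, SSSD's
# own renewal attempt can be switched off with:
#     ad_maximum_machine_account_password_age = 0
#
# Hypothetical cron wrapper: build the adcli command line for a given
# maximum password age in days (defaults to 30 if no argument is given).
build_update_cmd() {
    max_age_days=${1:-30}
    printf 'adcli update --verbose --computer-password-lifetime=%s\n' "$max_age_days"
}

# Dry run: print the command instead of executing it.
build_update_cmd 30
```

Printing rather than executing keeps the sketch harmless on a host that is still joined; a real cron job would run the command and capture its output.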
On Tue, Aug 31, 2021 at 6:47 PM Spike White spikewhitetx@gmail.com wrote:
Speaking of that -- from Sumit's sssd source code in ad_provider/ad_machine_pw_renewal.c, it appears that sssd is creating a back-end task to call external program /usr/sbin/adcli with certain args. What string can I look for in which sssd log file (now that I have debug level 9 enabled) to tell me when this 'adcli update' task (aka 'automatic machine account keytab renewal') is triggered?
It seems SSSD itself only logs in case of errors. I didn't find any explicit logs around `ad_machine_account_password_renewal_send()`. But perhaps there will be something like "[be_ptask_execute] (0x0400): Task [AD machine account password renewal]: executing task" from generic be_ptask_* helpers in the sssd_$domain.log (I'm not sure).
Also at this verbosity level `--verbose` should be supplied to adcli itself and I guess output should be captured in sssd_$domain.log as well. I'm not familiar with `adcli` internals, you can take a glance at https://gitlab.freedesktop.org/realmd/adcli to find its log messages.
On Tue, Aug 31, 2021 at 09:53:01PM +0200, Alexey Tikhonov wrote:
It seems SSSD itself only logs in case of errors. I didn't find any explicit logs around `ad_machine_account_password_renewal_send()`. But perhaps there will be something like "[be_ptask_execute] (0x0400): Task [AD machine account password renewal]: executing task" from generic be_ptask_* helpers in the sssd_$domain.log (I'm not sure).
Also at this verbosity level `--verbose` should be supplied to adcli itself and I guess output should be captured in sssd_$domain.log as well. I'm not familiar with `adcli` internals, you can take a glance at https://gitlab.freedesktop.org/realmd/adcli to find its log messages.
Hi,
if SSSD's debug_level is 7 or higher, the '--verbose' option is set when calling adcli and the output is added to the backend logs. It will start with the log message "--- adcli output start---".
HTH
bye, Sumit
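One way to pull those blocks out of a busy debug log is sketched below. The start marker is the one quoted above; the end marker `---adcli output end---` is taken from the log excerpt posted later in this thread. The function name `adcli_blocks` is made up, and the log path in the comment assumes the default /var/log/sssd location.

```shell
#!/bin/sh
# Print every adcli output block that SSSD recorded in a backend log file.
adcli_blocks() {
    awk '
        /--- adcli output start---/ { show = 1 }   # block begins on this line
        show                        { print }      # print lines inside a block
        /---adcli output end---/    { show = 0 }   # block ends after this line
    ' "$1"
}

# Example (<domain> is a placeholder for the configured domain name):
# adcli_blocks /var/log/sssd/sssd_<domain>.log
```

Because the start line carries the SSSD timestamp prefix, the output also shows when each renewal attempt ran.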
Ok, this is *very* illuminating!
I see this in sssd_amer.company.com.log:
(2021-09-01 3:44:46): [be[amer.company.com]] [ad_machine_account_password_renewal_done] (0x1000): --- adcli output start---
adcli: couldn't connect to amer.company.com domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
---adcli output end---
However, I don't find that host name ZZZKBTDURBOL8 anywhere on the system. (By company convention, servers named ZZZ* are test servers that linux SEs spin up themselves).
This server that's not renewing its creds is named nwpllv8bu100.amer.company.com. It's a std dev server. In the /etc/sssd/sssd.conf file, it has that as its sasl auth ID:
[root@nwpllv8bu100 sssd]# grep sasl /etc/sssd/sssd.conf
ldap_sasl_authid = host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM
[root@nwpllv8bu100 sssd]#
If I do 'kinit -k', the /etc/krb5.keytab file has that name as well:
[root@nwpllv8bu100 sssd]# kinit -k
[root@nwpllv8bu100 sssd]# klist
Ticket cache: KCM:0
Default principal: host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM

Valid starting       Expires              Service principal
09/01/2021 11:04:16  09/01/2021 21:04:16  krbtgt/AMER.COMPANY.COM@AMER.COMPANY.COM
        renew until 09/08/2021 11:04:16
[root@nwpllv8bu100 sssd]#
I searched /etc/sssd/sssd.conf -- no "zzz" or "ZZZ" string is anywhere in there. So where is sssd picking up this name ZZZKBTDURBOL8 and passing it to adcli update?
Spike
To respond to my own email: a co-worker did finally find some references to that bizarre name ZZZKBTDURBOL8.
[root@nwpllv8bu100 post_install]# klist -kte
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   2 07/13/2021 16:42:17 ZZZKBTDURBOL8$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   2 07/13/2021 16:42:17 ZZZKBTDURBOL8$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
   2 07/13/2021 16:42:17 host/zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   2 07/13/2021 16:42:17 host/zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
   2 07/13/2021 16:42:17 host/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   2 07/13/2021 16:42:17 host/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
   2 07/13/2021 16:42:17 RestrictedKrbHost/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   2 07/13/2021 16:42:17 RestrictedKrbHost/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
   2 07/13/2021 16:42:17 RestrictedKrbHost/zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   2 07/13/2021 16:42:17 RestrictedKrbHost/zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
   3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
   3 07/29/2021 13:38:27 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
   3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
   3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 RestrictedKrbHost/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
   3 07/29/2021 13:38:27 RestrictedKrbHost/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
   3 07/29/2021 13:38:27 RestrictedKrbHost/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
I realize this is reflecting the literal entries in the /etc/krb5.keytab file. So it appears that when this VM was born (on July 13th), it was named zzzkbtdurbol8.amer.company.com. (I see other supporting evidence for this.) On or before July 29th, it was renamed to its final FQDN, nwpllv8bu100.amer.company.com. /etc/hostname, /etc/hosts, /etc/sysconfig/network etc. were all updated and it was rejoined to AD.
kinit -k works fine. It picks up the current hostname and apparently uses host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM as its service principal. Since there's a valid entry in /etc/krb5.keytab file for this, it uses this and all is good. (I'm guessing it uses the 14th or 15th /etc/krb5.keytab file entry above.)
sssd works, because it has this line:
ldap_sasl_authid = host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM
But when sssd invokes adcli update to refresh the machine account password, adcli update fails.
Also, I see that adcli testjoin fails.
[root@nwpllv8bu100 tmp]# adcli testjoin -D AMER.COMPANY.COM
adcli: couldn't connect to AMER.COMPANY.COM domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
[root@nwpllv8bu100 tmp]#
From a strace of this adcli testjoin, it appears that adcli is opening the /etc/krb5.keytab file to determine the default service principal to use and is pulling the old server name. (instead of using the correct service principal, as kinit -k somehow does.)
Maybe when sssd constructs this adcli update invocation, it's not passing the ldap_sasl_authid, so the adcli update is doing the above logic to pull the old server name?
Sounds like an adcli problem. Adcli should do as 'kinit -k' does when it's passed no explicit service principal. It should dive into the /etc/krb5.keytab file and use the most recent set of entries (KVNO = 3 in the above example). Or maybe derive the default service principal from the current FQDN and Kerberos realm?
Spike
PS: As a general policy, we are not supposed to clone a VM and rename it to another FQDN/IP address. I'll be trying to track down who did this and for what reason.
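To find other hosts in the same state, the keytab listing can be checked for principals that do not belong to the current hostname. The following is a rough sketch only: `stale_principals` is a made-up helper, it reads `klist -k` or `klist -kte` output on stdin, and it assumes every principal for this host is built from the short hostname (as in the listing above).

```shell
#!/bin/sh
# List keytab principals whose host part differs from the given hostname.
# Reads `klist -k` / `klist -kte` output on stdin; $1 is the current hostname.
stale_principals() {
    short=${1%%.*}
    want=$(printf '%s' "$short" | tr '[:lower:]' '[:upper:]')
    awk -v want="$want" '
        $1 ~ /^[0-9]+$/ {                      # entry lines start with the KVNO
            princ = ""
            for (i = 2; i <= NF; i++)          # the principal is the field with "@"
                if (index($i, "@") > 0) { princ = $i; break }
            if (princ == "") next
            name = princ
            sub(/@.*/, "", name)               # drop the realm
            sub(/^.*\//, "", name)             # drop a service prefix such as host/
            sub(/\..*/, "", name)              # drop the DNS domain
            sub(/[$]$/, "", name)              # drop the machine-account "$"
            if (toupper(name) != want && !seen[princ]++) print princ
        }
    '
}

# Example (not run here):
# klist -kte /etc/krb5.keytab | stale_principals "$(hostname)"
```

On the server discussed here, this would be expected to print the ZZZKBTDURBOL8 entries and nothing for NWPLLV8BU100.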
On Wed, Sep 01, 2021 at 11:39:30AM -0500, Spike White wrote:
So to respond to my own email, but a co-worker did finally find some references to that bizarre name ZZZKBTDURBOL8.
[root@nwpllv8bu100 post_install]# klist -kte Keytab name: FILE:/etc/krb5.keytab KVNO Timestamp Principal
2 07/13/2021 16:42:17 ZZZKBTDURBOL8$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 ZZZKBTDURBOL8$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 host/ zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 host/ zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 host/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 host/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 RestrictedKrbHost/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 RestrictedKrbHost/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 RestrictedKrbHost/ zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 RestrictedKrbHost/ zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 
RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 RestrictedKrbHost/ nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 RestrictedKrbHost/ nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 RestrictedKrbHost/ nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
I realize this is reflecting the literal entries in the /etc/krb5.keytab file. So it appears that when this VM was born (on July 13th), it was named zzzkbtdurbol8.amer.company.com (I see other supporting evidence for this). On or before July 29th, it was renamed to its final FQDN, nwpllv8bu100.amer.company.com. /etc/hostname, /etc/hosts, /etc/sysconfig/network, etc. were all updated and it was rejoined to AD.
kinit -k works fine. It picks up the current hostname and apparently uses host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM as its service principal. Since there's a valid entry in /etc/krb5.keytab file for this, it uses this and all is good. (I'm guessing it uses the 14th or 15th /etc/krb5.keytab file entry above.)
sssd works, because it has this line:
ldap_sasl_authid = host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM
But when sssd invokes adcli update to refresh the machine account password, adcli update fails.
Also, I see that adcli testjoin fails.
[root@nwpllv8bu100 tmp]# adcli testjoin -D AMER.COMPANY.COM
adcli: couldn't connect to AMER.COMPANY.COM domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
[root@nwpllv8bu100 tmp]#
From a strace of this adcli testjoin, it appears that adcli is opening the /etc/krb5.keytab file to determine the default service principal to use and is pulling the old server name (instead of using the correct service principal, as kinit -k somehow does).
Hi,
it looks like your environment is a bit special. I guess you have added 'host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM' to the 'userPrincipalName' LDAP attribute in the AD computer object for this host.
# klist -k
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 MASTER$@CHILD.AD.VM
   2 MASTER$@CHILD.AD.VM
   2 host/MASTER@CHILD.AD.VM
   2 host/MASTER@CHILD.AD.VM
   2 host/master.client.vm@CHILD.AD.VM
   2 host/master.client.vm@CHILD.AD.VM
   2 RestrictedKrbHost/MASTER@CHILD.AD.VM
   2 RestrictedKrbHost/MASTER@CHILD.AD.VM
   2 RestrictedKrbHost/master.client.vm@CHILD.AD.VM
   2 RestrictedKrbHost/master.client.vm@CHILD.AD.VM
# kdestroy -A
#
#
# kinit -k
kinit: Client 'host/master.client.vm@CHILD.AD.VM' not found in Kerberos database while getting initial credentials
# kinit -k 'MASTER$@CHILD.AD.VM'
The reason is that 'kinit -k' constructs the principal by calling gethostname() or similar, adding the 'host/' prefix and the realm. But by default this principal in AD is only a service principal and cannot be used to request a TGT as kinit does. AD only allows user principals to request a TGT, and by default this is 'SHORT$@AD.REALM'. If the userPrincipalName attribute is set, the principal given there is allowed as well.
That's why adcli is checking the keytab for a principal with a '$' character and by default it uses the first it finds because it is expected there is only one. Adding some heuristics in case there are more '$' principals in the keytab like highest KVNO might help in some cases but would fail in other cases so I think just using the first is good enough.
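The two selection rules described above can be contrasted in a short Python sketch. This is not adcli's or kinit's actual code: the function names are made up for illustration, the principal construction is only a rough approximation of what 'kinit -k' does, and the entry list is an abbreviated copy of the keytab shown earlier in the thread.

```python
import socket

# Abbreviated (kvno, principal) pairs in keytab order, copied from the
# klist output earlier in the thread. Duplicate enctype rows are omitted.
KEYTAB = [
    (2, "ZZZKBTDURBOL8$@AMER.COMPANY.COM"),
    (2, "host/zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM"),
    (3, "NWPLLV8BU100$@AMER.COMPANY.COM"),
    (3, "host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM"),
]


def kinit_default_principal(realm, fqdn=None):
    """Approximate the 'kinit -k' default: host/<hostname>@REALM."""
    fqdn = fqdn or socket.getfqdn()
    return f"host/{fqdn.lower()}@{realm}"


def _machine_entries(entries):
    """Keep only entries whose name part (before '@') contains a '$'."""
    return [(k, p) for k, p in entries if "$" in p.split("@", 1)[0]]


def first_machine_principal(entries):
    """Rule described for adcli: take the first '$' principal found."""
    found = _machine_entries(entries)
    return found[0][1] if found else None


def newest_machine_principal(entries):
    """Hypothetical alternative heuristic: the '$' principal with the highest KVNO."""
    found = _machine_entries(entries)
    return max(found, key=lambda kp: kp[0])[1] if found else None
```

On the renamed host, the first rule returns the stale ZZZKBTDURBOL8$ principal, which matches the name in the adcli error. The highest-KVNO variant would return NWPLLV8BU100$ here, but as noted above it is only a heuristic and would pick wrongly in other keytab layouts.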
Maybe when sssd constructs this adcli update invocation, it's not passing the ldap_sasl_authid, so the adcli update is doing the above logic to pull the old server name?
Adding a new option to adcli and using the value from ldap_sasl_authid might be a solution, I will think about it.
HTH
bye, Sumit
Sounds like an adcli problem. Adcli should do as 'kinit -k' does when it's passed no explicit service principal. Should dive into /etc/krb5.keytab file and use the most recent set of entries (KVNO = 3 in above example). Maybe derive the default service principal off the current FQDN and Kerberos realm?
Spike
PS: As a general policy, we are not supposed to clone a VM and rename it to another FQDN/IP address. I'll be trying to track down who did this and for what reason.
On Wed, Sep 1, 2021 at 10:08 AM Spike White spikewhitetx@gmail.com wrote:
Ok, this is *very* illuminating!
I see this in sssd_amer.company.com.log:
(2021-09-01 3:44:46): [be[amer.company.com]] [ad_machine_account_password_renewal_done] (0x1000): --- adcli output start---
adcli: couldn't connect to amer.company.com domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
---adcli output end---
However, I don't find that host name ZZZKBTDURBOL8 anywhere on the system. (By company convention, servers named ZZZ* are test servers that linux SEs spin up themselves).
This server that's not renewing its creds is named nwpllv8bu100.amer.company.com. It's a std dev server. In the /etc/sssd/sssd.conf file, it has that as its SASL auth ID:
[root@nwpllv8bu100 sssd]# grep sasl /etc/sssd/sssd.conf
ldap_sasl_authid = host/nwpllv8bu100.amer.dell.com@AMER.COMPANY.COM
[root@nwpllv8bu100 sssd]#
If I do 'kinit -k', the /etc/krb5.keytab file has that name as well:
[root@nwpllv8bu100 sssd]# kinit -k
[root@nwpllv8bu100 sssd]# klist
Ticket cache: KCM:0
Default principal: host/nwpllv8bu100.amer.dell.com@AMER.COMPANY.COM

Valid starting       Expires              Service principal
09/01/2021 11:04:16  09/01/2021 21:04:16  krbtgt/AMER.DELL.COM@AMER.COMPANY.COM
        renew until 09/08/2021 11:04:16
[root@nwpllv8bu100 sssd]#
I searched /etc/sssd/sssd.conf -- no "zzz" or "ZZZ" string is anywhere in there. So where is sssd picking up this name ZZZKBTDURBOL8 and passing it to adcli update?
Spike
On Wed, Sep 1, 2021 at 2:46 AM Sumit Bose sbose@redhat.com wrote:
On Tue, Aug 31, 2021 at 09:53:01PM +0200, Alexey Tikhonov wrote:
On Tue, Aug 31, 2021 at 6:47 PM Spike White spikewhitetx@gmail.com wrote:
All,
OK we have a query we run in AD for machine account passwords for a certain age. In today's run, 31 - 32 days. Then we verify it's pingable.
We have found one such suspicious candidate today (two actually, but the other Linux server is quite sick). So one good research candidate.
According to both AD and /etc/krb5.keytab file, the machine account password was last set on 7/29. Today is 8/31, so that would be 32 days. This 'automatic machine account keytab renewal' background task should trigger again today.
sssd service was last started 2 weeks ago and, by all appearances, is healthy. sssctl domain-status <domain> shows online, connected to AD servers (both domain and GC servers). All logins and group enumerations are working as expected.
Just now, we dynamically set the debug level to 9 with 'sssctl debug-level 9'. This particular server is Oracle Linux 8.4, running sssd-*-2.4.0-9.0.1.el8_4.1.x86_64. Installed July 13th. So -- very recent sssd version. (This problem occurs with both RHEL & OL 6/7/8, it's just that today's candidate happens to be OL8.)
We can't keep debug level 9 up for a great many days; it swamps the /var/log filesystem. But we can leave it up for a few days. We purposely did not restart the sssd server as we know that would trigger a machine account renewal.
Speaking of that -- from Sumit's sssd source code in ad_provider/ad_machine_pw_renewal.c, it appears that sssd is creating a back-end task to call external program /usr/sbin/adcli with certain args.
What string can I look for in which sssd log file (now that I have debug level 9 enabled) to tell me when this 'adcli update' task (aka 'automatic machine account keytab renewal') is triggered?
It seems SSSD itself only logs in case of errors. I didn't find any explicit logs around `ad_machine_account_password_renewal_send()`. But perhaps there will be something like "[be_ptask_execute] (0x0400): Task [AD machine account password renewal]: executing task" from generic be_ptask_* helpers in the sssd_$domain.log (I'm not sure).
Also at this verbosity level `--verbose` should be supplied to adcli itself and I guess output should be captured in sssd_$domain.log as well. I'm not familiar with `adcli` internals, you can take a glance at https://gitlab.freedesktop.org/realmd/adcli to find its log messages.
Hi,
if SSSD's debug_level is 7 or higher the '--verbose' option is set when calling adcli and the output is added to the backend logs. It will start with log message "--- adcli output start---".
HTH
bye, Sumit
I'm less certain now that we've surveyed our env that this background 'adcli update' task is the reason behind 70 - 80 servers / month dropping off the domain. It might be a slight contributor, but I find only a very few pingable servers with machine account last renewal date between 30 and 40 days.
Yes, I can disable this default 30 day automatic update and roll my own 'adcli update' cron. But that's a mass deployment, to fix what might not be the problem. I want to verify this is the actual culprit before I take those drastic steps.
Spike
On 9/2/21 12:49 AM, Sumit Bose wrote:
The reason is that 'kinit -k' constructs the principal by calling gethostname() or similar, adding the 'host/' prefix and the realm. But by default this principal in AD is only a service principal and cannot be used to request a TGT as kinit does. AD only allows user principals to request a TGT, and by default this is 'SHORT$@AD.REALM'. If the userPrincipalName attribute is set, the principal given there is allowed as well.
This raises a couple of questions. Because of AD's flat address space, we use a host naming convention in AD as a sort of low rent namespacing; so, for example, for this host the college is cns and the research group cryo, so the AD hostname is cns-cryo-ross1$
However,
# hostname
rossmann.biosci.utexas.edu
which is easier for the users to remember for ssh purposes. We set
ad_hostname = cns-cryo-ross1.austin.utexas.edu
in /etc/sssd/sssd.conf.
But I just checked, and kinit does not use ad_hostname, so I have to run it as
kinit -k -R cns-cryo-ross1$
The question is, then what does use the ad_hostname key/value pair?
Next, the kinit example provided by Spike was `kinit -k` -- we always run `kinit -k -R`
-R renews the TGT, which is what I thought is the thing set to expire in AD that needs to be periodically renewed. What's the purpose of running `kinit -k` without the -R?
On Thu, Sep 02, 2021 at 10:02:54AM -0500, Patrick Goetz wrote:
Next, the kinit example provided by Spike was `kinit -k` -- we always run `kinit -k -R`
-R renews the TGT, which is what I thought is the thing set to expire in AD that needs to be periodically renewed. What's the purpose of running `kinit -k` without the -R?
Hi,
there are two different things.
First, there are the host keys in the keytab, which are equivalent to a user password. Those keys are renewed by 'adcli update' if they are older than 30 days, much as you would renew your user password if the AD DC tells you to do it.
Second, with those keys you can request a Kerberos TGT
kinit -k 'shortname$'
as you can do with your user password:
kinit user@REALM
Password for user@REALM
This TGT has a lifetime and it might have a renewal time as well:
# klist
Ticket cache: KCM:0:69840
Default principal: Administrator@CHILD.AD.VM

Valid starting       Expires              Service principal
09/06/2021 09:39:28  09/06/2021 19:39:28  krbtgt/CHILD.AD.VM@CHILD.AD.VM
        renew until 09/07/2021 09:39:24
In the example above the TGT will expire at '09/06/2021 19:39:28' but can be renewed until '09/07/2021 09:39:24'. This means that if you call
kinit -R
before '09/06/2021 19:39:28' you will get a fresh TGT without entering your password. The new TGT will have a new lifetime but 'renew until' will stay the same. After '09/07/2021 09:39:24' 'kinit -R' will not work anymore and you have to enter your password again. It does not matter here if the TGT was originally requested with a keytab with 'kinit -k' or with plain 'kinit' and a password.
However, since the keytab is present in the file system calling
kinit -k 'shortname$'
will always get a fresh TGT without manual intervention. So in case you have a valid keytab this is even more flexible than 'kinit -R'
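The timing rules above can be written down as a small Python sketch. It only models the decision described in this message, using the timestamps from the klist example; the function names are illustrative, and the timestamp format is an assumption, since klist output depends on locale.

```python
from datetime import datetime

# Timestamp layout as it appears in the klist example (assumed, locale-dependent).
FMT = "%m/%d/%Y %H:%M:%S"


def parse(ts):
    """Parse a klist-style timestamp string into a datetime."""
    return datetime.strptime(ts, FMT)


def next_kinit(now, expires, renew_until, have_keytab=False):
    """Return the command that would yield a usable TGT at time `now`."""
    if have_keytab:
        # With a valid keytab a fresh TGT can always be requested.
        return "kinit -k 'shortname$'"
    if now < expires and now < renew_until:
        # The ticket is still valid and inside its renewable window.
        return "kinit -R"
    # Expired, or past 'renew until': the password is needed again.
    return "kinit"


expires = parse("09/06/2021 19:39:28")
renew_until = parse("09/07/2021 09:39:24")
```

For example, at 09/06/2021 12:00:00 the sketch returns `kinit -R`, while at 09/07/2021 10:00:00 it returns plain `kinit` unless a keytab is available.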
HTH
bye, Sumit
On 9/6/21 4:49 AM, Sumit Bose wrote:
Second, with those keys you can request a Kerberos TGT
kinit -k 'shortname$'
I thought, based on the kinit man page, that the -k flag is just an ordinary ticket request and that you need to add the -R flag to request a TGT. What you're saying is it also renews the TGT?
OTOH I thought `kinit -k` was updating the computer account password on the domain controller, but that doesn't seem to be the case, in which case I'm not even sure what the purpose of an ordinary (non-TGT) ticket is if you're not requesting automatic login to some specifically requested service.
Also, just to make sure I'm clear on this, the "renew until" doesn't change because this is based on the computer account password expiration, and further that sssd runs `adcli update` for you periodically? How often, by the way?
Patrick,
kinit -k acquires a new fresh TGT ticket.
kinit -R renews an existing TGT ticket (if it's not already expired). Even if renewed, "renew until" doesn't change (usually 7 days).
None of these are updating any computer account password on AD. That's an AD-specific requirement, that machines update their machine account passwords every 40 days or be locked out.
sssd wakes up every 24 hrs by default (controlled by ad_machine_account_password_renewal_opts). It checks to see if the machine account password is older than ad_maximum_machine_account_password_age (default 30 days). If it's < 30 days, sssd does nothing. If 31 days or greater, it calls adcli update with various flags to update the machine account password.
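As a rough illustration of that periodic check, here is a Python sketch of the age test only. It is not sssd's code: the function name is made up, the 30-day default is the value quoted above, and the exact comparison sssd uses at the boundary is an assumption.

```python
from datetime import datetime, timedelta


def needs_password_renewal(last_set, now, max_age_days=30):
    """True if the machine account password is older than max_age_days.

    max_age_days stands in for ad_maximum_machine_account_password_age;
    a strict 'older than' comparison is assumed here.
    """
    return now - last_set > timedelta(days=max_age_days)


# Password timestamp from the KVNO 3 keytab entries earlier in the thread.
last_set = datetime(2021, 7, 29, 13, 38, 27)
```

With this timestamp, a check run on 2021-08-15 does nothing, while a check run on 2021-08-31 would trigger 'adcli update'.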
Spike
Hi Spike -
Thanks for this clarification. Now I'm wondering what happens when you run `kinit -k -R HOST$`, which was our previous cron job solution to linux machines losing their AD connections. We have since switched to
`msktutil --update --computer-name HOST`
Modulo my ongoing confusion about when you need to include the $ in the host name and when you don't, this seems to have mostly resolved that issue. Since msktutil --update is updating the password, one would think we shouldn't need this, given the every-24-hours adcli check, but prior to this all the Ubuntu hosts would routinely drop out of AD every month, so it's still not clear what is going on. Based on a previous examination of the log files, the password was expiring, so maybe adcli can't run for some reason? Maybe the problem is we're using an older version of sssd? Even on the Ubuntu 20.04 machines the sssd version is 2.2.3.
All that said, the kinit man page is awfully confusing and could use a rewrite. The -R flag explicitly mentions renewing a TGT, while the -k flag just talks about requesting a ticket. Tickets and TGTs are not the same thing in Kerberos.
--------------------------------------
-R
requests renewal of the ticket-granting ticket. Note that an expired ticket cannot be renewed, even if the ticket is still within its renewable life.

Note that renewable tickets that have expired as reported by klist(1) may sometimes be renewed using this option, because the KDC applies a grace period to account for client-KDC clock skew. See krb5.conf(5) clockskew setting.

-k [-i | -t keytab_file]
requests a ticket, obtained from a key in the local host's keytab. The location of the keytab may be specified with the -t keytab_file option, or with the -i option to specify the use of the default client keytab; otherwise the default keytab will be used. By default, a host ticket for the local host is requested, but any principal may be specified. On a KDC, the special keytab location KDB: can be used to indicate that kinit should open the KDC database and look up the key directly. This permits an administrator to obtain tickets as any principal that supports authentication based on the key.
--------------------------------------
Like Spike said, there are two separate but related things here: the Kerberos portion and the AD account password portion.
The $ at the end of the host name is for AD. <short hostname>$ is the actual name of the account in AD. The Kerberos utilities are just asking the KDC to renew tickets for accounts. Computer accounts in AD happen to have a $ appended to them under the covers; it is hidden from most human-facing views. Msktutil may be appending the $ under the covers. You'd have to examine the source to know, but it is likely, since the actual account name in AD has the $ and a lookup without it would likely return the 'not found in the database' error message.
A valid Kerberos identity and ticket tells AD that this entity is who it says it is. This allows all of the other domain-related activities to happen: changing the account password, updating dynamic DNS, etc. Those activities rely on Kerberos to validate that the context is allowed to do those things.
The AD account password rotation is separate from that and is, by default, required to rotate every 30 days. (You can turn it off in AD if you want to.)
In the 1.x days of SSSD and, I think, the 0.7 days of adcli this wasn't handled very well: Kerberos tickets would expire, then AD couldn't validate that the computer was who it said it was, so AD denied the right to change the account password, and voilà, the computer falls off the domain.
Back then, running kinit -k -R <shorthostname>$ as root kept the Kerberos ticket for the computer account valid and didn't let it expire, thereby allowing whatever other process was changing the computer account password to complete successfully.
It sounds, though, like some of these things may be incorporated into SSSD 2.x now. SSSD wakes up to check the computer account password, and with the right option it might wake up to check the Kerberos ticket validity too, negating the need for a separate cron job to do either. I haven't kept up with it enough to know off the top of my head, though.
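For reference, the cron approach described above could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the helper name `machine_principal` is made up, and it assumes the AD computer account name matches the short hostname, which is not true on every host (see the cns-cryo-ross1 example elsewhere in this thread).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): turn a host name into the
# AD computer account name by dropping any domain part and appending '$'.
machine_principal() {
    short=${1%%.*}
    printf '%s\n' "${short}\$"
}

# Example cron usage (commented out; needs root and a valid /etc/krb5.keytab):
#   kinit -k "$(machine_principal "$(hostname)")"
```

Where the AD name differs from the hostname, the AD name would have to be passed to the helper explicitly instead of the output of `hostname`.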
Todd
-----Original Message----- From: Patrick Goetz pgoetz@math.utexas.edu Sent: Wednesday, September 8, 2021 11:05 AM To: sssd-users@lists.fedorahosted.org Subject: [SSSD-users]Re: Trouble-shooting sssd’s ‘Automatic Kerberos Host Keytab Renewal’ with AD back-end….
Hi Spike -
Thanks for this clarification. Now I'm wondering what happens when you run `kinit -k -R HOST$`, which was our previous cron job solution to Linux machines losing their AD connections. Switching to
`msktutil --update --computer-name HOST`
seems to have mostly resolved that issue, modulo my ongoing confusion about when you need to include the $ in the host name and when you don't. Since msktutil --update is updating the password, one would think we shouldn't need it, given the 24-hour adcli check, but prior to this all the Ubuntu hosts would routinely drop out of AD every month, so it's still not clear what is going on. Based on a previous examination of the log files, the password was expiring, so maybe adcli can't run for some reason? Maybe the problem is we're using an older version of sssd? Even on the Ubuntu 20.04 machines the sssd version is 2.2.3.
All that said, the kinit man page is awfully confusing and could use a rewrite. The -R flag explicitly mentions renewing a TGT, while the -k flag just talks about requesting a ticket. Tickets and TGTs are not the same thing in Kerberos.
--------------------------------------
-R     requests renewal of the ticket-granting ticket. Note that an expired ticket cannot be renewed, even if the ticket is still within its renewable life.

       Note that renewable tickets that have expired as reported by klist(1) may sometimes be renewed using this option, because the KDC applies a grace period to account for client-KDC clock skew. See krb5.conf(5) clockskew setting.

-k [-i | -t keytab_file]
       requests a ticket, obtained from a key in the local host's keytab. The location of the keytab may be specified with the -t keytab_file option, or with the -i option to specify the use of the default client keytab; otherwise the default keytab will be used. By default, a host ticket for the local host is requested, but any principal may be specified. On a KDC, the special keytab location KDB: can be used to indicate that kinit should open the KDC database and look up the key directly. This permits an administrator to obtain tickets as any principal that supports authentication based on the key.
--------------------------------------
On 9/7/21 1:40 PM, Spike White wrote:
Patrick,
kinit -k acquires a fresh TGT.
kinit -R renews an existing TGT (if it hasn't already expired). Even when the TGT is renewed, "renew until" doesn't change (the renewable lifetime is usually 7 days).
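The difference between the two commands can be sketched as a small decision helper. This is a hypothetical illustration (the name `tgt_action` is not from any tool), it takes plain epoch seconds rather than parsing klist output, and it ignores the clock-skew grace period mentioned in the kinit man page.

```shell
#!/bin/sh
# Hypothetical helper: decide what to do with a TGT given three epoch times.
#   $1 = now, $2 = ticket "Expires", $3 = ticket "renew until"
# Prints "renew" when `kinit -R` should still work, otherwise "reacquire"
# (that is, fall back to `kinit -k 'SHORT$'` or a password).
tgt_action() {
    now=$1
    expires=$2
    renew_until=$3
    if [ "$now" -lt "$expires" ] && [ "$now" -lt "$renew_until" ]; then
        echo renew
    else
        echo reacquire
    fi
}
```

With a valid keytab, always taking the "reacquire" path via `kinit -k` is the simpler option, since it does not depend on the state of the existing ticket.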
Neither of these updates any computer account password in AD. That's an AD-specific requirement: machines must update their machine account passwords every 40 days or be locked out.
sssd wakes up every 24 hrs by default (controlled by ad_machine_account_password_renewal_opts). It checks whether the machine account password is older than ad_maximum_machine_account_password_age (default 30 days). If it is younger than 30 days, sssd does nothing. If it is 31 days or older, it calls adcli update with various flags to update the machine account password.
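The age check described above amounts to a simple comparison. The sketch below is a hypothetical illustration, not SSSD or adcli code; the function name is invented, and because the description leaves the exact boundary open, it assumes "strictly older than the maximum" triggers a renewal.

```shell
#!/bin/sh
# Hypothetical sketch of the decision described above; this is not SSSD code.
#   $1 = machine account password age in days
#   $2 = maximum age in days (defaults to 30, the default of
#        ad_maximum_machine_account_password_age)
renewal_needed() {
    age=$1
    max=${2:-30}
    if [ "$age" -gt "$max" ]; then
        echo yes
    else
        echo no
    fi
}
```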
Spike
Sumit and others,
Our level 1 server support team has identified 107 servers that dropped out of the domain in Aug. By far, that's their biggest burden with sssd -- the automatic machine account renewal.
Over the long weekend, our team ran a report that identified any pingable candidates that (according to AD) had a passwordLastSet age between 31 and 40 days. These would be our interesting candidates; candidates > 40 days would not be of interest to us because AD would have locked the account.
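A filter like the one in that report could look roughly like this. It is a sketch under assumptions: the input format (one "hostname age_in_days" pair per line) and the function name are invented here, and how the ages are pulled out of AD is left out entirely.

```shell
#!/bin/sh
# Hypothetical filter: reads "hostname age_in_days" pairs on stdin and prints
# the hosts whose machine password age falls in the 31-40 day window.
renewal_candidates() {
    awk '$2 >= 31 && $2 <= 40 { print $1 }'
}
```

Hosts below 31 days are still healthy, and hosts above 40 days are skipped for the reason given above.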
We identified 13 candidates today. From our various research, so far we have determined 8 categories of such sssd "automatic machine account renewal" failure.
1. Some SE cloned a VM, renamed the hostname, changed the IP address, and rejoined AD. Old <HOSTNAME>$ entries sit early in the /etc/krb5.keytab file, and adcli update grabs the first entry in /etc/krb5.keytab with a $ at the end of it.
2. CPU spiked to 100% for 30 days.
3. Polkit service not running.
4. msDS-KeyVersionNumber in AD set to one more than the KVNO in the local /etc/krb5.keytab file. passwordLastSet set to 30 days past the last timestamp in the local /etc/krb5.keytab file. IOW, sssd called adcli update after 30 days; adcli update updated AD, but not the local /etc/krb5.keytab file.
5. DNS firewall problems. Specifically, DNS TCP port 53 blocked, so adcli update could not find Kerberos servers (_kerberos._tcp.AMER.COMPANY.COM) or LDAP servers (_ldap._tcp.AMER.COMPANY.COM).
6. SELinux enabled; adcli not allowed to update /etc/krb5.keytab file (from Sumit).
7. Time sync too far out for adcli update to successfully do an update on machine account.
8. /var filesystem: Input/Output errors.
By far, today the most common category is #4. It amounted to 9 of the 13 candidates today. Category #7 was another 2 candidates today.
So by far, it's category #4 we want to drill down into -- if we can eliminate that, we've strongly decreased the sssd burden. Also, we think we can put pro-active monitoring in place for category #3 and #7.
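For the time-sync category, a proactive check could be as simple as comparing the local clock with a reference. The helper below is a hypothetical sketch: the name is invented, obtaining the reference time (from a DC or an NTP source) is not shown, and the 300-second default is the usual Kerberos clock-skew tolerance rather than a value taken from this environment.

```shell
#!/bin/sh
# Hypothetical check: is the local clock within the allowed skew of a
# reference time? Both times are epoch seconds.
#   $1 = local time, $2 = reference (e.g. DC) time,
#   $3 = allowed skew in seconds (defaults to 300)
# Returns success (0) when the skew is acceptable.
clock_skew_ok() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "${3:-300}" ]
}
```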
Our old commercial AD integration product didn't depend on polkit/dbus. So categories #3 and #4 are new for sssd. If we can understand #4 and proactively monitor for #3, we can reduce the sssd burden to that of the former product.
Category #4 appears to occur randomly -- no rhyme or reason. Also we have not found any repeat offenders -- so it's very hard to track down.
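One way to spot category #4 hosts before they drop off is to compare the highest KVNO in the local keytab with msDS-KeyVersionNumber in AD. The sketch below only covers the local half. The function name is hypothetical, it assumes the usual two-column `klist -k` output (KVNO, then principal), and the AD lookup is left as a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read `klist -k` style output on stdin and print the
# highest KVNO found for the given principal (0 if there is none).
max_keytab_kvno() {
    awk -v princ="$1" '
        $1 ~ /^[0-9]+$/ && $2 == princ && $1 + 0 > max { max = $1 + 0 }
        END { print max + 0 }
    '
}

# Example usage (commented out; needs root, and AD_KVNO is a placeholder for
# the msDS-KeyVersionNumber value of the computer object, looked up separately):
#   local_kvno=$(klist -k /etc/krb5.keytab | max_keytab_kvno 'HOST1$@AMER.COMPANY.COM')
#   [ "$local_kvno" -eq "$AD_KVNO" ] || echo "keytab behind AD: $local_kvno vs $AD_KVNO"
```

Because the helper matches on the exact principal, stale `<HOSTNAME>$` entries from a cloned VM (category #1) are ignored rather than mistaken for the current account.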
We plan to turn on sssd debug_level 7 (in that one sssd [domain/xxx] stanza only). Debug level 7 is the minimum level that gets verbose output from adcli update. We know that turning on debug level 9 in all sssd stanzas (nss, pam, ifp, [domain/xxx]) fills the /var/log filesystem to 100% in a few days.
Spike
On Tue, Sep 07, 2021 at 01:59:34PM -0500, Spike White wrote:
> Sumit and others,
>
> Our level 1 server support team has identified 107 servers that dropped out of the domain in Aug. By far, that's their biggest burden with sssd -- the automatic machine account renewal.
>
> Over the long weekend, our team ran a report that identified any pingable candidates that (according to AD) had a passwordLastSet age between 31 and 40 days. These would be our interesting candidates; candidates > 40 days would not be of interest to us because AD would have locked the account.
>
> We identified 13 candidates today. From our various research, so far we have determined 8 categories of such sssd "automatic machine account renewal" failure.
>
> 1. Some SE cloned a VM, renamed the hostname, changed the IP address, and rejoined AD. Old <HOSTNAME>$ entries sit early in the /etc/krb5.keytab file, and adcli update grabs the first entry in /etc/krb5.keytab with a $ at the end of it.
>
> 2. CPU spiked to 100% for 30 days.
>
> 3. Polkit service not running.

Hi,

adcli does not use polkit or DBus. realmd uses polkit to make it possible for a non-root user to join a domain, but adcli does not. So I would expect a different reason on those systems.

> 4. msDS-KeyVersionNumber in AD set to one more than the KVNO in the local /etc/krb5.keytab file. passwordLastSet set to 30 days past the last timestamp in the local /etc/krb5.keytab file. IOW, sssd called adcli update after 30 days; adcli update updated AD, but not the local /etc/krb5.keytab file.
>
> 5. DNS firewall problems. Specifically, DNS TCP port 53 blocked, so adcli update could not find Kerberos servers (_kerberos._tcp.AMER.COMPANY.COM) or LDAP servers (_ldap._tcp.AMER.COMPANY.COM).

Are you using hardcoded server names in sssd.conf in this case? Because otherwise SSSD would have issues as well. Additionally SSSD should use adcli's --domain-controller option with the current AD DC SSSD is talking to.

> 6. SELinux enabled; adcli not allowed to update /etc/krb5.keytab file (from Sumit).

This should be fixed by updating the selinux-policy.

> 7. Time sync too far out for adcli update to successfully do an update on machine account.

Would it be possible to run ntpd or chrony?

> 8. /var filesystem: Input/Output errors.
>
> By far, today the most common category is #4. It amounted to 9 of the 13 candidates today. Category #7 was another 2 candidates today.
>
> So by far, it's category #4 we want to drill down into -- if we can eliminate that, we've strongly decreased the sssd burden. Also, we think we can put pro-active monitoring in place for category #3 and #7.
>
> Our old commercial AD integration product didn't depend on polkit/dbus. So categories #3 and #4 are new for sssd. If we can understand #4 and proactively monitor for #3, we can reduce the sssd burden to that of the former product.
>
> Category #4 appears to occur randomly -- no rhyme or reason. Also we have not found any repeat offenders -- so it's very hard to track down.

So far I'm aware of two reasons for this. One is that AD returns a temporary error which confuses libkrb5 on the client, so that adcli thinks the renewal failed although in the end it was successful on AD, leaving a renewal in AD but not on the client. This issue led to the fix in libkrb5 to always use TCP for kpasswd operations.

The second is local permissions which didn't allow adcli called by SSSD to update the keytab file. This might be triggered by SELinux (#6). Another reason might be that SSSD is not running as root.

But permissions would make the issue appear again on the same host, so I guess this might not be the reason in your case.

So it might still be some unexpected reply from AD which should not be treated as an error. Do you by chance have read-only domain controllers (RODCs)?

bye, Sumit

> We plan to turn on sssd debug_level 7 (on that one sssd [domain/xxx] stanza only). Debug level 7 is min level to get verbose output from adcli update. We know that turning on debug level 9 on all sssd stanzas (nss, pam, ifp, [domain/xxx]) fills /var/log filesystem to 100% in a few days.
>
> Spike
On Tue, Sep 7, 2021 at 9:53 AM Patrick Goetz pgoetz@math.utexas.edu wrote:
On 9/6/21 4:49 AM, Sumit Bose wrote:
Am Thu, Sep 02, 2021 at 10:02:54AM -0500 schrieb Patrick Goetz:
On 9/2/21 12:49 AM, Sumit Bose wrote:
The reason is that 'kinit -k' constructs the principal by calling gethostname() or similar, adding the 'host/' prefix and the realm. But by default this principal in AD is only a service principal can cannot be used to request a TGT as kinit does. AD only allows user principals for request a TGT and this is by default 'SHORT$@AD.REALM'. If the userPrincipalName attribute is set, this principal given here is
allowed
as well.
This raises a couple of questions. Because of AD's flat address space,
we
use a host naming convention in AD as a sort of low rent namespacing;
so,
for example, for this host the college is cns and the research group
cryo,
so the AD hostname is cns-cryo-ross1$
However,
# hostname rossmann.biosci.utexas.edu
which is easier for the users to remember for ssh purposes. We set
ad_hostname = cns-cryo-ross1.austin.utexas.edu
in /etc/sssd/sssd.conf.
But I just checked, and kinit does not use ad_hostname, so I have to
run it
as
kinit -k -R cns-cryo-ross1$
The question is, then what does use the ad_hostname key/value pair?
Next, the kinit example provided by Spike was `kinit -k` -- we always
run
`kinit -k -R`
-R renews the TGT, which is what I thought is the thing set to expire
in AD
that needs to be periodically renewed. What's the purpose of running
`kinit
-k` without the -R?
Hi,
there are two different things.
First, there are the host keys in the keytab which are equivalent to a user password. Those keys are renewed by 'adcli update' if they are older then 30 days, similar as you would renew you user password if the AD DC tells you to do it.
Second, with those keys you can request a Kerberos TGT
kinit -k 'shortname$'I thought, based on the kinit man page, that the -k flag is just an ordinary ticket request and that you need to add the -R flag to request a TGT. What you're saying is it also renews the TGT?
OTOH I thought `kinit -k` was updating the computer account password on the domain controller, but that doesn't seem to be the case, in which case I'm not even sure what the purpose of an ordinary (non-TGT) ticket is if you're not requesting automatic login to some specifically requested service.
Also, just to make sure I'm clear on this, the "renew until" doesn't change because this is based on the computer account password expiration, and further that sssd runs `adcli update` for you periodically? How often, by the way?
as you can do with your user password:
kinit user@REALM Password for user@REALMThis TGT has a lifetime and it might have a renewal time as well:
# klist Ticket cache: KCM:0:69840 Default principal: Administrator@CHILD.AD.VM
Valid starting Expires Service principal 09/06/2021 09:39:28 09/06/2021 19:39:28 krbtgt/CHILD.AD.VM@CHILD.AD.VM renew until 09/07/2021 09:39:24
In the example above the TGT will expire at '09/06/2021 19:39:28' but can be renewed until '09/07/2021 09:39:24'. This means that if you call
kinit -Rbefore '09/06/2021 19:39:28' you will get a fresh TGT without entering your password. The new TGT will have a new lifetime but 'renew until' will stay the same. After '09/07/2021 09:39:24' 'kinit -R' will not work anymore and you have to enter your password again. It does not matter here if the TGT was originally requested with a keytab with 'kinit -k' or with plain 'kinit' and a password.
However, since the keytab is present in the file system calling
kinit -k 'shortname$'
will always get a fresh TGT without manual intervention. So in case you have a valid keytab this is even more flexible than 'kinit -R'.
HTH
bye, Sumit
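For a cron job along the lines discussed earlier in the thread, the machine principal can be derived from the hostname. This is only a minimal sketch: the function name is made up for illustration, the realm has to be supplied by the caller, and it assumes the keytab stores the short name in upper case, as the klist listings in this thread do.

```shell
#!/bin/sh
# Sketch only: build the AD machine principal 'SHORT$@REALM' from a hostname.
# machine_principal is a hypothetical helper name, not part of any tool.
# Assumption: the keytab entry uses the upper-cased short hostname.

machine_principal() {
    host=${1:-$(hostname)}      # FQDN or short name
    realm=$2                    # e.g. AD.REALM, supplied by the caller
    # Strip everything after the first dot, then upper-case the short name.
    short=$(printf '%s' "${host%%.*}" | tr '[:lower:]' '[:upper:]')
    printf '%s$@%s\n' "$short" "$realm"
}
```

It could then be used as `kinit -k "$(machine_principal "$(hostname)" AD.REALM)"`, which requests a fresh TGT from the keytab without needing 'kinit -R'.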
OK,
That particular candidate seems like a very unusual corner case, where someone cloned an existing VM, renamed it, re-IP'd it and (incorrectly) re-joined it to AD.
I say "incorrectly" because they did not clear the existing /etc/krb5.keytab file prior to the re-join. Hence the old bogus entries in /etc/krb5.keytab with the original hostname, which confused adcli.
BTW, Sumit -- I completely understand what you're saying about AD and TGTs. You can request TGTs only off your UPN. So our UPN is always 'host/fqdn@REALM'. In AD, you cannot request TGTs off your SPNs. Other Kerberos implementations allow you to request TGTs off SPNs, but AD does not. So I completely get the difference between kinit -k and adcli.
I think a better command-line test for us to troubleshoot sssd would be to request a Kerberos service ticket, not a TGT. But I don't know how to do that on the command line (kinit only requests and renews TGTs). We use 'adcli testjoin' to simulate sssd Kerberos initialization.
Anyway, back to our research. We have now found 8 candidates that have the AD attribute 'passwordLastSet' between 31 and 40 days old. (Actually, two are at 40 days, so AD has probably locked those machine accounts.)
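The 31 to 40 day window can be checked with a little date arithmetic. A rough sketch, assuming GNU date (for the -d option) and dates passed as YYYY-MM-DD; both function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: age in whole days of a machine account password.
# Assumptions: GNU date is available; dates are given as YYYY-MM-DD.
# password_age_days and is_overdue are hypothetical helper names.

password_age_days() {
    last_set=$1                     # passwordLastSet from AD
    today=${2:-$(date -u +%F)}      # defaults to the current UTC date
    start=$(date -u -d "$last_set" +%s) || return 1
    end=$(date -u -d "$today" +%s) || return 1
    echo $(( (end - start) / 86400 ))
}

# Succeeds when the age falls inside the 31..40 day window.
is_overdue() {
    age=$(password_age_days "$@") || return 1
    [ "$age" -ge 31 ] && [ "$age" -le 40 ]
}
```

For example, `is_overdue 2021-07-28 2021-09-02 && echo candidate` would flag a password last set on 7/28 when checked on 9/2.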
On this first candidate, we think it's another one-off for that particular server. CPU pegged at 100% since Aug 9. We rebooted, set debug_level 9 and it appears to have renewed this first attempt (based on timestamps). We see this output in the sssd_amer.company.com.log
(2021-09-01 14:44:36): [be[amer.company.com]] [ad_machine_account_password_renewal_done] (0x1000): --- adcli output start---
---adcli output end---
(2021-09-01 14:44:36): [be[amer.company.com]] [be_ptask_done] (0x0400): Task [AD machine account password renewal]: finished successfully
(2021-09-01 14:44:36): [be[amer.company.com]] [be_ptask_schedule] (0x0400): Task [AD machine account password renewal]: scheduling task 86400 seconds from last execution time [1630608275]
We saw a second update attempt later (today?) with a lot of adcli output, but it said:
...--- adcli output start---
...
 * Password not too old, no change needed
...
---adcli output end---
So we suspect this candidate is yet another edge-corner case (CPU at 100% -- not able to adcli update).
Looking at the next couple of candidates, these are a more interesting (& seemingly more common) case. They updated their entry in AD, but the local /etc/krb5.keytab file was not updated. These happen to be OL7 servers (but we have a RHEL7 candidate at 40 days non-check-in). Because these are *L7, it's not the Kerberos UDP kpasswd problem (that's only on RHEL6/OL6).
Let's take the first one. casnlrritpgm206. According to AD, it has a 'passwordLastSet' of 7/28/2021.
PS C:\Users\spike_white> get-adcomputer casnlrritpgm206 -properties 'passwordLastSet'
DistinguishedName : CN=CASNLRRITPGM206,OU=Servers,OU=UNIX,DC=amer,DC=company,DC=com
DNSHostName       : casnlrritpgm206.us.company.com
Enabled           : True
Name              : CASNLRRITPGM206
ObjectClass       : computer
ObjectGUID        : f9fa2b5b-75b6-434b-8e94-477599d1afca
PasswordLastSet   : 7/28/2021 10:04:49 PM
SamAccountName    : CASNLRRITPGM206$
SID               : S-1-5-21-1802859667-647903414-1863928812-3091065
UserPrincipalName : host/casnlrritpgm206.us.company.com@AMER.COMPANY.COM
It has a 'msDS-KeyVersionNumber' of 7.
but in the local /etc/krb5.keytab file, it has a last KVNO of 6 with a timestamp of 6/28:
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
6 06/28/2021 06:53:30 CASNLRRITPGM206$@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:30 CASNLRRITPGM206$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 CASNLRRITPGM206$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 host/casnlrritpgm206.us.company.com@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:30 host/casnlrritpgm206.us.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 host/casnlrritpgm206.us.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 host/CASNLRRITPGM206@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:30 host/CASNLRRITPGM206@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 host/CASNLRRITPGM206@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 RestrictedKrbHost/CASNLRRITPGM206@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:31 RestrictedKrbHost/CASNLRRITPGM206@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:31 RestrictedKrbHost/CASNLRRITPGM206@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
6 06/28/2021 06:53:31 RestrictedKrbHost/casnlrritpgm206.us.company.com@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:31 RestrictedKrbHost/casnlrritpgm206.us.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:31 RestrictedKrbHost/casnlrritpgm206.us.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
5 05/29/2021 02:08:37 CASNLRRITPGM206$@AMER.COMPANY.COM (arcfour-hmac)
5 05/29/2021 02:08:37 CASNLRRITPGM206$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
5 05/29/2021 02:08:37 CASNLRRITPGM206$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
So 7/28 is 30 days after 6/28. It appears on 7/28 that sssd invoked adcli update to update the machine account password in AD. adcli update updated it in AD, but not in the local /etc/krb5.keytab file.
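The mismatch described here (msDS-KeyVersionNumber 7 in AD, highest KVNO 6 in the keytab) can be detected mechanically. A minimal sketch, assuming `klist -kt` style output on stdin with the KVNO in the first column; the function names are hypothetical, and the AD value has to be looked up separately (for example with get-adcomputer) and passed in by the caller.

```shell
#!/bin/sh
# Sketch only: compare the newest KVNO in a keytab listing with the AD value.
# Assumption: stdin is `klist -kt` style output, KVNO in column one.
# max_kvno and keytab_is_stale are hypothetical helper names.

# Print the highest KVNO found; header and separator lines are skipped
# because their first field is not purely numeric.
max_kvno() {
    awk '$1 ~ /^[0-9]+$/ && $1 + 0 > max { max = $1 + 0 } END { print max + 0 }'
}

# Succeeds when the keytab lags behind the KVNO given as first argument.
keytab_is_stale() {
    ad_kvno=$1
    local_kvno=$(max_kvno)
    [ "$local_kvno" -lt "$ad_kvno" ]
}
```

On the host above, something like `klist -kt /etc/krb5.keytab | keytab_is_stale 7 && echo "keytab is behind AD"` would report the problem.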
The next candidate is similar. It has a KVNO in AD one more than in /etc/krb5.keytab file with a timestamp 30 days past the timestamp of the latest entry in /etc/krb5.keytab file.
This sure seems similar to the Kerberos kpasswd UDP problem. But it's not -- krb5-libs quit using UDP for kpasswd after RHEL6/OL6.
We know how to remediate when we hit such a candidate: run adcli update with a valid user principal and a valid login ccache of a principal that has AD privs to update these machine accounts.
So this is the ultimate question: *why* is this happening? Why is the adcli update (called by sssd) updating the passwd in AD and the msDS-KeyVersionNumber, but not updating /etc/krb5.keytab?
I think this is the common case that we're seeing -- that these other cases (plus one other) are the unusual end-corner cases.
Spike
On Thu, Sep 2, 2021 at 12:49 AM Sumit Bose sbose@redhat.com wrote:
On Wed, Sep 01, 2021 at 11:39:30AM -0500, Spike White wrote:
So, to respond to my own email: a co-worker did finally find some references to that bizarre name ZZZKBTDURBOL8.
[root@nwpllv8bu100 post_install]# klist -kte
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp Principal
2 07/13/2021 16:42:17 ZZZKBTDURBOL8$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 ZZZKBTDURBOL8$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 host/zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 host/zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 host/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 host/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 RestrictedKrbHost/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 RestrictedKrbHost/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 RestrictedKrbHost/zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 RestrictedKrbHost/zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
3 07/29/2021 13:38:27 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 RestrictedKrbHost/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac)
3 07/29/2021 13:38:27 RestrictedKrbHost/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 RestrictedKrbHost/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
I realize this is reflecting the literal entries in the /etc/krb5.keytab file. So it appears that when this VM was born (on July 13th), it was named zzzkbtdurbol8.amer.company.com. (I see other supporting evidence for this). On or before July 29th, it was renamed to its final FQDN nwpllv8bu100.amer.company.com. /etc/hostname, /etc/hosts, /etc/sysconfig/network etc. were all updated and it was rejoined to AD.
kinit -k works fine. It picks up the current hostname and apparently uses host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM as its service principal. Since there's a valid entry in /etc/krb5.keytab file for this, it uses this and all is good. (I'm guessing it uses the 14th or 15th /etc/krb5.keytab file entry above.)
sssd works, because it has this line:
ldap_sasl_authid = host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM
But when sssd invokes adcli update to refresh the machine account password, adcli update fails.
Also, I see that adcli testjoin fails.
[root@nwpllv8bu100 tmp]# adcli testjoin -D AMER.COMPANY.COM
adcli: couldn't connect to AMER.COMPANY.COM domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
[root@nwpllv8bu100 tmp]#
From a strace of this adcli testjoin, it appears that adcli is opening the /etc/krb5.keytab file to determine the default service principal to use and is pulling the old server name. (instead of using the correct service principal, as kinit -k somehow does.)
Hi,
it looks like your environment is a bit special. I guess you have added 'host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM' to the 'userPrincipalName' LDAP attribute in the AD computer object for this host.
# klist -k
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
2 MASTER$@CHILD.AD.VM
2 MASTER$@CHILD.AD.VM
2 host/MASTER@CHILD.AD.VM
2 host/MASTER@CHILD.AD.VM
2 host/master.client.vm@CHILD.AD.VM
2 host/master.client.vm@CHILD.AD.VM
2 RestrictedKrbHost/MASTER@CHILD.AD.VM
2 RestrictedKrbHost/MASTER@CHILD.AD.VM
2 RestrictedKrbHost/master.client.vm@CHILD.AD.VM
2 RestrictedKrbHost/master.client.vm@CHILD.AD.VM
# kdestroy -A
#
#
# kinit -k
kinit: Client 'host/master.client.vm@CHILD.AD.VM' not found in Kerberos database while getting initial credentials
# kinit -k 'MASTER$@CHILD.AD.VM'
The reason is that 'kinit -k' constructs the principal by calling gethostname() or similar, adding the 'host/' prefix and the realm. But by default this principal is only a service principal in AD and cannot be used to request a TGT as kinit does. AD only allows user principals to request a TGT, and by default this is 'SHORT$@AD.REALM'. If the userPrincipalName attribute is set, the principal given there is allowed as well.
That's why adcli checks the keytab for a principal with a '$' character; by default it uses the first one it finds, because it expects there to be only one. Adding heuristics for the case where the keytab holds several '$' principals, such as picking the highest KVNO, might help in some cases but would fail in others, so I think just using the first is good enough.
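Since adcli expects a single '$' principal, a keytab that holds more than one is worth flagging before a renewal fails. A rough sketch, assuming `klist -k` style output on stdin; the function name is made up, and note that the result is sorted, so it does not tell you which entry adcli would find first.

```shell
#!/bin/sh
# Sketch only: list the distinct machine account ('$') principals in a
# keytab listing. Assumption: stdin is `klist -k` style output.
# machine_principals is a hypothetical helper name.

machine_principals() {
    # Print every whitespace-separated field that contains '$@',
    # then collapse duplicates (one entry per enctype is normal).
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /[$]@/) print $i }' | sort -u
}
```

A check such as `[ "$(klist -k | machine_principals | wc -l)" -gt 1 ] && echo "stale machine principals in keytab"` would have caught the renamed VM above, whose keytab still carried the ZZZKBTDURBOL8$ entries.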
Maybe when sssd constructs this adcli update invocation, it's not passing the ldap_sasl_authid, so the adcli update is doing the above logic to pull the old server name?
Adding a new option to adcli and using the value from ldap_sasl_authid might be a solution, I will think about it.
HTH
bye, Sumit
Sounds like an adcli problem. Adcli should do as 'kinit -k' does when it's passed no explicit service principal. Should dive into /etc/krb5.keytab file and use the most recent set of entries (KVNO = 3 in above example). Maybe derive the default service principal off the current FQDN and Kerberos realm?
Spike
PS As a general policy, we are not supposed to clone a VM and rename it to another FQDN/IP address. I'll be trying to track down who did this and for what reason.
On Wed, Sep 1, 2021 at 10:08 AM Spike White spikewhitetx@gmail.com wrote:
Ok, this is *very* illuminating!
I see this in sssd_amer.company.com.log:
(2021-09-01 3:44:46): [be[amer.company.com]] [ad_machine_account_password_renewal_done] (0x1000): --- adcli output start---
adcli: couldn't connect to amer.company.com domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
---adcli output end---
However, I don't find that host name ZZZKBTDURBOL8 anywhere on the system. (By company convention, servers named ZZZ* are test servers that linux SEs spin up themselves).
This server that's not renewing its creds is named nwpllv8bu100.amer.company.com. It's a std dev server. In the /etc/sssd/sssd.conf file, it has that as its sasl auth ID:
[root@nwpllv8bu100 sssd]# grep sasl /etc/sssd/sssd.conf
ldap_sasl_authid = host/nwpllv8bu100.amer.dell.com@AMER.COMPANY.COM
[root@nwpllv8bu100 sssd]#
If I do 'kinit -k', the /etc/krb5.keytab file has that name as well:
[root@nwpllv8bu100 sssd]# kinit -k
[root@nwpllv8bu100 sssd]# klist
Ticket cache: KCM:0
Default principal: host/nwpllv8bu100.amer.dell.com@AMER.COMPANY.COM

Valid starting       Expires              Service principal
09/01/2021 11:04:16  09/01/2021 21:04:16  krbtgt/AMER.DELL.COM@AMER.COMPANY.COM
        renew until 09/08/2021 11:04:16
[root@nwpllv8bu100 sssd]#
I searched /etc/sssd/sssd.conf -- no "zzz" or "ZZZ" string is anywhere in there. So where is sssd picking up this name ZZZKBTDURBOL8 and passing it to adcli update?
Spike
On Wed, Sep 1, 2021 at 2:46 AM Sumit Bose sbose@redhat.com wrote:
On Tue, Aug 31, 2021 at 09:53:01PM +0200, Alexey Tikhonov wrote:
On Tue, Aug 31, 2021 at 6:47 PM Spike White <spikewhitetx@gmail.com> wrote:
All,
OK we have a query we run in AD for machine account passwords of a certain age. In today's run, 31 - 32 days. Then we verify it's pingable.
We have found one such suspicious candidate today (two actually, but the other Linux server is quite sick). So one good research candidate.
According to both AD and /etc/krb5.keytab file, the machine account password was last set on 7/29. Today is 8/31, so that would be 32 days. This 'automatic machine account keytab renewal' background task should trigger again today.
sssd service was last started 2 weeks ago and, by all appearances, is healthy. sssctl domain-status <domain> shows online, connected to AD servers (both domain and GC servers). All logins and group enumerations working as expected.
Just now, we dynamically set the debug level to 9 with 'sssctl debug-level 9'. This particular server is Oracle Linux 8.4, running sssd-*-2.4.0-9.0.1.el8_4.1.x86_64. Installed July 13th. So -- very recent sssd version. (This problem occurs with both RHEL & OL 6/7/8, it's just today's candidate happens to be OL8.)
We can't keep debug level 9 up for a great many days; it swamps the /var/log filesystem. But we can leave it up for a few days. We purposely did not restart sssd server as we know that would trigger a machine account renewal.
Speaking of that -- from Sumit's sssd source code in ad_provider/ad_machine_pw_renewal.c, it appears that sssd is creating a back-end task to call external program /usr/sbin/adcli with certain args.
What string can I look for in which sssd log file (now that I have debug level 9 enabled) to tell me when this 'adcli update' task (aka 'automatic machine account keytab renewal') is triggered?
It seems SSSD itself only logs in case of errors. I didn't find any explicit logs around `ad_machine_account_password_renewal_send()`. But perhaps there will be something like "[be_ptask_execute] (0x0400): Task [AD machine account password renewal]: executing task" from generic be_ptask_* helpers in the sssd_$domain.log (I'm not sure).
Also at this verbosity level `--verbose` should be supplied to adcli itself and I guess output should be captured in sssd_$domain.log as well. I'm not familiar with `adcli` internals, you can take a glance at https://gitlab.freedesktop.org/realmd/adcli to find its log messages.
Hi,
if SSSD's debug_level is 7 or higher the '--verbose' option is set when calling adcli and the output is added to the backend logs. It will start with log message "--- adcli output start---".
HTH
bye, Sumit
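Given those markers, the adcli sections can be pulled out of a busy debug log with a small filter. A minimal sketch: the function name is made up, and the log path in the usage line is an assumption based on the sssd_$domain.log naming used in this thread.

```shell
#!/bin/sh
# Sketch only: print just the adcli output sections of an sssd backend log.
# Reads the files given as arguments, or stdin when none are given.
# adcli_output is a hypothetical helper name.

adcli_output() {
    awk '/adcli output start/ { show = 1 }
         show                 { print }
         /adcli output end/   { show = 0 }' "$@"
}
```

For example: `adcli_output /var/log/sssd/sssd_amer.company.com.log`. The start marker line also carries the sssd timestamp, which shows when the renewal task ran.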
I'm less certain now that we've surveyed our env that this background 'adcli update' task is the reason behind 70 - 80 servers / month dropping off the domain. It might be a slight contributor, but I find only a very few pingable servers with machine account last renewal date between 30 and 40 days.
Yes, I can disable this default 30 day automatic update and roll my own 'adcli update' cron. But that's a mass deployment, to fix what might not be the problem. I want to verify this is the actual culprit before I take those drastic steps.
Spike
On Thu, Sep 02, 2021 at 11:41:47AM -0500, Spike White wrote:
OK,
That particular candidate seems like a very unusual corner case, where someone cloned an existing VM, renamed it, re-IP'd it and (incorrectly) re-joined it to AD.
I say "incorrectly" because they did not clear the existing /etc/krb5.keytab file prior to the re-join. Hence the old bogus entries in /etc/krb5.keytab with the original hostname, which confused adcli.
BTW, Sumit -- I completely understand what you're saying about AD and TGTs. You can request TGTs only off your UPN. So our UPN is always 'host/fqdn@REALM'. In AD, you cannot request TGTs off your SPNs. Other Kerberos implementations allow you to request TGTs off SPNs, but AD does not. So I completely get the difference between kinit -k and adcli.
I think a better command-line test for us to troubleshoot sssd would be to request a Kerberos service ticket, not a TGT. But I don't know how to do that on the command line (kinit only requests and renews TGTs). We use 'adcli testjoin' to simulate sssd Kerberos initialization.
Hi,
you can use 'kvno' to request a service ticket on the command line.
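As a rough illustration of that test: get a TGT from the keytab, then ask for a service ticket with kvno. This is only a sketch. The wrapper function and its DRY_RUN switch are made up for illustration, the principal names are placeholders taken from the examples in this thread, and it assumes the MIT Kerberos client tools are installed. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: request a service ticket (not just a TGT) with the host keytab.
# Assumptions: MIT krb5 tools kinit, kvno and klist are installed.
# request_service_ticket and DRY_RUN are hypothetical names; with DRY_RUN=1
# (the default here) the commands are printed instead of executed.

request_service_ticket() {
    machine_princ=$1    # e.g. 'SHORT$@AD.REALM'
    service_princ=$2    # e.g. 'host/fqdn@AD.REALM'
    run=
    if [ "${DRY_RUN:-1}" = 1 ]; then
        run=echo
    fi
    # 1. TGT from the keytab, 2. service ticket, 3. show the ticket cache.
    $run kinit -k "$machine_princ" &&
    $run kvno "$service_princ" &&
    $run klist
}
```

Running it with `DRY_RUN=0 request_service_ticket 'MASTER$@CHILD.AD.VM' 'host/master.client.vm@CHILD.AD.VM'` should leave both the krbtgt entry and the requested service ticket visible in the klist output if the keytab is good.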
Anyway, back to our research. We have now found 8 candidates that have the AD attribute 'passwordLastSet' between 31 and 40 days old. (Actually, two are at 40 days, so AD has probably locked those machine accounts.)
On this first candidate, we think it's another one-off for that particular server. CPU pegged at 100% since Aug 9. We rebooted, set debug_level 9 and it appears to have renewed this first attempt (based on timestamps). We see this output in the sssd_amer.company.com.log
(2021-09-01 14:44:36): [be[amer.company.com]] [ad_machine_account_password_renewal_done] (0x1000): --- adcli output start---
---adcli output end---
(2021-09-01 14:44:36): [be[amer.company.com]] [be_ptask_done] (0x0400): Task [AD machine account password renewal]: finished successfully
(2021-09-01 14:44:36): [be[amer.company.com]] [be_ptask_schedule] (0x0400): Task [AD machine account password renewal]: scheduling task 86400 seconds from last execution time [1630608275]
We saw a second update attempt later (today?) with a lot of adcli output, but it said:
...--- adcli output start---
...
 * Password not too old, no change needed
...
---adcli output end---
So we suspect this candidate is yet another edge-corner case (CPU at 100% -- not able to adcli update).
Looking at the next couple of candidates, these are a more interesting (& seemingly more common) case. They updated their entry in AD, but the local /etc/krb5.keytab file was not updated. These happen to be OL7 servers (but we have a RHEL7 candidate at 40 days non-check-in). Because these are *L7, it's not the Kerberos UDP kpasswd problem (that's only on RHEL6/OL6).
I recently came across a similar issue where an older SELinux policy was installed, which prevented 'adcli', called from SSSD under SSSD's SELinux context, from writing to the keytab.
bye, Sumit
Let's take the first one. casnlrritpgm206. According to AD, it has a 'passwordLastSet' of 7/28/2021.
PS C:\Users\spike_white> get-adcomputer casnlrritpgm206 -properties 'passwordLastSet'
DistinguishedName : CN=CASNLRRITPGM206,OU=Servers,OU=UNIX,DC=amer,DC=company,DC=com DNSHostName : casnlrritpgm206.us.company.com Enabled : True Name : CASNLRRITPGM206 ObjectClass : computer ObjectGUID : f9fa2b5b-75b6-434b-8e94-477599d1afca PasswordLastSet : 7/28/2021 10:04:49 PM SamAccountName : CASNLRRITPGM206$ SID : S-1-5-21-1802859667-647903414-1863928812-3091065 UserPrincipalName : host/casnlrritpgm206.us.company.com@AMER.COMPANY.COM
It has a 'msDS-KeyVersionNumber' of 7.
but in the local /etc/krb5.keytab file, it has a last KVNO of 6 with a timestamp of 6/28:
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp Principal
6 06/28/2021 06:53:30 CASNLRRITPGM206$@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:30 CASNLRRITPGM206$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 CASNLRRITPGM206$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 host/ casnlrritpgm206.us.company.com@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:30 host/ casnlrritpgm206.us.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 host/ casnlrritpgm206.us.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 host/CASNLRRITPGM206@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:30 host/CASNLRRITPGM206@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 host/CASNLRRITPGM206@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
6 06/28/2021 06:53:30 RestrictedKrbHost/CASNLRRITPGM206@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:31 RestrictedKrbHost/CASNLRRITPGM206@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:31 RestrictedKrbHost/CASNLRRITPGM206@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
6 06/28/2021 06:53:31 RestrictedKrbHost/ casnlrritpgm206.us.company.com@AMER.COMPANY.COM (arcfour-hmac)
6 06/28/2021 06:53:31 RestrictedKrbHost/ casnlrritpgm206.us.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
6 06/28/2021 06:53:31 RestrictedKrbHost/ casnlrritpgm206.us.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
5 05/29/2021 02:08:37 CASNLRRITPGM206$@AMER.COMPANY.COM (arcfour-hmac)
5 05/29/2021 02:08:37 CASNLRRITPGM206$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96)
5 05/29/2021 02:08:37 CASNLRRITPGM206$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
So 7/28 is 30 days after 6/28. It appears on 7/28 that sssd invoked adcli update to update the machine account password in AD. adcli update updated it in AD, but not in the local /etc/krb5.keytab file.
The next candidate is similar. It has a KVNO in AD one more than in /etc/krb5.keytab file with a timestamp 30 days past the timestamp of the latest entry in /etc/krb5.keytab file.
This sure seems similar to the Kerberos kpasswd UDP problem. But it's not -- krb5-libs quit using UDP for kpasswd after RHEL6/OL6.
We know how to remediate when we hit such a candidate. adcli update with the valid user principal and valid login ccache of a principal that have AD privs to update these machine accounts.
So -- this is the ultimate question? *Why* is this happening? Why is the adcli update (called by sssd) updating the passwd in AD and the msDS-KeyVersionNumber, but not updating /etc/krb5.keytab?
I think this is the common case that we're seeing -- that these other cases (plus one other) are the unusual end-corner cases.
Spike
On Thu, Sep 2, 2021 at 12:49 AM Sumit Bose sbose@redhat.com wrote:
Am Wed, Sep 01, 2021 at 11:39:30AM -0500 schrieb Spike White:
So to respond to my own email, but a co-worker did finally find some references to that bizarre name ZZZKBTDURBOL8.
[root@nwpllv8bu100 post_install]# klist -kte Keytab name: FILE:/etc/krb5.keytab KVNO Timestamp Principal
2 07/13/2021 16:42:17 ZZZKBTDURBOL8$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 ZZZKBTDURBOL8$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 host/ zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM
(aes128-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 host/ zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM
(aes256-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 host/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 host/ZZZKBTDURBOL8@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 RestrictedKrbHost/
ZZZKBTDURBOL8@AMER.COMPANY.COM
(aes128-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 RestrictedKrbHost/
ZZZKBTDURBOL8@AMER.COMPANY.COM
(aes256-cts-hmac-sha1-96) 2 07/13/2021 16:42:17 RestrictedKrbHost/ zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM
(aes128-cts-hmac-sha1-96)
2 07/13/2021 16:42:17 RestrictedKrbHost/ zzzkbtdurbol8.amer.company.com@AMER.COMPANY.COM
(aes256-cts-hmac-sha1-96)
3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 NWPLLV8BU100$@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 host/
nwpllv8bu100.amer.company.com@AMER.COMPANY.COM
(DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 host/
nwpllv8bu100.amer.company.com@AMER.COMPANY.COM
(aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 host/
nwpllv8bu100.amer.company.com@AMER.COMPANY.COM
(aes256-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 host/NWPLLV8BU100@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 RestrictedKrbHost/NWPLLV8BU100@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 RestrictedKrbHost/ nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (DEPRECATED:arcfour-hmac) 3 07/29/2021 13:38:27 RestrictedKrbHost/ nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes128-cts-hmac-sha1-96) 3 07/29/2021 13:38:27 RestrictedKrbHost/ nwpllv8bu100.amer.company.com@AMER.COMPANY.COM (aes256-cts-hmac-sha1-96)
I realize this is reflecting the literal entries in the /etc/krb5.keytab file. So it appears that when this VM was born (on July 13th), it was named zzzkbtdurbo18.amer.company.com. (I see other supporting evidence for this). On or before July 29th, it was renamed to final FQDN nwpllv8bu100.amer.company.com. /etc/hostname, /etc/hosts, /etc/sysconfig/network etc were all updated and it was rejoined to AD.
kinit -k works fine. It picks up the current hostname and apparently
uses
host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM as its service principal. Since there's a valid entry in /etc/krb5.keytab file for
this,
it uses this and all is good. (I'm guessing it uses the 14th or 15th /etc/krb5.keytab file entry above.)
sssd works, because it has this line:
ldap_sasl_authid = host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM
But when sssd invokes adcli update to refresh the machine account
password,
adcli update fails.
Also, I see that adcli testjoin fails.
[root@nwpllv8bu100 tmp]# adcli testjoin -D AMER.COMPANY.COM
adcli: couldn't connect to AMER.COMPANY.COM domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
[root@nwpllv8bu100 tmp]#
From a strace of this adcli testjoin, it appears that adcli is opening the /etc/krb5.keytab file to determine the default service principal to use and is pulling the old server name (instead of using the correct service principal, as kinit -k somehow does).
Hi,
it looks like your environment is a bit special. I guess you have added 'host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM' to the 'userPrincipalName' LDAP attribute in the AD computer object for this host.
# klist -k
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
   2 MASTER$@CHILD.AD.VM
   2 MASTER$@CHILD.AD.VM
   2 host/MASTER@CHILD.AD.VM
   2 host/MASTER@CHILD.AD.VM
   2 host/master.client.vm@CHILD.AD.VM
   2 host/master.client.vm@CHILD.AD.VM
   2 RestrictedKrbHost/MASTER@CHILD.AD.VM
   2 RestrictedKrbHost/MASTER@CHILD.AD.VM
   2 RestrictedKrbHost/master.client.vm@CHILD.AD.VM
   2 RestrictedKrbHost/master.client.vm@CHILD.AD.VM
# kdestroy -A
#
#
# kinit -k
kinit: Client 'host/master.client.vm@CHILD.AD.VM' not found in Kerberos database while getting initial credentials
# kinit -k 'MASTER$@CHILD.AD.VM'
The reason is that 'kinit -k' constructs the principal by calling gethostname() or similar, adding the 'host/' prefix and the realm. But by default this principal in AD is only a service principal and cannot be used to request a TGT as kinit does. AD only allows user principals to request a TGT, and by default this is 'SHORT$@AD.REALM'. If the userPrincipalName attribute is set, the principal given there is allowed as well.
That's why adcli is checking the keytab for a principal with a '$' character and by default it uses the first it finds because it is expected there is only one. Adding some heuristics in case there are more '$' principals in the keytab like highest KVNO might help in some cases but would fail in other cases so I think just using the first is good enough.
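
To illustrate the "first principal with a '$'" rule described above, here is a minimal sketch that applies the same rule to `klist -k`-style text on stdin. The helper name `first_machine_principal` is hypothetical and not part of adcli; adcli reads the keytab directly rather than parsing klist output, so this only mimics the selection order. If a stale entry from before a rename is listed first, it is the one that gets picked.

```shell
#!/bin/sh
# Hypothetical helper (not part of adcli): print the first principal that
# contains both '$' and '@' from `klist -k`-style output on stdin.
# Only lines whose first column is a numeric KVNO are considered, so the
# "Keytab name:" and header lines are skipped.
first_machine_principal() {
    awk '$1 ~ /^[0-9]+$/ {
             for (i = 2; i <= NF; i++)
                 if (index($i, "$") && index($i, "@")) { print $i; exit }
         }'
}

# Example usage against the live keytab (needs read access to the file):
#   klist -k /etc/krb5.keytab | first_machine_principal
```

Comparing that result with the current short hostname is a quick way to spot a keytab that still leads with an old machine account.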
Maybe when sssd constructs this adcli update invocation, it's not passing the ldap_sasl_authid, so the adcli update is doing the above logic to pull the old server name?
Adding a new option to adcli and using the value from ldap_sasl_authid might be a solution, I will think about it.
HTH
bye, Sumit
Sounds like an adcli problem. Adcli should do as 'kinit -k' does when it's passed no explicit service principal. Should dive into /etc/krb5.keytab file and use the most recent set of entries (KVNO = 3 in above example). Maybe derive the default service principal off the current FQDN and Kerberos realm?
Spike
PS As a general policy, we are not supposed to clone a VM and rename it to another FQDN/IP address. I'll be trying to track down who did this and for what reason.
On Wed, Sep 1, 2021 at 10:08 AM Spike White spikewhitetx@gmail.com wrote:
Ok, this is *very* illuminating!
I see this in sssd_amer.company.com.log:
(2021-09-01 3:44:46): [be[amer.company.com]] [ad_machine_account_password_renewal_done] (0x1000): --- adcli output start---
adcli: couldn't connect to amer.company.com domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
---adcli output end---
However, I don't find that host name ZZZKBTDURBOL8 anywhere on the system. (By company convention, servers named ZZZ* are test servers that linux SEs spin up themselves).
The server that's not renewing its creds is named nwpllv8bu100.amer.company.com. It's a standard dev server. In the /etc/sssd/sssd.conf file, it has that as its SASL auth ID:
[root@nwpllv8bu100 sssd]# grep sasl /etc/sssd/sssd.conf
ldap_sasl_authid = host/nwpllv8bu100.amer.dell.com@AMER.COMPANY.COM
[root@nwpllv8bu100 sssd]#
If I do 'kinit -k', the /etc/krb5.keytab file has that name as well:
[root@nwpllv8bu100 sssd]# kinit -k
[root@nwpllv8bu100 sssd]# klist
Ticket cache: KCM:0
Default principal: host/nwpllv8bu100.amer.dell.com@AMER.COMPANY.COM

Valid starting       Expires              Service principal
09/01/2021 11:04:16  09/01/2021 21:04:16  krbtgt/AMER.DELL.COM@AMER.COMPANY.COM
        renew until 09/08/2021 11:04:16
[root@nwpllv8bu100 sssd]#
I searched /etc/sssd/sssd.conf -- no "zzz" or "ZZZ" string is anywhere in there. So where is sssd picking up this name ZZZKBTDURBOL8 and passing it to adcli update?
Spike
On Wed, Sep 1, 2021 at 2:46 AM Sumit Bose sbose@redhat.com wrote:
On Tue, Aug 31, 2021 at 09:53:01PM +0200, Alexey Tikhonov wrote:
On Tue, Aug 31, 2021 at 6:47 PM Spike White <spikewhitetx@gmail.com> wrote:
> All,
>
> OK, we have a query we run in AD for machine account passwords of a certain age. In today's run, 31 - 32 days. Then we verify it's pingable.
>
> We have found one such suspicious candidate today (two actually, but the other Linux server is quite sick). So one good research candidate. According to both AD and /etc/krb5.keytab file, the machine account password was last set on 7/29. Today is 8/31, so that would be 32 days. This 'automatic machine account keytab renewal' background task should trigger again today.
>
> sssd service was last started 2 weeks ago and, by all appearances, appears healthy. sssctl domain-status <domain> shows online, connected to AD servers (both domain and GC servers). All logins and group enumerations working as expected.
>
> Just now, we dynamically set the debug level to 9 with 'sssctl debug-level 9'. This particular server is Oracle Linux 8.4, running sssd-*-2.4.0-9.0.1.el8_4.1.x86_64. Installed July 13th, so a very recent sssd version. (This problem occurs with both RHEL & OL 6/7/8, it's just today's candidate happens to be OL8.)
>
> We can't keep debug level 9 up for a great many days; it swamps the /var/log filesystem. But we can leave it up for a few days. We purposely did not restart sssd server as we know that would trigger a machine account renewal.
>
> Speaking of that -- from Sumit's sssd source code in ad_provider/ad_machine_pw_renewal.c, it appears that sssd is creating a back-end task to call external program /usr/sbin/adcli with certain args. What string can I look for in which sssd log file (now that I have debug level 9 enabled) to tell me when this 'adcli update' task (aka 'automatic machine account keytab renewal') is triggered?
>
It seems SSSD itself only logs in case of errors. I didn't find any explicit logs around `ad_machine_account_password_renewal_send()`. But perhaps there will be something like "[be_ptask_execute] (0x0400): Task [AD machine account password renewal]: executing task" from generic be_ptask_* helpers in the sssd_$domain.log (I'm not sure).
Also at this verbosity level `--verbose` should be supplied to adcli itself and I guess output should be captured in sssd_$domain.log as well. I'm not familiar with `adcli` internals, you can take a glance at https://gitlab.freedesktop.org/realmd/adcli to find its log messages.
Hi,
if SSSD's debug_level is 7 or higher the '--verbose' option is set when calling adcli and the output is added to the backend logs. It will start with log message "--- adcli output start---".
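
A minimal sketch for pulling those adcli blocks out of a backend log, assuming the start and end markers look like the ones quoted in this thread ("adcli output start" and "adcli output end") and that the adcli text sits on the lines between them. The helper name `adcli_blocks` is hypothetical, and the log path in the usage comment assumes the default /var/log/sssd directory.

```shell
#!/bin/sh
# Hypothetical helper: print every adcli output block found in an SSSD
# backend log, from the line carrying the start marker through the line
# carrying the end marker. Markers are matched loosely because the spacing
# around the dashes differs between the start and end lines.
adcli_blocks() {
    awk '/adcli output start/ { inside = 1 }
         inside               { print }
         /adcli output end/   { inside = 0 }' "$1"
}

# Example usage (path is an assumption based on the sssd_$domain.log naming):
#   adcli_blocks /var/log/sssd/sssd_amer.company.com.log
```

The timestamp on the start-marker line shows when the renewal task actually ran, which helps when debug level 9 can only stay enabled for a few days.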
HTH
bye, Sumit
>
> I'm less certain now that we've surveyed our env that this background 'adcli update' task is the reason behind 70 - 80 servers / month dropping off the domain. It might be a slight contributor, but I find only a very few pingable servers with machine account last renewal date between 30 and 40 days.
>
> Yes, I can disable this default 30 day automatic update and roll my own 'adcli update' cron. But that's a mass deployment, to fix what might not be the problem. I want to verify this is the actual culprit before I take those drastic steps.
>
> Spike
>