Ok, this is *very* illuminating!

I see this in sssd_amer.company.com.log:

(2021-09-01  3:44:46): [be[amer.company.com]] [ad_machine_account_password_renewal_done] (0x1000): --- adcli output start---
adcli: couldn't connect to amer.company.com domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
---adcli output end---
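
To pull every such block out of a large debug-level-9 log instead of paging through it, something like the following could work. It is only a sketch: it assumes the start and end markers always look like the ones above (modulo spacing), and the function name adcli_blocks is made up for illustration.

```python
import re

# Marker spacing differs between the start and end lines in the log above,
# so whitespace around the text is matched loosely.
START = re.compile(r"-{3}\s*adcli output start\s*-{3}")
END = re.compile(r"-{3}\s*adcli output end\s*-{3}")


def adcli_blocks(log_text):
    """Return a list of (header_line, adcli_output_lines) tuples.

    header_line is the sssd log line that carries the start marker,
    which also holds the timestamp and the sssd function name.
    """
    blocks = []
    header = None
    body = []
    for line in log_text.splitlines():
        if header is None:
            if START.search(line):
                header = line
                body = []
        elif END.search(line):
            blocks.append((header, body))
            header = None
        else:
            body.append(line)
    return blocks


# Sample text copied from the log excerpt above.
SAMPLE = """\
(2021-09-01  3:44:46): [be[amer.company.com]] [ad_machine_account_password_renewal_done] (0x1000): --- adcli output start---
adcli: couldn't connect to amer.company.com domain: Couldn't authenticate as machine account: ZZZKBTDURBOL8: Preauthentication failed
---adcli output end---
"""

if __name__ == "__main__":
    for header, body in adcli_blocks(SAMPLE):
        print(header)
        for line in body:
            print("    " + line)
```

On a real system the log text would come from reading the sssd_$domain.log file rather than from SAMPLE.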

However, I don't find that host name ZZZKBTDURBOL8 anywhere on the system.  (By company convention, servers named ZZZ* are test servers that Linux SEs spin up themselves.)

The server that's not renewing its creds is named nwpllv8bu100.amer.company.com.  It's a standard dev server.  In /etc/sssd/sssd.conf, it has that name as its SASL auth ID:

[root@nwpllv8bu100 sssd]# grep sasl /etc/sssd/sssd.conf
ldap_sasl_authid = host/nwpllv8bu100.amer.dell.com@AMER.COMPANY.COM
[root@nwpllv8bu100 sssd]#

If I do 'kinit -k', the ticket obtained from /etc/krb5.keytab carries that name as well:

[root@nwpllv8bu100 sssd]# kinit -k
[root@nwpllv8bu100 sssd]# klist
Ticket cache: KCM:0
Default principal: host/nwpllv8bu100.amer.dell.com@AMER.COMPANY.COM

Valid starting       Expires              Service principal
09/01/2021 11:04:16  09/01/2021 21:04:16  krbtgt/AMER.DELL.COM@AMER.COMPANY.COM
        renew until 09/08/2021 11:04:16
[root@nwpllv8bu100 sssd]# 

I searched /etc/sssd/sssd.conf -- no "zzz" or "ZZZ" string is anywhere in there.   So where is sssd picking up this name ZZZKBTDURBOL8 and passing it to adcli update?
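
One more place that could be checked is the keytab itself, since 'kinit -k' only exercises a single principal while 'klist -k' lists every entry. Whether the keytab is actually where the name comes from is an assumption, not something the log confirms. A minimal sketch for listing the principals and flagging any that don't contain the server's short name follows; the sample klist text and the OTHERNAME$ entry are invented for illustration, and both function names are hypothetical.

```python
def keytab_principals(klist_output):
    """Collect the principal names from the text printed by 'klist -k'."""
    principals = set()
    for line in klist_output.splitlines():
        fields = line.split()
        # Entry rows start with the numeric KVNO; header rows do not.
        # Assumes plain 'klist -k' or 'klist -kt' output, where the
        # principal is the last field on the row.
        if len(fields) >= 2 and fields[0].isdigit():
            principals.add(fields[-1])
    return principals


def unexpected_principals(principals, short_hostname):
    """Return the principals that do not mention the short host name."""
    needle = short_hostname.lower()
    return sorted(p for p in principals if needle not in p.lower())


# Invented sample, NOT real output from the server discussed here.
SAMPLE_KLIST = """\
Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   3 host/nwpllv8bu100.amer.company.com@AMER.COMPANY.COM
   3 OTHERNAME$@AMER.COMPANY.COM
"""

if __name__ == "__main__":
    found = keytab_principals(SAMPLE_KLIST)
    for principal in unexpected_principals(found, "nwpllv8bu100"):
        print("unexpected keytab entry:", principal)
```

On the real server the input would be the output of 'klist -k /etc/krb5.keytab' run as root.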

Spike

On Wed, Sep 1, 2021 at 2:46 AM Sumit Bose <sbose@redhat.com> wrote:
Am Tue, Aug 31, 2021 at 09:53:01PM +0200 schrieb Alexey Tikhonov:
> On Tue, Aug 31, 2021 at 6:47 PM Spike White <spikewhitetx@gmail.com> wrote:
>
> > All,
> >
> > OK we have a query we run in AD for machine account passwords for a
> > certain age.  In today's run, 31 - 32 days.  Then we verify it's pingable.
> >
> > We have found one such suspicious candidate today (two actually, but
> > the other Linux server is quite sick).  So one good research candidate.
> > According to both AD and /etc/krb5.keytab file, the machine account
> > password was last set on 7/29.  Today is 8/31, so that would be 32 days.
> > This 'automatic machine account keytab renewal'  background task should
> > trigger again today.
> >
> > sssd service was last started 2 weeks ago and, by all appearances, appears
> > healthy.  sssctl domain-status <domain> shows online, connected to AD
> > servers (both domain and GC servers).  All logins and group enumerations
> > working as expected.
> >
> > Just now, we dynamically set the debug level to 9 with 'sssctl debug-level
> > 9'.  This particular server is Oracle Linux 8.4,
> > running sssd-*-2.4.0-9.0.1.el8_4.1.x86_64.   Installed July 13th, 2021.  So
> > -- very recent sssd version.  (This problem occurs with both RHEL & OL
> > 6/7/8, it's just today's candidate happens to be OL8.)
> >
> > We can't keep debug level 9 up for a great many days;  it swamps the
> > /var/log filesystem.  But we can leave it up for a few days.  We purposely did
> > not restart sssd server as we know that would trigger a machine account
> > renewal.
> >
> > Speaking of that -- from Sumit's sssd source code in
> > ad_provider/ad_machine_pw_renewal.c, it appears that sssd is creating a
> > back-end task to call external program /usr/sbin/adcli with certain args.
> >  What string can I look for in which sssd log file (now that I have debug
> > level 9 enabled) to tell me when this 'adcli update' task (aka 'automatic
> > machine account keytab renewal')  is triggered?
> >
>
> It seems SSSD itself only logs in case of errors. I didn't find any
> explicit logs around `ad_machine_account_password_renewal_send()`.
> But perhaps there will be something like "[be_ptask_execute] (0x0400): Task
> [AD machine account password renewal]: executing task" from generic
> be_ptask_* helpers in the sssd_$domain.log (I'm not sure).
>
> Also at this verbosity level `--verbose` should be supplied to adcli itself
> and I guess output should be captured in sssd_$domain.log as well. I'm not
> familiar with `adcli` internals, you can take a glance at
> https://gitlab.freedesktop.org/realmd/adcli to find its log messages.

Hi,

if SSSD's debug_level is 7 or higher the '--verbose' option is set
when calling adcli and the output is added to the backend logs. It will
start with log message "--- adcli output start---".

HTH

bye,
Sumit

>
>
> >
> > I'm less certain now that we've surveyed our env that this background
> > 'adcli update' task is the reason behind 70 - 80 servers / month dropping
> > off the domain.  It might be a slight contributor, but I find only a very
> > few pingable servers with machine account last renewal date between 30 and
> > 40 days.
> >
> > Yes, I can disable this default 30 day automatic update and roll my own
> > 'adcli update' cron.  But that's a mass deployment, to fix what might not
> > be the problem.   I want to verify this is the actual culprit before I take
> > those drastic steps.
> >
> > Spike
> >
> >

> _______________________________________________
> sssd-users mailing list -- sssd-users@lists.fedorahosted.org
> To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure