On 3 November 2017 at 09:02, Lukas Slebodnik <lslebodn@redhat.com> wrote:
On (03/11/17 08:53), Lachlan Musicman wrote:
>On 3 November 2017 at 08:19, Lukas Slebodnik <lslebodn@redhat.com> wrote:
>
>> On (02/11/17 08:20), Lachlan Musicman wrote:
>> >Last night sssd shutdown on one of my servers.
>> >
>> >I had updated the IPA server earlier in the day - but only patches to
>> >4.5.0, nothing major.
>> >
>> >The error I saw this AM was:
>> >
>> >
>> >(Wed Nov  1 17:08:22 2017) [sssd[be[unix.domain.com]]] [orderly_shutdown]
>> >(0x0010): SIGTERM: killing
>> >children
>> >(Wed Nov  1 17:08:50 2017) [sssd[be[unix.domain.com]]]
>> >[sysdb_domain_cache_connect] (0x0010): DB version too old [0.18], expected
>> >[0.19] for domain unix.domain.com!
>>
>> sysdb version 0.19 is only in sssd-1.16.0 which is not in el7.4 by default.
>>
>
>
>Ah!
>
>And we are using the SSSD 1.16.0 from COPR.
>
>Hmm. What should we do? All of our servers are using sssd from the COPR
>repo and our IPA server is using the CentOS repos for ipa-*.
>

sssd cache should be upgraded after restart. I have no idea how it is
possible that new binaries are used and sssd cache is old.

In theory, one explanation is that sssd itself was not restarted but the
backend (sssd_be) was, and thus the new version of the binary was used.

Another explanation is that the upgrade failed for some reason.
But in that case I would expect sssd not to run at all.

It would be good if you could provide more details or even
reproducer :-)

No doubt! I'm sorry I can't be more helpful. It was hard to diagnose exactly what the problem was, because there were various symptoms that turned out to be unrelated. We were in the middle of working out why the replica installation was failing (ipa-replica-conncheck kept failing even though we could log in via ssh in both directions) when it all happened.

The client in question is relatively busy (it is the login node to the cluster), so the important thing, for my manager, was to make it work and damn the diagnostics. Because the upgrade had worked in my dev environment, I didn't think to start collecting logs or to worry.

Of course, now I have the unenviable problem that my manager is update-shy and doesn't want to upgrade the IPA server again. The plan is to clone the problematic client, boot the clone into the dev domain, and test there. I can report back on reproducibility then. Debug logging has been turned back on in prod.

Also, I can't remember clearing the cache on the client after the update. I probably didn't, given the size of our domain and the high usage of that server. So it may be that the cache has since been upgraded on its own and the problem has disappeared.
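
In case it helps anyone searching the archives for the same message, here is a minimal sketch of how one might scan a backend log for the sysdb version mismatch quoted earlier in the thread. The regular expression is modelled only on that one log line, so the exact wording is an assumption and may differ between sssd versions. The function name find_sysdb_mismatches is made up for illustration and is not part of sssd.

```python
import re

# Modelled on the single log line quoted in this thread; treat the exact
# wording as an assumption, since it may vary between sssd versions.
MISMATCH = re.compile(
    r"DB version too old \[(?P<found>[\d.]+)\],\s*expected\s*"
    r"\[(?P<expected>[\d.]+)\] for domain (?P<domain>[^\s!]+)"
)


def find_sysdb_mismatches(log_text):
    """Return a (found, expected, domain) tuple for each mismatch message.

    Hypothetical helper for illustration only.
    """
    # Collapse all whitespace so a message that was wrapped across lines
    # (as in the mail above) still matches the pattern.
    flat = " ".join(log_text.split())
    return [
        (m["found"], m["expected"], m["domain"])
        for m in MISMATCH.finditer(flat)
    ]
```

To try it, read the domain log into a string and pass it in. On our clients that log lives under /var/log/sssd/, but the path is an assumption about your setup. An empty list means the message was not found in that file, which is not proof that the cache was upgraded.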


Cheers
L.


------
"The antidote to apocalypticism is apocalyptic civics. Apocalyptic civics is the insistence that we cannot ignore the truth, nor should we panic about it. It is a shared consciousness that our institutions have failed and our ecosystem is collapsing, yet we are still here — and we are creative agents who can shape our destinies. Apocalyptic civics is the conviction that the only way out is through, and the only way through is together. "

Greg Bloom @greggish https://twitter.com/greggish/status/873177525903609857