On 3 November 2017 at 09:02, Lukas Slebodnik <lslebodn(a)redhat.com> wrote:
On (03/11/17 08:53), Lachlan Musicman wrote:
>On 3 November 2017 at 08:19, Lukas Slebodnik <lslebodn(a)redhat.com> wrote:
>
>> On (02/11/17 08:20), Lachlan Musicman wrote:
>> >Last night sssd shut down on one of my servers.
>> >
>> >I had updated the IPA server earlier in the day - but only patches to
>> >4.5.0, nothing major.
>> >
>> >The error I saw this AM was:
>> >
>> >
>> >(Wed Nov 1 17:08:22 2017) [sssd[be[unix.domain.com]]] [orderly_shutdown]
>> >(0x0010): SIGTERM: killing children
>> >(Wed Nov 1 17:08:50 2017) [sssd[be[unix.domain.com]]]
>> >[sysdb_domain_cache_connect] (0x0010): DB version too old [0.18], expected
>> >[0.19] for domain unix.domain.com!
>>
>> sysdb version 0.19 is only in sssd-1.16.0 which is not in el7.4 by
>> default.
>>
>
>
>Ah!
>
>And we are using the SSSD 1.16.0 from COPR.
>
>Hmm. What should we do? All of our servers are using sssd from the COPR
>repo and our IPA server is using the CentOS repos for ipa-*.
>
The sssd cache should be upgraded after a restart. I have no idea how it is
possible that the new binaries are in use while the sssd cache is still old.
In theory, one explanation is that sssd itself was not restarted but the
backend (sssd_be) was, and thus the new version of the binary was used.
Another explanation is that the upgrade failed for some reason.
But in that case I would expect sssd not to run at all.
It would be good if you could provide more details or even a
reproducer :-)

No doubt! I'm sorry I can't be more helpful - it was hard to diagnose what
the problem was exactly because of various symptoms which turned out to be
unrelated. We were in the middle of diagnosing why the replica installation
was failing (ipareplica-conncheck kept failing despite being able to log in
via ssh in both directions) when it all happened.

The client in question is relatively busy - the login node to the cluster -
and the important thing (for my manager) was to make it work and damn the
diagnostics. Because the upgrade had worked in my dev environment, I didn't
think to start taking logs or to worry.

Of course, now I have the unenviable problem that my manager is update-shy
and doesn't want to upgrade the IPA server again. The plan is to clone the
problematic client, boot the clone into the dev domain and test there. I
can report back on reproducibility then. Debug has been turned back on in
prod.

Also, I can't remember clearing the cache on the client after the update -
I probably didn't, due to the size of our domain and the high usage of that
server. So it may be that the problem has disappeared.
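
For anyone who wants to watch for the same message once debug logging is on,
here is a minimal sketch in Python. It assumes the message keeps the exact
wording quoted above ("DB version too old [x], expected [y]"); the function
name and the sample string are my own, for illustration only, and are not
part of sssd.

```python
import re

# Pattern for the sysdb version-mismatch message quoted earlier in this thread.
# Assumption: the found and expected versions always appear in square brackets
# in this order; other sssd releases may word the message differently.
MISMATCH_RE = re.compile(
    r"DB version too old \[(?P<found>[\d.]+)\],\s*expected\s*\[(?P<expected>[\d.]+)\]"
)


def find_version_mismatch(log_text):
    """Return (found, expected) from the first mismatch message, or None."""
    # Collapse whitespace so a message wrapped across lines still matches.
    flat = " ".join(log_text.split())
    match = MISMATCH_RE.search(flat)
    if match is None:
        return None
    return match.group("found"), match.group("expected")


# Sample taken from the log excerpt in this thread.
sample = (
    "(Wed Nov 1 17:08:50 2017) [sssd[be[unix.domain.com]]] "
    "[sysdb_domain_cache_connect] (0x0010): DB version too old [0.18], "
    "expected [0.19] for domain unix.domain.com!"
)
print(find_version_mismatch(sample))
```

This only reports the two version numbers; it does not say why the cache was
not upgraded.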
Cheers
L.
------
"The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic civics
is the insistence that we cannot ignore the truth, nor should we panic
about it. It is a shared consciousness that our institutions have failed
and our ecosystem is collapsing, yet we are still here — and we are
creative agents who can shape our destinies. Apocalyptic civics is the
conviction that the only way out is through, and the only way through is
together. "
*Greg Bloom* @greggish
https://twitter.com/greggish/status/873177525903609857