On 1/17/24 12:55, Rob Crittenden wrote:
> Harry G Coin wrote:
>> On 1/15/24 13:26, Rob Crittenden wrote:
>>> Harry G Coin via FreeIPA-users wrote:
>>>> Hi! This is meant for the good future of freeipa, a package I've
>>>> appreciated for some years, so across the user cultures and languages
>>>> please understand it as supportive and not a complaint!
>>>>
>>>> For all freeipa's 'master-master' replica technology, there remain
>>>> 'some instances more primary than others' even if the topology
>>>> diagrams claim equivalence. Lose 'that one that's even more primary'
>>>> and (absent high-learning-curve, on-site capability, and intervention
>>>> that calls for high-bar mastery of seldom used subsystems) you're on
>>>> a track to breakage. Why? Because it's when, not if, that 'primary'
>>>> system will need a major OS release upgrade (8 to 9 in the present
>>>> situation). In that case, there is as yet no 'just works' upgrade
>>>> path. With the replicas that are not the super special 'even more
>>>> master than the others' one, it's easy and 'it just works'... but
>>>> 'for that one...' freeipa is not ready for 'prime time'.
>>>>
>>>> For example, should site admins 'just know' whether there is a
>>>> current kasp.db maintained in more than one place? How many know
>>>> about ipa-crlgen-manage, or whether /etc/pki/pki-tomcat/ca/CS.cfg
>>>> should or shouldn't have ca.certStatusUpdateInterval=0, or have the
>>>> command ipa config-mod --ca-renewal-master-server at the top of
>>>> their mind? SID range assignments?
>>>>
>>>> Fundamentally, the fair question is: Which freeipa subsystems that I
>>>> don't happen to have studied in dev-level detail have similar 'deep
>>>> gotchas that are obvious to the one who specializes in that, but
>>>> opaque to everyone else'? Not even the freeipa devs who write the
>>>> docs collect all the steps in one place. While there are
>>>> 'characterizations of worries', those come without steps; the advice
>>>> doesn't say what steps will work, just what won't ('don't leapp
>>>> upgrade').
>>>>
>>>> The way forward I think is fairly doable. First is to have each 'dev
>>>> that's an expert in their thing' (dns, kra, etc.) make sure all
>>>> 'master' level replicas have, kept up to date, whatever 'special
>>>> files' might be necessary, even if they aren't 'the extra special
>>>> primary replica' and may never get used.
>>>>
>>>> Second is an 'orchestration' command, to be run on a master replica
>>>> that is on 'the latest os', that will, 'all in one', do all the magic
>>>> to become 'the extra special primary master' and take those options
>>>> off 'the old primary', even if it means installing trust/dns/etc.
>>>> subsystems extant on the 'old master' but missing from the 'soon to
>>>> be new primary master'. An orchestration command that manages
>>>> everything from moving which fqdn is authoritative in SOA records, to
>>>> magic tiny entries in CS.cfg files. When that command is done, the
>>>> 'old primary' becomes 'just another master replica that happens to be
>>>> using an older os'. Then the 'old primary' can be discarded and
>>>> replaced with the latest os and a fresh install as a master replica.
>>>> At that point, it's optional whether to move the 'special primary'
>>>> status 'back' to the 'now new OS master system'.
>>>>
>>>> The admin pain involved at present 'for that one system that's the
>>>> extra special primary' at os major release upgrade time sets too high
>>>> an education bar, obviously higher than even one freeipa-dev has, as
>>>> the docs prove, and as such needs a team approach to address, before
>>>> OS 9 to 10 please!
>>> There is a whole guide on considerations when doing a major RHEL
>>> release migration. Have you seen it? Is it insufficient?
>>>
>>> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/...
>>>
>>> I suppose an Ansible role could be created to do this type of
>>> transition and it would be a lot less error-prone.
>>>
>>> Of all the things you mention the only really catastrophic one to miss
>>> is setting the renewal master. This can lead to all the certs expiring
>>> and general pain overall. The rest are either highly visible (no DNSSEC
>>> keys) or more often not used at all (CRL).
>>>
>>> ipa-healthcheck tries to verify a few of these but it tends to focus
>>> only on a single system and not the entire cluster. So far anyway.
>>>
>>> rob
>> Hi Rob
>>
>> Yes, I did read it. You'll notice it has sections labelled 'optional'
>> without guidance about whether the 'option' is appropriate (subsystem
>> specific deep knowledge required). There is a certain 'ethos' or
>> 'outlook' in your suggestion that 'highly visible' errors and mis-steps
>> are 'ok' because (presuming you have the wit and skills to look for it
>> as the official guide gives no hint) they are 'highly visible'. Well
>> you might ask yourself the question: where does it give steps to test
>> whether the 'highly visible' problems exist before the day comes the
>> underlying capability expires and suddenly it all 'just doesn't work'?
>>
>> Ultimately, the fact the Freeipa team had to write an extensive and
>> even, at this time, incomplete guide to getting something advertised as
>> 'master' to 'just work' across a major os release... shouldn't that
>> suggest capturing that knowledge in an orchestration command is
>> appropriate?
>>
>> Notice the 'ceph' community also took a long while to get to the place
>> they decided ansible and other approaches were not going to 'cut it' and
>> created their own 'orchestrator' which vaulted the effort from 'useful
>> for those who don't need to know other things as well' to 'broadly
>> deployable'.
>>
>> I think what we are seeing here in Freeipa is an emerging reality common
>> to all systems that aim to treat a collection in partnership as 'one
>> thing'.
>>
> If you have specific suggestions for the documentation I think we're all
> ears.
>
> Writing an orchestrator from scratch for IPA is simply out of the
> question. Ansible is the best bet here.
>
> IPA has always been about herding cats and it's a pain point we are well
> aware of.
>
> rob
>
Hi Rob

I may just take you up on that! What person or group in your company
or the freeipa community is presently charged with signing off on
quality assurance across major OS transitions?

There is no one person. We have a test which does these upgrades.
Whether this checks for CA renewal or CRL control I do not know.

I suppose the best place to start is an upstream ticket on
pagure.io/freeipa.

rob
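
For anyone wanting a starting point for the kind of cluster-wide check
discussed above, here is a minimal, read-only sketch in Python. It only
parses text you have already collected from each replica: the contents
of /etc/pki/pki-tomcat/ca/CS.cfg and the output of `ipa config-show`.
The function names are hypothetical, and the 'IPA CA renewal master'
label is an assumption about how `ipa config-show` prints that field, so
adjust it to match what your version emits. It changes nothing and is
not a substitute for ipa-healthcheck or the migration guide.

```python
"""Hypothetical read-only look at the 'special' CA settings named in this thread."""


def parse_cs_cfg(text):
    """Parse CS.cfg-style 'key=value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        settings[key.strip()] = value.strip()
    return settings


def renewal_master_from_config_show(output):
    """Return the host on the 'IPA CA renewal master' line, or None if absent.

    The label text is an assumption about the `ipa config-show` output format.
    """
    for line in output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "IPA CA renewal master":
            return value.strip()
    return None


def describe_replica(fqdn, cs_cfg_text, config_show_output):
    """Summarize, for one replica, the two CA items raised in the thread."""
    cfg = parse_cs_cfg(cs_cfg_text)
    return {
        "fqdn": fqdn,
        "is_renewal_master": renewal_master_from_config_show(config_show_output) == fqdn,
        # None means the key is not present in this replica's CS.cfg
        "cert_status_update_interval": cfg.get("ca.certStatusUpdateInterval"),
    }


if __name__ == "__main__":
    # Sample inputs only; real data would be gathered from each replica.
    sample_cfg = "ca.certStatusUpdateInterval=0\n"
    sample_show = "  IPA CA renewal master: ipa1.example.test\n"
    print(describe_replica("ipa2.example.test", sample_cfg, sample_show))
```

Running describe_replica once per server and comparing the results side
by side would show, for example, whether exactly one replica is the
renewal master. Which ca.certStatusUpdateInterval value is right for a
given replica is exactly the kind of subsystem knowledge the thread is
about, so the sketch reports the value and does not judge it.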