Harry G Coin wrote:
On 1/17/24 12:55, Rob Crittenden wrote:
Harry G Coin wrote:
On 1/15/24 13:26, Rob Crittenden wrote:
Harry G Coin via FreeIPA-users wrote:
Hi! This is meant for the good future of freeipa, a package I've appreciated for some years, so across the user cultures and languages please understand it as supportive and not a complaint!
For all freeipa's 'master-master' replica technology, there remain 'some instances more primary than others', even if the topology diagrams claim equivalence. Lose 'that one that's even more primary' and, absent on-site capability and a high-learning-curve intervention that calls for high-bar mastery of seldom-used subsystems, you're on a track to breakage. Why? Because it's when, not if, that 'primary' system will need a major OS release upgrade (8 to 9 in the present situation). In that case, there is as yet no 'just works' upgrade path. For the master replicas that are not the super special 'even more master than the others' one, it's easy and 'it just works'... but 'for that one...' freeipa is not ready for 'prime time'.
For example, should site admins 'just know' whether there is a current kasp.db maintained in more than one place? How many know about ipa-crlgen-manage, or whether /etc/pki/pki-tomcat/ca/CS.cfg should or shouldn't have ca.certStatusUpdateInterval=0, or have the command ipa config-mod --ca-renewal-master-server at the top of their mind? SID range assignments?
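As an illustration of the kind of per-replica check being asked about, here is a minimal Python sketch that only reports whether ca.certStatusUpdateInterval is explicitly set to 0 in a CS.cfg file. The function names are made up for this example, and it deliberately makes no claim about whether 0 is the right value for a given replica; that depends on the replica's role, which is exactly the knowledge gap described above.

```python
from pathlib import Path

# Path as quoted in the message above.
CS_CFG = Path("/etc/pki/pki-tomcat/ca/CS.cfg")


def parse_cs_cfg(text):
    """Parse key=value lines, skipping blanks, comments, and malformed lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def status_update_disabled(settings):
    """Return True only when ca.certStatusUpdateInterval is explicitly 0."""
    return settings.get("ca.certStatusUpdateInterval") == "0"


if __name__ == "__main__":
    if CS_CFG.exists():
        cfg = parse_cs_cfg(CS_CFG.read_text())
        print("ca.certStatusUpdateInterval=0 present:", status_update_disabled(cfg))
    else:
        print(f"{CS_CFG} not found; is a CA installed on this host?")
```

On a live replica, `ipa-crlgen-manage status` and `ipa config-show` (which lists the CA renewal master) report the related roles directly, so a script like this would at most supplement them.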
Fundamentally, the fair question is: Which freeipa subsystems that I don't happen to have studied in dev-level detail have similar 'deep gotchas that are obvious to the one who specializes in that, but opaque to everyone else'? Not even the freeipa devs who write the docs collect all the steps in one place. While there are 'characterizations of worries', those come without steps: the advice doesn't say what steps will work, just what won't ('don't leapp upgrade').
The way forward I think is fairly doable. First is to have each 'dev that's an expert in their thing' (dns, kra, etc. etc.) make sure all 'master' level replicas have, updated, whatever 'special files' might be necessary, even if they aren't 'the extra special primary replica', and may never get used.
Second is an 'orchestration' command, to be run on a master-replica that is 'the latest os', that will, 'all in one', do all the magic to become 'the extra special primary master' and take those options off 'the old primary', even if it means installing trust/dns/etc subsystems extant on the 'old master' but missing from the 'soon to be new primary master'. An orchestration command that manages everything from moving which fqdn is authoritative in SOA records, to magic tiny entries in CS.cfg files. When that command is done, the 'old primary' becomes 'just another master replica that happens to be using an older os'. Then the 'old primary' can be discarded and replaced with the latest os and a fresh install as a master replica. At that point, it's optional whether to move the 'special primary' status 'back' to the 'now new OS master system'.
The admin pain involved at present 'for that one system that's the extra special primary' at os major release upgrade time sets too high an education bar, obviously higher than even one freeipa-dev has, as the docs prove. As such it needs a team approach to address, before OS 9 to 10 please!
There is a whole guide on considerations when doing a major RHEL release migration. Have you seen it? Is it insufficient?
I suppose an Ansible role could be created to do this type of transition and it would be a lot less error-prone.
Of all the things you mention the only really catastrophic one to miss is setting the renewal master. This can lead to all the certs expiring and general pain overall. The rest are either highly visible (no DNSSEC keys) or more often not used at all (CRL).
ipa-healthcheck tries to verify a few of these but it tends to focus only on a single system and not the entire cluster. So far anyway.
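One way to approximate a cluster-wide view with today's tooling is to collect each server's ipa-healthcheck JSON output and merge it. The Python sketch below assumes each report is a JSON list of objects carrying "source", "check" and "result" keys; the host names and check names in the demo are hypothetical, and gathering the reports from each host is left out.

```python
import json


def failures_by_host(reports):
    """Map each host to its non-SUCCESS results.

    reports: dict of host name -> JSON text produced on that host.
    Returns: dict of host name -> list of (source, check, result) tuples.
    Hosts with no problems are omitted.
    """
    problems = {}
    for host, raw in reports.items():
        entries = json.loads(raw)
        bad = [
            (e.get("source"), e.get("check"), e.get("result"))
            for e in entries
            if e.get("result") != "SUCCESS"
        ]
        if bad:
            problems[host] = bad
    return problems


if __name__ == "__main__":
    # Hypothetical sample data, not real healthcheck output.
    demo = {
        "ipa1.example.test": json.dumps(
            [{"source": "example.source", "check": "ExampleCheck", "result": "SUCCESS"}]
        ),
        "ipa2.example.test": json.dumps(
            [{"source": "example.source", "check": "ExampleCheck", "result": "ERROR"}]
        ),
    }
    for host, items in failures_by_host(demo).items():
        for source, check, result in items:
            print(f"{host}: {source}.{check} -> {result}")
```

This only aggregates what each server already reports about itself; it cannot detect cluster-level problems, such as no server holding a given role, unless an individual check already looks for them.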
Hi Rob
Yes, I did read it. You'll notice it has sections labelled 'optional' without guidance about whether the 'option' is appropriate (subsystem specific deep knowledge required). There is a certain 'ethos' or 'outlook' in your suggestion that 'highly visible' errors and mis-steps are 'ok' because (presuming you have the wit and skills to look for it, as the official guide gives no hint) they are 'highly visible'. Well, you might ask yourself the question: where does it give steps to test whether the 'highly visible' problems exist before the day comes when the underlying capability expires and suddenly it all 'just doesn't work'?
Ultimately, the fact the Freeipa team had to write an extensive and even, at this time incomplete, guide to getting something advertised as 'master' to 'just work' across an os point release... shouldn't that suggest capturing that knowledge in an orchestration command is appropriate?
Notice the 'ceph' community also took a long while to get to the place they decided ansible and other approaches were not going to 'cut it' and created their own 'orchestrator' which vaulted the effort from 'useful for those who don't need to know other things as well' to 'broadly deployable'.
I think what we are seeing here in Freeipa is an emerging reality common to all systems that aim to treat a collection in partnership as 'one thing'.
If you have specific suggestions for the documentation I think we're all ears.
Writing an orchestrator from scratch for IPA is simply out of the question. Ansible is the best bet here.
IPA has always been about herding cats and it's a pain point we are well aware of.
Hi Rob
I may just take you up on that! Which person or group in your company or the freeipa community is presently charged with signing off on quality assurance across major OS transitions?
There is no one person. We have a test which does these upgrades. Whether this checks for CA renewal or CRL control I do not know.
I suppose the best place to start is an upstream ticket on pagure.io/freeipa.