Matthew,
My team runs at least 10 distinct IDM farms (realms), and our largest has 20 IDM replicas
all over the world. All are RHEL7 (and RHEL6 when it was a thing), half use AD trusts,
and the others are self-contained. After many years and far more replica re-inits than I
care to remember, we have found the following points worth considering.
* WAN latency is your enemy! Place your replicas in well-connected locations, and let
your far-flung locations be clients.
* WAN resiliency is a must! If you have frequent network isolations, or bouncy WAN
links in certain locations, don’t put replicas there. Again, point the clients in those
sites to your core replica farm.
* Follow a 4-node “tightly coupled” model as best you can. Put no fewer than 2, and no
more than 4, replicas in one location.
* Keep your per-node replication agreements to 4 or fewer, and spread them across
different links (see the topology sketch after this list).
* Let SSSD caching help you on your truly remote clients. Resist the temptation (or
pressure) to put an IDM server in a remote location just to speed up individual login
performance; you’ll hate yourself months later when you have to spend a week re-init’ing
every server to get them back in sync.
* Think hard about how much your content changes, and where. E.g.:
* If you are registering new clients all the time in 60 different locations,
replication storms and contention can become a thing.
* If your content is relatively static, it'll tolerate less-than-ideal connectivity
better.
* If you can focus the majority of your content changes in one location, with most
other sites being “read-only”, you might tolerate WAN issues better.
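To make the agreement math concrete, here is a minimal sketch of one 4-node tightly
coupled site, written as a tiny Python script that emits the FreeIPA topology-plugin
commands. The hostnames and the "dal" site are made-up examples; a full mesh of 4 nodes
is 6 segments, i.e. 3 agreements per node, which stays under the 4-agreement ceiling.

#!/usr/bin/env python3
# Sketch: emit "ipa topologysegment-add" commands for one 4-node,
# tightly coupled site. Hostnames below are hypothetical; the full
# mesh of 4 nodes = 6 segments = 3 agreements per node (under the cap).
from itertools import combinations

site_nodes = [
    "idm1.dal.example.com",
    "idm2.dal.example.com",
    "idm3.dal.example.com",
    "idm4.dal.example.com",
]

for left, right in combinations(site_nodes, 2):
    name = f"{left.split('.')[0]}-to-{right.split('.')[0]}"
    print(f"ipa topologysegment-add domain {name} "
          f"--leftnode={left} --rightnode={right}")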
Without knowing your network, it is hard to say, but let’s imagine you’re in AWS. You
might put 4-node replication clusters in US-East, US-West, a couple of clusters in EU, a
couple of clusters in APAC, etc. Set up your replication agreements such that APAC can
replicate from US and EU, US from APAC and EU, and EU from APAC and US. Then point each
client site to the nearest IDM replicas (hint: DNS SRV records). See Figure 3.2 in the
RH doc you shared:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/...
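If you want to sanity-check which replicas a given client site will actually discover
from those SRV records, here's a quick sketch. It assumes the dnspython package, and
the example.com domain and "sydney" IDM location are made-up names:

#!/usr/bin/env python3
# Sketch: resolve the same SRV records SSSD/ipa-client-install use for
# server discovery, to see where a client site is being steered.
# Requires dnspython; example.com and "sydney" are hypothetical.
import dns.resolver

for qname in ("_ldap._tcp.sydney._locations.example.com",  # per-location
              "_ldap._tcp.example.com"):                   # realm-wide
    try:
        answers = dns.resolver.resolve(qname, "SRV")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        print(f"{qname}: nothing")
        continue
    for rr in sorted(answers, key=lambda r: (r.priority, -r.weight)):
        print(f"{qname} -> {rr.target} (prio {rr.priority}, "
              f"weight {rr.weight})")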
What you don’t want: a user in one very remote location changes their password, another
system registers as a new client on the other side of the world, and the only way for
those changes to reach every IDM server is 10 slow hops. As a
matter of perspective: In our 20-node far-flung (less-than-ideal topology) IDM farm, if
we mass-delete host entries, we do 100 at a time and wait 5 minutes before doing another
100 to let the deletion wave traverse the entire farm, roughly as sketched below.
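For reference, that batching discipline is nothing fancier than something like this.
The 100/5-minute numbers are the ones we use; "stale_hosts.txt" is a hypothetical
input file, and it assumes a valid admin Kerberos ticket and the ipa CLI on the box:

#!/usr/bin/env python3
# Sketch: mass-delete host entries 100 at a time, pausing 5 minutes
# between batches so each deletion wave can traverse the whole farm.
# "stale_hosts.txt" (one FQDN per line) is a made-up input file.
import subprocess
import time

BATCH_SIZE = 100
PAUSE_SECONDS = 300  # 5 minutes between waves

with open("stale_hosts.txt") as f:
    hosts = [line.strip() for line in f if line.strip()]

for i in range(0, len(hosts), BATCH_SIZE):
    batch = hosts[i:i + BATCH_SIZE]
    # ipa host-del accepts multiple FQDNs in one invocation
    subprocess.run(["ipa", "host-del", *batch], check=True)
    if i + BATCH_SIZE < len(hosts):
        print(f"deleted {i + len(batch)}/{len(hosts)}; "
              "waiting for replication...")
        time.sleep(PAUSE_SECONDS)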
Cautionary Tale for you and anybody else reading this: This “happened to a friend” 😉
several years ago, and we learned many lessons from it… Imagine you have an autoscaling
group in AWS or whatever-cloud, and there’s an automated process to register the client to
IDM. Now, imagine there is something flawed in the ASG instances which causes them to
terminate immediately and spin up a new instance. Demand calls for 100 new instances, and
left over the weekend the ASG spins up (and terminates) 40,000 instances across multiple
regions and zones. That means 40,000 host-adds swarming your IDM farm. It will
collapse, impacting every existing client’s ability to log in or do anything – don’t be
“that guy”.
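One way to guard against that (a rough sketch, not our exact tooling): don't let
instances enroll themselves directly. Funnel enrollment through a long-running broker
that pre-creates each host entry with a one-time password via "ipa host-add --random"
and refuses to exceed a sane sustained rate. The MAX_PER_MINUTE ceiling below is a
made-up number; it assumes an admin Kerberos ticket and the ipa CLI:

#!/usr/bin/env python3
# Sketch: rate-limited enrollment broker. Pre-creates each host with a
# one-time password (ipa host-add --random) and applies backpressure
# instead of letting a runaway ASG swarm the IDM farm.
import subprocess
import sys
import time

MAX_PER_MINUTE = 30  # hypothetical ceiling for sustained host-adds
recent = []          # timestamps of recent host-adds (in-process state)

def enroll(fqdn):
    """Create the host entry and return its one-time password."""
    # Throttle: never exceed MAX_PER_MINUTE sustained host-adds.
    while True:
        now = time.time()
        recent[:] = [t for t in recent if now - t < 60]
        if len(recent) < MAX_PER_MINUTE:
            break
        time.sleep(5)  # backpressure instead of swarming the farm
    out = subprocess.run(["ipa", "host-add", fqdn, "--random"],
                         capture_output=True, text=True, check=True)
    recent.append(time.time())
    for line in out.stdout.splitlines():
        if "Random password:" in line:
            return line.split(":", 1)[1].strip()
    raise RuntimeError("no OTP in ipa output")

if __name__ == "__main__":
    for fqdn in sys.stdin:  # one FQDN per line from your provisioning queue
        print(enroll(fqdn.strip()))

The instance then enrolls itself with "ipa-client-install --password=<OTP>", so a
misbehaving ASG hits the broker's ceiling instead of your replication topology.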
Sorry for the long post (and I hope it formats OK), but I truly hope it helps you and
others. The 60-replica limit is probably based on a lab environment with nearly 0 ms
latency between all nodes. But again, my experience is RHEL7. RHEL8 seems to have some
extra capability, and upstream IPA even more, so YMMV.
Best Regards,
--
| Pat Larkin | Patrick.Larkin@Sabre.com | Texas USA |
| Manager | Linux Engineering |
| +1.682.213.4281 |
| http://go/linuximo | http://go/LinuxOps |
-----------------------------------------------------
From: Matthew Davis via FreeIPA-users <freeipa-users(a)lists.fedorahosted.org>
Sent: Thursday, September 15, 2022 4:53 PM
To: FreeIPA users list <freeipa-users(a)lists.fedorahosted.org>
Cc: Matthew Davis <fedoraproject(a)virtual.drop.net>
Subject: [Freeipa-users] Replication topology size limitations?
…
I have over 60 geographical locations where I was hoping to place a replica. I will
easily exceed the 60-replica limitation outlined in the documentation [1]. Can anyone
elaborate on the 60-replica limitation? Is this a hard limit? What are the contributing
factors for the existing limitation?
Each location will have far fewer than 2,000 clients. Are there any considerations that
could accommodate a larger number of replica servers?
Thanks
--
________________________________
Matthew Davis
[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/...