So the DNS overload was my own fault. I was using 'while' in Ansible and
doing one entry at a time instead of just generating a playbook that adds
multiple entries. I've tested with 100 entries and saw a single update per
zone on the replicas, so I've sorted that. I shouldn't Ansible on almost no
sleep.
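For reference, this is roughly what the batched version looks like. It's a
rough, untested sketch using the ansible-freeipa ipadnsrecord module, with
placeholder host names and addresses (the real list is generated):

  - name: Add DNS entries in batches
    hosts: ipaserver
    become: true
    tasks:
      - name: Add a batch of A records to example.com
        ipadnsrecord:
          ipaadmin_password: "{{ ipaadmin_password }}"
          zone_name: example.com
          state: present
          records:
            # generated list continues for the whole batch
            - name: node0001
              a_ip_address: 10.201.10.1
            - name: node0002
              a_ip_address: 10.201.10.2

One task handles the whole batch, which is what gave me a single update per
zone instead of an update per record.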
However, the question still stands: will ~9k diskless nodes booting and all
running 'ipa-client-install' with the '--force' option overload the cluster
in the same manner?
On Thu, Jan 28, 2021, 11:14 AM Mark Potter <markp(a)dug.com> wrote:
The docs say 2k to 3k hosts per FreeIPA machine. We currently have 1
server and 3 replicas for ~9k hosts. The issue is that the hosts in
question are stateless, so ipa-client-install has to be run on every boot.
We've got that part handled, but something came up that's got me concerned.
I was adding DNS records using ansible-freeipa. With needing DNS for all
of our sites along with BMC and such, we have ~38k valid DNS entries.
I was running two playbooks to add entries in parallel because we need
everything to resolve on both example.com and example1.com. This is an
artifact that can't be avoided, so we end up with ~76k entries across two
zones. The example.com entries were being added with reverse records and
the example1.com entries without.
Based on the time it took for each entry to be added, this should have
taken ~31 hours. At some point the three replicas stopped responding to
any requests. For instance, ipa1.example.com (primary) would validate
while adding a host, but ipa2.example.com (replica) would hang and never
time out. Eventually both playbooks failed at ~33k DNS entries as ssh
wasn't responding on the primary. I wasn't monitoring at that point so I
didn't get to see it happen. There is nothing from OOM in the logs, so it
doesn't look like sshd got killed from memory usage; when I was
monitoring, load never got over 2.
The VMs have 16GiB of memory, 6 cores, and a 10Gb connection. They are
running CentOS 7 with FreeIPA 4.6.5. Logs on ipa1 show:
Jan 27 11:39:14 ipa1 ns-slapd: [27/Jan/2021:11:39:14.363156372 -0600] - WARN - NSMMReplicationPlugin - acquire_replica - agmt="cn=meToipa2.example.com" (ipa2:389): Unable to receive the response for a startReplication extended operation to consumer (Timed out). Will retry later.
For both the left and right replicas (ipa2 and ipa3).
The replicas show:
Jan 27 16:38:02 ipa2 named-pkcs11[2516]: LDAP query timed out. Try to adjust "timeout" parameter
Jan 27 16:38:02 ipa2 named-pkcs11[2516]: zone example.com/IN: serial (1611787052) write back to LDAP failed
Jan 27 16:38:12 ipa2 named-pkcs11[2516]: LDAP query timed out. Try to adjust "timeout" parameter
Jan 27 16:38:12 ipa2 named-pkcs11[2516]: zone 16.172.in-addr.arpa/IN: serial (1611787062) write back to LDAP failed
Which eventually became:
Jan 27 16:57:32 ipa2 named-pkcs11[2516]: zone example.com/IN: serial (1611788192) write back to LDAP failed
Jan 27 16:58:22 ipa2 named-pkcs11[2516]: timeout in ldap_pool_getconnection(): try to raise 'connections' parameter; potential deadlock?
This was happening in the krb5kdc.log on the replicas around the same time:
Jan 27 15:20:41 ipa2.example.com krb5kdc[26712](info): AS_REQ (8 etypes {18 17 20 19 16 23 25 26}) 10.201.1.5: LOOKING_UP_CLIENT: markp(a)EXAMPLE.COM for krbtgt/EXAMPLE.COM(a)EXAMPLE.COM, Server error
In dirsrv/slapd-EXAMPLE-COM/errors in the same timeframe:
[27/Jan/2021:15:58:04.885131721 -0600] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=ipa2.example.com-to-ipa3.example.com" (ipa3:389) - Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
Though not frequent, these appeared again until I rebooted the replicas
this morning. I couldn't restart services with `ipactl restart`; it would
just hang. I let it sit for ten minutes at one point. `ipactl status`
consistently showed everything running.
My topology looks like this (both ca and domain are the same):
------------------
5 segments matched
------------------
  Segment name: ipa1.example.com-to-ipa2.example.com
  Left node: ipa1.example.com
  Right node: ipa2.example.com
  Connectivity: both

  Segment name: ipa1.example.com-to-ipa3.example.com
  Left node: ipa1.example.com
  Right node: ipa3.example.com
  Connectivity: both

  Segment name: ipa1.example.com-to-ipa4.example.com
  Left node: ipa1.example.com
  Right node: ipa4.example.com
  Connectivity: both

  Segment name: ipa3.example.com-to-ipa2.example.com
  Left node: ipa3.example.com
  Right node: ipa2.example.com
  Connectivity: both

  Segment name: ipa4.example.com-to-ipa3.example.com
  Left node: ipa4.example.com
  Right node: ipa3.example.com
  Connectivity: both
----------------------------
Number of entries returned 5
----------------------------
Since the play takes slightly more than 2 seconds per entry when creating
with reverse and slightly under 2 seconds when creating without, I don't
see why this should ever overload anything, but I will freely admit I am
not all that familiar with the way DNS is handled. If FreeIPA is sending
the entire zone for every update and it all has to be written to the DB,
then I can see why that would be an issue. I could kill the replication
agreements, load the rest of the entries, then re-add the agreements so
the zone only needs to be transferred once, but it's still a bit
concerning given the scenario I described above.
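If I do go that route, I would probably drive it from ansible-freeipa as
well. A rough, untested sketch, assuming the ipatopologysegment module
behaves the way I expect (segment name and nodes taken from my topology
above):

  - name: Drop the ipa1-to-ipa2 domain segment before the bulk load
    hosts: ipaserver
    become: true
    tasks:
      - ipatopologysegment:
          ipaadmin_password: "{{ ipaadmin_password }}"
          suffix: domain
          name: ipa1.example.com-to-ipa2.example.com
          left: ipa1.example.com
          right: ipa2.example.com
          state: absent

The same task with state: present would re-add the segment once the
remaining entries are loaded.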
If we have a power outage and need to boot ~9k machines, all of which will
run:
ipa-client-install -U -q -p <service account for adding hosts> \
-w <some really secure password> \
--domain=example.com \
--server=ipa1.example.com \
--server=ipa2.example.com \
--server=ipa3.example.com \
--server=ipa4.example.com \
--force-join \
--enable-dns-updates \
--ssh-trust-dns \
--automount-location=<appropriate map>
Are we going to see everything fail in a spectacular manner? And is there
anything I can do to mitigate the failure while adding DNS entries? I
still need to complete the additions and have ~5k entries per zone left
for two zones.
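The only mitigation I can think of on our side for the boot case is to
throttle how many hosts enroll at once rather than letting all ~9k hit the
servers together. A rough, untested sketch, assuming the ansible-freeipa
ipaclient role and its variable names (which I haven't verified), run in
controlled batches instead of at boot:

  - name: Enroll clients in batches
    hosts: ipaclients
    become: true
    serial: 500
    vars:
      ipaclient_domain: example.com
      ipaclient_force_join: true
      ipaadmin_principal: <service account for adding hosts>
      ipaadmin_password: <some really secure password>
    roles:
      - role: ipaclient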