I tried running `sudo service named-pkcs11 stop` before the yum update, but FreeIPA still returned NXDOMAIN responses temporarily.

These responses seem to occur about 10 seconds after the last log entry in /var/log/ipaupgrade.log ("The ipa-server-upgrade command was successful"). Based on the IPA "posttrans" script from the RPM, the NXDOMAIN responses are likely being returned while the `/bin/systemctl restart ipa.service` command is running. However, I cannot reproduce them by running `/bin/systemctl restart ipa.service` on its own, so something in the yum upgrade or ipa-server-upgrade process seems to trigger this different behaviour.
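
One way to line the NXDOMAIN window up against the timestamps in /var/log/ipaupgrade.log is to timestamp every dig response and flag the ones that come back NXDOMAIN without the `aa` (authoritative answer) flag, since that is what distinguishes the bad responses from a normal NXDOMAIN in the dig output quoted below. The following is only a rough sketch, assuming bash, awk and dig are available; the function names `classify_dig` and `monitor_dns` and the labels they print are made up for illustration and are not part of FreeIPA or BIND.

```shell
#!/bin/bash
# classify_dig (hypothetical helper): read dig output on stdin and print
# a one-word classification based on the header status and the aa flag.
classify_dig() {
    awk '
        /->>HEADER<<-/ {
            # e.g. ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 2889"
            for (i = 1; i <= NF; i++) {
                if ($i == "status:") {
                    status = $(i + 1)
                    sub(/,$/, "", status)
                }
            }
        }
        /^;; flags:/ {
            # The flags sit between "flags:" and the first ";" that follows.
            line = $0
            sub(/^;; flags:/, "", line)
            sub(/;.*/, "", line)
            aa = (line ~ /(^| )aa( |$)/) ? 1 : 0
        }
        END {
            if (status == "") print "no-response"
            else if (status == "NXDOMAIN" && !aa) print "nxdomain-not-authoritative"
            else if (status == "NXDOMAIN") print "nxdomain-authoritative"
            else print tolower(status)
        }
    '
}

# monitor_dns (hypothetical helper): query NAME against SERVER in a loop
# and print one timestamped classification per query. Runs until interrupted.
monitor_dns() {
    local name="$1" server="$2"
    while true; do
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" \
            "$(dig +tries=1 +time=1 "$name" "@$server" | classify_dig)"
        sleep 0.2
    done
}
```

Running something like `monitor_dns some.dns.entry.managed.by.freeipa 172.16.0.77 | tee -a a-log-file` from another machine during the `yum update` should then make any `nxdomain-not-authoritative` lines easy to grep for and compare with the upgrade log.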

On Tue, Oct 24, 2017 at 1:45 PM Rob Crittenden <rcritten@redhat.com> wrote:
Nicholas Hinds via FreeIPA-users wrote:
> During an upgrade from 4.5.0-21.el7.centos.1.2
> to 4.5.0-21.el7.centos.2.2 on a CentOS 7.4 machine, FreeIPA's DNS server
> briefly returned NXDOMAIN for records which existed in FreeIPA. These
> invalid responses were returned for a very short amount of time, but
> caused long-running issues with Java clients which tend to cache DNS
> responses. Upgraded packages included: 389-ds-base, 389-ds-base-libs,
> 389-ds-base-snmp, ipa-client, ipa-client-common, ipa-python-compat,
> ipa-server, ipa-server-common, ipa-server-dns, ipa-server-trust-ad,
> python2-ipa-server, and a dozen sss-related packages.
>
> I reproduced this in a FreeIPA test environment by running `while true;
> do dig some.dns.entry.managed.by.freeipa @ip.address.of.freeipa | tee -a
> a-log-file; done` from one server, and running `yum update` on the
> FreeIPA machine. The invalid NXDOMAIN responses were returned some time
> after the `yum update` logged 'Cleanup' for the RPMs, and seemed to be
> during the 'Verifying' phase.
>
> These NXDOMAIN responses claimed that an upstream nameserver
> (a.root-servers.net) was the authority for
> my zone:
>
> a-log-file-; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.7 <<>>
> some.dns.entry.managed.by.freeipa @172.16.0.77
> a-log-file-;; global options: +cmd
> a-log-file-;; Got answer:
> a-log-file:;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 2889
> a-log-file-;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1,
> ADDITIONAL: 0
> a-log-file-
> a-log-file-;; QUESTION SECTION:
> a-log-file-;some.dns.entry.managed.by.freeipa. IN A
> a-log-file-
> a-log-file-;; AUTHORITY SECTION:
> a-log-file-. 60 IN SOA a.root-servers.net. nstld.verisign-grs.com.
> 2017102400 1800 900 604800 86400
> a-log-file-
> a-log-file-;; Query time: 227 msec
> a-log-file-;; SERVER: 172.16.0.77#53(172.16.0.77)
> a-log-file-;; WHEN: Tue Oct 24 18:30:28 2017
> a-log-file-;; MSG SIZE  rcvd: 130
>
> Usually when querying an invalid DNS entry, the dig output still claims
> that my FreeIPA server is authoritative for the zone:
> $ dig doesntexist.zone.managed.by.freeipa @172.16.0.77
>
> ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.7 <<>>
> doesntexist.zone.managed.by.freeipa @172.16.0.77
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 59953
> ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;doesntexist.zone.managed.by.freeipa. IN A
>
> ;; AUTHORITY SECTION:
> zone.managed.by.freeipa. 30 IN SOA idm01.freeipa.
> hostmaster.zone.managed.by.freeipa. 1508869828 30 900 1209600 30
>
> ;; Query time: 0 msec
> ;; SERVER: 172.16.0.77#53(172.16.0.77)
> ;; WHEN: Tue Oct 24 19:27:12 2017
> ;; MSG SIZE  rcvd: 113
>
>
> Is it possible that during a yum update, the FreeIPA DNS server
> temporarily forgets what zones it's authoritative for (or forgets all
> DNS records) and just delegates to the upstream DNS server for half a
> second or so? Or is something else going on here?
>
> I'm open to suggestions.

The LDAP server is brought down during upgrades which is likely the
issue. bind can't connect to its backend. Why it returns NXDOMAIN I
don't know.

You may be able to work around this by manually stopping bind
before updating IPA, then starting it again afterwards.

rob