Re: Join command 500 errors, timeouts
by Alfred Victor
Hi FreeIPA list,
I did verify UDP is making it but for some reason ntpd commands fail in the
join anyway (seemingly by timeout), I'm not sure why yet but for now we use
--no-ntp after setting time just after boot so this isn't an issue. Curious
if anyone can spot anything helpful in those logs above please, or how we
can debug further to find out why our joins fail? The 401 not authorized
seems to be really pretty mysterious, and I'm not sure if that "Matching
credential not found" log message is a clue. Likewise, the "No permission
to join" and "Cannot contact any KDC" are troubling, and I'm not sure if
this is a clue below. So far I haven't managed to identify a network
problem and log messages seem few and far between for the nodes which fail
joins. Shared previously:
./krb5kdc.log:389805:Jun 21 14:12:07 hipa0002.redac.com krb5kdc[4746](info):
DISPATCH: repeated (*retransmit*ted?) request from 10.2.11.17, resending
previous response
Alfred
On Tue, Jun 22, 2021 at 1:12 AM Angus Clarke <angus(a)charworth.com> wrote:
> Hello
>
> I've not been following the whole topic but you can replace your UDP
> listener with netcat and indeed use netcat to simulate a client initiation
> too. This could help tell you whether your network is permitting that
> traffic or not.
>
> Duckduckgo found me this which looks suitable
>
> https://fsferrara.github.io/2020/08/02/sending-and-receiving-packets-usin...
>
> Regards
> Angus
>
> ------------------------------
> *From:* Alfred Victor via FreeIPA-users <
> freeipa-users(a)lists.fedorahosted.org>
> *Sent:* Tuesday, June 22, 2021 12:24:32 AM
> *To:* Rob Crittenden <rcritten(a)redhat.com>
> *Cc:* FreeIPA users list <freeipa-users(a)lists.fedorahosted.org>; Alfred
> Victor <alvic266(a)gmail.com>
> *Subject:* [Freeipa-users] Re: Join command 500 errors, timeouts
>
> Thanks again. I've enabled trace and will see what I find. I do know there
> may be an issue with UDP because it would log ntp failures until I did
> --no-ntp, but the port is open on firewall both sides so I believe the
> network is failing to deliver UDP.
>
> Alfred
>
> On Mon, Jun 21, 2021 at 4:39 PM Rob Crittenden <rcritten(a)redhat.com>
> wrote:
>
> Alfred Victor wrote:
> > Hi Rob and FreeIPA list,
> >
> > I wish I was able to say we are seeing success but aside from no longer
> > having the krb5kdc.log log spam, we still experience a lot of join
> > failures. However rather than seeing a ton of timeouts (still a few but
> > much less), we get a lot of errors like below, which totally puzzle us
> > as the ipactl services all report up, and mach_join account should have
> > sufficient privilege. I will note we have our own non-IPA DNS solution
> > where we have added all the records IPA should need but I can't see why
> > this change would cause our errors (and indeed all the output of "ipa
> > *dns*-update-system-records --dry-run" looks ok, not that below errors
> > appear DNS related but its the only change I can think of aside from
> > having replaced our IPA hosts with re-replicated new builds.)
> >
> > This works fine on rerun of install, and I don't see any evidence of
> > service failure on the system this happened against, but neither am I
> > aware of any intermittent network problem (perhaps there is a network
> > problem on the very first connection to IPA we have not identified? but
> > I would expect different errors than the 401 under this for instance)
> >
> > 2021-06-21T19:12:05Z DEBUG Initializing principal
> > mach_join(a)redac.COM using password
> > 2021-06-21T19:12:05Z DEBUG Starting external process
> > 2021-06-21T19:12:05Z DEBUG args=/usr/bin/kinit mach_join(a)redac.COM
> > -c /tmp/krbcc_xp2E2/ccache
> > 2021-06-21T19:12:24Z DEBUG Process finished, return code=1
> > 2021-06-21T19:12:24Z DEBUG stdout=
> > 2021-06-21T19:12:24Z DEBUG stderr=kinit: Cannot contact any KDC for
> > realm 'redac.COM' while getting initial credentials
> > 2021-06-21T19:12:24Z INFO Please make sure the following ports are
> > opened in the firewall settings:
> > TCP: 80, 88, 389
> > UDP: 88 (at least one of TCP/UDP ports 88 has to be open)
> > Also note that following ports are necessary for ipa-client working
> > properly after enrollment:
> > TCP: 464
> > UDP: 464, 123 (if NTP enabled)
> > 2021-06-21T19:12:24Z ERROR Installation failed. Rolling back changes.
> >
> >
> >
> > But then we also see things like this predominantly, in fact this seems
> > now to be the largest category of failure we observe at the highest rate
> > and makes no sense to me, because the mach_join account privileges
> > should not be changing:
> >
> > 2021-06-17T19:29:11Z DEBUG args=/usr/sbin/ipa-join -s
> > hipa0003.redac.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhipa000...>
> <http://hipa0003.redac.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhipa000...>>
> -b dc=redac,dc=com -h
> > hred2-4-18.redac.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhred2-4...>
> <http://hred2-4-18.redac.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhred2-4...>>
> -f
> > 2021-06-17T19:29:31Z DEBUG Process finished, return code=17
> > 2021-06-17T19:29:31Z DEBUG stdout=
> > 2021-06-17T19:29:31Z DEBUG stderr=HTTP response code is 401, not 200
> > 2021-06-17T19:29:31Z ERROR Joining realm failed: HTTP response code
> > is 401, not 200
> >
> >
> >
> > Some more like this, but not quite the same error:
> >
> >
> > 2021-06-17T19:29:32Z DEBUG get_ca_certs_from_ldap() error:
> > Insufficient access: SASL(-1): generic failure: GSSAPI Error:
> > Unspecified GSS failure. Minor code may provide more information
> > (Cannot contact any KDC for realm 'redac.COM')
> > 2021-06-17T19:29:32Z DEBUG Insufficient access: SASL(-1): generic
> > failure: GSSAPI Error: Unspecified GSS failure. Minor code may
> > provide more information (Cannot contact any KDC for realm
> 'redac.COM')
> >
> >
> > And still another example:
> >
> >
> >
> >
> > 2021-06-17T22:25:37Z DEBUG args=/usr/sbin/ipa-join -s
> > hipa0001.redac.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhipa000...>
> <http://hipa0001.redac.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhipa000...>>
> -b dc=redac,dc=com -h
> > hred2-4-18.redac.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhred2-4...>
> <http://hred2-4-18.redac.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhred2-4...>>
> -f
> > 2021-06-17T22:25:41Z DEBUG Process finished, return code=17
> > 2021-06-17T22:25:41Z DEBUG stdout=
> > 2021-06-17T22:25:41Z DEBUG stderr=No permission to join this host to
> > the IPA domain.
> > 2021-06-17T22:25:41Z ERROR Joining realm failed: No permission to
> > join this host to the IPA domain.
> > 2021-06-17T22:25:41Z ERROR Installation failed. Rolling back changes.
> >
> > Any thoughts or ideas please?
>
> Some of these could be network issues. I'd run ipa-client-install with
> Kerberos tracing:
>
> KRB5_TRACE=/path/to/some/log ipa-client-install <options>
>
> So that on failure you can see what KDCs were available, whether one was
> contacted, etc.
>
> rob
>
> >
> > Alfred
> >
> >
> > On Tue, Jun 15, 2021 at 3:19 PM Rob Crittenden <rcritten(a)redhat.com
> > <mailto:rcritten@redhat.com>> wrote:
> >
> > Alfred Victor wrote:
> > > I don't see a directive equivalent of SECURE_NFS to add to
> > nfs.conf (all
> > > documentation seems to still refer to the sysconfig path), or is
> > it the
> > > same? Can I just disable rpcgssd? We have no nfs mounts which are
> > > kerberized yet, and disabling rpcgssd seems to solve our problem,
> > and I
> > > can kinit after disabling rpcgssd. It also does not seem that
> > disabling
> > > rpcgssd hurts running tasks or node, but would like to confirm it's
> > > limited to nfs in function. I still have to wonder if there's a
> > better way.
> >
> > I suspect that SECURE_NFS is a no-op these days. It was necessary in
> > RHEL 5/6 for sure.
> >
> > Yes, I think you can safely disable the service. In some versions of
> > RHEL/Fedora IIRC this is symlinked to nfs-secure-server so mask that
> as
> > well.
> >
> > rob
> >
> > >
> > > Alfred
> > >
> > > On Tue, Jun 15, 2021 at 10:31 AM Alexander Bokovoy
> > <abokovoy(a)redhat.com <mailto:abokovoy@redhat.com>
> > > <mailto:abokovoy@redhat.com <mailto:abokovoy@redhat.com>>> wrote:
> > >
> > > On ti, 15 kesä 2021, Alfred Victor via FreeIPA-users wrote:
> > > >Hi Rob,
> > > >
> > > >We attempted setting sec=sys on the mount, however to our
> > surprise
> > > found
> > > >this didn't work. We then figured out that IPA install is
> adding
> > > this to
> > > >/etc/sysconfig/nfs:
> > > >
> > > >SECURE_NFS=yes
> > > >
> > > >
> > > >We tried removing this to no avail and restarting all the
> related
> > > sytstemd
> > > >units (rpcgssd, nfs, etc). Any idea why sec=sys is being
> ignored?
> > > Should we
> > > >need to set SECURE_NFS=no? On non-IPA nodes this directive
> > does not
> > > exist
> > > >at all. For now, I have also totally disabled rpcgssd as I
> think
> > > this unit
> > > >may be responsible ( it seems that it does the upcall in
> > > >https://access.redhat.com/solutions/225783
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Faccess...>
> ) so I will hope
> > that this
> > > >solves it, I don't believe anything else depends on rpcgssd
> but
> > > will soon
> > > >find out. Any suggestions please? :)
> > >
> > > Depends on nfs-utils version? I remember there had been a
> > change in
> > > configuration in upstream nfs-utils in 2019:
> > >
> > > commit c69875c8afdd877baf7139c0cd5241f70105cbd4
> > > Author: François Cami <fcami(a)redhat.com
> > <mailto:fcami@redhat.com> <mailto:fcami@redhat.com
> > <mailto:fcami@redhat.com>>>
> > > Date: Tue Feb 26 13:59:06 2019 +0100
> > >
> > > ipa-client-automount: handle NFS configuration file
> changes
> > >
> > > nfs-utils in Fedora 30 and later switched its
> configuration
> > > file from /etc/sysconfig/nfs to /etc/nfs.conf, providing a
> > > conversion service (nfs-convert.service) for upgrades.
> > > However, for new installs the original configuration file
> > > is missing. This change:
> > > * adds a tuple-based osinfo.version_number method to
> handle
> > > more kinds of OS versioning schemes
> > > * detects RHEL and Fedora versions with the the new
> nfs-utils
> > > behavior
> > > * avoids backing up the new NFS configuration file as we
> do
> > > not have to modify it.
> > >
> > > See: https://bugzilla.redhat.com/show_bug.cgi?id=1676981
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzil...>
> > >
> > > Fixes: https://pagure.io/freeipa/issue/7868
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpagure...>
> > >
> > >
> > > >
> > > >Alfred
> > > >
> > > >On Thu, Jun 10, 2021 at 2:17 PM Rob Crittenden
> > <rcritten(a)redhat.com <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>
> wrote:
> > > >
> > > >> Alfred Victor wrote:
> > > >> > Thanks very much Rob et al! I believe we have found our
> root
> > > cause and
> > > >> > the fix. If you like I'll provide some more details after
> > we're
> > > done
> > > >> > with everything.
> > > >>
> > > >> Yes, knowing the cause would be great and could be helpful
> > to others!
> > > >>
> > > >> cheers
> > > >>
> > > >> rob
> > > >>
> > > >> >
> > > >> > Alfred
> > > >> >
> > > >> > On Thu, Jun 10, 2021 at 11:02 AM Rob Crittenden
> > > <rcritten(a)redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>
> > > >> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>> wrote:
> > > >> >
> > > >> > Alfred Victor wrote:
> > > >> > > Hi all,
> > > >> > >
> > > >> > > Just curious if anyone has suggestions about that
> > please
> > > before I
> > > >> get
> > > >> > > going in a couple of hours with conversions to IPA
> > again?
> > > I did
> > > >> > the math
> > > >> > > and 1,097,471 log messages in 5 hours is about 60
> > times per
> > > >> second, so
> > > >> > > I'm gradually becoming more certain this is why we
> can
> > > only boot
> > > >> 20-30
> > > >> > > nodes at a time when we used to boot hundreds.
> However,
> > > this is
> > > >> still
> > > >> > > just a guess as I don't know the mechanism behind
> > why this
> > > >> interferes
> > > >> > > with IPA joins, some bottleneck with the KDC?
> > > >> >
> > > >> > It sure seems like this is a kerberized NFS request
> for a
> > > host that
> > > >> > doesn't provide it, or doesn't have an nfs principal.
> I
> > > think you'd
> > > >> need
> > > >> > to monitor the offending client to see what it is
> doing.
> > > >> >
> > > >> > If the KDC is being dramatically slowed down by this
> then
> > > yes, you
> > > >> could
> > > >> > see slower Apache performance because it needs to
> obtain
> > > tickets on
> > > >> > behalf of the user doing the join. Whether that would
> > represent
> > > >> itself
> > > >> > as a read timeout I don't know.
> > > >> >
> > > >> > rob
> > > >> >
> > > >> > >
> > > >> > > Alfred
> > > >> > >
> > > >> > > On Wed, Jun 9, 2021 at 3:49 PM Alfred Victor
> > > <alvic266(a)gmail.com <mailto:alvic266@gmail.com>
> > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>
> > > >> > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com> <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>>>
> > > >> > > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com> <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>
> > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>>>> wrote:
> > > >> > >
> > > >> > > Hi Rob,
> > > >> > >
> > > >> > > I have reduced that timeout and will tune it
> > further.
> > > >> > Regarding ISE
> > > >> > > errors, I think we can make the assumption that
> > this is
> > > >> > entirely an
> > > >> > > issue of the web timeouts, I haven't seen any
> > evidence
> > > >> > otherwise and
> > > >> > > will have another attempt at converting nodes
> > > tomorrow, and
> > > >> with a
> > > >> > > keener eye of what to look for I can make a
> better
> > > >> determination
> > > >> > > then. I am most concerned over what the
> underlying
> > > cause might
> > > >> be
> > > >> > > causing it to take too long and hit the
> > timeout, and
> > > don't
> > > >> want to
> > > >> > > engineer around this by changing Apache
> timeouts if
> > > we can
> > > >> instead
> > > >> > > address the root cause. I am suspicious of krb
> log
> > > messages
> > > >> > flooding
> > > >> > > our IPA systems about a service principal like
> so,
> > > but not
> > > >> sure if
> > > >> > > this is gumming up the works (or even why this
> > > message has
> > > >> started
> > > >> > > appearing since the rebuild):
> > > >> > >
> > > >> > > ./krb5kdc.log:Jun 09 10:38:51
> > redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > krb5kdc[31187](info): TGS_REQ (4 etypes {18 17 16 23})
> > > 10.1.1.27
> > > >> > <http://10.1.1.27
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2F10.1.1....>>:
> LOOKING_UP_SERVER: authtime 0,
> > > >> > host/redacted.redacted.com(a)redacted.COM for
> > > >> > nfs/nfsserver.redacted.com(a)redacted.COM, Server not
> > found in
> > > >> > Kerberos database
> > > >> > >
> > > >> > >
> > > >> > > Just in the last 5 hours alone, this log
> > message and
> > > others
> > > >> > like it (main difference is just the nodename it
> > > references) has
> > > >> > appeared 1,097,471 times. Conceivably there is also
> some
> > > log write
> > > >> > locking or something going on that could be slowing
> IPA
> > > down and
> > > >> > leading to our symptom here?
> > > >> > >
> > > >> > > Alfred
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Jun 9, 2021 at 9:19 AM Rob Crittenden
> > > >> > <rcritten(a)redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>
> > > >> > > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>>>
> > > >> wrote:
> > > >> > >
> > > >> > > Alfred Victor wrote:
> > > >> > > > Hi Rob,
> > > >> > > >
> > > >> > > > We did revert to 60s - I seem to remember
> > some
> > > ldapsearch
> > > >> > > timing out
> > > >> > > > previously but maybe we could still
> greatly
> > > reduce this
> > > >> with
> > > >> > > no ill
> > > >> > > > effect. However, we saw no change in join
> > > success either
> > > >> way
> > > >> > > and I have
> > > >> > > > not changed anything in Apache as I would
> > need
> > > to find
> > > >> the
> > > >> > > exact values
> > > >> > > > in question and I think these directive
> > changes
> > > may get
> > > >> lost
> > > >> > > with an
> > > >> > > > update? The timeout/ISE issues are new -
> we
> > > previously
> > > >> > booted
> > > >> > > many nodes
> > > >> > > > concurrently (400+) without problems but
> > now it
> > > happens
> > > >> even
> > > >> > > booting as
> > > >> > > > few as 50 results in 10-20 timeouts or
> ISE.
> > > This has
> > > >> > been the
> > > >> > > case at
> > > >> > > > least since we rebuilt and re-replicated
> the
> > > environment
> > > >> by
> > > >> > > swapping IPA
> > > >> > > > members out of the old one, but maybe
> this is
> > > unrelated.
> > > >> > Is it
> > > >> > > possible
> > > >> > > > we landed on a newer version with
> > different timeout
> > > >> > values? Is it
> > > >> > > > possible that IPA is under slightly more
> load
> > > from the
> > > >> > higher
> > > >> > > number of
> > > >> > > > nodes and has some bottleneck since we
> > last did
> > > >> conversions?
> > > >> > > We have not
> > > >> > > > been able to substantiate the latter by
> > looking at
> > > >> > > CPU/memory/io trends.
> > > >> > > > What might we investigate next to see if
> > we may
> > > have
> > > >> missed
> > > >> > > some ongoing
> > > >> > > > issue that could for instance cause
> locking
> > > problems or
> > > >> > > something else
> > > >> > > > internally in IPA to explain our symptoms?
> > > Also, I have
> > > >> > > observed some
> > > >> > > > nodes which report a successful
> > installation of IPA
> > > >> client,
> > > >> > > but have in
> > > >> > > > fact a lot of failures, for instance with
> > > mounts not
> > > >> > working from
> > > >> > > > automount setup. We will need to try to
> > > reproduce this to
> > > >> > > understand
> > > >> > > > what happened I think as we have already
> > sorted
> > > those
> > > >> nodes
> > > >> > > elsewhere
> > > >> > > > and they have got going outside of IPA.
> > > >> > > >
> > > >> > > > I am interested that you suggested "at
> least
> > > some of the
> > > >> > > clients aren't
> > > >> > > > connecting at all and increasing the
> timeout
> > > could make
> > > >> this
> > > >> > > worse" -
> > > >> > > > this might indicate some sort of network
> > problem at
> > > >> > play, but
> > > >> > > as far as
> > > >> > > > I am aware, everything is working absent
> > IPA so
> > > I do not
> > > >> > > suspect it
> > > >> > > > presently.
> > > >> > >
> > > >> > > A 60 second LDAP timeout is still way too
> big.
> > > The default
> > > >> > is 2.
> > > >> > > Unless
> > > >> > > you are seeing timeouts I'd suggest
> > lowering it.
> > > >> > >
> > > >> > > I'm only seeing hints to what you're seeing
> and
> > > the scope
> > > >> so
> > > >> > > it's hard
> > > >> > > to make further suggestions. Are all the
> > internal
> > > errors
> > > >> the
> > > >> > > same, for
> > > >> > > example? Are some failing due to LDAP
> > timeouts or
> > > are they
> > > >> all
> > > >> > > wsgi read
> > > >> > > timeouts?
> > > >> > >
> > > >> > > The Apache request timeout can be
> configured in
> > > httpd.conf
> > > >> > which is
> > > >> > > independent of the config files that IPA
> > > currently writes
> > > >> > so it
> > > >> > > should
> > > >> > > survive upgrades.
> > > >> > >
> > > >> > > rob
> > > >> > > >
> > > >> > > > Alfred
> > > >> > > >
> > > >> > > > On Mon, Jun 7, 2021 at 4:42 PM Rob
> Crittenden
> > > >> > > <rcritten(a)redhat.com
> > <mailto:rcritten@redhat.com> <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>
> > > >> > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com> <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>>
> > > >> > > > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>
> > > >> > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com> <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>>>> wrote:
> > > >> > > >
> > > >> > > > Alfred Victor wrote:
> > > >> > > > > Actually, no change happened from
> > 300-> 600
> > > >> > timeout, the
> > > >> > > web portal
> > > >> > > > > itself gave me an ISE I hadn't
> noticed
> > > when I tried
> > > >> > > clicking save!
> > > >> > > >
> > > >> > > > I wasn't clear which log to look in.
> > You'll see
> > > >> details
> > > >> > > about where the
> > > >> > > > error is caught in IPA in the Apache
> > log.
> > > To see
> > > >> LDAP
> > > >> > > timeouts you look
> > > >> > > > for err=3 in the 389-ds access log.
> > > >> > > >
> > > >> > > > But since you posted a traceback this
> > time it's
> > > >> > clear that
> > > >> > > this is just
> > > >> > > > Apache waiting for client data to
> read so
> > > any tuning
> > > >> you
> > > >> > > do needs to be
> > > >> > > > in Apache. You could try tuning
> > Timeout which
> > > >> > defaults to
> > > >> > > 60 but this
> > > >> > > > doesn't seem likely to help since at
> > least
> > > some of
> > > >> the
> > > >> > > clients aren't
> > > >> > > > connecting at all and increasing the
> > > timeout could
> > > >> make
> > > >> > > this worse.
> > > >> > > >
> > > >> > > > Please revert the searchtimeout. 300
> > seconds is
> > > >> > orders of
> > > >> > > magnitude
> > > >> > > > too big.
> > > >> > > >
> > > >> > > > rob
> > > >> > > >
> > > >> > > > >
> > > >> > > > > Alfred
> > > >> > > > >
> > > >> > > > > On Mon, Jun 7, 2021 at 3:57 PM
> > Alfred Victor
> > > >> > > <alvic266(a)gmail.com
> > <mailto:alvic266@gmail.com> <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>
> > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>>
> > > >> > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com> <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>
> > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>>>
> > > >> > > > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>
> > > >> > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com> <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>>>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>
> > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>
> > > >> > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com> <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>>>>>
> > > >> > > > > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>
> > > >> > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com> <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>>>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>
> > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>
> > > >> > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com> <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>>>>
> > > >> > > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>
> > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>>
> > > >> > <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com> <mailto:alvic266@gmail.com
> > <mailto:alvic266@gmail.com>>
> > > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>
> > <mailto:alvic266@gmail.com <mailto:alvic266@gmail.com>>>>>>> wrote:
> > > >> > > > >
> > > >> > > > > Hi FreeIPA list,
> > > >> > > > >
> > > >> > > > > I don't see any in error log
> that
> > > match `grep
> > > >> > -i "err=3"
> > > >> > > > > /var/log/httpd/error_log`. We
> > have tried
> > > >> raising
> > > >> > > > searchtimelimit as
> > > >> > > > > high as 120, then 300 (now are
> > trying
> > > 600) but
> > > >> > > observed no
> > > >> > > > > difference in the rate at which
> > nodes
> > > >> succeeded or
> > > >> > > failed in IPA
> > > >> > > > > joins. We are somewhat puzzled
> by
> > > this, as
> > > >> none of
> > > >> > > the other
> > > >> > > > values
> > > >> > > > > we are aware of might have
> changed,
> > > though it
> > > >> is
> > > >> > > possible that the
> > > >> > > > > IPA systems are under a little
> > higher
> > > demand
> > > >> > from client
> > > >> > > > systems, we
> > > >> > > > > have tried to mitigate this by
> > > shutting down
> > > >> some
> > > >> > > workflows and
> > > >> > > > > aren't sure whether we've seen
> any
> > > improvement.
> > > >> > > Short of adjusting
> > > >> > > > > apache/resource/process
> > timeouts it is
> > > >> > difficult to
> > > >> > > say what might
> > > >> > > > > be wrong. To give an example,
> > out of
> > > 250 nodes
> > > >> > > rebooted, only 112
> > > >> > > > > joined IPA successfully. Here
> > is some
> > > output
> > > >> from
> > > >> > > the error log,
> > > >> > > > > following what this looks like
> > in the ipa
> > > >> client
> > > >> > > install log
> > > >> > > > (error
> > > >> > > > > log output will match the node
> > attempt):
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > 2021-06-07T18:25:30Z DEBUG The
> > > >> ipa-client-install
> > > >> > > command
> > > >> > > > failed, exception: NetworkError:
> cannot
> > > connect to
> > > >> > > > 'https://redactednode.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredact...>
> > > >> > > >
> > <https://hauth0004.dug.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhauth0...>>':
> Internal
> > > >> > Server Error
> > > >> > > > >
> > > >> > > > > [Mon Jun 07 13:25:06.198259
> 2021]
> > > [core:error]
> > > >> > [pid
> > > >> > > 25020]
> > > >> > > > [client 10.1.24.48:47808
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2F10.1.24...>
> > <http://10.1.24.48:47808
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2F10.1.24...>
> >
> > > <http://10.1.24.48:47808
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2F10.1.24...>>
> <http://10.1.24.48:47808
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2F10.1.24...>
> >
> > > >> > <http://10.1.24.48:47808
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2F10.1.24...>
> >
> > > >> > > <http://10.1.24.48:47808
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2F10.1.24...>
> >
> > > >> > > > <http://10.1.24.48:47808
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2F10.1.24...>>]
> Script
> > timed out
> > > before
> > > >> > returning
> > > >> > > > headers: wsgi.py, referer:
> > > >> > > https://redacted.redacted.com/ipa/xml
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredact...>
> > > >> > > > <https://hauth0004.dug.com/ipa/xml
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhauth0...>
> >
> > > >> > > > >
> > > >> > > > > Different node, same time
> period:
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > [Mon Jun 07 13:24:02.178092
> 2021]
> > > [:error] [pid
> > > >> > > 25725] ipa:
> > > >> > > > INFO: [xmlserver]
> mach_join(a)redacted.COM:
> > > >> > > join(u'redacted.node.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> > <http://redacted.node.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redacted.node.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>>
> <http://redacted.node.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > <http://redacted.node.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > <http://redacted.node.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redacted.node.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> > > >> >',
> > > >> > > > nshardwareplatform=u'x86_64',
> > > >> > > >
> > > >> nsosversion=u'3.10.0-1062.18.1.1.el7.redacted.x86_64',
> > > >> > > > version=u'2.51'): TimeLimitExceeded
> > > >> > > > >
> > > >> > > > > I also saw this:
> > > >> > > > >
> > > >> > > > > [Mon Jun 07 13:25:07.103503
> 2021]
> > > [:error] [pid
> > > >> > > 25725] ipa:
> > > >> > > > ERROR: non-public: IOError: request
> data
> > > read error
> > > >> > > > > [Mon Jun 07 13:25:07.103529
> 2021]
> > > [:error]
> > > >> > [pid 25725]
> > > >> > > > Traceback (most recent call last):
> > > >> > > > > [Mon Jun 07 13:25:07.103536
> 2021]
> > > [:error] [pid
> > > >> > > 25725] File
> > > >> > > >
> > > >> >
> > "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py",
> > > >> > > line 360,
> > > >> > > > in wsgi_execute
> > > >> > > > > [Mon Jun 07 13:25:07.103542
> 2021]
> > > [:error] [pid
> > > >> > > 25725]
> > > >> > > > data = read_input(environ)
> > > >> > > > > [Mon Jun 07 13:25:07.103548
> 2021]
> > > [:error] [pid
> > > >> > > 25725] File
> > > >> > > >
> > > >> >
> > "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py",
> > > >> > > line 200,
> > > >> > > > in read_input
> > > >> > > > > [Mon Jun 07 13:25:07.103553
> 2021]
> > > [:error] [pid
> > > >> > > 25725]
> > > >> > > > return
> > > >> > environ['wsgi.input'].read(length).decode('utf-8')
> > > >> > > > > [Mon Jun 07 13:25:07.103559
> 2021]
> > > [:error]
> > > >> > [pid 25725]
> > > >> > > > IOError: request data read error
> > > >> > > > > [Mon Jun 07 13:25:07.103826
> 2021]
> > > [:error] [pid
> > > >> > > 25725] ipa:
> > > >> > > > INFO: [xmlserver]
> > mach_join(a)redacted.COM: None:
> > > >> > InternalError
> > > >> > > > > [Mon Jun 07 13:25:07.149962
> 2021]
> > > [:error] [pid
> > > >> > > 25726] ipa:
> > > >> > > > ERROR: non-public: IOError: request
> data
> > > read error
> > > >> > > > > [Mon Jun 07 13:25:07.149984
> 2021]
> > > [:error]
> > > >> > [pid 25726]
> > > >> > > > Traceback (most recent call last):
> > > >> > > > > [Mon Jun 07 13:25:07.149991
> 2021]
> > > [:error] [pid
> > > >> > > 25726] File
> > > >> > > >
> > > >> >
> > "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py",
> > > >> > > line 360,
> > > >> > > > in wsgi_execute
> > > >> > > > > [Mon Jun 07 13:25:07.149997
> 2021]
> > > [:error] [pid
> > > >> > > 25726]
> > > >> > > > data = read_input(environ)
> > > >> > > > > [Mon Jun 07 13:25:07.150002
> 2021]
> > > [:error] [pid
> > > >> > > 25726] File
> > > >> > > >
> > > >> >
> > "/usr/lib/python2.7/site-packages/ipaserver/rpcserver.py",
> > > >> > > line 200,
> > > >> > > > in read_input
> > > >> > > > > [Mon Jun 07 13:25:07.150008
> 2021]
> > > [:error] [pid
> > > >> > > 25726]
> > > >> > > > return
> > > >> > environ['wsgi.input'].read(length).decode('utf-8')
> > > >> > > > > [Mon Jun 07 13:25:07.150013
> 2021]
> > > [:error]
> > > >> > [pid 25726]
> > > >> > > > IOError: request data read error
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > After setting the timeout to
> > 600 and
> > > rebooting
> > > >> the
> > > >> > > remaining
> > > >> > > > 139 nodes from the initial set of
> 250, 83
> > > joined of
> > > >> the
> > > >> > > 139 and we
> > > >> > > > still had ISE occurring. In some
> > cases, it
> > > would ISE
> > > >> on
> > > >> > > the first
> > > >> > > > attempt, try another IPA system, and
> > > succeed. I'm
> > > >> > not sure
> > > >> > > that even
> > > >> > > > such a long timeout as 600 has helped.
> > > >> > > > >
> > > >> > > > > Alfred
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Thu, Jun 3, 2021 at 7:51 PM
> Rob
> > > Crittenden
> > > >> > > > <rcritten(a)redhat.com
> > <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>
> > > >> > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com> <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>>
> > > >> > > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>
> > > >> > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com> <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>>>
> > > >> > > > > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>
> > > >> > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com> <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>>>
> > > >> > > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>>
> > > >> > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com> <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>
> > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>
> > > >> > > <mailto:rcritten@redhat.com
> > <mailto:rcritten@redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>
> > > >> > <mailto:rcritten@redhat.com <mailto:
> rcritten(a)redhat.com>
> > > <mailto:rcritten@redhat.com <mailto:rcritten@redhat.com>>>>>>>
> > wrote:
> > > >> > > > >
> > > >> > > > > Alfred Victor via
> FreeIPA-users
> > > wrote:
> > > >> > > > > > Hi FreeIPA list,
> > > >> > > > > >
> > > >> > > > > > We are having an issue
> > with our IPA
> > > >> > > environment of 4
> > > >> > > > > replicated FreeIPA
> > > >> > > > > > systems serving linux
> compute
> > > clients
> > > >> > > which join from a
> > > >> > > > command in
> > > >> > > > > > rc.local after boot. This
> > > worked in the
> > > >> > past,
> > > >> > > but the system
> > > >> > > > > has been
> > > >> > > > > > rebuilt since and the
> > join command
> > > >> changed
> > > >> > > > slightly. Unfortunately
> > > >> > > > > > booting a few dozen nodes
> > at a
> > > time,
> > > >> though
> > > >> > > they each
> > > >> > > > talk to a
> > > >> > > > > > different IPA system by
> > design,
> > > leads to
> > > >> > > problems such as
> > > >> > > > > these - though
> > > >> > > > > > 40-100 nodes can boot ok
> at a
> > > time there
> > > >> are
> > > >> > > always many
> > > >> > > > > stragglers, and
> > > >> > > > > > the more we attempt to
> > boot at
> > > once the
> > > >> more
> > > >> > > fail to
> > > >> > > > join IPA
> > > >> > > > > (if we try
> > > >> > > > > > to boot 500 nodes, we are
> > lucky
> > > if we
> > > >> get a
> > > >> > > fifth of that
> > > >> > > > > joining IPA).
> > > >> > > > > > Can you please advise on
> this
> > > output?
> > > >> > Here is
> > > >> > > our join
> > > >> > > > command in
> > > >> > > > > > compute node rc.local:
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > ipa-client-install -U
> > -q -p
> > > >> mach_join \
> > > >> > > > > > -w <redacted> \
> > > >> > > > > > --force-join \
> > > >> > > > > > --no-dns-sshfp \
> > > >> > > > > >
> > > --automount-location=redacted-node
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > And here is some log
> > output of
> > > the 500
> > > >> > error:
> > > >> > > > > >
> > > >> > > > > > ProtocolError:
> > > <ProtocolError for
> > > >> > > > >
> > redacted.redacted.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> <http://redacted.redacted.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redacted.redacted.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > <http://redacted.redacted.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > <http://redacted.redacted.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > <
> http://redacted.redacted.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > >
> > > <http://redacted.redacted.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > >
> > > <http://redacted.redacted.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >:
> > > >> 500
> > > >> > > Internal
> > > >> > > > Server Error>
> > > >> > > > > > Cannot connect to the
> > > server due to
> > > >> > > generic error:
> > > >> > > > cannot
> > > >> > > > > connect to
> > > >> > 'https://redacted.redacted.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredact...>
> > > >> > > > >
> > > <https://hauth0003.dug.com/ipa/json
> <https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fhauth0...>
> >':
> > > >> > Internal
> > > >> > > Server Error
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > As well as:
> > > >> > > > > >
> > > >> > > > > > 2021-06-02T21:39:11Z
> > DEBUG
> > > Starting
> > > >> > > external process
> > > >> > > > > > 2021-06-02T21:39:11Z
> > DEBUG
> > > >> > > args=/usr/sbin/ipa-join -s
> > > >> > > > > > redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>>
> <
> > > >> http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > >
> > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>>
> -b
> > > >> > > > > > dc=redacted,dc=com -h
> > > >> > > redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> > > >> >
> > > >> > > > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > >
> > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > > >
> > > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> -f
> > > >> > > > > > 2021-06-02T21:40:13Z
> > DEBUG
> > > Process
> > > >> > > finished, return
> > > >> > > > code=17
> > > >> > > > > > 2021-06-02T21:40:13Z
> > DEBUG
> > > stdout=
> > > >> > > > > > 2021-06-02T21:40:13Z
> > DEBUG
> > > >> stderr=HTTP
> > > >> > > response code is
> > > >> > > > > 500, not 200
> > > >> > > > > > 2021-06-02T21:40:13Z
> > ERROR
> > > Joining
> > > >> realm
> > > >> > > failed: HTTP
> > > >> > > > > response code
> > > >> > > > > > is 500, not 200
> > > >> > > > > >
> > > >> > > > > > And we also see timeouts
> > happen:
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > 2021-06-02T22:08:50Z
> > DEBUG
> > > >> > > args=/usr/sbin/ipa-join -s
> > > >> > > > > > redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>>
> <
> > > >> http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > >
> > <http://redacted.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>>
> -b
> > > >> > > > > > dc=redacted,dc=com -h
> > > >> > > redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> > > >> >
> > > >> > > > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > >
> > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> > > > > >
> > > <http://redactednode.redacted.com
> <https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fredacte...>
> >
> > > >> -f
> > > >> > > > > > 2021-06-02T22:09:01Z
> > DEBUG
> > > Process
> > > >> > > finished, return
> > > >> > > > code=17
> > > >> > > > > > 2021-06-02T22:09:01Z
> > DEBUG
> > > stdout=
> > > >> > > > > > 2021-06-02T22:09:01Z
> > DEBUG
> > > stderr=RPC
> > > >> > > failed at server.
> > > >> > > > > Configured
> > > >> > > > > > time limit exceeded
> > > >> > > > > > 2021-06-02T22:09:01Z
> > ERROR
> > > Joining
> > > >> realm
> > > >> > > failed: RPC
> > > >> > > > failed at
> > > >> > > > > > server. Configured
> > time limit
> > > >> exceeded
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > And we also see later
> > timeouts
> > > near the
> > > >> > end of
> > > >> > > the log
> > > >> > > > in some
> > > >> > > > �� > cases though are able to
> > > authenticate and
> > > >> it
> > > >> > > didn't back
> > > >> > > > out the
> > > >> > > > > install, but never got
> > going healthy
> > > >> either:
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > 2021-06-03T19:20:13Z
> > DEBUG The
> > > >> > > ipa-client-install
> > > >> > > > command
> > > >> > > > > failed,
> > > >> > > > > > exception:
> > TimeLimitExceeded:
> > > >> Configured
> > > >> > > time limit
> > > >> > > > exceeded
> > > >> > > > >
> > > >> > > > > When you see Internal Error
> > look
> > > to the
> > > >> Apache
> > > >> > > error log
> > > >> > > > on the
> > > >> > > > > server
> > > >> > > > > for more information.
> > > >> > > > >
> > > >> > > > > In this case an LDAP search
> is
> > > failing
> > > >> because
> > > >> > > the server
> > > >> > > > is too
> > > >> > > > > busy.
> > > >> > > > > Look for queries failing
> with
> > > err=3 to get
> > > >> an
> > > >> > > idea of how long
> > > >> > > > > it is taking.
> > > >> > > > >
> > > >> > > > > To increase the timeout
> > use: ipa
> > > config-mod
> > > >> > > > --searchtimelimit=INT
> > > >> > > > >
> > > >> > > > > The default is 2 seconds.
> > > >> > > > >
> > > >> > > > > You can pick a time at
> > random but
> > > could see
> > > >> > > failures again.
> > > >> > > > >
> > > >> > > > > rob
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > >
> > >
> > >
> > >
> > > --
> > > / Alexander Bokovoy
> > > Sr. Principal Software Engineer
> > > Security / Identity Management Engineering
> > > Red Hat Limited, Finland
> > >
> >
>
>
2 years, 3 months
Simulate the AD 'Computer' Object
by Joseph Fry
I support a product that natively supports ingesting computers from Active Directory via an LDAP query.
We are able to ingest users from FreeIPA (actually Redhat IDM) without a problem, but hosts cannot be ingested because the objectClass and attributes are different than what is used by ActiveDirectory.
Is there a simple solution in FreeIPA to have the host objects simulate an AD 'computer' object?
2 years, 3 months
Join command 500 errors, timeouts
by Alfred Victor
Hi FreeIPA list,
We are having an issue with our IPA environment of 4 replicated FreeIPA
systems serving linux compute clients which join from a command in rc.local
after boot. This worked in the past, but the system has been rebuilt since
and the join command changed slightly. Unfortunately booting a few dozen
nodes at a time, though they each talk to a different IPA system by design,
leads to problems such as these - though 40-100 nodes can boot ok at a time
there are always many stragglers, and the more we attempt to boot at once
the more fail to join IPA (if we try to boot 500 nodes, we are lucky if we
get a fifth of that joining IPA). Can you please advise on this output?
Here is our join command in compute node rc.local:
ipa-client-install -U -q -p mach_join \
> -w <redacted> \
> --force-join \
> --no-dns-sshfp \
> --automount-location=redacted-node
And here is some log output of the 500 error:
ProtocolError: <ProtocolError for redacted.redacted.com/ipa/json: 500
Internal Server Error>
> Cannot connect to the server due to generic error: cannot connect to 'https://redacted.redacted.com/ipa/json <https://hauth0003.dug.com/ipa/json>': Internal Server Error
>
>
As well as:
2021-06-02T21:39:11Z DEBUG Starting external process
> 2021-06-02T21:39:11Z DEBUG args=/usr/sbin/ipa-join -s
> redacted.redacted.com -b dc=redacted,dc=com -h redactednode.redacted.com
> -f
> 2021-06-02T21:40:13Z DEBUG Process finished, return code=17
> 2021-06-02T21:40:13Z DEBUG stdout=
> 2021-06-02T21:40:13Z DEBUG stderr=HTTP response code is 500, not 200
> 2021-06-02T21:40:13Z ERROR Joining realm failed: HTTP response code is
> 500, not 200
And we also see timeouts happen:
>
> 2021-06-02T22:08:50Z DEBUG args=/usr/sbin/ipa-join -s redacted.redacted.com -b dc=redacted,dc=com -h redactednode.redacted.com -f
> 2021-06-02T22:09:01Z DEBUG Process finished, return code=17
> 2021-06-02T22:09:01Z DEBUG stdout=
> 2021-06-02T22:09:01Z DEBUG stderr=RPC failed at server. Configured time limit exceeded
> 2021-06-02T22:09:01Z ERROR Joining realm failed: RPC failed at server. Configured time limit exceeded
And we also see later timeouts near the end of the log in some cases though
are able to authenticate and it didn't back out the install, but never got
going healthy either:
>
> 2021-06-03T19:20:13Z DEBUG The ipa-client-install command failed,
> exception: TimeLimitExceeded: Configured time limit exceeded
Donald
2 years, 3 months
ipactl status hangs
by Ronald Wimmer
On some servers I can see that ipactl status hangs from time to time. I
can see SSSD reporting the backend to be offline and dirsrv not running
for some reason. ipactl -d status reveals that it hangs when issuing
systemctl is-active dirsrv. Instead of hanging I would expect ipcactl
status to report a problem (= dirsrv not running).
Cheers,
Ronald
2 years, 3 months
sssd version 2.2.3 issues with AD Trust View
by iulian roman
Hello everybody,
I have an IPA setup with AD trust configured and Trust View defined on the IPA server. Everything works properly on Ubuntu 18 clients with sssd 1.16.1 but it doesn't on Ubuntu 20 with sssd version 2.2.3. I can list /query the AD accounts which are not part of the default Trust View, but not those accounts which have the id overriden in the Trust View.
Is that a known issue, or any idea what do I need to change /where to look ?
2 years, 3 months
Consumer failed to replay change Operations error (1)
by Alfred Victor
Hi FreeIPA,
We have some replication messages in our slapd errors log which look very
like the ones discussed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1574602
I took a look and we do have the MemberOf plugin, but our version of 389-ds
newer:
*389-ds-base-1.3.10.2-10.el7_9.x86_64*
Hoping someone might have a suggestion for what we might do to get rid of
these log messages, or what the root cause may be/impact? They've been
going since at least a couple of weeks ago:
[15/Jun/2021:18:57:26.362094959 -0500] - WARN - NSMMReplicationPlugin
- repl5_inc_update_from_op_result -
agmt="cn=redactedauth0001.redacted.com-to-redactedauth0003.redacted.com"
(redactedauth0003:389): Consumer failed to replay change (uniqueid
d5896001-39a111eb-8868efc8-91dc0b98, CSN 60c93bc2000400250000):
Operations error (1). Will retry later.
I looked for this same uniqueid (they are ALL the same uniqueID) and
found this which is interesting and references a specific cn and
"optype":
>
> [03/Jun/2021:15:45:43.332068775 -0500] - ERR - NSMMReplicationPlugin - write_changelog_and_ruv - Can't add a change for cn=admin,cn=groups,cn=accounts,dc=redacted,dc=com (uniqid: d5896001-39a111eb-8868efc8-91dc0b98, optype: 8) to changelog csn 60b93f93005200230000
Alfred
2 years, 3 months
IPA RA expired, other certificates renewed
by Jan Bundesmann
Hi there,
I need some suggestions for a certificate related problem.
The setup has 2 servers, let's call them ldap1 and ldap2 with ldap1 being the primary system with the CA.
The certificates were to expire on june 15.
I checked on june 1st and on ldap1 certmonger had renewed all certificates, on ldap2 certmonger was not running.
So, I restarted the certmonger service and it began its work. `getcert list` shows three certificates (it's ipa 4.4, so that's probably correct)
Quite soon, the first certificate was renewed (HTTP/ldap2, ...) I assume that's the one for the web UI. A second one (ldap/ldap2...) is still valid until december. I assume that's why all the ldap related stuff and replication is still working.
But the cn=IPA RA expired one week ago (may 24th).
I have no ipa-certs-fix, would setting back the system clock still work? The HTTP/ldap2 certificate was not yet valid when the IPA RA certificate expired.
Or put the the other round: what happens if i don't renew this certificate - that's not quite clear to me. Currently, the system ist working fine, replication works and in 2022 the hardware will be replaced, so we will setup new replicas anyways. But, that's after the expiration date of the ldap/ldap2 certificate.
I hope this is understandable and thanks in advance for any hint.
2 years, 3 months
Doc suggestion: explicitly advise 'non-desktop' spins for freeipa-server*
by Harry G. Coin
Might the 'edition' (server, desktop, iot, whatnot) of the distribution
used in testing freeipa-server* be explicitly stated in the 'getting
started' docs as being 'approved' for freeipa-server use? The better
to avoid interactions with un-interaction-tested packages / security
libraries generally seen only in user/special-purpose distros. (re:
dnssec / bind9 / smart-card interaction)
HC
2 years, 3 months
AD Trust Types
by Ronald Wimmer
Quite some time ago I added a trust to another AD domain. IIRC I added
an "external trust" for a reason I do not remember.
What is the "Non-transitive external trust to a domain in another Active
Directory forest" trust type for? Could I not just have added another
"Active Directory domain" trust?
Any clarification on this matter would be highly appreciated!
Cheers,
Ronald
2 years, 3 months