Situation: Two servers housed in two different AWS regions are completely disconnected and totally out of sync. I can't fix replication at all, so I'm looking for clues or tips.
Backstory:
- Complex Active Directory needs, including transitive trusts across multiple child domains of the AD forest
- This means that we've been constantly upgrading IPA and subsystems like sssd* given the speed at which AD integration is being improved/fixed
- We've been running "yum update" and "ipa-server-upgrade" all the way from ipa-3.x to the current v4.4.0
- Due to incremental upgrades over time, we had been at "domain level 0" until very recently
Issues:
- The two servers work, but they are islands unto themselves; no replication seems to be occurring
- The IPA connection-check scripts (ipa-replica-conncheck) all seem to pass
- "ipa-replica-manage list" commands seem to work fine
- Forcing replication or forcing a complete reinit has zero effect
- "ipa topologysegment-find domain" commands seem to show the proper segments
- BUT the "ipa topologysuffix-verify domain" command clearly shows a broken topology in a disconnected state
I only recently discovered the broken topology status; I had spent too much time in the weeds looking at debug output, trying to figure out why replication was not working.
I'm wondering what the best next step is for regaining a unified IPA view. From reading the Admin Guide, I'm thinking I need to bring up new IPA servers so that I have more "nodes" to work with when reconnecting and fixing the topology segments; fixing segments seems easier when you have more nodes available.
I'm not sure what to fix first: is the broken topology segment the cause of the broken replication, or is something wrong in the replication internals that results in a disconnected topology?
Guidance appreciated. I'm appending some redacted command output below.
Regards, Chris
###
# ipa-replica-manage list
us-idmp001.COMPANYidm.org: master
eu-idmp001.COMPANYidm.org: master
# ipa topologysegment-find domain
-----------------
1 segment matched
-----------------
  Segment name: us-idmp001.COMPANYidm.org-to-eu-idmp001.COMPANYidm.org
  Left node: us-idmp001.COMPANYidm.org
  Right node: eu-idmp001.COMPANYidm.org
  Connectivity: left-right
----------------------------
Number of entries returned 1
----------------------------
# ipa topologysegment-find domain
-----------------
1 segment matched
-----------------
  Segment name: eu-idmp001.COMPANYidm.org-to-us-idmp001.COMPANYidm.org
  Left node: eu-idmp001.COMPANYidm.org
  Right node: us-idmp001.COMPANYidm.org
  Connectivity: left-right
----------------------------
Number of entries returned 1
----------------------------
[root@eu-idmp001 centos]#
# ipa topologysuffix-verify domain
========================================================
Replication topology of suffix "domain" contains errors.
========================================================
------------------------
Topology is disconnected
------------------------
Server eu-idmp001.COMPANYidm.org can't contact servers: us-idmp001.COMPANYidm.org
[root@us-idmp001 centos]#
# ipa topologysuffix-verify domain
========================================================
Replication topology of suffix "domain" contains errors.
========================================================
------------------------
Topology is disconnected
------------------------
Server us-idmp001.COMPANYidm.org can't contact servers: eu-idmp001.COMPANYidm.org
[root@eu-idmp001 centos]#
# /usr/sbin/ipa-replica-conncheck --replica eu-idmp001.COMPANYidm.org
Check connection from master to remote replica 'eu-idmp001.COMPANYidm.org':
   Directory Service: Unsecure port (389): OK
   Directory Service: Secure port (636): OK
   Kerberos KDC: TCP (88): OK
   Kerberos KDC: UDP (88): WARNING
   Kerberos Kpasswd: TCP (464): OK
   Kerberos Kpasswd: UDP (464): WARNING
   HTTP Server: Unsecure port (80): OK
   HTTP Server: Secure port (443): OK
The following UDP ports could not be verified as open: 88, 464
This can happen if they are already bound to an application
and ipa-replica-conncheck cannot attach own UDP responder.
Connection from master to replica is OK.
ipa-replica-conncheck --master us-idmp001.COMPANYidm.org
Check connection from replica to remote master 'us-idmp001.COMPANYidm.org':
   Directory Service: Unsecure port (389): OK
   Directory Service: Secure port (636): OK
   Kerberos KDC: TCP (88): OK
   Kerberos Kpasswd: TCP (464): OK
   HTTP Server: Unsecure port (80): OK
   HTTP Server: Secure port (443): OK
The following list of ports use UDP protocol and would need to be checked manually:
   Kerberos KDC: UDP (88): SKIPPED
   Kerberos Kpasswd: UDP (464): SKIPPED
Connection from replica to master is OK.
Start listening on required ports for remote master check
Listeners are started. Use CTRL+C to terminate the listening part after the test.

Please run the following command on remote master:
/usr/sbin/ipa-replica-conncheck --replica eu-idmp001.COMPANYidm.org
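The TCP half of these checks boils down to completing a handshake on each port. Here is a minimal Python sketch of that idea, assuming only the port list copied from the output above; it is not how ipa-replica-conncheck is implemented. It also shows why UDP 88/464 come back as WARNING or SKIPPED: UDP has no handshake, so nothing can be confirmed without a responder on the far side. Keep in mind that a passing port check only proves reachability. It says nothing about whether the replication agreements can actually bind and authenticate.

```python
import socket
from typing import Dict

# TCP ports reported by ipa-replica-conncheck in the output above.
# UDP 88/464 are left out on purpose: UDP has no handshake, so a plain
# probe cannot tell "open" from "silently dropped" without a responder.
IPA_TCP_PORTS: Dict[str, int] = {
    "Directory Service: Unsecure port": 389,
    "Directory Service: Secure port": 636,
    "Kerberos KDC: TCP": 88,
    "Kerberos Kpasswd: TCP": 464,
    "HTTP Server: Unsecure port": 80,
    "HTTP Server: Secure port": 443,
}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all land here.
        return False


def probe(host: str, ports: Dict[str, int] = IPA_TCP_PORTS) -> Dict[str, bool]:
    """Check each named TCP port on host; map port name -> reachable."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

For example, calling `probe("us-idmp001.COMPANYidm.org")` from the EU server returns a dict of port name to True/False.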
It looks like you have a one-directional topology segment on each server. These segments are created from the existing replication agreements when the domain level is raised, and they should then be replicated and merged into one bidirectional segment. So it looks like replication was already not working back then.
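This diagnosis accounts for both topologysuffix-verify outputs above. Here is a minimal sketch, assuming a simplified model in which each server only knows the segments in its own copy of the topology, and a "left-right" segment carries changes from the left node to the right node only. Under those assumptions, the check is a reachability test over a directed graph. This is an illustration, not the IPA implementation.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# (left node, right node, direction); direction as shown by
# `ipa topologysegment-find`: "left-right", "right-left" or "both".
Segment = Tuple[str, str, str]


def build_graph(segments: List[Segment]) -> DefaultDict[str, Set[str]]:
    """Directed 'sends changes to' graph built from topology segments."""
    graph: DefaultDict[str, Set[str]] = defaultdict(set)
    for left, right, direction in segments:
        if direction in ("left-right", "both"):
            graph[left].add(right)
        if direction in ("right-left", "both"):
            graph[right].add(left)
    return graph


def reachable(graph: DefaultDict[str, Set[str]], start: str) -> Set[str]:
    """All servers that changes made on `start` can eventually reach."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def verify(servers: List[str], segments: List[Segment]) -> Dict[str, List[str]]:
    """Map each server to the servers its changes cannot reach."""
    graph = build_graph(segments)
    problems: Dict[str, List[str]] = {}
    for server in servers:
        missing = sorted(set(servers) - reachable(graph, server))
        if missing:
            problems[server] = missing
    return problems
```

With only the us-to-eu segment, `verify([us, eu], [(us, eu, "left-right")])` reports that eu-idmp001 cannot reach us-idmp001, matching the first verify output. The eu-to-us segment alone gives the mirror image. Passing both one-directional segments together, or one segment with direction "both", returns an empty dict. So in this model, the disconnected topology is a symptom of replication never carrying each server's segment to the other.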
To investigate the replication state, we would have to look into the DS error logs and examine the replication agreements and RUVs.
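One way to examine the RUVs is to pull the `nsds50ruv` attribute from each server's RUV tombstone entry and compare the two. For example: `ldapsearch -x -D "cn=Directory Manager" -W -b "dc=companyidm,dc=org" "(&(objectclass=nstombstone)(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff))" nsds50ruv`. The base DN here is a guess based on the redacted domain name. The sketch below assumes the usual 389-DS value formats, `{replicageneration} <id>` and `{replica <rid> <url>} <min-csn> <max-csn>`. It only lists differences and does not interpret them. A differing replica generation is worth looking at first because, as far as I know, 389-DS refuses incremental updates between servers whose generation IDs disagree and asks for a reinitialization instead.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

GENERATION = re.compile(r"\{replicageneration\}\s+(?P<gen>[0-9a-f]+)")
# CSNs are 20 hex digits: timestamp(8) seqnum(4) replica-id(4) subseq(4),
# so comparing them as strings orders them by time.
RUV_ELEMENT = re.compile(
    r"\{replica (?P<rid>\d+) (?P<url>\S+)\}"
    r"(?:\s+(?P<min>[0-9a-f]{20})\s+(?P<max>[0-9a-f]{20}))?"
)


@dataclass
class Ruv:
    generation: Optional[str] = None
    # replica ID -> newest CSN this server has seen from that replica
    max_csn: Dict[int, str] = field(default_factory=dict)


def parse_ruv(values: List[str]) -> Ruv:
    """Parse the nsds50ruv values of one server's RUV tombstone entry."""
    ruv = Ruv()
    for value in values:
        text = value.strip()
        if m := GENERATION.match(text):
            ruv.generation = m["gen"]
        elif (m := RUV_ELEMENT.match(text)) and m["max"]:
            ruv.max_csn[int(m["rid"])] = m["max"]
    return ruv


def compare(a: Ruv, b: Ruv) -> List[str]:
    """Describe where two servers' RUVs disagree."""
    notes = []
    if a.generation != b.generation:
        notes.append(f"replica generation differs: {a.generation} vs {b.generation}")
    for rid in sorted(a.max_csn.keys() | b.max_csn.keys()):
        csn_a, csn_b = a.max_csn.get(rid), b.max_csn.get(rid)
        if csn_a != csn_b:
            notes.append(f"rid {rid}: {csn_a} vs {csn_b}")
    return notes
```

A replica ID whose max CSN is present on one side but missing or older on the other points at changes that never crossed over.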
As you suggested, you could add a new replica from one of the existing servers, then connect the new one to the other old server and remove the dangling segments.
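Before deleting anything, it may help to check that the order of these steps never leaves a server cut off. The sketch below uses the same simplified direction model as the `ipa topologysegment-find` output (left node, right node, direction) and checks every ordered pair of servers at each step. `new-idmp001.COMPANYidm.org` is a hypothetical name for the fresh replica. The sketch assumes that, as at domain level 1, installing the replica creates a bidirectional segment to the server it was installed from. It also ignores the fact that each old server currently holds a different view of the topology.

```python
from itertools import permutations

US = "us-idmp001.COMPANYidm.org"
EU = "eu-idmp001.COMPANYidm.org"
NEW = "new-idmp001.COMPANYidm.org"  # hypothetical name for the new replica


def can_reach(segments, src, dst):
    """True if changes made on src can flow to dst along the segments."""
    seen, frontier = {src}, [src]
    while frontier:
        node = frontier.pop()
        for left, right, direction in segments:
            hops = []
            if direction in ("left-right", "both"):
                hops.append((left, right))
            if direction in ("right-left", "both"):
                hops.append((right, left))
            for a, b in hops:
                if a == node and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return dst in seen


def fully_connected(servers, segments):
    """Every server's changes can reach every other server."""
    return all(can_reach(segments, a, b) for a, b in permutations(servers, 2))


dangling = [(US, EU, "left-right")]  # the view held by one of the old servers
plan = [
    ("install new replica from us", dangling + [(US, NEW, "both")]),
    ("add new <-> eu segment", dangling + [(US, NEW, "both"), (NEW, EU, "both")]),
    ("delete dangling segment", [(US, NEW, "both"), (NEW, EU, "both")]),
]

for step, segments in plan:
    state = "connected" if fully_connected([US, EU, NEW], segments) else "DISCONNECTED"
    print(f"{step}: {state}")
```

Only the first step reports DISCONNECTED. Deleting the dangling segment before adding the new-to-eu segment would leave eu-idmp001 isolated, so the order matters.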
If you were running frequent upgrades and doing them in parallel, you could also have replication conflict entries complicating things.
Ludwig
On 06/01/2017 02:27 PM, Chris Dagdigian via FreeIPA-users wrote:
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org To unsubscribe send an email to freeipa-users-leave@lists.fedorahosted.org
Thanks for this.
I suspect something is fundamentally broken in replication for me, possibly a missing user or bad authentication in the LDAP subsystem caused by our constant chasing of incremental upgrades. Based on your advice and a re-read of the Admin Guide, though, I'm going to see if I can deploy some fresh servers and get any sort of replication going at all with connected segments. If that works, I'll be able to add new segments, merge all the IPA data, and then delete or drop the orphaned systems.
-Chris
Ludwig Krispenz via FreeIPA-users wrote: