Re: Ongoing CA access issues
by Bret Wortman
Still trying to get my CA working.
In the IPA web UI, under Authentication -> Certificates, I can see a
number of certs listed as VALID, EXPIRED, or REVOKED_EXPIRED. But I can
also see many more that are greyed out, and whose "Issuing CA" and
"Status" columns are empty. Does this begin to point toward a solvable
problem?
On 05/18/2017 08:58 AM, Bret Wortman wrote:
>
> Oops, the slapd messages are arriving every 60s, not 5m.
>
>
> On 05/18/2017 08:56 AM, Bret Wortman wrote:
>>
>> httpd_error seems to give the most information. When I try to use ipa
>> cert-show:
>>
>> ipa: INFO: [jsonserver_kerb] admin@DAMASCUSGRP.COM: ping(): SUCCESS
>> (111)Connection refused: AH00957: AJP: attempt to connect to
>> 127.0.0.1:8009 (localhost) failed
>> AH00959: ap_proxy_connect_backend disabling worker for (locahost) for 60s
>> [client 192.168.208.54:52714] AH00896: failed to make connection to
>> backend: localhost
>> ipa: ERROR: ra.get_certificate(): Unable to communicate with CMS (503)
>> ipa: INFO: [jsonserver_kerb] admin@DAMASCUSGRP.COM:
>> cert_show/1(u'895', version=u'2.213'): CertificateOperationError
>>
>> /var/log/pki/pki-tomcat/ca/debug just loops through the same set of
>> messages every 5 minutes or so but doesn't seem to show any errors.
>>
>> /var/log/pki/localhost_access_log.2017-05-18.txt is basically empty
>> except for a single entry (for a POST to /ca/admin/ca/getStatus)
>>
>> Nothing shows up in dirsrv/slapd-DAMASCUSGRP-COM/errors or access
>> when I issue the request, but periodic messages do appear about every
>> 5 minutes or so.
>>
>>
>> On 05/18/2017 08:43 AM, Bret Wortman wrote:
>>> On 04/26/2017 06:02 PM, Rob Crittenden wrote:
>>>> Bret Wortman wrote:
>>>>> So I can see my certs using cert-find, but can't get details using
>>>>> cert-show or add new ones using cert-request.
>>>>>
>>>>> # ipa cert-find
>>>>> :
>>>>> ------------------------------
>>>>> Number of entries returned 385
>>>>> ------------------------------
>>>>> # ipa cert-show 895
>>>>> ipa: ERROR: Certificate operation cannot be completed: Unable to
>>>>> communicate with CMS (503)
>>>>> # ipa cert-show 1 (which does not exist)
>>>>> ipa: ERROR: Certificate operation cannot be completed: Unable to
>>>>> communicate with CMS (503)
>>>>> # ipa cert-status 895
>>>>> ipa: ERROR: Certificate operation cannot be completed: Unable to
>>>>> communicate with CMS (503)
>>>>> #
>>>>>
>>>>> Is this an IPv6 thing? Because ipactl shows everything green and
>>>>> certmonger is running.
>>>> Doubtful.
>>>>
>>>> cert-find and cert-show use different APIs in Dogtag. cert-find uses the
>>>> newer RESTful API and cert-show uses the older XML-based API (and is
>>>> authenticated). I'm guessing that is where the issue lies.
>>>>
>>>> What I'd recommend doing is noting the time, restarting the CA, and then
>>>> plowing through the debug log looking for failures. It could be that the CA
>>>> is only partially up (and I'd check your CA subsystem certs as well).
>>> Which debug log, specifically, do you think will help? I'm also not
>>> sure what you mean by, "check your CA subsystem certs." We still
>>> have pending CSRs that we can't grant until I get this working again.
>>>> rob
>>>>
>>>>> Bret
>>>>>
>>>>>
>>>>> On 04/26/2017 09:03 AM, Bret Wortman wrote:
>>>>>> Digging still deeper:
>>>>>>
>>>>>> # ipa cert-request f.f --principal=HTTP/`hostname`@DAMASCUSGRP.COM
>>>>>> ipa: ERROR: Certificate operation cannot be completed: Unable to
>>>>>> communicate with CMS (503)
>>>>>>
>>>>>> Looks like this is an HTTP error, so is it possible that my IPA thinks
>>>>>> it has a CA but there's no CMS available?
>>>>>>
>>>>>>
>>>>>> On 04/26/2017 08:41 AM, Bret Wortman wrote:
>>>>>>> Using the Firefox debugger, I get these errors when trying to pop up
>>>>>>> the New Certificate dialog:
>>>>>>>
>>>>>>> Empty string passed to getElementById(). (5)
>>>>>>> jquery.js:4:1060
>>>>>>> TypeError: u is undefined
>>>>>>> app.js:1:362059
>>>>>>> Empty string passed to getElementById(). (5)
>>>>>>> jquery.js:4:1060
>>>>>>> TypeError: t is undefined
>>>>>>> app.js:1:217432
>>>>>>>
>>>>>>> I'm definitely not a web kind of guy so I'm not sure if this is
>>>>>>> helpful or not. This is on 4.4.0, API Version 2.213.
>>>>>>>
>>>>>>>
>>>>>>> Bret
>>>>>>>
>>>>>>>
>>>>>>> On 04/26/2017 08:35 AM, Bret Wortman wrote:
>>>>>>>> Good news. One of my servers _does_ have CA installed. So why does
>>>>>>>> "Action -> New Certificate" not do anything on this or any
>>>>>>>> other server?
>>>>>>>>
>>>>>>>>
>>>>>>>> Bret
>>>>>>>>
>>>>>>>>
>>>>>>>> On 04/25/2017 02:52 PM, Bret Wortman wrote:
>>>>>>>>> I recently had to upgrade all my Fedora IPA servers to C7. It went
>>>>>>>>> well, and we've been up and running nicely on 4.4.0 on C7 for the
>>>>>>>>> past month or so.
>>>>>>>>>
>>>>>>>>> Today, someone came and asked me to generate a new certificate for
>>>>>>>>> their web server. All was good until I went to the IPA UI and tried
>>>>>>>>> to perform Actions->New Certificate, which did nothing. I tried
>>>>>>>>> each of our 3 servers in turn. All came back with no popup window
>>>>>>>>> and no error, either.
>>>>>>>>>
>>>>>>>>> I suspect the problem might be that we no longer have a CA server
>>>>>>>>> due to the method I used to upgrade the servers. I likely missed a
>>>>>>>>> "--setup-ca" in there somewhere, so my rolling update rolled over
>>>>>>>>> the CA.
>>>>>>>>>
>>>>>>>>> What's my best hope of recovery? I never ran this before, so I'm
>>>>>>>>> not sure if this shows that I'm missing a CA or not:
>>>>>>>>>
>>>>>>>>> # ipa ca-find
>>>>>>>>> ------------
>>>>>>>>> 1 CA matched
>>>>>>>>> ------------
>>>>>>>>> Name: ipa
>>>>>>>>> Description IPA CA
>>>>>>>>> Authority ID: 3ce3346[...]
>>>>>>>>> Subject DN: CN=Certificate Authority, O=DAMASCUSGRP.COM
>>>>>>>>> Issuer DN: CN=Certificate Authority,O=DAMASCUSGRP.COM
>>>>>>>>> ----------------------------
>>>>>>>>> Number of entries returned 1
>>>>>>>>> ----------------------------
>>>>>>>>> # ipa ca-add dg --desc "Damascus Group" --subject "CN=DG CA, O=DAMASCUSGRP.COM"
>>>>>>>>> ipa: ERROR: Failed to authenticate to CA REST API
>>>>>>>>> # klist
>>>>>>>>> Ticket cache: KEYRING:persistent:0:0
>>>>>>>>> Default principal: admin@DAMASCUSGRP.COM
>>>>>>>>>
>>>>>>>>> Valid starting       Expires              Service principal
>>>>>>>>> 04/25/2017 18:48:26  04/26/2017 18:48:21  krbtgt/DAMASCUSGRP.COM@DAMASCUSGRP.COM
>>>>>>>>> #
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What's my best path of recovery?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Bret Wortman*
>>>>>>>>> The Damascus Group
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>
>>
>>
>
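The httpd_error excerpt earlier in this thread shows Apache's AJP proxy getting "Connection refused" from 127.0.0.1:8009, which means nothing was accepting connections on the port that IPA's httpd appears to use to reach the CA at that moment. A quick way to re-check this after restarting the CA is a plain TCP probe. The Python below is a minimal sketch with a hypothetical helper name (`port_is_listening`); a successful connection only shows that something is listening, not that Dogtag is healthy.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout`.

    A refused or timed-out connection returns False. This only proves that
    *something* accepts connections; it does not check that it speaks AJP.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket timeouts are both OSError subclasses.
        return False


if __name__ == "__main__":
    # 127.0.0.1:8009 is the AJP endpoint named in the httpd_error excerpt.
    state = "listening" if port_is_listening("127.0.0.1", 8009) else "NOT listening"
    print(f"127.0.0.1:8009 is {state}")
```

If the probe reports NOT listening while ipactl still shows the CA as running, that would fit Rob's suggestion that the CA is only partially up.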
6 years, 4 months
Re: SSSD Cache and Service Tickets
by Sumit Bose
On Tue, May 16, 2017 at 11:30:25AM +0200, Ronald Wimmer wrote:
> On 2017-05-15 21:27, Jakub Hrozek wrote:
> > [...]
> >
> > On Mon, May 15, 2017 at 03:54:22PM +0200, Ronald Wimmer wrote:
> > > Hi,
> > >
> > > I am confronted with a behaviour for which I do not have an explanation.
> > >
> > > I am using NFS4 Kerberos-automounted home shares, and recently I got a
> > > "Permission denied" (reproducible when I restart autofs on the server I want
> > > to connect to) from the Windows domain. So here's what I tried:
> > >
> > > 1) Connected via PuTTY from a Windows machine in the Windows domain:
> > > Kerberos-based login works, but I get a "Permission denied" on my home
> > > directory; klist shows no tickets
> > No tickets at all? Not even an expired ticket?
> Unfortunately no tickets.
Did you enable ‘Allow GSSAPI credential delegation’ in the PuTTY configuration?
Additionally, the internal Windows Kerberos handling only allows
delegation to hosts which have the ok-as-delegate flag set in the
Kerberos service ticket.
Please check with 'ipa host-show hostname' whether it shows 'Trusted for
delegation: True'; if not, please try 'ipa host-mod hostname --ok-as-delegate=True'.
HTH
bye,
Sumit
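Sumit's check can be scripted when several hosts are involved. The sketch below runs `ipa host-show <hostname>` and looks for the "Trusted for delegation" line quoted above. It assumes the CLI prints that field as `Trusted for delegation: True` or `False` and that a valid Kerberos ticket (for example from `kinit admin`) is already held; both function names are hypothetical.

```python
import subprocess


def parse_trusted_for_delegation(host_show_output: str):
    """Return True/False from a 'Trusted for delegation:' line, or None if absent."""
    for line in host_show_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == "Trusted for delegation":
            return value.strip().lower() == "true"
    return None


def host_trusted_for_delegation(hostname: str):
    """Run `ipa host-show` for `hostname` and parse the delegation flag.

    Assumes the `ipa` CLI is on PATH and a Kerberos ticket is already held.
    """
    result = subprocess.run(
        ["ipa", "host-show", hostname],
        capture_output=True, text=True, check=True,
    )
    return parse_trusted_for_delegation(result.stdout)
```

A result of False would correspond to the case where Sumit suggests `ipa host-mod hostname --ok-as-delegate=True`; None means the field was not found in the output at all.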
> > Does running klist in cmd.exe show anything?
> Yes, it does:
> -bash-4.2$ klist
> klist: Credentials cache keyring 'persistent:1073895519:1073895519' not found
>
> And again... If I connect from my linux machine (within the ipa domain),
> tickets are there:
>
> -bash-4.2$ klist
> Ticket cache: KEYRING:persistent:1073895519:1073895519
> Default principal: myuser@MYWINDOWDOMAIN.AT
>
> Valid starting       Expires              Service principal
> 2017-05-16 11:29:04  2017-05-16 15:43:45  nfs/ipanfs.myipadomain.at@MYIPADOMAIN.AT
> 2017-05-16 11:25:09  2017-05-16 15:43:45  krbtgt/MYWINDOWDOMAIN.AT@MYWINDOWDOMAIN.AT
>         renew until 2017-05-16 15:43:45
>
> From this point on, login from Windows (AD domain) does, of course, work.
>
> Any ideas how to bring some light into this?
>
> --
> Manage your subscription for the Freeipa-users mailing list:
> https://www.redhat.com/mailman/listinfo/freeipa-users
> Go to http://freeipa.org for more info on the project
one-way replication problem
by Bob Hinton
Hi,
We inadvertently created a replica on a VM that was scheduled to be
powered down. This scheduling has since been stopped.
Replication to it works:
-sh-4.2$ ipa-replica-manage list -v ipa006.mgmt.prod.local.lan
p11-kit: couldn't open and map file:
/etc/pki/ca-trust/source/ipa.p11-kit: Permission denied
ipa001.mgmt.prod.local.lan: replica
  last init status: None
  last init ended: 1970-01-01 00:00:00+00:00
  last update status: Error (0) Replica acquired successfully: Incremental update succeeded
  last update ended: 2017-05-30 19:47:21+00:00
ipa005.mgmt.prod.local.lan: replica
  last init status: None
  last init ended: 1970-01-01 00:00:00+00:00
  last update status: Error (0) Replica acquired successfully: Incremental update succeeded
  last update ended: 2017-05-30 19:47:21+00:00
ipa006.mgmt.nonprod.local.lan: replica
  last init status: None
  last init ended: 1970-01-01 00:00:00+00:00
  last update status: Error (0) Replica acquired successfully: Incremental update succeeded
  last update ended: 2017-05-30 19:47:21+00:00
ipa007.mgmt.prod.local.lan: replica
  last init status: None
  last init ended: 1970-01-01 00:00:00+00:00
  last update status: Error (0) Replica acquired successfully: Incremental update succeeded
  last update ended: 2017-05-30 19:47:21+00:00
ipa009.mgmt.prod.local.lan: replica
  last init status: None
  last init ended: 1970-01-01 00:00:00+00:00
  last update status: Error (0) Replica acquired successfully: Incremental update succeeded
  last update ended: 2017-05-30 19:47:21+00:00
But replication from it doesn't:
-sh-4.2$ ipa-replica-manage list -v ipa006.mgmt.nonprod.local.lan
p11-kit: couldn't open and map file:
/etc/pki/ca-trust/source/ipa.p11-kit: Permission denied
ipa006.mgmt.prod.local.lan: replica
  last init status: None
  last init ended: 1970-01-01 00:00:00+00:00
  last update status: Error (49) Problem connecting to replica - LDAP error: Invalid credentials (connection error)
  last update ended: 1970-01-01 00:00:00+00:00
-sh-4.2$
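Comparing these listings by eye gets error-prone once there are many agreements. A small, hypothetical parser such as the one below, which assumes the `ipa-replica-manage list -v` layout shown above (a `<host>: replica` header followed by a `last update status: Error (N) ...` line), pulls out only the agreements whose last update carries a non-zero error code, such as the Error (49) reported from ipa006.mgmt.nonprod.local.lan.

```python
import re

# "<peer host>: replica" starts a new agreement block.
AGREEMENT_RE = re.compile(r"^\s*(\S+): replica\s*$")
# "last update status: Error (N) ..." carries the numeric status code.
STATUS_RE = re.compile(r"^\s*last update status: Error \((-?\d+)\)")


def failing_agreements(list_output: str) -> dict:
    """Map peer hostname -> last-update error code for every non-zero code."""
    failures = {}
    current = None
    for line in list_output.splitlines():
        header = AGREEMENT_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.match(line)
        if status and current is not None:
            code = int(status.group(1))
            if code != 0:
                failures[current] = code
    return failures
```

Error (0) is how this output reports success, so only non-zero codes are kept.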
This is presumably because the DS keytab is invalid ...
[root@ipa006 ~]# klist -kt /etc/dirsrv/ds.keytab
Keytab name: FILE:/etc/dirsrv/ds.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 25/04/17 11:59:54 ldap/ipa006.mgmt.nonprod.local.lan@LOCAL.LAN
   1 25/04/17 11:59:54 ldap/ipa006.mgmt.nonprod.local.lan@LOCAL.LAN
[root@ipa006 ~]# kinit -kt /etc/dirsrv/ds.keytab ldap/`hostname`
[root@ipa006 ~]# klist
Ticket cache: KEYRING:persistent:1629600114:krb_ccache_MMFShL3
Default principal: ldap/ipa006.mgmt.nonprod.local.lan@LOCAL.LAN

Valid starting     Expires            Service principal
30/05/17 20:56:19  31/05/17 20:56:19  krbtgt/LOCAL.LAN@LOCAL.LAN
[root@ipa006 ~]# ldapsearch -Y GSSAPI -h `hostname` -b "" -s base
SASL/GSSAPI authentication started
ldap_sasl_interactive_bind_s: Invalid credentials (49)
[root@ipa006 ~]# ldapsearch -Y GSSAPI -h ipa006.mgmt.prod.local.lan -b "" -s base
SASL/GSSAPI authentication started
ldap_sasl_interactive_bind_s: Invalid credentials (49)
[root@ipa006 ~]#
[root@ipa006 ~]# kvno ldap/ipa006.mgmt.nonprod.local.lan@LOCAL.LAN
ldap/ipa006.mgmt.nonprod.local.lan@LOCAL.LAN: kvno = 1
[root@ipa006 ~]#
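One way to test the "invalid keytab" hypothesis is to compare key version numbers: `klist -kt` lists the KVNOs stored in /etc/dirsrv/ds.keytab, while `kvno <principal>` reports the KVNO of the service ticket the KDC currently issues. A mismatch would suggest a stale keytab; matching numbers (both 1 in the output above) make a simple version mismatch less likely. The sketch below parses the two outputs as they appear here; the function names are hypothetical and the regexes assume exactly this layout.

```python
import re

# Keytab rows such as "   1 25/04/17 11:59:54 ldap/host@REALM"
KEYTAB_ROW_RE = re.compile(r"^\s*(\d+)\s+\S+\s+\S+\s+(\S+@\S+)\s*$")
# kvno output such as "ldap/host@REALM: kvno = 1"
KVNO_RE = re.compile(r"^(\S+@\S+): kvno = (\d+)\s*$")


def keytab_kvnos(klist_kt_output: str, principal: str) -> set:
    """Return the set of KVNOs stored in the keytab for `principal`."""
    found = set()
    for line in klist_kt_output.splitlines():
        m = KEYTAB_ROW_RE.match(line)
        if m and m.group(2) == principal:
            found.add(int(m.group(1)))
    return found


def kdc_kvno(kvno_output: str, principal: str):
    """Return the KVNO reported by `kvno <principal>`, or None if not found."""
    for line in kvno_output.splitlines():
        m = KVNO_RE.match(line.strip())
        if m and m.group(1) == principal:
            return int(m.group(2))
    return None


def keytab_is_current(klist_kt_output: str, kvno_output: str, principal: str) -> bool:
    """True if the keytab holds a key with the same KVNO the KDC reports."""
    current = kdc_kvno(kvno_output, principal)
    return current is not None and current in keytab_kvnos(klist_kt_output, principal)
```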
[31/May/2017:10:10:42.624019970 +0100] set_krb5_creds - Could not get initial credentials for principal [ldap/ipa006.mgmt.nonprod.local.lan@LOCAL.LAN] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
[31/May/2017:10:10:42.626991209 +0100] set_krb5_creds - Could not get initial credentials for principal [ldap/ipa006.mgmt.nonprod.local.lan@LOCAL.LAN] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
[31/May/2017:10:10:42.641172130 +0100] slapd started. Listening on All Interfaces port 389 for LDAP requests
[31/May/2017:10:10:42.643809873 +0100] Listening on All Interfaces port 636 for LDAPS requests
[31/May/2017:10:10:42.645563686 +0100] Listening on /var/run/slapd-LOCAL.LAN.socket for LDAPI requests
[31/May/2017:10:10:42.649079604 +0100] schema-compat-plugin - schema-compat-plugin tree scan will start in about 5 seconds!
[31/May/2017:10:10:56.172281973 +0100] schema-compat-plugin - warning: no entries set up under cn=computers, cn=compat,dc=b428,dc=dvla,dc=gov,dc=uk
[31/May/2017:10:10:56.195279906 +0100] schema-compat-plugin - Finished plugin initialization.
[31/May/2017:10:11:00.165352662 +0100] NSMMReplicationPlugin - agmt="cn=ipa006.mgmt.nonprod.local.lan-to-ipa006.mgmt.prod.local.lan" (ipa006:389): Replication bind with GSSAPI auth failed: LDAP error 49 (Invalid credentials) ()
[31/May/2017:10:11:00.188136320 +0100] NSMMReplicationPlugin - agmt="cn=ipa006.mgmt.nonprod.local.lan-to-ipa006.mgmt.prod.local.lan" (ipa006:389): Replication bind with GSSAPI auth failed: LDAP error 49 (Invalid credentials) ()
[31/May/2017:10:11:06.083497332 +0100] slapd_poll(73) timed out
I tried to update ds.keytab by editing /etc/krb5.conf, changing kdc,
master_kdc and admin_server to point to a valid replica, then restarting IPA, and ...
[root@ipa006 ~]# kinit admin
Password for admin@LOCAL.LAN:
[root@ipa006 ~]# ipa-getkeytab -k /etc/dirsrv/ds.keytab -p ldap/`hostname` -s ipa006.mgmt.prod.local.lan
Failed to parse result: PrincipalName not found.
Retrying with pre-4.0 keytab retrieval method...
Failed to parse result: PrincipalName not found.
Failed to get keytab!
Failed to get keytab
[root@ipa006 ~]#
I put everything back and tried to delete the replica again...
-sh-4.2$ ipa-replica-manage del ipa006.mgmt.nonprod.local.lan
'NoneType' object is not iterable
IPA is version 4.4.0.
Could someone please tell me either how to fix the ds.keytab or how to delete the
ipa006.mgmt.nonprod.local.lan replica?
Many thanks
Bob
getcert list -d /etc/httpd/alias -n "Server-Cert" status: CA_UNREACHABLE
by Jake
I am trying to renew the last certificate for the IPA masters (see my previous email) and am coming across this issue on my original IPA master (the first server):
getcert list -d /etc/httpd/alias -n "Server-Cert"
Number of certificates and requests being tracked: 8.
Request ID '20170428162941':
    status: CA_UNREACHABLE
    ca-error: Server at https://ipa01.ipa.example.com/ipa/xml failed request, will retry: 4001 (RPC failed at server. nss certificate db: user not found).
    stuck: no
    key pair storage: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB',pinfile='/etc/httpd/alias/pwdfile.txt'
    certificate: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB'
    CA: IPA
    issuer: CN=Certificate Authority,O=IPA.EXAMPLE.COM
    subject: CN=ipa01.ipa.example.com,O=IPA.EXAMPLE.COM
    expires: 2018-07-30 13:08:58 UTC
    key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
    eku: id-kp-serverAuth,id-kp-clientAuth
    pre-save command:
    post-save command: /usr/libexec/ipa/certmonger/restart_httpd
    track: yes
    auto-renew: yes
This server was originally 4.2.0 and was then upgraded to 4.4.0. I tried https://www.redhat.com/archives/freeipa-users/2016-February/msg00441.html but that doesn't seem to make a difference.
If possible, can I stop tracking and regenerate this certificate?
All other masters (7 out of 8) did not have an issue renewing their certificates.
Thanks!!
-Jake
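With eight certificates tracked, it can help to list only the requests that are not in the MONITORING state. The sketch below parses `getcert list` output into one dictionary per "Request ID" block. It is plain text parsing that assumes the "name: value" layout shown above, not a certmonger API, and the function names are hypothetical.

```python
import re

REQUEST_RE = re.compile(r"^Request ID '([^']+)':\s*$")
# Field names may contain letters, digits, spaces and hyphens ("ca-error", "key pair storage").
FIELD_RE = re.compile(r"^\s*([a-zA-Z][\w -]*):\s?(.*)$")


def parse_getcert_list(output: str) -> list:
    """Split `getcert list` output into one dict of fields per tracking request."""
    requests = []
    current = None
    for line in output.splitlines():
        req = REQUEST_RE.match(line.strip())
        if req:
            current = {"request_id": req.group(1)}
            requests.append(current)
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            # Keep the first occurrence of each field name within a request.
            current.setdefault(field.group(1).strip(), field.group(2).strip())
    return requests


def not_monitoring(requests: list) -> list:
    """Return (request_id, status, ca-error) for requests not in MONITORING."""
    return [
        (r["request_id"], r.get("status"), r.get("ca-error"))
        for r in requests
        if r.get("status") != "MONITORING"
    ]
```

Lines before the first "Request ID" (such as the "Number of certificates..." summary) are ignored because no request is open yet.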
Compat tree question
by Robert Johnson
Red Hat Enterprise Linux Server release 7.3
ipa-server-4.4.0-14.el7_3.4.x86_64
389-ds-base-1.3.5.10-15.el7_3.x86_64
sssd-1.14.0-43.el7_3.11.x86_64
When looking at entries in the "cn=groups,cn=compat" tree, I noticed that
the entries for Windows groups have the realm portion of the group name in
all caps. This is true for the comment, the DN and the CN.
example:
# domain users@WIN.MYDOMAIN.COM, groups, compat, ipa.mydomain.com
dn: cn=domain users@WIN.MYDOMAIN.COM,cn=groups,cn=compat,dc=ipa,dc=mydomain,dc=com
memberUid: 123456@win.mydomain.com
cn: domain users@WIN.MYDOMAIN.COM
When I look at the entries in the "cn=users,cn=compat" tree, the realm
portion of the user name is all lowercase. Incidentally, these same user
names are also all lowercase in the "memberUid" attribute of the groups above.
example:
# 123456@win.mydomain.com, users, compat, ipa.mydomain.com
dn: uid=123456@win.mydomain.com,cn=users,cn=compat,dc=ipa,dc=mydomain,dc=com
homeDirectory: /home/win.mydomain.com/123456
uid: 123456@win.mydomain.com
Was this by design?
The reason I ask is that when I try to use kinit on our Solaris 10
systems (which are joined to the IPA domain) for this Windows user, I get
an error:
[~]$ kinit
Password for 123456@win.mydomain.com:
kinit(v5): KDC reply did not match expectations while getting initial credentials
If I run it like this:
[~]$ kinit 123456@WIN.MYDOMAIN.COM
Password for 123456@WIN.MYDOMAIN.COM:
[~]$ klist
Ticket cache: FILE:/tmp/krb5cc_1683378846
Default principal: 123456@WIN.MYDOMAIN.COM

Valid starting     Expires            Service principal
05/30/17 11:44:35  05/30/17 21:44:40  krbtgt/WIN.MYDOMAIN.COM@WIN.MYDOMAIN.COM
        renew until 06/06/17 11:44:35
I believe this is due to the fact that the Solaris 10 system is using the
lowercase entry in the compat tree above. Here is the result of the id
command for this user:
[~]$ id
uid=1683378846(123456@win.mydomain.com) gid=1683378846(123456@WIN.MYDOMAIN.COM)
I know this is a workaround, but I would prefer to make this easier on the
end users. Any suggestions?
Robert Johnson
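Kerberos realm names are case-sensitive, which is consistent with what Robert sees: kinit succeeds with 123456@WIN.MYDOMAIN.COM but fails with the lowercase form taken from the compat tree. As a hedged illustration of the workaround only (not a fix for the compat tree itself), a hypothetical helper could upper-case just the realm part of a principal before a wrapper script passes it to kinit.

```python
def uppercase_realm(principal: str) -> str:
    """Return `principal` with its realm (text after the last '@') upper-cased.

    The primary/instance part is left untouched. A principal without a realm
    is returned unchanged.
    """
    name, sep, realm = principal.rpartition("@")
    if not sep:
        return principal
    return f"{name}@{realm.upper()}"
```

A wrapper could then run something like `subprocess.run(["kinit", uppercase_realm(user)])`; the user-name part keeps its case because only the realm is normalized.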
Kerberos and load balancing
by Ronald Wimmer
Hi,
I have a load balancer in front of three web servers. I have read about
the possible scenarios on https://ssimo.org/blog/id_019.html and I am
already successfully using one common SPN for the load balancer.
My question is: will it work if the domains are not the same? (e.g. if
the load balancer SPN is HTTP/someservice.mydomain.at@MYDOMAIN.AT but the
servers are in a subdomain, for instance webserver1.linux.mydomain.at)
Regards,
Ronald
Re: Replica cannot be reinitialized after upgrade
by Goran Marik
Thanks Ludwig. I’ve opened issue #6990 with the logs and files requested.
In the past few days I’ve managed to remove the stale replicas by running the cleanruv task via LDIF, and have tried to resync a few times, but the error messages still keep appearing in the logs. You mentioned the nsds5ReplicaIgnoreMissingChange option; can you describe the steps to set/enable it?
Thanks,
Goran
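Ludwig's description quoted below names an attribute, nsds5ReplicaIgnoreMissingChange, set on a replication agreement with the value "once" or "always". Such a change would normally be applied as an LDIF modify against the agreement entry. The sketch below only builds that LDIF; the agreement DN in the example is a hypothetical placeholder (the real one has to be looked up under cn=mapping tree,cn=config), and the exact procedure should be confirmed with the 389-ds developers before use.

```python
ALLOWED_VALUES = ("once", "always")


def ignore_missing_change_ldif(agreement_dn: str, value: str = "once") -> str:
    """Build an LDIF modify that sets nsds5ReplicaIgnoreMissingChange.

    `agreement_dn` must be the DN of an existing replication agreement entry;
    `value` is "once" or "always", per the description in this thread.
    """
    if value not in ALLOWED_VALUES:
        raise ValueError(f"value must be one of {ALLOWED_VALUES}, got {value!r}")
    return (
        f"dn: {agreement_dn}\n"
        "changetype: modify\n"
        "replace: nsds5ReplicaIgnoreMissingChange\n"
        f"nsds5ReplicaIgnoreMissingChange: {value}\n"
    )


if __name__ == "__main__":
    # Hypothetical placeholder DN: substitute the real agreement entry.
    dn = "cn=AGREEMENT,cn=replica,cn=SUFFIX,cn=mapping tree,cn=config"
    print(ignore_missing_change_ldif(dn, "once"))
```

The printed LDIF could then be saved to a file and applied with `ldapmodify -x -D "cn=Directory Manager" -W -f <file>`. Starting with "once" matches Ludwig's description of it as a one-off kickstart rather than a change of default behaviour.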
> On May 19, 2017, at 3:49 AM, Ludwig Krispenz <lkrispen(a)redhat.com> wrote:
>
>
> On 05/18/2017 10:13 PM, Goran Marik wrote:
>> Thanks Ludwig for the suggestion and thanks to Maciej for the confirmation from his end. This issue has been happening for us for several weeks, so I don’t think this is a transient problem.
>>
>> What is the best way to sanitize the logs without removing useful info before sending them your way? Will the files mentioned on "https://www.freeipa.org/page/Files_to_be_attached_to_bug_report -> Directory server failed" be sufficient?
> yes, but we need some additional info on the replication config and state; you could add /etc/dirsrv/slapd-*/dse.ldif
> and the result of this query:
>
> ldapsearch -o ldif-wrap=no .................... -D "cn=directory manager" ... -b "cn=config" "objectclass=nsds5replica" \* nsds50ruv
>
> But looking again at the CSN reported missing, it is from June 2016. So I wonder if this is for a stale/removed replica and whether cleaning the RUVs would help.
>>
>> I’ve also run the ipa_consistency_check script, and the output shows that something is indeed wrong with the sync:
>> “””
>> FreeIPA servers:    inf01         inf01         inf02         inf02         STATE
>> =================================================================================
>> Active Users        15            15            15            15            OK
>> Stage Users         0             0             0             0             OK
>> Preserved Users     3             3             3             3             OK
>> User Groups         9             9             9             9             OK
>> Hosts               45            45            45            46            FAIL
>> Host Groups         7             7             7             7             OK
>> HBAC Rules          6             6             6             6             OK
>> SUDO Rules          7             7             7             7             OK
>> DNS Zones           33            33            33            33            OK
>> LDAP Conflicts      NO            NO            NO            NO            OK
>> Ghost Replicas      2             2             2             2             FAIL
>> Anonymous BIND      YES           YES           YES           YES           OK
>> Replication Status  inf01.prod 0  inf01.dev 0   inf01.dev 0   inf01.dev 0
>>                     inf02.dev 0   inf02.dev 0   inf01.prod 0  inf01.prod 0
>>                     inf02.prod 0  inf02.prod 0  inf02.prod 0  inf02.dev 0
>> =================================================================================
>> “””
>>
>> Thanks,
>> Goran
>>
>>> On May 15, 2017, at 6:35 AM, Ludwig Krispenz <lkrispen(a)redhat.com> wrote:
>>>
>>> The messages you see could be transient messages, and if replication is working then this seems to be the case. If not, we would need more data to investigate: deployment info, replica IDs of all servers, RUVs, logs, ...
>>>
>>> Here is some background info: there are some scenarios where a CSN could not be found in the changelog, e.g. if updates were applied on the supplier during a total init, they could be part of the data and database RUV, but not in the changelog of the initialized replica.
>>> DS used to try an alternative CSN in cases where it could not be found, but this had the risk of missing updates, so we decided to change it and make a missing CSN a non-fatal error: DS backs off and retries. If another supplier has updated the replica in between, the starting CSN could have changed and be found. So if the reported missing CSNs change and replication continues, everything is OK, although I think the messages should stop at some point.
>>>
>>> There is a configuration parameter for a replication agreement to trigger the previous behaviour of picking an alternative CSN:
>>> nsds5ReplicaIgnoreMissingChange
>>> with potential values "once", "always".
>>>
>>> where "once" just tries to kickstart replication by using another CSN, and "always" changes the default behaviour.
>>>
>>>
>>> On 05/11/2017 06:53 PM, Goran Marik wrote:
>>>> Hi,
>>>>
>>>> After an upgrade to CentOS 7.3.1611 with “yum update”, we started seeing the following messages in the logs:
>>>> “””
>>>> May 9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.519724479 +0000] NSMMReplicationPlugin - changelog program - agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 576b34e8000a050f0000 not found, we aren't as up to date, or we purged
>>>> May 9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.550459233 +0000] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized.
>>>> May 9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.588245476 +0000] agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389) - Can't locate CSN 576b34e8000a050f0000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
>>>> May 9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.611400689 +0000] NSMMReplicationPlugin - changelog program - agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 576b34e8000a050f0000 not found, we aren't as up to date, or we purged
>>>> May 9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.642226385 +0000] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data required to update replica has been purged from the changelog. The replica must be reinitialized.
>>>> “””
>>>>
>>>> The log messages are pretty frequent, every few seconds, and report a few different CSNs that cannot be located.
>>>>
>>>> This happens only on one replica out of 4. We’ve tried “ipa-replica-manage re-initialize --from” and “ipa-csreplica-manage re-initialize --from” several times, but while both commands report success, the log messages continue to appear. The server was rebooted and “systemctl restart ipa” was run a few times as well.
>>>>
>>>> The replica seems to be working fine despite the errors, but I’m worried that the logs indicate an underlying problem we are not fully detecting. I would like to understand better what is triggering this behaviour and how to fix it, and whether anyone else has seen these messages after recent upgrades.
>>>>
>>>> The software versions are 389-ds-base-1.3.5.10-20.el7_3.x86_64 and ipa-server-4.4.0-14.el7.centos.7.x86_64
>>>>
>>>> Thanks,
>>>> Goran
>>>>
>>>> --
>>>> Goran Marik
>>>> Senior Systems Developer
>>>>
>>>> ecobee
>>>> 250 University Ave, Suite 400
>>>> Toronto, ON M5H 3E5
>>>>
>>>>
>>>>
>>>>
>>> --
>>> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric Shander
>>>
>>
>
>
--
Goran Marik
Senior Systems Developer
ecobee
250 University Ave, Suite 400
Toronto, ON M5H 3E5
Stale/ghost RUVs that cannot be removed
by Goran Marik
Hi,
We are troubleshooting a sync issue that started after a FreeIPA upgrade (yum update) to the latest CentOS 7.3 in mid-April. One of the problems is that we see stale RUVs that cannot be deleted. Here is the output of list-ruv and clean-ruv:
“””
ipa-replica-manage list-ruv
Directory Manager password:
unable to decode: {replica 3} 57020ed9000600030000 57020ed9000600030000
unable to decode: {replica 4} 5702fe5b000500040000 5702fe5b000500040000
Replica Update Vectors:
inf01.prod.ecobee.com:389: 6
inf02.dev.ecobee.com:389: 8
inf01.dev.ecobee.com:389: 7
inf02.prod.ecobee.com:389: 5
Certificate Server Replica Update Vectors:
inf02.prod.ecobee.com:389: 1095
inf01.dev.ecobee.com:389: 1295
inf02.dev.ecobee.com:389: 1190
inf01.prod.ecobee.com:389: 1195
“””
“””
ipa-replica-manage clean-ruv 3
Directory Manager password:
unable to decode: {replica 3} 57020ed9000600030000 57020ed9000600030000
unable to decode: {replica 4} 5702fe5b000500040000 5702fe5b000500040000
Replica ID 3 not found
“””
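The "unable to decode" lines carry raw RUV elements: a replica ID followed by CSNs. Assuming the usual 389 Directory Server CSN layout (8 hex digits of Unix timestamp, then 4 hex digits each for sequence number, replica ID and sub-sequence number), the hypothetical helpers below decode them. Under that assumption, the two lines above yield replica IDs 3 and 4 with CSNs from early April 2016, and the CSN 576b34e8000a050f0000 from the earlier "Replica cannot be reinitialized" thread decodes to replica ID 1295 with a June 2016 timestamp, in line with Ludwig's remark there.

```python
import re
from datetime import datetime, timezone

# Assumed CSN layout: 8 hex digits timestamp (seconds since the Unix epoch),
# 4 hex sequence number, 4 hex replica ID, 4 hex sub-sequence number.
CSN_RE = re.compile(r"^([0-9a-fA-F]{8})([0-9a-fA-F]{4})([0-9a-fA-F]{4})([0-9a-fA-F]{4})$")
# "unable to decode: {replica N} <csn> <csn>" as printed by list-ruv above.
UNDECODABLE_RE = re.compile(r"unable to decode: \{replica (\d+)\} ([0-9a-fA-F]{20})")


def decode_csn(csn: str) -> dict:
    """Split a 20-hex-digit CSN into its timestamp and counters."""
    m = CSN_RE.match(csn)
    if not m:
        raise ValueError(f"not a CSN: {csn!r}")
    ts, seq, rid, subseq = (int(g, 16) for g in m.groups())
    return {
        "time": datetime.fromtimestamp(ts, tz=timezone.utc),
        "seqnum": seq,
        "replica_id": rid,
        "subseqnum": subseq,
    }


def undecodable_ruvs(list_ruv_output: str) -> list:
    """Return (replica ID, decoded first CSN) for each 'unable to decode' line."""
    return [
        (int(rid), decode_csn(csn))
        for rid, csn in UNDECODABLE_RE.findall(list_ruv_output)
    ]
```

Seeing when a ghost replica ID last produced a change can help judge whether it belongs to a server removed long ago.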
The ipa_consistency_check script reports this issue as two ghost replicas (“Ghost Replicas 2 2 2 2 FAIL”). clean-dangling-ruv reports that there are no dangling RUVs.
Our version is VERSION: 4.4.0, API_VERSION: 2.213, on CentOS 7.3.1611.
In the list archives, I found one case from 2015 that sounds similar and was possibly fixed (not confirmed) with a script, cleanallruv.pl, but I haven’t been able to find more info on it. Any further help would be appreciated.
Thanks,
Goran
--
Goran Marik
Senior Systems Developer
ecobee
250 University Ave, Suite 400
Toronto, ON M5H 3E5
named-pkcs11 systemd service
by Sigbjorn Lie
Hi,
I have experienced named stopping unexpectedly from time to time. After moving to RHEL 7, I made use of a handy feature in systemd, “Restart=always”, to make sure named is kept alive.
This has kept named alive for me, and I was wondering whether this would be a useful addition to the default “named-pkcs11.service” shipped in RHEL.
The change I made was to copy the file /usr/lib/systemd/system/named-pkcs11.service to /etc/systemd/system/named-pkcs11.service and add the following to the [Service] section:
---
Restart=always
RestartSec=3
---
The underlying issue of why named is crashing would of course also need to be investigated separately.
What do you think?
Regards,
Siggi
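Instead of copying the whole unit file into /etc/systemd/system (which shadows the packaged unit, so later package updates to it are not picked up), systemd also reads drop-in files: any `*.conf` file under /etc/systemd/system/named-pkcs11.service.d/ is merged into the packaged unit. The sketch below only renders such a drop-in with the two settings from this message; the file name `restart.conf` and the helper name are arbitrary choices, and `systemctl daemon-reload` is still needed after writing the file.

```python
from pathlib import PurePosixPath

UNIT = "named-pkcs11.service"


def restart_dropin(unit: str = UNIT, restart: str = "always", restart_sec: int = 3):
    """Return (path, contents) for a systemd drop-in adding restart settings.

    The file name restart.conf is arbitrary; systemd reads any *.conf file
    in the <unit>.d directory.
    """
    path = PurePosixPath("/etc/systemd/system") / f"{unit}.d" / "restart.conf"
    contents = (
        "[Service]\n"
        f"Restart={restart}\n"
        f"RestartSec={restart_sec}\n"
    )
    return path, contents


if __name__ == "__main__":
    path, contents = restart_dropin()
    print(f"# {path}")
    print(contents, end="")
```

The rendered file carries the same `[Service]` settings as the copied unit, but leaves the rest of the packaged unit in effect.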