So, I have what I think seems to be a slightly odd problem. And I think
I've worked out what the solution might be - but not the root cause. In any
case, I wanted to run it by you all and see whether you agree or have any
insight into it.
I'm running 6 directory servers (4.5.0-21) on CentOS 7.4.1708, 3 of which have
the CA role. I've been running the directory blissfully uneventfully for
7ish months now. We have experimented a little bit with the CA features,
but nothing that can't be done trivially with the web interface (on
reflection I'm sure it probably is trivial to revoke your primary
certificate authority with the web interface, but you know what I mean).
In the past few days I've had occasion to try to create a new replica,
but on each attempt the process fails at around this point:
[4/4]: configuring ipa-custodia to start on boot
Done configuring ipa-custodia.
The ipa-replica-install command failed, exception: HTTPError: 404 Client Error: Not Found
404 Client Error: Not Found
The ipa-replica-install command failed. See /var/log/ipareplica-install.log for more information
Now, I've learned a fair amount over the past few days digging into this,
like what ipa-custodia is, and how to poke it.
It seems that at this point, the process is still actually actively doing
things - it appears to be generating some kind of NSS certificate/key
store. And that process is failing, because apparently it can't find the
key for the entry "auditSigningCert cert-pki-ca" - specifically in
custodiainstance.__get_keys the call to cli.fetch_key is failing for this
nickname (but no others).
So, more digging, and I find that yes indeed, the private key appears to be
missing from the cert database on one of the directory servers
(specifically the "first" directory server).
I haven't quite joined the dots on how custodia is working here, but using
the following command:
sudo certutil -L -d /etc/pki/pki-tomcat/alias
I can determine that on the first directory server, the trust attributes
for this cert are ",,P" whereas on the other two CA directory servers, the
trust attributes are "u,u,uP", and that indeed the key is missing from the
first directory server in this database.
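To make that comparison repeatable across the CA servers, something like the following could be used. It is only a rough sketch: the helper names trust_flags and has_private_key are made up for illustration, and it assumes the usual two-column `certutil -L` listing (nickname, then trust attributes) and that a "u" in the attributes is how certutil reports a private key being present in the db.

```shell
#!/bin/sh
# Hypothetical helper: read `certutil -L` output on stdin and print the
# trust attributes for the line that starts with the given nickname.
# $1 = nickname, e.g. "auditSigningCert cert-pki-ca"
trust_flags() {
    # The attributes are the last whitespace-separated field on the line.
    awk -v nick="$1" 'index($0, nick) == 1 { print $NF }'
}

# Hypothetical helper: succeed if the attributes contain a "u".
# Assumption: certutil shows "u" when the db holds the private key.
has_private_key() {
    case "$1" in
        *u*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example use on a CA server (needs certutil and read access to the db):
#   flags=$(sudo certutil -L -d /etc/pki/pki-tomcat/alias \
#           | trust_flags "auditSigningCert cert-pki-ca")
#   has_private_key "$flags" || echo "no private key: $flags"
# `certutil -K -d /etc/pki/pki-tomcat/alias` lists the keys directly and
# is a more direct check, but it prompts for the db password.
```

The functions only parse text, so they can be tried against saved `certutil -L` output from each server without touching the live databases.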
I also note that the cert databases seem to be divergent in other ways
between the CA servers. Which I find interesting.
But anyway, my next action was to copy the cert databases to another
machine and try to import the cert/key from a "good" CA db into the
"bad" CA db using pk12util.
This gives me a segmentation fault.
So, I try with a new DB. I export all the cert/key pairs from the "bad" CA
individually and import them into a new DB, replicating the trust
attributes. So far so good. I also export the missing cert/key from a
"good" CA and import that into the same new DB. Also apparently good.
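For anyone wanting to follow along, the per-pair export/import looks roughly like the sketch below. The paths, the .p12 file name, the copy_pair and run helpers and the DRY_RUN switch are all illustrative, not anything that ships with the tools; only the pk12util and certutil options themselves are standard. By default it just prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only. With DRY_RUN=1 (the default here) commands are printed
# instead of executed; set DRY_RUN=0 to really run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy one cert/key pair between NSS databases via a PKCS#12 file, then
# set the trust attributes on the imported cert.
# $1 = source db dir      $2 = destination db dir
# $3 = nickname           $4 = trust attributes
# $5 = path of the intermediate PKCS#12 file
copy_pair() {
    run pk12util -o "$5" -n "$3" -d "$1"     # export cert + key
    run pk12util -i "$5" -d "$2"             # import into the new db
    run certutil -M -n "$3" -t "$4" -d "$2"  # set trust attributes
}

# Example (illustrative paths; a fresh db would first be created with
# `certutil -N -d /tmp/new-alias`):
#   copy_pair /tmp/good-alias /tmp/new-alias \
#       "auditSigningCert cert-pki-ca" ",,P" /tmp/audit.p12
# Assumption: the "u" attributes are reported by certutil itself once the
# matching private key is in the db, so only ",,P" is set explicitly.
```

When run for real, pk12util prompts for the database and PKCS#12 passwords, and the intermediate .p12 file contains the private key, so it should be removed afterwards.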
So, at this point, I feel relatively confident that I have constructed a
good DB and I should be able to perform some surgery to remove the old
"bad" DB and replace it with this "good" DB.
My questions are:
1. Does this approach seem reasonable or am I oversimplifying?
2. If this is a reasonable approach: what's my best method for performing
the surgery? ipactl stop, move bad db directory out of way, move "good" db
in, don't forget the selinux stuff, then ipactl start again?
3. How could this even happen in the first place? Is it a known issue?
4. Shouldn't the CA databases basically all look the same between servers
created at the same time? Why might they diverge?
5. Do you have any other comments or questions which you feel might be
relevant?
Thanks in advance for any input or insights shared.
Andrew Stubbs, PhD
Head of Technical Operations