We have an air-gapped network of RHEL7 hosts that use sssd to perform PKINIT (smartcard + Kerberos) authentication against Windows Server 2016 domain controllers.
Setting this up properly entailed setting pkinit_anchors, pkinit_pool, pkinit_cert_match, et al. in the krb5.conf file, and enabling smartcard authentication in gdm. It also entailed adding individual certificates to each user object’s userCertificate property, which our Windows guys grumbled about.
(The way Windows performs PKINIT is to find the certificate on the card that has a Microsoft User Principal Name X509v3 Subject Alternative Name, extract that value, and then look for the AD user object that has the same userPrincipalName. But the version of sssd that shipped with RHEL7 can’t do that SAN/userPrincipalName matching.)
For the most part, this has worked, and worked well. Once again, sssd has been an invaluable tool.
But.
For some accounts, smartcard authentication does not work, *even though* you can use kinit to perform PKINIT against the card (e.g., if you log in via password authentication, then insert the smartcard once you have a shell window to play with).
For the accounts where smartcard authentication works, after you enter your username in gdm, the card blinks for a few seconds, and then you are prompted to enter the PIN as follows:
<CN> PIN:
…where <CN> is the value of the CN= field of the certificate Subject of the certificate on the smartcard that contains the Microsoft UPN SAN. E.g.:
LASTNAME.FIRSTNAME.123456789 PIN:
For the accounts where smartcard authentication fails, after you enter your username in gdm, the card blinks for a few seconds, and then you are prompted to enter the PIN as follows:
PIN for Smartcard:
That PIN prompt is the kiss of death: even if you enter the correct PIN, authentication will always fail.
We know that our Kerberos configuration (e.g., pkinit_cert_match) correctly yields one (and only one) candidate certificate from the smartcard, which is the correct certificate:
pkinit_cert_match = &&<SAN>.*@.*
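To illustrate how permissive this rule is, here is a minimal Python sketch. It assumes the text after <SAN> is treated as a regular expression that may match anywhere in a UPN-type SAN value. The card contents and labels are hypothetical, and the real matching happens inside the krb5 PKINIT plugin, not in code like this.

```python
import re

# Assumption: the part of the rule after <SAN> is a regular expression with
# search semantics, applied to each UPN-type SAN value on a certificate.
SAN_RULE = r".*@.*"


def matching_certs(certs):
    """Return the labels of certificates with at least one SAN matching SAN_RULE.

    `certs` maps a (hypothetical) certificate label to its list of UPN SAN strings.
    """
    pattern = re.compile(SAN_RULE)
    return [
        label
        for label, sans in certs.items()
        if any(pattern.search(san) for san in sans)
    ]


# Hypothetical card contents, for illustration only.
card = {
    "authentication cert": ["123456789@EXAMPLE.COM"],
    "signing cert": [],
}
print(matching_certs(card))
```

Under these assumptions, any certificate carrying a UPN SAN that contains an "@" is a candidate, so the rule narrows the card to one certificate only when exactly one certificate has such a SAN.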
And running kinit (with PKINIT) against the smartcard works just fine. But logins fail for some users and not others. Which almost certainly means that something is derailing sssd. But it’s not obvious what it is. We’ve double-checked that the userCertificate objects are correct in AD (that is, they match the smartcard).
Even more confusingly, the accounts for which smartcard authentication works versus doesn’t work can change over time. For example, a few weeks ago, my own account worked for smartcard login; now it doesn’t. But we know we made no configuration changes and applied no package updates to the host.
I have also had the situation where I got the “PIN for Smartcard” gdm prompt, rebooted the host, and then got the “<CN> PIN” gdm prompt. That almost implies an sssd caching issue, or inconsistent data/behavior between our (two) domain controllers.
Again, these are air-gapped systems, so I can provide no logs; we are going to have to slog through the sssd logs and figure it out on our own.
Questions for the list:
* Does this sound familiar to anyone? Have you already been down this path? If so, what did you discover?
* sssd logging can be quite voluminous (particularly at higher debugging levels), to the point where I fear I might miss the needle in the haystack that is indicating the problem. Can anyone provide some tips on specific areas where I should focus?
Thanks in advance for any tips/advice.
Hi James,
I'll try to include questions/comments/suggestions in-line below.
We have an air-gapped network of RHEL7 hosts that use sssd to perform PKINIT (smartcard + Kerberos) authentication against Windows Server 2016 domain controllers.
Setting this up properly entailed setting pkinit_anchors, pkinit_pool, pkinit_cert_match, et al. in the krb5.conf file, and enabling smartcard authentication in gdm. It also entailed adding individual certificates to each user object’s userCertificate property, which our Windows guys grumbled about.
And I'm guessing the AD servers have the root and issuing CA certificates imported and trusted, right?
Since your problem is intermittent, I would guess that missing CA certificates aren't your issue. But they can be a common one (at least during initial setup or during CA moves/retirements).
(The way Windows performs PKINIT is to find the certificate on the card that has a Microsoft User Principal Name X509v3 Subject Alternative Name, extract that value, and then look for the AD user object that has the same userPrincipalName. But the version of sssd that shipped with RHEL7 can’t do that SAN/userPrincipalName matching.)
For the most part, this has worked, and worked well. Once again, sssd has been an invaluable tool.
But.
For some accounts, smartcard authentication does not work, *even though* you can use kinit to perform PKINIT against the card (e.g., if you log in via password authentication, then insert the smartcard once you have a shell window to play with).
When you're testing with kinit, are you running something like this:
kinit -X X509_user_identity=PKCS11:module_name=/usr/lib64/opensc-pkcs11.so principal@REALM
Just want to make sure I'm thinking of the correct test here.
For the accounts where smartcard authentication works, after you enter your username in gdm, the card blinks for a few seconds, and then you are prompted to enter the PIN as follows:
<CN> PIN:
…where <CN> is the value of the CN= field of the certificate Subject of the certificate on the smartcard that contains the Microsoft UPN SAN. E.g.:
LASTNAME.FIRSTNAME.123456789 PIN:
For the accounts where smartcard authentication fails, after you enter your username in gdm, the card blinks for a few seconds, and then you are prompted to enter the PIN as follows:
PIN for Smartcard:
That PIN prompt is the kiss of death: even if you enter the correct PIN, authentication will always fail.
This may be an indication that SSSD is timing out during a step, but I'm not 100% sure.
We know that our Kerberos configuration (e.g., pkinit_cert_match) correctly yields one (and only one) candidate certificate from the smartcard, which is the correct certificate:
pkinit_cert_match = &&<SAN>.*@.*
And running kinit (with PKINIT) against the smartcard works just fine. But logins fail for some users and not others. Which almost certainly means that something is derailing sssd. But it’s not obvious what it is. We’ve double-checked that the userCertificate objects are correct in AD (that is, they match the smartcard).
And this makes me think that SSSD is timing out while trying to talk to the AD server for Kerberos communications.
Even more confusingly, the accounts for which smartcard authentication works versus doesn’t work can change over time. For example, a few weeks ago, my own account worked for smartcard login; now it doesn’t. But we know we made no configuration changes and applied no package updates to the host.
I have also had the situation where I got the “PIN for Smartcard” gdm prompt, rebooted the host, and then got the “<CN> PIN” gdm prompt. That almost implies an sssd caching issue, or inconsistent data/behavior between our (two) domain controllers.
Can you try setting a couple timeouts to see if this helps? I'd suggest trying the following:
1. Add a Kerberos timeout to the [domain/whatever] section of sssd.conf:
krb5_auth_timeout = 60
2. Add a p11_child timeout to the [pam] section (less likely to be your issue, judging from the symptoms):
p11_child_timeout = 60
Again, these are air-gapped systems, so I can provide no logs; we are going to have to slog through the sssd logs and figure it out on our own.
Can you give version numbers in case there were known bugs we might be able to identify here?
One other question related to being air-gapped: do the certificates on the cards have OCSP/CRL info/URLs set? If so, SSSD may be trying to check them unless that is disabled. So, if your certificates set OCSP, you may need to disable the check. You can test this with something like:
3. Disable OCSP verifications in the [sssd] section of the sssd.conf file:
certificate_verification = no_ocsp
FYI, in RHEL8 we have "soft" fail options for OCSP/CRL, but those didn't make it into RHEL7's version of SSSD:
certificate_verification = soft_ocsp,soft_crl
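Since the three settings suggested above each belong in a different section of sssd.conf, a quick check that they landed where intended can save a debugging round. The following is a minimal Python sketch, not an SSSD tool: it parses the file as generic INI text, the sample configuration and the domain name example.com are hypothetical, and only the section placement described in this reply is checked.

```python
import configparser

# Option -> section named in the reply above. A trailing "/" means
# "any [domain/<name>] section"; the domain name itself is site-specific.
EXPECTED = {
    "krb5_auth_timeout": "domain/",
    "p11_child_timeout": "pam",
    "certificate_verification": "sssd",
}


def _section_ok(section, where):
    """True if `section` is an acceptable home for an option expected in `where`."""
    return section.startswith(where) if where.endswith("/") else section == where


def check_sssd_conf(text):
    """Return a list of warnings for options that are missing or misplaced."""
    # interpolation=None keeps values such as /home/%u from raising errors.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    warnings = []
    for opt, where in EXPECTED.items():
        hits = [s for s in parser.sections() if parser.has_option(s, opt)]
        if not hits:
            warnings.append(f"{opt}: not set")
        for section in hits:
            if not _section_ok(section, where):
                label = f"[{where}<name>]" if where.endswith("/") else f"[{where}]"
                warnings.append(f"{opt}: found in [{section}], expected {label}")
    return warnings


# Hypothetical configuration, for illustration only.
SAMPLE = """
[sssd]
domains = example.com
certificate_verification = no_ocsp

[pam]
pam_cert_auth = True
p11_child_timeout = 60

[domain/example.com]
id_provider = ad
krb5_auth_timeout = 60
"""

for warning in check_sssd_conf(SAMPLE):
    print(warning)
```

To run it against a real host, pass the contents of /etc/sssd/sssd.conf (readable only by root) instead of SAMPLE. An empty result means only that the options are present in the expected sections, not that the values are right for your site.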
Questions for the list:
- Does this sound familiar to anyone? Have you already been down this path? If so, what did you discover?
Maybe. I'm hoping this is a simple timeout issue and the suggestions above work. From most of your symptoms, I think it may be the Kerberos timeout issue. The OCSP issue is probably not your problem, but I've heard of (not seen personally) issues with unreliable network connectivity to OCSP servers. So if you have something in your air-gapped network that is acting as an OCSP server, it may be something to look into.
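One way to answer the question of whether the card certificates advertise OCSP or CRL locations is to dump a certificate as text (for example with "openssl x509 -in cert.pem -noout -text", where cert.pem is a hypothetical export of the card certificate) and look for the relevant extensions. The Python sketch below only parses that text output. It assumes the usual OpenSSL layout of the "Authority Information Access" and "X509v3 CRL Distribution Points" extensions and ignores less common forms.

```python
def revocation_urls(openssl_text):
    """Extract OCSP and CRL URLs from `openssl x509 -noout -text` output.

    Assumes OCSP responders appear as "OCSP - URI:<url>" and CRL locations as
    "URI:<url>" lines beneath an "X509v3 CRL Distribution Points" header.
    """
    ocsp, crl = [], []
    in_crl = False
    for raw in openssl_text.splitlines():
        line = raw.strip()
        if line.startswith("OCSP - URI:"):
            ocsp.append(line[len("OCSP - URI:"):])
            in_crl = False
        elif line.startswith("X509v3 CRL Distribution Points"):
            in_crl = True
        elif in_crl and line.startswith("URI:"):
            crl.append(line[len("URI:"):])
        elif in_crl and line not in ("", "Full Name:"):
            # Any other line means the next extension has started.
            in_crl = False
    return {"ocsp": ocsp, "crl": crl}


# Hypothetical excerpt of openssl output, for illustration only.
EXCERPT = """
            Authority Information Access:
                CA Issuers - URI:http://pki.example.com/ca.cer
                OCSP - URI:http://ocsp.example.com
            X509v3 CRL Distribution Points:

                Full Name:
                  URI:http://crl.example.com/ca.crl
            X509v3 Key Usage: critical
                Digital Signature
"""

print(revocation_urls(EXCERPT))
```

If the "ocsp" list is non-empty and nothing on the air-gapped network answers at those addresses, the OCSP check is a plausible suspect.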
- sssd logging can be quite voluminous (particularly at higher debugging levels), to the point where I fear I might miss the needle in the haystack that is indicating the problem. Can anyone provide some tips on specific areas where I should focus?
Yes, there is a LOT of data in sssd logs, especially when using "debug_level = 9".
I usually start with the p11_child.log to make sure that SSSD properly identified the card. This is also where you should see OCSP failures disable use of a certificate on the card IIRC. If it finds the certificate, you might see kerberos timeouts in the krb5_child.log file. After that, you can look through the sssd_pam.log file.
Another method I've used to sort through the logs for smart card related issues is to find the timestamp of a failed attempt in /var/log/secure (if set up) or the journal, grep for it in /var/log/sssd, and sort through just those lines.
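The timestamp approach can be sketched in a few lines of Python. This is only an illustration: it does a plain substring search over the *.log files in a directory, visiting the three logs named above first. It assumes the timestamp fragment you pass in is written the same way in the sssd logs as in the place you copied it from, which may not hold on every version.

```python
from pathlib import Path

# Logs to read first, in the order suggested above; other *.log files follow.
PRIORITY = ["p11_child.log", "krb5_child.log", "sssd_pam.log"]


def lines_near(log_dir, needle):
    """Return [(file name, line), ...] for every log line containing `needle`."""
    log_dir = Path(log_dir)
    names = sorted(p.name for p in log_dir.glob("*.log"))
    ordered = [n for n in PRIORITY if n in names]
    ordered += [n for n in names if n not in PRIORITY]
    hits = []
    for name in ordered:
        # errors="replace" keeps stray non-UTF-8 bytes from aborting the scan.
        with open(log_dir / name, errors="replace") as handle:
            for line in handle:
                if needle in line:
                    hits.append((name, line.rstrip("\n")))
    return hits


# Example call (the timestamp fragment is hypothetical):
# for name, line in lines_near("/var/log/sssd", "Jul 15 09:37"):
#     print(name, line)
```

Searching on a fragment that stops at the minute keeps the result small enough to read in full while still catching the steps just before and after the failure.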
Thanks in advance for any tips/advice.
I hope that helps, Scott
On Thu, Jul 15, 2021 at 9:37 AM Arthur Scott Poore spoore@fedoraproject.org wrote:
We managed to figure it out before I saw your reply, but you were on the right track:
One other question related to being air-gapped, do the certificates on the cards have OCSP/CRL info/urls set? If so, SSSD may be trying to check that if not disabled.
We tracked the problem down to do_verification() in src/p11_child/p11_child_nss.c. The call to CERT_VerifyCertificateNow() was returning -8102 (SEC_ERROR_INADEQUATE_KEY_USAGE; "Certificate key usage inadequate for attempted operation").
On a hunch, we set certificate_verification = no_ocsp, and the problems went away.
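For anyone who meets a bare numeric NSS error in the logs, the number can be turned back into a name by hand. NSS's secerr.h defines its security-library error codes as offsets from SEC_ERROR_BASE (-0x2000). The Python sketch below does that arithmetic; its lookup table deliberately contains only the one code discussed in this thread, so anything else has to be looked up in secerr.h.

```python
# From NSS's secerr.h: security-library errors are SEC_ERROR_BASE + offset.
SEC_ERROR_BASE = -0x2000  # -8192

# Only the code named in this thread; extend from secerr.h as needed.
KNOWN = {90: "SEC_ERROR_INADEQUATE_KEY_USAGE"}


def describe(code):
    """Express an NSS error number as SEC_ERROR_BASE + offset, with a name if known."""
    offset = code - SEC_ERROR_BASE
    name = KNOWN.get(offset, "unknown (look it up in secerr.h)")
    return f"{code} = SEC_ERROR_BASE + {offset}: {name}"


print(describe(-8102))
```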
<rant>
NSS throwing SEC_ERROR_INADEQUATE_KEY_USAGE when it can't reach an OCSP server is the most unhelpful thing in the history of unhelpful things. This error message suggests that it is some quality of the certificate itself (KU, EKU, encryption algorithm, key signing algorithm, whatever) that NSS objects to.
I could understand that if NSS didn't have any OCSP-related error codes. But it has literally 19 of them (1):
SEC_ERROR_OCSP_UNKNOWN_RESPONSE_TYPE
SEC_ERROR_OCSP_BAD_HTTP_RESPONSE
SEC_ERROR_OCSP_MALFORMED_REQUEST
SEC_ERROR_OCSP_SERVER_ERROR
SEC_ERROR_OCSP_TRY_SERVER_LATER
SEC_ERROR_OCSP_REQUEST_NEEDS_SIG
SEC_ERROR_OCSP_UNAUTHORIZED_REQUEST
SEC_ERROR_OCSP_UNKNOWN_RESPONSE_STATUS
SEC_ERROR_OCSP_UNKNOWN_CERT
SEC_ERROR_OCSP_NOT_ENABLED
SEC_ERROR_OCSP_NO_DEFAULT_RESPONDER
SEC_ERROR_OCSP_MALFORMED_RESPONSE
SEC_ERROR_OCSP_UNAUTHORIZED_RESPONSE
SEC_ERROR_OCSP_FUTURE_RESPONSE
SEC_ERROR_OCSP_OLD_RESPONSE
SEC_ERROR_OCSP_INVALID_SIGNING_CERT
SEC_ERROR_REVOKED_CERTIFICATE_OCSP
SEC_ERROR_OCSP_RESPONDER_CERT_INVALID
SEC_ERROR_OCSP_BAD_SIGNATURE
But apparently, no one thought that *this* error code might actually be useful:
SEC_ERROR_OCSP_SERVER_UNREACHABLE
Gah.
If we hadn't already suspected something external (the problems were intermittent, even though nothing had changed on the hosts), who knows how far into the weeds SEC_ERROR_INADEQUATE_KEY_USAGE would have taken us.
I'm glad that for RHEL8, sssd moved from NSS to OpenSSL, because to paraphrase Theo de Raadt: OpenSSL might suck, but everything else sucks far more.
</rant>
Anyway, thanks for your reply. Hopefully this thread (especially your suggestions) will be useful to others who encounter mysterious certificate verification issues.
(1) https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/SSL_functions/...
sssd-users@lists.fedorahosted.org