I'm trying to get a multi-threaded PAM app working to log in users using the 'login' service to generate client load:
https://github.com/freeipa/freeipa-perftest/blob/master/src/pamtest.c
The load is generated by running multiple client VMs and on each client running the pam tester with multiple threads.
It works about 99% of the time and I'm struggling to find out why it occasionally fails with: Authentication failure (7)
The environment is IPA with pre-created users all set with the same known LDAP and Kerberos password. The password is not marked as expired.
The journal is reporting a different PAM error:
Jan 24 16:11:57 client003.ipa.test pamtest[4285]: pam_sss(login:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=user19client003.ipa.test Jan 24 16:11:57 client003.ipa.test pamtest[4285]: pam_sss(login:auth): received for user user19client003.ipa.test: 6 (Permission denied) Jan 24 16:11:59 client003.ipa.test pamtest[4285]: pam_unix(login:session): session opened for user user19client003.ipa.test(uid=1751200171) by (uid=0) Jan 24 16:14:01 client003.ipa.test pamtest[4285]: pam_unix(login:session): session closed for user user19client003.ipa.test
The sssd pam log reports:
* (2022-01-24 16:11:50): [pam] [pam_cmd_authenticate] (0x0100): entering pam_cmd_authenticate * (2022-01-24 16:11:50): [pam] [sss_domain_get_state] (0x1000): Domain ipa.test is Active [ short snip of parse_name ] * (2022-01-24 16:11:50): [pam] [pd_set_primary_name] (0x0400): User's primary name is user19client003.ipa.test@ipa.test * (2022-01-24 16:11:50): [pam] [pam_initgr_check_timeout] (0x2000): User [user19client003.ipa.test] found in PAM cache. * (2022-01-24 16:11:50): [pam] [pam_dp_send_req] (0x0100): Sending request [CID #2] with the following data: * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] command: SSS_PAM_AUTHENTICATE * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] domain: ipa.test * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] user: user19client003.ipa.test@ipa.test * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] service: login * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] tty: not set * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] ruser: not set * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] rhost: not set * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] authtok type: 1 (Password) * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] newauthtok type: 0 (No authentication token available) * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] priv: 1 * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] cli_pid: 4285 * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] logon name: user19client003.ipa.test * (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2] flags: 2 * (2022-01-24 16:11:50): [pam] [pam_dom_forwarder] (0x0100): pam_dp_send_req returned 0 * (2022-01-24 16:11:53): [pam] [pam_initgr_cache_remove] (0x2000): [user20client003.ipa.test] removed from PAM initgroup cache * (2022-01-24 16:11:55): [pam] [pam_initgr_cache_remove] (0x2000): [user19client003.ipa.test] removed from PAM initgroup cache * (2022-01-24 16:11:57): [pam] [sbus_dispatch] (0x4000): Dispatching. * (2022-01-24 16:11:57): [pam] [pam_dp_send_req_done] (0x0200): received: [6 (Permission denied)][ipa.test][CID #2] * (2022-01-24 16:11:57): [pam] [pam_reply] (0x4000): pam_reply initially called with result [6]: Permission denied. this result might be changed during processing
I just can't tell what the permission denied means. Permission to do what? I can provide fuller logs if needed.
I suspect it may be a timeout as there is a several second delay in some of the processing.
There is nothing wrong with this particular user. I can manually re-run the tool and it will pass 100% but the server is under significantly less stress.
Hoping for suggestions on where to start looking.
thanks
rob
Am Mon, Jan 24, 2022 at 01:36:28PM -0500 schrieb Rob Crittenden:
I'm trying to get a multi-threaded PAM app working to log in users using the 'login' service to generate client load:
https://github.com/freeipa/freeipa-perftest/blob/master/src/pamtest.c
The load is generated by running multiple client VMs and on each client running the pam tester with multiple threads.
It works about 99% of the time and I'm struggling to find out why it occasionally fails with: Authentication failure (7)
The environment is IPA with pre-created users all set with the same known LDAP and Kerberos password. The password is not marked as expired.
The journal is reporting a different PAM error:
Jan 24 16:11:57 client003.ipa.test pamtest[4285]: pam_sss(login:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=user19client003.ipa.test Jan 24 16:11:57 client003.ipa.test pamtest[4285]: pam_sss(login:auth): received for user user19client003.ipa.test: 6 (Permission denied) Jan 24 16:11:59 client003.ipa.test pamtest[4285]: pam_unix(login:session): session opened for user user19client003.ipa.test(uid=1751200171) by (uid=0) Jan 24 16:14:01 client003.ipa.test pamtest[4285]: pam_unix(login:session): session closed for user user19client003.ipa.test
The sssd pam log reports:
- (2022-01-24 16:11:50): [pam] [pam_cmd_authenticate] (0x0100):
entering pam_cmd_authenticate
- (2022-01-24 16:11:50): [pam] [sss_domain_get_state] (0x1000):
Domain ipa.test is Active [ short snip of parse_name ]
- (2022-01-24 16:11:50): [pam] [pd_set_primary_name] (0x0400):
User's primary name is user19client003.ipa.test@ipa.test
- (2022-01-24 16:11:50): [pam] [pam_initgr_check_timeout] (0x2000):
User [user19client003.ipa.test] found in PAM cache.
- (2022-01-24 16:11:50): [pam] [pam_dp_send_req] (0x0100): Sending
request [CID #2] with the following data:
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
command: SSS_PAM_AUTHENTICATE
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
domain: ipa.test
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
user: user19client003.ipa.test@ipa.test
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
service: login
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
tty: not set
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
ruser: not set
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
rhost: not set
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
authtok type: 1 (Password)
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
newauthtok type: 0 (No authentication token available)
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
priv: 1
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
cli_pid: 4285
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
logon name: user19client003.ipa.test
- (2022-01-24 16:11:50): [pam] [pam_print_data] (0x0100): [CID #2]
flags: 2
- (2022-01-24 16:11:50): [pam] [pam_dom_forwarder] (0x0100):
pam_dp_send_req returned 0
- (2022-01-24 16:11:53): [pam] [pam_initgr_cache_remove] (0x2000):
[user20client003.ipa.test] removed from PAM initgroup cache
- (2022-01-24 16:11:55): [pam] [pam_initgr_cache_remove] (0x2000):
[user19client003.ipa.test] removed from PAM initgroup cache
- (2022-01-24 16:11:57): [pam] [sbus_dispatch] (0x4000): Dispatching.
- (2022-01-24 16:11:57): [pam] [pam_dp_send_req_done] (0x0200):
received: [6 (Permission denied)][ipa.test][CID #2]
Hi,
the PAM_PERM_DENIED error is returned from the backend. Please check the backend log and krb5_child.log. I agree that it looks a bit like a timeout issue since the default krb5_auth_timeout is 6s but I would expect a different return code in this case.
It looks like currently PAM_PERM_DENIED is returned during authentication only if the KDC returns KRB5KDC_ERR_CLIENT_REVOKED, i.e. the account is locked or similar.
bye, Sumit
- (2022-01-24 16:11:57): [pam] [pam_reply] (0x4000): pam_reply
initially called with result [6]: Permission denied. this result might be changed during processing
I just can't tell what the permission denied means. Permission to do what? I can provide fuller logs if needed.
I suspect it may be a timeout as there is a several second delay in some of the processing.
There is nothing wrong with this particular user. I can manually re-run the tool and it will pass 100% but the server is under significantly less stress.
Hoping for suggestions on where to start looking.
thanks
rob _______________________________________________ sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.o... Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
Am Mon, Jan 24, 2022 at 01:36:28PM -0500 schrieb Rob Crittenden:
Hi,
the PAM_PERM_DENIED error is returned from the backend. Please check the backend log and krb5_child.log. I agree that it looks a bit like a timeout issue since the default krb5_auth_timeout is 6s but I would expect a different return code in this case.
It looks like currently PAM_PERM_DENIED is returned during authentication only if the KDC returns KRB5KDC_ERR_CLIENT_REVOKED, i.e. the account is locked or similar.
Looks like the backend is detected as offline.
Here is a hopefully not too-hacked representation from the backend log: * (2022-01-24 16:11:50): [be[ipa.test]] [sbus_senders_lookup] (0x2000): Looking for identity of sender [sssd.pam] * (2022-01-24 16:11:50): [be[ipa.test]] [dp_pam_handler_send] (0x0100): Got request with the following data * (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100): [CID #2] command: SSS_PAM_AUTHENTICATE * (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100): [CID #2] domain: ipa.test * (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100): [CID #2] user: user19client003.ipa.test@ipa.test * (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100): [CID #2] service: login [snip but it's a password auth] * (2022-01-24 16:11:50): [be[ipa.test]] [dp_attach_req] (0x0400): [RID#171] DP Request [PAM Authenticate #171]: REQ_TRACE: New request. [sssd.pam CID #2] Flags [0000]. * (2022-01-24 16:11:50): [be[ipa.test]] [dp_attach_req] (0x0400): [RID#171] Number of active DP request: 1 * (2022-01-24 16:11:50): [be[ipa.test]] [sss_domain_get_state] (0x1000): [RID#171] Domain ipa.test is Active * (2022-01-24 16:11:50): [be[ipa.test]] [sss_domain_get_state] (0x1000): [RID#171] Domain ipa.test is Active * (2022-01-24 16:11:50): [be[ipa.test]] [krb5_auth_queue_send] (0x1000): [RID#171] Wait queue of user [user19client003.ipa.test@ipa.test] is empty, running request [0x5614e40a64d0] immediately. * (2022-01-24 16:11:50): [be[ipa.test]] [krb5_setup] (0x4000): [RID#171] No mapping for: user19client003.ipa.test@ipa.test * (2022-01-24 16:11:50): [be[ipa.test]] [krb5_auth_prepare_ccache_name] (0x1000): [RID#171] No ccache file for user [user19client003.ipa.test@ipa.test] found. * (2022-01-24 16:11:50): [be[ipa.test]] [fo_resolve_service_send] (0x0100): [RID#171] Trying to resolve service 'IPA' * (2022-01-24 16:11:50): [be[ipa.test]] [get_server_status] (0x1000): [RID#171] Status of server 'server.ipa.test' is 'working' * (2022-01-24 16:11:50): [be[ipa.test]] [get_port_status] (0x1000): [RID#171] Port status of port 0 for server 'server.ipa.test' is 'working' * (2022-01-24 16:11:50): [be[ipa.test]] [fo_resolve_service_activate_timeout] (0x2000): [RID#171] Resolve timeout [dns_resolver_timeout] set to 6 seconds * (2022-01-24 16:11:50): [be[ipa.test]] [get_server_status] (0x1000): [RID#171] Status of server 'server.ipa.test' is 'working' * (2022-01-24 16:11:50): [be[ipa.test]] [be_resolve_server_process] (0x1000): [RID#171] Saving the first resolved server * (2022-01-24 16:11:50): [be[ipa.test]] [be_resolve_server_process] (0x0200): [RID#171] Found address for server server.ipa.test: [192.168.193.244] TTL 7200 * (2022-01-24 16:11:50): [be[ipa.test]] [ipa_resolve_callback] (0x0400): [RID#171] Constructed uri 'ldap://server.ipa.test' * (2022-01-24 16:11:50): [be[ipa.test]] [krb5_add_krb5info_offline_callback] (0x4000): [RID#171] Removal callback already available for service [IPA]. * (2022-01-24 16:11:50): [be[ipa.test]] [unlink_dbg] (0x2000): [RID#171] File already removed: [/var/lib/sss/pubconf/.krb5info_dummy_hY7Urp] * (2022-01-24 16:11:50): [be[ipa.test]] [sss_domain_get_state] (0x1000): [RID#171] Domain ipa.test is Active * (2022-01-24 16:11:50): [be[ipa.test]] [child_handler_setup] (0x2000): [RID#171] Setting up signal handler up for pid [4444] * (2022-01-24 16:11:50): [be[ipa.test]] [child_handler_setup] (0x2000): [RID#171] Signal handler set up for pid [4444] * (2022-01-24 16:11:50): [be[ipa.test]] [write_pipe_handler] (0x0400): [RID#171] All data has been sent! * (2022-01-24 16:11:56): [be[ipa.test]] [krb5_auth_done] (0x0020): [RID#171] child timed out! ********************** BACKTRACE DUMP ENDS HERE *********************************
(2022-01-24 16:11:56): [be[ipa.test]] [fo_resolve_service_send] (0x0020): [RID#171] No available servers for service 'IPA'
and subsequent backtraces:
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE: * (2022-01-24 16:11:56): [be[ipa.test]] [_be_fo_set_port_status] (0x8000): [RID#171] Setting status: PORT_NOT_WORKING. Called from: src/providers/krb5/krb5_auth.c: krb5_auth_done: 862
and all the expected ports not available, etc. and no available servers for IPA (single server install).
The child is eventually killed leading to:
(2022-01-24 16:11:57): [be[ipa.test]] [krb5_auth_cache_creds] (0x0020): [RID#171] Offline authentication failed
Which makes sense since this is the first login for the user so there are no stored creds.
So if this is indeed a timeout, it is good for me. I'm explicitly trying to find the limit of how many users/clients can authenticate at more or less the same time. It seems like I've managed to push the server enough that it isn't responsive to the default sssd timeouts (and 6 seconds feels like forever if you're just trying to log in). The interesting bit is that the server isn't really stressing at all, using maybe 25% of CPU. So it must be something else that is restrictive: # of threads, connections, etc.
I'll preserve these client and server logs in case we want to do further analysis but I'll try decreasing the number of logins to continue my search for the sweet spot.
``` * (2022-01-24 16:11:50): [be[ipa.test]] [child_handler_setup] (0x2000): [RID#171] Signal handler set up for pid [4444] * (2022-01-24 16:11:50): [be[ipa.test]] [write_pipe_handler] (0x0400): [RID#171] All data has been sent! * (2022-01-24 16:11:56): [be[ipa.test]] [krb5_auth_done] (0x0020): [RID#171] child timed out! ``` -- what is in the corresponding `krb5_child.log`? (You might need to enable debug_level=9 in the domain section to get krb5_child.log populated.)
On Tue, Jan 25, 2022 at 5:30 PM Rob Crittenden rcritten@fedoraproject.org wrote:
Am Mon, Jan 24, 2022 at 01:36:28PM -0500 schrieb Rob Crittenden:
Hi,
the PAM_PERM_DENIED error is returned from the backend. Please check the backend log and krb5_child.log. I agree that it looks a bit like a timeout issue since the default krb5_auth_timeout is 6s but I would expect a different return code in this case.
It looks like currently PAM_PERM_DENIED is returned during authentication only if the KDC returns KRB5KDC_ERR_CLIENT_REVOKED, i.e. the account is locked or similar.
Looks like the backend is detected as offline.
Here is a hopefully not too-hacked representation from the backend log:
- (2022-01-24 16:11:50): [be[ipa.test]] [sbus_senders_lookup]
(0x2000): Looking for identity of sender [sssd.pam]
- (2022-01-24 16:11:50): [be[ipa.test]] [dp_pam_handler_send]
(0x0100): Got request with the following data
- (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100):
[CID #2] command: SSS_PAM_AUTHENTICATE
- (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100):
[CID #2] domain: ipa.test
- (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100):
[CID #2] user: user19client003.ipa.test@ipa.test
- (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100):
[CID #2] service: login [snip but it's a password auth]
- (2022-01-24 16:11:50): [be[ipa.test]] [dp_attach_req] (0x0400):
[RID#171] DP Request [PAM Authenticate #171]: REQ_TRACE: New request. [sssd.pam CID #2] Flags [0000].
- (2022-01-24 16:11:50): [be[ipa.test]] [dp_attach_req] (0x0400):
[RID#171] Number of active DP request: 1
- (2022-01-24 16:11:50): [be[ipa.test]] [sss_domain_get_state]
(0x1000): [RID#171] Domain ipa.test is Active
- (2022-01-24 16:11:50): [be[ipa.test]] [sss_domain_get_state]
(0x1000): [RID#171] Domain ipa.test is Active
- (2022-01-24 16:11:50): [be[ipa.test]] [krb5_auth_queue_send]
(0x1000): [RID#171] Wait queue of user [user19client003.ipa.test@ipa.test] is empty, running request [0x5614e40a64d0] immediately.
- (2022-01-24 16:11:50): [be[ipa.test]] [krb5_setup] (0x4000):
[RID#171] No mapping for: user19client003.ipa.test@ipa.test
- (2022-01-24 16:11:50): [be[ipa.test]]
[krb5_auth_prepare_ccache_name] (0x1000): [RID#171] No ccache file for user [user19client003.ipa.test@ipa.test] found.
- (2022-01-24 16:11:50): [be[ipa.test]] [fo_resolve_service_send]
(0x0100): [RID#171] Trying to resolve service 'IPA'
- (2022-01-24 16:11:50): [be[ipa.test]] [get_server_status] (0x1000):
[RID#171] Status of server 'server.ipa.test' is 'working'
- (2022-01-24 16:11:50): [be[ipa.test]] [get_port_status] (0x1000):
[RID#171] Port status of port 0 for server 'server.ipa.test' is 'working'
- (2022-01-24 16:11:50): [be[ipa.test]]
[fo_resolve_service_activate_timeout] (0x2000): [RID#171] Resolve timeout [dns_resolver_timeout] set to 6 seconds
- (2022-01-24 16:11:50): [be[ipa.test]] [get_server_status] (0x1000):
[RID#171] Status of server 'server.ipa.test' is 'working'
- (2022-01-24 16:11:50): [be[ipa.test]] [be_resolve_server_process]
(0x1000): [RID#171] Saving the first resolved server
- (2022-01-24 16:11:50): [be[ipa.test]] [be_resolve_server_process]
(0x0200): [RID#171] Found address for server server.ipa.test: [192.168.193.244] TTL 7200
- (2022-01-24 16:11:50): [be[ipa.test]] [ipa_resolve_callback]
(0x0400): [RID#171] Constructed uri 'ldap://server.ipa.test'
- (2022-01-24 16:11:50): [be[ipa.test]]
[krb5_add_krb5info_offline_callback] (0x4000): [RID#171] Removal callback already available for service [IPA].
- (2022-01-24 16:11:50): [be[ipa.test]] [unlink_dbg] (0x2000):
[RID#171] File already removed: [/var/lib/sss/pubconf/.krb5info_dummy_hY7Urp]
- (2022-01-24 16:11:50): [be[ipa.test]] [sss_domain_get_state]
(0x1000): [RID#171] Domain ipa.test is Active
- (2022-01-24 16:11:50): [be[ipa.test]] [child_handler_setup]
(0x2000): [RID#171] Setting up signal handler up for pid [4444]
- (2022-01-24 16:11:50): [be[ipa.test]] [child_handler_setup]
(0x2000): [RID#171] Signal handler set up for pid [4444]
- (2022-01-24 16:11:50): [be[ipa.test]] [write_pipe_handler] (0x0400):
[RID#171] All data has been sent!
- (2022-01-24 16:11:56): [be[ipa.test]] [krb5_auth_done] (0x0020):
[RID#171] child timed out! ********************** BACKTRACE DUMP ENDS HERE
(2022-01-24 16:11:56): [be[ipa.test]] [fo_resolve_service_send] (0x0020): [RID#171] No available servers for service 'IPA'
and subsequent backtraces:
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
- (2022-01-24 16:11:56): [be[ipa.test]] [_be_fo_set_port_status]
(0x8000): [RID#171] Setting status: PORT_NOT_WORKING. Called from: src/providers/krb5/krb5_auth.c: krb5_auth_done: 862
and all the expected ports not available, etc. and no available servers for IPA (single server install).
The child is eventually killed leading to:
(2022-01-24 16:11:57): [be[ipa.test]] [krb5_auth_cache_creds] (0x0020): [RID#171] Offline authentication failed
Which makes sense since this is the first login for the user so there are no stored creds.
So if this is indeed a timeout, it is good for me. I'm explicitly trying to find the limit of how many users/clients can authenticate at more or less the same time. It seems like I've managed to push the server enough that it isn't responsive to the default sssd timeouts (and 6 seconds feels like forever if you're just trying to log in). The interesting bit is that the server isn't really stressing at all, using maybe 25% of CPU. So it must be something else that is restrictive: # of threads, connections, etc.
I'll preserve these client and server logs in case we want to do further analysis but I'll try decreasing the number of logins to continue my search for the sweet spot. _______________________________________________ sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.o... Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
Am Tue, Jan 25, 2022 at 04:30:10PM -0000 schrieb Rob Crittenden:
Am Mon, Jan 24, 2022 at 01:36:28PM -0500 schrieb Rob Crittenden:
Hi,
the PAM_PERM_DENIED error is returned from the backend. Please check the backend log and krb5_child.log. I agree that it looks a bit like a timeout issue since the default krb5_auth_timeout is 6s but I would expect a different return code in this case.
It looks like currently PAM_PERM_DENIED is returned during authentication only if the KDC returns KRB5KDC_ERR_CLIENT_REVOKED, i.e. the account is locked or similar.
Looks like the backend is detected as offline.
Here is a hopefully not too-hacked representation from the backend log:
- (2022-01-24 16:11:50): [be[ipa.test]] [sbus_senders_lookup] (0x2000): Looking for identity of sender [sssd.pam]
- (2022-01-24 16:11:50): [be[ipa.test]] [dp_pam_handler_send] (0x0100): Got request with the following data
- (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100): [CID #2] command: SSS_PAM_AUTHENTICATE
- (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100): [CID #2] domain: ipa.test
- (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100): [CID #2] user: user19client003.ipa.test@ipa.test
- (2022-01-24 16:11:50): [be[ipa.test]] [pam_print_data] (0x0100): [CID #2] service: login
[snip but it's a password auth]
- (2022-01-24 16:11:50): [be[ipa.test]] [dp_attach_req] (0x0400): [RID#171] DP Request [PAM Authenticate #171]: REQ_TRACE: New request. [sssd.pam CID #2] Flags [0000].
- (2022-01-24 16:11:50): [be[ipa.test]] [dp_attach_req] (0x0400): [RID#171] Number of active DP request: 1
- (2022-01-24 16:11:50): [be[ipa.test]] [sss_domain_get_state] (0x1000): [RID#171] Domain ipa.test is Active
- (2022-01-24 16:11:50): [be[ipa.test]] [sss_domain_get_state] (0x1000): [RID#171] Domain ipa.test is Active
- (2022-01-24 16:11:50): [be[ipa.test]] [krb5_auth_queue_send] (0x1000): [RID#171] Wait queue of user [user19client003.ipa.test@ipa.test] is empty, running request [0x5614e40a64d0] immediately.
- (2022-01-24 16:11:50): [be[ipa.test]] [krb5_setup] (0x4000): [RID#171] No mapping for: user19client003.ipa.test@ipa.test
- (2022-01-24 16:11:50): [be[ipa.test]] [krb5_auth_prepare_ccache_name] (0x1000): [RID#171] No ccache file for user [user19client003.ipa.test@ipa.test] found.
- (2022-01-24 16:11:50): [be[ipa.test]] [fo_resolve_service_send] (0x0100): [RID#171] Trying to resolve service 'IPA'
- (2022-01-24 16:11:50): [be[ipa.test]] [get_server_status] (0x1000): [RID#171] Status of server 'server.ipa.test' is 'working'
- (2022-01-24 16:11:50): [be[ipa.test]] [get_port_status] (0x1000): [RID#171] Port status of port 0 for server 'server.ipa.test' is 'working'
- (2022-01-24 16:11:50): [be[ipa.test]] [fo_resolve_service_activate_timeout] (0x2000): [RID#171] Resolve timeout [dns_resolver_timeout] set to 6 seconds
- (2022-01-24 16:11:50): [be[ipa.test]] [get_server_status] (0x1000): [RID#171] Status of server 'server.ipa.test' is 'working'
- (2022-01-24 16:11:50): [be[ipa.test]] [be_resolve_server_process] (0x1000): [RID#171] Saving the first resolved server
- (2022-01-24 16:11:50): [be[ipa.test]] [be_resolve_server_process] (0x0200): [RID#171] Found address for server server.ipa.test: [192.168.193.244] TTL 7200
- (2022-01-24 16:11:50): [be[ipa.test]] [ipa_resolve_callback] (0x0400): [RID#171] Constructed uri 'ldap://server.ipa.test'
- (2022-01-24 16:11:50): [be[ipa.test]] [krb5_add_krb5info_offline_callback] (0x4000): [RID#171] Removal callback already available for service [IPA].
- (2022-01-24 16:11:50): [be[ipa.test]] [unlink_dbg] (0x2000): [RID#171] File already removed: [/var/lib/sss/pubconf/.krb5info_dummy_hY7Urp]
- (2022-01-24 16:11:50): [be[ipa.test]] [sss_domain_get_state] (0x1000): [RID#171] Domain ipa.test is Active
- (2022-01-24 16:11:50): [be[ipa.test]] [child_handler_setup] (0x2000): [RID#171] Setting up signal handler up for pid [4444]
- (2022-01-24 16:11:50): [be[ipa.test]] [child_handler_setup] (0x2000): [RID#171] Signal handler set up for pid [4444]
- (2022-01-24 16:11:50): [be[ipa.test]] [write_pipe_handler] (0x0400): [RID#171] All data has been sent!
- (2022-01-24 16:11:56): [be[ipa.test]] [krb5_auth_done] (0x0020): [RID#171] child timed out!
********************** BACKTRACE DUMP ENDS HERE *********************************
(2022-01-24 16:11:56): [be[ipa.test]] [fo_resolve_service_send] (0x0020): [RID#171] No available servers for service 'IPA'
and subsequent backtraces:
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
- (2022-01-24 16:11:56): [be[ipa.test]] [_be_fo_set_port_status] (0x8000): [RID#171] Setting status: PORT_NOT_WORKING. Called from: src/providers/krb5/krb5_auth.c: krb5_auth_done: 862
and all the expected ports not available, etc. and no available servers for IPA (single server install).
The child is eventually killed leading to:
(2022-01-24 16:11:57): [be[ipa.test]] [krb5_auth_cache_creds] (0x0020): [RID#171] Offline authentication failed
Which makes sense since this is the first login for the user so there are no stored creds.
Hi,
yes, PAM_PERM_DENIED is returned if there is no cached password for the current user and I guess you are using 'cache_credentials=False' (or not even set it since it is the default) to make sure every authentication is hitting the server.
bye, Sumit
So if this is indeed a timeout, it is good for me. I'm explicitly trying to find the limit of how many users/clients can authenticate at more or less the same time. It seems like I've managed to push the server enough that it isn't responsive to the default sssd timeouts (and 6 seconds feels like forever if you're just trying to log in). The interesting bit is that the server isn't really stressing at all, using maybe 25% of CPU. So it must be something else that is restrictive: # of threads, connections, etc.
I'll preserve these client and server logs in case we want to do further analysis but I'll try decreasing the number of logins to continue my search for the sweet spot. _______________________________________________ sssd-users mailing list -- sssd-users@lists.fedorahosted.org To unsubscribe send an email to sssd-users-leave@lists.fedorahosted.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.o... Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
sssd-users@lists.fedorahosted.org