URL:
https://github.com/SSSD/sssd/pull/5712
Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented:
"""
Some issues I have noticed testing more use cases:
1. Login attempts with a bad password, log tracking is not really helpful to identify
this:
~~~
# sssctl analyze request --cid=1 --pam
<snip>
...
<snip>
(2021-07-28 14:48:20): [be[ad.vm]] [unlink_dbg] (0x2000): [RID#11] File already removed:
[/var/lib/sss/pubconf/.krb5info_dummy_YlLWP4]
(2021-07-28 14:48:20): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#11] Domain ad.vm
is Active
(2021-07-28 14:48:20): [be[ad.vm]] [child_handler_setup] (0x2000): [RID#11] Setting up
signal handler up for pid [369475]
(2021-07-28 14:48:20): [be[ad.vm]] [child_handler_setup] (0x2000): [RID#11] Signal handler
set up for pid [369475]
(2021-07-28 14:48:20): [be[ad.vm]] [write_pipe_handler] (0x0400): [RID#11] All data has
been sent!
(2021-07-28 14:48:20): [be[ad.vm]] [read_pipe_handler] (0x0400): [RID#11] EOF received,
client finished
(2021-07-28 14:48:20): [be[ad.vm]] [check_wait_queue] (0x1000): [RID#11] Wait queue for
user [vagrant(a)ad.vm] is empty.
(2021-07-28 14:48:20): [be[ad.vm]] [krb5_auth_queue_done] (0x1000): [RID#11]
krb5_auth_queue request [0x15c9da0] done.
(2021-07-28 14:48:20): [be[ad.vm]] [dp_req_done] (0x0400): [RID#11] DP Request [PAM
Authenticate #11]: Request handler finished [0]: Success
(2021-07-28 14:48:20): [be[ad.vm]] [_dp_req_recv] (0x0400): [RID#11] DP Request [PAM
Authenticate #11]: Receiving request data.
(2021-07-28 14:48:20): [be[ad.vm]] [dp_req_destructor] (0x0400): [RID#11] DP Request [PAM
Authenticate #11]: Request removed.
(2021-07-28 14:48:20): [be[ad.vm]] [dp_req_destructor] (0x0400): [RID#11] Number of active
DP request: 0
(2021-07-28 14:48:20): [be[ad.vm]] [dp_method_enabled] (0x0400): [RID#11] Target selinux
is not configured
(2021-07-28 14:48:20): [be[ad.vm]] [child_sig_handler] (0x1000): [RID#11] Waiting for
child [369475].
(2021-07-28 14:48:20): [be[ad.vm]] [child_sig_handler] (0x0100): [RID#11] child [369475]
finished successfully.
~~~
-- The actual errors are in the krb5_child log
~~~
(2021-07-28 14:48:20): [krb5_child[369475]] [tgt_req_child] (0x1000): Attempting to get a
TGT
(2021-07-28 14:48:20): [krb5_child[369475]] [get_and_save_tgt] (0x0400): Attempting kinit
for realm [AD.VM]
(2021-07-28 14:48:20): [krb5_child[369475]] [sss_krb5_get_init_creds_password] (0x0020):
1647: [-1765328360][Preauthentication failed]
(2021-07-28 14:48:20): [krb5_child[369475]] [k5c_send_data] (0x0200): Received error code
1432158222
(2021-07-28 14:48:20): [krb5_child[369475]] [pack_response_packet] (0x2000): response
packet size: [4]
~~~
-- Or in the journal
~~~
Jul 28 14:48:20 master.client.vm sshd[369464]: pam_sss(sshd:auth): authentication failure;
logname= uid=0 euid=0 tty=ssh ruser= rhost=::1 user=vagrant(a)ad.vm
Jul 28 14:48:20 master.client.vm sshd[369464]: pam_sss(sshd:auth): received for user
vagrant(a)ad.vm: 7 (Authentication failure)
Jul 28 14:48:21 master.client.vm sshd[369464]: Failed password for vagrant(a)ad.vm from ::1
port 58940 ssh2
Jul 28 14:48:20 master.client.vm krb5_child[369475]: Preauthentication failed
~~~
2. If the SSSD backend is offline (down backend server), failed `id username` command does
not indicate this
~~~
# sssctl analyze request --list
ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la :
/usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header
******** Listing nss client requests ********
(2021-07-28 14:39:56: CID #1: id
- User by name
- administrator(a)ad.vm
# sssctl analyze request --cid=1
ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la :
/usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header
******** Checking nss responder for Client ID 1 *******
(2021-07-28 14:39:56): [nss] [cache_req_send] (0x0400): CR #0: REQ_TRACE: New request [CID
#1] 'User by name'
********* Checking Backend for Client ID 1 ********
(2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] DP Request [Account
#2]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001].
(2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] Number of active DP
request: 1
(2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#2] Domain ad.vm
is Active
(2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] DP Request [Account
#3]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001].
(2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] Number of active DP
request: 1
(2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#3] Domain ad.vm
is Active
~~~
-- This is because the tool is grepping for the tevent chain ID numbers, but the offline
log message does not include the tevent request ID
~~~
(2021-07-28 14:35:05): [be[ad.vm]] [sbus_issue_request_done] (0x0040):
sssd.dataprovider.getAccountInfo: Error [1432158212]: SSSD is offline
~~~
3. If the SSSD backend is offline (down backend server), SSH attempts do not show up in
the --list --pam, as the actual user lookup part failed. This might be a bad user
experience issue.
4. Due to the nature of how files provider requests originate, and the lack of tevent
chain ID RID messages in the files provider, the parsing tool will be pretty much useless
for files provider request tracking. I think this is an acceptable limitation.
"""
See the full comment at
https://github.com/SSSD/sssd/pull/5712#issuecomment-888383631