URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: opened
PR body: """ * Print client command request list, NSS (default), or PAM ~~~ $ sssctl analyze request --list $ sssctl analyze request --list --pam ~~~
* Track individual NSS request ~~~ $ sssctl analyze request --cid=20 ~~~
* Track individual NSS request including cache request logs ~~~ $ sssctl analyze request --cid=20 --cachereq ~~~
* Track individual PAM request ~~~ $ sssctl analyze request --cid=20 --pam ~~~
* Supports --logger=journald, pending tevent chain ID support for logger=journald ~~~ $ sssctl analyze --source=journald request --list ~~~
* Analyze logs extracted, or sent from another user (CEE use case). Logs must be from compatible SSSD version with REQ_TRACE log messages and tevent chain ID support ~~~ $ sssctl analyze --logdir=/path/to/sosreport/var/log/sssd request --list ~~~ """
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
I think update of `man sssctl` with new feature description is missing.
Thank you for the review. I don't see the actual sssctl commands listed in `man sssctl` could you clarify this point? """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-881639488
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Waiting for review
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Bugzilla
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
alexey-tikhonov commented: """ RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1940517 """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-885101451
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
alexey-tikhonov commented: """ Hi,
I played with a tool a bit.
1) About `--cid` parsing: ``` # sssctl analyze request --cid usage: sss_analyze request [-h] [--list] [--cid CID] [--pam] [--cachereq] sss_analyze request: error: argument --cid: expected one argument -- ok
# sssctl analyze request --cid= -- nothing, feels weird
# sssctl analyze request --cid=z ******** Checking nss responder for Client ID{client_id} ******* ********* Checking Backend for Client ID z ******** -- wouldn't it make sense to say something like "id 'z' not found"? ```
2) It seems `{client_id}` is printed instead of var value: ``` # sssctl analyze request --cid=711 ******** Checking nss responder for Client ID{client_id} ******* (2021-07-22 19:58:05:232246): [nss] [cache_req_send] (0x0400): CR #1111: REQ_TRACE: New request [CID #711] 'Service by name' ********* Checking Backend for Client ID 711 ******** ```
3) Sometime line is dulicated in backend logs: ``` ********* Checking Backend for Client ID 711 ******** (2021-07-22 19:58:05): [be[ipasubid2.test]] [dp_attach_req] (0x0400): [RID#200] DP Request [Account #200]: REQ_TRACE: New request. [sssd.nss CID #711] Flags [0x0001]. (2021-07-22 19:58:05): [be[ipasubid2.test]] [dp_attach_req] (0x0400): [RID#200] DP Request [Account #200]: REQ_TRACE: New request. [sssd.nss CID #711] Flags [0x0001]. (2021-07-22 19:58:05): [be[ipasubid2.test]] [dp_attach_req] (0x0400): [RID#200] Number of active DP request: 1 (2021-07-22 19:58:05): [be[ipasubid2.test]] [sss_domain_get_state] (0x1000): [RID#200] Domain ipasubid2.test is Active ``` (`grep` yields single "DP Request [Account..." line)
4) When responder is restarted, `CID` is reset and log can contain multiple unrelated requests with the same CID. Tool greps everything and this might be misleading. There is enough info in the logs to catch this, but not very simple.
5) Probably `--merge` option to merge responder and backend logs into single log (sorted by timestamp) might make some sense (not sure).
6) What are we going to do with long running clients that issue thousands of requests per single connection? Currently tool will grep everything. It definitely makes sense for case like `id user`, but not for a long running process like db. Probably `--cr=N` option would make sense?
7) (Most important) Currently `--list` output looks like: ``` (2021-07-22 21:33:21:956195): [nss] [accept_fd_handler] (0x0400): Client [CID #3][cmd id][0xd18ac0][27] connected! ``` This looks very similar to `grep` output. At the very least I would shorten this to: ``` 2021-07-22 21:33:21:956195: CID #3: cmd='id' ``` -- all those `[nss] [accept_fd_handler] (0x0400)` doesn't add value here but makes reading more difficult. Ideally output could look like following: ``` 2021-07-22 21:33:21:956195: CID #3: cmd='id', cache requests: - 'User by name' : 'rpc' - 'Group by ID' : '32' - 'Initgroups by name' : 'rpc' ... ``` But this is, of course, not very trivial and again, we need to handle case of long running process that does a lot of requests under the same CID.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-885206857
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Changes requested
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: -Waiting for review
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
Hi,
I played with a tool a bit.
Thanks for taking the time to do so.
1. When responder is restarted, `CID` is reset and log can contain multiple unrelated requests with the same CID. Tool greps everything and this might be misleading. There is enough info in the logs to catch this, but not very simple.
I am not sure how best to handle this, but open to ideas. Default behavior could be to only grep the more recent CID logs, if a duplicate is detected ( and print a warning about this duplicate)
2. Probably `--merge` option to merge responder and backend logs into single log (sorted by timestamp) might make some sense (not sure).
I added a `--merge` option but it simply sorts the combined output by timestamp, it does not interweave the backend logs within each cache request lookup. This interweaved logs would require more complicated logic of linking requests. If the cache request number was sent across sbus to the backend then it could be done easier.
3. What are we going to do with long running clients that issue thousands of requests per single connection? Currently tool will grep everything. It definitely makes sense for case like `id user`, but not for a long running process like db. Probably `--cr=N` option would make sense?
This --cr=X option could be added but I wonder if this is a use case we need to handle. If you have 1000+ CR numbers logged in the responder for a single CID, it will be difficult to know which CR # you actually care about investigating.
4. (Most important) Currently `--list` output looks like:
(2021-07-22 21:33:21:956195): [nss] [accept_fd_handler] (0x0400): Client [CID #3][cmd id][0xd18ac0][27] connected!
This looks very similar to `grep` output. At the very least I would shorten this to:
2021-07-22 21:33:21:956195: CID #3: cmd='id'
-- all those `[nss] [accept_fd_handler] (0x0400)` doesn't add value here but makes reading more difficult. Ideally output could look like following:
2021-07-22 21:33:21:956195: CID #3: cmd='id', cache requests: - 'User by name' : 'rpc' - 'Group by ID' : '32' - 'Initgroups by name' : 'rpc' ...
But this is, of course, not very trivial and again, we need to handle case of long running process that does a lot of requests under the same CID.
I improved the `--list` output, please see below:
~~~
# sssctl analyze request --list ******** Listing nss client requests ******** (2021-07-26 16:12:02: CID #1: id - User by name - administrator@ad.vm - Group by ID - GID:1690200513@ad.vm - Initgroups by name - administrator - Group by ID - GID:1690200520@ad.vm - Group by ID - GID:1690200519@ad.vm - Group by ID - GID:1690200512@ad.vm - Group by ID - GID:1690200518@ad.vm - Group by ID - GID:1690200572@ad.vm (2021-07-26 16:13:35: CID #2: vim - User by ID - UID:0@ad.vm (2021-07-26 16:17:04: CID #3: vim - User by ID - UID:0@ad.vm (2021-07-26 16:38:20: CID #4: /usr/bin/bash - User by ID - UID:0@ad.vm
# sssctl analyze request --list --pam ******** Listing pam client requests ******** (2021-07-26 19:38:00: CID #1: sshd: administrator@ad.vm - Initgroups by name - administrator@ad.vm - Initgroups by name - administrator@ad.vm (2021-07-26 19:53:50: CID #2: sshd: administrator@ad.vm - Initgroups by name - administrator@ad.vm - Initgroups by name - administrator@ad.vm ~~~
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-887656608
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
Hi,
I played with a tool a bit.
Thanks for taking the time to do so.
1. When responder is restarted, `CID` is reset and log can contain multiple unrelated requests with the same CID. Tool greps everything and this might be misleading. There is enough info in the logs to catch this, but not very simple.
I am not sure how best to handle this, but open to ideas. Default behavior could be to only grep the more recent CID logs, if a duplicate is detected ( and print a warning about this duplicate)
2. Probably `--merge` option to merge responder and backend logs into single log (sorted by timestamp) might make some sense (not sure).
I added a `--merge` option but it simply sorts the combined output by timestamp, it does not interweave the backend logs within each cache request lookup. This interweaved logs would require more complicated logic of linking requests. If the cache request number was sent across sbus to the backend then it could be done easier.
3. What are we going to do with long running clients that issue thousands of requests per single connection? Currently tool will grep everything. It definitely makes sense for case like `id user`, but not for a long running process like db. Probably `--cr=N` option would make sense?
This --cr=X option could be added but I wonder if this is a use case we need to handle. If you have 1000+ CR numbers logged in the responder for a single CID, it will be difficult to know which CR # you actually care about investigating.
4. (Most important) Currently `--list` output looks like:
(2021-07-22 21:33:21:956195): [nss] [accept_fd_handler] (0x0400): Client [CID #3][cmd id][0xd18ac0][27] connected!
This looks very similar to `grep` output. At the very least I would shorten this to:
2021-07-22 21:33:21:956195: CID #3: cmd='id'
-- all those `[nss] [accept_fd_handler] (0x0400)` doesn't add value here but makes reading more difficult. Ideally output could look like following:
2021-07-22 21:33:21:956195: CID #3: cmd='id', cache requests: - 'User by name' : 'rpc' - 'Group by ID' : '32' - 'Initgroups by name' : 'rpc' ...
But this is, of course, not very trivial and again, we need to handle case of long running process that does a lot of requests under the same CID.
I improved the `--list` output, please see below:
~~~
# sssctl analyze request --list ******** Listing nss client requests ******** (2021-07-26 16:12:02: CID #1: id - User by name - administrator@ad.vm - Group by ID - GID:1690200513@ad.vm - Initgroups by name - administrator - Group by ID - GID:1690200520@ad.vm - Group by ID - GID:1690200519@ad.vm - Group by ID - GID:1690200512@ad.vm - Group by ID - GID:1690200518@ad.vm - Group by ID - GID:1690200572@ad.vm (2021-07-26 16:13:35: CID #2: vim - User by ID - UID:0@ad.vm (2021-07-26 16:17:04: CID #3: vim - User by ID - UID:0@ad.vm (2021-07-26 16:38:20: CID #4: /usr/bin/bash - User by ID - UID:0@ad.vm
# sssctl analyze request --list --pam ******** Listing pam client requests ******** (2021-07-26 19:38:00: CID #1: sshd: administrator@ad.vm - Initgroups by name - administrator@ad.vm - Initgroups by name - administrator@ad.vm (2021-07-26 19:53:50: CID #2: sshd: administrator@ad.vm - Initgroups by name - administrator@ad.vm - Initgroups by name - administrator@ad.vm ~~~
Other items mentioned are fixed now. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-887656608
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """ Some issues I have noticed testing more use cases:
1. Login attempts with a bad password, log tracking is not really helpful to identify this: ~~~ # sssctl analyze request --cid=1 --pam <snip> ... <snip> (2021-07-28 14:48:20): [be[ad.vm]] [unlink_dbg] (0x2000): [RID#11] File already removed: [/var/lib/sss/pubconf/.krb5info_dummy_YlLWP4] (2021-07-28 14:48:20): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#11] Domain ad.vm is Active (2021-07-28 14:48:20): [be[ad.vm]] [child_handler_setup] (0x2000): [RID#11] Setting up signal handler up for pid [369475] (2021-07-28 14:48:20): [be[ad.vm]] [child_handler_setup] (0x2000): [RID#11] Signal handler set up for pid [369475] (2021-07-28 14:48:20): [be[ad.vm]] [write_pipe_handler] (0x0400): [RID#11] All data has been sent! (2021-07-28 14:48:20): [be[ad.vm]] [read_pipe_handler] (0x0400): [RID#11] EOF received, client finished (2021-07-28 14:48:20): [be[ad.vm]] [check_wait_queue] (0x1000): [RID#11] Wait queue for user [vagrant@ad.vm] is empty. (2021-07-28 14:48:20): [be[ad.vm]] [krb5_auth_queue_done] (0x1000): [RID#11] krb5_auth_queue request [0x15c9da0] done. (2021-07-28 14:48:20): [be[ad.vm]] [dp_req_done] (0x0400): [RID#11] DP Request [PAM Authenticate #11]: Request handler finished [0]: Success (2021-07-28 14:48:20): [be[ad.vm]] [_dp_req_recv] (0x0400): [RID#11] DP Request [PAM Authenticate #11]: Receiving request data. (2021-07-28 14:48:20): [be[ad.vm]] [dp_req_destructor] (0x0400): [RID#11] DP Request [PAM Authenticate #11]: Request removed. (2021-07-28 14:48:20): [be[ad.vm]] [dp_req_destructor] (0x0400): [RID#11] Number of active DP request: 0 (2021-07-28 14:48:20): [be[ad.vm]] [dp_method_enabled] (0x0400): [RID#11] Target selinux is not configured (2021-07-28 14:48:20): [be[ad.vm]] [child_sig_handler] (0x1000): [RID#11] Waiting for child [369475]. (2021-07-28 14:48:20): [be[ad.vm]] [child_sig_handler] (0x0100): [RID#11] child [369475] finished successfully. ~~~ -- The actual errors are in the krb5_child log ~~~ (2021-07-28 14:48:20): [krb5_child[369475]] [tgt_req_child] (0x1000): Attempting to get a TGT (2021-07-28 14:48:20): [krb5_child[369475]] [get_and_save_tgt] (0x0400): Attempting kinit for realm [AD.VM] (2021-07-28 14:48:20): [krb5_child[369475]] [sss_krb5_get_init_creds_password] (0x0020): 1647: [-1765328360][Preauthentication failed] (2021-07-28 14:48:20): [krb5_child[369475]] [k5c_send_data] (0x0200): Received error code 1432158222 (2021-07-28 14:48:20): [krb5_child[369475]] [pack_response_packet] (0x2000): response packet size: [4] ~~~ -- Or in the journal ~~~ Jul 28 14:48:20 master.client.vm sshd[369464]: pam_sss(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=::1 user=vagrant@ad.vm Jul 28 14:48:20 master.client.vm sshd[369464]: pam_sss(sshd:auth): received for user vagrant@ad.vm: 7 (Authentication failure) Jul 28 14:48:21 master.client.vm sshd[369464]: Failed password for vagrant@ad.vm from ::1 port 58940 ssh2 Jul 28 14:48:20 master.client.vm krb5_child[369475]: Preauthentication failed ~~~
2. If the SSSD backend is offline (down backend server), failed `id username` command does not indicate this ~~~ # sssctl analyze request --list ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la : /usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header ******** Listing nss client requests ******** (2021-07-28 14:39:56: CID #1: id - User by name - administrator@ad.vm # sssctl analyze request --cid=1 ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la : /usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header ******** Checking nss responder for Client ID 1 ******* (2021-07-28 14:39:56): [nss] [cache_req_send] (0x0400): CR #0: REQ_TRACE: New request [CID #1] 'User by name' ********* Checking Backend for Client ID 1 ******** (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] DP Request [Account #2]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001]. (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] Number of active DP request: 1 (2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#2] Domain ad.vm is Active (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] DP Request [Account #3]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001]. (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] Number of active DP request: 1 (2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#3] Domain ad.vm is Active ~~~ -- This is because the tool is grepping for the tevent chain ID numbers, but the offline log message does not include the tevent request ID ~~~ (2021-07-28 14:35:05): [be[ad.vm]] [sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158212]: SSSD is offline ~~~
3. If the SSSD backend is offline (down backend server), SSH attempts do not show up in the --list --pam, as the actual user lookup part failed. This might be a bad user experience issue. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-888383631
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """ Some issues I have noticed testing more use cases:
1. Login attempts with a bad password, log tracking is not really helpful to identify this: ~~~ # sssctl analyze request --cid=1 --pam <snip> ... <snip> (2021-07-28 14:48:20): [be[ad.vm]] [unlink_dbg] (0x2000): [RID#11] File already removed: [/var/lib/sss/pubconf/.krb5info_dummy_YlLWP4] (2021-07-28 14:48:20): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#11] Domain ad.vm is Active (2021-07-28 14:48:20): [be[ad.vm]] [child_handler_setup] (0x2000): [RID#11] Setting up signal handler up for pid [369475] (2021-07-28 14:48:20): [be[ad.vm]] [child_handler_setup] (0x2000): [RID#11] Signal handler set up for pid [369475] (2021-07-28 14:48:20): [be[ad.vm]] [write_pipe_handler] (0x0400): [RID#11] All data has been sent! (2021-07-28 14:48:20): [be[ad.vm]] [read_pipe_handler] (0x0400): [RID#11] EOF received, client finished (2021-07-28 14:48:20): [be[ad.vm]] [check_wait_queue] (0x1000): [RID#11] Wait queue for user [vagrant@ad.vm] is empty. (2021-07-28 14:48:20): [be[ad.vm]] [krb5_auth_queue_done] (0x1000): [RID#11] krb5_auth_queue request [0x15c9da0] done. (2021-07-28 14:48:20): [be[ad.vm]] [dp_req_done] (0x0400): [RID#11] DP Request [PAM Authenticate #11]: Request handler finished [0]: Success (2021-07-28 14:48:20): [be[ad.vm]] [_dp_req_recv] (0x0400): [RID#11] DP Request [PAM Authenticate #11]: Receiving request data. (2021-07-28 14:48:20): [be[ad.vm]] [dp_req_destructor] (0x0400): [RID#11] DP Request [PAM Authenticate #11]: Request removed. (2021-07-28 14:48:20): [be[ad.vm]] [dp_req_destructor] (0x0400): [RID#11] Number of active DP request: 0 (2021-07-28 14:48:20): [be[ad.vm]] [dp_method_enabled] (0x0400): [RID#11] Target selinux is not configured (2021-07-28 14:48:20): [be[ad.vm]] [child_sig_handler] (0x1000): [RID#11] Waiting for child [369475]. (2021-07-28 14:48:20): [be[ad.vm]] [child_sig_handler] (0x0100): [RID#11] child [369475] finished successfully. ~~~ -- The actual errors are in the krb5_child log ~~~ (2021-07-28 14:48:20): [krb5_child[369475]] [tgt_req_child] (0x1000): Attempting to get a TGT (2021-07-28 14:48:20): [krb5_child[369475]] [get_and_save_tgt] (0x0400): Attempting kinit for realm [AD.VM] (2021-07-28 14:48:20): [krb5_child[369475]] [sss_krb5_get_init_creds_password] (0x0020): 1647: [-1765328360][Preauthentication failed] (2021-07-28 14:48:20): [krb5_child[369475]] [k5c_send_data] (0x0200): Received error code 1432158222 (2021-07-28 14:48:20): [krb5_child[369475]] [pack_response_packet] (0x2000): response packet size: [4] ~~~ -- Or in the journal ~~~ Jul 28 14:48:20 master.client.vm sshd[369464]: pam_sss(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=::1 user=vagrant@ad.vm Jul 28 14:48:20 master.client.vm sshd[369464]: pam_sss(sshd:auth): received for user vagrant@ad.vm: 7 (Authentication failure) Jul 28 14:48:21 master.client.vm sshd[369464]: Failed password for vagrant@ad.vm from ::1 port 58940 ssh2 Jul 28 14:48:20 master.client.vm krb5_child[369475]: Preauthentication failed ~~~
2. If the SSSD backend is offline (down backend server), failed `id username` command does not indicate this ~~~ # sssctl analyze request --list ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la : /usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header ******** Listing nss client requests ******** (2021-07-28 14:39:56: CID #1: id - User by name - administrator@ad.vm # sssctl analyze request --cid=1 ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la : /usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header ******** Checking nss responder for Client ID 1 ******* (2021-07-28 14:39:56): [nss] [cache_req_send] (0x0400): CR #0: REQ_TRACE: New request [CID #1] 'User by name' ********* Checking Backend for Client ID 1 ******** (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] DP Request [Account #2]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001]. (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] Number of active DP request: 1 (2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#2] Domain ad.vm is Active (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] DP Request [Account #3]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001]. (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] Number of active DP request: 1 (2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#3] Domain ad.vm is Active ~~~ -- This is because the tool is grepping for the tevent chain ID numbers, but the offline log message does not include the tevent request ID ~~~ (2021-07-28 14:35:05): [be[ad.vm]] [sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158212]: SSSD is offline ~~~
3. If the SSSD backend is offline (down backend server), SSH attempts do not show up in the --list --pam, as the actual user lookup part failed. This might be a bad user experience issue.
4. Due to the nature of how files provider requests originate, and the lack of tevent chain ID RID messages in the files provider, the parsing tool will be pretty much useless for files provider request tracking. I think this is an acceptable limitation. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-888383631
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
alexey-tikhonov commented: """
- When responder is restarted, `CID` is reset and log can contain multiple unrelated requests with the same CID.
I am not sure how best to handle this, but open to ideas. Default behavior could be to only grep the more recent CID logs, if a duplicate is detected ( and print a warning about this duplicate)
Sounds ok.
I added a `--merge` option but it simply sorts the combined output by timestamp
Thank you.
This --cr=X option could be added but I wonder if this is a use case we need to handle. If you have 1000+ CR numbers logged in the responder for a single CID, it will be difficult to know which CR # you actually care about investigating.
Ok.
I improved the `--list` output, please see below:
Looks good.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-890023617
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
alexey-tikhonov commented: """
- When responder is restarted, `CID` is reset and log can contain multiple unrelated requests with the same CID.
I am not sure how best to handle this, but open to ideas. Default behavior could be to only grep the more recent CID logs, if a duplicate is detected ( and print a warning about this duplicate)
Sounds ok.
This --cr=X option could be added but I wonder if this is a use case we need to handle. If you have 1000+ CR numbers logged in the responder for a single CID, it will be difficult to know which CR # you actually care about investigating.
Ok.
I improved the `--list` output, please see below:
Looks good.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-890023617
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
Some issues I have noticed testing more use cases:
1. If the SSSD backend is offline (down backend server), failed `id username` command does not indicate this
# sssctl analyze request --list ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la : /usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header ******** Listing nss client requests ******** (2021-07-28 14:39:56: CID #1: id - User by name - administrator@ad.vm # sssctl analyze request --cid=1 ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la : /usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header ******** Checking nss responder for Client ID 1 ******* (2021-07-28 14:39:56): [nss] [cache_req_send] (0x0400): CR #0: REQ_TRACE: New request [CID #1] 'User by name' ********* Checking Backend for Client ID 1 ******** (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] DP Request [Account #2]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001]. (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] Number of active DP request: 1 (2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#2] Domain ad.vm is Active (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] DP Request [Account #3]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001]. (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] Number of active DP request: 1 (2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#3] Domain ad.vm is Active
-- This is because the tool is grepping for the tevent chain ID numbers, but the offline log message does not include the tevent request ID
(2021-07-28 14:35:05): [be[ad.vm]] [sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158212]: SSSD is offline
@pbrezina Is there a way to get the tevent chain ID logged in this sbus `SSSD is offline` message shown above? I guess not based on the comment in https://github.com/SSSD/sssd/blob/master/src/sbus/router/sbus_router_handler... ?
If not I would need to parse the timestamps returned from the RID backend messages, and then search for regex like `$timestamp.*SSSD is offline` but that is prone to errors. Or does someone have a better suggestion?
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-890110770
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
Some issues I have noticed testing more use cases:
1. Login attempts with a bad password, log tracking is not really helpful to identify this:
# sssctl analyze request --cid=1 --pam <snip> ... <snip> (2021-07-28 14:48:20): [be[ad.vm]] [unlink_dbg] (0x2000): [RID#11] File already removed: [/var/lib/sss/pubconf/.krb5info_dummy_YlLWP4] (2021-07-28 14:48:20): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#11] Domain ad.vm is Active (2021-07-28 14:48:20): [be[ad.vm]] [child_handler_setup] (0x2000): [RID#11] Setting up signal handler up for pid [369475] (2021-07-28 14:48:20): [be[ad.vm]] [child_handler_setup] (0x2000): [RID#11] Signal handler set up for pid [369475] (2021-07-28 14:48:20): [be[ad.vm]] [write_pipe_handler] (0x0400): [RID#11] All data has been sent! (2021-07-28 14:48:20): [be[ad.vm]] [read_pipe_handler] (0x0400): [RID#11] EOF received, client finished (2021-07-28 14:48:20): [be[ad.vm]] [check_wait_queue] (0x1000): [RID#11] Wait queue for user [vagrant@ad.vm] is empty. (2021-07-28 14:48:20): [be[ad.vm]] [krb5_auth_queue_done] (0x1000): [RID#11] krb5_auth_queue request [0x15c9da0] done. (2021-07-28 14:48:20): [be[ad.vm]] [dp_req_done] (0x0400): [RID#11] DP Request [PAM Authenticate #11]: Request handler finished [0]: Success (2021-07-28 14:48:20): [be[ad.vm]] [_dp_req_recv] (0x0400): [RID#11] DP Request [PAM Authenticate #11]: Receiving request data. (2021-07-28 14:48:20): [be[ad.vm]] [dp_req_destructor] (0x0400): [RID#11] DP Request [PAM Authenticate #11]: Request removed. (2021-07-28 14:48:20): [be[ad.vm]] [dp_req_destructor] (0x0400): [RID#11] Number of active DP request: 0 (2021-07-28 14:48:20): [be[ad.vm]] [dp_method_enabled] (0x0400): [RID#11] Target selinux is not configured (2021-07-28 14:48:20): [be[ad.vm]] [child_sig_handler] (0x1000): [RID#11] Waiting for child [369475]. (2021-07-28 14:48:20): [be[ad.vm]] [child_sig_handler] (0x0100): [RID#11] child [369475] finished successfully.
-- The actual errors are in the krb5_child log
(2021-07-28 14:48:20): [krb5_child[369475]] [tgt_req_child] (0x1000): Attempting to get a TGT (2021-07-28 14:48:20): [krb5_child[369475]] [get_and_save_tgt] (0x0400): Attempting kinit for realm [AD.VM] (2021-07-28 14:48:20): [krb5_child[369475]] [sss_krb5_get_init_creds_password] (0x0020): 1647: [-1765328360][Preauthentication failed] (2021-07-28 14:48:20): [krb5_child[369475]] [k5c_send_data] (0x0200): Received error code 1432158222 (2021-07-28 14:48:20): [krb5_child[369475]] [pack_response_packet] (0x2000): response packet size: [4]
-- Or in the journal
Jul 28 14:48:20 master.client.vm sshd[369464]: pam_sss(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=::1 user=vagrant@ad.vm Jul 28 14:48:20 master.client.vm sshd[369464]: pam_sss(sshd:auth): received for user vagrant@ad.vm: 7 (Authentication failure) Jul 28 14:48:21 master.client.vm sshd[369464]: Failed password for vagrant@ad.vm from ::1 port 58940 ssh2 Jul 28 14:48:20 master.client.vm krb5_child[369475]: Preauthentication failed
We agreed on the logging call that this can be addressed later
1. If the SSSD backend is offline (down backend server), failed `id username` command does not indicate this
# sssctl analyze request --list ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la : /usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header ******** Listing nss client requests ******** (2021-07-28 14:39:56: CID #1: id - User by name - administrator@ad.vm # sssctl analyze request --cid=1 ldb: unable to dlopen /usr/lib64/ldb/modules/ldb/memberof.la : /usr/lib64/ldb/modules/ldb/memberof.la: invalid ELF header ******** Checking nss responder for Client ID 1 ******* (2021-07-28 14:39:56): [nss] [cache_req_send] (0x0400): CR #0: REQ_TRACE: New request [CID #1] 'User by name' ********* Checking Backend for Client ID 1 ******** (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] DP Request [Account #2]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001]. (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#2] Number of active DP request: 1 (2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#2] Domain ad.vm is Active (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] DP Request [Account #3]: REQ_TRACE: New request. [sssd.nss CID #1] Flags [0x0001]. (2021-07-28 14:39:56): [be[ad.vm]] [dp_attach_req] (0x0400): [RID#3] Number of active DP request: 1 (2021-07-28 14:39:56): [be[ad.vm]] [sss_domain_get_state] (0x1000): [RID#3] Domain ad.vm is Active
-- This is because the tool is grepping for the tevent chain ID numbers, but the offline log message does not include the tevent request ID
(2021-07-28 14:35:05): [be[ad.vm]] [sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158212]: SSSD is offline
I added commit `DP: Log offline warning for REQ_TRACE tracking` which address this issue, thanks to Sumit's suggestion.
1. If the SSSD backend is offline (down backend server), SSH attempts do not show up in the --list --pam, as the actual user lookup part failed. This might be a bad user experience issue.
No fix for this, but it may be acceptable.
2. Due to the nature of how files provider requests originate, and the lack of tevent chain ID RID messages in the files provider, the parsing tool will be pretty much useless for files provider request tracking. I think this is an acceptable limitation.
We agreed on the logging call that the files provider is not a use case we need to handle with the parsing tool.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-892739954
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Waiting for review
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: -Changes requested
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """ I pushed a small addition to include the `pam_print_data [CID #X]` logs in the `--cid=X --pam` output, these were added in the part 1 of this feature here:
https://github.com/SSSD/sssd/pull/5620/commits/8f2d1af902a855bcd66cbbc8c949f... """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-899579591
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
alexey-tikhonov commented: """ @justin-stephenson , FYI, #5748 was merged upstream, so usage of `sssctl_run_command()` requires a small correction now. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-899582031
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
@justin-stephenson , FYI, #5748 was merged upstream, so usage of `sssctl_run_command()` requires a small correction now.
Rebased this PR on top. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-899635575
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
sumit-bose commented: """ Hi Scott,
thanks for the latest updates.
I think the log lines from `accept_fd_handler` e.g.
2021-08-18 10:36:34:610384): [nss] [accept_fd_handler] (0x0400): Client [CID #1][cmd su][0x55d181821070][27] connected!
or
(2021-08-18 10:36:34:703799): [pam] [accept_fd_handler] (0x0400): Client [CID #1][cmd su][0x55bc092703c0][24] connected to privileged pipe!
are missing in the parser output, can you add it?
Can you show the help output if the command line is incomplete? Currently it looks like:
``` # sssctl analyze usage: sss_analyze [-h] [--source {files,journald}] [--logdir LOGDIR] {request} ... sss_analyze: error: the following arguments are required: subcommand Command '/usr/lib/python3.9/site-packages/sssd/sss_analyze.py' failed with [2]
# sssctl analyze request # ``` The first case it should not be as an error but the help output should be shown and since there is no `sss_analyze` which can be called it would be better to say `sssctl analyze`. The second case currently has no output, help output would be better here as well.
I would mention in the `--merge` help line that this currently only works properly with `debug_microseconds = True` for all components.
Another item for improvement in a new pull-request is to add the tevent id logic to the different responders. For the nss responder this won't change much since the main task here is done in the cache request code. But e.g. for the pam responder currently also only the cache request part of the logs is shown by the parser but here there are other useful log messages from the main pam responder code as well.
bye, Sumit
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-901023962
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
alexey-tikhonov commented: """ https://s3.eu-central-1.amazonaws.com/sssd-ci/PR-5712/10/rhel8/multihost-ins... How to fix this? """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-901031409
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
sumit-bose commented: """ Hi Justin,
thanks for the latest updates.
I think the log lines from `accept_fd_handler` e.g.
2021-08-18 10:36:34:610384): [nss] [accept_fd_handler] (0x0400): Client [CID #1][cmd su][0x55d181821070][27] connected!
or
(2021-08-18 10:36:34:703799): [pam] [accept_fd_handler] (0x0400): Client [CID #1][cmd su][0x55bc092703c0][24] connected to privileged pipe!
are missing in the parser output, can you add it?
Can you show the help output if the command line is incomplete? Currently it looks like:
``` # sssctl analyze usage: sss_analyze [-h] [--source {files,journald}] [--logdir LOGDIR] {request} ... sss_analyze: error: the following arguments are required: subcommand Command '/usr/lib/python3.9/site-packages/sssd/sss_analyze.py' failed with [2]
# sssctl analyze request # ``` The first case it should not be as an error but the help output should be shown and since there is no `sss_analyze` which can be called it would be better to say `sssctl analyze`. The second case currently has no output, help output would be better here as well.
I would mention in the `--merge` help line that this currently only works properly with `debug_microseconds = True` for all components.
Another item for improvement in a new pull-request is to add the tevent id logic to the different responders. For the nss responder this won't change much since the main task here is done in the cache request code. But e.g. for the pam responder currently also only the cache request part of the logs is shown by the parser but here there are other useful log messages from the main pam responder code as well.
bye, Sumit
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-901023962
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
https://s3.eu-central-1.amazonaws.com/sssd-ci/PR-5712/10/rhel8/multihost-ins... How to fix this?
Looks like updating the shebang to /usr/bin/python3 allowed dnf to find the packages and fixed this """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-901268703
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
Hi Justin,
thanks for the latest updates.
I think the log lines from `accept_fd_handler` e.g.
2021-08-18 10:36:34:610384): [nss] [accept_fd_handler] (0x0400): Client [CID #1][cmd su][0x55d181821070][27] connected!
or
(2021-08-18 10:36:34:703799): [pam] [accept_fd_handler] (0x0400): Client [CID #1][cmd su][0x55bc092703c0][24] connected to privileged pipe!
are missing in the parser output, can you add it?
Added.
Can you show the help output if the command line is incomplete? Currently it looks like:
# sssctl analyze usage: sss_analyze [-h] [--source {files,journald}] [--logdir LOGDIR] {request} ... sss_analyze: error: the following arguments are required: subcommand Command '/usr/lib/python3.9/site-packages/sssd/sss_analyze.py' failed with [2] # sssctl analyze request #
The first case it should not be as an error but the help output should be shown and since there is no `sss_analyze` which can be called it would be better to say `sssctl analyze`. The second case currently has no output, help output would be better here as well.
Fixed both.
I would mention in the `--merge` help line that this currently only works properly with `debug_microseconds = True` for all components.
Done.
Another item for improvement in a new pull-request is to add the tevent id logic to the different responders. For the nss responder this won't change much since the main task here is done in the cache request code. But e.g. for the pam responder currently also only the cache request part of the logs is shown by the parser but here there are other useful log messages from the main pam responder code as well.
Okay that makes sense, should this be worked on now or at a later point? I'm not sure what is the complexity here. Is it just a matter of adding this somewhere in responder setup code? ~~~ sss_chain_id_set(id_num); ~~~ The parsing tool may require some slight changes after this is implemented to obtain all the logs.
bye, Sumit
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-902164702
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
sumit-bose commented: """ Hi,
thanks for the updates. I think from now on every enhancement should be done not in this PR but in a new one.
What I still find irritating is the repeated `request` in the help output: ``` # sssctl analyze usage: sss_analyze [-h] [--source {files,journald}] [--logdir LOGDIR] {request} ...
positional arguments: {request} request Track requests across SSSD components
optional arguments: -h, --help show this help message and exit --source {files,journald} --logdir LOGDIR ```
I think the curly braces are ok on the first line, in case we will add other sub-commands in future, but I think the second occurrence is wrong.
In the `request` help putput it might be worth to put `--list` and `--cid CID` in curly braces like e.g. `{--list | --cid CID}' to indicate that one of the two must be used.
bye, Sumit
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-919052966
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """ I pushed a new version which removed some duplicated code. @sumit-bose could you please do a quick minimal functionality test before giving Ack. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-920054571
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
sumit-bose commented: """ Hi,
thanks for checking the help message, as long as this is the default behavior of argparse I'm fine. I build and installed the latest version with the removed code and it is still working for me. ACK
Thanks for your patience.
bye, Sumit """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-920768851
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: -Waiting for review
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Accepted
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Changes requested
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: -Accepted
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
pbrezina commented: """ `--list` output has `(` that looks like should not be there at the beginning: ``` (2021-09-21 10:31:47: CID #21: /usr/bin/bash ```
``` (2021-09-21 10:31:47: CID #23: /usr/bin/sh - User by ID - UID:0@ldap.vm - Service by name - https - Service by name - https - User by ID - UID:0@ldap.vm - User by ID - UID:0@ldap.vm ``` I would expect the output to only list the timestampe, id and command. There might be additional parameter to make it more verbose and print the rest of the information.
For better usage, it may be good to introduce nested subcommands: ``` sssctl analyze request list --verbose sssctl analyze request show $id [options] ```
This is again something that is handled by argparse in python-nutcli and can be extracted (code is [here](https://github.com/pbrezina/python-nutcli/blob/master/nutcli/commands.py) and [here](https://github.com/pbrezina/python-nutcli/blob/master/nutcli/parser.py)). Or there are some other alternatives that might be used to create nested commands interface and already packaged (click perhaps?)
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-924854691
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
`--list` output has `(` that looks like should not be there at the beginning:
(2021-09-21 10:31:47: CID #21: /usr/bin/bash
(2021-09-21 10:31:47: CID #23: /usr/bin/sh - User by ID - UID:0@ldap.vm - Service by name - https - Service by name - https - User by ID - UID:0@ldap.vm - User by ID - UID:0@ldap.vm
I would expect the output to only list the timestampe, id and command. There might be additional parameter to make it more verbose and print the rest of the information.
Alexey suggested to print out this information, because otherwise the `--list` output does not give any more valuable information than what can be grepped from the NSS responder log file. I'm fine with either approach.
For better usage, it may be good to introduce nested subcommands:
sssctl analyze request list --verbose sssctl analyze request show $id [options]
This is again something that is handled by argparse in python-nutcli and can be extracted (code is [here](https://github.com/pbrezina/python-nutcli/blob/master/nutcli/commands.py) and [here](https://github.com/pbrezina/python-nutcli/blob/master/nutcli/parser.py)). Or there are some other alternatives that might be used to create nested commands interface and already packaged (click perhaps?)
Okay thanks for the pointers, I agree there can be some improvements made here. I'll look into this.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-924891118
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
pbrezina commented: """
(2021-09-21 10:31:47: CID #23: /usr/bin/sh - User by ID - UID:0@ldap.vm - Service by name - https - Service by name - https - User by ID - UID:0@ldap.vm - User by ID - UID:0@ldap.vm
I would expect the output to only list the timestampe, id and command. There might be additional parameter to make it more verbose and print the rest of the information.
Alexey suggested to print out this information, because otherwise the `--list` output does not give any more valuable information than what can be grepped from the NSS responder log file. I'm fine with either approach.
I'm not saying it shouldn't be available. But there may be thousands of requests in the logs. Even my small log file from a running machine makes the list unusable. My use case is "Something screwed up, I am able to reproduce it with id command. Find CID of the id command that I executed."
So it may be good to also provide minimum input (possibly also provide some limit and search ability). It may be also good to print command result (success/failure) if possible.
Note that we can grep our logs, our users often can't. That's why we want to deliver this tool. But functionality can be always added later, for know we should focus on making the cli clear and extensible.
For better usage, it may be good to introduce nested subcommands:
sssctl analyze request list --verbose sssctl analyze request show $id [options]
This is again something that is handled by argparse in python-nutcli and can be extracted (code is [here](https://github.com/pbrezina/python-nutcli/blob/master/nutcli/commands.py) and [here](https://github.com/pbrezina/python-nutcli/blob/master/nutcli/parser.py)). Or there are some other alternatives that might be used to create nested commands interface and already packaged (click perhaps?)
Okay thanks for the pointers, I agree there can be some improvements made here. I'll look into this.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-924901302
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
alexey-tikhonov commented: """
(2021-09-21 10:31:47: CID #23: /usr/bin/sh - User by ID - UID:0@ldap.vm - Service by name - https - Service by name - https - User by ID - UID:0@ldap.vm - User by ID - UID:0@ldap.vm
I would expect the output to only list the timestampe, id and command. There might be additional parameter to make it more verbose and print the rest of the information.
What do you mean by `command` - "/usr/bin/sh" or "User by ID"?
My use case is "Something screwed up, I am able to reproduce it with id command. Find CID of the id command that I executed."
How are you going to find "id command that I executed" if this is a busy host with a lot of look ups in parallel? Or when you are admin and your user complains "I wasn't able to login to that host yesterday" but can't provide exact timestamp? This is where having request+args would be handy.
But I don't mind: - if this output will require `--verbose` option - it will be a little bit more compact (perhaps request + args can fit a single line?)
And probably it makes sense to add a timestamp of a specific request in addition to client connection timestamp (I'm judging based on quoted output, I didn't try the latest version myself)?
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-926016515
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """ Pushed a new version.
* I got the recommended command/subcommand parsing working with [python click](https://click.palletsprojects.com/en/8.0.x/) I hope it's okay. It looks to be available in RHEL8 and Fedora for some time.
* I changed the list output to only print 'short' lines as shown below, but the `-v/--verbose` flag enables the previous full list output ~~~ 2021-09-23 19:01:01: CID #237: /usr/sbin/crond ~~~ * The commands are now ~~~ sssctl analyze request list --verbose sssctl analyze request show $id [options] ~~~ * I created PR https://github.com/SSSD/sssd/pull/5795 to allow this parsing tool to work with `logger=journald`. Journal support is working but functionality is limited at the moment. For example list `--verbose` output relies on finding certain functions names which are not available in the journal MESSAGE field. Timestamps are also not printed but I plan to fix this next week.
* `request.py` moved into modules subdirectory
If I missing something please just let me know. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-926893560
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """ Pushed a new version.
* I got the recommended command/subcommand parsing working with [python click](https://click.palletsprojects.com/en/8.0.x/) I hope it's okay. It looks to be available in RHEL8 and Fedora for some time.
* I changed the list output to only print 'short' lines as shown below, but the `-v/--verbose` flag enables the previous full list output ~~~ 2021-09-23 19:01:01: CID #237: /usr/sbin/crond ~~~ * The commands are now ~~~ sssctl analyze request list --verbose sssctl analyze request show $id [options] ~~~ * I created PR https://github.com/SSSD/sssd/pull/5795 to allow this parsing tool to work with `logger=journald`. Journal support is working but functionality is limited at the moment. For example list `--verbose` output relies on finding certain functions names which are not available in the journal MESSAGE field.
* Timestamp is now added to parsing `logger=journald` output.
* `request.py` moved into modules subdirectory
If I missing something please just let me know. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-926893560
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Waiting for review
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: -Changes requested
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
alexey-tikhonov commented: """ There is something weird with mock build in PR CI on all Fedora targets: ``` config.status: creating po/Makefile + --enable-files-domain --disable-polkit-rules-path /var/tmp/rpm-tmp.stnl1t: line 104: --enable-files-domain: command not found ``` """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-931573395
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
pbrezina commented: """ This looks better, thanks. I would certainly like to get it reviewed by python developers. At least, the two Reader classes should use inheritance.
Perhaps, we can avoid using @PYTHON_BIN_PATH@ and configure time option and just sed the shebang in spec file after install? I'm not sure if this is a good idea, please reach to python developers for advice. But it would make testing easier.
I think it would be also good to print uid of the user who started the request (client uid) in the `list` command. It is not yet available in the logs but can be added from `cli_creds_get_uid(cli_ctx->creds)` """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-932135349
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
This looks better, thanks. I would certainly like to get it reviewed by python developers. At least, the two Reader classes should use inheritance.
Perhaps, we can avoid using @PYTHON_BIN_PATH@ and configure time option and just sed the shebang in spec file after install? I'm not sure if this is a good idea, please reach to python developers for advice. But it would make testing easier.
Okay I will ask a python developer to review this.
I think it would be also good to print uid of the user who started the request (client uid) in the `list` command. It is not yet available in the logs but can be added from `cli_creds_get_uid(cli_ctx->creds)`
Done. list output now looks like ~~~ 2021-10-01 15:00:40: [uid 0] CID #1: id 2021-10-01 15:01:01: [uid 0] CID #2: /usr/sbin/crond 2021-10-01 15:04:45: [uid 1000] CID #3: id 2021-10-01 15:04:47: [uid 0] CID #4: sudo 2021-10-01 15:04:47: [uid 0] CID #5: su ~~~
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-932323013
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: edited
Changed field: body Original value: """ * Print client command request list, NSS (default), or PAM ~~~ $ sssctl analyze request --list $ sssctl analyze request --list --pam ~~~
* Track individual NSS request ~~~ $ sssctl analyze request --cid=20 ~~~
* Track individual NSS request including cache request logs ~~~ $ sssctl analyze request --cid=20 --cachereq ~~~
* Track individual PAM request ~~~ $ sssctl analyze request --cid=20 --pam ~~~
* Supports --logger=journald, pending tevent chain ID support for logger=journald ~~~ $ sssctl analyze --source=journald request --list ~~~
* Analyze logs extracted, or sent from another user (CEE use case). Logs must be from compatible SSSD version with REQ_TRACE log messages and tevent chain ID support ~~~ $ sssctl analyze --logdir=/path/to/sosreport/var/log/sssd request --list ~~~ """
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
jakub-vavra-cz commented: """ Please add docstrings to the new python modules, classes and methods. The comments describing method functionality should be included in docstrings instead. Method parameters and returns should be described possibly with type hints. Some python specific IDE like Pycharm Community can help with this including overall python coding conventions (PEP-8). """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-934160697
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
Please add docstrings to the new python modules, classes and methods. The comments describing method functionality should be included in docstrings instead. Method parameters and returns should be described possibly with type hints. Some python specific IDE like Pycharm Community can help with this including overall python coding conventions (PEP-8).
Thanks Jakub, I added docstrings as requested. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-934688415
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
pbrezina commented: """ Can't run:
``` [pbrezina /dev/shm/sssd]$ sudo sssctl analyze Traceback (most recent call last): File "/usr/lib/python3.9/site-packages/sssd/sss_analyze.py", line 7, in <module> from modules import request File "/usr/lib/python3.9/site-packages/sssd/modules/request.py", line 78 Return source lines which match any number of patterns ^ SyntaxError: invalid syntax Command '/usr/lib/python3.9/site-packages/sssd/sss_analyze.py' failed with [1]
``` """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-936227327
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
Can't run:
[pbrezina /dev/shm/sssd]$ sudo sssctl analyze Traceback (most recent call last): File "/usr/lib/python3.9/site-packages/sssd/sss_analyze.py", line 7, in <module> from modules import request File "/usr/lib/python3.9/site-packages/sssd/modules/request.py", line 78 Return source lines which match any number of patterns ^ SyntaxError: invalid syntax Command '/usr/lib/python3.9/site-packages/sssd/sss_analyze.py' failed with [1]
Fixed now, sorry. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-936299262
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """ Latest push includes change to declare Reader as an abstract base class with required methods, Files() and Journald() readers are subclassing Reader. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-936600494
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """ Latest push includes change to declare Reader as an abstract base class with required methods, source Files and Journald classes are subclassing Reader. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-936600494
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
pbrezina commented: """ I look into Fedora python packaging guidelines and we can use [py3_shebang_fix](https://docs.fedoraproject.org/en-US/packaging-guidelines/Python/#py3_sheban...) macro instead of --with-python-bin and autoconf. This looks like a cleaner approach. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-939872088
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
pbrezina commented: """ I look into Fedora python packaging guidelines and we can use [py3_shebang_fix](https://docs.fedoraproject.org/en-US/packaging-guidelines/Python/#py3_sheban...) macro instead of --with-python-bin and autoconf. This looks like a cleaner approach.
After this change, I think we can push. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-939872088
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented: """
I look into Fedora python packaging guidelines and we can use [py3_shebang_fix](https://docs.fedoraproject.org/en-US/packaging-guidelines/Python/#py3_sheban...) macro instead of --with-python-bin and autoconf. This looks like a cleaner approach.
After this change, I think we can push.
Okay thanks, it is fixed now to use `%py3_shebang_fix` and I removed the python bin lines """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-940428058
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
pbrezina commented: """ Ack. Thank you for your patience. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-942561691
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
pbrezina commented: """ @SSSD/developers if you have further comments please, share them now. I will merge it tomorrow. """
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-942562706
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Accepted
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: -Waiting for review
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Ready to push
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
pbrezina commented: """ Pushed PR: https://github.com/SSSD/sssd/pull/5712
* `master` * 097feb329c77f4d693beec05343baa4385b0d62b - SSSCTL: Add analyze command * 82e051e1f15060554ecacc07107c82675369e0bb - TOOLS: Add sss_analyze utility * 26086212a986f1485e0e52771d7aa9c31c92d8a7 - Responder: Log client uid that started a request * 3343b5a8111368c03ed36b965995fc255cd8530d - DP: Log offline warning for REQ_TRACE tracking
"""
See the full comment at https://github.com/SSSD/sssd/pull/5712#issuecomment-943180984
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: +Pushed
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: -Accepted
URL: https://github.com/SSSD/sssd/pull/5712 Title: #5712: Health and Support Analyzer - Add request log parsing utility
Label: -Ready to push
URL: https://github.com/SSSD/sssd/pull/5712 Author: justin-stephenson Title: #5712: Health and Support Analyzer - Add request log parsing utility Action: closed
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5712/head:pr5712 git checkout pr5712
sssd-devel@lists.fedorahosted.org