URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: Handle ldap_install_tls() configuration and retrial Action: opened
PR body: """ Configure socket options when calling ldap_install_tls() to avoid hitting EINTR during connect. Set the communication to asynchronous. This configuration can't be applied for the connection part, which has to be always blocking. On top of that set the network timeout to ldap_opt_timeout option, to decrease the possibility of triggering a timeout error when polling.
If the call to ldap_install_tls() fails with EINTR, retry it again. """
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: Handle ldap_install_tls() configuration and retrial
Label: +Waiting for review
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: Handle ldap_install_tls() configuration and retrial
Label: +Bugzilla
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: Handle ldap_install_tls() configuration and retrial
alexey-tikhonov commented: """ **Wrt 1st patch** I don't understand it's description, specifically `Configure socket options when calling ldap_install_tls() to avoid hitting EINTR during connect.` part. IIUC as a result of this patch blocking `read()` is replaced with timed `poll()` under the hood of openlap:openssl (`LDAP_OPT_CONNECT_ASYNC`). As for `LDAP_OPT_NETWORK_TIMEOUT` - it is set in [`sdap_sys_connect_done()`](https://github.com/SSSD/sssd/blob/master/src/providers/ldap/sdap_async_conne...). Does this happen too late? Do we need to set it in both places? Also take a note of `LDAP_OPT_RESTART` set a little bit above and a corresponding comment. This comment looks interesting.
In general, this clearly doesn't "avoid" the issue (both read() with our socket options and poll() return EINTR being interrupted with a signal) and, taking into account our default timeout 6 seconds, I think it won't even make probability lower.
So... does it make sense to make openldap switch to timed poll() instead of blocking read()? Probably "yes" as a general improvement for a 2-x branch, but at least I wouldn't propose this change for 1-16 branch without a better reason. And anyway this requires a better explanation.
**Wrt 2nd patch** I think we shouldn't rely on undocumented usage of `errno`, i.e. it's better to check watchdog's context. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-799332143
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: Handle ldap_install_tls() configuration and retrial
ikerexxe commented: """
**Wrt 1st patch** I don't understand it's description, specifically `Configure socket options when calling ldap_install_tls() to avoid hitting EINTR during connect.` part. IIUC as a result of this patch blocking `read()` is replaced with timed `poll()` under the hood of openlap:openssl (`LDAP_OPT_CONNECT_ASYNC`). As for `LDAP_OPT_NETWORK_TIMEOUT` - it is set in [`sdap_sys_connect_done()`](https://github.com/SSSD/sssd/blob/master/src/providers/ldap/sdap_async_conne...). Does this happen too late? Do we need to set it in both places? Also take a note of `LDAP_OPT_RESTART` set a little bit above and a corresponding comment. This comment looks interesting.
In general, this clearly doesn't "avoid" the issue (both read() with our socket options and poll() return EINTR being interrupted with a signal) and, taking into account our default timeout 6 seconds, I think it won't even make probability lower.
So... does it make sense to make openldap switch to timed poll() instead of blocking read()? Probably "yes" as a general improvement for a 2-x branch, but at least I wouldn't propose this change for 1-16 branch without a better reason. And anyway this requires a better explanation.
I've run a battery of tests by setting only one of the communication flags per test. I've uploaded the most prominent strace logs to the bugzilla in the attachment called `strace battery of tests`. Setting `LDAP_OPT_CONNECT_ASYNC` uses the read() system call to initiate the SSL connection and it fails with EINTR. In my opinion, it doesn't make any sense to use this flag.
Setting `LDAP_OPT_NETWORK_TIMEOUT` also uses the read() system call to initiate the SSL connection, but in this case it fails with EAGAIN. I'm not sure if openldap retries the process because I don't see sssd printing `ldap_install_tls failed` but I don't see any retrial either. I'm hesitant about setting this flag.
Setting `LDAP_OPT_RESTART` uses the read() system call to initiate the SSL connection and it fails with EINTR. As pointed out in the bugzilla, checking the man page for SSL_get_error API (https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html), it seems that the connection should be dropped when it returns `SSL_ERROR_SYSCALL`. Taking this into account I think that `LDAP_OPT_RESTART` shouldn't be used.
**Wrt 2nd patch** I think we shouldn't rely on undocumented usage of `errno`, i.e. it's better to check watchdog's context.
Do you mean watchdog_ctx structure? Would it be enough to compare the timestamp in that structure with the actual time and see if it's below a threshold?
"""
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800125286
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: Handle ldap_install_tls() configuration and retrial
alexey-tikhonov commented: """
**Wrt 1st patch**
I've run a battery of tests by setting only one of the communication flags per test. I've uploaded the most prominent strace logs to the bugzilla in the attachment called `strace battery of tests`.
(I didn't check the logs yet but ...)
Setting `LDAP_OPT_CONNECT_ASYNC` uses the read() Setting `LDAP_OPT_NETWORK_TIMEOUT` also uses the read() Setting `LDAP_OPT_RESTART` uses the read()
Why did we see poll() previously? Anyway, at this point I'm totally lost wrt rationale of the 1st patch.
**Wrt 2nd patch** I think we shouldn't rely on undocumented usage of `errno`, i.e. it's better to check watchdog's context.
Do you mean watchdog_ctx structure?
Yes. Take a note: it's private. Please avoid making it public entirely, better introduce a helper to read required info.
Would it be enough to compare the timestamp in that structure with the actual time and see if it's below a threshold?
Not sure I understand your idea. IMO, it's better to rely on `ticks`.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800293802
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: Handle ldap_install_tls() configuration and retrial
alexey-tikhonov commented: """
Setting `LDAP_OPT_CONNECT_ASYNC` uses the read() Setting `LDAP_OPT_NETWORK_TIMEOUT` also uses the read() Setting `LDAP_OPT_RESTART` uses the read()
Why did we see poll() previously?
Probably because it requires both ASYNC && (TIMEOUT != 0). But this raises a question if `sdap_sys_connect_done()` is a proper place to set all the options set there...
"""
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800314529
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: Handle ldap_install_tls() configuration and retrial
ikerexxe commented: """
**Wrt 1st patch**
Why did we see poll() previously?
We saw and still see poll(), but not for the authentication handshake sequence. Without any of the options set poll() is used for some part of the communication. This part of the communication is the same one that uses poll() when any of the options is set.
Probably because it requires both ASYNC && (TIMEOUT != 0).
No, I set both options and read() is still used for the authentication handshake. As said before I think we were looking to the wrong part of the logs and that's why we saw poll().
Anyway, at this point I'm totally lost wrt rationale of the 1st patch.
I'm having doubts with the usage of `LDAP_OPT_NETWORK_TIMEOUT` option, as in this case read() fails with EAGAIN instead of EINTR and ldap_install_tls() doesn't fail. Maybe openldap handles the retry in this case but I need to take a closer look.
I'll remove the usage of `LDAP_OPT_CONNECT_ASYNC` from this patch because I don't see any benefit.
**Wrt 2nd patch** I think we shouldn't rely on undocumented usage of `errno`, i.e. it's better to check watchdog's context.
Do you mean watchdog_ctx structure?
Yes. Take a note: it's private. Please avoid making it public entirely, better introduce a helper to read required info.
Would it be enough to compare the timestamp in that structure with the actual time and see if it's below a threshold?
Not sure I understand your idea. IMO, it's better to rely on `ticks`.
I've understood your idea. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800937194
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: Handle ldap_install_tls() configuration and retrial
alexey-tikhonov commented: """
**Wrt 1st patch**
Why did we see poll() previously?
We saw and still see poll(), but not for the authentication handshake sequence. Without any of the options set poll() is used for some part of the communication. This part of the communication is the same one that uses poll() when any of the options is set.
I don't think so. https://bugzilla.redhat.com/show_bug.cgi?id=1839972#c130 (sorry for the private link) : ``` 04:45:50.443947 fcntl(23, F_GETFL) = 0x2 (flags O_RDWR) <0.000009> 04:45:50.443984 fcntl(23, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000009> 04:45:50.444071 write(23, "\26...", 289) = 289 <0.000032> 04:45:50.444142 read(23, 0x55c5711ef360, 7) = -1 EAGAIN (Resource temporarily unavailable) <0.000011> 04:45:50.444190 poll([{fd=23, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 12000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) <1.771686> ``` If this is not a part of a handshake then please tell me what is it?
Probably because it requires both ASYNC && (TIMEOUT != 0).
No, I set both options and read() is still used for the authentication handshake.
Of course, `read()` is required. After all, you need to get the data from the socket eventually :)
Let me please remind a main purpose of this "settings game". The purpose was to figure out if using `LDAP_OPT_CONNECT_ASYNC` helps to avoid removal of `O_NONBLOCK` flag from our socket. The idea was that if we would preserve `O_NONBLOCK` then we wouldn't need `SO_RCVTIMEO/SO_SNDTIMEO` socket options that were identified as a cause of ERESTARTSYS -> EINTR change (that is an immediate cause of the ticket).
But, as strace you provided in https://bugzilla.redhat.com/show_bug.cgi?id=1839972#c129 shows, `_ASYNC` replaces blocking `read()` with timed `poll()` (and frankly this is quite expected). And unfortunately `poll()` is one of ``` "interfaces are never restarted after being interrupted by a signal handler, regardless of the use of SA_RESTART; they always fail with the error EINTR when interrupted by a signal handler" ``` (signal(7) man page)
At this point I've got an answer for my initial proposal and I personally don't see a reason to touch any settings to resolve this specific ticket.
But you wrote:
I'm having doubts with the usage of LDAP_OPT_NETWORK_TIMEOUT option, as in this case read() fails with EAGAIN instead of EINTR and ldap_install_tls() doesn't fail. Maybe openldap handles the retry in this case but I need to take a closer look.
This is interesting. Could you please quote corresponding strace or at least point to a specific log that exhibits this behavior?
As I pointed out previously, SSSD already sets `LDAP_OPT_NETWORK_TIMEOUT`. Setting the same option in different places needs an explanation.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800964115
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: Handle ldap_install_tls() configuration and retrial
ikerexxe commented: """
**Wrt 1st patch**
04:45:50.443947 fcntl(23, F_GETFL) = 0x2 (flags O_RDWR) <0.000009> 04:45:50.443984 fcntl(23, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000009> 04:45:50.444071 write(23, "\26\...", 289) = 289 <0.000032> 04:45:50.444142 read(23, 0x55c5711ef360, 7) = -1 EAGAIN (Resource temporarily unavailable) <0.000011> 04:45:50.444190 poll([{fd=23, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 12000) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) <1.771686>If this is not a part of a handshake then please tell me what is it?
It's part of the handshake, but I haven't seen this behaviour again even though I've tried it another 10 times.
But you wrote:
I'm having doubts with the usage of LDAP_OPT_NETWORK_TIMEOUT option, as in this case read() fails with EAGAIN instead of EINTR and ldap_install_tls() doesn't fail. Maybe openldap handles the retry in this case but I need to take a closer look.
This is interesting. Could you please quote corresponding strace or at least point to a specific log that exhibits this behavior?
https://bugzilla.redhat.com/attachment.cgi?id=1763588, file strace_network_timeout.out.
As I pointed out previously, SSSD already sets `LDAP_OPT_NETWORK_TIMEOUT`. Setting the same option in different places needs an explanation.
There isn't any log but it still fails and there isn't any retry procedure. So I'll remove the first patch completely.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-800986678
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: Handle ldap_install_tls() configuration and retrial Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: edited
Changed field: title Original value: """ Handle ldap_install_tls() configuration and retrial """
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
ikerexxe commented: """ I've updated the PR by dropping the first patch and changing the second one to detect if ldap_install_tls() failed because the watchdog interrupted it. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801041154
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
ikerexxe commented: """ Updated with Alexey's improvements. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801112078
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
alexey-tikhonov commented: """ Thanks for the updates. Did you try your latest version with your reproducer? """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801295330
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
ikerexxe commented: """
Thanks for the updates. Did you try your latest version with your reproducer?
Yes, when the process fails it is retried. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801756933
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
alexey-tikhonov commented: """
Did you try your latest version with your reproducer?
Yes, when the process fails it is retried.
Thanks for the logs. Functionally it looks good.
But I have a question: ``` (11:39:30): [pam_print_data] (0x0100): command: SSS_PAM_AUTHENTICATE (11:39:30): [sdap_uri_callback] (0x0400): Constructed uri 'ldaps://10.0.155.220:636' (11:39:30): [decide_tls_usage] (0x2000): [ldaps://10.0.155.220:636] is a secure channel. No need to run START_TLS (11:39:30): [sssd_async_socket_init_send] (0x0400): Setting 12 seconds timeout for connecting ...network delay (11:39:40): [sss_ldap_init_sys_connect_done] (0x0020): ldap_install_tls failed: [Connect error] [unknown error] (11:39:40): [sss_ldap_init_sys_connect_done] (0x0020): Assuming TLS handshake was interrupted (11:39:40): [sss_ldap_init_state_destructor] (0x0400): calling ldap_unbind_ext for ldap:[0xdf4950] sd:[26] (11:39:40): [sss_ldap_init_state_destructor] (0x0400): closing socket [26] (11:39:40): [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [1432158320]: TLS handshake was interrupted. (11:39:40): [sdap_handle_release] (0x2000): Trace: sh[0xdf7020], connected[0], ops[(nil)], ldap[(nil)], destructor_lock[0], release_memory[0] (11:39:40): [sdap_cli_connect_done] (0x0040): Performing retry due to TLS handshake interruption (11:39:40): [fo_set_port_status] (0x0100): Marking port 636 of server '10.0.155.220' as 'not working' (11:39:40): [fo_set_port_status] (0x0400): Marking port 636 of duplicate server '10.0.155.220' as 'not working' (11:39:40): [decide_tls_usage] (0x2000): [ldaps://10.0.155.220:636] is a secure channel. No need to run START_TLS (11:39:40): [sssd_async_socket_init_send] (0x0400): Setting 12 seconds timeout for connecting (11:39:42): [sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldaps://10.0.155.220:636/??base] with fd [26]. ``` -- why `Marking port ... as 'not working'`? IIUC, this is exactly ip:port that is being retried (and succeeds). """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801848446
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
ikerexxe commented: """
-- why `Marking port ... as 'not working'`? IIUC, this is exactly ip:port that is being retried (and succeeds).
I've also taken this into account in the new version of the patch. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-801986134
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
alexey-tikhonov commented: """ Thank you.
Besides few minor/cosmetic remarks, looks good to me.
Covscan is clean. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-809410958
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
ikerexxe commented: """ I've changed the PR taking into account your improvements. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810005543
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
alexey-tikhonov commented: """ Hi,
thank you for updates and logs.
There is a weird thing in the log: ``` [sss_ldap_init_sys_connect_done] (0x0020): Assuming TLS handshake was interrupted [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [1432158320]: TLS handshake was interrupted. [sdap_cli_connect_done] (0x0040): TLS handshake was interruped, provider may retry [be_resolve_server_send] (0x0040): Will not retry, maximum number of attempts (2) reached [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' [get_server_status] (0x1000): Status of server '10.0.155.220' is 'working' [get_port_status] (0x1000): Port status of port 636 for server '10.0.155.220' is 'working' [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds [get_server_status] (0x1000): Status of server '10.0.155.220' is 'working' [be_resolve_server_process] (0x0040): The fail over cycled through all available servers [be_resolve_server_done] (0x1000): Server resolution failed: [2]: No such file or directory ``` -- `status ... is 'working'` despite `Will not retry`
I guess the reason is `be_fo_set_port_status(... PORT_NOT_WORKING)` isn't executed in this case (while it should) """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810337498
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: -Waiting for review
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: +Waiting for review
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
ikerexxe commented: """
-- `status ... is 'working'` despite `Will not retry`
Pushed a new revision that changes this behaviour.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810875437
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
alexey-tikhonov commented: """ Thanks. From functional point of view this now looks good.
I'm really not sure about implementation of FO code changes though. Do we actually need to go to FO code? Can't we handle `retry_same_server` right inside `sdap_cli_resolve_next()`? """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810900915
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
ikerexxe commented: """
Thanks. From functional point of view this now looks good.
I'm really not sure about implementation of FO code changes though. Do we actually need to go to FO code? Can't we handle `retry_same_server` right inside `sdap_cli_resolve_next()`?
It's possible but I am not sure if that's better than the actual implementation. @sumit-bose what do you think? Should I move the retry logic from `be_resolve_server_send()` to `sdap_cli_resolve_next()` as Alexey suggests? """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-810947952
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
sumit-bose commented: """
Thanks. From functional point of view this now looks good. I'm really not sure about implementation of FO code changes though. Do we actually need to go to FO code? Can't we handle `retry_same_server` right inside `sdap_cli_resolve_next()`?
It's possible but I am not sure if that's better than the actual implementation. @sumit-bose what do you think? Should I move the retry logic from `be_resolve_server_send()` to `sdap_cli_resolve_next()` as Alexey suggests?
Hi,
in general yes, but I wonder what would be the benefits? Having it in `be_resolve_server_send()` makes the handling of the tevent request quite easy and it would allow other callers to use `retry_same_server` as well, although we currently do not have use-cases for this.
bye, Sumit """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813941302
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
alexey-tikhonov commented: """ Ok. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813946170
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
alexey-tikhonov commented: """ LGTM, but since this is an important patch, I'd like anyone to have a second look here.
@pbrezina or @sumit-bose , could you please take a look? """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813969840
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
alexey-tikhonov commented: """ LGTM, but since this is an important patch, I'd like somebody to have a second look here.
@pbrezina or @sumit-bose , could you please take a look? """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813969840
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
alexey-tikhonov commented: """ LGTM, but since this is an important patch, I'd like somebody to have a second look.
@pbrezina or @sumit-bose , could you please take a look? """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-813969840
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: edited
Changed field: body Original value: """ Configure socket options when calling ldap_install_tls() to avoid hitting EINTR during connect. Set the communication to asynchronous. This configuration can't be applied for the connection part, which has to be always blocking. On top of that set the network timeout to ldap_opt_timeout option, to decrease the possibility of triggering a timeout error when polling.
If the call to ldap_install_tls() fails with EINTR, retry it again. """
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
sumit-bose commented: """
Retry logic means that something failed but we have resolved the server successfully so imho the retry logic should be implemented elsewhere. Also at this time the retry logic spans across two components - `be_resolve_server` and `sdap_cli_connect` instead of being implemented at only one place - this is because `be_resolved_server` is not the component that failed. This will have to happen in other components as well if we want to reuse it so the only part that gets reused is only the attempt counter.
Hi,
that's a fair point. @ikerexxe, would you mind the change the code and follow @pbrezina's suggestion?
bye, Sumit
My suggestion is to implement it only in `sdap_cli_connect_done` and retry only `sdap_connect_send`, but if Summit thinks that this is beneficial I won't oppose. However, triple check that storing `fo_server` pointer will not cause issues and that you don't need `for_ref_server`.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-816696617
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: -Waiting for review
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: +Changes requested
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
ikerexxe commented: """
Retry logic means that something failed but we have resolved the server successfully so imho the retry logic should be implemented elsewhere. Also at this time the retry logic spans across two components - `be_resolve_server` and `sdap_cli_connect` instead of being implemented at only one place - this is because `be_resolved_server` is not the component that failed. This will have to happen in other components as well if we want to reuse it so the only part that gets reused is only the attempt counter.
My suggestion is to implement it only in `sdap_cli_connect_done` and retry only `sdap_connect_send`, but if Summit thinks that this is beneficial I won't oppose.
I've changed the code to be in line with your proposal. Please review it for further improvement.
"""
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-818644180
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: -Changes requested
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: +Waiting for review
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: +branch: sssd-1-16
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: -Waiting for review
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: +Changes requested
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: synchronized
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: +Waiting for review
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: -Changes requested
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
pbrezina commented: """ Ack from me. """
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-821246253
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: +Ready to push
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: -Waiting for review
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: +Accepted
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: -branch: sssd-1-16
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
pbrezina commented: """ Pushed PR: https://github.com/SSSD/sssd/pull/5532
* `master` * da55e3e69707de416b7949d08c165c950090bbb6 - ldap: retry ldap_install_tls() when watchdog interruption
"""
See the full comment at https://github.com/SSSD/sssd/pull/5532#issuecomment-823119763
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: +Pushed
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: -Accepted
URL: https://github.com/SSSD/sssd/pull/5532 Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption
Label: -Ready to push
URL: https://github.com/SSSD/sssd/pull/5532 Author: ikerexxe Title: #5532: ldap: retry ldap_install_tls() when watchdog interruption Action: closed
To pull the PR as Git branch: git remote add ghsssd https://github.com/SSSD/sssd git fetch ghsssd pull/5532/head:pr5532 git checkout pr5532
sssd-devel@lists.fedorahosted.org