URL:
https://github.com/SSSD/sssd/pull/5532
Title: #5532: Handle ldap_install_tls() configuration and retrial
ikerexxe commented:
"""
**Wrt 1st patch**
I don't understand it's description, specifically `Configure socket options when
calling ldap_install_tls() to avoid hitting EINTR during connect.` part.
IIUC as a result of this patch blocking `read()` is replaced with timed `poll()` under
the hood of openlap:openssl (`LDAP_OPT_CONNECT_ASYNC`).
As for `LDAP_OPT_NETWORK_TIMEOUT` - it is set in
[`sdap_sys_connect_done()`](https://github.com/SSSD/sssd/blob/master/src/providers/ldap/sdap_async_connection.c#L199).
Does this happen too late? Do we need to set it in both places?
Also take a note of `LDAP_OPT_RESTART` set a little bit above and a corresponding
comment. This comment looks interesting.
In general, this clearly doesn't "avoid" the issue (both read() with our
socket options and poll() return EINTR being interrupted with a signal) and, taking into
account our default timeout 6 seconds, I think it won't even make probability lower.
So... does it make sense to make openldap switch to timed poll() instead of blocking
read()? Probably "yes" as a general improvement for a 2-x branch, but at least I
wouldn't propose this change for 1-16 branch without a better reason. And anyway this
requires a better explanation.
I've run a battery of tests by setting only one of the communication flags per test.
I've uploaded the most prominent strace logs to the bugzilla in the attachment called
`strace battery of tests`. Setting `LDAP_OPT_CONNECT_ASYNC` uses the read() system call to
initiate the SSL connection and it fails with EINTR. In my opinion, it doesn't make
any sense to use this flag.
Setting `LDAP_OPT_NETWORK_TIMEOUT` also uses the read() system call to initiate the SSL
connection, but in this case it fails with EAGAIN. I'm not sure if openldap retries
the process because I don't see sssd printing `ldap_install_tls failed` but I
don't see any retrial either. I'm hesitant about setting this flag.
Setting `LDAP_OPT_RESTART` uses the read() system call to initiate the SSL connection and
it fails with EINTR. As pointed out in the bugzilla, checking the man page for
SSL_get_error API (
https://www.openssl.org/docs/man1.1.1/man3/SSL_get_error.html), it
seems that the connection should be dropped when it returns `SSL_ERROR_SYSCALL`. Taking
this into account I think that `LDAP_OPT_RESTART` shouldn't be used.
**Wrt 2nd patch**
I think we shouldn't rely on undocumented usage of `errno`, i.e. it's better to
check watchdog's context.
Do you mean watchdog_ctx structure? Would it be enough to compare the timestamp in that
structure with the actual time and see if it's below a threshold?
"""
See the full comment at
https://github.com/SSSD/sssd/pull/5532#issuecomment-800125286