I'm sending two patches related to the last USN detection (ticket #734). I did some testing and they seem to work just fine. The only thing I'm not sure about is if I got right what Simo exactly meant by the ticket, but I hope I got it right.
Thanks Jan
On Fri, 11 Mar 2011 13:56:18 +0100 Jan Zelený jzeleny@redhat.com wrote:
I'm sending two patches related to the last USN detection (ticket #734). I did some testing and they seem to work just fine. The only thing I'm not sure about is if I got right what Simo exactly meant by the ticket, but I hope I got it right.
Thanks Jan
Sorry Jan but it doesn't seem to do what I asked for.
As far as I can see you are overriding the last_usn when you connect, which is the exact opposite of what should be done.
last_usn must never be set to whatever value is available from ldap.
You only need to check if the value from ldap is lower then the currently internally value for last_usn if we are reconnecting to the same server,and you need to reset last_usn to 0 if the highest value reported from ldap is lower than the highest we internally stored.
If we are connecting to a different server we always unconditionally reset last_usn to 0, as USNs are not global so checking values between different servers have no meaning.
Simo.
Simo Sorce ssorce@redhat.com wrote:
Sorry Jan but it doesn't seem to do what I asked for.
I'm not sure you are entirely right here. Let me explain:
As far as I can see you are overriding the last_usn when you connect, which is the exact opposite of what should be done.
No, I'm not overriding it. I suppose you are referring to the first patch (Add value of the last USN to server configuration). This is indeed adding the value of last USN to sdap_server_opts during connection, but this structure instance is only temporary and it is stored internally after the connection creating process is finished (see the second patch, the comparison of USN and server ID is in there).
last_usn must never be set to whatever value is available from ldap.
You only need to check if the value from ldap is lower then the currently internally value for last_usn if we are reconnecting to the same server,and you need to reset last_usn to 0 if the highest value reported from ldap is lower than the highest we internally stored.
If we are connecting to a different server we always unconditionally reset last_usn to 0, as USNs are not global so checking values between different servers have no meaning.
I'm not sure we should be resetting the value to 0. Even after fresh replica installation the USN is not zero. In fact I didn't stumble upon a single case of USN being zero (considering that it was supported and used on the server). Can you please explain a little bit more why we should do this and not just leave the USN which is currently on the server?
On Tue, 15 Mar 2011 09:40:35 +0100 Jan Zelený jzeleny@redhat.com wrote:
Simo Sorce ssorce@redhat.com wrote:
Sorry Jan but it doesn't seem to do what I asked for.
I'm not sure you are entirely right here. Let me explain:
As far as I can see you are overriding the last_usn when you connect, which is the exact opposite of what should be done.
No, I'm not overriding it. I suppose you are referring to the first patch (Add value of the last USN to server configuration). This is indeed adding the value of last USN to sdap_server_opts during connection, but this structure instance is only temporary and it is stored internally after the connection creating process is finished (see the second patch, the comparison of USN and server ID is in there).
last_usn must never be set to whatever value is available from ldap.
You only need to check if the value from ldap is lower then the currently internally value for last_usn if we are reconnecting to the same server,and you need to reset last_usn to 0 if the highest value reported from ldap is lower than the highest we internally stored.
If we are connecting to a different server we always unconditionally reset last_usn to 0, as USNs are not global so checking values between different servers have no meaning.
I'm not sure we should be resetting the value to 0. Even after fresh replica installation the USN is not zero. In fact I didn't stumble upon a single case of USN being zero (considering that it was supported and used on the server). Can you please explain a little bit more why we should do this and not just leave the USN which is currently on the server?
The last usn on the server is simply used to keep track of what is the highest usn used by the server.
The one we use internally instead is the higher usn of any entry we actually *fetched* from the server.
So while we remain attached to the same server, and this server is operating normally, the server's highest USN is always >= the one we have on file.
If the server resets the USNs (this happen after a replica re-initialization in 389 DS for example), then we want to set the internal highest USN to 0 so that on the next enumeration search we will search for any entry with USN > 0. Using any other value may result in skipping entries that have changed between the time we fetched them and the replica was re-initialized.
Same reasoning when we switch to a new server. Because USNs are not global, each server has it's own USN number space, so when we switch we have to reset to 0 as the highest USN we currently have on file has no meaning and using it in a search may make us skip to search entries that changed since our last fetch. In extreme cases, if the new server have USNs that are much lower then the ones in the previous server (because it was just re-initialized for example), then we would fail to get any new change for potentially a very long period of time, at least until something forces us to do a full rescan, and yet because we do not reset the highest usn on a full rescan we may still keep not getting changes after the full rescan.
So, setting the usn to 0 on a new connection (or a re-initialized server) is fundamental, and setting it to the current reported highest usn from the server is completely wrong and defeats the purpose.
Simo.
Simo Sorce ssorce@redhat.com wrote:
On Tue, 15 Mar 2011 09:40:35 +0100
Jan Zelený jzeleny@redhat.com wrote:
Simo Sorce ssorce@redhat.com wrote:
Sorry Jan but it doesn't seem to do what I asked for.
I'm not sure you are entirely right here. Let me explain:
As far as I can see you are overriding the last_usn when you connect, which is the exact opposite of what should be done.
No, I'm not overriding it. I suppose you are referring to the first patch (Add value of the last USN to server configuration). This is indeed adding the value of last USN to sdap_server_opts during connection, but this structure instance is only temporary and it is stored internally after the connection creating process is finished (see the second patch, the comparison of USN and server ID is in there).
last_usn must never be set to whatever value is available from ldap.
You only need to check if the value from ldap is lower then the currently internally value for last_usn if we are reconnecting to the same server,and you need to reset last_usn to 0 if the highest value reported from ldap is lower than the highest we internally stored.
If we are connecting to a different server we always unconditionally reset last_usn to 0, as USNs are not global so checking values between different servers have no meaning.
I'm not sure we should be resetting the value to 0. Even after fresh replica installation the USN is not zero. In fact I didn't stumble upon a single case of USN being zero (considering that it was supported and used on the server). Can you please explain a little bit more why we should do this and not just leave the USN which is currently on the server?
The last usn on the server is simply used to keep track of what is the highest usn used by the server.
The one we use internally instead is the higher usn of any entry we actually *fetched* from the server.
So while we remain attached to the same server, and this server is operating normally, the server's highest USN is always >= the one we have on file.
If the server resets the USNs (this happen after a replica re-initialization in 389 DS for example), then we want to set the internal highest USN to 0 so that on the next enumeration search we will search for any entry with USN > 0. Using any other value may result in skipping entries that have changed between the time we fetched them and the replica was re-initialized.
Same reasoning when we switch to a new server. Because USNs are not global, each server has it's own USN number space, so when we switch we have to reset to 0 as the highest USN we currently have on file has no meaning and using it in a search may make us skip to search entries that changed since our last fetch. In extreme cases, if the new server have USNs that are much lower then the ones in the previous server (because it was just re-initialized for example), then we would fail to get any new change for potentially a very long period of time, at least until something forces us to do a full rescan, and yet because we do not reset the highest usn on a full rescan we may still keep not getting changes after the full rescan.
So, setting the usn to 0 on a new connection (or a re-initialized server) is fundamental, and setting it to the current reported highest usn from the server is completely wrong and defeats the purpose.
Simo.
Thanks for very detailed explanation Simo, I'm sending corrected patch in attachment.
-- Jan
Jan Zelený jzeleny@redhat.com wrote:
Simo Sorce ssorce@redhat.com wrote:
On Tue, 15 Mar 2011 09:40:35 +0100
Jan Zelený jzeleny@redhat.com wrote:
Simo Sorce ssorce@redhat.com wrote:
Sorry Jan but it doesn't seem to do what I asked for.
I'm not sure you are entirely right here. Let me explain:
As far as I can see you are overriding the last_usn when you connect, which is the exact opposite of what should be done.
No, I'm not overriding it. I suppose you are referring to the first patch (Add value of the last USN to server configuration). This is indeed adding the value of last USN to sdap_server_opts during connection, but this structure instance is only temporary and it is stored internally after the connection creating process is finished (see the second patch, the comparison of USN and server ID is in there).
last_usn must never be set to whatever value is available from ldap.
You only need to check if the value from ldap is lower then the currently internally value for last_usn if we are reconnecting to the same server,and you need to reset last_usn to 0 if the highest value reported from ldap is lower than the highest we internally stored.
If we are connecting to a different server we always unconditionally reset last_usn to 0, as USNs are not global so checking values between different servers have no meaning.
I'm not sure we should be resetting the value to 0. Even after fresh replica installation the USN is not zero. In fact I didn't stumble upon a single case of USN being zero (considering that it was supported and used on the server). Can you please explain a little bit more why we should do this and not just leave the USN which is currently on the server?
The last usn on the server is simply used to keep track of what is the highest usn used by the server.
The one we use internally instead is the higher usn of any entry we actually *fetched* from the server.
So while we remain attached to the same server, and this server is operating normally, the server's highest USN is always >= the one we have on file.
If the server resets the USNs (this happen after a replica re-initialization in 389 DS for example), then we want to set the internal highest USN to 0 so that on the next enumeration search we will search for any entry with USN > 0. Using any other value may result in skipping entries that have changed between the time we fetched them and the replica was re-initialized.
Same reasoning when we switch to a new server. Because USNs are not global, each server has it's own USN number space, so when we switch we have to reset to 0 as the highest USN we currently have on file has no meaning and using it in a search may make us skip to search entries that changed since our last fetch. In extreme cases, if the new server have USNs that are much lower then the ones in the previous server (because it was just re-initialized for example), then we would fail to get any new change for potentially a very long period of time, at least until something forces us to do a full rescan, and yet because we do not reset the highest usn on a full rescan we may still keep not getting changes after the full rescan.
So, setting the usn to 0 on a new connection (or a re-initialized server) is fundamental, and setting it to the current reported highest usn from the server is completely wrong and defeats the purpose.
Simo.
Thanks for very detailed explanation Simo, I'm sending corrected patch in attachment.
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Thanks Jan
On Tue, 2011-04-12 at 12:33 +0200, Jan Zelený wrote:
Jan Zelený jzeleny@redhat.com wrote:
Simo Sorce ssorce@redhat.com wrote:
On Tue, 15 Mar 2011 09:40:35 +0100
Jan Zelený jzeleny@redhat.com wrote:
Simo Sorce ssorce@redhat.com wrote:
Sorry Jan but it doesn't seem to do what I asked for.
I'm not sure you are entirely right here. Let me explain:
As far as I can see you are overriding the last_usn when you connect, which is the exact opposite of what should be done.
No, I'm not overriding it. I suppose you are referring to the first patch (Add value of the last USN to server configuration). This is indeed adding the value of last USN to sdap_server_opts during connection, but this structure instance is only temporary and it is stored internally after the connection creating process is finished (see the second patch, the comparison of USN and server ID is in there).
last_usn must never be set to whatever value is available from ldap.
You only need to check if the value from ldap is lower then the currently internally value for last_usn if we are reconnecting to the same server,and you need to reset last_usn to 0 if the highest value reported from ldap is lower than the highest we internally stored.
If we are connecting to a different server we always unconditionally reset last_usn to 0, as USNs are not global so checking values between different servers have no meaning.
I'm not sure we should be resetting the value to 0. Even after fresh replica installation the USN is not zero. In fact I didn't stumble upon a single case of USN being zero (considering that it was supported and used on the server). Can you please explain a little bit more why we should do this and not just leave the USN which is currently on the server?
The last usn on the server is simply used to keep track of what is the highest usn used by the server.
The one we use internally instead is the higher usn of any entry we actually *fetched* from the server.
So while we remain attached to the same server, and this server is operating normally, the server's highest USN is always >= the one we have on file.
If the server resets the USNs (this happen after a replica re-initialization in 389 DS for example), then we want to set the internal highest USN to 0 so that on the next enumeration search we will search for any entry with USN > 0. Using any other value may result in skipping entries that have changed between the time we fetched them and the replica was re-initialized.
Same reasoning when we switch to a new server. Because USNs are not global, each server has it's own USN number space, so when we switch we have to reset to 0 as the highest USN we currently have on file has no meaning and using it in a search may make us skip to search entries that changed since our last fetch. In extreme cases, if the new server have USNs that are much lower then the ones in the previous server (because it was just re-initialized for example), then we would fail to get any new change for potentially a very long period of time, at least until something forces us to do a full rescan, and yet because we do not reset the highest usn on a full rescan we may still keep not getting changes after the full rescan.
So, setting the usn to 0 on a new connection (or a re-initialized server) is fundamental, and setting it to the current reported highest usn from the server is completely wrong and defeats the purpose.
Simo.
Thanks for very detailed explanation Simo, I'm sending corrected patch in attachment.
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Ack to both patches and pushed to master.
On Tue, 2011-04-19 at 14:57 -0400, Stephen Gallagher wrote:
On Tue, 2011-04-12 at 12:33 +0200, Jan Zelený wrote:
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Ack to both patches and pushed to master.
Unfortunately, I've now discovered several issues with this patch (too late).
First, it causes a crash if SSSD started up in offline mode (e.g. disconnected laptop) and then goes online (e.g. VPN).
#0 0x00007fec7adddd28 in sdap_check_online_done (req=0x0) at ../src/providers/ldap/ldap_id.c:729 #1 0x00007fec7ae0f3e1 in sdap_cli_auth_done (subreq=0x0) at ../src/providers/ldap/sdap_async_connection.c:1430 #2 0x00007fec7ae0e207 in sdap_auth_done (subreq=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:1007 #3 0x00007fec7ae0c8a9 in simple_bind_done (op=0x1ea9880, reply=0x1ea97f0, error=0, pvt=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:540 #4 0x00007fec7adf4aa9 in sdap_process_message (ev=0x1e74470, sh=0x1eb2c70, msg=0x1eb33e0) at ../src/providers/ldap/sdap_async.c:364 #5 0x00007fec7adf4363 in sdap_process_result (ev=0x1e74470, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:207 #6 0x00007fec7adf3db6 in sdap_ldap_result (ev=0x1e74470, fde=0x1eb3140, flags=1, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:152 #7 0x000000389ee05d57 in ?? () from /usr/lib64/libtevent.so.0 #8 0x000000389ee035b0 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 #9 0x000000389ee0373b in tevent_common_loop_wait () from /usr/lib64/libtevent.so.0 #10 0x000000000043d41f in server_loop (main_ctx=0x1e755e0) at ../src/util/server.c:526 #11 0x000000000040f8d2 in main (argc=6, argv=0x7fffbb9c67a8) at ../src/providers/data_provider_be.c:1284
The problem is that ctx->srv_opts is NULL here (since we've never made a connection to the server before this).
While I was investigating this, I noticed something else as well. This entire IF block is completely useless as written. In the IF block, you set ctx->srv_opts attributes, but then immediately after this block, you replace the contents of ctx->srv_opts with the returned srv_opts from sdap_cli_connect_recv()
My guess here is that the assignments in the IF block were supposed to be setting srv_opts instead of ctx->srv_opts. The only part that confuses me is where you're setting ctx->srv_opts->last_usn to srv_opts->last_usn. I thought we had covered before that if the value of srv_opts->last_usn was ever lower than the stored value, we MUST set it to zero. This is because we know that the USN values are not in line with the ones we have stored, so we have to do a full update to be safe.
I'm attaching a patch that I think will resolve the issue (though I can't test it as I don't have access to a server with USN values). At the very least, it resolves the crash above for me.
On 04/27/2011 03:30 PM, Stephen Gallagher wrote:
On Tue, 2011-04-19 at 14:57 -0400, Stephen Gallagher wrote:
On Tue, 2011-04-12 at 12:33 +0200, Jan Zelený wrote:
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Ack to both patches and pushed to master.
Unfortunately, I've now discovered several issues with this patch (too late).
First, it causes a crash if SSSD started up in offline mode (e.g. disconnected laptop) and then goes online (e.g. VPN).
#0 0x00007fec7adddd28 in sdap_check_online_done (req=0x0) at ../src/providers/ldap/ldap_id.c:729 #1 0x00007fec7ae0f3e1 in sdap_cli_auth_done (subreq=0x0) at ../src/providers/ldap/sdap_async_connection.c:1430 #2 0x00007fec7ae0e207 in sdap_auth_done (subreq=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:1007 #3 0x00007fec7ae0c8a9 in simple_bind_done (op=0x1ea9880, reply=0x1ea97f0, error=0, pvt=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:540 #4 0x00007fec7adf4aa9 in sdap_process_message (ev=0x1e74470, sh=0x1eb2c70, msg=0x1eb33e0) at ../src/providers/ldap/sdap_async.c:364 #5 0x00007fec7adf4363 in sdap_process_result (ev=0x1e74470, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:207 #6 0x00007fec7adf3db6 in sdap_ldap_result (ev=0x1e74470, fde=0x1eb3140, flags=1, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:152 #7 0x000000389ee05d57 in ?? () from /usr/lib64/libtevent.so.0 #8 0x000000389ee035b0 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 #9 0x000000389ee0373b in tevent_common_loop_wait () from /usr/lib64/libtevent.so.0 #10 0x000000000043d41f in server_loop (main_ctx=0x1e755e0) at ../src/util/server.c:526 #11 0x000000000040f8d2 in main (argc=6, argv=0x7fffbb9c67a8) at ../src/providers/data_provider_be.c:1284
The problem is that ctx->srv_opts is NULL here (since we've never made a connection to the server before this).
While I was investigating this, I noticed something else as well. This entire IF block is completely useless as written. In the IF block, you set ctx->srv_opts attributes, but then immediately after this block, you replace the contents of ctx->srv_opts with the returned srv_opts from sdap_cli_connect_recv()
My guess here is that the assignments in the IF block were supposed to be setting srv_opts instead of ctx->srv_opts. The only part that confuses me is where you're setting ctx->srv_opts->last_usn to srv_opts->last_usn. I thought we had covered before that if the value of srv_opts->last_usn was ever lower than the stored value, we MUST set it to zero. This is because we know that the USN values are not in line with the ones we have stored, so we have to do a full update to be safe.
I'm attaching a patch that I think will resolve the issue (though I can't test it as I don't have access to a server with USN values). At the very least, it resolves the crash above for me.
Ack
On Wed, 2011-04-27 at 17:45 +0200, Jakub Hrozek wrote:
On 04/27/2011 03:30 PM, Stephen Gallagher wrote:
On Tue, 2011-04-19 at 14:57 -0400, Stephen Gallagher wrote:
On Tue, 2011-04-12 at 12:33 +0200, Jan Zelený wrote:
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Ack to both patches and pushed to master.
Unfortunately, I've now discovered several issues with this patch (too late).
First, it causes a crash if SSSD started up in offline mode (e.g. disconnected laptop) and then goes online (e.g. VPN).
#0 0x00007fec7adddd28 in sdap_check_online_done (req=0x0) at ../src/providers/ldap/ldap_id.c:729 #1 0x00007fec7ae0f3e1 in sdap_cli_auth_done (subreq=0x0) at ../src/providers/ldap/sdap_async_connection.c:1430 #2 0x00007fec7ae0e207 in sdap_auth_done (subreq=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:1007 #3 0x00007fec7ae0c8a9 in simple_bind_done (op=0x1ea9880, reply=0x1ea97f0, error=0, pvt=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:540 #4 0x00007fec7adf4aa9 in sdap_process_message (ev=0x1e74470, sh=0x1eb2c70, msg=0x1eb33e0) at ../src/providers/ldap/sdap_async.c:364 #5 0x00007fec7adf4363 in sdap_process_result (ev=0x1e74470, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:207 #6 0x00007fec7adf3db6 in sdap_ldap_result (ev=0x1e74470, fde=0x1eb3140, flags=1, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:152 #7 0x000000389ee05d57 in ?? () from /usr/lib64/libtevent.so.0 #8 0x000000389ee035b0 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 #9 0x000000389ee0373b in tevent_common_loop_wait () from /usr/lib64/libtevent.so.0 #10 0x000000000043d41f in server_loop (main_ctx=0x1e755e0) at ../src/util/server.c:526 #11 0x000000000040f8d2 in main (argc=6, argv=0x7fffbb9c67a8) at ../src/providers/data_provider_be.c:1284
The problem is that ctx->srv_opts is NULL here (since we've never made a connection to the server before this).
While I was investigating this, I noticed something else as well. This entire IF block is completely useless as written. In the IF block, you set ctx->srv_opts attributes, but then immediately after this block, you replace the contents of ctx->srv_opts with the returned srv_opts from sdap_cli_connect_recv()
My guess here is that the assignments in the IF block were supposed to be setting srv_opts instead of ctx->srv_opts. The only part that confuses me is where you're setting ctx->srv_opts->last_usn to srv_opts->last_usn. I thought we had covered before that if the value of srv_opts->last_usn was ever lower than the stored value, we MUST set it to zero. This is because we know that the USN values are not in line with the ones we have stored, so we have to do a full update to be safe.
I'm attaching a patch that I think will resolve the issue (though I can't test it as I don't have access to a server with USN values). At the very least, it resolves the crash above for me.
Ack
I'm going to hold off pushing this until I get buy-in from Jan.
On Wed, 2011-04-27 at 17:45 +0200, Jakub Hrozek wrote:
On 04/27/2011 03:30 PM, Stephen Gallagher wrote:
On Tue, 2011-04-19 at 14:57 -0400, Stephen Gallagher wrote:
On Tue, 2011-04-12 at 12:33 +0200, Jan Zelený wrote:
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Ack to both patches and pushed to master.
Unfortunately, I've now discovered several issues with this patch (too late).
First, it causes a crash if SSSD started up in offline mode (e.g. disconnected laptop) and then goes online (e.g. VPN).
#0 0x00007fec7adddd28 in sdap_check_online_done (req=0x0) at ../src/providers/ldap/ldap_id.c:729 #1 0x00007fec7ae0f3e1 in sdap_cli_auth_done (subreq=0x0) at ../src/providers/ldap/sdap_async_connection.c:1430 #2 0x00007fec7ae0e207 in sdap_auth_done (subreq=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:1007 #3 0x00007fec7ae0c8a9 in simple_bind_done (op=0x1ea9880, reply=0x1ea97f0, error=0, pvt=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:540 #4 0x00007fec7adf4aa9 in sdap_process_message (ev=0x1e74470, sh=0x1eb2c70, msg=0x1eb33e0) at ../src/providers/ldap/sdap_async.c:364 #5 0x00007fec7adf4363 in sdap_process_result (ev=0x1e74470, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:207 #6 0x00007fec7adf3db6 in sdap_ldap_result (ev=0x1e74470, fde=0x1eb3140, flags=1, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:152 #7 0x000000389ee05d57 in ?? () from /usr/lib64/libtevent.so.0 #8 0x000000389ee035b0 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 #9 0x000000389ee0373b in tevent_common_loop_wait () from /usr/lib64/libtevent.so.0 #10 0x000000000043d41f in server_loop (main_ctx=0x1e755e0) at ../src/util/server.c:526 #11 0x000000000040f8d2 in main (argc=6, argv=0x7fffbb9c67a8) at ../src/providers/data_provider_be.c:1284
The problem is that ctx->srv_opts is NULL here (since we've never made a connection to the server before this).
While I was investigating this, I noticed something else as well. This entire IF block is completely useless as written. In the IF block, you set ctx->srv_opts attributes, but then immediately after this block, you replace the contents of ctx->srv_opts with the returned srv_opts from sdap_cli_connect_recv()
My guess here is that the assignments in the IF block were supposed to be setting srv_opts instead of ctx->srv_opts. The only part that confuses me is where you're setting ctx->srv_opts->last_usn to srv_opts->last_usn. I thought we had covered before that if the value of srv_opts->last_usn was ever lower than the stored value, we MUST set it to zero. This is because we know that the USN values are not in line with the ones we have stored, so we have to do a full update to be safe.
I'm attaching a patch that I think will resolve the issue (though I can't test it as I don't have access to a server with USN values). At the very least, it resolves the crash above for me.
Ack
I'm going to hold off pushing this until I get buy-in from Jan.
Thanks for catching this.
One question though: did you test reconnection to the same server with reset USN? I'm sure it will work in terms of refreshing the local database, I'm just not entirely certain that the srv_opts->last_usn will be set to current value later in the code (which would mean the next reconnection to the same server wouldn't start refreshing the data). If the test passes, consider the patch Acked.
I can test this scenario (reconnecting to the same server at least twice, both times with USN lower than registered in SSSD) myself, but not until this evening, I have busy schedule during the day.
Thanks Jan
On Thu, 2011-04-28 at 08:07 +0200, Jan Zelený wrote:
On Wed, 2011-04-27 at 17:45 +0200, Jakub Hrozek wrote:
On 04/27/2011 03:30 PM, Stephen Gallagher wrote:
On Tue, 2011-04-19 at 14:57 -0400, Stephen Gallagher wrote:
On Tue, 2011-04-12 at 12:33 +0200, Jan Zelený wrote:
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Ack to both patches and pushed to master.
Unfortunately, I've now discovered several issues with this patch (too late).
First, it causes a crash if SSSD started up in offline mode (e.g. disconnected laptop) and then goes online (e.g. VPN).
#0 0x00007fec7adddd28 in sdap_check_online_done (req=0x0) at ../src/providers/ldap/ldap_id.c:729 #1 0x00007fec7ae0f3e1 in sdap_cli_auth_done (subreq=0x0) at ../src/providers/ldap/sdap_async_connection.c:1430 #2 0x00007fec7ae0e207 in sdap_auth_done (subreq=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:1007 #3 0x00007fec7ae0c8a9 in simple_bind_done (op=0x1ea9880, reply=0x1ea97f0, error=0, pvt=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:540 #4 0x00007fec7adf4aa9 in sdap_process_message (ev=0x1e74470, sh=0x1eb2c70, msg=0x1eb33e0) at ../src/providers/ldap/sdap_async.c:364 #5 0x00007fec7adf4363 in sdap_process_result (ev=0x1e74470, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:207 #6 0x00007fec7adf3db6 in sdap_ldap_result (ev=0x1e74470, fde=0x1eb3140, flags=1, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:152 #7 0x000000389ee05d57 in ?? () from /usr/lib64/libtevent.so.0 #8 0x000000389ee035b0 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 #9 0x000000389ee0373b in tevent_common_loop_wait () from /usr/lib64/libtevent.so.0 #10 0x000000000043d41f in server_loop (main_ctx=0x1e755e0) at ../src/util/server.c:526 #11 0x000000000040f8d2 in main (argc=6, argv=0x7fffbb9c67a8) at ../src/providers/data_provider_be.c:1284
The problem is that ctx->srv_opts is NULL here (since we've never made a connection to the server before this).
While I was investigating this, I noticed something else as well. This entire IF block is completely useless as written. In the IF block, you set ctx->srv_opts attributes, but then immediately after this block, you replace the contents of ctx->srv_opts with the returned srv_opts from sdap_cli_connect_recv()
My guess here is that the assignments in the IF block were supposed to be setting srv_opts instead of ctx->srv_opts. The only part that confuses me is where you're setting ctx->srv_opts->last_usn to srv_opts->last_usn. I thought we had covered before that if the value of srv_opts->last_usn was ever lower than the stored value, we MUST set it to zero. This is because we know that the USN values are not in line with the ones we have stored, so we have to do a full update to be safe.
I'm attaching a patch that I think will resolve the issue (though I can't test it as I don't have access to a server with USN values). At the very least, it resolves the crash above for me.
Ack
I'm going to hold off pushing this until I get buy-in from Jan.
Thanks for catching this.
One question though: did you test reconnection to the same server with reset USN? I'm sure it will work in terms of refreshing the local database, I'm just not entirely certain that the srv_opts->last_usn will be set to current value later in the code (which would mean the next reconnection to the same server wouldn't start refreshing the data). If the test passes, consider the patch Acked.
If the local last_usn value is *lower* than what's on the server it is fine, nothing bad will happen and it is expected to happen from time to time as we read usn values only from user/group objects, but an admin may have modified objects we never pull.
Not sure what you mean about the next reconnection. Unless the last_usn on the (same) server is lower than we have on record we do *not* want a full refresh. We need a full refresh only if the server is different (this in all cases independetly of the valu of last_usn) or if the same server shows a lower value.
I can test this scenario (reconnecting to the same server at least twice, both times with USN lower than registered in SSSD) myself, but not until this evening, I have busy schedule during the day.
thanks, Simo.
Thanks for catching this.
One question though: did you test reconnection to the same server with reset USN? I'm sure it will work in terms of refreshing the local database, I'm just not entirely certain that the srv_opts->last_usn will be set to current value later in the code (which would mean the next reconnection to the same server wouldn't start refreshing the data). If the test passes, consider the patch Acked.
If the local last_usn value is *lower* than what's on the server it is fine, nothing bad will happen and it is expected to happen from time to time as we read usn values only from user/group objects, but an admin may have modified objects we never pull.
Not sure what you mean about the next reconnection. Unless the last_usn on the (same) server is lower than we have on record we do *not* want a full refresh. We need a full refresh only if the server is different (this in all cases independetly of the valu of last_usn) or if the same server shows a lower value.
I think you misunderstood me. My concern about Stephen's patch was a little different - I wasn't sure if the last_usn he sets to 0 shouldn't be left as is (it is not set anywhere else, which would leave it 0 and if reconnecting to the same server again, the need for full refresh won't be even detected).
Looking at my original code, I see the same thing in sdap_id_op.c, including similar error Stephen fixed by his patch (setting current_srv_opts and rewriting them later). Let me do some testing and get back to you.
Thanks Jan
On Thu, 2011-04-28 at 08:07 +0200, Jan Zelený wrote:
On Wed, 2011-04-27 at 17:45 +0200, Jakub Hrozek wrote:
On 04/27/2011 03:30 PM, Stephen Gallagher wrote:
On Tue, 2011-04-19 at 14:57 -0400, Stephen Gallagher wrote:
On Tue, 2011-04-12 at 12:33 +0200, Jan Zelený wrote:
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Ack to both patches and pushed to master.
Unfortunately, I've now discovered several issues with this patch (too late).
First, it causes a crash if SSSD started up in offline mode (e.g. disconnected laptop) and then goes online (e.g. VPN).
#0 0x00007fec7adddd28 in sdap_check_online_done (req=0x0) at ../src/providers/ldap/ldap_id.c:729 #1 0x00007fec7ae0f3e1 in sdap_cli_auth_done (subreq=0x0) at ../src/providers/ldap/sdap_async_connection.c:1430 #2 0x00007fec7ae0e207 in sdap_auth_done (subreq=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:1007 #3 0x00007fec7ae0c8a9 in simple_bind_done (op=0x1ea9880, reply=0x1ea97f0, error=0, pvt=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:540 #4 0x00007fec7adf4aa9 in sdap_process_message (ev=0x1e74470, sh=0x1eb2c70, msg=0x1eb33e0) at ../src/providers/ldap/sdap_async.c:364 #5 0x00007fec7adf4363 in sdap_process_result (ev=0x1e74470, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:207 #6 0x00007fec7adf3db6 in sdap_ldap_result (ev=0x1e74470, fde=0x1eb3140, flags=1, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:152 #7 0x000000389ee05d57 in ?? () from /usr/lib64/libtevent.so.0 #8 0x000000389ee035b0 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 #9 0x000000389ee0373b in tevent_common_loop_wait () from /usr/lib64/libtevent.so.0 #10 0x000000000043d41f in server_loop (main_ctx=0x1e755e0) at ../src/util/server.c:526 #11 0x000000000040f8d2 in main (argc=6, argv=0x7fffbb9c67a8) at ../src/providers/data_provider_be.c:1284
The problem is that ctx->srv_opts is NULL here (since we've never made a connection to the server before this).
While I was investigating this, I noticed something else as well. This entire IF block is completely useless as written. In the IF block, you set ctx->srv_opts attributes, but then immediately after this block, you replace the contents of ctx->srv_opts with the returned srv_opts from sdap_cli_connect_recv()
My guess here is that the assignments in the IF block were supposed to be setting srv_opts instead of ctx->srv_opts. The only part that confuses me is where you're setting ctx->srv_opts->last_usn to srv_opts->last_usn. I thought we had covered before that if the value of srv_opts->last_usn was ever lower than the stored value, we MUST set it to zero. This is because we know that the USN values are not in line with the ones we have stored, so we have to do a full update to be safe.
I'm attaching a patch that I think will resolve the issue (though I can't test it as I don't have access to a server with USN values). At the very least, it resolves the crash above for me.
Ack
I'm going to hold off pushing this until I get buy-in from Jan.
Thanks for catching this.
One question though: did you test reconnection to the same server with reset USN? I'm sure it will work in terms of refreshing the local database, I'm just not entirely certain that the srv_opts->last_usn will be set to current value later in the code (which would mean the next reconnection to the same server wouldn't start refreshing the data). If the test passes, consider the patch Acked.
I haven't tested it at all. As you can see in my original email, I don't have a setup to test with. I'm also not entirely sure I covered all my bases here (hence why I need your input).
I can test this scenario (reconnecting to the same server at least twice, both times with USN lower than registered in SSSD) myself, but not until this evening, I have busy schedule during the day.
Please do.
On Wed, 2011-04-27 at 17:45 +0200, Jakub Hrozek wrote:
On 04/27/2011 03:30 PM, Stephen Gallagher wrote:
On Tue, 2011-04-19 at 14:57 -0400, Stephen Gallagher wrote:
On Tue, 2011-04-12 at 12:33 +0200, Jan Zelený wrote:
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Ack to both patches and pushed to master.
Unfortunately, I've now discovered several issues with this patch (too late).
First, it causes a crash if SSSD started up in offline mode (e.g. disconnected laptop) and then goes online (e.g. VPN).
#0 0x00007fec7adddd28 in sdap_check_online_done (req=0x0) at ../src/providers/ldap/ldap_id.c:729 #1 0x00007fec7ae0f3e1 in sdap_cli_auth_done (subreq=0x0) at ../src/providers/ldap/sdap_async_connection.c:1430 #2 0x00007fec7ae0e207 in sdap_auth_done (subreq=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:1007 #3 0x00007fec7ae0c8a9 in simple_bind_done (op=0x1ea9880, reply=0x1ea97f0, error=0, pvt=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:540 #4 0x00007fec7adf4aa9 in sdap_process_message (ev=0x1e74470, sh=0x1eb2c70, msg=0x1eb33e0) at ../src/providers/ldap/sdap_async.c:364 #5 0x00007fec7adf4363 in sdap_process_result (ev=0x1e74470, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:207 #6 0x00007fec7adf3db6 in sdap_ldap_result (ev=0x1e74470, fde=0x1eb3140, flags=1, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:152 #7 0x000000389ee05d57 in ?? () from /usr/lib64/libtevent.so.0 #8 0x000000389ee035b0 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 #9 0x000000389ee0373b in tevent_common_loop_wait () from /usr/lib64/libtevent.so.0 #10 0x000000000043d41f in server_loop (main_ctx=0x1e755e0) at ../src/util/server.c:526 #11 0x000000000040f8d2 in main (argc=6, argv=0x7fffbb9c67a8) at ../src/providers/data_provider_be.c:1284
The problem is that ctx->srv_opts is NULL here (since we've never made a connection to the server before this).
While I was investigating this, I noticed something else as well. This entire IF block is completely useless as written. In the IF block, you set ctx->srv_opts attributes, but then immediately after this block, you replace the contents of ctx->srv_opts with the returned srv_opts from sdap_cli_connect_recv()
My guess here is that the assignments in the IF block were supposed to be setting srv_opts instead of ctx->srv_opts. The only part that confuses me is where you're setting ctx->srv_opts->last_usn to srv_opts->last_usn. I thought we had covered before that if the value of srv_opts->last_usn was ever lower than the stored value, we MUST set it to zero. This is because we know that the USN values are not in line with the ones we have stored, so we have to do a full update to be safe.
I'm attaching a patch that I think will resolve the issue (though I can't test it as I don't have access to a server with USN values). At the very least, it resolves the crash above for me.
Ack
I'm going to hold off pushing this until I get buy-in from Jan.
I'm sending my version of the correction. I already tested it and it seems to work for me. Stephen, can you run your test, which caused the crash?
This patch already contains handling of the situation that came to my mind today. See commit message for some details.
Jan
On Fri, 2011-04-29 at 15:53 +0200, Jan Zelený wrote:
On Wed, 2011-04-27 at 17:45 +0200, Jakub Hrozek wrote:
On 04/27/2011 03:30 PM, Stephen Gallagher wrote:
On Tue, 2011-04-19 at 14:57 -0400, Stephen Gallagher wrote:
On Tue, 2011-04-12 at 12:33 +0200, Jan Zelený wrote:
Can someone please take a look at the patch and ack that all issues discovered by Simo have been corrected?
Also note that in the original email there is another patch, which hasn't been reviewed yet.
Ack to both patches and pushed to master.
Unfortunately, I've now discovered several issues with this patch (too late).
First, it causes a crash if SSSD started up in offline mode (e.g. disconnected laptop) and then goes online (e.g. VPN).
#0 0x00007fec7adddd28 in sdap_check_online_done (req=0x0) at ../src/providers/ldap/ldap_id.c:729 #1 0x00007fec7ae0f3e1 in sdap_cli_auth_done (subreq=0x0) at ../src/providers/ldap/sdap_async_connection.c:1430 #2 0x00007fec7ae0e207 in sdap_auth_done (subreq=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:1007 #3 0x00007fec7ae0c8a9 in simple_bind_done (op=0x1ea9880, reply=0x1ea97f0, error=0, pvt=0x1ea9570) at ../src/providers/ldap/sdap_async_connection.c:540 #4 0x00007fec7adf4aa9 in sdap_process_message (ev=0x1e74470, sh=0x1eb2c70, msg=0x1eb33e0) at ../src/providers/ldap/sdap_async.c:364 #5 0x00007fec7adf4363 in sdap_process_result (ev=0x1e74470, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:207 #6 0x00007fec7adf3db6 in sdap_ldap_result (ev=0x1e74470, fde=0x1eb3140, flags=1, pvt=0x1eb2c70) at ../src/providers/ldap/sdap_async.c:152 #7 0x000000389ee05d57 in ?? () from /usr/lib64/libtevent.so.0 #8 0x000000389ee035b0 in _tevent_loop_once () from /usr/lib64/libtevent.so.0 #9 0x000000389ee0373b in tevent_common_loop_wait () from /usr/lib64/libtevent.so.0 #10 0x000000000043d41f in server_loop (main_ctx=0x1e755e0) at ../src/util/server.c:526 #11 0x000000000040f8d2 in main (argc=6, argv=0x7fffbb9c67a8) at ../src/providers/data_provider_be.c:1284
The problem is that ctx->srv_opts is NULL here (since we've never made a connection to the server before this).
While I was investigating this, I noticed something else as well. This entire IF block is completely useless as written. In the IF block, you set ctx->srv_opts attributes, but then immediately after this block, you replace the contents of ctx->srv_opts with the returned srv_opts from sdap_cli_connect_recv()
My guess here is that the assignments in the IF block were supposed to be setting srv_opts instead of ctx->srv_opts. The only part that confuses me is where you're setting ctx->srv_opts->last_usn to srv_opts->last_usn. I thought we had covered before that if the value of srv_opts->last_usn was ever lower than the stored value, we MUST set it to zero. This is because we know that the USN values are not in line with the ones we have stored, so we have to do a full update to be safe.
I'm attaching a patch that I think will resolve the issue (though I can't test it as I don't have access to a server with USN values). At the very least, it resolves the crash above for me.
Ack
I'm going to hold off pushing this until I get buy-in from Jan.
I'm sending my version of the correction. I already tested it and it seems to work for me. Stephen, can you run your test, which caused the crash?
This patch already contains handling of the situation that came to my mind today. See commit message for some details.
Jan
Ack
On Wed, 2011-05-04 at 09:10 -0400, Stephen Gallagher wrote:
On Fri, 2011-04-29 at 15:53 +0200, Jan Zelený wrote:
I'm sending my version of the correction. I already tested it and it seems to work for me. Stephen, can you run your test, which caused the crash?
This patch already contains handling of the situation that came to my mind today. See commit message for some details.
Jan
Ack
Pushed to master
sssd-devel@lists.fedorahosted.org