-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
[PATCH 1/2] Remove freed server_common entities from list We didn't hit this before as we never removed common entities. When using service requests, we remove the resolved fo_servers when we hit a timeout, so the server_common can be also removed.
[PATCH 2/2] Support SRV servers in failover Adds a new failover API call fo_add_srv_server that allows the caller to specify a server that is later resolved into a list of specific servers using SRV requests.
Also adds a new failover option that specifies how often should the servers resolved from SRV query considered valid until we need a refresh.
The "real" servers to connect to are returned to the user as usual, using the fo_resolve_service_{send,recv} calls.
On Fri, 2010-04-23 at 11:57 +0200, Jakub Hrozek wrote:
[PATCH 1/2] Remove freed server_common entities from list We didn't hit this before as we never removed common entities. When using service requests, we remove the resolved fo_servers when we hit a timeout, so the server_common can be also removed.
Nack.
common = rc_alloc(mem_ctx, struct server_common); if (common == NULL) return NULL; + talloc_set_destructor((TALLOC_CTX *) common, destroy_server_common);
Please set the destructor later in the function, rc_alloc() does not return zeroed memory and common can be freed if talloc_strdup() fails. BTW, this might be a good opportunity to remove all the silly common->something = NULL; statements and instead use a memset().
Another issue: Notice that in the destructor, if there really are some requests pending and we really free the server_common, this might cause the resolve_service_request_destructor() to be called later and try to remove the request from common->request_list. Attached is a small untested patch that might help, please squash it in with your patch. Additionally, change the destructor to not traverse the list, but instead print a warning if it is not empty (to notify us that we have a bug in the code).
Lastly, a small nitpick: Could you please name the server_common destructor "server_common_destructor"? It would be obvious on a first glance that it is a destructor and shouldn't be called directly. This also seems to be the convention for most of the rest of the code.
[PATCH 2/2] Support SRV servers in failover Adds a new failover API call fo_add_srv_server that allows the caller to specify a server that is later resolved into a list of specific servers using SRV requests.
Also adds a new failover option that specifies how often should the servers resolved from SRV query considered valid until we need a refresh.
The "real" servers to connect to are returned to the user as usual, using the fo_resolve_service_{send,recv} calls.
Nack.
In collapse_srv_lookup(): + while (server->prev && server->prev->srv_data == meta->srv_data) { + tmp = server->prev; + DLIST_REMOVE(server->service->server_list, tmp); + talloc_zfree(tmp); + } + while (server->next && server->next->srv_data == meta->srv_data) { + tmp = server->prev; ^^^^ + DLIST_REMOVE(server->service->server_list, tmp); + talloc_zfree(tmp); + }
Looks like a copy&paste error.
Nitpic in fo_add_srv_server(): + server->port = 0; Not really needed since we use talloc_zero().
In the same function, you allocate srv_data, domain, proto and srv from the service context, but I'd expect you to use server for the former and srv_data for the later three. Was this a mistake or am I missing something?
In function fo_resolve_srv_done(): + for (reply = reply_list; reply; reply = reply->next) { + DLIST_FOR_EACH(server, state->service->server_list) { + if (server->port == reply->port) { + ret = EEXIST; + break; + } + } + if (ret == EEXIST) continue;
You should reset ret to EOK before the DLIST_FOR_EACH() macro, otherwise if you mark one as EEXIST, the others will be marked that way as well.
I see some space/tabs mixing in dlinklist.h, plus: + for (tmp = (list2); tmp->next; tmp = tmp->next) This will crash if list2 == NULL.
Another small nitpick: Would you mind not using the fo_* prefix for static functions? I wanted to keep it only for the public functions that will be used by other translation units.
Ah, yet another nitpick: + * The 'srv_retry_timeout' member specifies how log a SRV lookup ^^^ - typo
And lastly, since you added a substantial amount of code, please add you name to the list of authors at the top of the file and add this year :)
Thanks Martin
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
New patches are attached, comments are inline.
On 04/26/2010 05:54 PM, Martin Nagy wrote:
On Fri, 2010-04-23 at 11:57 +0200, Jakub Hrozek wrote:
[PATCH 1/2] Remove freed server_common entities from list We didn't hit this before as we never removed common entities. When using service requests, we remove the resolved fo_servers when we hit a timeout, so the server_common can be also removed.
Nack.
common = rc_alloc(mem_ctx, struct server_common); if (common == NULL) return NULL;
- talloc_set_destructor((TALLOC_CTX *) common,
destroy_server_common);
Please set the destructor later in the function, rc_alloc() does not return zeroed memory and common can be freed if talloc_strdup() fails.
done
BTW, this might be a good opportunity to remove all the silly common->something = NULL; statements and instead use a memset().
That sounds too dangerous on a rc_alloc-ed structure, it might also zero the reference counts, no? I've left these assignments explicitly
Another issue: Notice that in the destructor, if there really are some requests pending and we really free the server_common, this might cause the resolve_service_request_destructor() to be called later and try to remove the request from common->request_list. Attached is a small untested patch that might help, please squash it in with your patch. Additionally, change the destructor to not traverse the list, but instead print a warning if it is not empty (to notify us that we have a bug in the code).
I agree with the solution. This is very difficult to test (a SRV records collapsing at the same time another request to a server with the same name is being resolved), but I have included your patch in the revised version and ran some smoketests.
Lastly, a small nitpick: Could you please name the server_common destructor "server_common_destructor"? It would be obvious on a first glance that it is a destructor and shouldn't be called directly. This also seems to be the convention for most of the rest of the code.
OK, I was trying to keep up with the ${action}_server_common scheme but I agree that destructor is more descriptive
[PATCH 2/2] Support SRV servers in failover Adds a new failover API call fo_add_srv_server that allows the caller to specify a server that is later resolved into a list of specific servers using SRV requests.
Also adds a new failover option that specifies how often should the servers resolved from SRV query considered valid until we need a refresh.
The "real" servers to connect to are returned to the user as usual, using the fo_resolve_service_{send,recv} calls.
Nack.
In collapse_srv_lookup():
while (server->prev && server->prev->srv_data ==meta->srv_data) {
tmp = server->prev;DLIST_REMOVE(server->service->server_list, tmp);talloc_zfree(tmp);}while (server->next && server->next->srv_data ==meta->srv_data) {
tmp = server->prev; ^^^^DLIST_REMOVE(server->service->server_list, tmp);talloc_zfree(tmp);}Looks like a copy&paste error.
Yep, this was a bug.
Nitpic in fo_add_srv_server():
- server->port = 0;
Not really needed since we use talloc_zero().
done
In the same function, you allocate srv_data, domain, proto and srv from the service context, but I'd expect you to use server for the former and srv_data for the later three. Was this a mistake or am I missing something?
ok, allocating them on top of srv_data makes more sense.
In function fo_resolve_srv_done():
- for (reply = reply_list; reply; reply = reply->next) {
DLIST_FOR_EACH(server, state->service->server_list) {if (server->port == reply->port) {ret = EEXIST;break;}}if (ret == EEXIST) continue;You should reset ret to EOK before the DLIST_FOR_EACH() macro, otherwise if you mark one as EEXIST, the others will be marked that way as well.
nice catch, thanks
I see some space/tabs mixing in dlinklist.h, plus:
for (tmp = (list2); tmp->next; tmp = tmp->next)This will crash if list2 == NULL.
fixed
Another small nitpick: Would you mind not using the fo_* prefix for static functions? I wanted to keep it only for the public functions that will be used by other translation units.
done
Ah, yet another nitpick:
- The 'srv_retry_timeout' member specifies how log a SRV lookup ^^^ - typo
done
And lastly, since you added a substantial amount of code, please add you name to the list of authors at the top of the file and add this year :)
done
Thanks Martin
Thank you for the review!
On Mon, 2010-04-26 at 23:14 +0200, Jakub Hrozek wrote:
New patches are attached, comments are inline.
On 04/26/2010 05:54 PM, Martin Nagy wrote:
On Fri, 2010-04-23 at 11:57 +0200, Jakub Hrozek wrote:
[PATCH 1/2] Remove freed server_common entities from list We didn't hit this before as we never removed common entities. When using service requests, we remove the resolved fo_servers when we hit a timeout, so the server_common can be also removed.
Nack.
BTW, this might be a good opportunity to remove all the silly common->something = NULL; statements and instead use a memset().
That sounds too dangerous on a rc_alloc-ed structure, it might also zero the reference counts, no? I've left these assignments explicitly
Oh, right, didn't realize that.
Ack for 0001.
And lastly, since you added a substantial amount of code, please add you name to the list of authors at the top of the file and add this year :)
done
Uh-huh, I wanted you to add the current year, not to bump it :) Oh well.
Ack for 0002.
Thanks. Martin
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/27/2010 02:48 PM, Martin Nagy wrote:
Ack for 0002.
Please note that this patch requires the "Sort SRV replies according to RFC 2782" patch which is still under review.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/27/2010 02:48 PM, Martin Nagy wrote:
Uh-huh, I wanted you to add the current year, not to bump it :) Oh well.
Ack for 0002.
Thanks. Martin
Stephen found out that the patch did not work with c-ares versions prior to 1.7. The attached patch fixes that.
On 04/27/2010 11:04 AM, Jakub Hrozek wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/27/2010 02:48 PM, Martin Nagy wrote:
Uh-huh, I wanted you to add the current year, not to bump it :) Oh well.
Ack for 0002.
Thanks. Martin
Stephen found out that the patch did not work with c-ares versions prior to 1.7. The attached patch fixes that.
Ack.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/27/2010 06:16 PM, Stephen Gallagher wrote:
On 04/27/2010 11:04 AM, Jakub Hrozek wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/27/2010 02:48 PM, Martin Nagy wrote:
Uh-huh, I wanted you to add the current year, not to bump it :) Oh well.
Ack for 0002.
Thanks. Martin
Stephen found out that the patch did not work with c-ares versions prior to 1.7. The attached patch fixes that.
Ack.
I'm sorry I have realized I forgot to squash in one additional fix that removes spurious backslash at the end of a line that would break a conditional.
Here goes, hopefully this will be the last iteration..
On 04/27/2010 01:10 PM, Jakub Hrozek wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/27/2010 06:16 PM, Stephen Gallagher wrote:
On 04/27/2010 11:04 AM, Jakub Hrozek wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/27/2010 02:48 PM, Martin Nagy wrote:
Uh-huh, I wanted you to add the current year, not to bump it :) Oh well.
Ack for 0002.
Thanks. Martin
Stephen found out that the patch did not work with c-ares versions prior to 1.7. The attached patch fixes that.
Ack.
I'm sorry I have realized I forgot to squash in one additional fix that removes spurious backslash at the end of a line that would break a conditional.
Here goes, hopefully this will be the last iteration..
Ah yeah. I found and squashed that into the patch. I forgot I had done that when I sent the ack. Mea culpa.
Thanks for sending the corrected patch.
On 04/27/2010 01:11 PM, Stephen Gallagher wrote:
On 04/27/2010 01:10 PM, Jakub Hrozek wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/27/2010 06:16 PM, Stephen Gallagher wrote:
On 04/27/2010 11:04 AM, Jakub Hrozek wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/27/2010 02:48 PM, Martin Nagy wrote:
Uh-huh, I wanted you to add the current year, not to bump it :) Oh well.
Ack for 0002.
Thanks. Martin
Stephen found out that the patch did not work with c-ares versions prior to 1.7. The attached patch fixes that.
Ack.
I'm sorry I have realized I forgot to squash in one additional fix that removes spurious backslash at the end of a line that would break a conditional.
Here goes, hopefully this will be the last iteration..
Ah yeah. I found and squashed that into the patch. I forgot I had done that when I sent the ack. Mea culpa.
Thanks for sending the corrected patch.
Pushed both to master and sssd-1-2.
Jakub Hrozek wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
<snip>
[PATCH 2/2] Support SRV servers in failover Adds a new failover API call fo_add_srv_server that allows the caller to specify a server that is later resolved into a list of specific servers using SRV requests.
This looks doc-worthy... Can somebody explain "SRV servers" and how they work here? My limited understanding makes me think I can define _failover._tcp as SRV records in my DNS with weights and priorities, etc. Is that right? How about an example? Has this made it into a man page yet?
Also adds a new failover option that specifies how often should the servers resolved from SRV query considered valid until we need a refresh.
What's the option? I missed it... :(
thanks a bunch~!
The "real" servers to connect to are returned to the user as usual, using the fo_resolve_service_{send,recv} calls.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 04/30/2010 03:46 AM, David O'Brien wrote:
Jakub Hrozek wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
<snip>
[PATCH 2/2] Support SRV servers in failover Adds a new failover API call fo_add_srv_server that allows the caller to specify a server that is later resolved into a list of specific servers using SRV requests.
Please keep in mind that there is one more patch on the list which actually uses this feature in back ends. These two only add the code that makes it possible for back ends to use service discovery into the failover code.
This looks doc-worthy... Can somebody explain "SRV servers" and how they work here? My limited understanding makes me think I can define _failover._tcp as SRV records in my DNS with weights and priorities, etc. Is that right?
Pretty much, yeah. For every service that you want to use service discovery with, you need to add a special DNS record in the form of "_service._protocol._domain TTL priority weight port hostname" to your DNS server.
There are usually multiple records like this with different priority (for failover) and different weights (for load balancing).
The client then makes a SRV DNS query in the form of "_service._protocol._domain", for example "_ldap._tcp._redhat.com" and gets back a list of host names along with their priorities and weights. The client then sorts this list according to priorities and weights and connects to the first server in this sorted list.
For full (actually quite readable) explanation, please see RFC 2782. The wikipedia entry on "SRV record" is also quite nice and maybe easier to digest.
How about an example? Has this made it into a man page yet?
Some documentation is in the back end integration patch - look for "[PATCH] Use service discovery in backends". If you could review the man page additions, it would be much appreciated!
Also adds a new failover option that specifies how often should the servers resolved from SRV query considered valid until we need a refresh.
What's the option? I missed it... :(
Sorry this was not very clear. This is actually not a new configuration option for sssd, but rather a new member of the fo_options structure.
I think it would be nice to use the TTL value here but unfortunately our resolver does not return the TTL field of the SRV records.
Currently this is hardcoded to 8 hours (value pulled out of thin air..).
thanks a bunch~!
sssd-devel@lists.fedorahosted.org