On Thu, Jan 31, 2013 at 11:32:21AM -0500, Simo Sorce wrote:
On Thu, 2013-01-31 at 16:40 +0100, Sumit Bose wrote:
> On Thu, Jan 31, 2013 at 09:43:09AM -0500, Simo Sorce wrote:
> > On Thu, 2013-01-31 at 10:49 +0100, Sumit Bose wrote:
> > > Hi,
> > >
> > > I have created a design page for
> > > https://fedorahosted.org/sssd/ticket/1032 "[RFE] sssd should support DNS
> > > sites" at
> > > https://fedorahosted.org/sssd/wiki/DesignDocs/ActiveDirectoryDNSSites .
> > > It can be found below as well.
> > >
> > > Corrections, comments and enhancements are welcome.
> >
> > Good start, comments inline!
> >
>
>
> > > struct tevent_req *resolv_getsrv_site_send(TALLOC_CTX *mem_ctx,
> > > struct tevent_context *ev,
> > > struct resolv_ctx *ctx,
> > > struct site_ctx *site_ctx,
> > > const char *service,
> > > const char *domain);
> >
> > I do not think you should have a site_ctx and a resolv_ctx, one or the
> > other.
> > If you added site_ctx with the idea of calling this function recursively
> > then I would object to that, you should have a higher level internal
> > public function that does the whole processing and an internal private
> > one that is called multiple times and conceal internal requirements.
>
> no, I didn't plan to use it recursively. The idea was to have both of
> the functions you suggested in one. If site_ctx is NULL,
> resolv_getsrv_site_send() will do get_site and then srv_by_site. The
> found site is returned with the corresponding _recv() call so that in
> later lookups the get_site step can be skipped.
>
> If the provider goes offline the saved site_ctx will be deleted so that
> the get_site step is done again when the provider goes online again.
>
> Instead of using a plain char * for the site I took a struct to save
> e.g. the nearest site, if available, to have a better fallback step than
> just Default-First-Site-Name.
I see your point but my concern is with site_ctx being 'magic' in that
you can pass NULL at any time to cause a new site discovery and you are
supposed to keep it around, but then offline operation may delete it.
I see a clear risk that at some point you will have a dangling pointer
to a since deleted site_ctx.
I am thinking we should probably have a more permanent context that has
setters to invalidate its internal state when offline happens.
This way the pointer is always valid and can be passed around safely and
stored in some long-term upper context.
Btw, while looking at this it seems that the resolv_ctx could be this
long-term context already (so no need for an explicit site_ctx, just add a
sub-struct in resolv_ctx); it is initialized in data_provider_fo.c in
be_init_failover(), which is called from be_process_init().
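
A minimal sketch of the "permanent context with invalidating setters" idea described above. All names here (site_state, site_state_set, site_state_invalidate, site_state_get, SITE_NAME_MAX) are hypothetical and not part of SSSD; the point is only that going offline clears the contents while the object itself stays allocated, so a pointer to it never dangles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SITE_NAME_MAX 64  /* hypothetical limit, for illustration only */

/* Hypothetical long-lived holder for discovered site data. It lives as long
 * as its owner (for example a provider or resolver context), so callers can
 * keep a pointer to it across offline/online transitions. */
struct site_state {
    bool valid;                        /* false until discovery succeeds */
    char site[SITE_NAME_MAX];          /* site reported for this client */
    char nearest_site[SITE_NAME_MAX];  /* optional fallback, may be empty */
};

/* Setter called after a successful discovery. nearest may be NULL. */
static void site_state_set(struct site_state *st,
                           const char *site, const char *nearest)
{
    assert(st != NULL && site != NULL);
    snprintf(st->site, sizeof(st->site), "%s", site);
    snprintf(st->nearest_site, sizeof(st->nearest_site), "%s",
             nearest != NULL ? nearest : "");
    st->valid = true;
}

/* Setter called when the backend goes offline: clears the contents but
 * leaves the object itself allocated, so no pointer is left dangling. */
static void site_state_invalidate(struct site_state *st)
{
    assert(st != NULL);
    memset(st, 0, sizeof(*st));
}

/* Returns the cached site, or NULL if discovery has to be run again. */
static const char *site_state_get(const struct site_state *st)
{
    assert(st != NULL);
    return st->valid ? st->site : NULL;
}
```

With this shape a caller never passes NULL to force rediscovery; it asks site_state_get() and runs the discovery step only when that returns NULL.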
I was more thinking of making this a member of the main IPA and AD
provider context. From my point of view resolv_ctx is for the low level
DNS lookups while the site concept is specific for the AD.
Unfortunately, for some reason resolv_init() is also called by
sdap_sudo_hostname_send() and ipa_dyndns_init(). I can't understand
why. Is it a bug?
> >
> > Also I do not know that we really need to specify the service.
>
> I thought it is all about finding a specific service in a site?
This statement was meant in the context of splitting your function in 2,
one to find the site (where service need not be specified) and then one
function to find the actual service.
But if we conceal site discovery into resolv_ctx, we can simply avoid a
new separate interface altogether.
We simply add site support to the resolver library through a module and
let all current calls be 'magically' made site aware.
We might need to change calls that specify SRV searches so that we
separate Service and Domain, but I think that would be a better
architecture as we can abstract away how we do resolve stuff into a
module of the resolver.
This way when we later add an IPA location discovery mechanism we do not
have to change anything in the common code, we just plug in a different
resolver plugin for 'location' discovery for the ipa domain case.
And we can actually mix both as long as we make the 'site/location'
plugin domain bound (hence why we need to split service and domain
arguments in requests, so each request can be bound to the correct site
discovery code).
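
A rough sketch of what a domain-bound 'site/location' plugin could look like. Everything here (locate_fn, locator_binding, ad_site_locate, ipa_location_locate, locate_for_domain, the example domains) is hypothetical and only illustrates the idea; the two locate functions are stand-ins that return fixed values instead of doing any real discovery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plugin hook: given a domain, return the site or location
 * name to use for it, or NULL if none is known. */
typedef const char *(*locate_fn)(const char *domain);

/* One binding of a discovery plugin to a domain. */
struct locator_binding {
    const char *domain;
    locate_fn locate;
};

/* Stand-in for AD site discovery (a real one would use a CLDAP ping). */
static const char *ad_site_locate(const char *domain)
{
    (void)domain;
    return "Default-First-Site-Name";
}

/* Stand-in for a future IPA location discovery mechanism. */
static const char *ipa_location_locate(const char *domain)
{
    (void)domain;
    return NULL;  /* nothing discovered in this sketch */
}

/* Because requests carry the domain separately from the service, the
 * resolver can pick the plugin bound to that domain. Returns NULL when no
 * plugin is bound or the plugin found nothing; the caller would then do a
 * plain SRV lookup. Domains are matched exactly here for brevity. */
static const char *locate_for_domain(const struct locator_binding *b,
                                     size_t n, const char *domain)
{
    assert(b != NULL && domain != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(b[i].domain, domain) == 0) {
            return b[i].locate(domain);
        }
    }
    return NULL;
}
```

The common code only ever calls locate_for_domain(); adding another discovery mechanism means adding another binding, not changing the callers.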
I have to think about it. It certainly would make code reuse much easier,
but I wonder whether this really works in complex setups with multiple IPA
and AD domains.
> >
> > > ==== Finding a DC for the CLDAP ping ====
> > > To find any DC in the domain, a Windows client, and Samba as well, looks
> > > for a _ldap._tcp service record. The only difference is that Samba
> > > uses a plain _ldap._tcp.domain.name while a Windows client uses
> > > _ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.domain.name. I
> > > guess it will fall back to _ldap._tcp.domain.name if the other name
> > > cannot be resolved. I would suggest using _ldap._tcp.domain.name for
> > > the SSSD implementation.
> >
> > Why ? Using the Default-First-Site-Name one is technically more correct
> > when multiple sites are in use.
>
> Will this return all DCs or only the ones that are not assigned to a
> specific site?
Good question, but if the site exists it is the 'default' one we
should ..well ... default to.
> I just thought _ldap._tcp.domain.name will return a
> larger number of DCs to choose from?
This is actually a problem, you may end up trying to access a DC in
Antarctica over a geosynchronous satellite link. I do not think you want
to ever do that unless you live in Antarctica :)
> But I'm not against using the Default-First-Site-Name one here.
I think it should be the default fallback if nothing can be obtained by
a CLDAP ping.
sure, but this is about how to find a DC to send the CLDAP ping to.
> Besides that, you mentioned localized version of
> Default-First-Site-Name. Would this be an argument against using it?
No, it is an argument about not assuming the name, but picking the right
one from the CLDAP ping. If the CLDAP ping times out it is ok to 'try'
it, worst case we get back an error and further fall back to searching
SRV records w/o any site component in the name at all.

This is a step before: can
_ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.domain.name be used
to find a DC to send the CLDAP ping to? E.g. according to
http://social.technet.microsoft.com/Forums/da/winserverNIS/thread/2afc3cf...
the name can be changed. So chances are that the above request will not
return anything. So I guess even if the first address returned from a
_ldap._tcp.domain.name request is the DC on the Mars rover, the first
CLDAP ping will time out and we send a second one to the second address.
It's only a single UDP packet and the response will fit into a single
packet, too.

However I think it is also 'possibly' an argument for having an option
to set what is the site sssd should be bound to. A way to force it.
Just for those cases where sites can be resolved via DNS but CLDAP is
filtered (think VPN access or something).
If an option with a specific site is present, that specific site should
be immediately tried before everything else. If it fails then the usual
discovery mechanism should be used.
I agree that such an option makes sense, but I would use it differently.
If you try immediately then chances are that you always talk to your
home site which might be bad for roaming users. I would suggest to use
it as the fallback if the CLDAP ping fails.
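
To make the fallback variant concrete, here is a small sketch. The helper names (srv_query_name, choose_site) and the idea of passing the configured site as a plain string are assumptions for illustration, not existing SSSD code; the SRV name formats are the ones quoted earlier in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the SRV query name for the LDAP service. With a site the name has
 * the form _ldap._tcp.<site>._sites.dc._msdcs.<domain>; without one it is
 * the plain _ldap._tcp.<domain>.
 * Returns 0 on success, -1 if the buffer is too small. */
static int srv_query_name(char *buf, size_t len,
                          const char *site, const char *domain)
{
    int n;

    assert(buf != NULL && domain != NULL);
    if (site != NULL && strlen(site) > 0) {
        n = snprintf(buf, len, "_ldap._tcp.%s._sites.dc._msdcs.%s",
                     site, domain);
    } else {
        n = snprintf(buf, len, "_ldap._tcp.%s", domain);
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Pick the site in the order suggested just above: a site learned from
 * the CLDAP ping wins, a configured site is only used as a fallback, and
 * NULL means "query without any site component". */
static const char *choose_site(const char *cldap_site,
                               const char *configured_site)
{
    if (cldap_site != NULL && strlen(cldap_site) > 0) {
        return cldap_site;
    }
    if (configured_site != NULL && strlen(configured_site) > 0) {
        return configured_site;
    }
    return NULL;
}
```

Swapping the two checks in choose_site() gives the other variant, where a configured site is tried before anything else.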
bye,
Sumit
Simo.
--
Simo Sorce * Red Hat, Inc * New York