On Thu, Apr 23, 2015 at 07:29:07AM -0400, Stephen Gallagher wrote:
> On Thu, 2015-04-23 at 08:14 +0200, Lukas Slebodnik wrote:
> > On (20/04/15 14:38), Stephen Gallagher wrote:
> > > On Mon, 2015-04-20 at 08:53 +0200, Lukas Slebodnik wrote:
> > > > ehlo,
> > > >
> > > > attached patch fixes crash in
> > > >
https://fedorahosted.org/sssd/ticket/2629
> > > >
> > >
> > >
> > > Nack. I'd rather we fixed the root of this problem. I did some
> > > digging
> > > this afternoon and tracked the issue back to ad_gpo.c line 3499 (in
> > > current master). If we get back a NULL result or num_results == 0,
> > > then we just skip over this item in the list and start processing
> > > the
> > > next one. Unfortunately, that leaves an item in the candidate_gpos
> > > list that was never properly constructed.
> > >
> > You are right.
> >
> > We got a referral to GPO and therefore we do not find any attributes.
> >
> > [sdap_sd_search_send] (0x0400): Searching entry [cn={2BA15B73-9524-
> > 419F-B4B7-185E1F0D3DCF},cn=policies,cn=system,DC=example,DC=com]
> > using SD
> > [sdap_print_server] (0x2000): Searching 10.1.1.14
> > [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with
> > [(objectclass=*)][cn={2BA15B73-9524-419F-B4B7-
> > 185E1F0D3DCF},cn=policies,cn=system,DC=example,DC=com].
> > [sdap_get_generic_ext_step] (0x1000): Requesting attrs:
> > [nTSecurityDescriptor]
> > [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [cn]
> > [sdap_get_generic_ext_step] (0x1000): Requesting attrs:
> > [gPCFileSysPath]
> > [sdap_get_generic_ext_step] (0x1000): Requesting attrs:
> > [gPCMachineExtensionNames]
> > [sdap_get_generic_ext_step] (0x1000): Requesting attrs:
> > [gPCFunctionalityVersion]
> > [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [flags]
> > [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid =
> > 14
> > [sdap_process_result] (0x2000): Trace: sh[0x7f5d409a8c60],
> > connected[1], ops[0x7f5d409d2c10], ldap[0x7f5d409a9ed0]
> > [sdap_process_result] (0x2000): Trace: ldap_result found nothing!
> > [sdap_process_result] (0x2000): Trace: sh[0x7f5d409a8c60],
> > connected[1], ops[0x7f5d409d2c10], ldap[0x7f5d409a9ed0]
> > [sdap_process_message] (0x4000): Message type:
> > [LDAP_RES_SEARCH_RESULT]
> > [sdap_get_generic_op_finished] (0x0400): Search result:
> > Referral(10), 0000202B: RefErr: DSID-0310063C, data 0, 1 access
> > points
> > ref 1: 'lzb.hq'
> >
> > [sdap_get_generic_op_finished] (0x1000): Ref:
> > ldap://lzb.hq/cn=%7B2BA15B73-9524-419F-B4B7-
> > 185E1F0D3DCF%7D,cn=policies,cn=system,DC=example,DC=com
> > [ad_gpo_get_gpo_attrs_done] (0x0040): no attrs found for GPO; try
> > next GPO.
> >
> > > Under what circumstances can the secinfo_dacl search return success
> > > but with zero results? Is there a bug or a race here (such as the
> > > AD
> > > server has updated the GPO since we got the list of candidate
> > > GPOs?).
> > > How best to handle this?
> > >
> > > With your patch here, it looks like we're assuming that it's okay
> > > to
> > > just skip over this GPO. If that's the case, then what we really
> > > need
> > > to be doing in ad_gpo_get_gpo_attrs_step() is to mark the gp_gpo as
> > > being invalid and then after we've gone through them all, shrink
> > > the
> > > array, removing all of the invalid entries. This will be more
> > > future-
> > > proof, as it's not just the gpo_sd that is uninitialized here.
> > > Every
> > > member of this gp_gpo is NULL except for the DN.
> > >
> > > If it's *not* okay that we've gotten no results for this lookup
> > > (such
> > > as in the race case; we don't want to be skipping over a GPO that
> > > might properly be denying users), we may need to restart
> > > processing at
> > > least a couple times to try to avoid the race and go offline if we
> > > can't complete the processing (so we at least stick to our cached
> > > rules).
> >
> > I'm sorry I didn't noticed it in log files the first time.
> > Now we know the root of problem.
> > Which of your proposal do you prefer now?
> >
> > LS
>
> OK, I hadn't considered the referral scenario. (I wasn't actually
> aware that GPOs could *be* referred to elsewhere). Is this referral
> pointing to a different machine than the host is enrolled to? It's
> trying to reach a DN at lzb.hq that is identical to the one we just
> requested.
>
> I'd really like to know how to reproduce the setup for this situation;
> is it inheriting a GPO from a parent domain somehow?
>
> Neither of my above proposals are a proper solution for this case,
> unfortunately. The first one would make the crash go away, but if this
> GPO really exists and is restrictive, not processing it is *really
> bad*. Right now we're actually slightly lucky in that the crash has a
> net result of causing an access denial rather than erroneously
> resulting in a user login, but if we were to "solve" this problem the
> way Lukas's original patch intended (or my first alternative above),
> it would be introducing a security issue. So let's not do either of
> those.
>
> The second option I suggested there is invalid as well, since the
> entry *exists*, it's just been converted to a referral. There are
> therefore only two safe options at this time:
>
> 1) Quick-and-ugly: If we encounter a referral, skip straight to denial
> and loudly warn in the logs that this is a situation we are unable to
> handle yet. This will avoid the crash, but will leave the GPO behavior
> equivalently bad to the current state. Might be acceptable if this is
> only happening on a very small subset of systems.
I'm afraid we need to go this way now.
> Users experiencing
> this might be able to set "ldap_referrals = True" on the affected
> systems to quietly solve the problem at the expense of all the other
> side-effects to that (particularly performance)
I wouldn't even suggest enabling referrals. I've seen the LDAP provider
misbehave completely with referrals enabled, openldap-libs is not really
great at chasing them and LDAP provider would go offline etc..
So enabling referrals doesn't just degrade performance, but also
functionality..
I also tested that using Global Catalog wouldn't help here, some
attributes we require are not replicated to GC, such as gPCFileSysPath.
Are you sure we're using the right connection to the right domain? We
can select the right connection based on the DN that we know.
I'm sorry, but I couldn't reproduce the bug myself.
We agreed to do the "Quick-and-ugly" way for 1.12 + documentation
but he mentioned that it is walid use-case and it would be good to
implement it in next release.
LS