On Tue, Oct 06, 2015 at 12:54:41PM +0200, Jakub Hrozek wrote:
On Mon, Oct 05, 2015 at 04:46:55PM +0200, Jakub Hrozek wrote:
> On Mon, Oct 05, 2015 at 12:33:48PM +0200, Sumit Bose wrote:
> > On Mon, Oct 05, 2015 at 12:08:26PM +0200, Jakub Hrozek wrote:
> > > On Fri, Oct 02, 2015 at 10:32:27AM +0200, Jakub Hrozek wrote:
> > > > On Thu, Oct 01, 2015 at 01:41:07PM +0200, Jakub Hrozek wrote:
> > > > > I don't have a good idea how to reproduce except simulate
the failure in
> > > > > gdb, sorry... at least I verified that after setting the context
to NULL:
> > > > > (gdb) set clist[1] = 0
> > > > > the request runs to completion and SSSD doesn't crash
anymore. So I think
> > > > > we should just add a check..
> > > >
> > > > Turns out this is easier to reproduce and hence more embarassing.
Just
> > > > add POSIX attributes to AD but don't replicate them to GC..
> > >
> > > Any takers for review?
> >
> > (I already took it some time ago in patchworks :-)
> >
> > While the patch fixes the issue and is also a right way to fix the issue
> > I would recommend to change it a bit to make the reason more clear.
> >
> > As you said to reproduce the issue you need trust to AD configured which
> > POSIX attributes but not replicate them to the Global Catalog. If you
> > new start with an empty cache a sequence like
> >
> > getent group group_with_posix_gid(a)ad.domain
> > sss_cache -E
> > getent group group_with_posix_gid(a)ad.domain
> >
> > will trigger the crash.
> >
> > What happens is that during the first lookup SSSD sees that the GC does
> > not have the POSIX attributes and disable the GC lookup. As a
> > consequence ad_gc_conn_list() will return only a single entry for the
> > second lookup and unconditionally accessing the second entry will fail.
> > While the fix is correct, I would recommend to add a boolean option to
> > ad_gc_conn_list() and set ignore_mark_offline in ad_gc_conn_list().
> > Otherwise you have to add extra logic to ipa_get_ad_acct_send() to make
> > sure ignore_mark_offline is set even it ad_gc_conn_list() only returns
> > one element.
>
> You're right that the logic was scattered all over the place. The
> attached patches consolidate it into ad_common.c so all callers use the
> same code. It's a larger change than before, but hopefully it's still
> digestable and as a bonus we can have tests for different combinations
> of contexts, GC status and master/sub domains.
>
> The second patch is not really needed, the only reason I included it is
> that I think it's better to have similar code grouped together in small
> functions and again, tests.
>
> If you think these are too invasive, we can only apply them to master
> and do exactly what you proposed for sssd-1-13.
>
> btw I had a bit of trouble with downstream tests, the seem to fail in
> unrelated way, so I'm sending the patches to the list after some manual
> testing and will resume the downstream tests tomorrow..
The attached patches fix some more Sumit's review comments. With Lukas'
help I was also able to run downstream ad_forest tests and didn't see
any issues even with the patches.
The patches are looking good. I think they are not too invasive and can
be pushed to sssd-1-13 as well. As expected as I wasn't able to trigger
the crash anymore and CI
(
http://sssd-ci.duckdns.org/logs/job/29/39/summary.html) passed as well.
ACK
bye,
Sumit