On Wed, Sep 04, 2013 at 08:50:47AM +0200, Jakub Hrozek wrote:
> On Tue, Aug 20, 2013 at 10:32:35PM +0200, Lukas Slebodnik wrote:
> > On (20/08/13 14:17), Jakub Hrozek wrote:
> > >On Tue, Aug 20, 2013 at 02:00:35PM +0200, Lukas Slebodnik wrote:
> > >> ehlo,
> > >>
> > >> I have a question about struct pam_auth_req and crash from BZ972699
> > >> I thought It is a duplicate of upstream ticket
> > >>
https://fedorahosted.org/sssd/ticket/2018 (use after free)
> > >>
> > >> But I anlysed the core dump and it doe not look like use after free.
> > >> I also checked talloc_magic for struct "pam_auth_req" and it
is ok.
> > >>
> > >> Here is the definition of structure.
> > >> (gdb) ptype preq
> > >> type = struct pam_auth_req {
> > >> struct cli_ctx *cctx;
> > >> struct sss_domain_info *domain;
> > >> struct pam_data *pd;
> > >> pam_dp_callback_t *callback;
> > >> struct ldb_result *res;
> > >> _Bool check_provider;
> > >> void *data;
> > >> } *
> > >>
> > >>
> > >> And here is output from gdb
> > >>
> > >> (gdb) p *preq
> > >> $1 = {cctx = 0x9f90a0, domain = 0x9a3740, pd = 0xa178f0, callback =
0,
> > >> res = 0x0, check_provider = false, data = 0x0}
> > >>
> > >> #0 0x0000000000000000 in ?? ()
> > >> #1 0x00000000004110d1 in pam_dp_process_reply (pending=0x9d3c00,
> > >> ptr=<value optimized out>) at
src/responder/pam/pamsrv_dp.c:79
> > >> #2 0x00007f1aeb0a661a in complete_pending_call_and_unlock (
> > >> connection=0xa1e3a0, pending=0x9d3c00, message=<value optimized
out>)
> > >> at dbus-connection.c:2234
> > >> #3 0x00007f1aeb0a886f in dbus_connection_dispatch
(connection=0xa1e3a0)
> > >> at dbus-connection.c:4397
> > >> #4 0x000000000045425e in sbus_dispatch (ev=0x99d380,
> > >> te=<value optimized out>, tv=..., data=<value optimized
out>)
> > >> at src/sbus/sssd_dbus_connection.c:104
> > >> #5 0x00007f1aeb920bd9 in tevent_common_loop_timer_delay
(ev=0x99d380)
> > >> at ../tevent_timed.c:254
> > >>
> > >> 77 dbus_pending_call_unref(pending);
> > >> 78 dbus_message_unref(msg);
> > >> 79 preq->callback(preq); <<<<<<<
CHASH HERE
> > >> 80 }
> > >> 81
> > >>
> > >> sssd_pam crashed because preq->callback was NULL.
> > >>
> > >> I would like to describe struct pam_auth_req more deeply.
> > >> struct pam_auth_req is created in function pam_forwarder
> > >> form file "src/responder/pam/pamsrv_cmd.c".
> > >>
> > >> Most of structure members are initialized in this function, but
*calback*
> > >> is initialized in function pam_dom_forwarder.
> > >> 1173 if (!NEED_CHECK_PROVIDER(preq->domain->provider)) {
> > >> 1174 preq->callback = pam_reply;
<<<<HERE
> > >> 1175 ret = LOCAL_pam_handler(preq);
> > >> 1176 }
> > >> 1177 else {
> > >> 1178 preq->callback = pam_reply;
<<<<< OR HERE
> > >
> > >It was probably this one, the other call is only called for
> > >id_provider=local
> > >
> > preq->callback is NULL, pam_dom_forwarder could not be called.
> > Output from gdb is in this mail few lines before.
> >
> > >> 1179 ret = pam_dp_send_req(preq,
SSS_CLI_SOCKET_TIMEOUT/2);
> > >> 1180 DEBUG(4, ("pam_dp_send_req returned %d\n",
ret));
> > >> 1181 }
> > >> 1182
> > >>
> > >> There are 3 places where is pam_dom_forwarder called and it is called
only
> > >> in some condition (pam_check_user_search returned EOK)
> > >> Mostly it looks like following lines:
> > >> 852 ret = pam_check_user_search(preq);
> > >> 853 if (ret == EOK) {
> > >> 854 pam_dom_forwarder(preq);
> > >> 855 }
> > >>
> > >>
> > >> And my question are:
> > >> 1. Do we need to initialize pam_auth_req->callback only in
function
> > >> pam_dom_forwarder?
> > >
> > >We could also move the callback to src/responder/pam/pamsrv_dp.c and
> > >either assign it from pam_dp_send_req() or call it directly from
> > >pam_dp_process_reply() because the way I read the code we can only ever
> > >call pam_dp_send_req with cb == pam_reply;
> > >
> > >> 2. Should we check preq->callback for NULL?
> > >
> > >Maybe, as a sanity check, but we still should fix the real bug.
> > >
> > >> 3. Do you have any idea how to reproduce it?
> > >
> > >Can we ask the reporter to run with valgrind? Also, did we rule out
> > >that sssd_be might be crashing or being restarted in between the requst
> > >is send and before it finishes?
> > >
> > There isn't a reproducer for this bug.
> > Reporter wrote: "Every now and then, sssd terminates"
> >
> > >>
> > >> I really don't know what is right solution.
> > >>
> > >> Thank you very much for any ideas.
> > >>
> > >> LS
>
> Hi,
>
> are there any other ideas on how to proceed with this problem? I haven't
> found any..
Maybe new logs with a higher debug level would help. Form the old logs
attached to the BZ it looks that the auth requests just needs a long
time and the client (sshd) start a new request before the old is
finished. This would indicate the use after free issue which does not
match the core dump. New logs would help to find out how SSSD is getting
into the state where the callback is NULL.
bye,
Sumit
Lukas analyzed the core once again (the reporter uploaded new corefiles
after I sent the mail to sssd-devel) and we're now reasonably sure he is
hitting a known bug where, as you said, the request is taking a long
time, the client times out and the responder accesses freed memory that
was freed as a result of client timeout destructor.
I will just build him a scratch build, maybe he can test it out. Looks
like he's running CentOS anyway, so he might now worry about support :)
Thanks for another look at the issue.