su starts behaving oddly sometimes on F19

Adam Williamson awilliam at redhat.com
Mon May 20 19:29:49 UTC 2013


On Fri, 2013-05-17 at 12:33 +0200, Tomas Mraz wrote:
> On Thu, 2013-05-16 at 22:29 -0700, Adam Williamson wrote: 
> > This is a weird bug I've seen 3 or 4 times since upgrading to F19, and
> > am having trouble pinning down.
> > 
> > Occasionally, after my session has been up for some time, runs of 'su'
> > start behaving oddly. After I enter the root password, it takes a long
> > time - longer than the delay that's always happened when you fat-finger
> > your password, even - before it succeeds. Usually, of course, it's
> > instant. But when this bug happens, I just get to sit there while it
> > thinks about it for like 10-15 seconds before I eventually get a root
> > prompt.
> > 
> > This only applies to my desktop session - but it applies to any terminal
> > running in the desktop. At least, it applies to gnome-terminal (even if
> > closed and opened again) and xterm. But if I go to ctrl-alt-f2, login,
> > and run an 'su', that still returns instantly.
> > 
> > I've tried strace'ing su, but interestingly, I can't get it to work:
> > running su via strace always seems to result in "Authentication
> > failure" (which doesn't display this bug; it's only _normally_ slow, the
> > same slowness that has always been the case when you fail the password).
> > 
> > So I'm kinda stuck, really. Has anyone else seen this? Any bright ideas
> > for debugging it? Thanks!
> 
> Try attaching the strace (as root) to the su process when it waits for
> the password.

Wow, that's a good (and obvious) idea. *smacks self on forehead*

So that works indeed, and now I can see where the pause is. Relevant
snippet:
                          = 0
stat("/root/.xauth/import", 0x7fff660f8b70) = -1 ENOENT (No such file or directory)
getgid()                                = 1001
getuid()                                = 1001
pipe([4, 6])                            = 0
pipe([7, 8])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2a174b9a10) = 15192
close(4)                                = 0
close(8)                                = 0
close(6)                                = 0
read(7, 

** PAUSE HERE **

"", 2048)                       = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=15192, si_status=1, si_utime=0, si_stime=0} ---
close(7)                                = 0

So it forks off a child, and pauses while that child runs. So I did it
again, and straced the child process, and here's what I get:

[root at adam Beta-RC2]# strace -p 22182
Process 22182 attached
restart_syscall(<... resuming interrupted call ...>) = 0
open("/run/gdm/auth-for-adamw-szT11D/database-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff8b56d5c0)       = 0
open("/run/gdm/auth-for-adamw-szT11D/database-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff8b56d5c0)       = 0
open("/run/gdm/auth-for-adamw-szT11D/database-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff8b56d5c0)       = 0
open("/run/gdm/auth-for-adamw-szT11D/database-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff8b56d5c0)       = 0
open("/run/gdm/auth-for-adamw-szT11D/database-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff8b56d5c0)       = 0
open("/run/gdm/auth-for-adamw-szT11D/database-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff8b56d5c0)       = 0
open("/run/gdm/auth-for-adamw-szT11D/database-c", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff8b56d5c0)       = 0
write(2, "/usr/bin/xauth:  timeout in lock"..., 91) = -1 EBADF (Bad file descriptor)
write(2, "\0", 1)                       = -1 EBADF (Bad file descriptor)
exit_group(1)                           = ?
+++ exited with 1 +++

The actual child process that is run is:

/usr/bin/xauth -f /run/gdm/auth-for-adamw-szT11D/database nlist :0

If we look at the permissions:

[root at adam Beta-RC2]# ls -lZ /run/gdm/auth-for-adamw-szT11D/database 
-rw-------. adamw adamw system_u:object_r:xdm_var_run_t:s0 /run/gdm/auth-for-adamw-szT11D/database
[root at adam Beta-RC2]# ls -dlZ /run/gdm/auth-for-adamw-szT11D/
drwx--x--x. adamw adamw system_u:object_r:xdm_var_run_t:s0 /run/gdm/auth-for-adamw-szT11D/

note, the file is just 'database', while the strace shows 'database-c';
I don't know if that's significant.

SELinux doesn't seem to be involved here, or at least, 'setenforce
Permissive' doesn't make the problem go away. I note in the 'broken'
state some other stuff doesn't work either, though some of these may be
other bugs. One that's very likely related is that I have a dumb hack
set up to isolate facebook from the rest of my web use (for privacy
reasons); a user account called 'facebook' and a little script
called /usr/local/bin/facebook which just does this:

#!/bin/sh
su facebook -c "firefox http://www.facebook.com"

my main user account never goes to facebook.com and has it blocked seven
ways from Sunday using noscript and adblock. Anyhow, when I'm seeing
this problem with a plain 'su', that dumb facebook trick doesn't work
any more: after entering the 'facebook' user's password, I get a similar
long pause, then:

(process:22887): GLib-CRITICAL **: g_slice_set_config: assertion
`sys_page_size == 0' failed
No protocol specified
No protocol specified
Error: cannot open display: :0

that's almost certainly the same bug. One that I'm not sure is the same
bug is that an attempt to loopback mount an ISO file to /mnt/temp when
running as root after an 'su' command that hit this bug seems to hang
and be unkillable.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net



More information about the devel mailing list