I'm having a problem building the current zsh srpm (zsh-4.3.10-4.fc13.src.rpm) on my local builder, which runs F11-x86_64 and has mock-0.9.18-1.fc11.noarch. On IRC, a user reported the same problem, only his builder runs CentOS 5.4 and has mock-0.9.14-2.el5.noarch. Surprisingly, the same srpm will build fine in koji.
I'm not sure how to describe the hang in enough detail without having to understand the zsh test framework. The bottom line is simply that one of the tests just hangs. It's doing some testing of the read builtin; I believe it's one of these two stanzas which hangs, although output buffering may make it difficult to be sure:
  read -d: <<<foo:bar
  print $REPLY
0:read up to delimiter
foo

  print foo:bar|IFS=: read -A
  print $reply
0:use different, IFS separator to array
foo bar
The last thing in the build log is "Running test: read up to delimiter". ps just shows:
tibbs 9322 0.0 0.0 114684 1916 pts/3 TN 12:39 00:00:00 ../Src/zsh +Z -f ./ztst.zsh ./B04read.ztst
wchan is signal_stop. The process doesn't seem to be consuming any CPU, but if I strace it, I get an endless stream of
ioctl(11, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon echo...}) = ? ERESTARTSYS (To be restarted)
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
In koji there doesn't seem to be any delay at all running this test. I'm stumped.
- J<
That is job control weirdness. Something about the state of tty magic is different in the two different contexts where you run the test. The things to look at are whether the tests are using a temporary pty and what 'ps j' says about all the processes involved in the build/test (including mock and its layers of whatever before the rpmbuild).
"RM" == Roland McGrath roland@redhat.com writes:
RM> That is job control weirdness. Something about the state of tty
RM> magic is different in the two different contexts where you run the
RM> test.
Unfortunately that level of magic is mostly beyond me.
RM> The things to look at are whether the tests are using a
RM> temporary pty
I do not really know how to determine that. However, it simplifies things to know that if there's any difference it's not in the spec but in the environment in which mock is called.
RM> and what 'ps j' says about all the processes involved
RM> in the build/test (including mock and its layers of whatever before
RM> the rpmbuild).
Here's the forest from ps axjf (sorry for long lines):
23843 10479 10479 10479 ?        -1 SNs      0   0:00  \_ sshd: tibbs [priv]
10479 10486 10479 10479 ?        -1 SN    7225   0:00  |   \_ sshd: tibbs@pts/3
10486 10487 10487 10487 pts/3 10527 SNs   7225   0:00  |       \_ -zsh
10487 10527 10527 10487 pts/3 10527 SN+   7225   0:00  |           \_ /bin/bash /home/tibbs/bin/dobuild /home/tibbs/work/extras-cvs/zsh/devel/zsh-4.3.10-4.fc13.src.rpm
10527 10530 10527 10487 pts/3 10527 SN+   7225   0:02  |               \_ /usr/bin/python -tt /usr/sbin/mock -r fedora-rawhide-x86_64 -v --rebuild /home/tibbs/work/extras-cvs/zsh/devel/zsh-4.3.10-4.fc13.src.rpm
10530 13230 13230 10487 pts/3 10527 TN    7225   0:00  |               |   \_ rpmbuild -bb --target x86_64 --nodeps builddir/build/SPECS/zsh.spec
13230 28410 13230 10487 pts/3 10527 TN    7225   0:00  |               |       \_ /bin/sh -e /var/tmp/rpm-tmp.5VZZeU
28410 28414 13230 10487 pts/3 10527 TN    7225   0:00  |               |           \_ make test
28414 28415 13230 10487 pts/3 10527 TN    7225   0:00  |               |               \_ /bin/sh -c cd Test ; make check
28415 28416 13230 10487 pts/3 10527 TN    7225   0:00  |               |                   \_ make check
28416 28505 13230 10487 pts/3 10527 TN    7225   0:00  |               |                       \_ /bin/sh -c if ZTST_testlist="`for f in ./*.ztst; ? do echo $f; done`" ? ZTST_srcdir="." ? ZTST_exe=../Src/zsh ? ../Src/zsh +Z -f ./runtests.zsh; then ? sta
28505 28507 13230 10487 pts/3 10527 TN    7225   0:00  |               |                           \_ ../Src/zsh +Z -f ./runtests.zsh
28507 29888 13230 10487 pts/3 10527 TN    7225   0:00  |               |                               \_ ../Src/zsh +Z -f ./ztst.zsh ./B04read.ztst
Is any of that remotely helpful? On the buildsys, a mock build (not for zsh, but they should all start the same) looks sort of like this:
1      3056  3055  3055 ?        -1 S        0 391:44 /usr/bin/python /usr/sbin/kojid --force-lock --verbose
3056  17091 17091  3055 ?        -1 S        0   0:07  \_ /usr/bin/python /usr/sbin/kojid --force-lock --verbose
17091 17272 17091  3055 ?        -1 S      101   0:01      \_ /usr/bin/python -tt /usr/sbin/mock -r koji/dist-f13-build-653289-93821 --no-clean --target x86_64 --rebuild /mnt/koji/work/tasks/8377/1828377/kvirc-4.0.0-0.19.rc1.fc13.src.rpm
17272 25386 25386  3055 ?        -1 S      101   0:00          \_ rpmbuild -bb --target x86_64 --nodeps builddir/build/SPECS/kvirc.spec
25386 25413 25386  3055 ?        -1 S      101   0:00              \_ /bin/sh -e /var/tmp/rpm-tmp.vzHree
25413 25917 25386  3055 ?        -1 S      101   0:00                  \_ make -j4
and so on. There are some obvious differences.
- J<
> Unfortunately that level of magic is mostly beyond me.
That's OK. We can translate for you.
> RM> The things to look at are whether the tests are using a
> RM> temporary pty
>
> I do not really know how to determine that.
The TTY column in ps is a start.
> However, it simplifies things to know that if there's any difference it's not in the spec but in the environment in which mock is called.
IMHO it is the fault of the package's test suite that it cares about the ambient environment where 'make check' is run. If it's going to be affected by the tty state of the caller of "make", then it should run those tests inside a temporary pty (i.e. use expect or script or something).
However, I also think that we should endeavor to make the multiple recommended ways of building Fedora rpms (i.e. "use koji" and "run mock yourself") run their builds in a consistent environment across all the recommended methods.
> RM> and what 'ps j' says about all the processes involved
> RM> in the build/test (including mock and its layers of whatever before
> RM> the rpmbuild).
>
> Here's the forest from ps axjf (sorry for long lines):
PPID    PID  PGID   SID TTY   TPGID STAT   UID   TIME COMMAND
23843 10479 10479 10479 ?        -1 SNs      0   0:00  \_ sshd: tibbs [priv]
10479 10486 10479 10479 ?        -1 SN    7225   0:00  |   \_ sshd: tibbs@pts/3
10486 10487 10487 10487 pts/3 10527 SNs   7225   0:00  |       \_ -zsh
10487 10527 10527 10487 pts/3 10527 SN+   7225   0:00  |           \_ /bin/bash /home/tibbs/bin/dobuild /home/tibbs/work/extras-cvs/zsh/devel/zsh-4.3.10-4.fc13.src.rpm
10527 10530 10527 10487 pts/3 10527 SN+   7225   0:02  |               \_ /usr/bin/python -tt /usr/sbin/mock -r fedora-rawhide-x86_64 -v --rebuild /home/tibbs/work/extras-cvs/zsh/devel/zsh-4.3.10-4.fc13.src.rpm
This shows mock running on your normal tty (TTY matches your command-line shell) in the foreground (PGID == TPGID means "foreground", and PGID != TPGID means "background"), as you would expect.
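To make the PGID/TPGID rule concrete, here is a minimal Python sketch that applies it to rows in the same column order as the ps output quoted in this thread. The function name job_state is made up for illustration, and it assumes that a TTY of "?" or a TPGID of -1 means "no controlling terminal".

```python
def job_state(ps_row):
    """Classify one `ps axj` row (PPID PID PGID SID TTY TPGID ...)."""
    fields = ps_row.split()
    pgid, tty, tpgid = int(fields[2]), fields[4], int(fields[5])
    if tty == "?" or tpgid == -1:
        return "no controlling tty"
    return "foreground" if pgid == tpgid else "background"


# Rows taken from the listings in this thread (command shortened).
print(job_state("10527 10530 10527 10487 pts/3 10527 SN+ 7225 0:02 mock"))
print(job_state("10530 13230 13230 10487 pts/3 10527 TN 7225 0:00 rpmbuild"))
print(job_state("17091 17272 17091 3055 ? -1 S 101 0:01 mock"))
```

The three rows come out as foreground (mock on pts/3), background (rpmbuild under mock), and no controlling tty (mock under kojid).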
10530 13230 13230 10487 pts/3 10527 TN    7225   0:00  |               |   \_ rpmbuild -bb --target x86_64 --nodeps builddir/build/SPECS/zsh.spec
This shows mock runs rpmbuild in the same session and in the background. This is the same as if you had done:
$ rpmbuild -bb --target x86_64 --nodeps builddir/build/SPECS/zsh.spec
^Z (before the test suite)
[1]+ Stopped rpmbuild -bb --target x86_64 --nodeps builddir/build/SPECS/zsh.spec
$ bg
[1]+ rpmbuild -bb --target x86_64 --nodeps builddir/build/SPECS/zsh.spec &
$
I think if you try it that way you will see the same problem. I suspect that if you run it in the foreground, you won't have any problem.
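A process can also ask this question about itself, which is roughly the check a test suite could make before touching tty settings. This is only a sketch in Python; tty_state is a hypothetical helper name, not something taken from zsh or mock.

```python
import os


def tty_state(fd=0):
    """Report how the calling process relates to the terminal on fd."""
    if not os.isatty(fd):
        return "not a tty"
    try:
        foreground_pgid = os.tcgetpgrp(fd)
    except OSError:
        # fd is a terminal, but not this process's controlling terminal.
        return "not the controlling tty"
    if foreground_pgid == os.getpgrp():
        return "foreground"
    # Changing tty settings from here (e.g. tcsetattr) would normally
    # get the process group stopped with SIGTTOU.
    return "background"


if __name__ == "__main__":
    print(tty_state(0))
```

Run from an interactive shell this should report "foreground"; started with a trailing "&" it should report "background", which is the situation rpmbuild is in under mock.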
PPID    PID  PGID   SID TTY   TPGID STAT   UID   TIME COMMAND
13230 28410 13230 10487 pts/3 10527 TN    7225   0:00  |               |       \_ /bin/sh -e /var/tmp/rpm-tmp.5VZZeU
28410 28414 13230 10487 pts/3 10527 TN    7225   0:00  |               |           \_ make test
28414 28415 13230 10487 pts/3 10527 TN    7225   0:00  |               |               \_ /bin/sh -c cd Test ; make check
28415 28416 13230 10487 pts/3 10527 TN    7225   0:00  |               |                   \_ make check
28416 28505 13230 10487 pts/3 10527 TN    7225   0:00  |               |                       \_ /bin/sh -c if ZTST_testlist="`for f in ./*.ztst; ? do echo $f; done`" ? ZTST_srcdir="." ? ZTST_exe=../Src/zsh ? ../Src/zsh +Z -f ./runtests.zsh; then ? sta
28505 28507 13230 10487 pts/3 10527 TN    7225   0:00  |               |                           \_ ../Src/zsh +Z -f ./runtests.zsh
28507 29888 13230 10487 pts/3 10527 TN    7225   0:00  |               |                               \_ ../Src/zsh +Z -f ./ztst.zsh ./B04read.ztst
This all shows that rpmbuild on down did no job control fiddling, so it's all in the rpmbuild "job" (PGID == PID of rpmbuild).
> Is any of that remotely helpful? On the buildsys, a mock build (not for zsh, but they should all start the same) looks sort of like this:
Yes, this is the information we needed to explain what you are seeing.
PPID    PID  PGID   SID TTY   TPGID STAT   UID   TIME COMMAND
1      3056  3055  3055 ?        -1 S        0 391:44 /usr/bin/python /usr/sbin/kojid --force-lock --verbose
3056  17091 17091  3055 ?        -1 S        0   0:07  \_ /usr/bin/python /usr/sbin/kojid --force-lock --verbose
17091 17272 17091  3055 ?        -1 S      101   0:01      \_ /usr/bin/python -tt /usr/sbin/mock -r koji/dist-f13-build-653289-93821 --no-clean --target x86_64 --rebuild /mnt/koji/work/tasks/8377/1828377/kvirc-4.0.0-0.19.rc1.fc13.src.rpm
17272 25386 25386  3055 ?        -1 S      101   0:00          \_ rpmbuild -bb --target x86_64 --nodeps builddir/build/SPECS/kvirc.spec
25386 25413 25386  3055 ?        -1 S      101   0:00              \_ /bin/sh -e /var/tmp/rpm-tmp.vzHree
25413 25917 25386  3055 ?        -1 S      101   0:00                  \_ make -j4
This shows that kojid runs mock in an orphan session (SID == PID of somebody dead) with no controlling terminal (TTY/TPGID shows ?/-1). Everything from mock on down behaves the same, i.e. mock puts rpmbuild in its own process group. But with no tty, this behaves differently than being in a background process group: there is no controlling terminal whose job control could stop the process with SIGTTOU.
Off hand I think that mock should run rpmbuild either in a temporary pty/session or with no tty at all. The no tty option requires mock to catch the SIGHUP when its own tty goes away and know to go kill the rpmbuild that won't get its own SIGHUP, or else being in the middle of a mock command when your session dies (ssh drop, modem drop, terminal window killed, etc.) will leave the build still running.
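As a rough illustration of the no-tty option, the following Python sketch starts a child in its own session (so it has no controlling terminal) and forwards SIGHUP to the child's process group by hand. It is not mock's actual code; run_detached and the handler are assumptions made for this example.

```python
import os
import signal
import subprocess


def run_detached(argv):
    """Run argv in a new session; kill its process group if we get SIGHUP."""
    # start_new_session=True calls setsid() in the child, so the child
    # becomes a session leader with no controlling terminal.
    child = subprocess.Popen(argv, stdin=subprocess.DEVNULL,
                             start_new_session=True)

    def on_hangup(signum, frame):
        # The child will not get SIGHUP from our tty, so pass it on.
        # After setsid() the child's PGID equals its PID.
        try:
            os.killpg(child.pid, signal.SIGHUP)
        except ProcessLookupError:
            pass  # the child is already gone

    previous = signal.signal(signal.SIGHUP, on_hangup)
    try:
        return child.wait()
    finally:
        signal.signal(signal.SIGHUP, previous)
```

The extra handler is exactly the bookkeeping described above: without it, a dropped ssh session would leave the build running.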
The temporary pty option seems better to me off hand. Then the whole rpmbuild would be in the foreground on that pty, so the bonehead packages like zsh won't be upset.
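For comparison, here is a hedged sketch of the temporary-pty option using Python's standard pty module. pty.fork() puts the child in a new session with the pty slave as its controlling terminal, so whatever runs there is in the foreground of that pty. run_in_pty is an illustrative name, and this is not how mock is actually implemented.

```python
import os
import pty


def run_in_pty(argv):
    """Run argv on a fresh pty; return (exit status, captured output bytes)."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: new session, with the pty slave as stdin/stdout/stderr
        # and as the controlling terminal.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    chunks = []
    while True:
        try:
            data = os.read(master_fd, 4096)
        except OSError:
            # On Linux, reading the master fails with EIO once the
            # slave side has been closed.
            break
        if not data:
            break
        chunks.append(data)
    os.close(master_fd)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status), b"".join(chunks)
    return -os.WTERMSIG(status), b"".join(chunks)
```

A caller would use it as, say, run_in_pty(["rpmbuild", "-bb", "zsh.spec"]) and write the captured bytes to the build log. The child sees a foreground tty no matter how the parent itself was started.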
Thanks, Roland
buildsys@lists.fedoraproject.org