bz2007417-glibc-ldd-segfaults-when-inspecting failure
by Florian Weimer
Sergey,
we currently have this failure in Fedora rawhide gating:
:: [ 00:37:41 ] :: [ LOG ] :: Output of 'ldd /usr/lib/modules/5.17.0-0.rc2.20220204gitdcb85f85fa6f.86.fc36.x86_64/vdso/vdso64.so
/usr/lib/modules/5.17.0-0.rc3.89.fc36.x86_64/vdso/vdso64.so':
:: [ 00:37:41 ] :: [ LOG ] :: --------------- OUTPUT START ---------------
:: [ 00:37:41 ] :: [ LOG ] :: statically linked
:: [ 00:37:41 ] :: [ LOG ] :: /usr/share/beakerlib/testing.sh: line 879: 6058 Segmentation fault (core dumped) /usr/lib/modules/5.17.0-0.rc3.89.fc36.x86_64/vdso/vdso64.so
:: [ 00:37:41 ] :: [ LOG ] :: --------------- OUTPUT END ---------------
:: [ 00:37:41 ] :: [ FAIL ] :: Command 'ldd /usr/lib/modules/5.17.0-0.rc2.20220204gitdcb85f85fa6f.86.fc36.x86_64/vdso/vdso64.so
/usr/lib/modules/5.17.0-0.rc3.89.fc36.x86_64/vdso/vdso64.so' (Expected 0, got 139)
:: [ 00:37:41 ] :: [ BEGIN ] :: Running 'dmesg'
[ 331.036464] show_signal_msg: 51 callbacks suppressed
[ 331.036468] vdso64.so[6058]: segfault at 0 ip 00007f86dba14047 sp 00007ffde1b49530 error 6 in vdso64.so[7f86dba14000+1000]
[ 331.048415] Code: 00 00 00 40 00 00 00 00 00 00 00 28 0e 00 00 00 00 00 00 00 00 00 00 40 00 38 00 04 00 40 00 11 00 10 00 01 00 00 00 05 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<https://osci-jenkins-1.ci.fedoraproject.org/job/fedora-ci/job/dist-git-pi...>
I stared at this for a long time, but I think I know now what is going
on. This line
rlRun -l "ldd `find /usr -name vdso64.so`"
looks innocuous enough. But the test machine seems to have *two* kernels
installed, and the find command outputs something like:
/usr/lib/modules/5.17.0-0.rc2.20220204gitdcb85f85fa6f.86.fc36.x86_64/vdso/vdso64.so
/usr/lib/modules/5.17.0-0.rc3.89.fc36.x86_64/vdso/vdso64.so
So two lines of output. Used unquoted on a shell command line, this would
work because the newline is just treated as an argument separator. But
here, the output is pasted into the "" string before invoking rlRun, so
the actual command looks like this:
rlRun -l "ldd /usr/lib/modules/5.17.0-0.rc2.20220204gitdcb85f85fa6f.86.fc36.x86_64/vdso/vdso64.so
/usr/lib/modules/5.17.0-0.rc3.89.fc36.x86_64/vdso/vdso64.so"
And it seems that due to the way rlRun is implemented, these are
actually two different shell commands:
ldd /usr/lib/modules/5.17.0-0.rc2.20220204gitdcb85f85fa6f.86.fc36.x86_64/vdso/vdso64.so
/usr/lib/modules/5.17.0-0.rc3.89.fc36.x86_64/vdso/vdso64.so
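The difference between the two cases can be reproduced without beakerlib. This is only a sketch: run_cmd is a hypothetical stand-in for rlRun, and it assumes that rlRun ends up passing its command string to eval. The second line of simulated find output is a harmless echo, so nothing crashes here.

```shell
# Hypothetical stand-in for rlRun. Assumption: the command string is
# handed to eval, so an embedded newline acts as a command separator.
run_cmd() {
    eval "$1"
}

# Two lines of output, like find on a machine with two kernels installed.
# The second line is a harmless echo so that the effect is visible.
lines=$(printf '%s\n' 'first' 'echo second-line-ran-as-a-command')

# Unquoted: word splitting turns the newline into an argument separator,
# so a single echo receives every word.
unquoted=$(echo $lines)

# Inside double quotes the newline survives, and eval runs two commands.
quoted=$(run_cmd "echo $lines")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

In the unquoted case everything stays on one echo command line. In the quoted case the second line is executed as a command of its own, which is what happens to the second vdso64.so path above.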
And the second command crashes because Linux does not refuse to run
shared objects. This is a different bug from the ldd bug, and fixing it
requires binutils changes (Nick's fix should be in the upcoming 2.38
version), and a kernel change (which currently does not exist):
set e_entry to 0 for DSOs that don't have _start.
<https://bugzilla.redhat.com/show_bug.cgi?id=2004952>
Prevent executed .so files with e_entry == 0 from attempting to become
a process.
<https://bugzilla.redhat.com/show_bug.cgi?id=2004942>
Sergey, would you please look at fixing the test? I'm not sure about
the best way to do it. Maybe:
for vdso in `find /usr -name vdso64.so` ; do
    rlRun -l "ldd $vdso"
done
?
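A variant of the same idea that reads the find output one line at a time could look like the sketch below. The names check_each, root and tool are made up for illustration, and the rlRun defined here is only a stub (assuming the real one evaluates its command string) so that the snippet runs outside beakerlib; in the actual test the stub would be dropped.

```shell
# Stub so that the sketch runs outside beakerlib. Assumption: like the
# real rlRun, it evaluates the command string it is given.
rlRun() {
    if [ "$1" = "-l" ]; then
        shift
    fi
    eval "$1"
}

# Hypothetical helper: run "tool" once per vdso64.so found below "root".
check_each() {
    root=$1
    tool=$2
    find "$root" -name vdso64.so | while IFS= read -r vdso; do
        # One path per rlRun call, so no newline can reach the command string.
        # The single quotes break if a path itself contains a single quote.
        rlRun -l "$tool '$vdso'"
    done
}
```

In the test this would be called as: check_each /usr ldd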
Thanks,
Florian