Francesco, Yaniv,
Have you found the time to dig into the coredump of these Segmentation faults?
(It is unrelated to Liron's new patches)
On Fri, May 30, 2014 at 01:14:03PM +0100, Jenkins ci oVirt Server wrote:
Project: http://jenkins.ovirt.org/job/vdsm_master_unit_tests/
Build: http://jenkins.ovirt.org/job/vdsm_master_unit_tests/3043/
Build Number: 3043
Build Status: Failure
Triggered By: Started by an SCM change
Changes Since Last Success:
Changes for Build #3043
[Federico Simoncelli] hsm: unify vm ovf management lock
[Liron Aravot] core: imageSharing - export logic to functions
[Liron Aravot] core: removal of unneeded callback passing
[Liron Aravot] core: generify "streamDownloadImage" related methods
[Liron Aravot] core: BindingXMLRPC - exporting logic out from do_PUT.
[Liron Aravot] core: introducing uploadImageToStream
Failed Tests:
1 tests failed.
FAILED: nosetests.xml.<init>
Error Message:
Stack Trace:
Test report file /home/jenkins/workspace/vdsm_master_unit_tests/tests/nosetests.xml was length 0
----- Original Message -----
From: "Dan Kenigsberg" danken@redhat.com
To: fromani@redhat.com, smizrahi@redhat.com, ybronhei@redhat.com
Cc: eedri@redhat.com, abaron@redhat.com, vdsm-patches@lists.fedorahosted.org, dcaro@redhat.com, laravot@redhat.com, fsimonce@redhat.com
Sent: Friday, May 30, 2014 3:00:26 PM
Subject: Re: [oVirt Jenkins] vdsm_master_unit_tests - Build # 3043 - Failure!
Not much progress since the last report. I'm very annoyed by this issue, but I'm having a hard time wrapping my head around it.
Let me summarize what I currently know:
* the segfault should be reproducible on any box running nose >= 1.3.0, just using
  $ cd vdsm
  $ ./configure && make
  $ NOSE_WITH_XUNIT=1 make check
  or at least I can reproduce the issue on all the boxes I tried locally (vanilla F20, F19)
* if we run each test module separately, we do NOT observe the failure.
  This triggers the segfault:
  $ cd tests
  $ ./run_tests_local.sh ./*.py
  This does not:
  $ cd tests
  $ for TEST in `ls ./*.py`; do ./run_tests_local.sh $TEST; done
* the stack traces I observed are huge, more than 750 levels deep. This suggests the stack was exhausted, which in turn was probably triggered by some kind of recursion gone wild. Note that the offending stack trace is on just one thread; all the others are quiet.
* I tried to reproduce the issue with a simpler use case with no luck so far.
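Since the whole set of test modules fails but each module alone does not, one way to get closer to a simpler reproducer is to shrink the failing set mechanically. What follows is only a rough sketch, not something that has been run against vdsm: `shrink_failing_set` and `segfaults` are hypothetical helper names, and the assumption that a segfault shows up either as a negative return code or as exit status 128 + SIGSEGV from the wrapper script is mine.

```python
import signal
import subprocess


def segfaults(modules, runner="./run_tests_local.sh"):
    """Run the given test modules together and report whether the run segfaulted.

    Assumption: the runner either dies from SIGSEGV itself (negative return
    code) or is a shell wrapper that exits with 128 + SIGSEGV.
    """
    proc = subprocess.run([runner] + list(modules))
    return proc.returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV)


def shrink_failing_set(modules, fails):
    """Greedily drop modules one at a time while the remaining set still fails.

    `fails` is any callable taking a list of modules and returning True when
    running them together reproduces the failure.
    """
    current = list(modules)
    if not fails(current):
        raise ValueError("the full set does not reproduce the failure")
    changed = True
    while changed:
        changed = False
        for module in list(current):
            trial = [m for m in current if m != module]
            if trial and fails(trial):
                current = trial
                changed = True
    return current
```

From the tests directory this could be driven with something like `shrink_failing_set(sorted(glob.glob("./*.py")), segfaults)`. The greedy loop needs on the order of N runs per pass over N modules, so it is slow, but it needs no knowledge of which modules interact.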
At the moment I don't have a better suggestion than to bite the bullet and dig into the huge stack trace, looking for repetitive patterns or some other hint.
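The search for repetitive patterns does not have to be done by eye. Below is a minimal sketch, assuming the backtrace has been saved as text in the usual gdb `bt` layout (`#N  0xADDR in function (...)` or `#N  function (...)`); `frame_names` and `find_cycle` are hypothetical helpers written for illustration, not part of any existing tool.

```python
import re

# Assumed gdb frame layout: "#12  0x00007f... in func_name (args) at file.c:123"
# or "#12  func_name (args) at file.c:123".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def frame_names(backtrace_text):
    """Extract the function name of every frame line, innermost frame first."""
    names = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def find_cycle(names, max_period=50):
    """Find the longest stretch of frames that repeats with a fixed period.

    Returns (period, start_index, repeats), or None if no sequence of frames
    repeats at least twice in a row. Ties go to the shorter period.
    """
    best = None
    n = len(names)
    for period in range(1, min(max_period, n // 2) + 1):
        run = 0
        for i in range(n - period):
            if names[i] == names[i + period]:
                run += 1
                if run >= period:  # at least two full repetitions
                    candidate = (run + period, -period, i - run + 1)
                    if best is None or candidate > best:
                        best = candidate
            else:
                run = 0
    if best is None:
        return None
    length, neg_period, start = best
    return (-neg_period, start, length // -neg_period)
```

If a cycle is found, `names[start:start + period]` is the group of functions calling each other over and over, which is a much smaller thing to stare at than 750 frames.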
Suggestions welcome!
----- Original Message -----
From: "Francesco Romani" fromani@redhat.com
To: "Dan Kenigsberg" danken@redhat.com
Cc: smizrahi@redhat.com, ybronhei@redhat.com, eedri@redhat.com, abaron@redhat.com, vdsm-patches@lists.fedorahosted.org, dcaro@redhat.com, laravot@redhat.com, fsimonce@redhat.com
Sent: Friday, May 30, 2014 4:15:17 PM
Subject: Re: [oVirt Jenkins] vdsm_master_unit_tests - Build # 3043 - Failure!
Not much progress since the last report. I'm very annoyed by this issue, but I'm having a hard time wrapping my head around it.
Let me summarize what I currently know:
- the segfault should be reproducible on any box running nose >= 1.3.0, just using
  $ cd vdsm
  $ ./configure && make
  $ NOSE_WITH_XUNIT=1 make check
  or at least I can reproduce the issue on all the boxes I tried locally (vanilla F20, F19)
- if we run each test module separately, we do NOT observe the failure.
  This triggers the segfault:
  $ cd tests
  $ ./run_tests_local.sh ./*.py
  This does not:
  $ cd tests
  $ for TEST in `ls ./*.py`; do ./run_tests_local.sh $TEST; done
- the stack traces I observed are huge, more than 750 levels deep.
  This suggests the stack was exhausted, which in turn was probably triggered by some kind of recursion gone wild.
I think I have that DVD
Note that the offending stack trace is on just one thread; all the others are quiet.
- I tried to reproduce the issue with a simpler use case with no luck so far.
At the moment I don't have a better suggestion than to bite the bullet and dig into the huge stack trace, looking for repetitive patterns or some other hint.
Can't seem to reproduce with 1.3.0 and f20. Where can I get the core dump?
Suggestions welcome!
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
----- Original Message -----
From: "Saggi Mizrahi" smizrahi@redhat.com
To: "Francesco Romani" fromani@redhat.com
Cc: "Dan Kenigsberg" danken@redhat.com, ybronhei@redhat.com, eedri@redhat.com, abaron@redhat.com, vdsm-patches@lists.fedorahosted.org, dcaro@redhat.com, laravot@redhat.com, fsimonce@redhat.com
Sent: Thursday, June 5, 2014 11:24:27 AM
Subject: Re: [oVirt Jenkins] vdsm_master_unit_tests - Build # 3043 - Failure!
At the moment I don't have a better suggestion than to bite the bullet and dig into the huge stack trace, looking for repetitive patterns or some other hint.
Can't seem to reproduce with 1.3.0 and f20. Where can I get the core dump?
Strange :\
However, here is another core from my main dev box, which is an F20 with a few updates (mostly from virt-preview and a few other places; the full list, if relevant, is provided as pkgs.txt.gz in the folder below):
https://drive.google.com/folderview?id=0B9ZpeH8QzH5rY1NEZUpwZUw2bzQ&usp=...
core.20626.1000.gz is the fresh new core, core.20626.1000.md5 is its checksum.
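To check that the download is intact, something along these lines should do. It is only a sketch under two assumptions: that the .md5 file is in the usual md5sum layout (hex digest first, then the file name), and that the digest covers the compressed .gz file rather than the uncompressed core. The function names are made up for the example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it all in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(data_path, md5_path):
    """Compare a file's MD5 with the first field of an md5sum-style file."""
    with open(md5_path) as fh:
        expected = fh.read().split()[0].lower()
    return md5_of_file(data_path) == expected
```

For example, `matches_checksum_file("core.20626.1000.gz", "core.20626.1000.md5")` should return True for a good download if both assumptions hold.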
Let me know if I can provide further assistance.