Proposal: AutoQA Infrastructure Upgrade
by Tim Flink
After looking at the job stats on autoqa-stg over the last day or so
since I changed all of the clients over to fc18, I'm happy enough with
the completion rates to suggest that we upgrade production autoqa and
get rid of our fc17 clients.
Proposed setup:
2x virthosts with 4x VMs each, 4096M RAM and 20G disk per VM:
- 6x f18 x86_64
- 2x f18 i386
This is the setup I've been using in stg for a while and it seems to be
able to keep up with incoming jobs just fine. If it turns out not to be
enough, we can assign another virthost to production. With the new
ansible playbooks, it's trivial to reconfigure/add/remove clients and
the full process from virthost install/config to client install/config
would take well under an hour.
I'd like to do this on either Tuesday or Wednesday when we're removing
the fc17 repos from autoqa production. At that time, I'll upgrade
autoqa01 with the test fixes from the last several days, remove the
fc17 repos and disable the conflicts test as a non-hotfix.
Are there any objections to this plan, or are there any additional
details you'd like before we make a decision?
Tim
[AutoQA] #439: depcheck crashes almost every time on fc19
by fedora-badges
#439: depcheck crashes almost every time on fc19
---------------------+----------------------
Reporter: tflink | Owner:
Type: defect | Status: new
Priority: major | Milestone: Depcheck
Component: tests | Keywords:
Blocked By: | Blocking:
---------------------+----------------------
While upgrading clients to fc18 and fc19 in staging, I noticed that there
were a lot of depcheck crashes on the fc19 clients (120/128 runs crashed).
Example job:
http://autoqa-stg.fedoraproject.org/results/238990-autotest/
Traceback:
{{{
Traceback (most recent call last):
  File "./depcheck", line 112, in <module>
    profile=opts.profile)
  File "/usr/share/autotest/tests/depcheck/depcheck_lib.py", line 394, in depcheck_main
    test_dir=temp_dir, accepted_dir=acc_dir)
  File "/usr/share/autotest/tests/depcheck/depcheck_lib.py", line 338, in do_depcheck
    do_mash(test_dir, mash_arches)
  File "/usr/share/autotest/tests/depcheck/depcheck_lib.py", line 157, in do_mash
    rc = themash.doMultilib()
  File "/usr/lib/python2.7/site-packages/mash/__init__.py", line 592, in doMultilib
    pid = self.doDepSolveAndMultilib(arch, repocache)
  File "/usr/lib/python2.7/site-packages/mash/__init__.py", line 465, in doDepSolveAndMultilib
    self.config.rpm_path % {'arch':arch})
  File "/usr/lib64/python2.7/posixpath.py", line 75, in join
    if b.startswith('/'):
AttributeError: 'NoneType' object has no attribute 'startswith'

Traceback (most recent call last):
  File "./depcheck", line 112, in <module>
    profile=opts.profile)
  File "/usr/share/autotest/tests/depcheck/depcheck_lib.py", line 394, in depcheck_main
    test_dir=temp_dir, accepted_dir=acc_dir)
  File "/usr/share/autotest/tests/depcheck/depcheck_lib.py", line 341, in do_depcheck
    y.pkgSack # populates all package sacks
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 1050, in <lambda>
    pkgSack = property(fget=lambda self: self._getSacks(),
  File "/usr/lib/python2.7/site-packages/yum/__init__.py", line 770, in _getSacks
    self.repos.populateSack(which=repos)
  File "/usr/lib/python2.7/site-packages/yum/repos.py", line 387, in populateSack
    sack.populate(repo, mdtype, callback, cacheonly)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 224, in populate
    if self._check_db_version(repo, mydbtype):
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 316, in _check_db_version
    return repo._check_db_version(mdtype)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1483, in _check_db_version
    repoXML = self.repoXML
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1669, in <lambda>
    repoXML = property(fget=lambda self: self._getRepoXML(),
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1665, in _getRepoXML
    self._loadRepoXML(text=self.ui_id)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1656, in _loadRepoXML
    return self._groupLoadRepoXML(text, self._mdpolicy2mdtypes())
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1630, in _groupLoadRepoXML
    if self._commonLoadRepoXML(text):
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1455, in _commonLoadRepoXML
    result = self._getFileRepoXML(local, text)
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1233, in _getFileRepoXML
    size=102400) # setting max size as 100K
  File "/usr/lib/python2.7/site-packages/yum/yumRepo.py", line 1029, in _getFile
    raise Errors.NoMoreMirrorsRepoError(errstr, errors)
yum.Errors.NoMoreMirrorsRepoError: failure: repodata/repomd.xml from pending: [Errno 256] No more mirrors to try.
file:///tmp/depcheck.oEfOtO/repodata/repomd.xml: [Errno 14] curl#37 - "Couldn't open file /tmp/depcheck.oEfOtO/repodata/repomd.xml"
}}}
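The AttributeError at the end of the first traceback is what os.path.join()
produces on Python 2.7 when one of the path components it is handed is None,
which suggests some path option in the mash config is unset for that arch.
A minimal sketch of just that failure mode (the depcheck/mash specifics are
left out; the variable below is purely illustrative):
{{{
# Sketch: reproduce the AttributeError pattern from the first traceback.
# On Python 2.7, posixpath.join() calls b.startswith('/') on each
# component, so a None component fails exactly this way (Python 3 raises
# TypeError instead).
import os

missing_config_path = None  # stand-in for an unset mash config path (assumption)

try:
    os.path.join('/srv/repo', missing_config_path)
except (AttributeError, TypeError) as err:
    print(err)  # "'NoneType' object has no attribute 'startswith'" on 2.7
}}}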
--
Ticket URL: <https://fedorahosted.org/autoqa/ticket/439>
AutoQA <http://autoqa.fedorahosted.org>
Automated QA project
[AutoQA] #438: repoclosure test is failing due to change in behavior upstream
by fedora-badges
#438: repoclosure test is failing due to change in behavior upstream
---------------------+-------------------------------------------------
Reporter: tflink | Owner:
Type: defect | Status: new
Priority: major | Milestone: Package Update Acceptance Test Plan
Component: tests | Keywords:
Blocked By: | Blocking:
---------------------+-------------------------------------------------
Starting with F19, I'm seeing crashes in the repoclosure test on failure:
ref: http://autoqa-stg.fedoraproject.org/results/238806-autotest/virt27.qa/debug/client.0.DEBUG
{{{
07/16 15:16:41 ERROR| test:0572| Traceback (most recent call last):
07/16 15:16:41 ERROR| test:0572|   File "/usr/lib/python2.7/site-packages/autoqa/decorators.py", line 72, in newf
07/16 15:16:41 ERROR| test:0572|     f_result = f(*args, **kwargs) #call the decorated function
07/16 15:16:41 ERROR| test:0572|   File "/usr/share/autotest/tests/repoclosure/repoclosure.py", line 52, in run_once
07/16 15:16:41 ERROR| test:0572|     out = utils.system_output(cmd, retain_output=True)
07/16 15:16:41 ERROR| test:0572|   File "/usr/share/autotest/common_lib/base_utils.py", line 931, in system_output
07/16 15:16:41 ERROR| test:0572|     args=args).stdout
07/16 15:16:41 ERROR| test:0572|   File "/usr/share/autotest/common_lib/base_utils.py", line 658, in run
07/16 15:16:41 ERROR| test:0572|     "Command returned non-zero exit status")
07/16 15:16:41 ERROR| test:0572| CmdError: Command <repoclosure --tempcache --newest --repofrompath=target,http://infrastructure.fedoraproject.org/pub/fedora/linux/updates/testing/18/x86_64 --repoid=target --repofrompath=parent-1,http://infrastructure.fedoraproject.org/pub/fedora/linux/updates/18/x86_64 --repoid=parent-1 --repofrompath=parent-2,http://infrastructure.fedoraproject.org/pub/fedora/linux/releases/18/Everything/x86_64/os --repoid=parent-2> failed, rc=1, Command returned non-zero exit status
07/16 15:16:41 ERROR| test:0572| * Command:
07/16 15:16:41 ERROR| test:0572|     repoclosure --tempcache --newest --repofrompath=target,http://infrastructure.fedoraproject.org/pub/fedora/linux/updates/testing/18/x86_64 --repoid=target --repofrompath=parent-1,http://infrastructure.fedoraproject.org/pub/fedora/linux/updates/18/x86_64 --repoid=parent-1 --repofrompath=parent-2,http://infrastructure.fedoraproject.org/pub/fedora/linux/releases/18/Everything/x86_64/os --repoid=parent-2
07/16 15:16:41 ERROR| test:0572| Exit status: 1
07/16 15:16:41 ERROR| test:0572| Duration: 195.717083931
}}}
This is causing the F19 repoclosure test to crash instead of fail, so no
results are reported.
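The crash comes from utils.system_output() raising CmdError as soon as
repoclosure exits non-zero; with the newer repoclosure, a non-zero exit simply
means unresolved dependencies were found, so the test needs to capture the
output and report a normal test failure instead of dying. A rough sketch of
that pattern using plain subprocess (this is not the autotest utils API, and
the function below is illustrative rather than the actual fix to
repoclosure.py):
{{{
# Sketch only: run repoclosure and treat a non-zero exit as "test failed"
# rather than "test crashed". Names here are illustrative.
import subprocess

def run_repoclosure(cmd):
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    if proc.returncode != 0:
        # Newer repoclosure exits non-zero when it finds broken deps;
        # keep the output and mark the run FAILED instead of raising.
        return ('FAILED', out)
    return ('PASSED', out)
}}}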
--
Ticket URL: <https://fedorahosted.org/autoqa/ticket/438>
AutoQA <http://autoqa.fedorahosted.org>
Automated QA project
[Fedora QA] #383: Migrate from FAS to FAS-OpenID
by fedora-badges
#383: Migrate from FAS to FAS-OpenID
--------------------------------------+------------------------
Reporter: tflink | Owner: tflink
Type: enhancement | Status: new
Priority: major | Milestone: Fedora 20
Component: Blocker bug tracker page | Version:
Keywords: | Blocked By:
Blocking: |
--------------------------------------+------------------------
= problem =
With the introduction of FAS-OpenID, the current method of FAS
authentication will be going away sometime around F20. We will need to
migrate to the newer FAS-OpenID before then.
= analysis =
The OpenID code in python-fedora-flask should make the transition
relatively painless but it will still require quite a bit of testing to
make sure that everything still works.
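For reference, the switch should mostly amount to swapping the Flask
extension. A rough sketch of what the OpenID-based setup might look like,
assuming python-fedora's flask_fas_openid module exposes a FAS extension, a
fas_login_required decorator, and a FAS_OPENID_ENDPOINT config key analogous
to the older flask_fas pieces (all of these names are assumptions to verify
against python-fedora, not confirmed API):
{{{
# Rough sketch only -- module, class, decorator, and config-key names are
# assumptions based on python-fedora's flask_fas_openid extension and need
# to be verified before use.
from flask import Flask, g
from flask_fas_openid import FAS, fas_login_required  # assumed API

app = Flask(__name__)
app.config['FAS_OPENID_ENDPOINT'] = 'https://id.fedoraproject.org/'  # assumed key
fas = FAS(app)

@app.route('/admin')
@fas_login_required
def admin():
    # g.fas_user is expected to carry the authenticated FAS user, as with
    # the older flask_fas extension (assumption).
    return 'hello, %s' % g.fas_user.username

if __name__ == '__main__':
    app.run()
}}}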
--
Ticket URL: <https://fedorahosted.org/fedora-qa/ticket/383>
Fedora QA <http://fedorahosted.org/fedora-qa>
Fedora Quality Assurance
Re: [AutoQA] #433: conflicts keep crashing
by fedora-badges
#433: conflicts keep crashing
--------------------+-------------------------
Reporter: kparal | Owner:
Type: defect | Status: new
Priority: major | Milestone: Hot issues
Component: tests | Resolution:
Keywords: | Blocked By:
Blocking: |
--------------------+-------------------------
Comment (by tflink):
I happened to be on the console of an fc18 box when conflicts was failing
on it and found an out-of-memory error. That particular client on stg has
4096M of memory.
{{{
[89423.693552] potential_confl invoked oom-killer: gfp_mask=0x201da,
order=0, oom_score_adj=0
[89423.697457] potential_confl cpuset=/ mems_allowed=0
[89423.698872] Pid: 5297, comm: potential_confl Not tainted
3.9.11-200.fc18.x86_64 #1
[89423.700951] Call Trace:
[89423.701564] [<ffffffff810d39d6>] ?
cpuset_print_task_mems_allowed+0x96/0xc0
[89423.702701] [<ffffffff81656ac0>] dump_header+0x7a/0x1be
[89423.703553] [<ffffffff810f86ce>] ? __delayacct_freepages_end+0x2e/0x30
[89423.704629] [<ffffffff81304533>] ? ___ratelimit+0xa3/0x120
[89423.705524] [<ffffffff811349c7>] oom_kill_process+0x1c7/0x310
[89423.706473] [<ffffffff8106b375>] ? has_ns_capability_noaudit+0x15/0x20
[89423.707565] [<ffffffff81135199>] out_of_memory+0x439/0x480
[89423.708463] [<ffffffff8113b112>] __alloc_pages_nodemask+0xac2/0xae0
[89423.709488] [<ffffffff811798e8>] alloc_pages_current+0xb8/0x190
[89423.710449] [<ffffffff81131167>] __page_cache_alloc+0xd7/0x100
[89423.711401] [<ffffffff8113161c>] ? find_get_page+0x3c/0x110
[89423.712312] [<ffffffff811336b2>] filemap_fault+0x2b2/0x4a0
[89423.713204] [<ffffffff81158ff1>] __do_fault+0x71/0x550
[89423.714042] [<ffffffff8115c3e5>] handle_pte_fault+0x95/0xa90
[89423.714952] [<ffffffff811b66c1>] ? touch_atime+0x71/0x140
[89423.715836] [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0
[89423.716732] [<ffffffff812fe901>] ? cpumask_any_but+0x31/0x40
[89423.717654] [<ffffffff8115dc31>] handle_mm_fault+0x291/0x650
[89423.718579] [<ffffffff81162283>] ? vma_adjust+0x343/0x770
[89423.719468] [<ffffffff81663591>] __do_page_fault+0x181/0x4f0
[89423.720396] [<ffffffff81165cc7>] ? mprotect_fixup+0x157/0x280
[89423.721337] [<ffffffff8166390e>] do_page_fault+0xe/0x10
[89423.722190] [<ffffffff81663085>] do_async_page_fault+0x35/0x90
[89423.723137] [<ffffffff8165ff08>] async_page_fault+0x28/0x30
[89423.724039] Mem-Info:
[89423.724413] Node 0 DMA per-cpu:
[89423.724926] CPU 0: hi: 0, btch: 1 usd: 0
[89423.725697] CPU 1: hi: 0, btch: 1 usd: 0
[89423.726464] Node 0 DMA32 per-cpu:
[89423.727019] CPU 0: hi: 186, btch: 31 usd: 0
[89423.727785] CPU 1: hi: 186, btch: 31 usd: 30
[89423.728550] Node 0 Normal per-cpu:
[89423.729123] CPU 0: hi: 186, btch: 31 usd: 0
[89423.729876] CPU 1: hi: 186, btch: 31 usd: 0
[89423.730652] active_anon:776790 inactive_anon:192701 isolated_anon:32
[89423.730652] active_file:26 inactive_file:4 isolated_file:0
[89423.730652] unevictable:0 dirty:5 writeback:7 unstable:0
[89423.730652] free:21224 slab_reclaimable:2391 slab_unreclaimable:6820
[89423.730652] mapped:32 shmem:1 pagetables:5149 bounce:0
[89423.730652] free_cma:0
[89423.735521] Node 0 DMA free:15908kB min:264kB low:328kB high:396kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB
managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB
slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB
pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? yes
[89423.741642] lowmem_reserve[]: 0 3472 3919 3919
[89423.742446] Node 0 DMA32 free:61268kB min:59636kB low:74544kB
high:89452kB active_anon:2920720kB inactive_anon:584264kB active_file:80kB
inactive_file:16kB unevictable:0kB isolated(anon):128kB isolated(file):0kB
present:3653624kB managed:3555940kB mlocked:0kB dirty:40kB writeback:16kB
mapped:128kB shmem:4kB slab_reclaimable:696kB slab_unreclaimable:752kB
kernel_stack:24kB pagetables:7984kB unstable:0kB bounce:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:10254 all_unreclaimable? yes
[89423.749120] lowmem_reserve[]: 0 0 447 447
[89423.749837] Node 0 Normal free:7596kB min:7676kB low:9592kB
high:11512kB active_anon:186440kB inactive_anon:186540kB active_file:24kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:524288kB managed:457876kB mlocked:0kB dirty:0kB writeback:12kB
mapped:0kB shmem:0kB slab_reclaimable:8868kB slab_unreclaimable:26528kB
kernel_stack:688kB pagetables:12612kB unstable:0kB bounce:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:1060 all_unreclaimable? yes
[89423.756661] lowmem_reserve[]: 0 0 0 0
[89423.757356] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U)
1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) =
15908kB
[89423.759810] Node 0 DMA32: 131*4kB (UEM) 57*8kB (UE) 32*16kB (UEM)
17*32kB (UE) 10*64kB (UE) 4*128kB (UEM) 3*256kB (UE) 2*512kB (UE) 3*1024kB
(UEM) 2*2048kB (M) 12*4096kB (MR) = 61300kB
[89423.762832] Node 0 Normal: 189*4kB (UEM) 106*8kB (UE) 51*16kB (UEM)
22*32kB (UEM) 6*64kB (UE) 10*128kB (EM) 3*256kB (UM) 0*512kB 0*1024kB
1*2048kB (R) 0*4096kB = 7604kB
[89423.765626] 1056 total pagecache pages
[89423.766234] 1032 pages in swap cache
[89423.766806] Swap cache stats: add 10330251, delete 10329219, find
138864/168006
[89423.767972] Free swap = 0kB
[89423.768452] Total swap = 4063228kB
[89423.778866] 1048575 pages RAM
[89423.779414] 36075 pages reserved
[89423.779964] 524034 pages shared
[89423.780498] 990839 pages non-shared
[89423.781064] [ pid ] uid tgid total_vm rss nr_ptes swapents
oom_score_adj name
[89423.782314] [ 206] 0 206 10570 1 23 176
-1000 systemd-udevd
[89423.783695] [ 210] 0 210 9333 29 23 54
0 systemd-journal
[89423.785102] [ 265] 0 265 29121 13 26 98
-1000 auditd
[89423.786359] [ 276] 0 276 62058 0 78 2834
0 firewalld
[89423.787666] [ 277] 0 277 8276 1 20 122
0 systemd-logind
[89423.789040] [ 279] 0 279 64185 23 27 152
0 rsyslogd
[89423.790319] [ 280] 81 280 8137 68 16 50
-900 dbus-daemon
[89423.791653] [ 282] 0 282 30155 22 16 129
0 crond
[89423.792902] [ 293] 0 293 27180 1 10 32
0 agetty
[89423.794168] [ 295] 0 295 86661 104 53 203
0 NetworkManager
[89423.795532] [ 301] 999 301 93406 0 46 1037
0 polkitd
[89423.796808] [ 607] 0 607 25981 0 50 3112
0 dhclient
[89423.798106] [ 632] 0 632 20601 36 44 164
-1000 sshd
[89423.799335] [ 646] 0 646 22534 41 44 396
0 sendmail
[89423.800619] [ 666] 51 666 21395 1 40 377
0 sendmail
[89423.801908] [ 5219] 0 5219 34292 1 25 848
0 autotestd
[89423.803217] [ 5220] 0 5220 54607 17 65 3116
0 autotest
[89423.804499] [ 5222] 0 5222 32728 36 65 248
0 sshd
[89423.805731] [ 5225] 0 5225 54608 16 61 3112
0 autotest
[89423.807027] [ 5226] 0 5226 54608 16 61 3112
0 autotest
[89423.808317] [ 5239] 0 5239 34292 1 26 848
0 autotestd_monit
[89423.809696] [ 5242] 0 5242 26661 0 11 24
0 tail
[89423.810936] [ 5243] 0 5243 26661 0 11 24
0 tail
[89423.812183] [ 5253] 0 5253 81215 111 111 3982
0 autotest
[89423.813467] [ 5293] 0 5293 81216 25 108 4060
0 autotest
[89423.814763] [ 5294] 0 5294 81216 25 108 4060
0 autotest
[89423.816065] [ 5297] 0 5297 2074141 967661 3952 995778
0 potential_confl
[89423.817440] Out of memory: Kill process 5297 (potential_confl) score
939 or sacrifice child
[89423.818757] Killed process 5297 (potential_confl) total-vm:8296564kB,
anon-rss:3870644kB, file-rss:0kB
}}}
--
Ticket URL: <https://fedorahosted.org/autoqa/ticket/433#comment:1>
AutoQA <http://autoqa.fedorahosted.org>
Automated QA project