Below are my notes on the Autocloud update that happened last week. A
few things went badly and a few worked well. I am listing them here,
along with how we plan to make sure the bad things never happen
again. I am CCing Patrick on this mail, as he was the contact point
from Fedora Infrastructure (read: he did all the work).
## deployment using ansible to the newly installed systems
We only had to change yum to dnf and remove the old hotfix part from
the roles. The current playbooks and roles seem to be stable, and can
help us in the future too.
## fedmsg-hub on the backends double enqueue
We had to restart fedmsg-hub a few times on the backend servers. In
between, we also found that the fedmsg-hub service was happily
enqueueing the jobs twice (for each compose). Then it got fixed
automagically: nothing was changed in our configuration or in the code.
We are still not sure why this happened, but we are trying to dig more
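Since the cause of the double enqueue is still unknown, one defensive
option is to make the enqueue step itself refuse duplicates. The
following is only a minimal sketch of that idea, not autocloud's actual
code: the ComposeQueue class, its method names and the use of a compose
id as the deduplication key are all assumptions made for illustration.

```python
class ComposeQueue:
    """Toy in-memory job queue that accepts each compose only once.

    Hypothetical illustration; the real service uses its own queue.
    """

    def __init__(self):
        self._seen = set()   # compose ids that were already enqueued
        self.jobs = []       # accepted (compose_id, job) pairs, in order

    def enqueue(self, compose_id, job):
        """Add the job unless this compose id was seen before.

        Returns True if the job was queued, False if it was a duplicate.
        """
        if compose_id in self._seen:
            return False
        self._seen.add(compose_id)
        self.jobs.append((compose_id, job))
        return True
```

A guard like this would hide the symptom rather than explain it, so it
does not replace finding out why the message was handled twice.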
## fedmsg-hub broke due to a faulty dependency
Even though we kept our code up and running for weeks, after the
production deployment we found that one of the dependencies (fedfind;
adamw is the upstream author) broke on the Fedora Atomic image names,
which made our fedmsg-hub instances go crazy. We informed upstream and
got a hotfix deployed within a few hours of finding the issue.
For the next release we will make sure to keep it running for longer on
our internal hardware with messages from production fedmsg. This
dependency failure was something we should have caught, but did not.
## missing fedmsg(s) from autocloud on completing the testing of a compose
This was a known part of the whole development+release cycle. Sayan had
submitted the patch, but there was a slight miscommunication. On my part,
I failed to track the state of this dependency.
For the next release, I will make a release/deployment checklist for
autocloud and get it validated by everyone involved. Most probably we
will add too many minor details to this checklist, but that will help us
keep track of things in any future deployment. Sayan is currently
working to get that particular change into production so that we can
send out the fedmsg(s) required by Adam.
## missing package dependency causing missing bridge on libvirt backend
We also found that a missing link in the dependency chain caused a
missing virtual bridge in the libvirt backend. Patrick helped find
that adding libvirt as a dependency for that particular box will fix
the issue in the future.
We should test on clean installations while developing next time to make
sure this is not repeated. We should also think about improving the
stage environment.
## the new webfrontend is better
Sayan did a good job on the new webfrontend. We can now point to the
exact failures.
In the future we should try to get more input about the features of the
webfrontend. Even though the whole service is made for automation,
this frontend helps us to find, and point to the right issues found in
## autocloud+tunir did what they are supposed to do
After fixing the hiccups, the autocloud service is doing what it is
supposed to do: testing the images. We will put our effort into better
test coverage in the coming months to take advantage of this new
deployment.
Please comment on or suggest whatever you think about the work. This
will help us improve future releases.
Fedora Cloud Engineer
CPython Core Developer
IRC handle: warlock20
Programming languages used : C, C++, Java, Python...
Systems administration: Not much experience, but I wish to learn more
about security and the architecture of Fedora.
Current status: Master's student in Computer Science, specializing in
Communication System at TU Kaiserslautern, Germany. Also working as a
student research assistant at the Networked Systems Group, TU Kaiserslautern.
What I like to work on: Interested in everything, Mainly interested in
networking side but open to anything.
I wish to learn more and I hope I can improve my skills by contributing to
In my continuing quest to get our daily ansible check/diff report to be
0 and all playbooks to be idempotent I have run into a case I would
like to ask everyone about. :)
The ansible git module is in use in a number of playbooks/roles now. By
default (or if you specify update=yes) it will do a git pull to pull
the latest changes, so it's not idempotent (i.e., when running --check it
always shows such tasks as changed because it cannot know whether there's
going to be new data or not).
There are a number of ways we could handle this:
1. Set all git: module usage to have 'update=no'. This means you would
need a manual playbook or 'git pull' to pick up changes in the repo(s).
Also, if there was an existing repo on an old commit that was working
and the machine was reprovisioned, it would have a different checkout
and could perhaps not work.
2. Set all git: module usage to use 'version=SHA-1'. This means a
specific commit is checked out. There wouldn't be any changes normally
in check mode. Rebuilding a machine would mean you get the same exact
(hopefully working) SHA-1 you had before. Downside would be that you
would have to update this anytime you needed a new commit.
3. Set all git: module usage to have 'changed_when: False' so they
would never show as changed. However, this would break places where
there are handlers that run when the git repo updates.
4. Set all git: modules to when: not ansible_check_mode, so they don't
even run in check mode. However, this will break all the places that
register an output from the git module, so it won't work there.
5. Weed out these git changes from our reports so they still are
changed, but don't annoy me.
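To make option 5 a bit more concrete, here is a minimal sketch of
filtering git tasks out of a report after the fact. It assumes the
check/diff results have already been flattened into plain dicts with
"host", "task", "module" and "changed" keys; that record layout and the
function name are hypothetical and are not what ansible itself emits.

```python
def drop_git_changes(results):
    """Return the report entries without 'changed' git-module tasks.

    `results` is a list of dicts in the hypothetical layout
    {"host": str, "task": str, "module": str, "changed": bool}.
    Unchanged git tasks and all non-git tasks are kept as-is.
    """
    return [
        entry
        for entry in results
        if not (entry["module"] == "git" and entry["changed"])
    ]


def changed_count(results):
    """Count entries still reported as changed after filtering."""
    return sum(1 for entry in drop_git_changes(results) if entry["changed"])
```

The tasks would still run and still report as changed in ansible
itself; only the daily report would stop counting them.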
Personally, I am torn between 1 and 2 and dislike 3 and 4 and really
Any other ideas? Or opinions on which way to go?
Can someone please run the push-badges playbook? I committed the
change below to the badges git repo as well as updating the badges
testing rules according to;
Step 8 (Badges) (final release only)
It could be that I should have access to run the playbook and am doing
something wrong (or maybe locked myself out with incorrect passwords?)
but it's been a long day and I really don't want anyone to get paged on
my account. (I can't imagine how a change to a README or rules would
cause this, but the changes aren't hyper-urgent either ;)
#aikidouke (who needs to copy his gpg key over to this laptop)
----- Forwarded message from Fedora Infrastructure <trac(a)fedorahosted.org> -----
Date: Thu, 07 Jul 2016 02:17:40 -0000
From: Fedora Infrastructure <trac(a)fedorahosted.org>
Subject: Re: [Fedora Infrastructure] #5386: Update README for
generating STL badges
X-Mailer: Trac 0.12.5, by Edgewall Software
#5386: Update README for generating STL badges
Reporter: jflory7 | Owner: sysadmin-members
Type: enhancement | Status: new
Priority: minor | Milestone: HANDWAVY-FUTURE
Component: General | Version: Production
Severity: Normal | Resolution:
Keywords: | Blocked By:
Blocking: | Sensitive: 0
Comment (by aikidouke):
I committed your patch - but cant run the proper playbook to push the
changes to production. I will email the infra list so someone can do it
soon. Thanks for looking at this!
Ticket URL: <https://fedorahosted.org/fedora-infrastructure/ticket/5386#comment:1>
Fedora Infrastructure <http://fedoraproject.org/wiki/Infrastructure>
Fedora Infrastructure Project for Bugs, feature requests and access to our source code.
----- End forwarded message -----
Good Morning Everyone,
I just made a new pkgdb2 release: 2.4
Here is its changelog:
* Wed Jul 06 2016 Pierre-Yves Chibon <pingou(a)pingoured.fr> - 2.4-1
- Update to 2.4
- Fix some timezone-sensitivity in the tests (Ralph Bean)
- Check namespace policy on branch request (Ralph Bean)
- Do not hard-code packager and provenpackager in the code
- Properly handle bugzilla urls for unretirement requests (Till Maas)
- Fix logger and return the user when emailing an exception
- Correctly fix the change in the API of python-psutil between 1.0 and 2.0+
- Let the Package page return everything regardless of the namespace
- Do not hardcode the URLs to koji, bodhi, packages, bugz... on the package page
- Fix pagination on the list packages and packagers pages
- Add an option to skip the entire update rawhide/creation new branch step in
the pkgdb2_branch script
It is currently running fine in stg and prod.
Happy packaging everyone!
Good Morning Everyone,
This morning Patrick found a security bug in pagure. We fixed it and made a new
release: 2.2.2 with the fix.
This is the corresponding changelog:
* Mon Jul 04 2016 Pierre-Yves Chibon <pingou(a)pingoured.fr> - 2.2.2-1
- Update to 2.2.2
- Security fix release blocking all HTML-related mimetypes when displaying
raw files, forcing the browser to download them instead (Thanks to Patrick
Uiterwijk for finding this issue)
Prod and stg have been upgraded for it.
If you are running your own pagure instance, make sure to pull/apply the
following fix: https://pagure.io/pagure/c/dbcc8abdde2e78acd6bae7fe5cc095294193686b
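For those wondering what "blocking html related mimetypes" can look
like in practice, below is a minimal sketch of the general idea. It is
not the code from the commit above: the function name, the substring
test on the mimetype and the choice of application/octet-stream are
assumptions made for illustration.

```python
def raw_file_headers(mimetype):
    """Build response headers for serving a raw file.

    Hypothetical helper: any mimetype mentioning "html" is served as an
    opaque download so the browser saves it instead of rendering it.
    """
    headers = {"Content-Type": mimetype}
    if "html" in mimetype.lower():
        # Do not let the browser render user-supplied markup inline.
        headers["Content-Type"] = "application/octet-stream"
        headers["Content-Disposition"] = "attachment"
    return headers
```

With this sketch, raw_file_headers("text/html") asks the browser to
download the file, while raw_file_headers("text/plain") is left alone.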
Thanks for your attention,
I have just been informed that there will be maintenance on the
network equipment that is serving the Fedora Infra Cloud, where one
switch will be forced to reproduce an issue it hit recently.
This maintenance starts around 23:00 UTC and has an unknown duration,
but hopefully less than
There should be no considerable impact on our availability during
this, since we will be routed over