Greetings communishift group (and infrastructure list).
I was working on the communishift cluster trying to fix its failure to
upgrade as well as some cert issues, and managed to munge up the cluster
but good. ;( It's a tribute to the resilience of OpenShift that it's still up
and serving applications. :)
In any event, I think the easiest way to clean things up and get back to
normal is for us to just reinstall it. With that in mind, I am planning
to do so starting at 21:00 UTC on Monday, 2019-10-21.
If everyone could oc export any config or data they wish to save before
then, that would be great.
Sorry for the trouble, but hopefully we will be back on track after
Good Morning Everyone,
This morning I found out that https://pagure.io/fedora-infrastructure was not
available; it was throwing a 500 error on every page/call.
I checked the logs and found:
GitError: Error performing curl request: (60): Peer certificate cannot be
authenticated with given CA certificates
The combination of "GitError" and an SSL-related error led me to repoSpanner.
So with the help of Patrick, we confirmed that the SSL cert for pagure01 was
expiring on Oct 15th 2019.
We then regenerated that SSL cert.
We thought the repospanner playbook was going to redeploy that cert, so I ran it,
but it did not change anything (neither in its run nor in the symptoms).
We then found out that this piece is actually part of the pagure.yml playbook,
so I ran it with `-t repospanner/server` to limit its effect.
Then I restarted httpd, stunnel and repospanner@ansible.service on pagure01.
The first two were likely not necessary; the last one was to get the new cert in
So I would like retroactive approval for my actions, since the systems I
touched are frozen.
You are kindly invited to the meeting:
Fedora Infrastructure on 2019-10-17 from 15:00:00 to 16:00:00 UTC
The meeting will be about:
Weekly Fedora Infrastructure meeting. See infrastructure list for agenda a day before.
Hey, folks. Requesting a freeze break for this PR (as it applies to
In F31 'dnf-yum' is no more and 'yum' obsoletes it, but this was not
changed in comps. As a result, clean installs of F31 (and Rawhide) have
no 'yum' command, although we clearly intend them to have one.
This isn't likely to make any images go oversize, as all the 'yum'
package contains is a symlink linking /usr/bin/yum to /usr/bin/dnf-3
and a manpage; /usr/bin/dnf-3 is part of python3-dnf which would
already be in all the images.
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
I spent the last few weeks studying repoSpanner with the goal of
developing a plan to improve its performance. I started by testing its
performance with a few common git operations on a couple of repos (our
Infrastructure Ansible repository, since it is on the large side, and
Bodhi, since I already had it cloned and it is perhaps a "typical"
medium-sized project). I wrote an initial report about those tests here.
Since that report, I have done some performance profiling
on the git push of the Bodhi repository, since that was by far the
slowest operation that I tested.
I found that the most significant time was spent interacting with
sqlite. sqlite is used today by repoSpanner as a task queue. There are
two different workflows. The first is that it creates a table per
repoSpanner node, and each row of the table represents a git object ID
that needs to be pushed to that node. The second is that there is
another table that tracks each object ID along with how many nodes that
particular object ID has been successfully pushed to.
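To make the two workflows concrete, here is a hypothetical in-memory Go model of them, with maps standing in for the sqlite tables. The type and function names, and the assumption that the receiving node counts as the first copy of each object, are illustrative and do not come from the repoSpanner code.

```go
package main

import "fmt"

// syncQueue is a hypothetical in-memory model of the two sqlite workflows:
// one pending queue per peer node, plus a per-object count of nodes that
// hold the object.
type syncQueue struct {
	pending   map[string][]string // peer node -> object IDs still to push there
	syncCount map[string]int      // object ID -> number of nodes that have it
}

func newSyncQueue(peers []string) *syncQueue {
	q := &syncQueue{pending: map[string][]string{}, syncCount: map[string]int{}}
	for _, p := range peers {
		q.pending[p] = nil
	}
	return q
}

// enqueue records a newly received object. Assumption: the receiving node
// already holds it, so its count starts at 1; one entry is queued per peer.
func (q *syncQueue) enqueue(objectID string) {
	q.syncCount[objectID] = 1
	for p := range q.pending {
		q.pending[p] = append(q.pending[p], objectID)
	}
}

// markSynced removes objectID from peer's queue and bumps its node count.
func (q *syncQueue) markSynced(peer, objectID string) {
	queue := q.pending[peer]
	for i, id := range queue {
		if id == objectID {
			q.pending[peer] = append(queue[:i], queue[i+1:]...)
			q.syncCount[objectID]++
			return
		}
	}
}

func main() {
	// A 3-node cluster: this node plus two peers, so a majority is 2 nodes.
	q := newSyncQueue([]string{"node2", "node3"})
	q.enqueue("obj1")
	q.enqueue("obj2")
	q.markSynced("node2", "obj1")
	for _, id := range []string{"obj1", "obj2"} {
		fmt.Printf("%s: on %d nodes, majority reached: %v\n", id, q.syncCount[id], q.syncCount[id] >= 2)
	}
	fmt.Println("still queued for node2:", q.pending["node2"])
}
```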
Early on in my sprint, I was able to find an easy way to gain a speed
boost - I found that the query to retrieve a node table's object ID was
being called once per node per object ID, resulting in very large
numbers of read queries (as an example, the Bodhi repo has 40k objects,
so with a 3-node cluster this would result in 80k SELECT
statements, since there is a table for syncing those objects to each of the
other two nodes). It was relatively easy to refactor the code to
retrieve a group of object IDs per query and get a quick win. I posted
up a pull request with a patch that does this that achieved a 51% boost
on pushing Bodhi into repoSpanner.
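The difference between the two read patterns can be sketched in Go with a fake table that counts round trips. This is a hypothetical model, not the patch itself: each `next` call stands in for one `SELECT ... LIMIT ?` against a node table, and the batch size of 500 is an arbitrary assumption.

```go
package main

import "fmt"

// fakeTable is a hypothetical stand-in for one per-node sqlite table.
// Each call to next models one "SELECT objectid FROM <table> LIMIT ?"
// round trip and is counted in queries.
type fakeTable struct {
	rows    []string
	queries int
}

// next returns (and removes) up to limit queued object IDs.
func (t *fakeTable) next(limit int) []string {
	t.queries++
	if limit > len(t.rows) {
		limit = len(t.rows)
	}
	batch := t.rows[:limit]
	t.rows = t.rows[limit:]
	return batch
}

// drain empties the table batchSize rows at a time and returns the number
// of SELECTs issued. A short (or empty) batch means the queue is exhausted.
func drain(t *fakeTable, batchSize int) int {
	for {
		batch := t.next(batchSize)
		// ... push each object in batch to the peer here ...
		if len(batch) < batchSize {
			return t.queries
		}
	}
}

// newTable builds a table holding n placeholder object IDs.
func newTable(n int) *fakeTable {
	rows := make([]string, n)
	for i := range rows {
		rows[i] = fmt.Sprintf("obj%d", i)
	}
	return &fakeTable{rows: rows}
}

func main() {
	const objects, peers = 40000, 2 // Bodhi-sized push on a 3-node cluster
	for _, size := range []int{1, 500} {
		total := 0
		for p := 0; p < peers; p++ {
			total += drain(newTable(objects), size)
		}
		fmt.Printf("batch size %d: %d SELECTs\n", size, total)
	}
}
```

With 40,000 objects and two peer tables, `main` reports 80002 SELECTs at a batch size of 1 and 162 at a batch size of 500; each table's count includes one final empty read that signals the queue is drained.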
After achieving that gain, I attempted to continue down a similar path
as the next significant block seemed to be the code that wrote the data
into that table. However, it quickly became clear that it was a more
significant refactor to alter the writing code to batch insert than it
had been to alter the reading code to batch select. If I was going to
have to do a larger refactor, it became clear that it would be worth
exploring designs that avoid or reduce the use of sqlite. I had reached
a "local minimum", so to speak.
I had a few calls with Patrick Uiterwijk, and it turned out that he had
also been thinking about ways to solve this problem, and he was in
favor of removing sqlite from the project. He gave me the background on
why sqlite had been used in the first place, and suggested that we
could create a file backed go chan to achieve similar goals with higher
Last week I put together a prototype of the "file backed chan" that he
and I designed together and I also refactored the repoSpanner code to
use the new chan. This is very much prototype and not at all pull
request worthy code (at the time of writing, it contains a git commit
with the message "Test", if that tells you anything), so please be
forgiving of its messy state, but for those who are curious, you can
see what I've been experimenting with at .
I've found that I am able to push the Bodhi repository into repoSpanner
in about 25 minutes with that patch, whereas it took about 58 minutes
before. This is approximately a 57% reduction in push time, which is a
little better than the 51% improvement of the other patch.
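Since "speed improvement" can mean either less time or more throughput, here are both readings of the same measurements (58 and 25 minutes) as a small Go calculation.

```go
package main

import "fmt"

// timeReduction returns the percentage by which the push time shrank.
func timeReduction(before, after float64) float64 {
	return (before - after) / before * 100
}

// speedup returns how many times faster the new push is.
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	fmt.Printf("%.0f%% less time, %.2fx faster\n", timeReduction(58, 25), speedup(58, 25))
}
```

This prints `57% less time, 2.32x faster`.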
There is still one remaining use of sqlite - the table that records how
many nodes each object has been synced to. This is now the largest
bottleneck in repoSpanner push performance and is the next obvious
thing to eliminate. I've talked to Patrick about some ideas around
this, and we are considering eliminating the feature of tracking each
object individually and instead tracking the entire operation - i.e.,
consider a push successful only if all objects made it together to the
same majority of nodes. This is in contrast to today's behavior, where
each object is individually considered successfully pushed if it made
it to a majority of nodes - i.e., different objects do not have to
make it to the *same* majority of nodes. If we eliminate that feature,
we no longer have to perform individual tracking of which git objects
made it to which nodes and we can eliminate sqlite entirely. I expect
this will make the most significant difference to the performance of
git push, though it is difficult to estimate how much of a difference
it will make without prototyping it.
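The difference between the two success rules is easiest to see with three nodes and two objects. The following Go sketch is a hypothetical model (each node mapped to the set of objects it holds), not repoSpanner code; it shows a case that counts as a success today but would fail under the proposed rule.

```go
package main

import "fmt"

// perObjectOK models today's rule: every object must individually be on a
// majority of nodes, but different objects may use different majorities.
func perObjectOK(nodes map[string]map[string]bool, objects []string) bool {
	majority := len(nodes)/2 + 1
	for _, obj := range objects {
		count := 0
		for _, have := range nodes {
			if have[obj] {
				count++
			}
		}
		if count < majority {
			return false
		}
	}
	return true
}

// wholePushOK models the proposed rule: the push succeeds only if one and
// the same majority of nodes each received every object.
func wholePushOK(nodes map[string]map[string]bool, objects []string) bool {
	majority := len(nodes)/2 + 1
	complete := 0
	for _, have := range nodes {
		all := true
		for _, obj := range objects {
			if !have[obj] {
				all = false
				break
			}
		}
		if all {
			complete++
		}
	}
	return complete >= majority
}

func main() {
	objects := []string{"a", "b"}
	nodes := map[string]map[string]bool{
		"node1": {"a": true, "b": true},
		"node2": {"a": true},
		"node3": {"b": true},
	}
	fmt.Println(perObjectOK(nodes, objects), wholePushOK(nodes, objects))
}
```

`main` prints `true false`: each object sits on two of the three nodes, but only node1 holds both. Under the proposed rule, each node would only need to report whether it received the whole push, so the per-object bookkeeping goes away.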
Another area that is known to be problematic is the speed of a git
pull. Today repoSpanner builds git pack files for the repo every time it
is pulled. I haven't done very much profiling here, but Patrick has
suggested caching git pack files to help in this area. I think it's an
area we should focus on improving in the future.
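To illustrate the caching suggestion, here is a hypothetical Go sketch that keys a generated pack on a digest of the repository's refs, so repeated pulls of an unchanged repository reuse one pack and any push that moves a ref forces a rebuild. The names are illustrative, and the key assumes a full clone: an incremental fetch depends on what the client already has, so it would need a richer key.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"sync"
)

// packCache is a hypothetical cache of generated pack data keyed by a
// digest of the repository's refs.
type packCache struct {
	mu     sync.Mutex
	packs  map[string][]byte
	builds int // how many times a pack was actually generated
}

func newPackCache() *packCache {
	return &packCache{packs: map[string][]byte{}}
}

// refsKey hashes the repo name and its sorted ref->commit pairs; any push
// that moves a ref changes the key, which invalidates the old entry.
func refsKey(repo string, refs map[string]string) string {
	names := make([]string, 0, len(refs))
	for name := range refs {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha256.New()
	h.Write([]byte(repo))
	for _, name := range names {
		fmt.Fprintf(h, "\x00%s=%s", name, refs[name])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

// get returns the cached pack for this ref state, calling build on a miss.
func (c *packCache) get(repo string, refs map[string]string, build func() []byte) []byte {
	key := refsKey(repo, refs)
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.packs[key]; ok {
		return p
	}
	p := build()
	c.builds++
	c.packs[key] = p
	return p
}

func main() {
	c := newPackCache()
	refs := map[string]string{"refs/heads/main": "1111"}
	build := func() []byte { return []byte("pack bytes") }
	c.get("bodhi", refs, build)
	c.get("bodhi", refs, build)
	refs["refs/heads/main"] = "2222" // a push moves the branch
	c.get("bodhi", refs, build)
	fmt.Println("builds:", c.builds)
}
```

In `main`, the second request is a cache hit and the third rebuilds after the branch moves, so it prints `builds: 2`. Holding the lock during `build` serializes pack generation, which keeps the sketch simple but would likely need refining.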
As for the immediate future, I plan to clean up my patches for the
sqlite changes I have been experimenting with this week so I can
propose them in a pull request. They will supersede my existing pull
request, so I plan to close that one. Then I think it will be sensible
to do another prototype/sprint where we explore eliminating sqlite
Thanks for reading, and let me know if you have any ideas or questions!
 As written in , I tested a git push to a new repository, a git
clone, a git push of a new commit, and a git pull of a new commit.
 He wanted to avoid keeping large numbers of objects in memory,
while also allowing users to push objects faster than nodes could
write them. sqlite was an easy way to achieve this, since it
records the data to disk with an easily addressable and well known