First, I noticed we are running the full sync twice right now:
[root@mm-backend01 cron.d][PROD]# cat /etc/cron.d/s3.sh
0 0,11 * * * s3-mirror /usr/local/bin/lock-wrapper s3sync /usr/local/bin/s3.sh 2>&1 | /usr/local/bin/nag-once s3.sh 1d 2>
0 0 * * * s3-mirror /usr/local/bin/lock-wrapper s3sync-main /usr/local/bin/s3.sh 2>&1 | /usr/local/bin/nag-once s3.sh 1d
Second, the attached patch changes the sync scripts to:
* do one sync with no --delete and excluding repodata
* do another one with --delete and including repodata
* invalidate the repodata
I adjusted the cron jobs to handle the repodata invalidate (I think).
TODO: only sync when things have changed.
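The two-pass ordering above can be illustrated with a toy mirror in Python (the real script uses rsync/S3 tooling; the paths and file names here are made up, and the final repodata invalidation step is omitted):

```python
# Toy two-pass mirror showing the ordering described above. The real
# script uses rsync/S3 tooling; this only demonstrates the semantics.
import pathlib
import shutil
import tempfile

def sync(src, dst, exclude_repodata=False, delete=False):
    """Copy files from src to dst; optionally skip repodata/ or prune dst."""
    src, dst = pathlib.Path(src), pathlib.Path(dst)
    wanted = {p.relative_to(src) for p in src.rglob("*") if p.is_file()}
    if exclude_repodata:
        wanted = {p for p in wanted if p.parts[0] != "repodata"}
    for rel in wanted:  # copy new/updated files
        (dst / rel).parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, dst / rel)
    if delete:  # remove files that no longer exist upstream
        for p in list(dst.rglob("*")):
            if p.is_file() and p.relative_to(dst) not in wanted:
                p.unlink()

src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
pathlib.Path(src, "repodata").mkdir()
pathlib.Path(src, "pkg-2.rpm").write_text("new package")
pathlib.Path(src, "repodata", "repomd.xml").write_text("metadata")
pathlib.Path(dst, "pkg-1.rpm").write_text("stale package")

sync(src, dst, exclude_repodata=True)   # pass 1: packages, no delete
sync(src, dst, delete=True)             # pass 2: repodata, prune stale files
print(sorted(p.name for p in pathlib.Path(dst).iterdir()))
# ['pkg-2.rpm', 'repodata']
```

Syncing packages first without deletion means clients never fetch metadata that points at packages which haven't landed yet.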
Last week during our weekly meeting we discussed trying to better
prioritize our work and tickets. To do that, we are going to try to
use a "yummy vs trouble" index and a prioritization matrix to
order our work.
Yummy represents the added value or benefit of a task, and Trouble
represents how much effort it would take to complete. Each
property is rated small, medium, or large.
Starting this week, I'll send an email with a list of 5 tickets from our
backlog, asking for opinions about each ticket's yummy and trouble levels.
We can use this weekly email to ask questions or provide more context. We
will then update these tickets with the outcome of the discussion; in case
of strong disagreement we can use our weekly IRC meeting to make a decision.
This will hopefully help us focus on items that provide high value to
our community, and also give everyone a way to participate in this process.
 - https://www.process.st/prioritization-matrix/
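For illustration, the matrix boils down to a tiny lookup; the ticket names and ratings below are made-up examples, not real backlog items:

```python
# Toy "yummy vs trouble" prioritization: rank tickets by benefit minus effort.
LEVELS = {"small": 0, "medium": 1, "large": 2}

def priority(yummy, trouble):
    """Higher is better: large benefit with small effort sorts first."""
    return LEVELS[yummy] - LEVELS[trouble]

# Hypothetical tickets: (yummy, trouble)
tickets = {
    "fix-badges-outage": ("large", "small"),
    "rewrite-everything": ("medium", "large"),
    "tidy-wiki-page": ("small", "small"),
}
ranked = sorted(tickets, key=lambda t: priority(*tickets[t]), reverse=True)
print(ranked)
# ['fix-badges-outage', 'tidy-wiki-page', 'rewrite-everything']
```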
title: CPE Weekly status email
tags: CPE Weekly, email
# CPE Weekly: 2020-03-06
The Community Platform Engineering group is the Red Hat team combining
IT and release engineering from Fedora and CentOS. Our goal is to keep
core servers and services running and maintained, build releases, and
handle other strategic tasks that need more dedicated time than
volunteers can offer.
For better communication, we will be giving weekly reports to the
CentOS and Fedora communities about the general tasks and work being
done. Also for better communication between our groups we have
created #redhat-cpe on Freenode IRC! Please feel free to catch us
there; a mail has landed on both the CentOS and Fedora devel lists
with context here.
## Fedora Updates
* Fedora Minimal Compose is being worked on currently for F32 beta
### Data Centre Move
* Please start to plan for a 2-3 week outage of communishift starting
2020-04-12 to allow for the move
* Due to the data centre move, we cannot get a new box to run odcs-backend
* We are also scoping the work required for 'Minimum Viable Fedora' -
here is the link to the mail as a refresher of what to expect and
what not: https://firstname.lastname@example.org...
Our move dates are as follows:
* Move 1 April 13 to IAD2 (essential hardware)
* Move 2 April 13 to RDU-CC (communishift)
* Move 3 May 11 to IAD2 (QA equipment)
* Move 4 June 1 to IAD2 (anything and everything else)
### AAA Replacement
* Sprint 5 will see the team focusing on integration with FASJSON API
and 2FA token
* We have also decided to postpone testing of the new solution until
after the data centre move.
* As always, check out our progress on github here
* Monitor-Gating: Blocked in staging because F32 isn’t branched off
there yet (Koji, Bodhi, PDC, https://pagure.io/releng/issue/9293)
* Automatic Release Tags and Changelog: Ongoing Devel-list thread here
* The team have moved to Jitsi video conferencing with interested
community members (ngompa, clime, mboddu, mhroncok) about the
different approaches to automatic release tags. Result: Approach
considering existing EVRs most palatable (over <#commits>.<#builds>).
* We now have a more official looking repo:
* The team also created implementation details and roadmap:
### Sustaining Team
* Old cloud is now officially retired
* Bodhi XSS vulnerability patched
* The team are also looking to prepare a Bodhi 5.2 release
* Fedora Minimal Compose (Use ODCS to trigger test composes)
* The team are also scoping the Mbbox upgrade and Task breakdown
* Support community members helping with Badges outage
* The team has started a conversation about Infra Ticket prioritization
### Misc Updates/Review Requests
* Initial f32-updates-testing push is fixed
* Fixed perms on f32 ostree to finish updates pushes
* Anitya tests are fixed
* Failing tests in the-new-hotness being resolved
* Please review Packit integration in the-new-hotness
* Please review KeepassXC flatpak issue
* Please review Jms-messaging-plugin reviews
## CentOS Updates
* ppc64le kickstart added for CentOS 8
* This will still need to be tested and added in production
* The infrastructure is stable overall though!
### CentOS Stream
* We are now working on the sync-to-git process
* Tycho module was successfully added to Stream in development!
* We are also getting closer to having a contributor workflow model
available for later this year - watch this space!
* We are also working with upstream to generate reports for Stream too
As always, feedback is welcome, and we will continue to look at ways
to improve the delivery and readability of this weekly report.
Have a great weekend!
Community Platform Engineering Team
Red Hat EMEA
Current pungi configs are using kickstarts from the master branch;
instead they should use the f32 branch.
This should be fixed by:
Look at the last change, which is what this FBR is all about.
(Posting to many mailing lists for visibility. I apologize if you see
this more times than you'd like.)
You may have already seen my Community Blog post about changing the
Release Readiness meeting process. The meeting has questionable value
in the current state, so I want to make it more useful. We'll do this
by having teams self-report readiness issues on a dedicated wiki
page beginning now. This gives the community time to chip in and
help with areas that need it without waiting until days before the
release.
I invite teams to identify a representative to keep the wiki page up
to date. Update it as your status changes and I'll post help requests
in my weekly CommBlog posts and the FPgM office hours IRC
meeting. The Release Readiness meeting will be shortened to one hour
and will review open concerns instead of polling for teams that may or
may not be there. We will use the logistics mailing list to discuss
issues and make announcements, so I encourage representatives to join.
He / Him / His
Senior Program Manager, Fedora & CentOS Stream
Last night koji alerted due to slowness. It was not backups or anything,
but rather the database hitting the limit I raised in commit c678f73b:
-autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum
+autovacuum_freeze_max_age = 300000000 # maximum XID age before forced vacuum
What this means, basically: postgres records the xid (transaction id)
in table rows so it can tell which transactions can 'see' which rows.
However, xid is a 32-bit value, meaning there can only be about 2.1
billion transactions before it 'wraps around'. When it does, all the
'old' xids need to be gone or visibility gets confused. It removes the
old xids by marking old transactions as 'frozen' (so any other
transaction can see them).
So, this value tells the autovacuumer to start processing the table for
old xids and freezing them, so by the time the wraparound happens
everything will be set.
Unfortunately, it's doing this on the buildroot_listing table, which is:
public | buildroot_listing | table | koji | 219 GB |
So, the i/o load is heavy and koji is slow to respond to real requests.
There are (at least) three things we could do:
1. Bump the autovacuum_freeze_max_age up to 600 million. The 100 million
bump I did in January gave us about 1.5 months, so if we go to 600 we
might last until June, when we will be migrating to the new datacenter.
600 million is still a long way from 2.1 billion, so it should be fine.
At that point I hope to move db-koji01 to a rhel8 instance and a much
newer postgresql. We could also run the vacuum during that downtime and
let it finish then.
2. Just let it finish now. Things will be slow, and I don't know for how
long. Users will complain and it will take longer for people to get
things done, but at the end we should be in better shape, and there's
basically no action we need to take (other than handling complaints).
3. Schedule an outage and take the db offline and run the vacuum. This
might be quicker than letting the autovac finish, I am not sure.
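As a sanity check on option 1's numbers, here is a quick back-of-the-envelope sketch; the burn rate is inferred from the "100 million bump lasted ~1.5 months" observation above, so these are rough estimates, not measurements:

```python
# Rough xid headroom math, assuming koji burns xids at the rate implied above.
XID_WRAP = 2**31                     # ~2.1 billion, where wraparound looms
BURN_PER_MONTH = 100_000_000 / 1.5   # January's 100M bump lasted ~1.5 months

def months_of_headroom(new_limit, current_limit=300_000_000):
    """Months until the table's xid age hits new_limit at the observed rate."""
    return (new_limit - current_limit) / BURN_PER_MONTH

print(round(months_of_headroom(600_000_000), 1))  # 4.5 months at 600M
print(round(XID_WRAP / BURN_PER_MONTH, 1))        # ~32 months to burn 2^31
```

So a 600 million limit buys roughly four and a half months at that rate, comfortably past the planned datacenter move.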
Thoughts? Please +1 the freeze break for the option you think is best,
or feel free to ask for more info or suggest other options.
You are kindly invited to the meeting:
Fedora Infrastructure on 2020-03-05 from 15:00:00 to 16:00:00 UTC
The meeting will be about:
Weekly Fedora Infrastructure meeting. See the infrastructure list for the agenda a day before.