Re: Datacenter move day 2
by Kevin Fenzi
On Wed, Jun 10, 2020 at 04:31:39PM +0200, Dominik 'Rathann' Mierzejewski wrote:
> On Wednesday, 10 June 2020 at 08:24, Kevin Fenzi wrote:
> > Greetings everyone.
> >
> > Just a quick recap of our day 2 datacenter move.
>
> Thanks for all your hard work and for providing these (not so little)
> updates!
You're welcome.
>
> I hope you get some well-deserved rest after all is done.
Me too. ;) I look forward to getting back to helping people get cool
things done in our community.
kevin
3 years, 3 months
Datacenter move day 2
by Kevin Fenzi
Greetings everyone.
Just a quick recap of our day 2 datacenter move.
Today was the buildsystem and all its associated things.
* src.fedoraproject.org took longer than anticipated due to some
repeated syncs to make sure we got all the content bit for bit.
It's up now and should be working mostly normally. There's an issue with
the fedora-messaging bus that we hope to fix in the morning.
* koji took a long time to move. Almost all of it in a database
dump/restore. We actually did this instead of a replication or the like
because we wanted to get out of the rhel7 postgresql 9.2 to a more
recent version. We are now on rhel8 with postgresql 12.2. This gives us
the ability moving forward to do nice things like partitioning the large
tables we have and in general I hope it will be faster. koji is also now
on 1.21.0 release and all the builders (and now the hubs too) are
Fedora 32.
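As a rough illustration of the dump/restore approach described above (not
the actual commands used for koji), a move between major postgresql
versions can be sketched as a pg_dump from the old server followed by a
pg_restore into the new one. The host names, database name, dump path and
job count below are hypothetical placeholders, and the sketch only builds
the command lines rather than running them.

```python
import shlex


def dump_restore_commands(old_host, new_host, dbname, dump_path, jobs=4):
    """Build pg_dump / pg_restore argument lists for a dump/restore move.

    All arguments are illustrative placeholders. The restore command assumes
    the target database has already been created on the new server.
    """
    # Custom-format dumps can be restored with several parallel jobs.
    dump = [
        "pg_dump",
        "--host", old_host,
        "--format=custom",
        "--file", dump_path,
        dbname,
    ]
    restore = [
        "pg_restore",
        "--host", new_host,
        "--dbname", dbname,
        "--jobs", str(jobs),
        dump_path,
    ]
    return dump, restore


# Hypothetical hosts and paths, printed so they can be reviewed before use.
dump_cmd, restore_cmd = dump_restore_commands(
    "old-db.example.org", "new-db.example.org", "koji", "/srv/dump/koji.dump"
)
print(shlex.join(dump_cmd))
print(shlex.join(restore_cmd))
```

Building the commands as lists keeps quoting problems out of the picture if
they are later handed to something like subprocess.run.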
As for builders in the new datacenter, we have 32 buildvm's, 40
buildvm-ppc64's, the same number of s390x (15) and 15 each of
buildvm-a64 (aarch64) and buildvm-a32 (armv7/armhfp). So, likely arm
builds are going to be the last to finish for many. Luckily this only
needs to last us a few weeks until we can get the rest of our arm
builders shipped and re-added.
Our signing infrastructure is up and tested, but we have a few tweaks to
make to autosigning tomorrow morning. So, some builds may wait to be
signed for a bit until that's processing away again.
We have not yet done updates pushes or a rawhide compose; it's 11pm now
for me, so I think tomorrow is the better time to try those and clean up
anything we hit. Additionally, adding new packages seems to mostly work,
but it's emitting an error also, so we want to check that in the morning
before processing down the existing queue of requests.
ODCS, mbs, osbs and resultsdb were all moved over, but there are various
issues bringing them up, so we will be doing that tomorrow.
Tomorrow is fixing those things above, and moving mailman and
datagrepper. After today we are on the downhill side I think.
Thanks everyone for being patient while we get everything back to fully
working.
kevin
3 years, 3 months
Day one of the datacenter service migration
by Kevin Fenzi
Greetings everyone.
I thought I'd share with everyone how things went today and where we are
at on the datacenter service migration. :)
We did get everything we planned to migrate today:
* staging services shutdown and machines with those resources readied to
be shipped out.
* fedora-messaging/fedmsg buses/clusters. We ran into a brief hiccup
here as I hadn't properly issued the rabbitmq cluster certs with an
alternative name for 'rabbitmq.fedoraproject.org', but we found it and
fixed it. All consumers and producers should be connecting to the new
rabbitmq cluster in the new datacenter now.
* notifications service. Since this service is brittle and ancient we
elected to just copy the vm over and adjust its settings for the new dc
after. I am not 100% sure it's functioning properly yet (as it takes a
while to start up), but it seems to be close.
* pdc. This one was much longer than I expected, and I am sorry about
that as it prevents people from committing. It turns out the pdc
database is pretty gigantic, so it took a few hours to dump it out and
load it in the new db server. ;( It's up and working now.
* mirrormanager. There's some stats reporting and a few crons that may
need tweaking, but I think the basic service is working.
* Authentication stack (ipa, fas, ipsilon). I ran into a few snags here
with routing and vpns, but everything should be moved over and working
normally.
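The rabbitmq certificate hiccup in the list above is the kind of thing a
small check can catch before clients start failing. The following is a
minimal sketch, assuming the certificate is available as a dict in the
shape returned by Python's ssl.SSLSocket.getpeercert(); the sample
certificate and the node name 'rabbitmq01.example.org' are made up for
illustration, and wildcard entries are deliberately not handled.

```python
def cert_covers_hostname(cert, hostname):
    """Return True if hostname is listed as a DNS subjectAltName in cert.

    cert is a dict shaped like the result of ssl.SSLSocket.getpeercert().
    Only exact, case-insensitive matches are checked; wildcards are ignored.
    """
    wanted = hostname.lower().rstrip(".")
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS" and value.lower().rstrip(".") == wanted:
            return True
    return False


# Hypothetical certificate issued only for a node name, without the alias.
node_only_cert = {
    "subject": ((("commonName", "rabbitmq01.example.org"),),),
    "subjectAltName": (("DNS", "rabbitmq01.example.org"),),
}

# The same certificate reissued with the service alias added.
reissued_cert = dict(node_only_cert)
reissued_cert["subjectAltName"] = node_only_cert["subjectAltName"] + (
    ("DNS", "rabbitmq.fedoraproject.org"),
)

print(cert_covers_hostname(node_only_cert, "rabbitmq.fedoraproject.org"))
print(cert_covers_hostname(reissued_cert, "rabbitmq.fedoraproject.org"))
```

Clients that verify the server name will reject a certificate like the
first one when they connect via the alias, which matches the symptom
described above.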
On the unplanned side it turned out to be more complex than I had thought
to just move some of our openshift apps and not the others. Because of
that I made the decision to just move all the (user facing) ones today.
That includes: fas, ipsilon, bodhi, compose-tracker, elections,
greenwave, waiverdb, mdapi and a few more non user facing ones.
The openshift apps move caused an outage for the elections app (both a
short one while it was entirely down, and another short time when
authentication wasn't yet working), and additionally when bodhi was
moved it was inadvertently restarted before its database was synced
over, so if you saw a bunch of bodhi actions today where it said it was
pushing things to stable that were already stable, that was the cause.
;( I quickly stopped the app and resynced the db, and hopefully not much
damage was done.
Tomorrow is the big day for koji and all its associated services.
This work will start at 15:00 UTC, so if you have any builds to do, make
sure they finish before then. Any in-progress builds will be canceled at
around 15:00 UTC; you can resubmit them once things are back up.
Thanks again for everyone's patience with this move and hopefully we will
survive the week. :)
kevin
3 years, 3 months
CPE Weekly: 2020-06-07
by Aoife Moloney
---
title: CPE Weekly status email
tags: CPE Weekly, email
---
# CPE Weekly: 2020-06-07
Background:
The Community Platform Engineering group is the Red Hat team combining
IT and release engineering from Fedora and CentOS. Our goal is to keep
core servers and services running and maintained, build releases, and
other strategic tasks that need more dedicated time than volunteers
can give.
See our wiki page here for more
information: https://docs.fedoraproject.org/en-US/cpe/
## General Project Updates
Please check out our updated initiative timetable for briefing in new
projects to our team
here: https://docs.fedoraproject.org/en-US/cpe/time_tables/
*Note: Initiatives are large pieces of work that require a team of
people and weeks/months to complete. Please continue to open tickets
in the normal way for bugs, issues, etc.*
Don't forget to view our taiga board to see the projects we are
currently working on, what we have scoped and what's in our backlog:
https://tree.taiga.io/project/amoloney1-cpe-team-projects/kanban?epic=null
CPE Product Owner Office Hours: Thursdays @ 1300 UTC on #fedora-meeting-1
## Fedora Updates
### Data Centre Move
* A reduced services offering of Fedora will begin tomorrow, June 8th,
and is estimated to run until July 28th.
* This is to complete the final shipment of hardware from Phoenix to
Washington, so please be patient and understanding during this
timeframe as some services will be off and the rest, much slower.
* Please read the below email sent by kfenzi if you have not already
done so: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedora...
* Details on what this move may mean for you can be found here
https://lists.fedoraproject.org/archives/list/devel-announce@lists.fedora...
* If an application is not working correctly at all, please check this
list https://hackmd.io/hpYYJQRjQy-oHxUS7IonIA?view before opening a
ticket to make sure it's not listed as being moved. If it is being
moved, please wait a day or two, then try again.
* Similarly, please be patient when opening tickets for service issues
in general as we have now reached the critical point in this move and
all of our sys-admins and wider teams will be assisting in the
successful bringup of the reduced Fedora service and facilitation of
the final hardware shipment and move.
### AAA Replacement
* The team are working on implementing an aspect of the service that
will allow users to select their applicable contributor agreement(s)
as we are merging both Fedora and CentOS authentication systems under
the Noggin solution.
* The team have also added a blog pref to the feature set in Noggin
* They have added pagination to the solution
* And are working through redirecting applications to interface with the new API
* Please feel free to check out the team kanban board for more
information on the features the team are working on and have already
completed here https://github.com/orgs/fedora-infra/projects/6
### Mbbox
* Project Dashboard here https://github.com/fedora-infra/mbbox/projects/1
* Sprint 4 is underway with the following work being addressed:
* Kojira CRD & Documentation completed
* MBS Backend CRD is almost done
* And the team are now waiting on a staging environment to deploy
and test in. We hope to have this in place by the end of this week
### Gitforge
Good discussion on the CPE PO Office Hours meeting this Thursday, 4th
June around the possibility of scheduling an AMA session/technical
panel session with some folks from GitLab to allow the Fedora
community a direct line to discuss services, potential blockers, etc.
General feedback is that this will be welcome so I will work with
GitLab and cverna, who is still leading the technical side of the
project (link to ticket below) to plan the best time for this session
& work with you all to set an agenda/discussion points for the
meeting. NOTE: Due to the data centre move, this session will not
happen before late August or early September at a minimum. Thank you
for your engagement with us on this!
Link to public issue ticket:
https://gitlab.com/gitlab-org/gitlab/-/issues/217350
Meeting minutes log
https://meetbot.fedoraproject.org/fedora-meeting-1/2020-06-04/cpe_po_offi...
## CentOS Updates
### CentOS
* CentOS Linux 8.2.2004 RC composes are with QA
### CentOS Stream
* The team resolved all of the 8.2 build failures in CentOS Stream in
their recent sprint (May 21st - June 5th)
* An issue where a SIG was unable to push new content to
git.centos.org was resolved, and is currently in staging
* The team are also investigating how best to separate CentOS Linux
branding from CentOS Stream
* We are also focusing on building more packages in Stream for our
current sprint (June 5th - 19th) and working on more automation and
variance to help improve the current process and time it takes for our
team to build packages
As always, feedback is welcome, and we will continue to look at ways
to improve the delivery and readability of this weekly report.
Have a great week ahead!
Aoife
Source: https://hackmd.io/8iV7PilARSG68Tqv8CzKOQ?view
--
Aoife Moloney
Product Owner
Community Platform Engineering Team
Red Hat EMEA
Communications House
Cork Road
Waterford
3 years, 3 months
Upcoming fedoraproject Datacenter move reminder and plans
by Kevin Fenzi
Greetings.
As previously announced, fedoraproject is moving many of its servers
from one datacenter (phx2 near phoenix, arizona, usa) to another (iad2:
near arlington, virginia, usa).
As we move from the old datacenter to the new, we will have a temporary
reduction in capacity. The new datacenter has a smaller, less-redundant,
lower-capacity version of our infrastructure. Over the next two weeks,
we will migrate services to it so that we can finish moving out of the
old datacenter.
After everything is moved from the old datacenter, many of the servers
there will be shipped to the new datacenter and then re-added to bring
us back to full redundancy and capacity.
Our detailed checklist for these migrations is available at
https://hackmd.io/@fedorainfra2020/rJpsA4FLL
To summarize what we are moving when:
2020-06-03 wed: The fedoraproject master mirrors will move to IAD2. A
very small outage may be noticed as dns changes. There may be some
mirroring slowdowns as we work out bugs.
2020-06-04 thu: Our internal ansible control host and the fedoraproject
wiki will move. The wiki will be down for a few hours.
2020-06-05 fri: Our meeting minutes archive
(https://meetbot.fedoraproject.org) and our freenode irc bot (zodbot).
These two services will see an outage of an hour or less.
2020-06-07 sun: We will pause for the next week adding new packages and
unretiring packages to avoid problems.
2020-06-08 mon: Our fedora-messaging bus and gateways to it
(github2fedmsg, bugzilla2fedmsg), mirrormanager, product definition
center (pdc), and our identity and authentication systems. Messages over our
message bus may be slow or missing and users may be unable to login at
various times as we migrate services over.
Additionally, we will be stopping services that will not be back until
later in the month.
These include:
* Fedocal
* Badges
* Nuancier
* koschei
* simple-koji-ci
* All staging services (*.stg.fedoraproject.org)
2020-06-09 tue: The build and packaging ecosystem. This includes koji,
src.fedoraproject.org, osbs, odcs, container registries, bodhi (updates
system). During this day maintainers should avoid builds/updates if at
all possible as they may or may not work at various times.
2020-06-10 wed: Various small apps (mdapi, anitya, waiverdb, greenwave,
etc), mailman/lists.fedoraproject.org, and our datagrepper/datanommer
services. Mailing lists will be down for several hours as data is
migrated. Datagrepper will be down for most of the day as its database
is moved. Other services will be down for short amounts of time while
they are moved.
2020-06-11 thu: Various small site building apps (docs building, fedora
websites building, reviewstats, blockerbugs) and elections will be
moved. elections will be up until the currently running elections
complete. (GO VOTE! https://elections.fedoraproject.org)
2020-06-12 fri: Catch up and fix issues day, along with re-enabling
package unretirements/new packages, and other 'paused' items.
The week after this, servers will be shipped, and the week after that we
expect to start setting them up and getting them re-added. During this
time, we may have to make further changes to what services are available
in order to deal with load changes.
If you have any questions or concerns, please file an infrastructure
ticket ( https://pagure.io/fedora-infrastructure) or come talk to us in
#fedora-admin on irc.freenode.net.
Finally, I'd like to ask everyone to be patient as we do this move. I
know that it's painful when you are unable to contribute something when
you have time to do so, but rest assured that we are trying to migrate
things as quickly and smoothly as we can.
Thanks.
kevin
3 years, 4 months
Updating nuancier to work with Python 3
by Jonathan Trossbach
Hi everyone!
I am a first-time contributor to the Fedora project. As my first contribution, I am going to try to update nuancier to work with Python 3.
I started this thread for two reasons: (1) to make sure that nobody has updated it already and (2) to ask questions should I get stuck somewhere.
If you are aware of anyone already doing this work, or if you have any suggestions on how to complete it more efficiently, please let me know by responding here.
Thank you,
Jon Trossbach
3 years, 4 months
Updating fedocal to work with Python 3
by Jonathan Trossbach
Hi everyone!
My name is Jon Trossbach and I am a first time contributor to the Fedora project. My first goal is to get fedocal working with Python 3.
I am currently trying to use this set of Ansible playbooks (https://pagure.io/fedora-infra/ansible) to deploy fedocal in a CentOS 7 virtual machine using VirtualBox and Vagrant, but I can't get the playbook to finish without running into some errors. So I guess my first question is: am I using these Ansible playbooks for their intended purpose in trying to set up an individual development environment for fedocal? Or should I just follow the README in its GitHub repository (https://github.com/fedora-infra/fedocal)? I've attempted both ways of setting up a development environment and can't seem to get either to work.
Once I know which way I should set up my development environment for fedocal I can share the spot where I'm getting stuck using that method. Thanks for your time in reading this.
sincerely,
Jon Trossbach
3 years, 4 months