Request for Feedback/Resources for Packager Dashboard/Oraculum

Tuesday, 10 November 2020

Hello,

many of you have already heard about Fedora Packager Dashboard [0]. In
short, for those who have not, it's a web application and backend aiming to make
the life of Fedora packagers easier. It combines data from multiple sources
(Pagure, Bugzilla, Bodhi, Koschei,...) relevant to the maintainers of
Fedora packages.

Tracking all these sites can be time-consuming, especially if you maintain
dozens of different packages, so the Dashboard provides everything a
packager might need (or at least what we've thought of) - condensed,
cached, searchable and filterable on one page.

You can check it out and play with it even if you don't maintain any Fedora
packages. No authentication is needed; just enter any packager’s username.
Feature-wise, it's pretty robust already: the originally planned feature
set is complete, and there are more things to come (like allowing users to
authenticate and see private bugs).

Currently, it's processing only publicly available data. When we start
implementing the ability to authenticate and process any private data,
we'll work closely with the Infra team to make sure there are no security
holes.

Since the announcement of the testing phase, it has been running on a
temporary server which is barely keeping up, so I'd like to open up a
conversation about migration to the Infra OpenShift cluster.

Here is a brief overview of its internals (I can elaborate more if anybody
needs me to):

The backend is a Flask/Python application leveraging Celery for scheduling
and executing the cache refreshes. The API strives to be as non-blocking as
possible, using asynchronous-inspired behaviour: the client is given the
data currently available in the cache and is told how complete it is
(complete or partial cache miss) via the HTTP status code.
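
To illustrate the idea, here is a minimal sketch of a lookup that never
waits for a refresh and reports completeness through the status code. It is
not the actual Oraculum code: the source names, the lookup() helper and the
choice of 200/206/202 are assumptions made for this example only.

```python
from http import HTTPStatus

# Hypothetical source names; the real backend aggregates more of them.
SOURCES = ("prs", "bugs", "updates")


def lookup(cache, user):
    """Return (payload, status, missing) from whatever is cached right now."""
    cached = cache.get(user, {})
    payload = {name: cached[name] for name in SOURCES if name in cached}
    missing = [name for name in SOURCES if name not in cached]
    if not missing:
        status = HTTPStatus.OK               # everything was in the cache
    elif payload:
        status = HTTPStatus.PARTIAL_CONTENT  # partial miss (assumed code)
    else:
        status = HTTPStatus.ACCEPTED         # complete miss (assumed code)
    return payload, status, missing
```

In a setup like this, the caller would hand the names in "missing" to the
task queue and let the client ask again later instead of blocking.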

The parameters for cache refreshes can be customized, depending on the
resources we have/can get. Currently, it's retrieving data for most of the
items every 2 hours (with exceptions like package versions, which are
refreshed daily and are terribly slow). The backend caches data for PRs,
bugs, and pre-calculated updates/overrides data for users who have visited
the app at least once in the past two weeks. The main storage is a
PostgreSQL database, and optionally, if RAM is not an issue, we have a local
memory-cached layer that can be enabled (we don't find it necessary at the
moment).
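
As a rough sketch of what these parameters amount to, the snippet below
writes the two intervals and the two-week activity window down as plain
Python. The task names and the active_users() helper are hypothetical; the
dictionary merely follows the shape of Celery's beat_schedule setting and is
not copied from the real configuration.

```python
from datetime import datetime, timedelta

# Hypothetical task names; the intervals are the ones quoted above.
REFRESH_SCHEDULE = {
    "refresh-most-items": {
        "task": "tasks.refresh_items",
        "schedule": timedelta(hours=2),
    },
    "refresh-package-versions": {
        "task": "tasks.refresh_versions",
        "schedule": timedelta(days=1),
    },
}

# Only users seen within this window get their data refreshed.
ACTIVE_WINDOW = timedelta(weeks=2)


def active_users(last_visits, now):
    """Return the users whose last visit falls within the active window."""
    return sorted(
        user for user, seen in last_visits.items()
        if now - seen <= ACTIVE_WINDOW
    )
```

Shrinking the window or lengthening the intervals is the obvious knob to
turn if fewer resources are available.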

Apart from storing the pre-crunched information from public sources, we
keep timestamps of the last visit for each user.

The frontend is a React app fetching and displaying data from the backend
(really nothing to add here :) ).

Based on the OpenShift testing, I've arrived at the following layout of pods:

   - Redis pod (Celery backend)
   - PostgreSQL pod (cache and watchdog data storage)
   - Beat pod (scheduling tasks for data refresh)
   - Flower pod (backend monitoring; nice to have, not absolutely necessary)
   - Gunicorn/NGINX pod (Oraculum backend)
   - NGINX pod (Packager Dashboard front-end)

On top of that, we need a number of worker pods completing the scheduled
Celery tasks.

I am not sure how much RAM each pod in Infra OpenShift has. Currently, the
workers seem to perform best with a memory limit of about 512 MB. Ideally
we’d like to have at least 12-16 Celery workers (several workers can, of
course, run in a single pod).
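
For a back-of-the-envelope estimate of what that means in total, here is a
small calculation. It assumes the 512 MB limit applies to each worker
individually, and the workers-per-pod figure is an arbitrary example rather
than a tested value.

```python
WORKER_LIMIT_MB = 512  # assumption: the limit applies per worker


def worker_memory_mb(workers, limit_mb=WORKER_LIMIT_MB):
    """Total memory needed by the given number of Celery workers."""
    return workers * limit_mb


def pods_needed(workers, workers_per_pod):
    """Number of pods required, rounding up to a whole pod."""
    return -(-workers // workers_per_pod)


for count in (12, 16):
    total = worker_memory_mb(count)
    print(f"{count} workers: {total} MB ({total / 1024:.0f} GiB)")
# 12 workers: 6144 MB (6 GiB)
# 16 workers: 8192 MB (8 GiB)
```

With, say, four workers per pod, pods_needed(16, 4) gives 4 worker pods of
roughly 2 GiB each, on top of the service pods listed above.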

Resource-wise it's not a small application (at least from my perspective :D
), but we believe it's a great value application which saves time for Red
Hat and community package maintainers.

I'd like to hear your feedback, questions, and requests for changes if
there is anything preventing it from being deployed in Infra OpenShift
(architecture- or code-wise), and of course your opinions on the
feasibility of moving Oraculum and Packager Dashboard into the Infra
OpenShift cluster.

Thanks!

[0] https://packager.fedorainfracloud.org/
