On Tue, Nov 10, 2020 at 04:57:20PM +0100, Frantisek Zatloukal wrote:
Many of you have already heard about the Fedora Packager Dashboard. In short,
for those who have not, it's a web application and backend aiming to make
the life of Fedora packagers easier. It combines data from multiple sources
(Pagure, Bugzilla, Bodhi, Koschei,...) relevant to the maintainers of
Tracking all these sites can be time-consuming, especially if you maintain
dozens of different packages, so the Dashboard provides everything a
packager might need (or at least what we've thought of) - condensed,
cached, searchable and filterable on one page.
You can check it out/play with it even if you don't maintain any Fedora
packages; no authentication is needed, just enter any packager’s
username. Feature-wise, it's pretty robust already, and there are more
things to come (like allowing users to authenticate and see private bugs
and more), but the original planned feature set is complete.
Currently, it's processing only publicly available data. When we start
implementing the ability to authenticate and process any private data,
we'll work closely with the Infra team to make sure there are no open holes.
Since the announcement of the testing phase, it has been running on a
temporary server which is barely keeping up, so I'd like to open up a
conversation about migration to the Infra OpenShift cluster.
Here is a brief overview of its internals (I can elaborate more if anybody
needs me to):
The backend is a Flask/Python application leveraging Celery for planning and
executing the cache refreshes. The API is striving to be as non-blocking as
possible, using asynchronous-inspired behaviour. The client is given the
data currently available in cache, and advised about the completeness
(complete/partial cache misses) via HTTP status code.
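To illustrate the "serve what is cached and signal completeness via the status code" idea, here is a minimal sketch. The section names, the response layout, and the choice of 200 for a complete hit and 202 for a partial or complete miss are assumptions made for the example, not necessarily what Oraculum actually returns.

```python
from http import HTTPStatus

# Hypothetical cache sections; the real backend's names may differ.
SECTIONS = ("prs", "bugs", "updates")


def cached_response(cache, username):
    """Return (status, body) from whatever is cached, without waiting for a refresh."""
    entry = cache.get(username, {})
    data = {name: entry[name] for name in SECTIONS if name in entry}
    missing = [name for name in SECTIONS if name not in entry]
    # Assumed mapping: 200 when everything is cached, 202 while a refresh is pending.
    status = HTTPStatus.OK if not missing else HTTPStatus.ACCEPTED
    return status, {"data": data, "missing": missing}
```

A client that receives the "pending" status can render the partial data right away and poll again later.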
The parameters for cache refreshes can be customized, depending on the
resources we have/can get. Currently, it's retrieving data for most of the
items every 2 hours (with exceptions like package versions which run daily
and are terribly slow). The backend caches data for PRs, bugs, and
pre-calculated updates/overrides data for users who have visited the app at
least once in the past two weeks. The main storage is a PostgreSQL database, and
optionally, if RAM is not an issue, we have a local memory-cached layer
that can be enabled (we find it not necessary ATM).
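The refresh cadence described above could be expressed roughly like this. The item names and the `is_due` helper are hypothetical; only the two-hour and daily intervals come from the description.

```python
from datetime import datetime, timedelta

# Intervals from the description: most items every 2 hours, package versions daily.
# The keys are illustrative names, not the backend's real task names.
REFRESH_INTERVALS = {
    "prs": timedelta(hours=2),
    "bugs": timedelta(hours=2),
    "updates": timedelta(hours=2),
    "package_versions": timedelta(days=1),
}


def is_due(kind, last_refresh, now):
    """Return True when the cached item of this kind is old enough to refresh."""
    return now - last_refresh >= REFRESH_INTERVALS[kind]
```

Keeping the intervals in one mapping makes it easy to tune them to whatever resources are available.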
Has there been any thought about using fedora-messaging to update the
cache? i.e., do a sync, then listen for messages and update as you go, so
that you only need to do the full sync on startup?
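A rough sketch of that "sync once, then update from messages" pattern, using only the standard library. The message body layout (`package`, `kind`, `value`) and the class itself are made up for illustration; wiring `on_message` to a real fedora-messaging consumer is left out.

```python
class DashboardCache:
    """Toy cache: one full sync at startup, then incremental updates."""

    def __init__(self, fetch_everything):
        # fetch_everything is a callable returning {package: {kind: value}}.
        self._fetch_everything = fetch_everything
        self.data = {}
        self.full_syncs = 0

    def full_sync(self):
        """Rebuild the whole cache from the upstream sources (slow path)."""
        fetched = self._fetch_everything()
        self.data = {pkg: dict(kinds) for pkg, kinds in fetched.items()}
        self.full_syncs += 1

    def on_message(self, body):
        """Apply one message body; return True if the cache was updated."""
        package, kind = body.get("package"), body.get("kind")
        if package is None or kind is None:
            return False  # not a message this cache understands
        self.data.setdefault(package, {})[kind] = body.get("value")
        return True
```

After the initial `full_sync()`, each relevant message touches only the affected entry, so the expensive full refresh would be needed only at startup.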
Apart from storing the pre-crunched information from public sources, we
keep timestamps of the last visit for each user.
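The last-visit timestamps are what would let the backend decide whose data is worth keeping warm. A minimal sketch, assuming a plain mapping of username to last visit time (the function name and data shape are hypothetical):

```python
from datetime import datetime, timedelta

# Two-week activity window, as described for the cache.
ACTIVE_WINDOW = timedelta(weeks=2)


def users_to_refresh(last_visits, now):
    """Return the usernames seen within the activity window, sorted by name."""
    return sorted(
        user for user, seen in last_visits.items() if now - seen <= ACTIVE_WINDOW
    )
```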
Frontend is a React app fetching and displaying data from the backend
(really nothing to add here :) ).
Based on the OpenShift testing, I've come to the following schema of pods:
Redis pod (Celery backend)
PostgreSQL pod (cache and watchdog data storage)
In Fedora infrastructure we use an external database server
(non-OpenShift). This allows us to do backups nicely and lets apps avoid
needing to manage their own db.
Beat pod (scheduling tasks for data refresh)
Flower pod (backend monitoring; nice to have, not absolutely necessary)
Gunicorn/NGINX pod (Oraculum backend)
NGINX pod (Packager Dashboard front-end).
Do you do SSL termination in the NGINX pod?
In Fedora infra our proxies do the SSL termination (this allows us to
keep wildcard certs only on the proxies).
On top of that, we need a number of worker pods completing the
I am not sure how much RAM each pod in Infra OpenShift has; currently the
workers seem to perform best with a memory limit of about 512 MB. Ideally
we’d like to have at least 12-16 Celery workers (more workers can, of
course, run on a single pod).
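As a back-of-the-envelope check of what that request adds up to, assuming the 512 MB limit applies per worker (the message does not say whether it is per worker or per pod) and treating MB as MiB:

```python
MEMORY_LIMIT_MIB = 512  # assumed per-worker limit


def total_worker_memory_mib(workers, limit_mib=MEMORY_LIMIT_MIB):
    """Memory needed if every worker is allowed its full limit."""
    return workers * limit_mib


low = total_worker_memory_mib(12)   # 6144 MiB, i.e. 6 GiB
high = total_worker_memory_mib(16)  # 8192 MiB, i.e. 8 GiB
```

This covers the Celery workers only; the Redis, Beat, Flower, and web pods come on top.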
Well, right now... let's see...
We have 5 compute nodes:
CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
693m        17%   8404Mi         35%
381m        9%    16628Mi        69%
2291m       57%   20539Mi        86%
387m        9%    16014Mi        67%
278m        6%    7764Mi         32%
We can add more if needed.
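For a very rough idea of headroom, the figures above can be turned into an estimate of unused memory per node. This assumes the third and fourth columns are memory in use and the percentage of the node's memory that represents; the percentages are rounded, and current usage is not the same as what the scheduler considers allocatable, so treat the result as an order-of-magnitude guess.

```python
# (memory in use in Mi, memory in use as a percentage), copied from the figures above.
NODES = [(8404, 35), (16628, 69), (20539, 86), (16014, 67), (7764, 32)]


def estimate_free_mib(used_mib, used_pct):
    """Estimate unused memory by extrapolating the node total from the percentage."""
    total_mib = used_mib * 100 / used_pct
    return total_mib - used_mib


free_per_node = [estimate_free_mib(used, pct) for used, pct in NODES]
total_free_mib = sum(free_per_node)
```

Under those assumptions each node works out to roughly 24,000 Mi in total, with the third node being the tightest.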
We might get more of a sense of the needs by deploying this in staging...
Resource-wise it's not a small application (at least from my perspective :D
), but we believe it's a great value application which saves time for Red
Hat and community package maintainers.
Yeah, it's pretty awesome. ;)
I'd like to hear your feedback, questions, and requests for changes if
there is anything preventing it from deployment in Infra OpenShift
(architecture/code wise). And of course opinions on the feasibility of
moving Oraculum and Packager Dashboard into the Infra OpenShift cluster.
Thanks for working on this and bringing it up.