On Mon, 11 Jul 2022 at 18:02, Demi Marie Obenour <demiobenour@gmail.com> wrote:

>> While notifications is a great idea, it is also a hard problem. Please help
>> the team who will be working on it.
>
> In the case of email, is it possible to offload everything to a mail
> server like Postfix?  That should be able to deal with a huge queue
> without too many problems and without Fedora needing to write email
> handling code.  Just send the email to the mail server and let the
> mail server deal with delivery.


Sorry, I was not clear about the complexity of the system. FMN does not handle email itself. It just hands the mail to Postfix, which then sends it to the outbound email servers (which are also Postfix). It is those queues that get backed up on the bastion servers, which regularly end up with hundreds of thousands of queued FMN messages. That backlog also holds up every other mailing list and general email, because every mail receiver that gets a firehose of email from us puts us in a 'you have sent too many emails from this IP range, please wait 2 hours to resend' type of queue. This has gotten worse over the years as more and more email addresses have moved to using Google, Microsoft, or whoever the third largest email sender is these days. [You then spend a week getting complaints about no email reaching some university which decided to move its email to an outsourced email provider, and find it's because too many mailboxes in other domains got full.]
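
To make the backlog concrete for people newer to this, here is a toy model in Python. It is only a sketch under made-up assumptions, not how Postfix or FMN actually behave: the function name, the per-tick accept limit, and the fixed block period are all hypothetical stand-ins for the 'please wait 2 hours' behaviour described above.

```python
def simulate_backlog(arrivals, accept_limit, block_ticks, ticks):
    """Return the queue depth after each tick of a toy outbound mail queue.

    Hypothetical model: `arrivals` new messages are queued every tick.
    The receiver accepts at most `accept_limit` messages per tick. If we
    still have mail queued after hitting that limit, the receiver refuses
    everything from us for the next `block_ticks` ticks.
    """
    queued = 0
    blocked_until = 0  # first tick at which the receiver accepts mail again
    depths = []
    for tick in range(ticks):
        queued += arrivals
        if tick >= blocked_until:
            delivered = min(queued, accept_limit)
            queued -= delivered
            if queued > 0:
                # We tried to send more than the receiver tolerates,
                # so it defers us for a while.
                blocked_until = tick + 1 + block_ticks
        depths.append(queued)
    return depths


if __name__ == "__main__":
    # Below the limit: the queue drains every tick.
    print(simulate_backlog(arrivals=10, accept_limit=100, block_ticks=2, ticks=5))
    # Above the limit: each block period adds to the backlog.
    print(simulate_backlog(arrivals=150, accept_limit=100, block_ticks=2, ticks=4))
```

In this model the mail server never loses anything, but once arrivals exceed what the receiver will take, every block period makes the queue deeper, and anything else going to that receiver sits behind it.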

 
> For other services, one option would be a language
> (Erlang? Elixir?) designed for very high levels of concurrency and
> reliability.  That might be a better choice than Python.

I expect you know the following already, but this list does get read by people who are fresh to software and aren't sure of all the 'shared' caveats experienced coders like you have in your head when you made that suggestion. 

Changing languages is expensive. You need:
a) people who know those languages to write the code. [Having coders swap between languages is fine for some coders, but it tends to make people double their mistakes as they context switch: 'oooh, I need to do it this way here'.]
b) people who know those languages to debug and sysadmin it. [Same context switching.]
c) volunteers who don't throw their hands up when faced with 'this box is running X and this box is running Y'.

There are not a lot of people who have volunteer time to work on infrastructure code; in my 14 years of working with Fedora, most of the volunteer code has ended up being rewritten or maintained by Red Hat staff because the volunteer got it only so far before having to focus on real life again.

There are also not a lot of people who have volunteer time to work in infrastructure. There is a high trust bar to allowing people to touch servers which potentially could alter builds or code. That means doing 'crap' work for a long time until someone feels you can get into a sysadmin cycle which would allow various work to happen.

And there is a very limited budget for paying people to write code for infrastructure. These days it is generally 'cheaper' to outsource various things to some web company than to try to hire the minimum of 4-5 people needed to build a code base that is maintainable long term.

For this reason, the infrastructure group has stuck to a very limited set of languages. Every time we have had something written in another language, it has worked great until the original author left. At that point, every change elsewhere in the infrastructure tends to make that code more and more fragile. We then try to find someone who can fix it, and it ends up breaking worse. Eventually, if we have the time and ability, it is replaced with a Python rewrite.

--
Stephen Smoogen, Red Hat Automotive
Let us be kind to one another, for most of us are fighting a hard battle. -- Ian MacClaren