Hi,
I thought the notification delay mess was fixed. Apparently, I was wrong.
I just received this:
<snip>
Subject: corsepiu pushed 1 commit to rpms/perl-Sub-HandlesVia (rawhide)
Date: Sat, 9 Jul 2022 10:39:46 +0000 (GMT)
From: notifications@fedoraproject.org
...
Notification time stamped 2022-07-01 05:57:40 UTC
...
</snip>
Ralf
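The size of the delay in the report above can be read off the two timestamps in the snipped notification: the "Date:" header and the "time stamped" value. A minimal Python sketch, assuming both times are UTC as the snippet states:

```python
from datetime import datetime, timezone

# Timestamps copied from the snipped notification above (both UTC).
stamped = datetime(2022, 7, 1, 5, 57, 40, tzinfo=timezone.utc)     # "Notification time stamped"
delivered = datetime(2022, 7, 9, 10, 39, 46, tzinfo=timezone.utc)  # "Date:" header

delay = delivered - stamped
print(delay)  # 8 days, 4:42:06
```

In other words, this notification sat in the pipeline for a little over eight days before it was sent.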
On Sat, 9 Jul 2022 at 09:25, Ralf Corsépius rc040203@freenet.de wrote:
> Hi,
> I thought the notification delay mess was fixed. Apparently, I was wrong.
No, I believe the service behind these emails is called FMN. It is very fragile and falls over for different reasons all the time, which is why it is at the top of the list to be replaced by CPE this quarter (i.e., by October-ish). Until that happens, please be aware that these notifications are likely to arrive in bursts as things go up and down. I would also suggest turning off as many notifications as you can. That would help the load, as one of the largest email problems Fedora Infrastructure has is the many people who have turned on email for a lot of events.
> I just received this:
> <snip>
> Subject: corsepiu pushed 1 commit to rpms/perl-Sub-HandlesVia (rawhide)
> Date: Sat, 9 Jul 2022 10:39:46 +0000 (GMT)
> From: notifications@fedoraproject.org
> ...
> Notification time stamped 2022-07-01 05:57:40 UTC
> ...
> </snip>
> Ralf
> _______________________________________________
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-leave@lists.fedoraproject.org
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
> Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
On 09.07.22 at 15:36, Stephen Smoogen wrote:
> No, I believe the service behind these emails is called FMN. It is very fragile and falls over for different reasons all the time, which is why it is at the top of the list to be replaced by CPE this quarter (i.e., by October-ish). Until that happens, please be aware that these notifications are likely to arrive in bursts as things go up and down. I would also suggest turning off as many notifications as you can. That would help the load, as one of the largest email problems Fedora Infrastructure has is the many people who have turned on email for a lot of events.
Why don't you turn this stuff off globally and send the guys behind it back to the drawing board?
In its present shape it's just dysfunctional and not helpful at all.
Ralf
On Sat, 9 Jul 2022 at 11:18, Ralf Corsépius rc040203@freenet.de wrote:
> Why don't you turn this stuff off globally and send the guys behind it back to the drawing board?
> In its present shape it's just dysfunctional and not helpful at all.
I do apologize for this. Thank you for your feedback.
On Sat, Jul 09, 2022 at 05:16:55PM +0200, Ralf Corsépius wrote:
> Why don't you turn this stuff off globally and send the guys behind it back to the drawing board?
> In its present shape it's just dysfunctional and not helpful at all.
It is very much self-service, so you can easily turn off the notifications for your account if you wish to: https://apps.fedoraproject.org/notifications
Pierre
On Sat, Jul 09, 2022 at 09:36:14AM -0400, Stephen Smoogen wrote:
> No, I believe the service behind these emails is called FMN. It is very fragile and falls over for different reasons all the time, which is why it is at the top of the list to be replaced by CPE this quarter (i.e., by October-ish).
Just a quick note: I doubt the app will be rewritten and deployed in a single quarter; I expect this to be more of a multi-month effort. So while it is at the top of the priority list for CPE to work on, it'll still take a little time before we see a v4 (we've already restructured it 3 times).
Pierre
On Mon, 2022-07-11 at 09:35 +0200, Pierre-Yves Chibon wrote:
> Just a quick note: I doubt the app will be rewritten and deployed in a single quarter; I expect this to be more of a multi-month effort. So while it is at the top of the priority list for CPE to work on, it'll still take a little time before we see a v4 (we've already restructured it 3 times).
I think you should disable the service, or allocate someone to do something that mitigates the problem, or reboot the service every time it stops.
> Pierre
On Mon, 11 Jul 2022 at 06:22, Sérgio Basto sergio@serjux.com wrote:
> I think you should disable the service, or allocate someone to do something that mitigates the problem, or reboot the service every time it stops.
Like many services in Fedora, it is intertwined with other tools by a bunch of duct tape and baling wire that was going to be fixed when things slowed down. You can't just disable it, because then you cause other items inside and outside of Fedora to break. You can't just reboot it, because the problem is usually some other service which is causing the holdup.
There are days when I regret leaving Fedora Infrastructure, but it is email threads like this which remind me why I needed to.
Do we have any alternative notification interface other than IRC? What about pushing notifications to Matrix, XMPP, or another protocol/service?
I think IRC also often has not-so-short delays. Is there something we can do to improve the situation?
On 09. 07. 22 15:24, Ralf Corsépius wrote:
> Hi,
> I thought the notification delay mess was fixed. Apparently, I was wrong.
> I just received this:
> <snip>
> Subject: corsepiu pushed 1 commit to rpms/perl-Sub-HandlesVia (rawhide)
> Date: Sat, 9 Jul 2022 10:39:46 +0000 (GMT)
> From: notifications@fedoraproject.org
> ...
> Notification time stamped 2022-07-01 05:57:40 UTC
> ...
> </snip>
> Ralf
On Mon, 11 Jul 2022 at 09:36, Petr Menšík pemensik@redhat.com wrote:
> Do we have any alternative notification interface other than IRC? What about pushing notifications to Matrix, XMPP, or another protocol/service?
> I think IRC also often has not-so-short delays. Is there something we can do to improve the situation?
At this point the FMN code is getting rewritten, and any additional notification channels will need to be part of that rewrite. That said, IRC is actually one of the fastest ones we can push to. Each additional service adds slowdowns, as the communication requires accounts, special routers, people who actually use those services and know how they work, and various other infrastructure, with the only return being a ton of complaints whenever there is any problem with them.
Every person who uses notifications is an endpoint that has to be 'polled' by the notification system, and there can be a lot of overhead in doing so even when people think of it as a fire-and-forget bus. Every failed, slow, or bad endpoint slows down the entire system, which keeps growing over time. Email delivery problems from mailing lists are generally caused by the N hundred developers who decided they needed every notification as an email and have clogged up the outgoing queues, because every mail system decides that a mass rebuild is a spam attack. The same goes for other systems (IRC, etc.), where a large number of notifications does not look much different from a DoS.
While notifications are a great idea, they are also a hard problem. Please help the team who will be working on it.
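The point that one failed or slow endpoint slows down everything is the classic head-of-line blocking problem. The toy model below is not FMN's delivery code; it assumes deliveries happen one after another and uses made-up per-endpoint costs (the names and numbers are hypothetical) to show how a single slow endpoint delays everyone queued behind it, and how a per-attempt timeout bounds the damage.

```python
# Hypothetical per-endpoint delivery costs in milliseconds; "bob" is slow or broken.
costs_ms = {"alice": 200, "bob": 30_000, "carol": 200, "dave": 200}


def sequential_finish_times(costs_ms, timeout_ms=None):
    """Deliver to each endpoint one after another; record when each attempt ends.

    With timeout_ms set, an attempt is abandoned after timeout_ms (the message
    would then have to be retried later), which caps how long one endpoint can
    hold up everyone queued behind it.
    """
    clock = 0
    finished = {}
    for name, cost in costs_ms.items():
        clock += cost if timeout_ms is None else min(cost, timeout_ms)
        finished[name] = clock
    return finished


print(sequential_finish_times(costs_ms))                    # no timeout
print(sequential_finish_times(costs_ms, timeout_ms=2_000))  # 2 s cap per attempt
```

Without a timeout, "carol" and "dave" wait over 30 seconds for deliveries that take 0.2 seconds each, purely because of "bob". A timeout limits this, but abandoned deliveries then need retrying, which is extra load; delivering to endpoints concurrently is the usual next step.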
On 7/11/22 10:05, Stephen Smoogen wrote:
> At this point the FMN code is getting rewritten, and any additional notification channels will need to be part of that rewrite. That said, IRC is actually one of the fastest ones we can push to. Each additional service adds slowdowns, as the communication requires accounts, special routers, people who actually use those services and know how they work, and various other infrastructure, with the only return being a ton of complaints whenever there is any problem with them.
> Every person who uses notifications is an endpoint that has to be 'polled' by the notification system, and there can be a lot of overhead in doing so even when people think of it as a fire-and-forget bus. Every failed, slow, or bad endpoint slows down the entire system, which keeps growing over time. Email delivery problems from mailing lists are generally caused by the N hundred developers who decided they needed every notification as an email and have clogged up the outgoing queues, because every mail system decides that a mass rebuild is a spam attack. The same goes for other systems (IRC, etc.), where a large number of notifications does not look much different from a DoS.
> While notifications are a great idea, they are also a hard problem. Please help the team who will be working on it.
In the case of email, is it possible to offload everything to a mail server like Postfix? That should be able to deal with a huge queue without too many problems and without Fedora needing to write email handling code. Just send the email to the mail server and let the mail server deal with delivery.
For other services, one option would be a language (Erlang? Elixir?) designed for very high levels of concurrency and reliability. That might be a better choice than Python.
On Mon, 11 Jul 2022 at 18:02, Demi Marie Obenour demiobenour@gmail.com wrote:
> In the case of email, is it possible to offload everything to a mail server like Postfix? That should be able to deal with a huge queue without too many problems and without Fedora needing to write email handling code. Just send the email to the mail server and let the mail server deal with delivery.
Sorry, I was not clear about the complexity of the system. FMN does not handle email itself. It just sends it to Postfix, which then sends it to the outbound email servers (which also run Postfix). It is those queues which get backed up on the bastion servers and regularly end up with hundreds of thousands of queued-up FMN messages. That in turn holds up every other list, general, and other email, as every mail receiver that gets a firehose of emails from us puts us on a 'you have sent too many emails from this IP range, please wait 2 hours to resend'-type queue. This has gotten worse over the years as more and more email addresses have moved to Google, Microsoft, or whoever the third-largest email provider is these days. [You then spend a week getting complaints about no email reaching some university which decided to move its email to an outsourced provider, and find it's because too many other domains' mailboxes got full.]
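How a backlog plus a "please wait 2 hours" response turns into long delays for unrelated mail can be shown with a back-of-the-envelope model. This is a sketch under loudly stated assumptions, not a description of Postfix's actual scheduler: it assumes a single first-in-first-out queue per receiving provider and a fixed, hypothetical "accept N messages, then defer for 2 hours" policy. All numbers are illustrative.

```python
def hours_until_accepted(position: int, batch: int, defer_hours: int) -> int:
    """Hours until the message at 0-based `position` in a FIFO queue is accepted.

    Assumed receiver behaviour (illustrative only): accept `batch` messages,
    then defer everything else for `defer_hours` before accepting the next batch.
    """
    # Every full batch ahead of this message costs one deferral window.
    return (position // batch) * defer_hours


# Illustrative numbers only: 100,000 notifications are already queued for one
# provider, which accepts 1,000 messages and then says "wait 2 hours".
backlog = 100_000
list_post_wait = hours_until_accepted(backlog, batch=1_000, defer_hours=2)
print(f"{list_post_wait} hours (about {list_post_wait / 24:.1f} days)")
```

Under these assumptions, an ordinary list post queued behind the notification backlog waits 200 hours, which is why cutting the number of per-user notification emails helps everyone's mail, not just FMN's.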
> For other services, one option would be a language (Erlang? Elixir?) designed for very high levels of concurrency and reliability. That might be a better choice than Python.
I expect you know the following already, but this list does get read by people who are new to software and aren't sure of all the 'shared' caveats that experienced coders like you have in their heads when making that suggestion.
Changing languages is expensive. You need:
a) people who know those languages to code. [Having coders swap between languages is fine for some coders, but it tends to make people double their mistakes as they context-switch into 'oooh, I need to do it this way' mode.]
b) people who know those languages to debug/sysadmin. [Same context switching.]
c) volunteers who don't throw their hands up when faced with 'this box is running X and that box is running Y'.
There are not a lot of people who have volunteer time to work on infrastructure code; in my 14 years of working with Fedora, most volunteer code has ended up being rewritten/maintained by Red Hat staff, because the volunteer got it so far but then had to focus on real life again.
There are also not a lot of people who have volunteer time to work in infrastructure. There is a high trust bar to allowing people to touch servers which potentially could alter builds or code. That means doing 'crap' work for a long time until someone feels you can get into a sysadmin cycle which would allow various work to happen.
And there is a very limited budget for paying people to write code for infrastructure. These days it is generally 'cheaper' to outsource various things to some web company than to try to hire the minimum of 4-5 people needed to build a code base that is maintainable long term.
For this reason, the infrastructure group has stuck to a very limited set of languages. Every time we have had something written in another language, it worked great until the original person left. At that point, every change elsewhere in the infrastructure tends to make that code more and more fragile. We then try to find someone who can fix it... and then it ends up breaking worse. Eventually, if we have the time/ability, it is replaced with a Python version.
On Mon, 11 Jul 2022 at 15:06, Stephen Smoogen wrote:
> That said, IRC is actually one of the fastest ones we can push to.
Is https://apps.fedoraproject.org/notifications/ really still sending notifications to freenode though?
Hasn't everybody moved to libera?
crossposting to infra list to keep folks there in the loop...
On Tue, Jul 12, 2022 at 02:53:24PM +0100, Jonathan Wakely wrote:
> Is https://apps.fedoraproject.org/notifications/ really still sending notifications to freenode though?
> Hasn't everybody moved to libera?
yes. It's just incorrect/out of date there.
Let me sum up what I know and perhaps I can point people to this post later. :)
The current state is bad. We know it's bad. We have known for a long time that it's bad. It's bad for all of the following reasons:
* It's running a python2 codebase. Upstream development/PRs have long since moved to python3, but that's not the version we currently have deployed.
* It's heavily tied to the old account system. Several of us spent many hours last week trying to untangle it. I think we might be getting close now, but it's really hard to tell.
* It's pretty heavily tied to fedmsg (not fedora-messaging). fedmsg still works of course, but it's another layer of confusion.
* Its rules/interface let you do all kinds of cool things, but it's complex and confusing to almost everyone who tries to use it.
* It's running on an end-of-life OS version. ;(
* In order to try to scale, it has a number of layers, which makes it hard to debug. For example, it uses Redis, its own RabbitMQ instance with a bunch of queues, multiple workers talking to all those things, and multiple backends for IRC and email. You might think performance shouldn't be a big deal, but when you have thousands of users, each with their own custom ruleset, you potentially have to match an incoming message against every single ruleset of every user, and you have to do it fast enough to keep up with the incoming flow of messages and the outgoing flow of notifications. :(
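To make the matching cost concrete, here is a deliberately simplified Python sketch. It assumes a made-up rule model (a rule is just an exact topic plus an optional package; the topic strings are hypothetical) and is not FMN's code; FMN's real rules are much richer. The naive function does what the last bullet describes, testing a message against every rule of every user. The indexed variant shows one common way to cut that work: group rules by topic in advance so a message is only checked against rules that could possibly match.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Rule(NamedTuple):
    topic: str              # hypothetical: exact message topic to match
    package: Optional[str]  # hypothetical: one package, or None for "any package"


def rule_matches(rule: Rule, message: Dict[str, str]) -> bool:
    return rule.topic == message["topic"] and rule.package in (None, message["package"])


def recipients_naive(message, rulesets: Dict[str, List[Rule]]) -> Set[str]:
    """Test the message against every rule of every user: cost grows with total rules."""
    return {user for user, rules in rulesets.items()
            if any(rule_matches(r, message) for r in rules)}


def build_index(rulesets: Dict[str, List[Rule]]) -> Dict[str, List[Tuple[str, Rule]]]:
    """Group rules by topic once, so a message only meets rules for its own topic."""
    index: Dict[str, List[Tuple[str, Rule]]] = defaultdict(list)
    for user, rules in rulesets.items():
        for r in rules:
            index[r.topic].append((user, r))
    return index


def recipients_indexed(message, index) -> Set[str]:
    return {user for user, r in index.get(message["topic"], []) if rule_matches(r, message)}


rulesets = {
    "alice": [Rule("git.receive", "perl-Sub-HandlesVia")],
    "bob": [Rule("git.receive", None), Rule("bodhi.update", None)],
    "carol": [Rule("bodhi.update", "kernel")],
}
msg = {"topic": "git.receive", "package": "perl-Sub-HandlesVia"}
print(sorted(recipients_naive(msg, rulesets)))                # ['alice', 'bob']
print(sorted(recipients_indexed(msg, build_index(rulesets))))  # ['alice', 'bob']
```

Indexing only helps for rule fields that can be looked up exactly; arbitrary user-defined filters still have to be evaluated one by one, which is one reason very flexible rules are costly at this scale.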
Short term, I hope that the python2/current version can mostly catch up (it's currently 4 days behind). Once it does, I would very much like to try switching to the python3 version. It has a lot of the same problems as the existing one, but it should also have some advantages, like running on a supported OS, being much easier to release and deploy new versions, etc.
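Whether a backlog like "4 days behind" ever clears depends on how much faster messages are processed than new ones arrive. A back-of-the-envelope sketch, assuming a steady arrival rate and a constant processing rate (neither is given in the thread; only the 4 days comes from it): the backlog drains at the difference between the two rates, so the catch-up time is the backlog divided by that difference.

```python
def days_to_catch_up(days_behind: float, speedup: float) -> float:
    """Days needed to clear a backlog, assuming a steady arrival rate.

    days_behind: backlog measured in days' worth of arriving messages.
    speedup: processing rate divided by arrival rate (must be > 1, otherwise
             the backlog never shrinks).
    The backlog drains at (speedup - 1) times the arrival rate, so the time
    needed is days_behind / (speedup - 1).
    """
    if speedup <= 1:
        raise ValueError("backlog never drains unless processing outpaces arrivals")
    return days_behind / (speedup - 1)


for s in (1.25, 1.5, 2.0):
    print(f"processing {s}x the arrival rate: {days_to_catch_up(4, s):.1f} days")
```

Even with workers running 50% faster than messages arrive, a 4-day backlog would take about 8 more days to clear under these assumptions, and any new outage adds to it.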
Longer term, we are just now starting an initiative to rewrite FMN from the ground up. It's going to be a while until it's ready, but I really hope it will be much better for everyone involved: a better/easier interface, much better handling of messages, etc.
Hope that helps clarify things...
kevin
On Mon, Jul 11, 2022 at 03:36:08PM +0200, Petr Menšík wrote:
> Do we have any alternative notification interface other than IRC? What about pushing notifications to Matrix, XMPP, or another protocol/service?
There were other channels, like email, but they don't work either. I seem to have received my last notification by email in the middle of November 2021.