External access to the AMQP broker


Aurelien Bompard

Wednesday, 27 February 2019

3:49 a.m.

Hey y'all,

Fedora Messaging, the replacement for fedmsg, is using AMQP and thus a message broker. The current clusters we have deployed in staging and prod are only accessible from inside our infrastructure.

There are two needs for an externally accessible broker:
- the CentOS folks, who are outside of our infrastructure, would like to send messages
- people from the community would like to subscribe to messages and do things based on them

We have several options to make that happen.

1. Use our existing cluster and expose it to the world

The advantage is we don't maintain another cluster, but the downside is that in the case of a DoS attack we're directly affected. With RabbitMQ 3.7 there are some limits[0] you can set on vhosts (max connections and max queues), but we're not yet on 3.7.

[0] https://www.rabbitmq.com/vhosts.html#limits

2. Use a separate cluster and copy messages over

We could deploy a separate cluster that would get a copy of all messages, and would be more limited in resources. It truly isolates infrastructure, so it's better protected against DoS, but it's more work for sysadmins.

In both cases, there are several paths we can take with regard to authentication.

A: make a single read-only account for everybody in the community to use, and a few read-write accounts (with X509 certs) for people who need to publish, i.e. CentOS CI. If we choose a separate broker we can copy those messages back to the main cluster.
The issue here is that everybody in the community will be using the same account, so it's harder to shut down bad actors. It would also be theoretically possible for someone to consume from somebody else's queue (unless people make sure they use UUIDs in queue names; I think we can enforce that but it may have side effects).
However, it enables the same kind of usage that fedmsg provided before.

B: require authentication with username & password but make it easy to get accounts. People could request accounts via tickets, for example.
It will make it much harder to abuse the service, and we could easily shut down bad actors. However, it's an obviously heavier load on the people who will handle the tickets and create the accounts.

My personal preference would be option 2A, so an external broker with an anonymous read-only account, but all combinations of options inflict different loads on the sysadmins (on deployment and in the longer term), so I think it's really up to them.

What do you guys think?

Thanks
Aurélien
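For readers who want to see what the vhost limits and UUID queue names mentioned in this message could look like, here is a minimal Python sketch. It only builds the argument list for `rabbitmqctl set_vhost_limits` (available from RabbitMQ 3.7) and does not run it. The vhost name `/public` and the numeric limits are placeholders chosen for illustration; they do not come from this thread.

```python
import json
import uuid


def vhost_limits_command(vhost, max_connections, max_queues):
    """Build the argv for `rabbitmqctl set_vhost_limits` (RabbitMQ 3.7+)."""
    limits = {"max-connections": max_connections, "max-queues": max_queues}
    return ["rabbitmqctl", "set_vhost_limits", "-p", vhost, json.dumps(limits)]


def unique_queue_name():
    """Return a random UUID4 string to use as a hard-to-guess queue name."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    # "/public", 500 and 2000 are placeholder values, not taken from the thread.
    print(" ".join(vhost_limits_command("/public", 500, 2000)))
    print(unique_queue_name())
```

A random queue name only makes a queue hard to guess for other users of a shared account; it is not an access control mechanism.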


Clement Verna

Wednesday, 27 February

4:16 a.m.

On Wed, 27 Feb 2019 at 10:57, Aurelien Bompard <abompard(a)fedoraproject.org> wrote:

...

What would it take to get 3.7? Could we react (shutting down the queues?) to a DoS early with monitoring? What would it take for someone to DoS our cluster, basically how easy would it be?

...


I am generally not a fan of creating more work for a hypothetical DoS. In my opinion we should try to mitigate as much as possible the effect of a possible DoS.

Just my 0.02 $ since I have very little knowledge about RabbitMQ ;-)

Stephen John Smoogen

6:24 a.m.

On Wed, 27 Feb 2019 at 05:17, Clement Verna <cverna(a)fedoraproject.org> wrote:

...


I expect that even with 3.7 the limits just make it so we don't shut down but find that anything relying on messaging isn't working. So if gating, bodhi, and pkgs rely on messages from fedora-messaging to do tasks.. we are basically not working until the DoS is over.

...


I can say that there are quite a few people out there who look for someone uttering "hypothetical DoS" to prove to them one will exist. So now that you have done so.. we should assume we will have one and plan on how to deal with it.


-- Stephen J Smoogen.

Clement Verna

6:56 a.m.

On Wed, 27 Feb 2019 at 13:32, Stephen John Smoogen <smooge(a)gmail.com> wrote:

...


I am all for assuming this will happen, but I am also all for considering what the impact would be and how we could mitigate it. For example, how easy would it be to turn off the possibility for external publishers to flood the broker? Could this be automated from a nagios alert? Can we configure the queues that are critical to have higher priority than the external ones? If we have one public broker with authentication, can we easily kill the accounts that are flooding us? What are the consequences of the service being down? What is an acceptable downtime: 1 min, 1 h, 1 day, 1 week, 1 month?

I have the feeling that answering these questions would help in making an informed decision.

Stephen John Smoogen

7:58 a.m.

On Wed, 27 Feb 2019 at 07:57, Clement Verna <cverna(a)fedoraproject.org> wrote:

...

> > I can say that there are quite a few people out there who look for
> > someone uttering "hypothetical DoS" to prove to them one will exist.
> > So now that you have done so.. we should assume we will have one and
> > plan on how to deal with it.

...

> I am all for assuming this will happen but I am also all for
> considering what would be the impact and how we could mitigate it. For
> example how easy would it be to turn off the possibility for external
> publishers to flood the broker ? Could this be automated from a nagios
> alert ? Can we configure the queues that are critical to have higher

Usually monitoring is only going to see it way after it has started. It may also be on the bus with the new monitoring consuming and deploying things there [because everyone loves buses.] There are

> priority than the external ones ? If we have one public broker with
> authentication can we easily kill the accounts that are flooding us ?
> What are the consequences of the service being down ? What is an
> acceptable down time 1 min, 1h , 1 day , 1 week, 1 month ?

...

These days.. people seem to think 1 minute of downtime is way too long for anything we do. With the faster artifact creation, we probably want to make sure we are 24x7x52 with large amounts of bus traffic.

I think in analogies, so here is how I am looking at it. We have cities of data and we have buses running through the city. One design has every bus go to every city in the world just in case there is a passenger who might want to get on. And if someone decides to send a million buses into our city, there isn't anything we can do. The same goes for the other cities we are asking to join our city transit system.

[This may sound incredibly daft, but that has sort of happened in the past with early transit systems.. when anyone could put a train on the early train systems... people did.. Another time someone saw that the rules said a bus system was supposed to arrive at every stop, so they had to send buses around places, even to other towns, which delayed things. Modern city congestion is mostly everyone having a car and dropping it on the road because hey, everyone else is and no one is going to notice my car.]

Smart transit designs assume that you want to control traffic for various reasons at certain points. It can be everything from security to capacity to latency. I would prefer to have a train system between us and CentOS and Brno and India versus finding out that we have timeouts somewhere because a service assumed a local bus and the response from PHX2 to Pune took too long.

Does that make sense?

> I have the feeling that answering these questions would help in taking
> an informed decision.

...

-- Stephen J Smoogen.

Kevin Fenzi

Wednesday, 6 March

2:46 p.m.

So, we talked about this today and came up with a tentative first plan:

* We are going to open 2 ports on our proxy01/10 (the ones in phx2).
* We are going to have haproxy forward incoming connections on those to the rabbitmq cluster.
* One of them (5671), the normal rabbitmq port, will be used for read-only access.
* The other will be used for applications that need to send messages as well as receive them.
* We will configure rabbitmq to use a different vhost with different settings for the readonly people.

Additionally, if there are issues with that, we can redirect haproxy to point to somewhere else down the road.

Hopefully that works for everyone.

kevin
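To illustrate what this plan could mean for a client, the Python sketch below builds a connection URL for the read-only port. Only the port number 5671 comes from the message above. The host name, the vhost name and the use of the `amqps` (TLS) scheme are assumptions made for the example, and the second port is left as a parameter because the plan does not name it.

```python
from urllib.parse import quote

READ_ONLY_PORT = 5671  # the read-only port named in the plan


def broker_url(host, port, vhost, user=None):
    """Build an AMQP-over-TLS URL; the vhost is percent-encoded as AMQP URIs require."""
    auth = quote(user, safe="") + "@" if user else ""
    return "amqps://{}{}:{}/{}".format(auth, host, port, quote(vhost, safe=""))


if __name__ == "__main__":
    # "broker.example.org" and "/public" are hypothetical placeholders.
    print(broker_url("broker.example.org", READ_ONLY_PORT, "/public"))
```

Percent-encoding matters because a vhost such as `/public` contains a slash, which would otherwise be read as a path separator in the URL.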

Aurelien Bompard

Wednesday, 27 February

11:18 a.m.

I'm assuming you're considering the solution where we have a single broker and we make it publicly accessible (option 1).

...

How easy would it be to turn off the possibility for external publishers to flood the broker?

External clients won't publish anything, they'll be read-only (with a few exceptions like the CentOS CI folks). However, they can create a huge number of queues, subscribe to everything and never consume anything.

We can mitigate that by setting up another vhost (in the cluster we already have) for external clients, limiting the number of queues on that vhost, and enforcing a time to live on messages in the queues. It'll require some fine tuning, though, and external clients will still be able to DoS other external clients if we don't do authentication (option A).

I value option 2 (separate broker) higher than option 1 (same broker) because I'm not entirely sure those limits can prevent every kind of DoS on the broker. Attackers are creative. It's easier to make sure the resources used by a 2nd cluster don't impact the resources of the 1st cluster.
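The message time to live mentioned above can be enforced with a RabbitMQ policy. As a rough sketch, the Python below builds (but does not run) a `rabbitmqctl set_policy` argument list that applies a `message-ttl` to every queue in a vhost. The vhost name, policy name and TTL value are placeholders for illustration, not settings from this thread.

```python
import json


def message_ttl_policy_command(vhost, ttl_ms, policy_name="external-ttl"):
    """Build the argv for a `rabbitmqctl set_policy` call that caps message age."""
    definition = {"message-ttl": ttl_ms}  # value is in milliseconds
    return [
        "rabbitmqctl", "set_policy",
        "-p", vhost,
        "--apply-to", "queues",
        policy_name,  # hypothetical policy name
        ".*",         # regular expression matching every queue in the vhost
        json.dumps(definition),
    ]


if __name__ == "__main__":
    # "/public" and one hour are placeholder values.
    print(" ".join(message_ttl_policy_command("/public", 60 * 60 * 1000)))
```

A policy is used here rather than per-queue arguments because the broker operator controls policies, while queue arguments are chosen by the client that declares the queue.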

...

Can we configure the queues that are critical to have higher priority than the external ones?

Yes, by using a different vhost for internal (and CentOS) stuff and external stuff, and replicating messages from the internal to the external vhost.
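The thread does not say how messages would be replicated between vhosts. One possibility is RabbitMQ's shovel plugin, and the Python sketch below builds a dynamic shovel definition under that assumption. The vhost names, the shovel name and the `amq.topic` exchange are all hypothetical choices for the example.

```python
import json
from urllib.parse import quote


def shovel_command(name, src_vhost, dest_vhost, exchange="amq.topic"):
    """Build the argv declaring a dynamic shovel between two vhosts on the local node."""

    def local_uri(vhost):
        # An AMQP URI without a host; the vhost must be percent-encoded.
        return "amqp:///" + quote(vhost, safe="")

    value = {
        "src-uri": local_uri(src_vhost),
        "src-exchange": exchange,
        "src-exchange-key": "#",  # topic wildcard: copy every message
        "dest-uri": local_uri(dest_vhost),
        "dest-exchange": exchange,
    }
    return ["rabbitmqctl", "set_parameter", "-p", src_vhost,
            "shovel", name, json.dumps(value)]


if __name__ == "__main__":
    # All names below are placeholders, not values from the thread.
    print(" ".join(shovel_command("internal-to-public", "/internal", "/public")))
```

Federation would be another way to copy messages across; the shovel is shown only because its definition is compact.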

...

If we have one public broker with authentication, can we easily kill the accounts that are flooding us?

Yes, that's the main advantage of option B.

...

What are the consequences of the service being down? What is an acceptable downtime: 1 min, 1 h, 1 day, 1 week, 1 month?

I would say that the internal messaging service needs high availability, while the SLA for the external service can be lower. That's also a reason for me preferring option 2.

I hope that clarifies a bit.

Aurélien

Clement Verna

1:10 p.m.

On Wed, 27 Feb 2019 at 18:27, Aurelien Bompard <abompard(a)fedoraproject.org> wrote:

...


Yes it does, thanks for the answers :-) My overall feeling is that the risk of DoS should be one of the factors we take into account to make the decision, but we should also consider how easy it is to use, how easy it is to maintain, and how much effort it is to set up. I feel that we are focusing on the risk of DoS as the main factor to favour one option over the other, and I am not sure this is right, but that's my personal feeling and I am happy to be wrong on that.

On the SLA, I really think that in the case of a DoS attack we would not have much trouble communicating to the community that we are facing an attack and the service will be down or degraded for X days. Overall I think we should start being OK with taking the risk of having our services down for multiple hours, days, ... if that allows us to save on the daily maintenance burden.

Again just my 2 cents on the subject, so feel free to ignore it :-)

Aurelien Bompard

Thursday, 28 February

4:18 a.m.

...

my overall feeling is that the risk of DoS should be one of the factors we take into account to make the decision, but we should also consider how easy it is to use, how easy it is to maintain, and how much effort it is to set up.

I agree, and since both burdens (daily maintenance and dealing with DoS) are going to fall on the shoulders of the sysadmins, that's why I'd rather let them order the evils and choose the lesser ;-) Aurélien

Clement Verna

4:49 a.m.

On Thu, 28 Feb 2019 at 11:26, Aurelien Bompard <abompard(a)fedoraproject.org> wrote:

...


Sure that makes sense ;-)

Jeremy Cline

9:04 a.m.

On 2/27/19 2:10 PM, Clement Verna wrote:

...

Yes it does, thanks for the answers :-) My overall feeling is that the risk of DoS should be one of the factors we take into account to make the decision, but we should also consider how easy it is to use, how easy it is to maintain, and how much effort it is to set up. I feel that we are focusing on the risk of DoS as the main factor to favour one option over the other, and I am not sure this is right, but that's my personal feeling and I am happy to be wrong on that.

I think you are under-estimating how often denial-of-service attacks happen, especially in situations where you only need to have kilobytes of bandwidth to start causing trouble. People do it just because they think it's funny and I don't think it's a matter of *if* someone does it, it's just a matter of when. It'd take a couple minutes to create a few hundred thousand queues, eating through broker resources until no one else can do anything. From a user perspective, both options are identical (except for the possibility of authentication being on). It's really just a question of whether the effort of maintaining a second cluster is worth the increased isolation.

...

On the SLA, I really think that in the case of a DoS attack we would not have much trouble communicating to the community that we are facing an attack and the service will be down or degraded for X days. Overall I think we should start being OK with taking the risk of having our services down for multiple hours, days, ... if that allows us to save on the daily maintenance burden. Again just my 2 cents on the subject, so feel free to ignore it :-)

If you're still considering the single broker setup with this approach, the cost of being down is that everything grinds to a halt. Package signing is message driven. CI/CD is message driven. Pretty much everything relies on messages. A big CVE gets announced, and then someone attacks the messaging infrastructure to hinder getting the package built, signed, and shipped. I agree that there are plenty of services that are fine with outages of hours or even days (and they can recover if they use messages because it'll all still be queued!), but the message broker isn't one we should allow users to take down. - Jeremy

Clement Verna

10:15 a.m.

On Thu, 28 Feb 2019 at 16:04, Jeremy Cline <jeremy(a)jcline.org> wrote:

...

On 2/27/19 2:10 PM, Clement Verna wrote:
> On Wed, 27 Feb 2019 at 18:27, Aurelien Bompard
> <abompard(a)fedoraproject.org> wrote:
>>
>> I'm assuming you're considering the solution where we have a single
>> broker and we make it publicly accessible (option 1).
>>
>>> how easy would it be to turn off the possibility for external
>>> publisher to flood the broker ?
>>
>> External clients won't publish anything, they'll be read-only (with a
>> few exceptions like the CentOS CI folks). However they can create a
>> huge amount of queues, subscribe to everything and never consume
>> anything.
>> We can mitigate that by setting up another vhost (in the cluster we
>> already have) for external clients, limit the number of queues on that
>> vhost, and enforce a time to live on messages in the queues. It'll
>> require some fine tuning, though, and external clients will still be
>> able to DoS other external clients if we don't do authentication
>> (option A).
>
>> I value option 2 (separate broker) higher than option 1 (same broker)
>> because I'm not entirely sure those limits can prevent any kind of DoS
>> on the broker. Attackers are creative. It's easier to make sure the
>> resources used by a 2nd cluster don't impact the resources of the 1st
>> cluster.
>>
>>> Can we configure the queues that are critical to have higher
>>> priority to the external ones ?
>>
>> Yes, by using a different vhost for internal (and CentOS) stuff and
>> external stuff, and replicating messages from the internal to the
>> external vhost.
>>
>>> If we have on public broker with authentication can we easily kill the accounts that are flooding us ?
>>
>> Yes, that's the main advantage of option B.
>>
>>> What are the consequences of the service been down ? What is an
>>> acceptable down time 1 min, 1h , 1 day , 1 week, 1 month ?
>>
>> I would say that the internal messaging service needs a high
>> availability, while the SLA for the external service can be lower.
>> That's also a reason for me prefering option 2.
>>
>> I hope that clarifies a bit.
>
> Yes it does thanks for the answers :-), my overall feeling is that the
> risk of DoS should be one of the factor we take into account to make
> the decision but we should also consider how easy is it to use, how
> easy is it to maintain, how much effort is it to setup. I feel that
> are focusing on the risk of DoS as the main factor to favour one
> option against the other and I am not sure this is right but that 's
> my personal feeling and I am happy to be wrong on that.

I think you are under-estimating how often denial-of-service attacks happen, especially in situations where you only need to have kilobytes of bandwidth to start causing trouble. People do it just because they think it's funny and I don't think it's a matter of *if* someone does it, it's just a matter of when. It'd take a couple minutes to create a few hundred thousand queues, eating through broker resources until no one else can do anything.

I am not under-estimating it; it was just not obvious to me how easy it would be to cause a DoS. If you say that this is trivial, then yes, it makes sense to worry about it.

...

From a user perspective, both options are identical (except for the possibility of authentication being on). It's really just a question of whether the effort of maintaining a second cluster is worth the increased isolation.

>
> On the SLA I really think that in the case of DoS attack we would not
> have much trouble communicating with the community that we are facing
> an attack and the service will be down or deprecated for X days.
> Overall I think we should start being OK with taking the risk to have
> our services down for multiple hours, days, ... if that allow us to
> save on the daily maintenance burden.
>
> Again just my 2 cents on the subject, so feel free to ignore it :-)

If you're still considering the single broker setup with this approach, the cost of being down is that everything grinds to a halt. Package signing is message driven. CI/CD is message driven. Pretty much everything relies on messages. A big CVE gets announced, and then someone attacks the messaging infrastructure to hinder getting the package built, signed, and shipped. I agree that there are plenty of services that are fine with outages of hours or even days (and they can recover if they use messages because it'll all still be queued!), but the message broker isn't one we should allow users to take down.

Sure, I am also trying to make us realize that this is a community service and that in most cases aiming for enterprise-level support and availability is not needed (also, we don't have the resources for that). So this might not apply in this case, but I think it was important to bring it forward.

>
> - Jeremy

infrastructure@lists.fedoraproject.org

participants (5)

Aurelien Bompard
Clement Verna
Jeremy Cline
Kevin Fenzi
Stephen John Smoogen
