So, I sent an email a while back about this to get people thinking, but
I didn't get much feedback on my questions, so this time I am going to
actually outline a proposal for people to look at. ;)
Currently, users expect pretty much any public service we have to be
fully supported. This means things like updating status when it's down
and working anytime something is down to fix it as quickly as we can.
New applications/services currently all pass through the (somewhat
long) RFR process, which we set up to make sure we could support the
service moving forward.
This is great and all, but some services just aren't as sustainable, or
don't really fit into our RFR process very well. Also, our RFR process
makes us pretty slow to bring a new service online properly.
In order to have support levels, we need a way to communicate them to
our users easily, and the best way I can think of to do that is via
domain name. If we try to have a table or something, it could get
pretty confusing for people. Tying it to domain names would make it
much easier.
The domain alone would be a good indicator for us internally, but maybe
we can also get the design people to provide us with banners or
different versions of the logo to put on stg/dev/cloud/... instances, so
that we also make it clear inside the applications in a consistent
manner.
So:
fedoraproject.org - Anything with this domain is something that has
passed through our RFR process and we support fully. This means we
update status, we alert on them anytime they have issues, we work on
them anytime they are down, etc.
Maybe clearly indicate cloud.fp.o (and some others probably) as exceptions
to this rule.
getfedora.org - Same level as fedoraproject.org.
fedorainfracloud.org - This comes with a lesser level of support,
simply because our cloud doesn't have any kind of HA setup, so services
in this domain will be down during scheduled cloud maintenance or when
there are problems. We monitor, but don't page off hours, and we may
work on issues only during business hours, etc. Services here may not
have passed through our RFR process (perhaps we should have a parallel
cloud process).
cloud.fedoraproject.org - Same level as fedorainfracloud.org.
stg.fedoraproject.org - These can be down anytime. We monitor them, but
may not work on them off hours, etc.
some other domain that sounds Fedora related (fedorarelated.org?
fedoralinks.org? ?) - These are things that are Fedora related, but not
fully controlled by Fedora infrastructure. Things like the Fedora
bootstrap site or the porting python3 in Fedora site, or possibly cloud
instances that aren't managed by us. We don't monitor these or post
status for them, and we direct people to contact the managers directly.
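
To make the domain-to-support-level idea above concrete, here is a
minimal Python sketch. The tier names ("full", "reduced", "staging",
"unmanaged") and the support_tier helper are made up for illustration,
and fedorarelated.org is only one of the candidate names floated above.
It picks the longest matching domain suffix, so stg.fedoraproject.org
and cloud.fedoraproject.org override the general fedoraproject.org rule.

```python
# Hypothetical mapping of domain suffixes to support tiers.
# Tier names are illustrative only; nothing here is agreed policy.
SUPPORT_TIERS = {
    "fedoraproject.org": "full",
    "getfedora.org": "full",
    "fedorainfracloud.org": "reduced",
    "cloud.fedoraproject.org": "reduced",
    "stg.fedoraproject.org": "staging",
    "fedorarelated.org": "unmanaged",  # placeholder; no name has been chosen
}


def support_tier(hostname):
    """Return the tier for the longest matching domain suffix, or None."""
    host = hostname.lower().rstrip(".")
    best = None
    for suffix in SUPPORT_TIERS:
        # Match the domain itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            if best is None or len(suffix) > len(best):
                best = suffix
    return SUPPORT_TIERS[best] if best else None
```

A lookup like this would only be a convenience for tooling (banners,
status pages); the domain name itself stays the thing users see.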
Any other types of sites / domains people can think of?
Where do hosted, people and planet fall in?
I would say these are production as well, and same as fp.o.
Any general thoughts on the idea?
Beyond the indications to users, how about defining "SLA levels", or
whatever we want to call them, and displaying the above rules in a
table, for easy grokking by other people?
Something like:
            | Status | Monitored | Paged | Off-hours
Production  |   X    |     X     |   X   |     -
Staging     |   -    |     X     |   -   |     -
Or however we want to fill this in exactly; this is just a quick
example, and we might, for instance, give the levels different names or
the like.
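
A table like that could also be generated from a single definition, so
the levels and their documentation don't drift apart. A small Python
sketch follows; the Level type and render function are made up, reading
the "Status" column as "status updates are posted" is an assumption, and
the X/- values are just the quick example from above, not agreed policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    status: bool     # assumed meaning: status updates are posted
    monitored: bool
    paged: bool
    off_hours: bool  # worked on outside business hours


# Values copied from the quick example table; not agreed policy.
LEVELS = [
    Level("Production", True, True, True, False),
    Level("Staging", False, True, False, False),
]


def render(levels):
    """Render the levels as a fixed-width plain-text table."""
    rows = [f"{'':<12}| Status | Monitored | Paged | Off-hours"]
    for lv in levels:
        marks = ["X" if flag else "-"
                 for flag in (lv.status, lv.monitored, lv.paged, lv.off_hours)]
        rows.append(
            f"{lv.name:<12}| {marks[0]:^6} | {marks[1]:^9} "
            f"| {marks[2]:^5} | {marks[3]:^9}"
        )
    return "\n".join(rows)


print(render(LEVELS))
```

Adding a level (say, one for the cloud domains) would then be a single
new entry in LEVELS.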
kevin
With kind regards,
Patrick Uiterwijk
Fedora Infra