pkgdb2 post-mortem and strategy for future deployments

Fri Jun 6 17:26:00 UTC 2014

On Wed, 4 Jun 2014 11:44:54 -0700
Toshio Kuratomi <a.badger at gmail.com> wrote:

> 
> This came up in a different venue and pingou and I have continued to
> talk about it.  Seemed that this was the right place to bring the
> discussion though.

...snip...

> Some ideas for doing major deployments in the future:
> 
> 1: We have to make people aware when a new deployment means API
> breaks.
>   * In every call for testing, be clear that the new deployment
> means API breaks.  Send announcements to the infrastructure list and,
> depending on the service, to the devel list.
>   * Have a separate announcement, besides the standard outage
> notification, that says an API-breaking update is planned for
> $date
>   * When we set a date for the new deployment, discuss it at least
> once in a weekly infrastructure meeting.
>   * See also the solution in #3 below

This seems good to me. 

> 2: It would be really nice for people to do more testing in stg.
>   * Increase rube coverage.  rube does end-to-end testing, so it's
> better than unittests (which try to be small and self-contained) at
> catching cross-app issues caused by API changes
>     - A flock session where everyone/dev in infra gets to write one
> rube test so we get to know the framework

Yeah, that sounds good. Perhaps a badge for 'added a rube test' ? :) 

>   * Run rube daily
>     - Could we run rube in an Xvfb on an infrastructure host?
>   * Continue to work towards a complete replica of production in the
> stg environment.

Yeah, if we can figure out a clean way to run it. It also needs some
credentials for some of the tests, so we would need to make a test
user, etc. 
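
A minimal sketch of what a daily run could look like, assuming the
xvfb-run wrapper is available on the host. The test command and the
credential variable names (RUBE_TEST_USER, RUBE_TEST_PASSWORD) are
hypothetical placeholders, not rube's actual interface:

```python
import os
import shutil
import subprocess

# Hypothetical names: the real rube entry point and credential
# variables may differ; adjust to the actual setup.
RUBE_TEST_CMD = ["nosetests", "rube"]
REQUIRED_ENV = ("RUBE_TEST_USER", "RUBE_TEST_PASSWORD")


def build_xvfb_command(test_cmd):
    """Wrap test_cmd so it runs inside a throwaway virtual X display."""
    # "xvfb-run -a" asks xvfb-run to pick a free display number itself.
    return ["xvfb-run", "-a"] + list(test_cmd)


def missing_credentials(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


def run_daily(test_cmd=RUBE_TEST_CMD, env=None):
    """Run the suite once; return the test runner's exit code."""
    env = dict(os.environ) if env is None else env
    missing = missing_credentials(env)
    if missing:
        raise RuntimeError("missing test credentials: " + ", ".join(missing))
    if shutil.which("xvfb-run") is None:
        raise RuntimeError("xvfb-run is not installed on this host")
    return subprocess.run(build_xvfb_command(test_cmd), env=env).returncode
```

Something like this could be called from a daily cron job, with the
test user's credentials supplied through the environment rather than
stored in the script.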

> 3: "Mean time to repair is more important than mean time between
> failure." It seems like anytime there's a major update there are
> unexpected things that break.  Let's anticipate the unexpected
> happening.

I agree here too... 

>   * Explicitly plan for everyone to spend their day firefighting when
> we make a major new deployment.  If you've already found all the
> places your code is affected and pre-ported it and the deployment
> goes smoothly then hey, you've got 6 extra working hours to shift
> back to doing other things.  If it's not smooth, then we've planned
> to have the attention of the right people for the unexpected
> difficulties that arise.

Yep. 

>   * As part of this, we need to identify people outside of
> infrastructure that should also be ready for breakage.  Reach out to
> rel-eng, docs, qa, cvsadmins, etc if there's a chance that they will
> be affected.

Agreed. 
 ...snip...

> What should we apply this to?
> * Probably can skip if:
>   - Things that we don't think have API breaks
>   - Things that are minor releases (hopefully these would correlate
> with not having API breaks :-)
>   - Leaf services that are not essential to releasing Fedora.
>     + ask, nuancier, elections, easyfix, badges, paste
>     + There are a lot of borderline cases too -- is fedocal essential
> enough to warrant being under this policy?  Since the wiki is used
> via its API should that fall under this as well?

Yeah, there's going to be some fuzz, but discussing the update in at
least one infra meeting before pushing would give us the time and
people to hash out whether it's a major update.

> Comments, thoughts, other ideas?
>
> Do we need to "ratify" something like this at a meeting?

We could, but I think it's all quite sensible, so we could just do it
unless someone objects.

> What's the next app deploy where we'll want to enact this?
> Maybe bodhi2 ;-)?

Yep. Although it might be something before then... we talked about
trying to get bodhi2 in stg before too long, but we don't want to
disrupt the release process for f21 too much, so we thought targeting
landing it a few weeks after f21 might be best. That also gives lots of
time for testing in stg and getting api users switched over.

kevin