Task Scheduling for depcheck

Wed Apr 30 19:26:46 UTC 2014

This has kinda been an elephant in the room that I've talked about to a
few people but we haven't had much of a discussion about it yet. For the
sake of simplicity, I'm going to be talking specifically about depcheck
but most of this applies to upgradepath and possibly other tasks.

The base problem is that there's a bit of an impedance mismatch between
how we pretend to schedule depcheck and how depcheck actually works.
From the outside, it looks like we run depcheck on a single update when
that update is created or changed. In reality, depcheck runs on an
entire koji tag when that tag or any of its builds changes.

Another way to summarize what depcheck does is:
Verify that the dependency trees in a given set of repositories are sane
and identify any problem builds which disrupt that sanity.

Just because a build didn't break the dep tree when it was first checked
doesn't mean that it won't be involved in breaking the tree when
another build is added. Along the same lines, just because a build
fails when first checked doesn't mean that it needs to be changed in
order to pass - it could require another build that hasn't been
finished or checked yet. Running depcheck on a single update/build
can't work because the effect a build has on dep trees is, by
definition, not something that can be determined by looking at that
build in isolation.
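
To make that concrete, here is a toy sketch of the idea. It is not how
depcheck is implemented; the build names, the (provides, requires) model
and the broken_builds() helper are all made up for illustration. The
point is only that the result for a build depends on what else is in
the tag.

```python
def broken_builds(tag):
    """Return {build: [missing requirements]} for a whole set of builds.

    `tag` maps a build name to a (provides, requires) pair. This is a
    hypothetical, heavily simplified stand-in for a real dependency solver.
    """
    provided = set()
    for provides, _requires in tag.values():
        provided.update(provides)

    problems = {}
    for build, (_provides, requires) in tag.items():
        missing = sorted(set(requires) - provided)
        if missing:
            problems[build] = missing
    return problems


tag = {
    "foo-1.0": (["libfoo.so.1"], []),
    "baz-1.0": (["baz"], ["libfoo.so.1"]),
    "bar-2.0": (["bar"], ["libfoo.so.2"]),
}
# bar-2.0 fails because the build it needs has not landed yet.
first = broken_builds(tag)

# foo-2.0 replaces foo-1.0 in the tag. bar-2.0 now passes without being
# touched, and baz-1.0 breaks even though it passed before.
del tag["foo-1.0"]
tag["foo-2.0"] = (["libfoo.so.2"], [])
second = broken_builds(tag)

print(first)
print(second)
```

Neither bar-2.0 nor baz-1.0 changed between the two runs, yet both of
their results did.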

We got around this in AutoQA because we did scheduling with a cron job
and ran depcheck-old more often than we actually needed to (once for
every update that changed since the last cron job). When depcheck-old
ran, it could update the status of any update associated with the
builds in a koji tag. Now that we're moving to scheduling based on
fedmsg, it's not as easy to ignore the fact that depcheck doesn't
really work on a per-update basis.

I have some ideas about how to address this that are variations on a
slightly different scheduling mantra:

1. Collect update/build change notifications
2. Run depcheck on affected koji tags at most every X minutes
3. Report changes in build/update status on a per-build/per-update
   basis at every depcheck run

This way, we'd be scheduling actual depcheck runs less often but in a
way that is closer to how depcheck actually works. From a maintainer's
perspective, nothing should change significantly - notifications will
arrive shortly after changes to a build/update are submitted.
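
As a rough sketch of steps 1 and 3 in the list above (the rate limiting
in step 2 is left out here): the notification dicts, tag and build
names, and both helper functions are assumptions made for illustration,
not the fedmsg message format or any existing taskotron code.

```python
from collections import defaultdict


def group_by_tag(notifications):
    """Step 1: collapse many build/update notifications into affected tags."""
    affected = defaultdict(set)
    for note in notifications:
        affected[note["tag"]].add(note["build"])
    return dict(affected)


def status_changes(previous, current):
    """Step 3: keep only builds whose outcome differs from the last run."""
    return {
        build: outcome
        for build, outcome in current.items()
        if previous.get(build) != outcome
    }


# Hypothetical notifications collected since the last depcheck run.
notifications = [
    {"tag": "f20-updates-testing", "build": "foo-1.0-1.fc20"},
    {"tag": "f20-updates-testing", "build": "bar-2.0-1.fc20"},
    {"tag": "f19-updates-testing", "build": "foo-1.0-1.fc19"},
]
affected = group_by_tag(notifications)

# Hypothetical per-build outcomes from two consecutive runs on one tag.
previous = {"foo-1.0-1.fc20": "PASSED", "bar-2.0-1.fc20": "FAILED"}
current = {
    "foo-1.0-1.fc20": "PASSED",
    "bar-2.0-1.fc20": "PASSED",
    "baz-1.0-1.fc20": "FAILED",
}
to_report = status_changes(previous, current)
print(sorted(affected))
print(to_report)
```

Three notifications turn into two depcheck runs, and only the builds
whose status actually changed would generate a report.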

To accomplish this, I propose the following:
1. Add a separate buildbot builder to handle depcheck and similar tasks
   by adding a "fuse" to the actual kickoff of the task. The first
   received signal would start the fuse and after X minutes, the task
   would actually start and depcheck would run on the entire tag.

2. Enhance taskotron-trigger to add a concept of a "delayed trigger"
   which would work with the existing bodhi and koji listeners but,
   instead of immediately scheduling tasks based on incoming fedmsgs,
   would use the fused builder described in 1.
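
A minimal sketch of the "fuse" behaviour, assuming one fuse per koji
tag. The Fuse class and its methods are hypothetical; this is neither
buildbot nor taskotron-trigger API, and time is passed in explicitly
(in seconds) to keep the example deterministic.

```python
class Fuse:
    """The first signal lights the fuse; later signals are absorbed."""

    def __init__(self, delay):
        self.delay = delay
        self.fire_at = {}  # tag -> time at which the run should start

    def signal(self, tag, now):
        # Only the first signal for a tag starts the countdown; signals
        # received while the fuse is burning do not extend it.
        self.fire_at.setdefault(tag, now + self.delay)

    def poll(self, now):
        """Return tags whose fuse has burned down and reset them."""
        ready = sorted(t for t, at in self.fire_at.items() if at <= now)
        for tag in ready:
            del self.fire_at[tag]
        return ready


fuse = Fuse(delay=600)                    # X = 10 minutes
fuse.signal("f20-updates-testing", now=0)
fuse.signal("f20-updates-testing", now=300)   # absorbed by the lit fuse

early = fuse.poll(now=599)    # nothing is due yet
fired = fuse.poll(now=600)    # one run covers both signals
after = fuse.poll(now=601)    # fuse was reset, nothing pending

print(early, fired, after)
```

Because later signals do not reset the countdown, a busy tag still gets
checked at most X minutes after the first change rather than being
postponed indefinitely.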

Some changes to resultsdb would likely be needed as well but I don't
want to limit ourselves to what's currently available. When Josef and I
sat down and talked about results storage at Flock last year, we
decided to move forward with a simple resultsdb so that we'd have a
method to store results knowing full well that it would likely need
significant changes in the near future.

Thoughts? Counter-proposals? Other suggestions?

Tim