Fedmsg Emitting

Thu May 14 23:14:56 UTC 2015

On Thu, 14 May 2015 20:02:29 +0200
Martin Krizek <mkrizek at redhat.com> wrote:

> Before we start sending fedmsgs we need to discuss a few things. We
> don't have to find solutions to all these problems, just keep them in
> mind when designing the solution we're going to start with:
> 
> 1. How often do we send fedmsg
> a) per-task
> b) per-update
> c) per-build
> 
> a) and b): we can list affected packages in a fedmsg.
> 
> I am not sure if there are any limits when it comes to fedmsg size, or
> whether the infra folks would be happier with fewer, larger fedmsgs or
> more, smaller ones (or whether it doesn't matter).

a) doesn't make a lot of sense to me - yeah, it fits better into our
execution model but I don't think that anyone outside of taskotron
cares much about what was done in a task. That being said, once we have
more diverse tasks, this could change but I'm not really looking to
design for something that hasn't even started happening yet.

b) hits a similar issue - outside of bodhi, there isn't much that works
on updates and my suspicion is that most of the folks consuming
the output will fall into one of two categories:
 - people who have small updates that only contain packages that
   they're responsible for
 - people who have packages in one of the megaupdates

There are plenty of exceptions to either of those but I suspect that
_most_ people will fall into one of those categories.

That leaves us with c)

> I guess c) allows for easier filtering in FMN.

c) not only allows for easier filtering in FMN but it's also more
compatible with how I think that releng would like to see build gating
done. Assuming that we eventually get into the rawhide space, we'll
have to start emitting stuff per-build anyways :)

I'm of the opinion that c) is going to be best here. In the past, we've
reported a lot of results on a per-update basis but unless I'm
forgetting something, we could transition to more of a per-build system.

For example - depcheck processes updates - if one build in that update
fails, the whole update fails. While I think that this is the best choice,
I also think that logic should be handled in bodhi instead of us trying
to emulate what bodhi is doing. As far as I know, this is happening
with bodhi2 - they're assuming that we'll be emitting per-build fedmsgs
and the logic for failing/passing an update will lie in bodhi and not
rely on our emulation of bodhi's processes.
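
A rough sketch of that split, with per-build results on the emitting side
and the update-level verdict computed by the consumer. The message layout,
the outcome strings and the function names are assumptions made for
illustration; they are not anything that bodhi2 or taskotron defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    task: str     # name of the check, e.g. "depcheck"
    build: str    # NVR of the build the result applies to
    outcome: str  # "PASSED" or "FAILED" (assumed vocabulary)


def to_message(result):
    """Body of one per-build message (hypothetical layout)."""
    return {
        "task": result.task,
        "item": result.build,
        "type": "koji_build",
        "outcome": result.outcome,
    }


def update_outcome(builds_in_update, results):
    """Consumer-side aggregation: the update fails if any build failed,
    passes if every build passed, and is pending otherwise."""
    by_build = {r.build: r.outcome for r in results}
    outcomes = [by_build.get(build) for build in builds_in_update]
    if any(o == "FAILED" for o in outcomes):
        return "FAILED"
    if all(o == "PASSED" for o in outcomes):
        return "PASSED"
    return "PENDING"
```

The point of the sketch is only that the "one failed build fails the
update" rule can live entirely in the consumer, which already knows which
builds belong to which update.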

> 2. Who do we target: users, systems or both
> 
> The issue here is with tasks that repeatedly test one update.
> Currently we check if there's a bodhi update comment with the same
> result already and if so, we don't post the comment again. To do
> something like that with fedmsgs we'd have to have code running
> somewhere that would check against its database whether an incoming
> result is a duplicate or not. The question is where the code would
> run. Bodhi comes to mind since it already has information about
> updates and so is good for tasks that work with bodhi updates.
> However, there might be tasks that work with something else, like
> composes. In this case we'd probably have the code on taskotron
> systems.

I think that how we handle scheduling of some of our current checks
(depcheck and upgradepath) is a byproduct of trying to make a
repo-level check look like a build/update-level check. I can't think of
many more tasks that would run into the same problem of repeated runs.

For the majority of tasks, I see the process as being similar to:

  1. trigger task $x for $y
  2. run task $x with $y as input
  3. report result for $x($y)

With this, we'd be running $x for each $y and the reporting would only
happen for each unique ($x, $y) assuming that something wasn't
rescheduled or forced to re-run.
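
The trigger/run/report loop above could be sketched roughly as follows.
The trigger tuple layout, the `forced` flag and the callback names are all
hypothetical and only serve to show that each unique (task, item) pair is
reported once unless a re-run is explicitly requested.

```python
def process(triggers, run_task, report):
    """Run and report each (task, item) once; a forced trigger re-runs it.

    triggers: iterable of (task, item, forced) tuples (assumed layout)
    run_task: callable(task, item) -> outcome
    report:   callable(task, item, outcome)
    """
    seen = set()
    for task, item, forced in triggers:
        key = (task, item)
        if key in seen and not forced:
            continue  # already ran and reported for this pair
        seen.add(key)
        report(task, item, run_task(task, item))
```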

I think it would be best to have consistent behavior for our fedmsg
emitting. If most tasks will only emit fedmsgs once, we should take our
minority tasks that emit more than one fedmsg per item and deduplicate
before the messages are emitted.

> So if we target systems we'd just send all results in fedmsgs and let
> the systems consume them and do whatever they want to do with them
> (e.g. bodhi can squash all the tasks relevant to specific update and
> notify the maintainer of the package via fedmsg about the result). If
> we target users, we'd have to have some logic to limit rate of fedmsgs
> ourselves but that would mean hiding some of the results (although
> duplicates) from the world.

I'd like to see us do the deduplication in resultsdb (assuming that's
where the fedmsg emission will be happening). I think that we already
have a table for items and I don't think that keeping track of
"is_emitted" and the last state emitted (so we can track changes in
state) would be too bad. Then again, I'm not the one working in the
code and I could be wrong :)
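
A minimal sketch of that kind of bookkeeping, assuming a separate table
keyed on (task, item) that stores the last outcome emitted. The table and
column names are made up for the example and do not reflect the actual
resultsdb schema; here a missing row plays the role of "is_emitted" being
false, and the actual publish call is left out.

```python
import sqlite3


def make_db():
    """In-memory stand-in for wherever the emit state would be stored."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE item_state ("
        " task TEXT, item TEXT, last_emitted TEXT,"
        " PRIMARY KEY (task, item))"
    )
    return db


def should_emit(db, task, item, outcome):
    """Return True (and record the outcome) only if nothing has been
    emitted for this (task, item) yet or the outcome has changed."""
    row = db.execute(
        "SELECT last_emitted FROM item_state WHERE task = ? AND item = ?",
        (task, item),
    ).fetchone()
    if row is not None and row[0] == outcome:
        return False  # duplicate of the last emitted state
    db.execute(
        "INSERT OR REPLACE INTO item_state (task, item, last_emitted)"
        " VALUES (?, ?, ?)",
        (task, item, outcome),
    )
    db.commit()
    return True
```

With something like this, a repeated depcheck run that produces the same
outcome for the same build stays silent, while a change from FAILED to
PASSED still goes out.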

> So the question here is where to put the 'deduplication logic'.
> 
> Emitting all results is the simplest solution as a starting point.

Simpler, but I don't think it serves our end goals very well unless
deduplication is going to be more expensive than I think it will be.

Tim