Fedmsg Emitting

Wed May 20 13:02:49 UTC 2015

----- Original Message -----
> From: "Tim Flink" <tflink at redhat.com>
> To: qa-devel at lists.fedoraproject.org
> Sent: Friday, May 15, 2015 1:14:56 AM
> Subject: Re: Fedmsg Emitting
> 
> On Thu, 14 May 2015 20:02:29 +0200
> Martin Krizek <mkrizek at redhat.com> wrote:
> 
> > Before we start sending fedmsgs we need to discuss a few things. We
> > don't have to find solutions to all these problems, just keep them in
> > mind when designing the solution we're going to start with:
> > 
> > 1. How often do we send fedmsg
> > a) per-task
> > b) per-update
> > c) per-build
> > 
> > a) and b): we can list affected packages in a fedmsg.
> > 
> > I am not sure if there are any limits when it comes to fedmsg size,
> > or whether the infra folks would be happier with fewer, larger
> > fedmsgs or more, smaller ones (or whether it doesn't matter).
> 
> a) doesn't make a lot of sense to me - yeah, it fits better into our
> execution model but I don't think that anyone outside of taskotron
> cares much about what was done in a task. That being said, once we have
> more diverse tasks, this could change but I'm not really looking to
> design for something that hasn't even started happening yet.
> 

Looking at it now, I have no idea why I wrote "per-task". What I meant is
that we could send one fedmsg per build (or per update) that would contain
the results of all tasks executed on that build (or update). Just a
thought. Sorry for the confusion. :/
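
To make the "one message per build with all task results" idea concrete,
here is a rough sketch. Everything in it is an assumption for illustration
only: the topic name, the field names and the outcome strings are made up
and are not an agreed fedmsg schema, and nothing is actually published.

```python
import json


def build_message(build_nvr, task_outcomes):
    """Bundle the outcomes of every task run on one build into one payload.

    All field names and the topic below are hypothetical.
    """
    # The build-level outcome is FAILED if any task did not pass.
    not_passed = [t for t, o in task_outcomes.items() if o != "PASSED"]
    return {
        "topic": "taskotron.result.new",  # hypothetical topic name
        "msg": {
            "item": build_nvr,
            "item_type": "koji_build",  # hypothetical field
            "outcome": "FAILED" if not_passed else "PASSED",
            "results": dict(task_outcomes),  # per-task outcomes
        },
    }


message = build_message(
    "foo-1.0-1.fc22", {"rpmlint": "PASSED", "depcheck": "FAILED"}
)
print(json.dumps(message, indent=2, sort_keys=True))
```

A consumer could then filter on the item alone and still see which task
caused the failure.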

> b) hits a similar issue - outside of bodhi, there isn't much that works
> on updates and my suspicion is that most of the folks consuming
> the output will fall into 1 of 2 categories
>  - people who have small updates that only contain packages that
>    they're responsible for
>  - people who have packages in one of the megaupdates
> 
> There are plenty of exceptions to either of those but I suspect that
> _most_ people will fall into one of those categories.
> 
> That leaves us with c)
> 
> > I guess c) allows for easier filtering in FMN.
> 
> c) not only allows for easier filtering in FMN but it's also more
> compatible with how I think that releng would like to see build gating
> done. Assuming that we eventually get into the rawhide space, we'll
> have to start emitting stuff per-build anyways :)
> 
> I'm of the opinion that c) is going to be best here. In the past, we've
> reported a lot of results on a per-update basis but unless I'm forgetting
> something, we could transition to more of a per-build system.
> 
> For example - depcheck processes updates - if one build in that update
> fails, the whole update fails. While I think that this is the best choice,
> I also think that logic should be handled in bodhi instead of us trying
> to emulate what bodhi is doing. As far as I know, this is happening
> with bodhi2 - they're assuming that we'll be emitting per-build fedmsgs
> and the logic for failing/passing an update will lie in bodhi and not
> rely on our emulation of bodhi's processes.
>

That does make sense to me.
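
As I understand the split, we would emit per-build outcomes and the consumer
(bodhi in this case) would derive the update-level outcome. A minimal sketch
of that consumer-side rule, assuming the depcheck-style "one failing build
fails the whole update" logic described above. The function name and the
outcome strings are hypothetical, not bodhi's actual code.

```python
def update_outcome(build_outcomes):
    """Derive an update-level outcome from per-build outcomes.

    build_outcomes maps a build NVR to "PASSED" or "FAILED".
    """
    if not build_outcomes:
        # No results have arrived yet for any build in the update.
        return "NO_RESULTS"
    if all(outcome == "PASSED" for outcome in build_outcomes.values()):
        return "PASSED"
    # Any single build that did not pass fails the whole update.
    return "FAILED"


print(update_outcome({"foo-1.0-1.fc22": "PASSED", "bar-2.0-1.fc22": "FAILED"}))
```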

> > 2. Who do we target: users, systems or both
> > 
> > The issue here is with tasks that repeatedly test one update.
> > Currently we check if there's a bodhi update comment with the same
> > result already and if so, we don't post the comment again. To do
> > something like that with fedmsgs we'd have to have code running
> > somewhere that would check against its database whether an incoming
> > result is a duplicate or not. The question is where the code would
> > run. Bodhi comes to mind since it already has information about
> > updates and so is good for tasks that work with bodhi updates.
> > However, there might be tasks that work with something else, like
> > composes. In this case we'd probably have the code on taskotron
> > systems.
> 
> I think that how we handle scheduling of some of our current checks
> (depcheck and upgradepath) is a byproduct of trying to make a
> repo-level check look like a build/update-level check. I can't think of
> many more tasks that would run into the same problem of repeated runs.
> 
> For the majority of tasks, I see the process as being similar to:
> 
>   1. trigger task $x for $y
>   2. run task $x with $y as input
>   3. report result for $x($y)
> 
> With this, we'd be running $x for each $y and the reporting would only
> happen for each unique ($x, $y) assuming that something wasn't
> rescheduled or forced to re-run.
> 
> I think it would be best to have consistent behavior for our fedmsg
> emitting. If most tasks will only emit fedmsgs once, we should take our
> minority tasks that emit more than one fedmsg per item and deduplicate
> before the messages are emitted.
> 
> > So if we target systems we'd just send all results in fedmsgs and let
> > the systems consume them and do whatever they want to do with them
> > (e.g. bodhi can squash all the tasks relevant to specific update and
> > notify the maintainer of the package via fedmsg about the result). If
> > we target users, we'd have to have some logic to limit the rate of fedmsgs
> > ourselves but that would mean hiding some of the results (although
> > duplicates) from the world.
> 
> I'd like to see us do the deduplication in resultsdb (assuming that's
> where the fedmsg emission will be happening). I think that we already
> have a table for items and I don't think that keeping track of
> "is_emitted" and the last state emitted (so we can track changes in
> state) would be too bad. Then again, I'm not the one working in the
> code and I could be wrong :)
>

Can you think of a use case where someone would want to receive all
results including duplicates?
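
For reference, the deduplication itself looks cheap. Below is a small sketch
of the "remember the last state emitted per (task, item)" idea, using an
in-memory dict where resultsdb would presumably use a column on the item
table. The class and method names are hypothetical and this is not resultsdb
code.

```python
class EmissionTracker:
    """Remember the last outcome emitted for each (task, item) pair."""

    def __init__(self):
        # Maps (task, item) to the outcome that was last emitted.
        self._last_emitted = {}

    def should_emit(self, task, item, outcome):
        """Return True if this result is new or changes the emitted state."""
        key = (task, item)
        if self._last_emitted.get(key) == outcome:
            # Same outcome as last time: a duplicate, so stay quiet.
            return False
        self._last_emitted[key] = outcome
        return True


tracker = EmissionTracker()
for outcome in ["FAILED", "FAILED", "PASSED"]:
    if tracker.should_emit("depcheck", "foo-1.0-1.fc22", outcome):
        print("emit", outcome)
```

With this, a repeated run only produces a message when the outcome differs
from the one last emitted, so state changes are still visible.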

> > So the question here is where to put the 'deduplication logic'.
> > 
> > Emitting all results is the simplest solution as a starting point.
> 
> Simpler, but I don't think it serves our end goals very well unless
> deduplication is going to be more expensive than I think it will be.
> 
> Tim