On Wed, Feb 8, 2017 at 4:11 PM, Tim Flink <tflink@redhat.com> wrote:
On Wed, 8 Feb 2017 08:26:30 -0500 (EST)
Kamil Paral <kparal@redhat.com> wrote:

I think another question is whether we want to keep assuming that the
user supplies the item that is used as a UID in resultsdb. As you say,
it seems a bit odd to require people to munge stuff together like
"namespace/module#commithash" at the same time that it can be separated
out into a dict-like data structure for easy access.


Emphasis mine. I think that we should not really be assuming that at all. In most cases, the item should be provided by the trigger automagically, the same with the type. With what I'd like to see for the structured input, the conventions module could/should take that data into account while constructing the "default" results.
Keep in mind that one result can also have multiple "items" (just as it can have multiple values for any extra data field), if it makes sense. One would be the "auto-provided" item and a second could be user-added. That would make it both consistent (the trigger-generated item) and flexible, if a different "item" makes sense.

Would it make more sense to just pass in the dict and have semi-coded
conventions for reporting to resultsdb based on the item_type which
could be set during the task instead of requiring that to be known
before task execution time?

Something along the lines of enabling some common kinds of input for
the resultsdb directive - module commit, dist-git rpm change, etc. so
that you could specify the item_type to the resultsdb directive and it
would know to look for certain bits to construct the UID item that's
reported to resultsdb.

Yup, I think that setting some conventions, and making sure we keep the same (or at least a very similar) set of metadata for the relevant type, is key.
I mentioned this in the previous email, but over the past few days I have been thinking about making the types a bit more general. The pretty specific types we have now made sense when we first designed things and had a very narrow use case.
Now that we want to make the stack usable in stuff like Platform CI, I think it would make sense to abstract a bit more, so we don't have `koji_build`, `brew_build`, `copr_build`, which are essentially the same but differ in minor details. We can specify those classes/details in extradata, or could even use multiple types: having the common set of information guaranteed for every 'build' type, and adding other kinds of data to `koji_build`, `brew_build` or `whatever_build` as needed.
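
To make the "common build type plus extras" idea a bit more concrete, here is a rough sketch in Python. The key names (`nvr`, `arch`, `task_id`, `project`) and the helpers `required_keys` and `missing_keys` are made up purely for illustration; nothing like this is claimed to exist in resultsdb or the conventions module.

```python
# Hypothetical sketch: a generic "build" type guarantees a common set of
# keys, and the more specific types only add to that set.
COMMON_BUILD_KEYS = {"nvr", "arch"}

# Extra keys per specific type (all names are illustrative assumptions).
EXTRA_KEYS = {
    "build": set(),
    "koji_build": {"task_id"},
    "brew_build": {"task_id"},
    "copr_build": {"project"},
}


def required_keys(item_type):
    """Return every key a result of the given type must carry."""
    # Unknown types raise KeyError instead of being silently accepted.
    return COMMON_BUILD_KEYS | EXTRA_KEYS[item_type]


def missing_keys(item_type, data):
    """Return the sorted list of required keys absent from `data`."""
    return sorted(required_keys(item_type) - set(data))
```

With this shape, a consumer that only cares about "a build" can rely on the common keys, while something that needs Koji-specific details checks for the `koji_build` extras.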
 
Using Kamil's example, assume that we have a task for a module and the
following data is passed in:

  {'namespace':'someuser', 'module':'httpd', 'commithash':'abc123df980'}

Neither item nor type is specified on the CLI at execution time. The
task executes using that input data and when it comes time to report to
resultsdb:

  - name: report results to resultsdb
    resultsdb:
      results: ${some_task_output}
      type: module

By passing in that type of module, the directive would look through the
input data and construct the "item" from input.namespace, input.module
and input.commithash.
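
A minimal sketch of what that lookup could look like, assuming a table of per-type builder functions. `ITEM_BUILDERS` and `build_item` are hypothetical names, not part of the existing resultsdb directive; the module format is the "namespace/module#commithash" one from Kamil's example.

```python
# Hypothetical sketch of type-driven item construction. Each known type
# maps to a function that builds the resultsdb "item" from the input data.
ITEM_BUILDERS = {
    "module": lambda d: "{}/{}#{}".format(
        d["namespace"], d["module"], d["commithash"]
    ),
}


def build_item(item_type, inputdata):
    """Construct the item for `item_type`.

    Raises KeyError for an unknown type or for missing input data.
    """
    return ITEM_BUILDERS[item_type](inputdata)


inputdata = {"namespace": "someuser", "module": "httpd",
             "commithash": "abc123df980"}
print(build_item("module", inputdata))  # someuser/httpd#abc123df980
```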

I'm not sure if it makes more sense to have a set of "types" that the
resultsdb directive understands natively or to actually require item
but allow variable names in it along the lines of

  "item":"${namespace}/${module}#${commithash}"

I'd rather have that in "conventions" than in the resultsdb directive, but I guess it is essentially the same thing, once you think about it.
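
For what it's worth, the variable-names variant is easy to sketch with Python's standard `string.Template`, which happens to use the same `${name}` placeholder syntax. This is only an illustration of the idea; `ITEM_PATTERN` and `render_item` are hypothetical names, and this is not a statement about how the directive or conventions actually do it.

```python
from string import Template

# The item pattern as it might appear in a formula or in conventions.
ITEM_PATTERN = Template("${namespace}/${module}#${commithash}")


def render_item(inputdata):
    """Fill the pattern from the input data.

    Template.substitute() raises KeyError when a placeholder has no
    matching key, so missing trigger data fails loudly instead of
    producing a half-filled item.
    """
    return ITEM_PATTERN.substitute(inputdata)
```

A nice side effect is that the "crash if something requested isn't available" behaviour comes for free from `substitute()`.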
 

> > My take on this is, that we will say which variables are provided
> > by the trigger for each type. If a variable is missing, the
> > formula/execution should just crash when it tries to access it.
>
> Sounds reasonable.

+1 from me as well. Assume everything is there, crash if there's
something requested that isn't available (missing data etc.)


yup, that's what I have in mind.
 
> We'll probably end up having a mix of necessary and convenience
> values in the inputdata. "name" is probably a convenience value here,
> so that tasks don't have to parse if they need to use it in a certain
> directive. "epoch" might be an important value for some test cases,
> and let's say we learn the value in trigger during scheduling
> investigation, so we decide to pass it down. But that information is
> not that easy to get manually. If you know what to do, you'll open up
> a particular koji page and see it. But you can also be clueless about
> how to figure it out. The same goes for build_id, again can be
> important, but also can be retrieved later, so more of a convenience
> data (saving you from writing a koji query). This is just an example
> for illustration, might not match real-world use cases.

I mentioned this in IRC but why not have a bit of both and allow input
as either a file or on the CLI. I don't think that json would be too
bad to type on the command line as an option for when you're running
something manually:

  runtask sometask.yml -e '{"namespace": "someuser",
                    "module": "somemodule", "commithash": "abc123df980"}'

There would be some risk of running into the same problems we had with
AutoQA where depcheck commands were too long for bash to parse but
that's when I'd say "you need to use a file for that" or see if there
was another solution to whatever required input that was too long for
bash.


I think Kamil already replied to this: his idea was to have the possibility to just read from stdin. That is IMO somewhat the same, but at the same time better, as you can just "cat" the contents of a file you prepared to the command, without needing that awfully lengthy command line.
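
Both options can coexist fairly cheaply. A rough sketch, assuming the input is JSON and that `-e` takes precedence over stdin; the `--extra` long option and the `load_inputdata` helper are assumptions for illustration, not how runtask parses its arguments today.

```python
import argparse
import json
import sys


def load_inputdata(argv, stdin=sys.stdin):
    """Return the input data dict from -e if given, otherwise from stdin."""
    parser = argparse.ArgumentParser(prog="runtask")
    parser.add_argument("formula")
    # "-e" follows the example in this thread; "--extra" is a made-up name.
    parser.add_argument("-e", "--extra", default=None,
                        help="input data as a JSON object")
    args = parser.parse_args(argv)

    # Prefer the command-line value; fall back to whatever is piped in.
    raw = args.extra if args.extra is not None else stdin.read()
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("input data must be a JSON object")
    return data
```

With something like this, a prepared file could be piped in with "cat input.json | runtask sometask.yml", while short inputs could still be typed directly after `-e`.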