I think another question is whether we want to keep assuming that the user
supplies the item that is used as a UID in resultsdb. As you say, it seems a
bit odd to require people to munge stuff together like
"namespace/module#commithash" at the same time that it can be separated out
into a dict-like data structure for easy access.
Would it make more sense to just pass in the dict and have semi-coded
conventions for reporting to resultsdb based on the item_type, which could be
set during the task instead of requiring it to be known before task execution
time?
Something along the lines of supporting some common kinds of input for the
resultsdb directive (module commit, dist-git RPM change, etc.), so that you
could specify the item_type to the resultsdb directive and it would know which
bits to look for to construct the UID item that's reported to resultsdb.
Using Kamil's example, assume that we have a task for a module and the
following data is passed in:

    {'namespace': 'someuser', 'module': 'httpd', 'commithash': 'abc123df980'}
Neither item nor type is specified on the CLI at execution time. The task
executes using that input data, and when it comes time to report to
resultsdb:

    - name: report results to resultsdb
      resultsdb:
          results: ${some_task_output}
          type: module
By passing in the type "module", the directive would look through the input
data and construct the "item" from input.namespace, input.module and
input.commithash.
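For illustration, that convention could boil down to a per-type template
lookup. Everything below (the function name, the template table, the type
names) is a hypothetical sketch, not an agreed interface:

```python
# Hypothetical convention table: which input fields make up the resultsdb
# "item" UID for each item_type. Only 'module' is taken from the example
# above; anything else here would be an assumption.
ITEM_TEMPLATES = {
    'module': '{namespace}/{module}#{commithash}',
}

def construct_item(item_type, input_data):
    """Build the resultsdb item UID from the task's input data dict."""
    try:
        template = ITEM_TEMPLATES[item_type]
    except KeyError:
        raise ValueError('no item convention for type: %r' % item_type)
    return template.format(**input_data)
```

With the example input above, construct_item('module', {'namespace':
'someuser', 'module': 'httpd', 'commithash': 'abc123df980'}) would yield
'someuser/httpd#abc123df980'.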
I'll have to think about this, maybe sketch up some examples.
I mentioned this in IRC, but why not have a bit of both and allow input as
either a file or on the CLI? I don't think that JSON would be too bad to type
on the command line as an option for when you're running something manually:

    runtask sometask.yml -e '{"namespace": "someuser", "module": "somemodule", "commithash": "abc123df980"}'

(Note the quoting: the JSON itself needs double quotes, so single quotes go
on the outside for the shell.)
I probably misunderstood you on IRC. In my older response here, I actually suggested
something like this - having "--datafile data.json", which can also be used like
"--datafile -" meaning stdin. You can then use "echo <json> | runtask
--datafile - <more args>". But your solution is probably easier to look at.
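The "--datafile -" convention is cheap to support. A sketch (the function
name and option handling are assumptions, not existing runtask code):

```python
import json
import sys

def read_datafile(path):
    """Load JSON input data from a file path, where "-" means stdin.

    Mirrors the hypothetical --datafile option discussed above, so that
    'echo <json> | runtask --datafile - ...' would work.
    """
    if path == '-':
        return json.load(sys.stdin)
    with open(path) as f:
        return json.load(f)
```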
There would be some risk of running into the same problems we had with
AutoQA, where depcheck commands were too long for bash to parse, but that's
when I'd say "you need to use a file for that".
Definitely.
> I'm a bit torn between providing as much useful data as we
can when
> scheduling (because a) yaml formulas are very limited and you can't
> do stuff like string parsing/splitting b) might save you a lot of
> work/code to have this data presented to you right from the start),
> and the easy manual execution (when you need to gather and provide
> all that data manually). It's probably about finding the right
> balance. We can't avoid having structured multi-data input, I don't
> think.
If we did something along the lines of allowing input on the CLI, we could
have both, no? We'd need to be clear on the precedence of file vs CLI input,
but that seems to me like something that could solve the issue of dealing
with more complicated inputs without requiring users to futz with a file when
running tasks locally.
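If we supported both sources, the precedence could be as simple as a dict
update. The "CLI overrides file" rule below is just my assumption for the
sketch, not a settled decision:

```python
import json

def load_input(datafile=None, cli_json=None):
    """Merge task input data: keys from the CLI JSON string override
    keys loaded from the data file (assumed precedence)."""
    data = {}
    if datafile is not None:
        with open(datafile) as f:
            data.update(json.load(f))
    if cli_json is not None:
        data.update(json.loads(cli_json))
    return data
```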
That's not the worry I had. Creating a file or writing JSON on the command
line is a bit more work than the current state, but not a problem. What I'm a
bit afraid of is that we'll start adding many keyvals into the JSON just
because it is useful or convenient.
As an artificial example, let's say for a koji_build FOO we supply NVR, name,
epoch, owner, build_id and build_timestamp. If we receive all of that in the
fedmsg (or from some koji query that we'll need to do anyway for some
reason), it makes sense to pass that data along: it's free for us and it's
less work for the task (it doesn't have to do its own queries). However,
running the task manually as a task developer (and I don't mean re-running an
existing task on FOO by copy-pasting the existing data JSON from a log file,
but running it on a fresh new koji build BAR) becomes much more difficult,
because the developer needs to figure out all of those values for BAR
manually just to be able to run the task.
An even more extreme example (deliberately so, to illustrate the point) would
be to pass the whole koji buildinfo dict structure that you get from running
koji.getBuild(). That could actually be easier for the developer to emulate,
because we could document a single command that retrieves exactly that.
Unless we start adding additional data to it...
So on one hand, I'd like to pass as much data as we have to make task
formulas simpler, but on the other hand, I'm afraid task development (manual
task execution, without having a trigger to get all this data by magic) will
get harder. (I hope I managed to explain it better this time :))