On Wed, Feb 8, 2017 at 7:39 PM, Kamil Paral <kparal@redhat.com> wrote:
> I mentioned this in IRC but why not have a bit of both and allow input
> as either a file or on the CLI. I don't think that json would be too
> bad to type on the command line as an option for when you're running
> something manually:
>
>   runtask sometask.yml -e '{"namespace": "someuser",
>                     "module": "somemodule", "commithash": "abc123df980"}'

I probably misunderstood you on IRC. In my older response here, I actually suggested something like this - having "--datafile data.json", which can also be used like "--datafile -" meaning stdin. You can then use "echo <json> | runtask --datafile - <more args>". But your solution is probably easier to look at.

I honestly like the `--datafile [fname, -]` approach a lot. We could sure name the param better, but that's about it. I like it better than necessarily having a long cmdline, and you can still use "echo <json>" if you wanted to have a cmdline example, or "cat <file>" for the common usage.
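Just to spell out what I mean (the --datafile name and exact syntax are only my proposal here, nothing implemented yet):

    # read the input data from a file
    runtask sometask.yml --datafile data.json

    # "-" means stdin, so you can pipe the json in
    echo '{"namespace": "someuser", "module": "somemodule"}' | runtask sometask.yml --datafile -
    cat data.json | runtask sometask.yml --datafile -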

 
> There would be some risk of running into the same problems we had with
> AutoQA where depcheck commands were too long for bash to parse but
> that's when I'd say "you need to use a file for that"

Definitely.

And that's why I'd rather stay away from long cmdlines :)
 

> > I'm a bit torn between providing as much useful data as we can when
> > scheduling (because a) yaml formulas are very limited and you can't
> > do stuff like string parsing/splitting b) might save you a lot of
> > work/code to have this data presented to you right from the start),
> > and the easy manual execution (when you need to gather and provide
> > all that data manually). It's probably about finding the right
> > balance. We can't avoid having structured multi-data input, I don't
> > think.
>
> If we did something along the lines of allowing input on the CLI, we
> could have both, no? We'd need to be clear on the precedence of file vs
> CLI input but that seems to me like something that could solve the
> issue of dealing with more complicated inputs without requiring users
> to futz with a file when running tasks locally.

That's not the worry I had. Creating a file or writing json on the command line is a bit more work than the current state, but not a problem. What I'm a bit afraid of is that we'll start adding many keyvals into the json just because it is useful or convenient.

As an artificial example, let's say that for a koji_build FOO we supply NVR, name, epoch, owner, build_id and build_timestamp. If we receive all of that in the fedmsg (or from some koji query that we'll need to do anyway for some reason), it makes sense to pass that data along - it's free for us, and it's less work for the task (it doesn't have to do its own queries). However, running the task manually as a task developer (and I don't mean re-running an existing task on FOO by copy-pasting the existing data json from a log file, but running it on a fresh new koji build BAR) gets much more difficult, because the developer needs to figure out all those values for BAR manually, just to be able to run the task.
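To make it concrete, the "rich" input for FOO could look roughly like this (all values invented just for illustration):

    {
        "nvr": "foo-1.2-3.fc25",
        "name": "foo",
        "epoch": 0,
        "owner": "someuser",
        "build_id": 123456,
        "build_timestamp": 1486500000
    }

For BAR, the developer would first have to look up every one of those values by hand.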
 
An even more extreme example (deliberately so, to illustrate the point) would be to pass the whole koji buildinfo dict structure that you get when running koji.getBuild(). That one could actually be easier for the developer to emulate, because we could document a single command that retrieves exactly that. Unless we start adding additional data to it...
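If I remember the koji CLI right, something like this would dump exactly that dict (not saying we'd use exactly this command, just that it's a single documentable one-liner):

    koji call --json-output getBuild foo-1.2-3.fc25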

So on the one hand, I'd like to pass as much data as we have to make task formulas simpler, but on the other hand, I'm afraid task development (manual task execution, without having a trigger to gather all this data by magic) will get harder. (I hope I managed to explain it better this time :))

As I mentioned in one of the other emails - the dev (while developing) should really only need to provide the data that is relevant for the task/formula. Why have a ton of stuff that you never use in the "testing data"? It is unnecessary work, and it even makes the process more error-prone IMO. If I had a task that only needs NVR, name and build_timestamp, I'd (while developing/testing) just pass a structure containing these.
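E.g. for that task, the whole testing input could be just (key names again only for illustration):

    {
        "nvr": "foo-1.2-3.fc25",
        "name": "foo",
        "build_timestamp": 1486500000
    }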

Or do you think that is a bad idea? I can sure see how (e.g.) the resultsdb directive could spit out warnings about missing data, but that is why we have the different profiles - resultsdb could fail in production mode if data was missing (which probably means some serious error), or just warn you in development mode.
If you wanted to "test it thoroughly", you'd better use some real data anyway - and if we store the "input data structure" in the task logs, then there is even a good source of those, should you want to copy-paste it.

I hope I understood what you meant.

joza