On Wed, Aug 2, 2017 at 2:44 AM, Dan Callaghan <dcallagh@redhat.com> wrote:
Excerpts from Róman Joost's message of 2017-08-02 10:18 +10:00:
> Dear Petr,
>
> On Tue, Aug 01, 2017 at 01:05:34PM +0200, Petr Pisar wrote:
> > On Tue, Aug 01, 2017 at 11:59:41AM +0200, Kamil Paral wrote:
> > > thanks for the report. In the task, we just install rpmgrill and run it. If
> > > rpmgrill reports outdated clamav results, it seems that's something that
> > > should be fixed in rpmgrill itself (it could depend on clamav-update and
> > > update the virus db before running the virus check). Can you please report
> > > a bug against rpmgrill and post the link here?
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1477130
> Many thanks for the report. I haven't looked into the specifics, but I'm
> not sure what we envision rpmgrill should be doing here. Should it run
> freshclam on every invocation of rpmgrill? From what I see, that can take
> quite a bit of time.

That was my initial idea, yes. But now that I tried it, I see two problems:
a) freshclam needs to run as root, so rpmgrill executed as a standard user can't update the database anyway. It could warn about that, and trigger the update when running as root (the Taskotron case), but that's not a generic solution that makes sure the results are up to date.
b) The database refresh is SLOW. I expected just a few seconds. But here's my experience:
https://paste.fedoraproject.org/paste/3oGdfMpOP84~rhQm34AZvw/raw
Three minutes full of timeouts. It could take much longer if the servers are in even worse shape some day.
 
>
> Maybe what could be done tho is a systemd timer installed with the
> package which runs freshclam every now and then?

This might not work well if rpmgrill is invoked as part of some system
which creates a fresh VM from scratch and then deletes it shortly after
(think single-use slaves with Jenkins OpenStack Cloud plugin, for
example). The freshclam timer will likely never trigger before rpmgrill
is run.

That's the Taskotron case, yes. After looking more into this, I guess I wouldn't object to updating the clamav database before running rpmgrill. My problem with it is how slow it is. We run rpmgrill on every new Koji build, which means very frequently. Each run is performed on a clean machine, so each rpmgrill execution takes 3+ minutes longer just because the clamav servers are horrible. I'd definitely add a timeout and kill the process if it doesn't finish in 5 minutes or so, but even the typical 3 minutes of extra execution time (mostly idle) is something I don't like.
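A minimal sketch of the "add a timeout and kill the process" idea, assuming the caller runs as root and that freshclam (from clamav-update) is on PATH. The function name `update_with_timeout` and the 300-second default are hypothetical illustrations, not anything rpmgrill or Taskotron actually ship:

```python
import subprocess

# Hypothetical wrapper: run a database-update command (freshclam by default)
# and give up if it has not finished within timeout_s seconds.
def update_with_timeout(cmd=("freshclam",), timeout_s=300):
    """Return True if cmd exited with status 0 within the time limit, else False."""
    try:
        result = subprocess.run(
            list(cmd),
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising this.
        return False
    except FileNotFoundError:
        # The command (e.g. freshclam) is not installed.
        return False
    # A non-zero status covers failures such as not running as root.
    return result.returncode == 0
```

The caller would then only warn (e.g. "ClamAV definitions could not be refreshed, virus results may be outdated") instead of blocking the whole run, which keeps the worst-case cost bounded at the timeout.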
 

One possible thing which might help is if rpmgrill could warn or even
fail, if it detects that it's being run with "outdated" ClamAV
definitions. Not sure how old you want to consider "outdated", or if
there is an easy way to check it... At worst, something like: if the
modtime of the ClamAV definitions is more than 4 weeks in the past, give an
error? Do we know how frequently the definitions are updated?
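A rough sketch of the modtime check, assuming the definitions live in /var/lib/clamav as .cvd/.cld files (check DatabaseDirectory in freshclam.conf if yours differ). The helper names and the 4-week threshold are hypothetical and simply mirror the suggestion above. It looks at the newest definition file, because main.cvd changes rarely while the daily database is refreshed much more often:

```python
import os
import time

# Assumption: default Fedora DatabaseDirectory for freshclam.
CLAMAV_DB_DIR = "/var/lib/clamav"
FOUR_WEEKS_S = 28 * 24 * 3600

def newest_definition_age(db_dir=CLAMAV_DB_DIR, now=None):
    """Age in seconds of the most recently modified .cvd/.cld file,
    or None if the directory or the definition files are missing."""
    now = time.time() if now is None else now
    try:
        entries = os.listdir(db_dir)
    except FileNotFoundError:
        return None
    mtimes = [
        os.path.getmtime(os.path.join(db_dir, name))
        for name in entries
        if name.endswith((".cvd", ".cld"))
    ]
    if not mtimes:
        return None
    return now - max(mtimes)

def definitions_outdated(db_dir=CLAMAV_DB_DIR, max_age_s=FOUR_WEEKS_S, now=None):
    """True if the definitions are missing or older than max_age_s."""
    age = newest_definition_age(db_dir, now)
    return age is None or age > max_age_s
```

Note that file modtime is only a proxy: it reflects when the file was written (by freshclam or by package installation), not necessarily when the signatures were built.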

The other thing is that this idea of "download some data from the
internet in order to make this package work" is not a good approach. It
breaks in exactly the scenario I mentioned above, where a freshly
installed copy of the package is not actually usable. The pciids and
usbids databases used to be like this too (shipping some old version of
the data, plus a cron job to pull down updates from the internet) but
nowadays we have the hwdata package which just gets updated with the
latest definitions once per month. This is a much nicer solution because
it means you can install a machine using only Fedora packages (or
a freshly built disk image) and it already has the data it needs,
without then going back to some random server on the internet.

Very much agreed.
 

So maybe the ClamAV definitions should be treated similarly? In
a separate package which gets updated on a regular interval to pull in
the latest data?

That would be the best solution here, yes. Could someone please file an RFE against clamav?