Stop the git abuse

Thu May 24 09:12:36 UTC 2012

On 05/23/2012 06:30 PM, Jesse Keating wrote:
> On 05/22/2012 11:53 PM, Panu Matilainen wrote:
>> Well, here's what I see after drop_caches=3 from 'strace -tt fedpkg
>> verrel' on kernel.spec which is one of the most complicated specs in
>> Fedora land:
>>
>> 09:09:06.928011 fedpkg exec
>> 09:09:12.699345 python imports done
>> 09:09:13.510192 rpm exec
>> 09:09:15.345425 rpm exit
>> 09:09:15.385441 fedpkg exit
>>
>> So by the look of things, 2/3 of the execution time is spent importing
>> python modules. The rpm execution time is heavily dependent on what the
>> spec actually does, eg in case of kernel.spec this includes ~50
>> fork+execs of shell, getconf and two python invocations from executing
>> shell macros.
>>
>> From plain rpmspec parsing POV (shell macros aside), at top of
>> callgrind charts sits the rpm bad performance hallmark pattern of
>> repeated insert/delete, qsort and bsearch cycles (on macros). Changing
>> the macros engine to use a hash table instead has been on my todo list
>> for some time now, just not very high in the priorities as spec parse
>> isn't exactly the most time-critical thing rpm does.
>
> Oops, I hope my message didn't come across as placing blame or throwing
> rpm under the bus.

Oh, when rpm does something stupid it deserves the blame as much as the 
next guy :) I *know* there are ugly scalability issues in rpmbuild 
elsewhere, it wouldn't exactly shock me to find them in spec parsing as 
well.
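
To make the insert/qsort/bsearch pattern quoted above concrete, here is a toy comparison in Python. It is only a sketch: the class names are made up for illustration, it ignores details of the real macro engine, and it is not rpm's actual code.

```python
import bisect


class SortedMacroTable:
    """Toy table kept as a sorted list: every define re-sorts the list."""

    def __init__(self):
        self.entries = []  # list of (name, body) tuples, sorted by name

    def define(self, name, body):
        # Drop any previous definition, append, then re-sort everything.
        self.entries = [e for e in self.entries if e[0] != name]
        self.entries.append((name, body))
        self.entries.sort()  # O(n log n) work on every single insert

    def lookup(self, name):
        # Binary search for the first entry whose name is >= the key.
        i = bisect.bisect_left(self.entries, (name,))
        if i < len(self.entries) and self.entries[i][0] == name:
            return self.entries[i][1]
        return None


class HashMacroTable:
    """Toy table backed by a hash table: define and lookup are O(1) on average."""

    def __init__(self):
        self.entries = {}

    def define(self, name, body):
        self.entries[name] = body

    def lookup(self, name):
        return self.entries.get(name)
```

Both tables answer lookups identically; the difference is that the sorted variant pays for a full sort on each definition, which adds up when a spec defines and redefines many macros.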

This was the first time ever I saw spec parsing being mentioned as a
bottleneck so I thought I'd have a look: while spec parse has
traditionally been but a drop in the ocean of package build time, times 
do change and so do usage patterns. For one, the change from cvs to git 
has changed the speed expectations quite dramatically.

> I suspected it was a spec much like the kernel that
> does a lot of complicated macro work to figure out things like name,
> version, release. Also, I meant it as something I can't do much about in
> fedpkg land.

Yup. And obviously rpm can't make eg the shell run any faster, but there 
should be plenty of opportunities for eliminating some of the more 
expensive invocations.

Take for example the ubiquitous python_sitelib/python_sitearch macros: 
these invoke shell which invokes the python interpreter which imports 
piles of stuff in order to get a couple of paths that are for all 
practical purposes static within a Fedora release, and could be 
statically defined from macro files generated at the main python 
package(s) build-time.
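
A minimal sketch of what such build-time generation could look like, assuming a small helper run once while the python package is built. The function name render_python_macros and the idea of redirecting its output into a macro file are assumptions for illustration, not an existing Fedora script.

```python
import sysconfig


def render_python_macros():
    """Return macro-file text with the site-packages paths baked in."""
    sitelib = sysconfig.get_path("purelib")   # pure-python modules
    sitearch = sysconfig.get_path("platlib")  # arch-specific modules
    return "%python_sitelib {}\n%python_sitearch {}\n".format(sitelib, sitearch)


if __name__ == "__main__":
    # Hypothetical usage: redirect this output into a macros file at build time.
    print(render_python_macros(), end="")
```

With the paths written out once like this, spec parsing would only expand a plain string and would not need to start a shell or a python interpreter at all.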

Another case, probably less expensive but even more ubiquitous, that
could be "fixed" centrally is invoking shell + getconf through
%{_smp_mflags} to get the number of cpus, another pretty static piece
of information.
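
As a rough illustration of how cheap this information is to obtain without a fork+exec, the sketch below reads the online cpu count directly. The function name smp_mflags and the plain -jN result are assumptions for illustration; the real macro may apply additional rules.

```python
import os


def smp_mflags(ncpus=None):
    """Build a make -jN flag from the online cpu count, no shell involved."""
    if ncpus is None:
        try:
            # The same value getconf _NPROCESSORS_ONLN reports on Linux.
            ncpus = os.sysconf("SC_NPROCESSORS_ONLN")
        except (AttributeError, ValueError, OSError):
            ncpus = os.cpu_count() or 1
    # A single cpu gets no flag at all.
    return "-j%d" % ncpus if ncpus > 1 else ""
```

Calling smp_mflags(4) yields "-j4", and smp_mflags(1) yields an empty string.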

A different kind of case of unnecessary shell invocations slowing things
down is doing simple arithmetic and such, when %{lua: ...} could be
used, but that's obviously something that individual package maintainers 
would have to change to "optimize" the spec parse, no central change can 
help there.

And yes this has drifted pretty damn far from git merges :)

> fedpkg does do a fair amount of python imports. I could probably move
> some of those around to be more lazy loaded when a property that
> requires them gets accessed, but that makes the code harder to manage.
> In practical usage on simple rpms, the amount of time I wait for verrel
> to return is so small as to not really interrupt my work flow.
>
> $ time fedpkg verrel
> pungi-2.11-2.fc18
>
> real 0m0.563s
> user 0m0.437s
> sys 0m0.118s
>
> half of a second.

That's the kind of performance I'm seeing too in normal circumstances 
(even for the more complex packages like the kernel), and seems 
perfectly adequate to me. Uncached performance is what it is but hardly
matters; several seconds' worth of python imports per run when cached
would deserve a closer look :)
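
For reference, the lazy loading mentioned in the quoted text could be sketched roughly as below. LazyModule is a made-up helper for illustration, not something fedpkg is known to contain; it defers the real import until the first attribute access.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only reached for attributes not found on the proxy itself.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Nothing is imported here; the cost is paid only if json is actually used.
json = LazyModule("json")
```

The trade-off is the one noted in the quote: import errors and costs move from startup to the first use, which makes the code somewhat harder to follow.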

	- Panu -