Response to "Getting Fedora Out of the If-Then Loop"

Jeff Spaleta jspaleta at gmail.com
Wed Feb 17 18:38:27 UTC 2010


On Wed, Feb 17, 2010 at 9:24 AM, inode0 <inode0 at gmail.com> wrote:
> It could but does it and to what extent? Is it a good metric for the
> number of contributors joining the project? What does it look like if
> we overlay the recent stagnant download trend with the recent
> statistics showing numbers of contributors over the same period?
Be very careful here. What you really want to look for is a
characteristic cross-correlation time: essentially, slide the trend
graphs of the download metric and the contributor metric across each
other, looking for the time offset that maximizes how well the graphs
correlate.  I believe numpy (or is it scipy? I'll have to look) has a
cross-correlation function built in.  Get me two years of data, binned
weekly, for both the download and contributor metrics and I can do the
correlation analysis.
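
A minimal sketch of the sliding comparison described above, assuming the
two metrics arrive as equal-length weekly arrays. The function name
best_lag, the max_lag default of 26 weeks, and the use of a Pearson
coefficient at each offset are illustrative assumptions, not anything
agreed on in this thread. For reference, both numpy.correlate and
scipy.signal.correlate exist; this version uses numpy.corrcoef instead so
that every offset is scored on the same -1..1 scale.

```python
import numpy as np

def best_lag(downloads, contributors, max_lag=26):
    """Return (lag, r) for the weekly offset with the highest Pearson r.

    A positive lag means the contributor series trails the download
    series by that many weeks. Names and defaults are hypothetical.
    """
    x = np.asarray(downloads, dtype=float)
    y = np.asarray(contributors, dtype=float)
    if x.shape != y.shape:
        raise ValueError("both series must cover the same weeks")
    n = len(x)
    best = (0, -np.inf)
    for lag in range(-max_lag, max_lag + 1):
        # Slide one series across the other and keep only the overlap.
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        # Skip offsets with too little overlap or a flat segment.
        if len(a) < 3 or a.std() == 0 or b.std() == 0:
            continue
        r = np.corrcoef(a, b)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best
```

With two years of weekly bins (about 104 points), best_lag(downloads,
contributors) would give the candidate characteristic time in weeks along
with the correlation strength at that offset.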

And to convince myself that the correlation is real, I would want to
divide the available data set into multiple time chunks and calculate
the correlation time in each. If the characteristic correlation times
across those chunks are tightly grouped around the overall correlation
time calculated from the entire data set, then there is probably a real
correlation between downloads and contributors.  Even if the
characteristic time has been slowly varying, that would also be worth
looking into and understanding.
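
The chunked check above could be sketched roughly as follows, this time
using numpy.correlate directly. The helper names peak_lag and chunk_lags,
the choice of four chunks, and the max_lag of 8 weeks are all assumptions
made for illustration. Note that the raw (unnormalized) cross-correlation
used here slightly favors small offsets because they have more overlap,
so treat it as a rough first pass.

```python
import numpy as np

def peak_lag(x, y, max_lag):
    """Offset (in samples) at which y best lines up with x.

    Positive means y trails x. Uses the raw cross-correlation of the
    mean-subtracted series.
    """
    xc = np.asarray(x, dtype=float) - np.mean(x)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    cc = np.correlate(yc, xc, mode="full")
    # In "full" mode the output index i corresponds to lag i - (len(xc) - 1).
    lags = np.arange(-(len(xc) - 1), len(yc))
    keep = np.abs(lags) <= max_lag
    return int(lags[keep][np.argmax(cc[keep])])

def chunk_lags(x, y, n_chunks=4, max_lag=8):
    """Return (overall_lag, [lag for each consecutive time chunk])."""
    xs = np.array_split(np.asarray(x, dtype=float), n_chunks)
    ys = np.array_split(np.asarray(y, dtype=float), n_chunks)
    overall = peak_lag(x, y, max_lag)
    per_chunk = [peak_lag(a, b, max_lag) for a, b in zip(xs, ys)]
    return overall, per_chunk
```

If the per-chunk values cluster tightly around the overall value, that
supports a real relationship; a steady drift from chunk to chunk would be
the slowly varying case mentioned above. With only two years of weekly
data each chunk is short, so the per-chunk estimates will be noisy.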

Until I see a correlation analysis of this type done, I have no
rational reason to start looking for a cause/effect relationship.

-jef

