Re: Systematically crawling Fedoraproject.org repositories

Thursday, 2 September 2010

On Thu, 2 Sep 2010, Pascal Minnerup wrote:

...
 Dear Fedora team,

 We on the Google Code Search project (www.google.com/codesearch) want to improve the
quality of our index and, as part of that, would like to systematically crawl the Fedora
git repositories on fedoraproject.org, which we consider one of the major hosts of open
source. Our crawlers use bandwidth throttling, which should ensure that we don't
overstress your web servers.

 1. Is it okay for you if we systematically crawl your git repositories for new source
code?

 2. How would you recommend we get the repository directories? Our current approach would
be to get the git repositories of recently updated packages from this page:
 http://pkgs.fedoraproject.org/gitweb/?o=age.

 3. Are there any particular times or actions we should _avoid_?

 4. Is there any particular person we should talk to in the future?

 An answer to these questions would be very helpful in improving the presence of Fedora
code files in Code Search. We look forward to hearing from you.

Thanks for contacting us. We really don't know how it would all react,
but I'm OK with it, provided we can contact you to change things later if
things do go south.

	-Mike
