On Thu, Sep 2, 2010 at 5:34 PM, Mike McGrath <mmcgrath(a)redhat.com> wrote:
On Thu, 2 Sep 2010, Pascal Minnerup wrote:
> Dear Fedora team,
>
> We on the Google Code Search project (www.google.com/codesearch) want to improve the
> quality of our index, and as part of that, would like to systematically crawl the fedora
> git repositories of fedoraproject.org, which we consider one of the major hosts of
> open source. Our crawlers use bandwidth throttling that should ensure that we don't
> overstress your web servers.
>
> 1. Is it okay for you if we systematically crawl your git repositories for new source
> code?
>
> 2. How would you recommend we get the repository directories? Our current approach
> would be to get the git repositories of recently updated packages from this page:
>
> http://pkgs.fedoraproject.org/gitweb/?o=age.
>
> 3. Are there any particular times or actions we should _avoid_?
>
> 4. Is there any particular person we should talk to in the future?
>
> An answer to these questions would be very helpful in improving the presence of
> Fedora code files in Code Search. We look forward to hearing from you.
>
Thanks for contacting us. We really don't know how that would all react,
but I'm OK with it, provided we can contact you to change things later if
things do go south.
-Mike
Of course! We'll try to give you a heads-up the first time we crawl
it, so if you do notice anything strange, you'll know who to blame!
Thanks,
Ben
Ben St. John
jbstjohn(a)google.com
Tel: +49 (0) 89 83 930-9054
Fax: +49 (0) 89 83 930-9001
Google Germany GmbH
Dienerstr. 12
80331 München
AG Hamburg, HRB 86891 | Sitz der Gesellschaft: Hamburg
Geschäftsführer: Nikesh Arora, John Herlihy, Graham Law, Lloyd Martin,
Kent Walker