Systematically crawling Fedoraproject.org repositories

Pascal Minnerup pminnerup at google.com
Thu Sep 2 13:32:15 UTC 2010


Dear Fedora team,

We on the Google Code Search project (www.google.com/codesearch) want to
improve the quality of our index, and as part of that, would like to
systematically crawl the Fedora git repositories hosted at fedoraproject.org,
which we consider one of the major hosts of open source code. Our crawlers
throttle their bandwidth, which should keep them from overloading your web
servers.

1. Is it okay with you if we systematically crawl your git repositories for
new source code?

2. How would you recommend we get the repository directories? Our current
approach would be to get the git repositories of recently updated packages
from this page: http://pkgs.fedoraproject.org/gitweb/?o=age.

3. Are there any particular times or actions we should _avoid_?

4. Is there any particular person we should talk to in the future?

Answers to these questions would help us improve the coverage of Fedora
source code in Code Search. We look forward to hearing from you.

Sincerely,

Pascal Minnerup
Google Code Search