Systematically crawling Fedoraproject.org repositories

Pascal Minnerup pminnerup at google.com
Fri Sep 3 08:43:42 UTC 2010


No, we have not looked at these repositories yet. Thanks, I'll add the lists
to the harvester.

Pascal

Pascal Minnerup
Google Code Search


On Thu, Sep 2, 2010 at 6:55 PM, Mike McGrath <mmcgrath at redhat.com> wrote:

> On Thu, 2 Sep 2010, Ben St. John wrote:
>
> > On Thu, Sep 2, 2010 at 5:34 PM, Mike McGrath <mmcgrath at redhat.com> wrote:
> > > On Thu, 2 Sep 2010, Pascal Minnerup wrote:
> > >
> > >> Dear Fedora team,
> > >>
> > >> We on the Google Code Search project (www.google.com/codesearch) want
> > >> to improve the quality of our index, and as part of that, would like to
> > >> systematically crawl the fedora git repositories of fedoraproject.org,
> > >> which we consider one of the major hosts of open source. Our crawlers
> > >> use bandwidth throttling that should ensure that we don't overstress
> > >> your web servers.
> > >>
> > >> 1. Is it okay for you if we systematically crawl your git repositories
> > >> for new source code?
> > >>
> > >> 2. How would you recommend we get the repository directories? Our
> > >> current approach would be to get the git repositories of recently updated
> > >> packages from this page:
> > >> http://pkgs.fedoraproject.org/gitweb/?o=age.
> > >>
> > >> 3. Are there any particular times or actions we should _avoid_?
> > >>
> > >> 4. Is there any particular person we should talk to in the future?
> > >>
> > >> An answer to these questions would be very helpful in improving the
> > >> presence of Fedora code files in Code Search. We look forward to hearing
> > >> from you.
> > >>
> > >
> > > Thanks for contacting us, we really don't know how that would all react
> > > but I'm ok with it provided we can contact you to change things later
> > > if things do go south?
> > >
> > >        -Mike
> >
> > Of course! We'll try to give you a heads-up the first time we crawl
> > it, so if you do notice anything strange, you'll know who to blame!
> >
>
> I'm not sure if you're already looking at the fedorahosted repos but we
> have several web based repos at
>
> http://git.fedorahosted.org/git/
> http://hg.fedorahosted.org/hg/
> http://bzr.fedorahosted.org/bzr/
>
>        -Mike

