[change req] Allow fedorahosted robots.txt to only crawl /wiki/*

Kevin Fenzi kevin at scrye.com
Mon Dec 31 01:12:49 UTC 2012


On Sun, 30 Dec 2012 20:07:37 -0500
Ricky Elrod <codeblock at elrod.me> wrote:

> We've been seeing load spikes on hostedXX, following
> df7e8578432b224d9576dc8359f0729763861526. This semi-reverts that
> commit and only allows /wiki/* to be crawled.
> 
> diff --git a/configs/web/fedorahosted.org/fedorahosted-robots.txt
> b/configs/web/fedorahosted.org/fedorahosted-robots.txt
> index cd572f8..7782677 100644
> --- a/configs/web/fedorahosted.org/fedorahosted-robots.txt
> +++ b/configs/web/fedorahosted.org/fedorahosted-robots.txt
> @@ -1,5 +1,5 @@
>  User-agent: *
> -Disallow: /*/browser
> -Disallow: /*/search
> +Allow: /wiki/*
> +Disallow: /
>  user-agent: AhrefsBot
>  disallow: /
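
[For anyone wanting to sanity-check the new rules: the standard library's urllib.robotparser does not handle `*` wildcards in paths, so here is a minimal sketch of Google-style longest-match evaluation. The rule list mirrors the diff above; the function names are illustrative, not part of any real crawler.]

```python
import re

# The rules from the patched fedorahosted-robots.txt, for User-agent: *
RULES = [("allow", "/wiki/*"), ("disallow", "/")]

def rule_to_regex(path):
    # Translate a robots.txt path pattern: '*' matches any run of characters,
    # and matching is anchored at the start of the URL path.
    return re.compile("^" + ".*".join(re.escape(p) for p in path.split("*")))

def can_fetch(path, rules=RULES):
    # Google-style evaluation: the most specific (longest) matching rule
    # wins; on a length tie, "allow" beats "disallow". No match => allowed.
    matches = [(len(pattern), verdict == "allow")
               for verdict, pattern in rules
               if rule_to_regex(pattern).match(path)]
    if not matches:
        return True
    matches.sort(reverse=True)
    return matches[0][1]
```

With these rules, `can_fetch("/wiki/Main_Page")` is allowed (the longer `/wiki/*` rule wins over `Disallow: /`), while `can_fetch("/browser/trunk")` is blocked.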

It seems several things are contributing to the load, but it's hard to
isolate which (in particular the timeline, changeset, and log views,
since they all hit the repo browser).

I'm +1 on just applying this one for now; we can adjust it further
after the freeze is over.

kevin