docs.fp.o Search [Was: Re: CMS]

Paul W. Frields stickster at gmail.com
Mon Jun 7 15:20:55 UTC 2010


Including the Infrastructure team gurus on this email thread for
additional expert advice and assistance. :-) Setting reply-to to the
docs@ list.

On Sat, Jun 05, 2010 at 05:55:05AM -0400, Eric Sparks Christensen wrote:
> On 06/05/2010 03:14 AM, Ruediger Landmann wrote:
> > One of the new features of Publican 2.0 that I haven't mentioned yet is 
> > that it creates an XML sitemap for search engine bots to crawl. You can 
> > find d.fp.o.'s sitemap here:
> > 
> > http://docs.fedoraproject.org/Sitemap
> 
> Awesome.
> 
> > 
> > I've fed this to Google, Yahoo, and Bing, and they're all slowly 
> > re-indexing the site. The map now contains a little over 2,000 URLs and 
> > at the time of writing, Google has crawled about 350 of them.
> 
> I know that Google has some algorithm that figures out how often your
> site changes and then crawls more or less frequently.  Not sure if we
> could work with Google on scheduling this more or less around the time
> of a release.  Of course I'm guessing that we won't have this big
> re-structuring next time, either.
> 
> > 
> > The dilemma we face is the decision of when to turn off the 404 
> > redirect. For the sake of all the existing links scattered around the 
> > net (both on the Fedora Project site and off it), we'd want to postpone 
> > this as far as possible. On the other hand, any bot attempting to verify 
> > that link gets a page served up and probably concludes that the link is 
> > valid; I suspect that if these links 404ed, they'd start to evaporate 
> > from search results.
> 
> Isn't there a type of redirect (302?) that tells you that you are being
> redirected so you don't think the URL is valid?

I thought that 301 and 302 both tell the client it's being redirected,
but 302 is generally used for temporary redirects and 301 for permanent
ones.  So perhaps 301 is the one you're thinking of here?
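
To make the distinction concrete, here's a minimal sketch of how a
crawler might treat each status code.  The describe() helper and the
wording of its outcomes are assumptions for illustration only, not a
description of how any particular search engine actually behaves.

```python
# Hypothetical helper: classify an HTTP status code roughly the way a
# link-checking bot might. The outcome strings are illustrative only.
def describe(status: int) -> str:
    if status == 301:
        # Moved Permanently: the old URL should be replaced by the new one.
        return "permanent redirect: switch to the new URL"
    if status == 302:
        # Found: the move is temporary, so the old URL stays on record.
        return "temporary redirect: keep the old URL"
    if status == 404:
        # Not Found: the link is dead.
        return "not found: drop the URL"
    if 200 <= status < 300:
        # A page was served, so the link looks valid.
        return "ok: treat the URL as valid"
    return "other"


if __name__ == "__main__":
    for code in (200, 301, 302, 404):
        print(code, "->", describe(code))
```

The 200 case is the one Ruediger describes: if a dead link ends up
with a page served as a plain success, a bot has no reason to drop it.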

> > Given that existing links around the net are pointing to (at most 
> > recent) the F12 versions of docs, there will be no need to keep the 404 
> > redirect in place past October; however, if we want to start allowing 
> > dead links to 404 out rather than poison search results, maybe we should 
> > bring that date forward? The sooner we do this, the sooner search will 
> > start working properly...
> 
> Yeah, the sooner the better, IMO.

I wonder how long you would need 301s in place before it's safe to
remove them?  I ask because I think Infrastructure's not keen on
maintaining a big list of these.

-- 
Paul W. Frields                                http://paul.frields.org/
  gpg fingerprint: 3DA6 A0AC 6D58 FEC4 0233  5906 ACDB C937 BD11 3717
  http://redhat.com/   -  -  -  -   http://pfrields.fedorapeople.org/
          Where open source multiplies: http://opensource.com

