Search Engine on start.fedoraproject.org

Fri Jul 23 15:10:04 UTC 2010

On Fri, 23 Jul 2010, Máirín Duffy wrote:

> On Thu, 2010-07-22 at 18:41 -0500, Mike McGrath wrote:
> > I liked one of the examples we talked about right at the end of that
> > conversation with rsync.net.  We considered rsync.net as a backup
> > solution.  This was something that was pretty clearly easy for us to
> > duplicate if we wanted to but for cost reasons or whatever we could have
> > just paid rsync.net for the storage.
> >
> > Now, the api for rsync is pretty well documented, there are several
> > implementations.  But we had no promises that rsync.net was even running a
> > FOSS version of rsync, nor what OS they may have been running, etc.  By
> > that definition our current policy would have disallowed us from using
> > rsync.net but I don't think it should have.  So the policy will have to be
> > altered a bit in that respect.
> >
> > At the same time, Google's search API is pretty well open.  It just uses
> > an http GET url and you get your results.  However, it does seem google's
> > actual search is closed.  So we certainly couldn't provide our own google
> > if we wanted (ignoring content licensing and size for now).
> >
> > So now the trick is to try to find wording that describes what Jef said
> > above.  We don't want to be locked into a vendor and we do want to be able
> > to duplicate an external service on our own if the service goes away.
>
> I kind of feel using non-open source code in any capacity is really bad,
> even if it has an open API, even if it is running on Linux, even if it's
> something we could practically duplicate should it disappear.
> *Especially* when there are open source alternatives (I can understand
> somewhat if there aren't.)
>

The problem is there's no way to know.  Any place can say they're
running Apache and TurboGears and some product that could very well be
packaged in Fedora, but there's no way to know if what they're actually
running matches the published source code.

> E.g. for search, would it be feasible to use Apache Solr rather than
> Google? I've seen a few success stories of sites using it. Here's an
> example (love the domain name o_O) of a guy who set up a shared Solr
> server on linode for 15 different websites and is very happy with it:
>
> http://www.opensourcecatholic.com/blog/oscatholic/setting-apache-solr
>

We discounted Solr because it has no crawling/spidering facility:

https://fedoraproject.org/wiki/Infrastructure/Search

FOSS search is kind of a wasteland right now.  There are no clear winners,
and while our requirements weren't super high, we did have some specific
wants.

	-Mike