Search Engine on start.fedoraproject.org

Thu Jul 22 23:41:29 UTC 2010

On Thu, 22 Jul 2010, Jeff Spaleta wrote:

> On Wed, Jul 21, 2010 at 11:33 AM, Mike McGrath <mmcgrath at redhat.com> wrote:
> > Thoughts, comments?
> >
> >        -Mike "Sorry but the free road is a hard one sometimes" McGrath
>
>
> Mike and I had a nice little discussion on irc about all this and I
> want to sum up my thoughts for everybody about what our policy should
> really be in terms of web services.
>
> First and foremost, I think we all need to come in the door with an
> understanding that all external web services break traditional
> assumptions about how traditional open source licensing (prior to
> AGPL-like constructions) provides freedoms to users. Such licensing has
> primarily relied on the act of distribution as a key provision to
> protecting particular freedoms that we want to retain as users and
> developers. Web services break the assumption that usage by a new
> entity requires distribution to that entity. So a lot of our
> hard-fought consensus understanding of how all this open licensing
> stuff works gets thrown under the bus along with that underlying
> assumption.
>
> A web service could be completely built with openly licensed code
> under traditional open licensing (BSD) and we'd never be required to
> be given access to that codebase. And even for services that offer
> access to a codebase (AGPL), we have no practical ability to verify
> that the externally running service does indeed match the codebase we
> have access to, making it difficult to ensure compliance with the
> licensing.
>
> So with that in mind, I'm going to ask everyone to throw away the
> language in the policy that Mike pointed to and ask people to answer a
> more fundamental question: what freedoms do we need to
> protect/maintain when we choose to rely on _any_ externally built and
> maintained web service?
>
> I'll go ahead and give you how I've answered that question in my
> conversation with Mike, but I'm more than happy to have others
> disagree with me and express an alternative opinion.
>
> What I think we need to protect is API compliance and to ensure that
> we can reimplement a service on our own as a replacement if the
> external service goes up in smoke for some reason. Which means we need
> services with documented stable APIs and we need to identify an
> existing open codebase that we can rebuild into a replacement service
> that conforms to the APIs we make use of in any 3rd party service
> provider we choose to rely on.. but not necessarily the same codebase
> that the 3rd party service provider is running.
>
> Anything more restrictive than that as a policy and I think we pretty
> much prevent ourselves from depending on any external service without
> bending the policy way way out of shape to meet the practical
> realities of relying on any external service provider...cough Red Hat
> bugzilla...cough.
>
> And I don't mean that we ensure that we can build an equivalent
> service in-house either. There's a lot about the day-to-day utility of
> a service, even with an open codebase, that can't be replicated without
> a significant infrastructure investment that we simply aren't going to
> be able to make all the time.  We _could_ build our own in-house search
> engine, but we aren't going to be investing in the infrastructure to
> index the web very effectively so our in-house service would have
> drastically reduced utility.  But we could build it if we felt we had
> to.
>
> It really comes down to API restraint on our part. We restrict
> ourselves to the use of services with published APIs, and we ensure
> there is an open codebase out in the wild that conforms to those APIs
> and we squirrel that codebase away as a hedge against the external
> service provider going dark or closing down its public APIs.
>

I liked one of the examples we talked about right at the end of that
conversation: rsync.net.  We considered rsync.net as a backup solution.
Backup storage is clearly something we could duplicate ourselves if we
wanted to, but for cost reasons or whatever we could have just paid
rsync.net for the storage.

Now, the API for rsync is pretty well documented, and there are several
implementations.  But we had no promises that rsync.net was even running a
FOSS version of rsync, nor what OS they may have been running, etc.  By
that definition our current policy would have disallowed us from using
rsync.net, but I don't think it should have.  So the policy will have to
be altered a bit in that respect.

At the same time, Google's search API is pretty well open.  It just uses
an HTTP GET URL and you get your results.  However, it does seem Google's
actual search is closed.  So we certainly couldn't provide our own Google
if we wanted (ignoring content licensing and size for now).
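
To make the "it's just an HTTP GET URL" point concrete, here is a rough
sketch in Python of what API restraint could look like for a search box.
The provider names, hosts, and query parameter names below are all
placeholders I made up for illustration, not real endpoints.  The idea is
only that the calling code depends on a URL pattern, so pointing it at a
replacement service is a one-line table change.

```python
from urllib.parse import urlencode

# Hypothetical provider table: every host and parameter name here is a
# placeholder for illustration, not a real service endpoint.
PROVIDERS = {
    "external": ("https://search.example.com/search", "q"),
    "inhouse": ("https://search.example.org/find", "query"),
}


def search_url(provider, terms):
    """Build the GET URL for a search against the named provider."""
    base, param = PROVIDERS[provider]
    # urlencode handles escaping of spaces and special characters.
    return base + "?" + urlencode({param: terms})
```

Calling search_url("inhouse", ...) instead of search_url("external", ...)
is the whole migration in this sketch.  It says nothing about whether the
replacement returns results that are anywhere near as useful.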

So now the trick is to try to find wording that describes what Jeff said
above.  We don't want to be locked into a vendor, and we do want to be
able to duplicate an external service on our own if the service goes away.

I'd also like wording that states that "duplicating it ourselves" does not
mean "writing our own rsync".  I think Jeff's comment about API restraint
is going to go in the policy as a "must" requirement.  But I think there
are some "should" requirements there about actual software availability.
If we actually do start relying on some external service and it goes away
and our only recourse is to build our own from scratch, that's a recipe
for disaster.  But if our recourse is to install some existing software
that can duplicate (or nearly duplicate) the functionality, I think that's
more feasible and realistic.

I'll continue to mull over this a bit.  We want a simple policy that's
easy to understand but still protects our values.  Please do comment if
you have thoughts on this.

	-Mike