At 04:30 PM 2/14/2006, Jeff Spaleta wrote:
> I'm suggesting that there is very little to be learned from the
> specific comparison to Google. I'm saying that since we are incapable
> of examining the details of how Google search works, there is very
> little to be gleaned from looking at example output from Google at
> all. The magic in the Google search is the search algorithm which
> produces the results. And it's exactly that piece of magic which we
> don't have access to examine and reuse. Are you really suggesting
> that we blindly reverse-engineer the Google search algorithm and apply
> it to Bugzilla?
The magic of Google isn't in the ranking algorithm, it's in the
kind and quantity of data that it searches over and the expectations people
have of it.
The problems of information retrieval depend on the scale of your
database. Historically, people have evaluated IR systems on two
measures: precision (the fraction of returned results that are actually
relevant) and recall (the fraction of relevant items that get returned).
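For concreteness, here's how the two measures are computed for a single
query (a toy sketch; the document IDs and relevance judgments are
invented for illustration):

```python
# Toy illustration of precision and recall for one query.
# The document IDs and relevance labels below are made up.

relevant = {"doc1", "doc4", "doc7"}    # items that truly match the query
retrieved = {"doc1", "doc2", "doc4"}   # items the search engine returned

true_positives = relevant & retrieved

precision = len(true_positives) / len(retrieved)  # share of results that are relevant
recall = len(true_positives) / len(relevant)      # share of relevant items found

print(f"precision = {precision:.2f}")  # 2 of 3 results are relevant -> 0.67
print(f"recall    = {recall:.2f}")     # 2 of 3 relevant items found -> 0.67
```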
If you've got a database with 10,000 items, and there is 1 item
that matches, there's a real risk that the one matching item will be
missed if someone doesn't type in the perfect search term. Recall is the
issue, so it's important to stem words (working -> work), have a system
that's smart about synonyms, etc.
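A sketch of why stemming helps recall in a small index. The naive
suffix-stripper here stands in for a real stemmer (production systems
use something like the Porter algorithm), and the two sample documents
are invented:

```python
# Index stemmed terms so "work", "working", and "worked" all collide
# on the same posting list, and a query for any of them finds all docs.

def naive_stem(word):
    # Crude stand-in for a real stemmer: strip a few common suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

docs = {1: "the patch is working now", 2: "this build worked yesterday"}

index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(naive_stem(term), set()).add(doc_id)

# A query for "work" now matches both documents, even though neither
# contains that exact token.
print(index[naive_stem("work")])  # -> {1, 2}
```

Without the stemming step, a search for "work" would return nothing
from this two-document collection, which is exactly the recall failure
described above.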
Now, if you're searching a database with 10 billion items, a term
that matches 1 item in 10,000 will return a million hits. The issue is
picking the best results out of that million, so there's more stress on
precise phrase matching and on ranking signals like PageRank. Antispam
measures are essential, and so is the removal of duplicate documents.
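To make the PageRank-style ranking concrete, here's a minimal power
iteration over a tiny invented link graph (the four pages, their links,
and the damping factor of 0.85 are the usual textbook assumptions, not
anything from Google's actual implementation):

```python
# Minimal PageRank sketch: repeatedly redistribute each page's score
# along its outgoing links until the scores settle.

damping = 0.85
links = {  # page -> pages it links to (invented example graph)
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start uniform

for _ in range(50):  # enough iterations for this tiny graph to converge
    new = {p: (1 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:
            new[target] += share
    rank = new

# "C" collects the most inbound links, so it ends up ranked highest.
best = max(rank, key=rank.get)
print(best)  # -> C
```

The point of the example: with millions of hits for a query, a
query-independent score like this is what separates the top of the
results page from the noise.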
Google's trying to do something entirely different from what
Bugzilla search is trying to do or, say, what Beagle should do on your
desktop.