Fedora search

Frankie Onuonga frankie.onuonga at gmail.com
Sat Nov 2 06:02:57 UTC 2013


Hi folks,

I trust all is well.

I believe this email will spark something, so I will cc Kevin on it for
several reasons.
I would like some things to be clear from the word go.

I am not too sure where to start with this email.
I am extremely mad and extremely excited at the same time.
First I would like to thank all those who have been kind enough to offer
their assistance.
I would also like to say thank you to those that have started brainstorming.
I am sure in some time we will be able to see the fruits of our labor.

Now, as to why I am not amused today: I am going to be straight to the
point and clear about this. I do not appreciate a user who is here to
criticize and offers no solution. I generally follow open source ethics,
but if your job is to come in and criticize with a lot of rubbish opinions
(yes, I am referring directly to whoever posted that this is not something
to look into and, even worse, insisted on it), then please don't waste
your time. Keep off this thread.

It does not amuse me in the slightest when criticism is given with no
solution. I understand when someone makes a mistake. I also understand when
someone has a valid point.
I do not understand an opinion whose only solution is that you will never
use the service.
You cannot rate something before using it.
I would also advise you to have a look at the mailing list guidelines so
that you are up to speed.


The best of minds are probably here with us; people do not mention who they
work for, but trust me, they are here. Fine, we admit Google is miles ahead.
I personally know they took time to get there.

I have also read their papers, and there are open source solutions that
were mentioned earlier (Apache Lucene/Solr) that try to mimic this. Seeing
that it is for our own use, which in my opinion is small, I think it is a
great start.

Third, free and open source software is all that is used here. Simple.

I would therefore say: if you are not contributing in a positive way, be
kind to the world. We do not have super cow abilities.


Kind Regards,

Onuonga Frankie


On Sat, Nov 2, 2013 at 4:57 AM, Alek Paunov <alex at declera.com> wrote:

> On 02.11.2013 02:32, Michael Cronenworth wrote:
>
>> This will be my last mailing on this topic as I will not contribute or
>> use this feature in Fedora, but this reply warranted clarification.
>>
>> On 11/01/2013 06:14 PM, Alek Paunov wrote:
>>
>>> Another simple answer: CSE is a low quality search - no facets, no (real)
>>> content age restriction. The same is valid also for every other
>>> service/application which is solely based on generic web pages crawling.
>>>
>>
>> CSE is as full blown as a Google Appliance. More advanced than anything
>> you can write in Perl/Python/Ruby in a month. Site restrictions, keyword
>> restrictions, (real) age restrictions, autocomplete help, synonyms,
>> image search, all of which are provided through a XML API.[1]
>>
>>
> Indeed. Don't get me wrong - I like CSE service for what it is good for.
> It seems that I had not been clear enough with my English - Sorry!
>
> Nobody is able to write a good, modern index in a month - lucene/solr,
> xapian, etc. have all evolved over long, long years. Our task is a proper
> deployment of one or a combination of them, not inventing a new one.
>
> Why e.g. solr instead of CSE or dpsearch (which is opensource, and also
> mentioned in the old tickets)?
>
> Granularity: With CSE/dpsearch the indexed content unit is a crawled and
> automatically processed Web document (I say Web document instead of HTML
> page, because CSE handles many types). Not a single BZ comment. Not a
> change comment in a spec file. Not a Git commit. Or in the reverse
> direction: an email, not a thread (because we do not yet have an archive
> page displaying the whole thread). I.e. there is no concept of document
> and subdocuments (to which most of our content belongs).
>
> Attributes: You cannot attach custom scalar/category attributes (the basis
> of faceted search) to the FTS indexed units.
>
> Please correct me if I am wrong about CSE with some of the above.
>
> Fedora has datasources (bugs, wikis, mails, packages, docs, etc.), not just
> sitemaps/pages, and they all talk about the same things (common topic
> hierarchies, common tag hierarchies, common authors). They form a highly
> interlinked virtual knowledge base.
>
> We should start indexing the sources in their native structure now, to be
> able to upgrade some happy day to full-blown semantic search (when
> available), which is actually what we badly need.
>
>
>>> In our case, we are the owners of the content, we know how it is
>>> structured, we know where are the feeds with the pure content changes,
>>> we can explicitly feed the indexes with all named attributes of the
>>> content nodes and later use them.
>>>
>>
>> But you don't know how other people on the web find and link to Fedora
>> pages to provide accurate page ranking.
>>
>>
> Personas: 1. Active Fedora contributor, 2. Fedora contributor, 3. Power
> Fedora user/sysadmin, 4. Fedora user, 5. Potential Fedora user, 6. IT
> journalist.
>
> IMHO, at least for 1-3 the results ordering by recursive link-rank
> valuation (Google page ranking) is more an issue than an advantage.
>
> For 4 (also important) the relevant sets are probably: the docs, part of
> wiki, ask.fp.o and might be users at . I don't know - not always
> stackoverflow 'relevance' top results on a set of keywords are the same as
> google with site:stackoverflow.com in the query ...
>
> For 5-6 Google page ranking is probably the best, but they will use Google
> instead of search.fp.o anyway (at least initially; later their more
> concrete queries would be more like the 3-4 ones).
>
> Kind Regards,
> Alek
>
>



-- 
Skype: Frankie.Onuonga
twitter: Frankieonuonga
irc #freenode: Frankie.onuonga