Fedora search

Sat Nov 2 09:23:10 UTC 2013

I'm mentioning it just because nobody has so far: Elasticsearch[1],
which is also Lucene-based, was designed from the very beginning to be
distributed (in contrast to Solr). The product hasn't reached the symbolic
1.0 yet but is production-ready (for instance, GitHub[2] uses it).

Dridi

[1] http://www.elasticsearch.org/
[2] https://github.com/blog/1381-a-whole-new-code-search

On Sat, Nov 2, 2013 at 7:02 AM, Frankie Onuonga
<frankie.onuonga at gmail.com> wrote:
> Hi folks,
>
> I trust all is well.
>
> I believe this email will spark something, so I will cc Kevin on it for
> multiple reasons.
> I would like some things to be clear from the word go.
>
> I am not too sure where to start with this email.
> I am feeling both extremely mad and extremely excited at the same
> time.
> First I would like to thank all those who have been kind enough to offer
> their assistance.
> I would also like to say thank you to those that have started brainstorming.
> I am sure in some time we will be able to see the fruits of our labor.
>
> Now, regarding the reason why I am not amused today: I am going to be
> straight to the point and clear about this. I do not appreciate a user who is
> here to criticize and offer no solution. I generally follow open source ethics,
> but if your job is to come in and criticize with a lot of rubbish opinions
> (yes, I am referring directly to whoever posted that this is not something to
> look into and, even worse, insisted on it) then please don't waste your time.
> Keep off this thread.
>
> It does not amuse me in the slightest when criticism is given with no
> solution. I understand when someone makes a mistake. I also understand when
> someone has a valid point.
> I do not understand when you give an opinion whose only solution is that you
> will never use the service.
> You cannot rate something before using it.
> I would also advise you to have a look at the mailing list guidelines so that
> you are up to speed.
>
>
> The best of minds are probably here with us; people do not mention who they
> work for, but trust me, they are here. Fine, we admit Google is miles ahead. I
> personally know they took time to get there.
>
> I have also read their papers, and there are open source solutions that have
> been mentioned earlier (Apache Lucene/Solr) that try to mimic this. Seeing that
> it is for our use, which in my opinion is small, I think it is a great
> start.
>
> Third, free and open source is all that is used here. Simple.
>
> I would therefore proceed to mention, if you are not contributing in a
> positive way, be kind to the world. We do not have super cow abilities.
>
>
> Kind Regards,
>
> Onuonga Frankie
>
>
> On Sat, Nov 2, 2013 at 4:57 AM, Alek Paunov <alex at declera.com> wrote:
>>
>> On 02.11.2013 02:32, Michael Cronenworth wrote:
>>>
>>> This will be my last mailing on this topic as I will not contribute or
>>> use this feature in Fedora, but this reply warranted clarification.
>>>
>>> On 11/01/2013 06:14 PM, Alek Paunov wrote:
>>>>
>>>> Another simple answer: CSE is a low quality search - no facets, no
>>>> (real) content age restriction. The same also holds for every other
>>>> service/application that is based solely on generic web page crawling.
>>>
>>>
>>> CSE is as full-blown as a Google Appliance. More advanced than anything
>>> you can write in Perl/Python/Ruby in a month. Site restrictions, keyword
>>> restrictions, (real) age restrictions, autocomplete help, synonyms,
>>> image search, all of which are provided through an XML API.[1]
>>>
>>
>> Indeed. Don't get me wrong - I like the CSE service for what it is good for.
>> It seems that I was not clear enough with my English - sorry!
>>
>> Nobody is able to write a good, modern index in a month - Lucene/Solr,
>> Xapian, etc. have all evolved over many years. Our task is the proper
>> deployment of one or a combination of them, not inventing a new one.
>>
>> Why, e.g., Solr instead of CSE or dpsearch (which is open source, and also
>> mentioned in the old tickets)?
>>
>> Granularity: With CSE/dpsearch the indexed content unit is a crawled and
>> automatically processed Web document (I say Web document instead of HTML
>> page, because CSE handles many types). Not a single BZ comment. Not a change
>> comment in a spec file. Not a Git commit. Or in the reverse direction: an
>> email, not a thread (because we do not yet have an archive page displaying the
>> whole thread). I.e. there is no concept of documents and subdocuments (to
>> which most of our content belongs).
>>
>> Attributes: You cannot attach custom scalar/category attributes (the basis
>> of faceted search) to the FTS-indexed units.
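
To make the granularity and attributes points concrete, here is a minimal
sketch in plain Python. It is not Solr, Elasticsearch or CSE code; the Unit
class, the helper functions, the identifiers and the sample data are all
hypothetical and only illustrate indexing sub-document units with custom
attributes, counting facets over them, and rolling hits up to their parent
documents.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Unit:
    """One indexed unit: a single comment, commit or email, not a whole page."""
    unit_id: str
    parent_id: Optional[str]  # enclosing document (bug, package, thread), if any
    text: str
    attrs: Dict[str, str] = field(default_factory=dict)  # custom category attributes


def search(units: List[Unit], term: str) -> List[Unit]:
    """Naive full-text match: case-insensitive substring test."""
    needle = term.lower()
    return [u for u in units if needle in u.text.lower()]


def facet_counts(hits: List[Unit], attr: str) -> Counter:
    """Count hits per value of one attribute (what a facet sidebar shows)."""
    return Counter(u.attrs[attr] for u in hits if attr in u.attrs)


def group_by_parent(hits: List[Unit]) -> Dict[str, List[str]]:
    """Roll sub-document hits up to their parent documents."""
    grouped: Dict[str, List[str]] = {}
    for u in hits:
        grouped.setdefault(u.parent_id or u.unit_id, []).append(u.unit_id)
    return grouped


# Hypothetical sample data: two bug comments, one commit, one email.
UNITS = [
    Unit("bz-1001#c1", "bz-1001", "systemd unit fails to start",
         {"source": "bugzilla", "author": "alice"}),
    Unit("bz-1001#c2", "bz-1001", "fixed in the latest systemd build",
         {"source": "bugzilla", "author": "bob"}),
    Unit("git-abc123", "pkg-systemd", "Update systemd spec file",
         {"source": "git", "author": "bob"}),
    Unit("mail-42", "thread-7", "Question about kernel modules",
         {"source": "mail", "author": "carol"}),
]

hits = search(UNITS, "systemd")
print(facet_counts(hits, "source"))
print(group_by_parent(hits))
```

A real deployment would presumably hand both the unit text and the attributes
to the search engine and let it compute the facets; the point here is only
that the unit of indexing and its attributes are chosen by whoever feeds the
index, which is not possible when the unit is always a crawled page.
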
>>
>> Please correct me if I am wrong about CSE on any of the above.
>>
>> Fedora has data sources (bugs, wikis, mails, packages, docs, etc.), not just
>> sitemaps/pages, and they all talk about the same things (common topic
>> hierarchies, common tag hierarchies, common authors). They form a highly
>> interlinked virtual knowledge base.
>>
>> We should start indexing the sources in their native structure now, to be
>> able to upgrade some happy day to full-blown semantic search (when
>> available), which is actually what we badly need.
>>
>>
>>>> In our case, we are the owners of the content, we know how it is
>>>> structured, we know where the feeds with the pure content changes
>>>> are, we can explicitly feed the indexes with all named attributes of
>>>> the content nodes and later use them.
>>>
>>>
>>> But you don't know how other people on the web find and link to Fedora
>>> pages to provide accurate page ranking.
>>>
>>
>> Personas: 1. Active Fedora contributor, 2. Fedora contributor, 3. Power
>> Fedora user/sysadmin, 4. Fedora user, 5. Potential Fedora user, 6. IT
>> journalist.
>>
>> IMHO, at least for 1-3, ordering the results by recursive link-rank
>> valuation (Google page ranking) is more of an issue than an advantage.
>>
>> For 4 (also important) the relevant sets are probably: the docs, part of
>> the wiki, ask.fp.o and maybe users at . I don't know - the top Stack Overflow
>> 'relevance' results for a set of keywords are not always the same as Google's
>> with site:stackoverflow.com in the query ...
>>
>> For 5-6 Google page ranking is probably the best, but they will use Google
>> instead of search.fp.o anyway (at least initially; later, their more
>> concrete queries would be more like those of 3-4).
>>
>> Kind Regards,
>> Alek
>>
>>
>> --
>> devel mailing list
>> devel at lists.fedoraproject.org
>> https://admin.fedoraproject.org/mailman/listinfo/devel
>> Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
>
>
>
>
> --
> Skype: Frankie.Onuonga
> twitter: Frankieonuonga
> irc #freenode: Frankie.onuonga