Mike MacCana wrote:
>
>They (meaning engineers at Red Hat) are discussing this. The solution
>won't use Lucene, as Lucene treats all file content as equal - i.e., it
>doesn't know about headings being different from body text and so on.
>
>Mike
>
>
Also, Lucene suffers from the Java UCS-2 scandal: Java's designers chose
a 16-bit character encoding which is good for Japanese, but bulks up
European languages by a factor of two and doesn't support enough
characters to do a good job with Chinese.
Because of this, Lucene loses a factor of two in performance
compared to C++ competitors such as Xapian, which is a minus for those
who care about performance on computers that aren't monster servers with
8 gigs of RAM and Ultra 320 disks. (Funnily enough, we're not all that
happy with Lucene performance on such a machine... But we've got a lot
of text...)
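
To put a rough number on the encoding point, here is a small Python
sketch. It is only an illustration and has nothing to do with Lucene's
or Xapian's actual code; the function name encoded_sizes and the sample
strings are made up for this example. It compares how many bytes the
same text takes in UTF-8 and in a 16-bit encoding, and shows that a
Chinese character outside the first 65,536 code points does not fit in
a single 16-bit unit.

```python
def encoded_sizes(text: str) -> dict:
    """Return the size of `text` in code points, UTF-8 bytes and UTF-16 bytes.

    Hypothetical helper for illustration only.
    """
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(utf16),
        "utf16_units": len(utf16) // 2,  # number of 16-bit code units
    }


if __name__ == "__main__":
    # Plain ASCII text: one byte per character in UTF-8, two in UTF-16.
    print(encoded_sizes("Lucene and Xapian index plain text"))

    # U+20000 is a CJK ideograph beyond the 16-bit range; UTF-16 has to
    # spend two 16-bit units (a surrogate pair) on this one character.
    print(encoded_sizes("\U00020000"))
```

For pure ASCII the 16-bit form is exactly twice the size; text with
accented letters comes out a little under a factor of two, since those
letters take two bytes in UTF-8 as well.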