Product: Fedora https://bugzilla.redhat.com/show_bug.cgi?id=915448
Bug ID: 915448
Summary: Spell check problem (UTF8 conversion?) with Hunspell
Product: Fedora
Version: 18
Component: emacs
Severity: unspecified
Priority: unspecified
Reporter: igor.redhat@gmail.com
+++ This bug was initially created as a clone of Bug #725235 +++
Created attachment 514921 Sample text with accents (md5sum f91f52b0ae84fd91aa25e0d671228a23)
Description of problem: When using emacs / hunspell to spell-check a UTF-8 encoded text file, emacs chokes on some accented letters, with the error message:
Ispell error: UTF-8 encoding error. Missing continuation byte in 0. character position:
Spell-checking testtext.txt using hunspell with default dictionary...done
ispell-process-line: Wrong type argument: number-or-marker-p, nil
Version-Release number of selected component (if applicable):
emacs-23.2-19.fc15.i686
using
hunspell-1.2.15-2.fc15.i686
hunspell-en-0.20110112-4.fc15.noarch
How reproducible: Always on my netbook with Fedora 15 for i686.
Steps to Reproduce:
1. Open a text file with accented characters, e.g. the attached test case.
2. Start spell-check in emacs (after making sure that aspell is not installed, so that emacs will use hunspell).
Actual results: Error message as above.
Expected results: Correct spell-checking session...
Additional info: This does not happen with aspell. It also does not happen when spell-checking files using hunspell on the command line.
For some other files, the error message was: "this UTF-8 encoding can't convert to UTF-16"
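For readers unfamiliar with the first error message: in UTF-8, a lead byte that announces a multi-byte character must be followed by continuation bytes of the form 10xxxxxx. The block below is only an illustrative C++ sketch of such a check, not hunspell's actual code; the function name first_malformed_utf8 is made up for this example. It shows one way a "missing continuation byte" condition can arise (for instance, when Latin-1 bytes are read as UTF-8), without claiming that this is the cause of the failure reported here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of hunspell: returns the byte index of the
// first malformed UTF-8 sequence in s, or -1 if none is found.
// Only the lead/continuation structure is checked; overlong forms and
// surrogates are not detected.
long first_malformed_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t need = 0;                    // continuation bytes expected
        if (c < 0x80) need = 0;                  // ASCII
        else if ((c & 0xE0) == 0xC0) need = 1;   // 110xxxxx
        else if ((c & 0xF0) == 0xE0) need = 2;   // 1110xxxx
        else if ((c & 0xF8) == 0xF0) need = 3;   // 11110xxx
        else return static_cast<long>(i);        // stray continuation or invalid lead
        for (std::size_t k = 1; k <= need; ++k) {
            // A continuation byte is missing if the input ends early or the
            // next byte does not look like 10xxxxxx.
            if (i + k >= s.size() ||
                (static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return static_cast<long>(i);
        }
        i += need + 1;
    }
    return -1;
}
```

With this sketch, "noël" encoded as UTF-8 (ë = 0xC3 0xAB) passes, while the same word encoded as Latin-1 (ë = 0xEB) is flagged at byte 2, because 0xEB looks like the lead byte of a three-byte sequence and the following "l" is not a continuation byte.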
Using "enter debugger on error" on the text file, the following appears in *Backtrace* (with byte code removed):
Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  ispell-parse-output(#("ël!" 0 3 (charset iso-8859-1)) nil 0)
  ispell-process-line("^Titre: noël!\n" nil)
  byte-code("....310\311!\210)\312\313....")
  ispell-region(1 38)
  ispell-buffer()
  call-interactively(ispell-buffer nil nil)
--- Additional comment from Akira TAGOH on 2012-06-29 07:00:33 EDT ---
Although ispell.el has code to find out which spell checker program is in use, it doesn't update ispell-dictionary-base-alist according to the result. It should be adjusted to take the result into account.
Here is what my .emacs contains; I only want the English spell checker:
(setq ispell-dictionary-base-alist
      '((nil "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-d" "en_US") nil utf-8)))
;; Quote the form so that it is evaluated only once ispell has been loaded.
(eval-after-load "ispell"
  '(progn
     (setq ispell-extra-args '("-a" "-i" "utf-8")
           ispell-silently-savep t)))
It works well here.
--- Additional comment from Fedora End Of Life on 2013-01-16 09:39:38 EST ---
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
--- Additional comment from Fedora End Of Life on 2013-02-13 10:55:37 EST ---
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.
--- Comment #1 from igor.redhat@gmail.com igor.redhat@gmail.com --- This still happens on F18 with emacs-24.2-6.fc18.x86_64
Fedora Admin XMLRPC Client fedora-admin-xmlrpc@redhat.com changed:
What    |Removed          |Added
----------------------------------------------------------------------------
Assignee|kklic@redhat.com |phracek@redhat.com
--- Comment #2 from Fedora Admin XMLRPC Client fedora-admin-xmlrpc@redhat.com --- This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
Petr Hracek phracek@redhat.com changed:
What  |Removed |Added
----------------------------------------------------------------------------
Status|NEW     |ASSIGNED
--- Comment #3 from Petr Hracek phracek@redhat.com ---
I have found that a similar problem has been solved upstream:
http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-04/msg00330.html
--- Comment #4 from Petr Hracek phracek@redhat.com ---
This seems to be a hunspell bug.
Proposed patches are:
http://sourceforge.net/p/hunspell/patches/57/
http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-04/msg00341.html
I will reassign this to the hunspell package.
Petr Hracek phracek@redhat.com changed:
What     |Removed            |Added
----------------------------------------------------------------------------
CC       |                   |caolanm@redhat.com
Component|emacs              |hunspell
Assignee |phracek@redhat.com |caolanm@redhat.com
--- Comment #5 from Caolan McNamara caolanm@redhat.com --- poked nemeth to have a look
--- Comment #6 from Fedora End Of Life endoflife@fedoraproject.org --- This message is a reminder that Fedora 18 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 18. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '18'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 18's end of life.
Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 18 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 18's end of life.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Jens Petersen petersen@redhat.com changed:
What   |Removed |Added
----------------------------------------------------------------------------
Version|18      |19
--- Comment #7 from Jens Petersen petersen@redhat.com --- Moving to F19 - I assume it is still affected.
Maciek Borzecki maciek.borzecki@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |maciek.borzecki@gmail.com
--- Comment #8 from Maciek Borzecki maciek.borzecki@gmail.com ---
Can we add one of the two patches? The bug itself is more than 2 years old and upstream does not seem interested in fixing it, while it does hurt users (such as emacs users :)).
I know that Fedora's policy is not to bundle patches, but in this case we'll keep on having a broken hunspell in repositories.
I can help and update the spec and bundle the patches if this is going to speed up the process.
--- Comment #9 from igor.redhat@gmail.com igor.redhat@gmail.com ---
It'd be really awesome to get the patch in; as it stands, emacs' spell check is not operational for languages other than English. BTW, this issue still exists in F20.
Giuseppe Castagna gc@pps.univ-paris-diderot.fr changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |gc@pps.univ-paris-diderot.fr
--- Comment #10 from Giuseppe Castagna gc@pps.univ-paris-diderot.fr ---
The bug is still in the rawhide version (unsurprisingly, since the last update was in October 2013).
As Petr Hracek suggested, it suffices to apply a very simple patch: it takes 5 minutes and it works like a charm.
For those who cannot wait and want to use emacs for languages other than English, here is how to do it.
1. download the hunspell src.rpm, for instance the one below (or whichever version you find there)
http://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/source/SRPM...
2. install it as root
rpm -ivh hunspell-1.3.2-15.fc21.src.rpm
3. save the following text in the file /root/rpmbuild/SOURCES/hunspell.emacs.patch
--- src/tools/hunspell.cxx~0 2011-01-21 19:01:29.000000000 +0200
+++ src/tools/hunspell.cxx 2013-02-07 10:11:54.443610900 +0200
@@ -710,13 +748,22 @@ if (pos >= 0) {
             fflush(stdout);
         } else {
             char ** wlst = NULL;
-            int ns = pMS[d]->suggest(&wlst, token);
+            int byte_offset = parser->get_tokenpos() + pos;
+            int char_offset = 0;
+            if (strcmp(io_enc, "UTF-8") == 0) {
+                for (int i = 0; i < byte_offset; i++) {
+                    if ((buf[i] & 0xc0) != 0x80)
+                        char_offset++;
+                }
+            } else {
+                char_offset = byte_offset;
+            }
+            int ns = pMS[d]->suggest(&wlst, chenc(token, io_enc, dic_enc[d]));
             if (ns == 0) {
-                fprintf(stdout,"# %s %d", token,
-                    parser->get_tokenpos() + pos);
+                fprintf(stdout,"# %s %d", token, char_offset);
             } else {
                 fprintf(stdout,"& %s %d %d: ", token, ns,
-                    parser->get_tokenpos() + pos);
+                    char_offset);
                 fprintf(stdout,"%s", chenc(wlst[0], dic_enc[d], io_enc));
             }
             for (int j = 1; j < ns; j++) {
@@ -745,13 +792,23 @@ if (pos >= 0) {
             if (root) free(root);
         } else {
             char ** wlst = NULL;
+            int byte_offset = parser->get_tokenpos() + pos;
+            int char_offset = 0;
+            if (strcmp(io_enc, "UTF-8") == 0) {
+                for (int i = 0; i < byte_offset; i++) {
+                    if ((buf[i] & 0xc0) != 0x80)
+                        char_offset++;
+                }
+            } else {
+                char_offset = byte_offset;
+            }
             int ns = pMS[d]->suggest(&wlst, chenc(token, io_enc, dic_enc[d]));
             if (ns == 0) {
                 fprintf(stdout,"# %s %d", chenc(token, io_enc, ui_enc),
-                    parser->get_tokenpos() + pos);
+                    char_offset);
             } else {
                 fprintf(stdout,"& %s %d %d: ", chenc(token, io_enc, ui_enc), ns,
-                    parser->get_tokenpos() + pos);
+                    char_offset);
                 fprintf(stdout,"%s", chenc(wlst[0], dic_enc[d], ui_enc));
             }
             for (int j = 1; j < ns; j++) {
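The core of the patch is the conversion of the byte offset reported by the parser into a character offset when the I/O encoding is UTF-8. As an illustration only, here is a standalone C++ sketch of that counting loop. The function name utf8_char_offset and the sample string are made up for this example and are not part of hunspell; the bounds check against the buffer size is an addition of the sketch.

```cpp
#include <cstddef>
#include <string>

// Sketch of the counting loop from the patch: returns the number of UTF-8
// characters that start before byte_offset. Every byte that is not a
// continuation byte (10xxxxxx) starts a new character.
std::size_t utf8_char_offset(const std::string& buf, std::size_t byte_offset) {
    std::size_t char_offset = 0;
    for (std::size_t i = 0; i < byte_offset && i < buf.size(); ++i) {
        if ((static_cast<unsigned char>(buf[i]) & 0xC0) != 0x80)
            ++char_offset;
    }
    return char_offset;
}
```

For the made-up line "déjà noël" in UTF-8, the word "noël" starts at byte 7 but at character 5, because "é" and "à" each occupy two bytes. A caller that counts characters, as Emacs does, needs the second number.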
4. edit the file /root/rpmbuild/SPECS/hunspell.spec
- Add on line 25
  Patch5: hunspell.emacs.patch
- Add on line 48
  %patch5 -p0 -b .emacs
5. rebuild the rpm files
rpmbuild -ba SPECS/hunspell.spec
6. Install the rpm files you now have in /root/rpmbuild/RPMS/<arch>
Enjoy!
--- Comment #11 from Giuseppe Castagna gc@pps.univ-paris-diderot.fr --- Actually, be careful that bugzilla added nasty line breaks to the patch file. Better take hunspell.emacs.patch from here
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=7781#31
--- Comment #12 from Maciek Borzecki maciek.borzecki@gmail.com ---
(In reply to Giuseppe Castagna from comment #11)
> Actually, be careful that bugzilla added nasty line breaks to the patch
> file. Better take hunspell.emacs.patch from here
I did a koji build some time ago: http://koji.fedoraproject.org/koji/taskinfo?taskID=6861381
Although the patch was applied and spell checking seemed to work when I piped a file in the terminal, Emacs did not work as expected. Feel free to try the build; if you do, please report back whether it works for you. The src.rpm is included there in case you want to rebuild.
--- Comment #13 from Maciek Borzecki maciek.borzecki@gmail.com ---
Created attachment 899034
  --> https://bugzilla.redhat.com/attachment.cgi?id=899034&action=edit
0001-Resolves-rhbz-915448-UTF-8-handling.patch
fedpkg patch
--- Comment #14 from Giuseppe Castagna gc@pps.univ-paris-diderot.fr ---
(In reply to Maciek Borzecki from comment #12)
> Feel free to try the build, if you do so, please report back if it works
> for you. src.rpm is included there in case you want to rebuild
Thank you a lot. I've just installed it on the 3 machines I use most and I am testing it. BTW, I have extensively used the solution I suggested in my post and it worked flawlessly for me.
Caolan McNamara caolanm@redhat.com changed:
What            |Removed  |Added
----------------------------------------------------------------------------
Status          |ASSIGNED |CLOSED
Fixed In Version|         |hunspell-1.3.3-4.fc22
Resolution      |---      |RAWHIDE
Last Closed     |         |2014-10-16 10:56:30
--- Comment #15 from Caolan McNamara caolanm@redhat.com --- integrated upstream
i18n-bugs@lists.fedoraproject.org