pdftohtml encoding question
Andras Simon
szajmi at gmail.com
Tue Mar 11 12:40:21 UTC 2008
On 3/10/08, François Patte <francois.patte at math-info.univ-paris5.fr> wrote:
>
> Good evening,
>
> I am trying to convert a pdf file into html using pdftohtml provided by f8.
>
> I get an html file with "nice" characters like: ’ instead of an apostrophe,
> or Ã© instead of é...
>
> so I think that there is some encoding problem.
>
> Using man pdftohtml, I got this info:
> -enc <string>
>     output text encoding name
>
>
> but I am unable to guess the syntax needed to get correct output in
> utf8, because:
>
> Error: Couldn't find unicodeMap file for the 'utf8' encoding
>
> is the only answer I get if I try:
>
> pdftohtml -enc utf8 myfile.pdf
>
>
> I tried utf-8, latin1, latin-1, ISO_8859-1, ... without any success.
>
>
> If somebody knows... many thanks in advance.
I don't, but
man pdftohtml
-> Pdftohtml was developed by Gueorgui Ovtcharov and Rainer Dorsch. It is
based and benefits a lot from Derek Noonburg's xpdf package.
man xpdf
-> -enc encoding-name
Sets the encoding to use for text output. The encoding-name
must be defined with the unicodeMap command (see xpdfrc(5)).
This defaults to "Latin1" (which is a built-in encoding). [config
file: textEncoding]
man xpdfrc
-> unicodeMap encoding-name map-file
[...]
The Latin1, ASCII7, Symbol, ZapfDingbats, UTF-8, and
UCS-2 encodings are predefined.
I'm afraid you'll have to read the elided part if you need an encoding
other than these six.
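Going by that list, the name to try is probably "UTF-8", spelled exactly as in the xpdfrc man page, rather than "utf8" or "utf-8". The sketch below is untested and assumes that pdftohtml resolves -enc names the same way xpdf does and that the match is exact; myfile.pdf is the placeholder name from the question.

```shell
# Assumption: pdftohtml shares xpdf's predefined encoding names, and the
# name is matched exactly as written in xpdfrc(5): "UTF-8".
enc="UTF-8"
pdf="myfile.pdf"   # placeholder file name taken from the question

if command -v pdftohtml >/dev/null 2>&1 && [ -f "$pdf" ]; then
    # Tool and input file are both present: run the conversion.
    pdftohtml -enc "$enc" "$pdf"
else
    # Otherwise just show the command that would be run.
    echo "would run: pdftohtml -enc $enc $pdf"
fi
```

If that still reports a missing unicodeMap file, the remaining option is the unicodeMap directive described in xpdfrc(5).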
Hope this helps,
Andras