Convert PDF to Text?
Keith G. Robertson-Turner
fedora-gmane.00003 at genesis-x.nildram.co.uk
Sun Apr 22 00:33:32 UTC 2007
Verily I say unto thee, that bdk at unb.ca spake thusly:
> I think pdftohtml is part of
>
> poppler-utils
Got it, thanks.
However, now there's another problem: pdftohtml doesn't really work on
these files. All it produces is "empty" HTML files; they are well-formed
HTML (head, body, etc.) but the actual content is missing.
IOW it looks like it can only work if the content of the PDF really is
text, and not a scanned image of text.
Selecting and copying the text definitely works in Evince; I just wish
there were a way to automate it with a batch script, rather than me
having to copy and paste the text out of 2000 documents.
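
A minimal sketch of what such a batch run could look like, assuming the
pdftotext tool from poppler-utils is installed and on the PATH. It has
not been tried on these files and may come back as empty as pdftohtml
did. The function names and the "needs OCR" list are hypothetical; the
idea is only to convert whatever has extractable text and flag the rest
for a separate OCR pass.

```python
import subprocess
from pathlib import Path


def extract_text(pdf_path, runner=subprocess.run):
    """Run pdftotext on one file; "-" sends the text to stdout."""
    result = runner(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def has_text_layer(text):
    """True if the output holds anything besides whitespace and form feeds."""
    return bool(text.strip())


def convert_all(pdf_dir, runner=subprocess.run):
    """Write a .txt next to each PDF that yields text.

    Returns the PDFs that produced no text (assumed here to be
    image-only scans that would need OCR instead).
    """
    needs_ocr = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        text = extract_text(pdf, runner)
        if has_text_layer(text):
            pdf.with_suffix(".txt").write_text(text)
        else:
            needs_ocr.append(pdf)
    return needs_ocr
```

Calling convert_all("some/directory") would then return the list of
files that still need another approach. The runner parameter is there
only so the loop can be exercised without pdftotext present.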
Here's the original PDF file:
http://antitrust.slated.org/www.iowaconsumercase.org/011607/0000/PX00111.pdf
And here's a video of Evince "OCRing" the text from the image:
http://media.slated.org/albums/userpics/Evince_podit.mp4 (H264 MP4)
Download the PDF and try it yourself.
It's bizarre. Surely there's a way to automate this?
TIA.
--
K.
http://slated.org
.----
| I found [Vista] to be a dangerously unstable operating system,
| which has caused me to lose data ... unfortunately this product
| is unfit for any user. - [H]ardOCP, <http://tinyurl.com/3bpfs2>
`----
Fedora Core release 5 (Bordeaux) on sky, running kernel 2.6.20-1.2312.fc5
01:31:48 up 4 days, 23:03, 3 users, load average: 0.57, 0.52, 0.54