PDF to text?

Cameron Simpson cs at zip.com.au
Fri Aug 12 22:25:44 UTC 2011


On 12Aug2011 12:09, Bob Goodwin <bobgoodwin at wildblue.net> wrote:
| On 12/08/11 12:04, Genes MailLists wrote:
| > On 08/12/2011 11:58 AM, Bob Goodwin wrote:
| >> On 12/08/11 11:22, Genes MailLists wrote:
| >>> On 08/12/2011 11:16 AM, Madhav Ancha wrote:
| >>>     You could try this fedora app:  pdftotext
| >>>
| >>          As can be seen I tried several combinations, thought perhaps it
| >>          couldn't handle the file name in quotes "Couier  etc" but nothing
| >>          seems to do it?
| >>
| >    Is it possible the PDF contains an image of the text rather than text
| > itself ?
| 
|         I'm not sure, how would I tell? It's an attachment to an html
|         cover letter. The Fedora default app displays it with no
|         complaints.

Is it ridiculously large for the amount of text? Does it seem to have
scanner artifacts in the text - "graininess" if you peer closely, fuzzy
text instead of perfectly formed letters (i.e. a picture of text instead
of text rendered by your computer from a font)?
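A quicker mechanical check than eyeballing: if the PDF has no text layer, pdftotext has nothing to extract and produces only whitespace and page breaks. Below is a minimal Python sketch, assuming poppler's pdftotext is on your PATH (it writes to stdout when the output file is "-"). The function names and the 20-character threshold are hypothetical choices for illustration, not a standard test.

```python
import subprocess


def extracted_text(pdf_path: str) -> str:
    """Return whatever text layer pdftotext finds in pdf_path ('-' means stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def looks_like_scan(text: str, min_chars: int = 20) -> bool:
    """Heuristic with an assumed threshold: almost no non-whitespace characters
    (pdftotext emits form feeds between pages) suggests pictures of text."""
    visible = [c for c in text if not c.isspace()]
    return len(visible) < min_chars


# Usage (needs a real file): looks_like_scan(extracted_text("letter.pdf"))
```

A "True" here only says the text layer is empty or nearly so; OCR is then the only way to get text out.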

Personally I use pdftohtml to convert PDFs (then an HTML-to-text
pipeline on the end of that). Possibly pdftotext does exactly that
anyway. Of course it achieves nothing for me if the PDF is a scan.
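For the pdftohtml route, here is a rough Python sketch of such a pipeline, assuming poppler's pdftohtml with its -i (ignore images), -noframes and -stdout options, and using the standard library's html.parser for the HTML-to-text stage. The helper names are hypothetical; this is not the author's actual pipeline.

```python
import subprocess
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect character data, skipping <title>/<script>/<style> contents,
    and start a new line at <br>, <p> and <div>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; etc. arrive decoded
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "script", "style"):
            self._skip += 1
        elif tag in ("br", "p", "div"):
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("title", "script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags, collapse runs of whitespace, and drop empty lines."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def pdf_to_text_via_html(pdf_path: str) -> str:
    """Run pdftohtml to stdout, then convert its HTML to plain text."""
    html = subprocess.run(
        ["pdftohtml", "-i", "-noframes", "-stdout", pdf_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return html_to_text(html)
```

As with pdftotext, this recovers nothing from a scanned PDF: with -i the page images are dropped and there is no text left to convert.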

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Many companies are just now realizing that testing for the year 2000
problems will likely be more time-consuming and expensive than the
fix-it phase.   - Bob Evans, Information Week, September 1997

