PDF to text?
Cameron Simpson
cs at zip.com.au
Fri Aug 12 22:25:44 UTC 2011
On 12Aug2011 12:09, Bob Goodwin <bobgoodwin at wildblue.net> wrote:
| On 12/08/11 12:04, Genes MailLists wrote:
| > On 08/12/2011 11:58 AM, Bob Goodwin wrote:
| >> On 12/08/11 11:22, Genes MailLists wrote:
| >>> On 08/12/2011 11:16 AM, Madhav Ancha wrote:
| >>> You could try this fedora app: pdftotext
| >>>
| >> As can be seen I tried several combinations, thought perhaps it
| >> couldn't handle the file name in quotes "Couier etc" but nothing
| >> seems to do it?
| >>
| > Is it possible the PDF contains an image of the text rather than text
| > itself ?
|
| I'm not sure, how would I tell? It's an attachment to an html
| cover letter. The Fedora default app displays it with no
| complaints.
Is it ridiculously large for the amount of text? Does it seem to have
scanner artifacts in the text - "graininess" if you peer closely, fuzzy
text instead of perfectly formed letters (i.e. a picture of text instead
of text rendered by your computer from a font)?
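A rough sketch of the "too large for the amount of text" check, in Python. It assumes pdftotext is installed and on your PATH; the function names and the bytes-per-character threshold are made up for illustration, not calibrated against anything.

```python
import os
import subprocess


def extract_text(pdf_path):
    """Run pdftotext and return what it extracts ("-" sends output to stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def looks_like_scan(size_bytes, text, bytes_per_char_limit=2000):
    """Guess whether a PDF is a picture of text rather than text.

    A scan yields little or no extractable text for its file size.
    bytes_per_char_limit is an arbitrary illustrative threshold (assumption).
    """
    chars = len("".join(text.split()))  # count non-whitespace characters only
    if chars == 0:
        return True
    return size_bytes / chars > bytes_per_char_limit


def check_pdf(pdf_path):
    """Combine the two steps above for a file on disk."""
    return looks_like_scan(os.path.getsize(pdf_path), extract_text(pdf_path))
```

Calling check_pdf("some.pdf") returns True when the file looks like a scan. It is only a heuristic: a text PDF with a few large embedded photos can trip it too, so peering at the page is still worth doing.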
Personally I use pdftohtml to convert PDFs (then an HTML-to-text
pipeline on the end of that). Possibly pdftotext does exactly that
anyway. Of course neither achieves anything if the PDF is a scan,
since there is no text in it to extract.
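For what it's worth, the pdftohtml-then-HTML-to-text idea can be sketched in Python like this. It is not my actual pipeline, just an illustration: the class and function names are invented, and it assumes your pdftohtml (from poppler) accepts the -stdout, -noframes and -i options.

```python
import subprocess
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "br", "div", "tr", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # start block-level elements on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags, collapse runs of whitespace, and drop empty lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def pdf_to_text_via_html(pdf_path):
    """Convert a PDF to HTML on stdout, then reduce that HTML to plain text."""
    html = subprocess.run(
        # -stdout: write to stdout; -noframes: single document; -i: ignore images
        ["pdftohtml", "-stdout", "-noframes", "-i", pdf_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return html_to_text(html)
```

If pdf_to_text_via_html() comes back empty for a PDF that displays fine in a viewer, that again points at a scanned image.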
Cheers,
--
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
Many companies are just now realizing that testing for the year 2000
problems will likely be more time-consuming and expensive than the
fix-it phase. - Bob Evans, Information Week, September 1997