Convert PDF to Text?

Mon Apr 23 10:40:39 UTC 2007

On Mon, Apr 23, 2007 at 06:38:01AM +0100, Keith G. Robertson-Turner wrote:
> Verily I say unto thee, that Akemi Yagi spake thusly:
> > On Sun, 22 Apr 2007 01:33:32 +0100, Keith G. Robertson-Turner wrote:
> > 
> >> All it produces is "empty" html files, that is - they are proper html
> >> (head, body, etc.) but the actual content is not there.
> >>
> >> IOW it looks like it can only work if the content of the PDF really is
> >> text, and not a scanned image of text.
> > 
> > This might be of help:
> > 
> > http://www.groklaw.net/article.php?story=20061210115516438
> 
> Thanks for the link. Looks good.
> 

I must point out that the OCR output from a scanned PDF will certainly need
a fair amount of manual cleanup. While tesseract is pretty good, it is far
from perfect.
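
To give an idea of what that cleanup can look like, here is a minimal sketch
in Python. It assumes you already have the raw OCR text as a string (for
example, read from the text file the OCR step wrote out). The function name
clean_ocr_text and the particular rules are my own illustration, not anything
tesseract provides, and they only cover mechanical problems (broken lines,
stray whitespace, page breaks). Misrecognized characters still have to be
fixed by eye.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Apply a few conservative, mechanical cleanups to raw OCR output.

    Hypothetical helper for illustration only; it does not correct
    misrecognized characters.
    """
    # Normalize line endings and treat form feeds (page breaks) as
    # paragraph breaks.
    text = raw.replace("\r\n", "\n").replace("\f", "\n\n")

    # Rejoin words hyphenated across a line break: "conver-\nsion" becomes
    # "conversion". Caveat: this also merges genuine compounds that happened
    # to wrap at the hyphen ("well-\nknown" becomes "wellknown").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Split into paragraphs on blank lines, then unwrap each paragraph.
    cleaned = []
    for para in re.split(r"\n\s*\n", text):
        # Collapse any run of whitespace, including single newlines,
        # into one space.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)

    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "This is a conver-\nsion  test.\n\n\nSecond   para\ngraph.\f"
    print(clean_ocr_text(sample))
```

Rules like these are deliberately conservative; anything more aggressive
(spell-check based correction, for instance) risks quietly changing text
that was actually recognized correctly.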

-- 
-------------------------------------------------------------------------------
 .----    Fred Smith   /              
( /__  ,__.   __   __ /  __   : /     
 /    /  /   /__) /  /  /__) .+'           Home: fredex at fcshome.stoneham.ma.us 
/    /  (__ (___ (__(_ (___ / :__                                 781-438-5471 
-------------------------------- Jude 1:24,25 ---------------------------------