A question on OCR for a bad old document?

Jim mickeyboa at sbcglobal.net
Sun Jun 6 23:25:24 UTC 2010


On 06/06/2010 05:19 PM, Frank Cox wrote:
> On Sun, 2010-06-06 at 22:01 +0100, mike cloaked wrote:
>    
>> I have a scanned pdf of a very old document which was typewritten
>> about half a century ago. The scanned copy is noisy and the letters
>> are far from clear. The text can be made out (mostly) by eye, but it
>> is 19 pages long and I would like to OCR it to get a digitised text to
>> save the eye strain and lots of typing.
>>      
> You can't make a silk purse out of a sow's ear.
>
> If you are having difficulty reading the scan yourself, then you're
> probably out of luck getting the computer to OCR it for you.
>
> Your best bet is to retype it.  It's only 19 pages so it shouldn't take
> too long to type it again.  You'll spend far more time fiddling around
> (unsuccessfully) with OCR stuff than it will take to retype it anyway.
>    
Scanning a text document will not save properly as text in XSane on
Linux, even if you use gocr. Scanning and saving as text is broken.

As for how the text looks on your terminal after scanning, it always
looks bad. You have to use "Save As" to get a good finished product,
and again, saving as text is broken in XSane. Only images turn out
after saving.
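If XSane can only be relied on to save images, one workaround is to save each page as an image and run gocr on it as a separate step. Below is a minimal sketch in Python. It assumes the pages were saved as PNM files with hypothetical names such as page01.pnm, and that your gocr build accepts -i (input image) and -o (output text file). It does nothing to clean up a noisy scan, so the caveat in the quoted message still applies.

```python
import subprocess
from pathlib import Path


def build_gocr_cmd(image_path, text_path):
    """Build a gocr command line that reads one saved page image
    and writes the recognized text to a separate file."""
    # Assumed gocr options: -i names the input image, -o the output text file.
    return ["gocr", "-i", str(image_path), "-o", str(text_path)]


def ocr_pages(image_paths, run=subprocess.run):
    """Run gocr once per page image and return the text file paths.

    `run` defaults to subprocess.run; it is a parameter only so the
    command construction can be checked without gocr installed.
    """
    text_paths = []
    for image_path in image_paths:
        # Hypothetical naming: page01.pnm -> page01.txt next to the image.
        text_path = Path(image_path).with_suffix(".txt")
        # check=True raises CalledProcessError if gocr exits with an error.
        run(build_gocr_cmd(image_path, text_path), check=True)
        text_paths.append(text_path)
    return text_paths
```

For a 19-page document, a call such as ocr_pages([f"page{n:02d}.pnm" for n in range(1, 20)]) would write page01.txt through page19.txt, which you would then still need to proofread by hand.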

