I did gOCR with screenshots, if helpful

Rodolfo Alcazar Portillo nospaze at gmail.com
Mon Jun 7 09:45:00 UTC 2010


On Sunday, 06.06.2010, at 22:01 +0100, mike cloaked wrote:
> Does anyone have any guidance or a url to point me to that may help
> with turning that scanned old document into something sensible as a
> character file within Fedora ?

Well, I had a rather bad experience. I tried saving my Firefox passwords,
but Firefox doesn't have any form for exporting stored passwords
(Edit/Configuration/Security/Stored passwords; if anyone knows how to
export them, please tell me), so I ran OCR on my Firefox screenshots.

The best results I got were with gocr, thanks to its configuration
options. I ran tests to find the best settings:

1. Used DejaVu fonts on my screen.
2. Resized the image (bigger), so that uppercase letters were about
20 pixels high.
3. Made many attempts, playing with the certainty (-a) and space width
(-s) values:

for a in $(seq 95 100); do
  for s in $(seq 12 25); do
    gocr -s "$s" -a "$a" screen.jpg
    # print the parameter pair that produced the output above
    echo "s=$s a=$a"
    # this read command pauses so you can explore the text on screen,
    # note S and A, and continue by pressing ENTER
    read dummy
  done
done

4. Did my final recognition for all my saved screenshots with the best
combination of A and S; I can't remember the values.
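
A batch run like step 4 might look roughly like the sketch below. It is
only an illustration: S=18 and A=97 are placeholder values (the best
pair from the tests above is not known), and the function name ocr_all
and the one-directory layout of .jpg screenshots are assumptions. It
uses gocr's -i and -o options to name the input image and the output
text file.

```shell
#!/bin/sh
# Sketch: run gocr over every screenshot with one fixed parameter pair.
# Both values are placeholders, not the real best combination.
S=18   # -s: space width in pixels (assumed value)
A=97   # -a: certainty threshold, 0..100 (assumed value)

# ocr_all DIR: write DIR/name.txt for every DIR/name.jpg (hypothetical helper)
ocr_all() {
    dir=$1
    for img in "$dir"/*.jpg; do
        [ -e "$img" ] || continue      # the glob matched nothing
        out="${img%.jpg}.txt"          # same name, .txt extension
        gocr -s "$S" -a "$A" -i "$img" -o "$out" \
            || echo "gocr failed on $img" >&2
    done
}
```

Calling it as `ocr_all ~/screenshots` (a hypothetical path) would leave
one .txt file next to each image, ready for checking by hand.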

In the end, about 98% of the characters were recognized, which is bad
for passwords. Neither tesseract nor Ocrad gave good results for me. In
your case, maybe tesseract would be helpful. Also unpaper.

:)
----------------------------------------------
Rodolfo Alcazar Portillo - nospaze at gmail.com
otbits.blogspot.com / counter.li.org: #367962
----------------------------------------------
"Only five countries on Earth still have more inhabitants than Linus
Torvalds has friends."
- linux.de



