PS generated file -> pdftotext -layout and images

Gary Stainburn gary.stainburn at ringways.co.uk
Wed Oct 7 09:39:37 UTC 2015


I've not been able to get anywhere with pdftotext or PDF OCR software, so I'm 
now looking at OMR (optical mark recognition) using image comparison.

So far, I have:
- converted each page to PPM (pdftoppm)
- cropped each line of every page into a separate PNG file
- obtained two reference images for each field: one containing the label and 
  a tick, the other containing the label and a cross

Is there an easy way to detect if the label+tick PNG image is contained in the 
line PNG? (it doesn't have to be a PNG)
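
One possible approach, offered as a rough sketch rather than a tested solution, is 
sliding-window template matching: slide the reference image over the line image 
and score each position by the mean absolute pixel difference. The function names 
(find_template, classify_field), the threshold of 10.0, and the assumption that 
both images are already loaded as rows of 0-255 greyscale values are all 
illustrative and not part of the setup described above. Libraries such as OpenCV 
provide an optimised equivalent (cv2.matchTemplate); this pure-Python version 
only shows the idea and would be slow on full-size scans.

```python
def mean_abs_diff(image, template, top, left):
    """Mean absolute pixel difference between the template and the
    same-sized window of the image whose top-left corner is (top, left)."""
    total = 0
    for r, template_row in enumerate(template):
        image_row = image[top + r]
        for c, template_value in enumerate(template_row):
            total += abs(image_row[left + c] - template_value)
    return total / (len(template) * len(template[0]))


def find_template(image, template, max_diff=10.0):
    """Return (row, col, score) for the best-matching window, or None if the
    best score is worse than max_diff (an assumed, tunable threshold)."""
    image_h, image_w = len(image), len(image[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    if tmpl_h > image_h or tmpl_w > image_w:
        return None  # the template cannot fit inside the image
    best = None
    for top in range(image_h - tmpl_h + 1):
        for left in range(image_w - tmpl_w + 1):
            score = mean_abs_diff(image, template, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    return best if best[2] <= max_diff else None


def classify_field(line_img, tick_img, cross_img, max_diff=10.0):
    """Return True for a tick, False for a cross, None if neither is found."""
    tick = find_template(line_img, tick_img, max_diff)
    cross = find_template(line_img, cross_img, max_diff)
    if tick and (not cross or tick[2] <= cross[2]):
        return True
    if cross:
        return False
    return None
```

Because the line images and the reference images come from the same rendering 
pipeline, a near-exact match seems plausible, which is why a small threshold is 
assumed here; it would need tuning against real pages.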

On Tuesday 06 October 2015 09:44:50 Gary Stainburn wrote:
> Hi folks,
>
> Thanks to help from here I've managed to get pseudo printers working with
> CUPS to allow Windows PCs to print a document which then gets converted to
> PDF, has stationery applied to it, and the result emailed back to the
> requesting user.
>
> I'm now on phase 2 of the project which uses
>
> pseudo printer -> ps2pdf -> pdftotext -layout
>
> to generate a text file which is then used to import data into my systems.
> I use this method because the system that generates the report doesn't have
> an export facility.
>
> My only problem with this is that the generating system uses images of a
> tick and a cross to indicate some boolean values.
>
> Can anyone suggest how I can convert these ticks and crosses back to their
> boolean value?
