If your PDF files contain actual characters (a real text layer rather than scanned page images), the pdftotext tool works well for extracting the text, though not necessarily the layout. As for OCR on actual image files, I always found tesseract to work better than most, but it was still pretty feeble.
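
For what it's worth, here is a rough sketch of how the two tools could be driven from a script. It assumes the pdftotext and tesseract command-line programs are installed and on your PATH; the helper names (`pdftotext_cmd`, `tesseract_cmd`, `run_tool`) and the file names are made up for illustration. The `-layout` flag asks pdftotext to try to preserve the page layout, which may or may not help depending on the document.

```python
import shutil
import subprocess


def pdftotext_cmd(pdf_path, keep_layout=False):
    """Build a pdftotext command that writes the extracted text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # ask pdftotext to approximate the page layout
    cmd += [pdf_path, "-"]  # "-" as the output file means stdout
    return cmd


def tesseract_cmd(image_path, lang="eng"):
    """Build a tesseract command that writes the recognized text to stdout."""
    # "stdout" as the output base tells tesseract to print instead of
    # writing a .txt file; -l selects the language model.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_tool(cmd):
    """Run the command and return its stdout, or None if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical input files; substitute your own.
    text = run_tool(pdftotext_cmd("document.pdf", keep_layout=True))
    ocr = run_tool(tesseract_cmd("scan.png"))
```

Use the first helper for PDFs that already have a text layer and the second for scanned images; `run_tool` just returns `None` if the program isn't installed rather than blowing up.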