Re: A simple question on copy/paste

Thursday, 1 June 2006

Stephen Liu wrote:
...
 All documents on Internet printed as .ps files and/or later converted
 to .pdf have problem.  Disregarding they can be retrieve and read,
 their text can't be highlighted and copy/paste.  I don't know what
 mistake committed.  Any suggestion? 
Text can be stored in a PDF as a series of numbers (e.g. 65 is A) plus
font information, or as a picture of the text. Exactly how this happens
depends on how the PDF was created (and in this case, how the PostScript
version was created in the first place -- often there'll be an option
somewhere in whatever created them to create bitmaps or include font
information).
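
As a rough illustration of the two storage forms, here is a small Python
sketch. It is not how any real PDF or PostScript tool works internally;
the tiny GLYPHS font and the function names as_codes and as_picture are
made up for the example.

```python
# Hypothetical 3x5 bitmap font covering only the letters used below.
GLYPHS = {
    "A": ["010", "101", "111", "101", "101"],
    "B": ["110", "101", "110", "101", "110"],
}

def as_codes(text):
    """Store text as character codes; 'A' becomes 65."""
    return [ord(ch) for ch in text]

def as_picture(text):
    """Store text as rows of pixels; the letters themselves are gone."""
    rows = []
    for r in range(5):
        rows.append(" ".join(GLYPHS[ch][r] for ch in text))
    return rows

codes = as_codes("AB")
print(codes)                            # the stored character codes
print("".join(chr(c) for c in codes))   # codes turn straight back into text
for row in as_picture("AB"):
    print(row.replace("1", "#").replace("0", "."))
```

The list of codes can be selected, searched and copied because each
number still names a letter. The picture only records which dots are
dark, so a viewer has nothing to copy.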

Once it's turned into a picture, then there's no easy way to go back to
the text it was created from. This isn't a limitation of your program,
but of existing technology. There are "OCR" programs that can "read" the
text in the same way as you or I would -- they look at the shapes, and
try to recognise letters. But they aren't foolproof (or particularly
fast).
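
To give a feel for the "look at the shapes" idea, here is a toy Python
sketch that guesses a letter by comparing a scanned shape against known
glyphs, pixel by pixel. Real OCR programs are far more sophisticated;
the GLYPHS font and the score and recognise functions are invented for
this illustration only.

```python
# Hypothetical 3x5 bitmap font; real OCR copes with arbitrary fonts and sizes.
GLYPHS = {
    "A": ["010", "101", "111", "101", "101"],
    "B": ["110", "101", "110", "101", "110"],
    "C": ["011", "100", "100", "100", "011"],
}

def score(shape, glyph):
    """Count the pixels where a scanned shape agrees with a known glyph."""
    return sum(a == b
               for row_s, row_g in zip(shape, glyph)
               for a, b in zip(row_s, row_g))

def recognise(shape):
    """Guess the letter whose glyph agrees with the shape on the most pixels."""
    return max(GLYPHS, key=lambda letter: score(shape, GLYPHS[letter]))

clean_a = ["010", "101", "111", "101", "101"]
noisy_a = ["010", "101", "111", "101", "100"]   # one pixel lost in scanning
print(recognise(clean_a), recognise(noisy_a))
```

The result is only ever a best guess: every candidate glyph has to be
compared for every character on the page, and enough smudged or missing
pixels can tip the guess towards the wrong letter.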

Hope this helps,

James.

-- 
E-mail address: james | Helpful Advice from Thames Water:
@westexe.demon.co.uk  | "If you have difficulty reading this leaflet,
                      | please ask someone to help you."
                      |     -- Read on "The News Quiz", BBC Radio 4