Copying text from a protected pdf file
Paul Smith
phhs80 at gmail.com
Sat Sep 17 21:53:35 UTC 2005
On 9/16/05, George White <aa056 at chebucto.ns.ca> wrote:
> > I have got a pdf file, whose text I would like to copy to a word
> > processor. However, it seems to be protected, as when I copy and paste
> > a piece of text from there into a word processor, I only see garbage.
> > Is there some way of getting clean text from the pdf file?
>
> The PDF format has many ways to display text. To be able to extract text
> you need a file that stores strings and uses font information to render them
> in the viewer. You may be seeing images that were rasterized long ago.
> You should provide the output of the "pdffonts" command, preferably for a
> minimal document (a big document could combine sections that use fonts with
> images).
>
> For example, the simplest case is a document that uses the PostScript Type 1
> fonts provided by the viewer:
>
> $ pdffonts /usr/share/doc/cups-1.1.20/ssr.pdf
> name                                 type         emb sub uni object ID
> ------------------------------------ ------------ --- --- --- ---------
> Times-Roman                          Type 1       no  no  no       4  0
> Helvetica                            Type 1       no  no  no       7  0
> Helvetica-Bold                       Type 1       no  no  no       8  0
> Times-Bold                           Type 1       no  no  no       5  0
> Courier                              Type 1       no  no  no       3  0
> Symbol                               Type 1       no  no  no       9  0
> Times-Italic                         Type 1       no  no  no       6  0
>
>
> --
> George N. White III
> Head of St. Margarets Bay, Nova Scotia
Thanks, George. In my case,
$ pdffonts myfile.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
DTUUBE+TTBC19E318t00                 TrueType     yes yes no      13  0
URMVBE+TTBC18C910t00                 TrueType     yes yes no      16  0
TOYVBE+Symbol                        Type 1C      yes yes no      19  0
Helvetica                            Type 1C      yes no  no      22  0
CLLUBE+TTBC1802E0t00                 TrueType     yes yes no      34  0
Helvetica-Bold                       Type 1C      yes no  no      43  0
Helvetica-Oblique                    Type 1C      yes no  no      58  0
$
Paul
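
In the pdffonts listings above, the "sub" column reports whether a font is
an embedded subset and the "uni" column whether it carries an explicit
ToUnicode map. Subset fonts without such a map are a common reason for
copied text to come out as garbage, although a missing map does not by
itself prove the text cannot be recovered. The following is a minimal
Python sketch that flags such fonts. It assumes the six-column layout
shown in this thread and font names without spaces; the names
parse_pdffonts, suspect_fonts and SAMPLE are illustrative and not part of
pdffonts.

```python
from typing import List, NamedTuple


class FontRow(NamedTuple):
    name: str
    type: str
    emb: bool
    sub: bool
    uni: bool
    object_id: str


def parse_pdffonts(output: str) -> List[FontRow]:
    """Parse the six-column pdffonts listing shown in this thread."""
    rows = []
    for line in output.splitlines():
        # Drop any leading "> " quote markers, then split on whitespace.
        tokens = line.lstrip("> ").split()
        # A font row ends with three yes/no flags and a two-part object ID.
        # The header, the dashed rule, and shell prompts fail this check.
        if len(tokens) < 7 or any(t not in ("yes", "no") for t in tokens[-5:-2]):
            continue
        emb, sub, uni = (t == "yes" for t in tokens[-5:-2])
        rows.append(FontRow(
            name=tokens[0],
            type=" ".join(tokens[1:-5]),  # e.g. "Type 1C" spans two tokens
            emb=emb,
            sub=sub,
            uni=uni,
            object_id=" ".join(tokens[-2:]),
        ))
    return rows


def suspect_fonts(rows: List[FontRow]) -> List[FontRow]:
    """Return subset fonts that have no explicit ToUnicode map."""
    return [r for r in rows if r.sub and not r.uni]


# The listing posted for myfile.pdf, copied from this message.
SAMPLE = """\
name type emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
DTUUBE+TTBC19E318t00 TrueType yes yes no 13 0
URMVBE+TTBC18C910t00 TrueType yes yes no 16 0
TOYVBE+Symbol Type 1C yes yes no 19 0
Helvetica Type 1C yes no no 22 0
CLLUBE+TTBC1802E0t00 TrueType yes yes no 34 0
Helvetica-Bold Type 1C yes no no 43 0
Helvetica-Oblique Type 1C yes no no 58 0
"""

if __name__ == "__main__":
    for font in suspect_fonts(parse_pdffonts(SAMPLE)):
        print(f"{font.name} ({font.type}): subset font, no ToUnicode map")
```

To check a real file, the captured output of "pdffonts myfile.pdf" can be
passed to parse_pdffonts in place of SAMPLE. The check is only a
heuristic: a font it flags may still extract cleanly if it uses a standard
encoding.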