Saturday, February 23, 2008

Get Document Metadata

To create a plain-text report of PDF metadata, use pdftk's dump_data operation. It will also report PDF bookmarks and page labels, among other things. The command looks like this:

pdftk mydoc.pdf dump_data output mydoc.data.txt

Metadata will be represented as key/value pairs, like so:

InfoKey: Creator

InfoValue: Acrobat PDFMaker 6.0 for Word

InfoKey: Title

InfoValue: Brian Eno: His Music and the Vertical Color of Sound

InfoKey: Author

InfoValue: Eric Tamm

InfoKey: Producer

InfoValue: Acrobat Distiller 6.0.1 (Windows)

InfoKey: ModDate

InfoValue: D:20040420234132-07'00'

InfoKey: CreationDate

InfoValue: D:20040420234045-07'00'
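If you want to work with these pairs in a script, the report is easy to parse. What follows is only a rough sketch in Python, not part of pdftk: the names parse_info_pairs, parse_pdf_date and SAMPLE are made up for illustration, and the sample text is copied from the report above rather than read from a real file. It assumes each InfoKey line is followed by its InfoValue line, and that dates carry the full D:YYYYMMDDHHmmSS form with an explicit offset, as they do here.

```python
import re
from datetime import datetime, timedelta, timezone

# Sample lines copied from the dump_data report shown above.
SAMPLE = """\
InfoKey: Creator
InfoValue: Acrobat PDFMaker 6.0 for Word
InfoKey: Title
InfoValue: Brian Eno: His Music and the Vertical Color of Sound
InfoKey: ModDate
InfoValue: D:20040420234132-07'00'
"""


def parse_info_pairs(text):
    """Collect InfoKey/InfoValue pairs into a dict (hypothetical helper)."""
    info = {}
    key = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("InfoKey:"):
            key = line[len("InfoKey:"):].strip()
        elif line.startswith("InfoValue:") and key is not None:
            info[key] = line[len("InfoValue:"):].strip()
            key = None
    return info


# Matches only the full form with an explicit offset, e.g. D:20040420234132-07'00'
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})([+-])(\d{2})'(\d{2})'?"
)


def parse_pdf_date(value):
    """Return a timezone-aware datetime, or None if the string has another shape."""
    match = _PDF_DATE.fullmatch(value)
    if match is None:
        return None
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    sign = 1 if match.group(7) == "+" else -1
    offset = sign * timedelta(hours=int(match.group(8)), minutes=int(match.group(9)))
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


info = parse_info_pairs(SAMPLE)
mod_date = parse_pdf_date(info["ModDate"])
print(info["Title"])
print(mod_date.isoformat())
```

Lines that are neither InfoKey nor InfoValue (bookmarks, page labels, blank lines) are simply skipped, so the same function can be pointed at a whole mydoc.data.txt file.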

Another tool for reporting PDF metadata is pdfinfo, which is part of the Xpdf project (http://www.foolabs.com/xpdf/). In addition to metadata, it reports page sizes, page count, and PDF permissions. Running pdfinfo mydoc.pdf yields a report such as this:

Title: Brian Eno: His Music and the Vertical Color of Sound

Author: Eric Tamm

Creator: Acrobat PDFMaker 6.0 for Word

Producer: Acrobat Distiller 6.0.1 (Windows)

CreationDate: 04/20/04 23:40:45

ModDate: 04/22/04 14:39:30

Tagged: no

Pages: 216

Encrypted: no

Page size: 522 x 756 pts

File size: 1126904 bytes

Optimized: yes

PDF version: 1.4
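The pdfinfo report can be read into a script in much the same way. Here is a small Python sketch, offered as one possible approach rather than anything pdfinfo provides: parse_pdfinfo and SAMPLE_REPORT are hypothetical names, and the sample lines are copied from the report above. It splits each line at the first colon only, so a title that itself contains a colon stays intact, and it converts the page size from points to inches (a PDF point is 1/72 inch).

```python
# Sample lines copied from the pdfinfo report shown above.
SAMPLE_REPORT = """\
Title: Brian Eno: His Music and the Vertical Color of Sound
Author: Eric Tamm
Pages: 216
Encrypted: no
Page size: 522 x 756 pts
File size: 1126904 bytes
PDF version: 1.4
"""


def parse_pdfinfo(text):
    """Split each 'Key: value' line at the first colon (hypothetical helper)."""
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            report[key.strip()] = value.strip()
    return report


report = parse_pdfinfo(SAMPLE_REPORT)
page_count = int(report["Pages"])

# "522 x 756 pts" -> width and height in points, then in inches.
size_parts = report["Page size"].split()
width_pts, height_pts = float(size_parts[0]), float(size_parts[2])
width_in, height_in = width_pts / 72, height_pts / 72

print(report["Title"])
print(page_count, width_in, height_in)
```

This assumes the simple "width x height pts" form shown in the report; a page size written any other way would need extra handling.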

Use pdfinfo's command-line options to fine-tune its behavior. For example, its -meta option prints a PDF's XMP metadata stream.

2 comments:

PDF Security said...

Hi,

Users should be able to classify documents or items by applying metadata in bulk during file import and copy. The system indexes the metadata and content of each object as it is stored, keeps track of the locators for each piece of data, and provides precise hits showing where the query was found, enabling immediate click-through to the document, image, backup, or file. Thanks a lot.

digital signature software said...

Amazing! This is an interesting and very easy way of getting document metadata. You did a great job. I followed the instructions you gave and did my work easily. Thanks!