Saturday, February 23, 2008

Set Document Metadata

Pdftk can take a plain-text file of these same key/value pairs and update a PDF's Info dictionary to match. Currently, it does not update the PDF's XMP stream. The command would look like this:

pdftk mydoc.pdf update_info new_info.txt output mydoc.updated.pdf

This will add or modify the Info keys given by new_info.txt. Note that the output PDF filename must be different from the input. To remove a key/value pair, pass in the key with an empty value, like so:

InfoKey: MyDataKey
InfoValue:
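To make the two cases above concrete, here is a minimal sketch of a complete info file that sets one key and removes another. The title text is made up for illustration, and MyDataKey is the same placeholder key used above. The exact record layout may vary between pdftk versions, so it is worth comparing against the output of `pdftk mydoc.pdf dump_data` on your own installation before relying on it.

```shell
# Write an info file: one key to set, one key to remove.
# "My Updated Title" is a made-up value; MyDataKey is a placeholder custom key.
cat > new_info.txt <<'EOF'
InfoKey: Title
InfoValue: My Updated Title
InfoKey: MyDataKey
InfoValue:
EOF

# Apply the file only when pdftk and the input PDF are both present.
if command -v pdftk >/dev/null 2>&1 && [ -f mydoc.pdf ]; then
    pdftk mydoc.pdf update_info new_info.txt output mydoc.updated.pdf
fi
```

The guard around the last command is only there so the snippet can be pasted into a script without failing on machines where pdftk or mydoc.pdf is missing.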

Use pdftk to strip all Info and XMP metadata from a document by copying its pages into a new PDF, like so:

pdftk A=mydoc.pdf cat A output mydoc.no_metadata.pdf
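If you strip metadata often, the command can be wrapped in a small shell function. This is only a sketch: the function name strip_metadata is hypothetical, it declares the handle A explicitly (the form suggested in the comments below), and it refuses to run when the output name equals the input name, since pdftk needs the two to differ.

```shell
# strip_metadata INPUT OUTPUT
# Hypothetical helper: copies the pages of INPUT into a new PDF at OUTPUT.
strip_metadata() {
    in_pdf="$1"
    out_pdf="$2"
    # pdftk needs an output filename that differs from the input.
    if [ "$in_pdf" = "$out_pdf" ]; then
        echo "strip_metadata: output must differ from input" >&2
        return 1
    fi
    pdftk A="$in_pdf" cat A output "$out_pdf"
}

# Example call, run only when pdftk and the input PDF are both present.
if command -v pdftk >/dev/null 2>&1 && [ -f mydoc.pdf ]; then
    strip_metadata mydoc.pdf mydoc.no_metadata.pdf
fi
```

Running `pdftk mydoc.no_metadata.pdf dump_data` afterwards is one way to check which Info keys, if any, remain in the result.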

The PDF specification defines several Info fields. Be careful to use these only as described in the specification. They are Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate, and Trapped.
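One way to stay careful is to flag every spec-defined key in an info file before applying it, so each one can be checked against the specification by hand. The sketch below assumes the InfoKey/InfoValue layout shown earlier. Both function names are hypothetical, and the key list is simply the one given above.

```shell
# Succeeds if the given key is one of the Info fields named in the PDF specification.
is_standard_info_key() {
    case "$1" in
        Title|Author|Subject|Keywords|Creator|Producer|CreationDate|ModDate|Trapped)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# list_standard_keys FILE
# Prints each key in the info file that matches a spec-defined field.
list_standard_keys() {
    sed -n 's/^InfoKey: //p' "$1" | while IFS= read -r key; do
        if is_standard_info_key "$key"; then
            echo "$key"
        fi
    done
}

# Example: review new_info.txt before passing it to update_info.
if [ -f new_info.txt ]; then
    list_standard_keys new_info.txt
fi
```

Any key the second function prints is one whose value should follow the specification's rules. Keys it skips are custom and can hold whatever your workflow needs.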

3 comments:

Rights Management PDF said...

Hi,

Indexes the metadata and content of each object as it is stored, keeps track of the locators for each piece of data and provides precise hits as to where the query was found to enable immediate click through to the document. Thanks a lot.

electronic signature for sharepoint said...

I was familiar with PDF, but only as a consumer rather than a creator. Recently I needed to create PDFs myself, so I started learning about it. Thanks for helping!

Anonymous said...

This is incorrect

pdftk mydoc.pdf cat A output mydoc.no_metadata.pdf

The correct command is:

pdftk A=mydoc.pdf cat A output mydoc.no_metadata.pdf