Tuesday, March 18, 2008

How to create a PDF Table of Contents in HTML with pdftk and pdftoc?

First, download and install pdftk.
Pdftk can report on PDF data, including bookmarks. pdftoc converts this plain-text report into HTML. Visit http://www.pdfhacks.com/pdftoc/ and download pdftoc-1.0.zip. Unzip, and move pdftoc.exe to a convenient location, such as C:\Windows\system32\. On other platforms, build pdftoc from the source code.

Use pdftk to grab the bookmark data from your PDF, like so:

pdftk mydoc.pdf dump_data output mydoc_data.txt
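The report mixes bookmark records with other document metadata. As a rough sketch of how to look at just the bookmark part, the snippet below filters a report down to its bookmark lines. It assumes bookmark keys begin with "Bookmark" (BookmarkTitle, BookmarkLevel, BookmarkPageNumber); the file sample_data.txt and its contents are made up for illustration and stand in for a real mydoc_data.txt.

```shell
# Print only the bookmark records from a pdftk dump_data report.
# Assumption: bookmark keys start with "Bookmark" (e.g. BookmarkTitle).
list_bookmarks() {
  grep '^Bookmark' "$1"
}

# Hypothetical sample report, standing in for real pdftk output.
cat > sample_data.txt <<'EOF'
InfoKey: Title
InfoValue: My Document
NumberOfPages: 12
BookmarkTitle: Introduction
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkTitle: Background
BookmarkLevel: 2
BookmarkPageNumber: 3
EOF

list_bookmarks sample_data.txt
```

If this prints nothing for your own report, the PDF probably has no bookmarks, and the resulting table of contents will have nothing to list.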

Next, use pdftoc to convert this plain-text report into HTML:

pdftoc mydoc.pdf < mydoc_data.txt > mydoc_toc.html

Alternatively, you can run these two steps together, like so:

pdftk mydoc.pdf dump_data | pdftoc mydoc.pdf > mydoc_toc.html

The first argument to pdftoc is the document location that you want pdftoc to use in its hyperlinks. The previous example assumes that mydoc.pdf and mydoc_toc.html will be in the same directory. You can also give a relative path to your PDF, like so:

pdftoc ../pdf/mydoc.pdf < mydoc_data.txt > mydoc_toc.html

or a full URL:

pdftoc http://pdfhacks.com/pdf/mydoc.pdf < mydoc_data.txt > mydoc_toc.html
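To make the role of that first argument concrete, here is a minimal sketch of the idea, not pdftoc's actual implementation or output. The function name toc_sketch is hypothetical. It assumes the bookmark keys BookmarkTitle and BookmarkPageNumber, and it assumes the reader's PDF viewer honors the #page=N URL fragment. Titles are not HTML-escaped, and bookmark levels are ignored.

```shell
# Sketch only: turn bookmark records on stdin into simple HTML links.
# $1 is the document location to use in every hyperlink.
toc_sketch() {
  awk -v doc="$1" '
    # Remember the most recent title (text after "BookmarkTitle: ").
    /^BookmarkTitle: /      { title = substr($0, 16) }
    # Emit one link when the matching page number arrives.
    /^BookmarkPageNumber: / { printf "<a href=\"%s#page=%s\">%s</a><br>\n", doc, $2, title }
  '
}

# Example with one made-up bookmark and a relative document path.
printf 'BookmarkTitle: Intro\nBookmarkLevel: 1\nBookmarkPageNumber: 3\n' |
  toc_sketch ../pdf/mydoc.pdf
```

Whatever location you pass is copied into each link as-is, which is why a relative path must be correct relative to where the HTML file will live, not to where you ran the command.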

Once readers enter the PDF, they can use its bookmarks for further navigation. To ensure they see your bookmarks, set your PDF to display them upon opening.

You can also add a download link on the web page that prompts the user to save the PDF on her local disk. As a courtesy to the user, mention the download file size, too.
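One way to get a size figure to quote next to the download link is sketched below. The helper name size_kb is made up for this example; it counts bytes with wc and rounds up to whole kilobytes.

```shell
# Report a file's size in kilobytes, rounded up.
# Usage: size_kb mydoc.pdf
size_kb() {
  bytes=$(wc -c < "$1")
  echo "$(( (bytes + 1023) / 1024 )) KB"
}
```

For large documents you may prefer to quote megabytes instead; the same arithmetic applies with a larger divisor.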

2 comments:

Unknown said...

pdftk version 1.12 (the latest version as of the time of this post) will not open a PDF with version number 1.5 or higher (Acrobat 6.x or higher). That means this pdfhack is limited to PDFs with version number 1.4 or lower.

I tried this pdfhack on a PDF with version number 1.4 and it worked exactly as the author claims. Despite its limitations, this pdfhack is worth keeping in mind if one has an application that can lower the PDF version number (or if pdftk gets updated).

Unknown said...

A correction to my previous post...

Accesspdf.com is the first website returned in a Google search for pdftk, and I incorrectly assumed the latest version could be found there.

A newer version is available for Windows and works with all PDF versions: pdftk v1.14