Wednesday, April 30, 2008

Add New Acrobat Features with Startup JavaScripts

When Acrobat starts up, it runs any JavaScripts it finds in either the system-level JavaScripts folder or the user-level JavaScripts folder. The locations of these folders are given shortly. You might need to create some of these folders if you can't find them.

Use startup JavaScripts to add menu items to Acrobat or to set global JavaScript variables. Acrobat stores these persistent global variables in a file named glob.js.
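As a rough illustration of the menu-item case, a startup script might look something like the sketch below. The file name HelloMenu.js, the item name pdfhacks_hello, and its label are hypothetical; app.addMenuItem and app.alert are part of Acrobat JavaScript, and the sketch assumes the Tools menu is a valid parent in your version of Acrobat.

```javascript
// HelloMenu.js: a hypothetical startup script; save it in a JavaScripts folder.
// Written with var and function so it suits the older JavaScript in Acrobat 5 and 6.

// Adds a "Say Hello" item to the Tools menu of the given host object.
// The item name and label are illustrative assumptions.
function installHelloMenu(host) {
  host.addMenuItem({
    cName: "pdfhacks_hello",   // internal name of the new item
    cUser: "Say Hello",        // label shown to the user
    cParent: "Tools",          // menu that receives the new item
    cExec: "app.alert('Hello from a startup JavaScript');"
  });
}

// Inside Acrobat the global app object exists at startup; elsewhere this is a no-op.
if (typeof app !== "undefined") {
  installHelloMenu(app);
}
```

Passing the host object in as a parameter keeps the function easy to try outside Acrobat with a stand-in object.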

These JavaScripts even enable you to add features to the free Reader, although Reader won't perform the more powerful commands. Our various JavaScript hacks all work with Reader. JavaScripts are also platform-independent, so our JavaScript hacks all run on Windows, Mac, and Linux.

Customize Acrobat Using JavaScript

Create custom Acrobat menu items and batch processing scripts.

Acrobat can do most of the things that you need. Yet, there's always something you wish it did a little differently. Acrobat enables you to add custom features using plain-text JavaScripts. These scripts can add menu items to Acrobat's menus or add tailored sequences to Acrobat's batch processing.

Acrobat JavaScript builds on the language core familiar to web developers, but its document object model is completely different from the DOM used by web browsers. Acrobat's JavaScript objects are documented in Technical Note 5186: Acrobat JavaScript Object Specification. Access it online from http://partners.adobe.com/asn/developer/pdfs/tn/5186AcroJS.pdf. Another useful document is the Acrobat JavaScript Scripting Guide from http://partners.adobe.com/asn/acrobat/sdk/public/docs/AcroJSGuide.pdf.

The JavaScript Debugger (Acrobat 6 Pro) or Console (Acrobat 5) is the place to test new ideas. Open it by selecting Advanced > JavaScript > Debugger . . . (Acrobat 6 Pro) or Tools > JavaScript > Console . . . (Acrobat 5).

The Code

In this example, the Perl script will use Acrobat to read annotation (e.g., sticky notes) data from the currently open PDF. The script will format this data using HTML and then output it to stdout.
Download the script here (SummarizeComments.pl)


Friday, April 25, 2008

How to Install Perl

Depending on your tastes or requirements, you might want to use the Perl scripting language instead of Visual Basic to program Acrobat. Perl can access the same Acrobat OLE interface used by Visual Basic to manipulate PDFs. Perl is well documented, is widely supported, and has been extended with an impressive collection of modules. A Perl installer for Windows is freely available from ActiveState.

We'll describe how to install the ActivePerl package from ActiveState, and then we'll use an example to show how to access Acrobat's OLE interface using Perl.

Acrobat OLE documentation comes with the Acrobat SDK. Look for IACOverview.pdf and IACReference.pdf. Acrobat Distiller also has an OLE interface. It is documented in DistillerAPIReference.pdf.

Install Perl on Windows
The ActivePerl installer for Windows is freely available from http://www.ActiveState.com/Products/ActivePerl/. Download and install it. It comes with excellent documentation, which you can access by selecting Start > Programs > ActiveState ActivePerl 5.8 > Documentation.

ActivePerl also includes the OLE Browser, which enables you to browse the OLE servers available on your machine (Start > Programs > ActiveState ActivePerl 5.8 > OLE-Browser). The OLE Browser is an HTML file that must be opened in Internet Explorer to work properly.

Running the Code

Open a PDF in Acrobat. In Word, run the macro by selecting Tools > Macro > Macros . . . , choosing SummarizeComments, and then clicking Run. After a few seconds, a new Word document will appear. It will list all the comments that readers have added to each page of the currently visible PDF.

This script demonstrates the typical process of drilling down through layers of PDF objects to find desired information. Here is a simplified sketch of the layers:


app
The currently running Acrobat program. Use the app to alter the user interface or Acrobat's preferences.

avdoc
The PDF currently displayed in Acrobat. Use the avdoc to change how the PDF appears in the viewer or to print pages.

pddoc
Represents the underlying PDF document. Use the pddoc to access or manipulate the PDF's pages or metadata.

pdpage
Represents the underlying PDF page. Use the pdpage to access or manipulate a page's annotations, its rotation, or its cropping.

These OLE objects closely resemble the objects exposed by the Acrobat API. The API gives you much more power, however.
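The drill-down through these layers can be sketched in JavaScript as follows. This is only an illustration, not the Perl or VBA script itself: the function name collectComments and the shape of its result are assumptions, while the OLE method names are the ones the VBA macro uses. The argument is any object that exposes those methods; under Windows Script Host you would presumably obtain one with new ActiveXObject("AcroExch.App").

```javascript
// Collects annotation text page by page, following app -> avdoc -> pddoc -> pdpage.
// acroApp is an AcroExch.App OLE object (or any object with the same methods).
function collectComments(acroApp) {
  var results = [];
  var avdoc = acroApp.GetActiveDoc();     // the PDF shown in the viewer
  var pddoc = avdoc.GetPDDoc();           // the underlying PDF document
  var numPages = pddoc.GetNumPages();
  for (var ii = 0; ii < numPages; ii++) {
    var pdpage = pddoc.AcquirePage(ii);   // pages are numbered from zero
    if (!pdpage) { continue; }
    var numAnnots = pdpage.GetNumAnnots();
    for (var jj = 0; jj < numAnnots; jj++) {
      var annot = pdpage.GetAnnot(jj);
      // Like the VBA macro, skip empty annotations and Popup annotations.
      if (annot.GetContents() !== "" && annot.GetSubtype() !== "Popup") {
        results.push({
          page: ii + 1,                   // report pages starting at one
          title: annot.GetTitle(),
          contents: annot.GetContents()
        });
      }
    }
  }
  return results;
}
```

Returning plain objects rather than writing output directly leaves the formatting (HTML, a Word document, or plain text) to the caller.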

Drive Acrobat using VB or Microsoft Word's Visual Basic for Applications (VBA)

Adobe Acrobat's OLE interface enables you to access or manipulate PDFs from a freestanding Visual Basic script or from another application, such as Word. You can also use Acrobat's OLE interface to render a PDF inside your own program's window. The Acrobat SDK comes with a number of Visual Basic examples under the InterAppCommunicationSupport directory. The SDK also includes OLE interface documentation. Look for IACOverview.pdf and IACReference.pdf. These OLE features do not work with the free Reader; you must own Acrobat.

Acrobat Distiller also has an OLE interface. It is documented in DistillerAPIReference.pdf, which comes with the full Acrobat SDK.


The following example shows how easily you can work with PDFs using Acrobat OLE. It is a Word macro that scans the currently open PDF document for readers' annotations (e.g., sticky notes). It creates a new Word document and then builds a summary of these annotation comments.

The Code
To add this macro to Word, select Tools > Macro > Macros . . . , type in the macro name SummarizeComments, and click Create. Word will open a text editor where you can enter the code. Save, and then test. You can download this code from http://www.pdfhacks.com/summarize.

VBA code for summarizing comments

Sub SummarizeComments( )

Dim app As Object

Set app = CreateObject("AcroExch.App")

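If (0 < app.GetNumAVDocs) Then

' create a new Word document to hold the summary

Dim NewDoc As Document

Dim NewDocRange As Range

Set NewDoc = Documents.Add(DocumentType:=wdNewBlankDocument)

Set NewDocRange = NewDoc.Range

Dim found_notes_b As Boolean

found_notes_b = False

' drill down from the active document to its PDDoc

Dim avdoc As Object, pddoc As Object

Set avdoc = app.GetActiveDoc

Set pddoc = avdoc.GetPDDoc

' iterate over the pages

Dim num_pages As Long, ii As Long, jj As Long

num_pages = pddoc.GetNumPages

For ii = 0 To num_pages - 1

Dim pdpage As Object

Set pdpage = pddoc.AcquirePage(ii)

If (Not pdpage Is Nothing) Then

Dim page_head_b As Boolean

page_head_b = False

' iterate over the page's annotations (e.g., sticky notes)

Dim num_annots As Long

num_annots = pdpage.GetNumAnnots

For jj = 0 To num_annots - 1

Dim annot As Object

Set annot = pdpage.GetAnnot(jj)

' skip empty annotations and Popup annotations

If (annot.GetContents <> "" And annot.GetSubtype <> "Popup") Then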



If (page_head_b = False) Then ' output the page number

NewDocRange.Collapse wdCollapseEnd

NewDocRange.Text = "Page: " & (ii + 1) & vbCr

NewDocRange.Bold = True

NewDocRange.ParagraphFormat.LineUnitBefore = 1

page_head_b = True

End If



' output the annotation title and format it a little

NewDocRange.Collapse wdCollapseEnd

NewDocRange.Text = annot.GetTitle & vbCr

NewDocRange.Italic = True

NewDocRange.Font.Size = NewDocRange.Font.Size - 1

NewDocRange.ParagraphFormat.LineUnitBefore = 0.6



' output the note text and format it a little

NewDocRange.Collapse wdCollapseEnd

NewDocRange.Text = annot.GetContents & vbCr

NewDocRange.Font.Size = NewDocRange.Font.Size - 2



found_notes_b = True

End If

Next jj

End If

Next ii



If (Not found_notes_b) Then

NewDocRange.Collapse wdCollapseEnd

NewDocRange.Text = "No Notes Found in PDF" & vbCr

NewDocRange.Bold = True

End If

End If

End Sub

Monday, April 21, 2008

Convert Microsoft Office Documents to PDF

If you have Acrobat 6 and Microsoft Word, you can use Acrobat's preconfigured Open All batch sequence to convert Word documents into PDFs hands-free. As the name suggests, you actually can use the Open All batch sequence on any kind of file that Acrobat knows how to handle, including bitmap and PostScript files. Acrobat 5 also has an Open All batch sequence, but it does not handle as many file types as Acrobat 6 does.

To merge a number of Word documents into a single PDF with Acrobat 6, use the File > Create PDF > From Multiple Files . . . feature instead.

First, you must configure Acrobat 6 to create the kind of PDF you desire. Do this using the Acrobat preferences, located at Edit > Preferences > General . . . > Convert to PDF. Select Microsoft Office and click Settings . . . , and a dialog opens.

In Acrobat 6, start the Open All batch sequence by selecting Advanced > Batch Processing . . . , selecting Open All, and clicking Run Sequence. In Acrobat 5, start the Open All batch sequence by selecting File > Batch Processing > Open All. Click OK to close the confirmation dialog (if necessary), and a file selector will open. Change Files of Type to All Files, select one or more input files, and then click Select. Acrobat will create one PDF for each input document. Acrobat 5 can't process Word documents this way, but it can handle bitmap images.

Refry a Folder Full of PDFs (Acrobat 6 Pro)


Before publishing a PDF online for wide distribution, you should try reducing its file size by refrying it. With Acrobat 6, you can refry a PDF using its Optimizer feature (Advanced > PDF Optimizer . . . ). Let's create an Acrobat 6 batch sequence that applies the Optimizer to an entire folder of PDF documents. While we're at it, we can also add metadata or other finishing touches.

Create a batch sequence in Acrobat 6 Professional by selecting Advanced > Batch Processing . . . and clicking New Sequence . . . . Name the new sequence Refry and click OK. The Batch Edit Sequence dialog will open.

If you want to also add metadata (title, subject, author, or keywords) to the PDFs, click Select Commands . . . and the Edit Sequence dialog will open. Select the Description command from the list on the left and click Add. In the right column, double-click this command and a dialog opens where you can set the metadata values. Click OK to close the Edit Sequence dialog and to return to the Batch Edit Sequence dialog.

Fine-tune a batch sequence using the Execute JavaScript batch command. If JavaScript is not powerful enough, you can develop your own batch processing commands using an Acrobat plug-in. See the BatchCommand and BatchMetadata plug-in samples that come with the Acrobat SDK.
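To give a feel for the Execute JavaScript command, here is a small sketch that sets each PDF's title from its filename. The function name titleFromFileName is hypothetical, and the sketch assumes the Doc object's documentFileName and info properties behave as described in the Acrobat JavaScript reference for Acrobat 6.

```javascript
// Sets a document's Title metadata to its filename without the .pdf extension.
// doc is an Acrobat JavaScript Doc object; in a batch sequence, pass `this`.
function titleFromFileName(doc) {
  var name = doc.documentFileName;               // e.g., "report.pdf"
  doc.info.Title = name.replace(/\.pdf$/i, "");  // strip the extension
  return doc.info.Title;
}

// Inside the Execute JavaScript command, the current document is `this`:
// titleFromFileName(this);
```

Paste both the function and the call into the command's script editor, since a batch command cannot rely on names defined elsewhere unless a startup script provides them.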

Set Run Commands On to Ask When Sequence is Run. Set Select Output Location to Same Folder as Original(s). Click Output Options . . . .

On the Output Options dialog, select Add to Original Base Name(s) and then set Insert After to .opt. Under Output Format, set Save File As to Adobe PDF Files. Place checkmarks next to Fast Web View and PDF Optimizer. Click Settings . . . to configure the Optimizer.

Configure the Optimizer to suit your requirements. Set its compatibility to Acrobat 5.0 and Later or Acrobat 4.0 and Later for maximum PDF portability. Click OK when you're done.

Click OK to close the Output Options dialog. Click OK to close the Batch Edit Sequence dialog. Your new Refry batch sequence now should be visible in the Batch Sequences dialog.

To make a batch sequence recurse into subfolders, set Run Commands On to Selected Folder. Then click Browse . . . to select the folder you want to process. Whenever you run the sequence, it will process that same folder (and its subfolders).

Test your batch sequence on a temporary folder of disposable PDFs. In the Batch Sequences dialog, select Refry and click Run Sequence. Click OK on the Confirmation dialog. A file selector will open. Select one or more PDFs and click Select to continue. Acrobat will create new PDFs based on your Optimizer settings. The new PDFs will have the same filenames as the original PDFs, except they will have .opt.pdf instead of .pdf at the end. When Acrobat is done, check the new PDFs to make sure the results are satisfactory.

Disable the batch processing confirmation dialog using the Acrobat preferences: Edit > Preferences > General . . . > Batch Processing.

Converting folders of Word documents to PDF.

If you have a folder of PDFs that you must alter or convert, consider using Acrobat's built-in batch processing feature. After you create a batch sequence, you can use it to process large quantities of PDFs hands-free. You can also apply a batch sequence to a single PDF, which means you can create batch sequences for use as macros.

Acrobat batch processing isn't just for manipulating PDF. You can use it to convert Microsoft Office documents, PostScript files, or graphic bitmaps into PDF documents. Or, use batch processing to convert PDF documents to HTML, PostScript, RTF, text, or graphic bitmaps. Many of these options are not available in Acrobat 5. In Acrobat 6, you can also apply OCR to bitmaps or refry PDFs to prepare them for online distribution.

You can automate many of the basic things you do in Acrobat with batch processing. We'll describe a couple of examples.

Tuesday, April 15, 2008

Discover Perl Packages with CPAN

CPAN (http://www.cpan.org) is the Comprehensive Perl Archive Network, where you will find "All Things Perl." Visit http://search.cpan.org to discover several other PDF packages. Drill down to find details, documentation, and downloads. For example, PDF::Extract (http://search.cpan.org/~nsharrock/) creates a new PDF from the pages of a larger, input PDF.

"Hello World" PDF in Perl

This Perl script creates a PDF named HelloWorld.pdf, adds a page, and then adds text to that page. It gives you an idea of how easily you can create PDF.

#!/usr/bin/perl

# HelloWorld.pl; adapted from 0x_test-pl

use PDF::API2;

my $pdf = PDF::API2->new(-file => "HelloWorld.pdf");

$pdf->mediabox(595,842);

my $page = $pdf->page;

my $fnt = $pdf->corefont('Arial',-encoding => 'latin1');

my $txt = $page->hybrid;

$txt->textstart;

$txt->font($fnt, 20);

$txt->translate(100,800);

$txt->text("Hello World! left-aligned");

$txt->translate(500,750);

$txt->text_right("Hello World! right-aligned");

$txt->translate(300,700);

$txt->text_center("Hello World! center-aligned");

$txt->textend;

$pdf->save;

$pdf->end( );

Install Perl and the PDF::API2 Package on Windows

After installing Perl, use the Perl Package Manager to easily install the PDF::API2 package.

Launch the Programmer's Package Manager (PPM, formerly called Perl Package Manager) by selecting Start > Programs > ActiveState ActivePerl 5.8 > Perl Package Manager. A command prompt will open with its ppm> prompt awaiting your command. Type help to see a list of commands. Type search pdf to see a list of available packages. To install PDF::API2, enter install pdf-api2. The Package Manager will fetch the package from the Internet and install it on your machine. The entire session looks something like this:

PPM - Programmer's Package Manager version 3.1.

Copyright (c) 2001 ActiveState SRL. All Rights Reserved.

Entering interactive shell. Using Term::ReadLine::Stub as readline library.

Type 'help' to get started.

ppm> install pdf-api2

====================

Install 'pdf-api2' version 0.3r77 in ActivePerl 5.8.3.809.

====================

Transferring data: 74162/1028845 bytes.

...

Installing C:\Perl\site\lib\PDF\API2\CoreFont\verdanaitalic.pm

Installing C:\Perl\site\lib\PDF\API2\CoreFont\webdings.pm

Installing C:\Perl\site\lib\PDF\API2\CoreFont\wingdings.pm

Installing C:\Perl\site\lib\PDF\API2\CoreFont\zapfdingbats.pm

Installing C:\Perl\site\lib\PDF\API2\Chart\Pie.pm

Successfully installed pdf-api2 version 0.3r77 in ActivePerl 5.8.3.809.

ppm> quit

The PDF::API2 package is used widely to create and manipulate PDF. You can download documentation and examples from http://pdfapi2.sourceforge.net/dl/.

Format your content in HTML and then transform it into PDF.

HTML pages are easy to create on the fly. PDF pages are hard. One simple way to create dynamic PDF is to first create the document in HTML and then use HTMLDOC to transform it into PDF. This works for single pages and long documents.

HTMLDOC creates PDF documents from HTML 3.2 data. It provides document layout options, such as running headers and footers. It can add PDF features, such as bookmarks, links, metadata, and encryption. Invoke HTMLDOC from the command line or use its GUI. Visit http://www.easysw.com/htmldoc/software.php to download Windows binaries or source that can be compiled on Linux, Mac OS X, or a variety of other operating systems.

The detailed documentation that comes with HTMLDOC also is available online at http://www.easysw.com/htmldoc/documentation.php.

In Perl, you can automate PDF generation with HTMLDOC by using the HTML::HTMLDoc module to interface with HTMLDOC.
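The same kind of automation can be sketched without the Perl module by calling the htmldoc command line directly. The JavaScript (Node) sketch below assumes the htmldoc executable is on the PATH; the function names are hypothetical, and --webpage and -f are HTMLDOC's options for unstructured pages and for naming the output file.

```javascript
const { execFile } = require("child_process");

// Builds the argument list for converting one HTML file into one PDF.
function htmldocArgs(inputHtml, outputPdf) {
  return ["--webpage", "-f", outputPdf, inputHtml];
}

// Runs htmldoc and reports completion through a Node-style callback.
// Assumes the htmldoc executable can be found on the PATH.
function convertToPdf(inputHtml, outputPdf, callback) {
  execFile("htmldoc", htmldocArgs(inputHtml, outputPdf), callback);
}

// Usage sketch (file names are placeholders):
// convertToPdf("report.html", "report.pdf", function (err) {
//   if (err) { console.error("htmldoc failed:", err.message); }
// });
```

Using execFile with an argument array, rather than building a shell command string, avoids quoting problems when filenames contain spaces.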

Sunday, April 6, 2008

How to Combine PDF and FDF URLs to Fill Forms?

Another way to automatically fill an online PDF form is to append an FDF file reference to the PDF form's URL. In this case the FDF file must omit the PDF form reference (the /F key). When the user follows the link, Acrobat/Reader opens the PDF and fills the form fields using the FDF data. The FDF file reference must be a full URL:

http://localhost/fine_form.pdf#FDF=http://localhost/fine_data.fdf

Or, it can reference an FDF-generating script instead of a file. For example:

http://localhost/fine_form.pdf#FDF=http://localhost/fdf_data.php?t=42

You should use this technique of referencing both the form PDF and the FDF data in a single URL when displaying filled-in forms inside of HTML frames.
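Building such a link is a matter of joining the two URLs with #FDF=. A minimal JavaScript sketch, with a hypothetical function name:

```javascript
// Joins a PDF form URL and an FDF URL into a single link.
// Both arguments should be full URLs, as the FDF reference must be.
function formWithData(pdfUrl, fdfUrl) {
  return pdfUrl + "#FDF=" + fdfUrl;
}

// Example with the URLs used above:
// formWithData("http://localhost/fine_form.pdf", "http://localhost/fdf_data.php?t=42");
```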

You really must use a web server to test these techniques. Windows users can download IndigoPerl from http://www.indigostar.com. IndigoPerl is an Apache installer for Windows that includes PHP and Perl support.

FDF can also contain PDF annotation (e.g., sticky note) information. Use the preceding techniques to dynamically add annotations to online PDF. Create example FDF or XFDF files by opening a PDF in Acrobat and adding some annotations. Then, select Document > Export Comments . . . (Acrobat 6) or File > Export > Comments . . . (Acrobat 5).

How to Serve FDF to Fill Forms?

One way to automatically fill an online PDF form is to serve a data-packed FDF file (with MIME type application/vnd.fdf). The user's browser will open Acrobat/Reader and pass it the FDF data. Acrobat/Reader will read the FDF data to locate the PDF form. It will load and display this PDF and then populate its fields from the FDF. The PDF form in question should be available from your web server and the FDF data should reference it by URL using the /F key, as we do in our preceding example.

Check your web server to make sure it sends the appropriate Content-type: application/vnd.fdf header when serving FDF files. Or, send the header directly from your script.
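Sending the header from a script might look like the following JavaScript (Node) sketch. The function name sendFdf is hypothetical, and the response object is assumed to offer the writeHead and end methods that Node's http module provides.

```javascript
// Writes FDF text to an HTTP response using the FDF MIME type.
// res is a Node http.ServerResponse (or any object with writeHead and end).
function sendFdf(res, fdfText) {
  res.writeHead(200, { "Content-Type": "application/vnd.fdf" });
  res.end(fdfText);
}

// Usage sketch with Node's http module (port and data are placeholders):
// const http = require("http");
// http.createServer(function (req, res) { sendFdf(res, fdfText); }).listen(8080);
```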

This technique is simple, but it has limitations. First, not all browsers know how to handle FDF data. Second, this technique does not always work inside of HTML frames. The next technique overcomes both of these problems.

FDF, the Forms Data Format

Populate online PDF forms with known data.
To maintain form data, you must display the current state of the data to the user. This enables the user to review the data, update a single field, and submit this change back to the server. With HTML forms, you can set field values as the form is served to the user. With PDF forms, you can use the Forms Data Format (FDF) to populate a form's fields with data.

The PDF Reference describes the FDF file format. Its syntax uses PDF objects (Section 6.8) to organize data. To see an example, open your PDF form in Acrobat and fill in some fields. Export this data as FDF by selecting Advanced > Forms > Export Forms Data . . . (Acrobat 6) or File > Export > Form Data . . . (Acrobat 5). Our basic PDF form yields an FDF file that lists fields in name/value pairs and then references the PDF form by filename:

%FDF-1.2

1 0 obj

<< /FDF << /Fields [ << /T (text_field_1) /V (Here is some text) >>

<< /T (text_field_2) /V (More nice text) >> ]

/F (http://localhost/fine_form.pdf)

>>

>>

endobj

trailer

<< /Root 1 0 R >>

%%EOF

For XML fans, XFDF is an XML-based subset of FDF features. Acrobat Versions 5 and 6 support XFDF. Its MIME type is application/vnd.adobe.xfdf.

Users can store and manage PDF form data using FDF files. Visit http://segraves.tripod.com/index3.htm for some examples. For our purpose of serving filled-out PDF forms, the user never sees or handles the FDF file directly.

You have two options for automatically filling an online PDF form with data. You can serve FDF data that references the PDF form, or you can create a URL that references both the PDF form and the FDF data together.
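For either option, the server has to produce FDF text like the example above. The JavaScript sketch below shows one way to assemble it from name/value pairs; the function names are hypothetical, and it handles only plain ASCII text, escaping the backslash and parenthesis characters that are special inside PDF strings. Omit the form URL when the FDF will be referenced from a combined PDF-and-FDF URL.

```javascript
// Escapes the characters that are special inside a PDF literal string.
function escapePdfString(text) {
  return String(text).replace(/[\\()]/g, "\\$&");
}

// Builds FDF text from an object of field names and values.
// formUrl is optional; when given, it is written as the /F key.
function buildFdf(fields, formUrl) {
  const pairs = Object.keys(fields).map(function (name) {
    return "<< /T (" + escapePdfString(name) + ") /V (" +
      escapePdfString(fields[name]) + ") >>";
  });
  const lines = [
    "%FDF-1.2",
    "1 0 obj",
    "<< /FDF << /Fields [ " + pairs.join("\n") + " ]"
  ];
  if (formUrl) {
    lines.push("/F (" + escapePdfString(formUrl) + ")");
  }
  lines.push(">>", ">>", "endobj", "trailer", "<< /Root 1 0 R >>", "%%EOF");
  return lines.join("\n") + "\n";
}

// Usage sketch:
// buildFdf({ text_field_1: "Here is some text" }, "http://localhost/fine_form.pdf");
```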

Tuesday, April 1, 2008

How to Test Your PDF Form?

Create a text file named echo.php and program it with the following script. IndigoPerl users can save it to C:\indigoperl\apache\htdocs\pdf_hacks\echo.php. This PHP script simply reports submitted form data back to your browser. Create a PDF Submit button that posts data to this script's URL (e.g., http://localhost/pdf_hacks/echo.php#FDF) as we described earlier.

Download echo.php here: http://mihd.net/7nzemdj
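The idea behind such an echo script can be sketched in JavaScript (Node) rather than PHP. This is not the downloadable echo.php; the function name echoReport is hypothetical, and the sketch assumes the form posts its data URL-encoded, as an HTML form would.

```javascript
// Turns a URL-encoded form submission into a plain-text report,
// one "name = value" line per submitted field.
function echoReport(postBody) {
  const params = new URLSearchParams(postBody);
  const lines = [];
  for (const [name, value] of params) {
    lines.push(name + " = " + value);
  }
  return lines.join("\n");
}

// Usage sketch:
// echoReport("text_field_1=Here+is+some+text&text_field_2=More+nice+text");
```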

A PDF form interacts properly with a web server only when viewed inside a web browser. So, drag and drop your form into a browser, fill some fields, and then click the Submit button. The PDF should be replaced with an echoed data report.

If dragging and dropping a PDF into Mozilla causes the PDF to open outside of the browser window, make sure Mozilla's Java is enabled (Edit > Preferences . . . > Advanced). After enabling Java, restart Mozilla and try again.

How to Install the Apache Web Server on Windows?


To test your interactive PDF form, you must have access to a web server. Many of these hacks use server-side PHP scripts, so your web server should also run PHP (http://www.php.net). Windows users can download an Apache (http://www.apache.org) web server installer called IndigoPerl from IndigoSTAR (http://www.indigostar.com). This installer includes PHP (and Perl) modules, so you can run our hacks right out of the box. Apache and PHP are free software.

Visit http://www.indigostar.com/indigoperl.htm and download indigoperl-2004.02.zip. Unzip this file into a temporary directory and then double-click setup.bat to run the installer. When the installer asks for an installation directory, press Enter to choose the default: C:\indigoperl\. In our discussions, we'll assume IndigoPerl is installed in this location.

After installing IndigoPerl, open a web browser and point it at http://localhost/. This is the URL of your local web server, and your browser should display a Web Server Test Page with links to documentation. When you request http://localhost/, Apache serves you index.html from C:\indigoperl\apache\htdocs\. Create a pdf_hacks directory in the htdocs directory, and use this location for our PHP scripts. Access this location from your browser with the URL: http://localhost/pdf_hacks/.

How to Create the Form?

Open the form's source document and print it to PDF, or scan a paper copy and create a PDF using OCR. Then open the PDF in Acrobat to add form fields.

PDF forms can be powerful JavaScript programs, but we won't be using any PDF JavaScript. Instead, we will create PDF forms that let the web server do all the work. This gives you the freedom to program the form's logic with any language or database interface you desire.

PDF form fields correspond closely to HTML form fields. Add them to your PDF using one or more Acrobat tools. In the following list, each HTML form field is followed by its PDF equivalent.

input type="text"
Text

input type="password"
Text with Password Option

input type="checkbox"
Checkbox

input type="radio"
Radio Button

input type="submit"
Button with Submit Form Action

input type="reset"
Button with Reset Form Action

input type="hidden"
Text with Hidden Appearance

input type="image"
Button with Icon Option

input type="button"
Button

textarea
Text with Multiline Option

select
Combo Box or List Box

In Acrobat 6, you have one tool for each form field type. Open this toolbar by selecting Tools > Advanced Editing > Forms > Show Forms Toolbar. Select a tool (e.g., Text Field tool), click, and drag out a rectangle where the field goes. Release the rectangle and a Field Properties dialog opens. Select the General tab and enter the field Name. This name will identify the field's data when it is submitted to your web server. Set the field's appearance and behavior using the other tabs. Click Close and the field is done.

In Acrobat 5, use the Form tool to create any form field. Click, and drag out a rectangle where the field goes. Release the rectangle and a Field Properties dialog opens. Select the desired field Type (e.g., Text) and enter the field Name. This name will identify the field's data when it is submitted to your web server. Set the field's appearance and behavior using the other tabs. Click OK and the field is done. Using the Form tool, double-click a field at any time to change its properties.

Take care to maximize your PDF form's compatibility with older versions of Acrobat and Reader.

To upload form data to your web server, the PDF must have a Submit Form button. Create a PDF button, open the Actions tab, and then add the Submit a Form (Acrobat 6) or Submit Form (Acrobat 5) action to the Mouse Up event.

Edit the action's properties to include your script's URL; this would be an HTML form's action attribute. Append #FDF to the end of this URL, like this:

http://localhost/pdf_hacks/echo.php#FDF

Set the Field Selection to include the fields you want this button to submit; All Fields is safest, to start. Set the Export Format to HTML and the PDF form will submit the form data using HTTP's post method.

When you are done, save your PDF form and test it.

Buttons look funny on paper. If users will be printing your form, consider making buttons unprintable. Open the button properties and select the General tab (Acrobat 6) or the Appearance tab (Acrobat 5). Under Common Properties set Form Field: to Visible but Doesn't Print. Click OK.