Friday, March 26, 2010

Building an index for your collection

After you’ve prepared your document collection, you’re ready to build the index for it. When you create the index, you specify the folder that contains the PDF document collection (this is also the folder in which the index file and its support folder must reside). You can also specify up to 500 words that you want excluded from the index (such as a, an, the, and, or, and the like), and you can exclude numbers from the index to speed up your searches. Words that you exclude from an index are called stop words. Keep in mind that while specifying stop words does give you a smaller and more efficient index (an estimated 10 to 15 percent smaller), it also prevents you and other users from searching the collection for phrases that include these stop words (such as “in the matter of Smith and James”).
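
To see why stop words get in the way of phrase searches, here is a minimal sketch of a toy word index in Python. It is not how Acrobat Catalog works internally; the stop-word list, the sample document name, and every function name here are made up for illustration only.

```python
import re

# Hypothetical stop-word list for illustration; in Catalog you pick your own.
STOP_WORDS = {"a", "an", "the", "and", "or", "in", "of"}

def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs, stop_words=frozenset(), include_numbers=True):
    """Map each indexed word to {document name: [word positions]}."""
    index = {}
    for name, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            if word in stop_words:
                continue  # stop words never make it into the index
            if not include_numbers and word.isdigit():
                continue  # optional: leave numbers out as well
            index.setdefault(word, {}).setdefault(name, []).append(pos)
    return index

def phrase_search(index, phrase):
    """Return the documents that contain the words of `phrase` in a row."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return set()  # one unindexed word (such as a stop word) defeats the phrase
    hits = set()
    for name, positions in index[words[0]].items():
        for start in positions:
            if all(start + i in index[w].get(name, []) for i, w in enumerate(words)):
                hits.add(name)
                break
    return hits

# Made-up one-document collection.
docs = {"brief.pdf": "In the matter of Smith and James, filed 2010."}
full = build_index(docs)
lean = build_index(docs, STOP_WORDS, include_numbers=False)

print(phrase_search(full, "in the matter of Smith and James"))  # {'brief.pdf'}
print(phrase_search(lean, "in the matter of Smith and James"))  # set()
print(phrase_search(lean, "Smith"))                             # {'brief.pdf'}
```

The lean index holds fewer entries, but the phrase search comes back empty because "in", "the", "of", and "and" were never indexed. Single-word searches for the remaining terms still work.
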
To build a new index, follow these steps:

1. Launch Acrobat and choose Advanced➪Catalog.
(You don’t have to have any of the files in the PDF document collection open at the time you do this.) The Catalog dialog box opens.

2. Click the New Index button in the Catalog dialog box.
The New Index Definition dialog box opens.

3. Enter a descriptive title that clearly and concisely identifies the new index in the Index Title text box.

4. Click in the Index Description text box and enter a complete description of the index.
This description can include the stop words, search options supported, and the kinds of documents indexed.

5. Click the Add button to the right of the Include These Directories list box; in the Browse for Folder dialog box, select the folder that contains your PDF document collection and click OK.

6. To specifically exclude any subfolders that reside within the folder that contains your PDF document collection (the one whose directory path is now listed in the Include These Directories list box), click the Add button to the right of the Exclude These Subdirectories list box, select the subfolders of the folder you selected in Step 5, and click OK.
Repeat this step for any other subfolders that need to be excluded. (Actually, you should be able to skip this step entirely, because the folder that contains your PDF document collection ideally shouldn’t have any other folders in it.)

7. To further configure your index definition, click the Options button.

The Options dialog box opens.

8. Select the Do Not Include Numbers check box to exclude numbers from the index.

9. In the rare event that your PDF document collection contains PDF files saved in the original Acrobat 1.0 file format, select the Add IDs to Acrobat 1.0 PDF Files check box.

10. Select the Do Not Warn for Changed Documents When Searching check box if you don’t want to see an alert dialog box when you search documents that have changed since the last index build.

11. Click the Custom Properties button to open the Custom Properties dialog box, where you specify which custom fields added to the PDF documents should be searchable. These include any custom fields that were converted by PDFMaker 6.0 from Microsoft Word documents.

12. To specify stop words for the index or to disable any of the word search options, click the Stop Words button. The Stop Words dialog box opens.

13. To specify a stop word to exclude from the index, enter the term in the Word text box and click the Add button.
Repeat this step until you’ve added all the stop words you don’t want indexed.

14. Click OK to close the Stop Words dialog box and return to the Options dialog box.

15. Click the Tags button to open the Tags dialog box, where you specify which document structure tags (if the PDF document is tagged) can be used as search criteria.

16. Click OK to close the Options dialog box and return to the New Index Definition dialog box.

17. Check over the fields in the New Index Definition dialog box and, if everything looks okay, click the Build button. The Save Index File dialog box opens.

18. If you want, replace the generic filename index.pdx in the File Name (Name on the Mac) text box with a more descriptive filename, and then click the Save button.

When editing the filename, be sure that you don’t select a new folder in which to save the file (it must be in the same folder as your PDF document collection) and, in Windows, don’t remove the .pdx extension (for Portable Document Index) that identifies it as a special Acrobat index file.
Acrobat responds by displaying the Catalog dialog box that keeps you informed of its progress as it builds the new index. When the Progress bar reaches 100% and the program finishes building the index, you can then click the Close button to close the Catalog dialog box and return to the Acrobat program, where you can start using the index in searching the files in the PDF document collection.

Note that when Acrobat builds an index, it not only creates a new index file (with the .pdx filename extension on Windows), but also creates a new support folder using the same filename as the index file.

All settings specified in the Options dialog box (Steps 7 through 15 in the preceding step list) apply only to the currently opened index file. If you want to apply any or all of these options globally to every catalog index you create, choose Edit➪Preferences and click Catalog in the list box to display the Catalog Preferences options. You can then specify global settings for index file creation, using the same options found in the Options dialog box.
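
If you want to double-check the folder layout described above after a build (index file and support folder sitting alongside the PDFs, and ideally no other subfolders), a short script can do it. This is only a rough sketch in Python: the function name is made up, and it assumes the support folder is named after the index file without its .pdx extension, which you should confirm against your own build.

```python
from pathlib import Path

def check_collection(folder, index_name="index.pdx"):
    """Return a list of problems found in a PDF collection folder (empty if none)."""
    folder = Path(folder)
    if not folder.is_dir():
        return [f"{folder} is not a folder"]

    problems = []
    index_file = folder / index_name

    # The index file should keep its .pdx extension and live with the PDFs.
    if index_file.suffix.lower() != ".pdx":
        problems.append(f"{index_name} does not end in .pdx")
    if not index_file.is_file():
        problems.append(f"index file {index_name} not found in {folder}")

    # Assumption: the support folder shares the index file's base name.
    support = folder / index_file.stem
    if not support.is_dir():
        problems.append(f"support folder {support.name} not found in {folder}")

    # The collection itself should contain at least one PDF file.
    entries = list(folder.iterdir())
    if not any(p.is_file() and p.suffix.lower() == ".pdf" for p in entries):
        problems.append("no PDF files found in the collection folder")

    # Any subfolder other than the support folder is worth a second look.
    extra = sorted(p.name for p in entries if p.is_dir() and p != support)
    if extra:
        problems.append("unexpected subfolders: " + ", ".join(extra))

    return problems
```

Call it with the path to your collection folder and the index filename you chose in Step 18. An empty list means the layout matches the description above; anything else lists what to look at.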

1 comment:

Anonymous said...

Is there a size limitation for creating your .pdx file?