Friday, March 26, 2010

Rebuilding an index

If you add files to or remove files from a PDF document collection that you’ve already indexed, you must rebuild the index so that Acrobat searches the collection’s current contents. Before you rebuild an index for a collection from which you have removed PDF files, you need to purge the index. Purging actually removes the entries for files that are no longer part of the collection, rather than just marking them as invalid. This streamlines the index considerably and makes searching it as fast as possible.
To purge and then rebuild an index, follow these steps:

1. Choose Advanced➪Catalog; in the Catalog dialog box that appears, click the Open Index button.
The Open Index File dialog box appears.

2. Select the folder that contains the PDF document collection and the index file, and then click the index file icon (the one with the .pdx file extension in Windows) before you click the Open button.
The Open Index File dialog box closes, and the Index Definition dialog box appears.

3. Click the Purge button at the bottom of the Index Definition dialog box.
Acrobat responds by opening the Catalog dialog box, which displays the status of the purge operation with a progress bar. When the purge is finished, a message saying so appears in the list box below the progress bar.

4. To rebuild the purged index, click the Open Index button again, click the index file icon, and then click the Open button.
Once again, the Open Index File dialog box closes, and the Index Definition dialog box appears.

5. Click the Rebuild button to rebuild the index using only the PDF files left after the purge.

6. After Acrobat finishes rebuilding the index, click the Close button to close the Catalog dialog box.

After you’ve finished purging and rebuilding an index, you can immediately start using it in the searches you perform on the PDF document collection. Although the preceding steps don’t mention it, before you click the Rebuild button you can click the Options button to modify the stop words or change the number and document structure search options, as discussed in the section “Building an index for your collection.”
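The rule described above (rebuild after any change, but purge first if files were removed) can be sketched in a few lines of Python. This is only an illustration and not part of Acrobat: it assumes you record the collection’s PDF filenames yourself each time you build the index, and the function names (list_pdfs, diff_collection, rebuild_advice) are hypothetical.

```python
from pathlib import Path


def list_pdfs(collection_dir):
    """Return the set of PDF filenames currently in the collection folder."""
    return {
        entry.name
        for entry in Path(collection_dir).iterdir()
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    }


def diff_collection(previous, current):
    """Compare two filename sets and return (added, removed) as sorted lists."""
    added = sorted(current - previous)
    removed = sorted(previous - current)
    return added, removed


def rebuild_advice(added, removed):
    """Map the changes onto the purge/rebuild rule described in the text."""
    if removed:
        # Files were removed, so stale entries should be purged first.
        return "purge, then rebuild"
    if added:
        return "rebuild"
    return "index is current"
```

For example, if the recorded set was {"a.pdf", "b.pdf"} and the folder now holds b.pdf and c.pdf, diff_collection reports c.pdf as added and a.pdf as removed, and rebuild_advice returns "purge, then rebuild".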
If you always use the same index when searching a particular PDF document, you can associate the index file with the PDF file. That way, Acrobat automatically mounts the index every time you open the PDF document, so you’re ready to search the document with it right away. To do this, choose File➪Document Properties and click Advanced in the list box to display the Advanced Document Properties options. In the PDF Settings area, click the Browse button to locate the index file you want to associate with the current PDF document. Click Open to select the index file and return to the Document Properties dialog box. The directory path for the index file now appears in the Search Index text box. Click OK to close the Document Properties dialog box.

Building an index for your collection

After you’ve prepared your document collection, you’re ready to build the index for it. When you create the index, you specify the folder that contains the PDF document collection (this is also the folder in which the index file and its support folder must reside). You can also specify up to 500 words that you want excluded from the index (such as a, an, the, and, or, and the like), and you can exclude numbers from the index to speed up your searches. Words that you exclude from an index are called stop words. Keep in mind that while specifying stop words gives you a smaller and more efficient index (estimated at between 10 and 15 percent smaller), it also prevents you and other users from searching the collection for phrases that include these stop words (such as “in the matter of Smith and James”).
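To see why stop words and excluded numbers shrink an index but break phrase searches, consider this toy word index in Python. It is a minimal sketch for illustration only and makes no claim about how Acrobat Catalog stores its index; the stop word list, the sample document name, and the function names are all assumptions.

```python
import re

# Illustrative stop word list; Acrobat lets you define your own.
STOP_WORDS = frozenset({"a", "an", "the", "and", "or", "in", "of"})


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs, stop_words=frozenset(), include_numbers=True):
    """Map each indexed word to the (document, position) pairs where it occurs."""
    index = {}
    for name, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            if word in stop_words:
                continue  # stop words never enter the index
            if not include_numbers and word.isdigit():
                continue  # mimics excluding numbers from the index
            index.setdefault(word, []).append((name, position))
    return index


def phrase_search(index, phrase):
    """Return the documents that contain every word of the phrase, in order."""
    words = tokenize(phrase)
    if not words or any(word not in index for word in words):
        return set()  # a word missing from the index makes the phrase unfindable
    hits = set()
    for name, start in index[words[0]]:
        if all((name, start + offset) in index[word]
               for offset, word in enumerate(words)):
            hits.add(name)
    return hits


docs = {"brief.pdf": "In the matter of Smith and James, filed 2010"}
full_index = build_index(docs)
small_index = build_index(docs, STOP_WORDS, include_numbers=False)
```

Here full_index holds nine entries and small_index only four. Searching full_index for “in the matter of Smith and James” finds brief.pdf, while the same search against small_index finds nothing, because “in,” “the,” “of,” and “and” were never indexed.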
To build a new index, follow these steps:

1. Launch Acrobat and choose Advanced➪Catalog.
(You don’t have to have any of the files in the PDF document collection open at the time you do this.) The Catalog dialog box opens.

2. Click the New Index button in the Catalog dialog box.
The Index Definition dialog box opens.

3. Enter a descriptive title that clearly and concisely identifies the new index in the Index Title text box.

4. Click in the Index Description text box and enter a complete description of the index.
This description can include the stop words, search options supported, and the kinds of documents indexed.

5. Click the Add button to the right of the Include These Directories list box; in the Browse for Folder dialog box, select the folder that contains your PDF document collection and click OK.

6. To specifically exclude any subfolders that reside within the folder that contains your PDF document collection (the one whose directory path is now listed in the Include These Directories list box), click the Add button to the right of the Exclude These Subdirectories list box, select the subfolders of the folder you selected in Step 5, and click OK.
Repeat this step for any other subfolders that need to be excluded. (Actually, you should be able to skip this step entirely, because the folder that contains your PDF document collection ideally shouldn’t have any other folders in it.)

7. To further configure your index definition, click the Options button.
The Options dialog box opens.

8. Select the Do Not Include Numbers check box to exclude numbers from the index.

9. In the rare event that your PDF document collection contains PDF files saved in the original Acrobat 1.0 file format, select the Add IDs to Acrobat 1.0 PDF Files check box.

10. Select the Do Not Warn for Changed Documents When Searching check box if you don’t want to see an alert dialog box when you search documents that have changed since the last index build.

11. Click the Custom Properties button to open the Custom Properties dialog box, where you specify which custom fields added to the PDF documents should be searchable. These include any custom fields that PDFMaker 6.0 created when converting Microsoft Word documents.

12. To specify stop words for the index or to disable any of the word search options, click the Stop Words button. The Stop Words dialog box opens.

13. To specify a stop word that you want excluded from the index, enter the term in the Word text box and click the Add button.
Repeat this step until you’ve added all the stop words you don’t want indexed.

14. Click OK to close the Stop Words dialog box and return to the Options dialog box.

15. Click the Tags button to open the Tags dialog box, where you specify which document structure tags (if the PDF document is tagged) can be used as search criteria.

16. Click OK to close the Options dialog box and return to the New Index Definition dialog box.

17. Check over the fields in the New Index Definition dialog box and, if everything looks okay, click the Build button. The Save Index File dialog box opens.

18. If you want, replace the generic filename index.pdx in the File Name (Name on the Mac) text box with a more descriptive filename, and then click the Save button.

When editing the filename, be sure that you don’t select a new folder in which to save the file (it must be in the same folder as your PDF document collection) and, in Windows, don’t remove the .pdx extension (for Portable Document Index) that identifies it as a special Acrobat index file.
Acrobat responds by displaying the Catalog dialog box, which keeps you informed of its progress as it builds the new index. When the progress bar reaches 100% and the program finishes building the index, click the Close button to close the Catalog dialog box and return to Acrobat, where you can start using the index to search the files in the PDF document collection.
Note that when Acrobat builds an index, it not only creates a new index file (with the .pdx filename extension on Windows), but also creates a new support folder using the same filename as the index file.
All settings specified in the Options dialog box (Steps 7 through 15 in the preceding step list) apply only to the currently opened index file. If you want to apply any or all of these options globally to every catalog index you create, choose Edit➪Preferences and click Catalog in the list box to display the Catalog Preferences options. You can then specify global settings for index file creation, using the same options found in the Options dialog box.
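Because the index file and its support folder must sit in the same folder as the PDF document collection, a quick file-system check can catch a misplaced or renamed index. The Python sketch below is a hypothetical helper, not an Acrobat feature. It assumes the support folder is named after the index file without its extension (for example, a folder called index alongside index.pdx), so adjust that assumption if your installation names the folder differently.

```python
from pathlib import Path


def check_index_layout(collection_dir, index_name="index.pdx"):
    """Return a list of problems found with the index files in the collection folder."""
    folder = Path(collection_dir)
    index_file = folder / index_name
    # Assumption: the support folder shares the index filename, minus ".pdx".
    support_dir = folder / Path(index_name).stem

    problems = []
    if index_file.suffix.lower() != ".pdx":
        problems.append(f"{index_name} does not have the .pdx extension")
    if not index_file.is_file():
        problems.append(f"{index_name} is not in {folder}")
    if not support_dir.is_dir():
        problems.append(f"support folder {support_dir.name} is not in {folder}")
    return problems
```

An empty list means the index file and support folder are where the text says they must be. Any strings returned describe what is missing.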