Monday, June 16, 2008

How to make scanned documents searchable and editable


When you scan a document directly into a PDF file (as described in the preceding section), Acrobat captures all the text and graphics on each page as one big graphic image. This is fine as far as it goes, but it doesn’t go very far, because you can neither edit nor search the PDF document. (As far as Acrobat is concerned, the document doesn’t contain any text to edit or search; it’s just one humongous graphic.) That’s where the Paper Capture plug-in in Acrobat 6 for Windows comes into play: You can use it to turn a scanned document into a PDF that you can either just search or both search and edit.

To use Paper Capture, all you have to do is choose Document➪Paper Capture to open the Paper Capture dialog box, select the page or pages to be processed (All Pages, Current Page, or From Page x to y), and then click the OK button; the Paper Capture utility does the rest. As it processes the page or pages that you designated, a Paper Capture Plug-In alert dialog box keeps you informed of its progress in preparing and performing the page recognition. When Paper Capture finishes the page recognition, this alert dialog box disappears, and you can then save the changes to your PDF document with the File➪Save command.

When doing the page recognition in a PDF document, the Paper Capture plug-in offers you a choice between the following three Output Style options:
  • Searchable Image (Exact): Select this option to make the text in the PDF document searchable but not editable (this is the default setting). This setting is the one to choose if you’re processing a document that needs to be searchable but should never be edited in any way, such as an executed contract.
  • Searchable Image (Compact): Select this option to make the text in the PDF document searchable but not editable and to compress its graphics. Use this setting if you’re processing a document whose text requires searching without editing and that also contains a fair number of graphic images that need compressing. When you select this setting, Paper Capture applies JPEG compression to color images and ZIP compression to black-and-white images.
  • Formatted Text & Graphics: Select this option to make the text in the PDF document both editable and searchable. Pick this setting if you not only want to be able to find text in the document but also possibly make editing changes to it.
To select a different output style setting, click the Edit button in the Paper Capture dialog box to open the Paper Capture Settings dialog box (as shown in Figure 6-5). This dialog box not only enables you to select a new output style in the PDF Output Style drop-down list, but also enables you to designate the primary language used in the text in the Primary OCR Language drop-down list (OCR stands for Optical Character Recognition, which is the kind of software that Paper Capture uses to recognize and convert text captured as a graphic into text that can be searched and edited).
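
The choice among the three output styles comes down to two questions: does the text need to be editable, and do the images need compressing? The short Python sketch below encodes that decision. It is only an illustration of the reasoning above; the names OutputStyle and choose_output_style are made up for this example and are not part of Acrobat or any Acrobat scripting interface.

```python
from enum import Enum


class OutputStyle(Enum):
    """The three Paper Capture output styles named in the text (labels only)."""
    SEARCHABLE_IMAGE_EXACT = "Searchable Image (Exact)"
    SEARCHABLE_IMAGE_COMPACT = "Searchable Image (Compact)"
    FORMATTED_TEXT_AND_GRAPHICS = "Formatted Text & Graphics"


def choose_output_style(needs_editing: bool, compress_images: bool = False) -> OutputStyle:
    """Hypothetical helper: pick an output style from two yes/no questions."""
    if needs_editing:
        # Only this style makes the text editable as well as searchable.
        return OutputStyle.FORMATTED_TEXT_AND_GRAPHICS
    if compress_images:
        # Searchable only, with the page images compressed.
        return OutputStyle.SEARCHABLE_IMAGE_COMPACT
    # The default: searchable only, such as for an executed contract.
    return OutputStyle.SEARCHABLE_IMAGE_EXACT


if __name__ == "__main__":
    print(choose_output_style(needs_editing=False).value)
```

For an executed contract you would answer "no" to both questions and land on the default, Searchable Image (Exact).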

If your PDF document contains graphic images, you can tell Paper Capture how much to compress the images by selecting the maximum resolution in the Downsample Images drop-down list. This menu offers you three options in addition to None (for no downsampling): Low (300 dpi), Medium (150 dpi), and High (72 dpi). The labels Low, Medium, and High refer to the amount of compression applied to the images, whereas the values 300, 150, and 72 dpi (dots per inch) give the maximum resolution the images keep, and thus their quality. That is why the Low setting carries the highest dpi value. As always, the higher the amount of compression, the smaller the file size and the lower the image quality.
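
To get a feel for what these settings mean, it helps to do the arithmetic. The Python sketch below computes the pixel dimensions of a page image at each resolution. The 8.5 x 11 inch page size and the 600 dpi scan resolution are assumptions chosen for illustration, as is the rule that an image already at or below the chosen maximum is left alone; the function names are made up for this example and are not part of Acrobat.

```python
# Menu labels from the text, mapped to their maximum resolution in dpi.
DOWNSAMPLE_OPTIONS = {
    "None": None,
    "Low (300 dpi)": 300,
    "Medium (150 dpi)": 150,
    "High (72 dpi)": 72,
}


def effective_dpi(source_dpi, max_dpi):
    """Resolution after downsampling (assumes images are never scaled up)."""
    if max_dpi is None:
        return source_dpi
    return min(source_dpi, max_dpi)


def pixel_dimensions(width_in, height_in, dpi):
    """Width and height in pixels of a page image at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # Assumed example: a letter-size page scanned at 600 dpi.
    page_w, page_h, scan_dpi = 8.5, 11, 600
    full_w, full_h = pixel_dimensions(page_w, page_h, scan_dpi)
    for label, max_dpi in DOWNSAMPLE_OPTIONS.items():
        dpi = effective_dpi(scan_dpi, max_dpi)
        w, h = pixel_dimensions(page_w, page_h, dpi)
        share = (w * h) / (full_w * full_h)
        print(f"{label}: {w} x {h} pixels, {share:.1%} of the original pixel count")
```

Because the pixel count scales with the square of the resolution, halving the dpi leaves only a quarter of the pixels, which is where most of the file-size savings come from.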

After processing the pages of your PDF document with the Paper Capture plug-in, use the Search feature (Ctrl+F on Windows and ⌘+F on the Mac) to search for words or phrases in the text to verify that it can be searched. If you used the Formatted Text & Graphics output style in doing the page recognition, you can select the TouchUp Text Tool by clicking its button on the Advanced Editing toolbar or by typing T, and then click the I-beam pointer in a line of text to select the line with a bounding box to verify that you can edit the text as well. Always remember to choose File➪Save to save the changes made to your document by processing with Paper Capture.

