Showing posts with label a49. Extracting Blocks of Text from PDF. Show all posts

Tuesday, December 29, 2009

Selecting and Copying Graphic Images


You use the Select Image tool, located at the bottom of the Selection toolbar menu on the Basic toolbar, to select individual graphic images for copying. When you choose the Select Image tool, the mouse pointer becomes a crosshair that you use to draw a bounding box around the graphic. After you’ve enclosed the entire graphic (you don’t have to worry if your marquee is a little larger than the image borders), you can copy the graphic to a new document open in another program either by copying it to the Clipboard (Edit➪Copy) or by dragging it to a new document window. Note that if your PDF document is tagged, you can simply click an image with the Select Image tool to select a graphic object.
Keep in mind that when you copy images to the Clipboard, Acrobat uses the resolution of your monitor and the resolution that your computer’s operating system sets for the Clipboard, rather than the resolution of the images as saved in the PDF document (which could well be a lot higher than either of the two). Also, be aware that all images you copy to the Clipboard are automatically converted to pixels, even if they are saved as vector (or line) graphics in the PDF file.
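To get a feel for how much detail can be lost, here is a minimal Python sketch of the arithmetic. The 3-inch width, the 300 dpi stored resolution, and the 96 dpi screen resolution are illustrative assumptions, not values taken from Acrobat, and `pixel_width` is a hypothetical helper name.

```python
def pixel_width(width_inches, dpi):
    """Number of pixels across an image of the given width at the given resolution."""
    return round(width_inches * dpi)

# Assumed example: an image 3 inches wide, saved in the PDF at 300 dpi.
stored = pixel_width(3, 300)

# The same image copied at an assumed screen/Clipboard resolution of 96 dpi.
copied = pixel_width(3, 96)

print(stored, copied)  # 900 288
```

Under these assumptions the copied image carries roughly a third of the pixels per side that the stored image has, which is why a pasted graphic can look coarser than the original.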

Copying PDF tables into word processors and spreadsheets

The Select Table tool makes it a joy to copy tables from PDF files into word-processed documents or spreadsheets.
Microsoft Word automatically recognizes and preserves the table structure by creating a new Word table. Even more importantly, Word maintains the number formatting as well (indicated by the dollar signs, commas, percent signs, and parentheses for the negative values).
Excel also has no problem recognizing and correctly interpreting the layout and formatting of the table data. It immediately inserts the incoming table data into the correct worksheet cells, while maintaining the correct cell formatting. (By the way, in case you aren’t yet an Excel user, if you see #### symbols in the new worksheet, these symbols merely indicate that the column isn’t wide enough to display the values in that cell. They are not error indicators and are easily disposed of by widening the column.)
Acrobat 6 offers an even easier way to get selected table data into a spreadsheet program. (This method assumes that you already have a CSV-compliant spreadsheet program like Microsoft Excel installed on your computer.) Select a table in a PDF document with the Select Table tool, right-click to open the context menu, and choose Open Table in Spreadsheet. Your CSV-compliant spreadsheet program (and all of them are these days) opens a document with your table data imported into the spreadsheet. You can then edit and save your table data in that program’s document format.

Saving a table or formatted text in a new file


Unlike when you select text with the Select Text tool, after you highlight a table or blocks of text with the Select Table tool, you can not only copy it to the Clipboard but also save the selection into a new file format. To do this, you right-click (Control+click on the Mac) the text or table selection and then click Save Selected Table As on the context menu to open the Acrobat Save As dialog box, where you specify the folder, filename, and type of file format in which to save the selection.
Select the Rich Text Format when you want to open the table or formatted text in a word processor such as Microsoft Word. Stay with the Comma Separated Values (*.csv) default file format when you’re saving a table of data and you want to be able to import that data into a spreadsheet program (such as Microsoft Excel) or a database program (such as FileMaker Pro).
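If you load a saved CSV file into your own script or database instead of a spreadsheet, the cells arrive as plain text, so formatted values need converting before you can do arithmetic on them. The following Python sketch shows one way to do that. It assumes the cells use dollar signs, comma thousands separators, percent signs, and parentheses for negatives; the exact text Acrobat writes may differ, and `parse_amount` is a hypothetical helper name.

```python
from decimal import Decimal

def parse_amount(cell):
    """Convert a formatted cell such as "$1,234.50", "(500)", or "12%" to a Decimal.

    Parentheses are read as a negative sign; a percent sign divides the value by 100.
    """
    text = cell.strip()
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]  # drop the surrounding parentheses
    is_percent = text.endswith("%")
    # Remove the currency symbol, thousands separators, and percent sign.
    text = text.replace("$", "").replace(",", "").replace("%", "").strip()
    value = Decimal(text)
    if is_percent:
        value = value / Decimal(100)
    return -value if negative else value

print(parse_amount("$1,234.50"))  # 1234.50
print(parse_amount("(500)"))      # -500
```

`Decimal` is used here rather than `float` so that currency amounts keep their exact decimal digits.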

Tuesday, November 24, 2009

Selecting tables and formatted text


The second text tool on the Basic toolbar is called the Select Table tool, and as its name implies, you use this tool when you want to copy text set in a table or to copy text along with its formatting (including font, font size, text color, alignment, line spacing, and indents when saving in the Rich Text Format, or RTF, file format). To use the Select Table tool, you use its cross-hair mouse pointer to draw a bounding box around a table or lines of text that you want to select. As soon as you release the mouse button, Acrobat encloses the selected text or table in a heavy blue outline.
The Select Table tool can make table selections based on a PDF document’s underlying document structure tags. To find out if you’re working with a tagged PDF document, right-click the page with the Select Table tool to see if the Select Table Uses Document Tags command is activated (the PDF file is tagged) or grayed-out (the PDF file is untagged) on the context menu. Acrobat automatically selects this command when you open a tagged PDF document. If you’re working with a tagged PDF document, you can simply click with the Select Table tool to select a table or lines of text formatted as a table.
When Acrobat identifies a text selection as a table, it maintains the structure of the table by preserving the layout of the data in rows and columns of cells. If you then save the table data in the RTF file format for use in a word-processed document, the table maintains this layout in the new document. If you save the table data in the CSV (Comma Separated Values) text file format, which is the default format selected by Acrobat, the program maintains the table structure by separating the data items with commas and hard returns. This creates what is often called a comma delimited text file that most database and spreadsheet programs can convert easily into their own native file formats.
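As a rough illustration of what a comma delimited text file looks like to a program, here is a short Python sketch that reads one with the standard `csv` module. The sample data is hypothetical, not output captured from Acrobat; it follows the common CSV convention that a cell containing a comma is wrapped in double quotes.

```python
import csv
import io

# Hypothetical contents of a table saved as CSV: one table row per line,
# cells separated by commas, cells that contain commas wrapped in quotes.
SAMPLE_CSV = (
    'Region,Q1 Sales,Change\r\n'
    'North,"$12,500",4%\r\n'
    'South,"$9,800",(2%)\r\n'
)

def read_table(text):
    """Return the CSV text as a list of rows, each row a list of cell strings."""
    return list(csv.reader(io.StringIO(text, newline="")))

rows = read_table(SAMPLE_CSV)
for row in rows:
    print(row)
```

Each hard return ends a row and each unquoted comma ends a cell, so the three lines of text come back as three rows of three cells each.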

Selecting columns of text

The Select Text tool enables you to select complete columns of text without having to worry about selecting text in any adjacent columns on the page that you don’t want to include. Use this tool when you need to copy all or part of columns on a single page of a PDF document that uses newspaper columns.
To select a column of text with the Select Text tool, you simply drag the I-beam pointer from the top-left corner of a column of text in a diagonal direction toward the bottom-right corner of the column of text and release the mouse button.
In this figure, I have used the Select Text tool to select all the text in the right-hand column. The selected text is now available for copying to the Clipboard or dragging to a document in another program window.
If you’re working with a lot of text in a PDF document, you can configure the Hand tool in Acrobat 6 to automatically function as the Select Text tool when you hover it over text in a PDF document. Choose Edit➪Preferences or press Ctrl+K (⌘+K on the Mac) to open the Preferences dialog box. Click General in the list box on the left to display the General Preferences options, and then select the Enable Text Selection for the Hand Tool check box. You can enter values (measured in picas) in the Text Selection Margin Size and Column Selection Margin Size text boxes to specify how much white space around text or columns to allow before the Hand tool transforms into the Select Text tool and vice versa.

Using drag-and-drop to copy text

Instead of copying and pasting to and from the Clipboard, you can just drag the selected text from the PDF file open in an Acrobat window to a new document open in another program window. Here is how this method works.
With the PDF document open in the Acrobat program window on the right, I dragged the Select Text tool through the lines with the title and the first paragraph of text to select them. Then I dragged this text selection to the new document window open in Microsoft Word on the left by positioning the arrowhead mouse pointer (with the outline of the text selection) at the very beginning of the blank document.

Tuesday, October 27, 2009

Extracting Blocks of Text from PDF

Before you can copy sections of text in a PDF document to the Clipboard or another open document, you need to select the text in the PDF document. To select text in a PDF document, you use two of the three different tools found on the Selection toolbar, which is attached to the Basic toolbar:
  • Select Text tool (V): Use this tool to select lines or columns of text by dragging through them.
  • Select Table tool (Shift+V): Use this tool to select a table or block of text with its formatting by drawing a bounding box around the table or text block.
You can also use the TouchUp Text tool (press T to select this tool) to select a block of text defined by its underlying document structure tags, such as whole headings or whole paragraphs. True to its name, this tool should be used only when you need to extract small amounts of text from a PDF document. Like text selected with the selection tools on the Basic toolbar, text selected with the TouchUp Text tool can be copied, deleted, edited, and placed in other program documents.
When you use the Select Text tool to select lines or columns of text in a PDF document, you can then copy the selected text to the Clipboard by choosing Edit➪Copy or by pressing Ctrl+C (⌘+C on the Mac). After you’ve copied the text to the Clipboard, you can switch to a document open in another program and then paste the copied text into the file by using that program’s Edit➪Paste command or by pressing Ctrl+V (⌘+V on the Mac).