Tuesday, January 29, 2008

Improvement

Pass the name of your PDF document and the kw_catcher window size to make_index.sh like so:

make_index.sh mydoc.pdf 12

The script creates a document index named mydoc.index.pdf. Review this index and, if you are satisfied with it, append it to your PDF document. The script also creates two intermediate files: mydoc.data.txt and mydoc.txt. If the PDF index is faulty, check these intermediate files for problems. Delete them once you are satisfied with the PDF index.
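The cleanup step can be scripted. The following is a minimal sketch, not part of make_index.sh: the helper names doc_base and clean_intermediates are hypothetical, and it assumes the intermediate files are named by replacing the .pdf suffix exactly as in the example above.

```shell
# Hypothetical helper: strip the .pdf suffix to get the base name
# (mydoc.pdf -> mydoc). Assumes the input name ends in ".pdf".
doc_base() {
    printf '%s\n' "${1%.pdf}"
}

# Hypothetical helper: delete the two intermediate files for a document,
# leaving the document itself and its .index.pdf untouched.
clean_intermediates() {
    base=$(doc_base "$1")
    rm -f -- "$base.data.txt" "$base.txt"
}
```

After reviewing mydoc.index.pdf, calling clean_intermediates mydoc.pdf would remove mydoc.data.txt and mydoc.txt.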

The second argument to make_index.sh, the kw_catcher window size, controls the keyword detection sensitivity. Smaller values yield fewer keywords and may omit some genuine ones; larger values admit more keywords but also more noise.
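Because the right window size depends on the document, it can help to build the index at several sizes and compare the results. The sketch below is one way to do that. The helper name try_windows and the MAKE_INDEX variable are hypothetical, the sizes in the usage note are only illustrative, and it assumes each run of make_index.sh overwrites the same .index.pdf output file.

```shell
# Assumption: make_index.sh is on your PATH. Set MAKE_INDEX to point
# at it explicitly if it lives somewhere else.
MAKE_INDEX="${MAKE_INDEX:-make_index.sh}"

# Hypothetical helper: run the indexer once per window size and keep
# each result under its own name, e.g. mydoc.index.w12.pdf.
# Usage: try_windows mydoc.pdf 8 12 16
try_windows() {
    pdf=$1
    shift
    base="${pdf%.pdf}"
    for window in "$@"; do
        # Stop at the first failing run so a bad index is not copied.
        "$MAKE_INDEX" "$pdf" "$window" || return 1
        cp -- "$base.index.pdf" "$base.index.w$window.pdf" || return 1
    done
}
```

Running try_windows mydoc.pdf 8 12 16 would leave mydoc.index.w8.pdf, mydoc.index.w12.pdf and mydoc.index.w16.pdf side by side for review. The last run's mydoc.index.pdf and intermediate files remain in place as well.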
