This application performs OCR on PDF (Acrobat) documents and is operated through a command-line interface.
This command-line application uses Optical Character Recognition (OCR) to convert Acrobat documents into editable ASCII text files. Adobe Acrobat does not need to be installed. OCR is particularly useful when you cannot copy the text out of a PDF document. The converter also handles a range of image formats, including TIFF, BMP, PNG, JPG, PCX and TGA. Instead of converting the complete document, you can select individual pages or a range of pages.

OCR turns printed or written text into an electronic, character-based file. A document that has been scanned into a PDF consists of images rather than characters; after OCR, it becomes computer-readable ASCII text that can be edited. The application can therefore read the text from image-based PDF files, including password-protected documents. Because it runs from the command line, it can also be called from scripts.
The built-in OCR engine handles the character variations of English as well as German, French, Spanish, Italian and other languages. How accurately the text in an image-based document is reproduced depends largely on the scan quality and on the characters in the image. Many PDF files are produced by scanning paper documents, which makes them a hassle to edit for any reason. This tool is a convenient way to convert such image-based PDF documents. Although the process is automatic, expect a significant amount of manual editing wherever the OCR fails. Keep in mind, too, that the tool makes it easy to get around copyright restrictions, so be conscious of how you use it.