Release 4.0 (DTIC collection)

March 15, 2010

Release 4.0 is the first release to offer native (text) PDF handling. Most documents will be processed without the need for OCR. This avoids a major source of error in typical document processing and also speeds up processing considerably.

Individual pages of a document will be sent out for OCR only in two cases: when the page appears to be a scanned image (i.e., it contains a single large image and almost no natively inserted text), or when irregularities are detected in the fonts used on the page that might prevent accurate extraction.
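
The per-page decision described above can be sketched roughly as follows. This is only an illustration, not the code used in Release 4.0: the `PageInfo` structure, the function names, and both thresholds are hypothetical, and the real system may measure "large image", "almost no text", and font irregularities quite differently.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds, chosen only for illustration.
MIN_IMAGE_COVERAGE = 0.8  # an image covering this fraction of the page counts as "large"
MAX_NATIVE_CHARS = 20     # at or below this count, the page has "almost no" native text


@dataclass
class PageInfo:
    """Assumed summary of one PDF page, produced by some earlier analysis step."""
    image_coverages: List[float]  # fraction of the page area covered by each image
    native_char_count: int        # characters of natively inserted (non-image) text
    font_irregular: bool          # True if font problems might prevent extraction


def looks_scanned(page: PageInfo) -> bool:
    """A single large image plus almost no native text suggests a scanned page."""
    return (
        len(page.image_coverages) == 1
        and page.image_coverages[0] >= MIN_IMAGE_COVERAGE
        and page.native_char_count <= MAX_NATIVE_CHARS
    )


def needs_ocr(page: PageInfo) -> bool:
    """Send a page to OCR only if it looks scanned or its fonts are suspect."""
    return looks_scanned(page) or page.font_irregular


def pages_for_ocr(pages: List[PageInfo]) -> List[int]:
    """Return zero-based indices of the pages that should go to OCR."""
    return [i for i, page in enumerate(pages) if needs_ocr(page)]
```

Under these assumptions, a page with ordinary native text and clean fonts is never sent to OCR, even if it also contains a large image, so only the selected pages pay the OCR cost.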

extract/v4.0releasedtic.txt · Last modified: 2010/03/15 16:12 by zeil