Extracting Metadata And Structure
Project sponsored by DTIC, GPO, and NASA
DELIVERABLES FOR DTIC PHASE IIIB
Early in this performance period, DTIC adopted a new OCR engine (Luratech/Abbyy) to replace the Omnipage engine previously employed, which raised unforeseen issues. After discussion with DTIC, we have pushed back some DTIC Phase IIIB deliverables by three months and deferred deliverable 2 (Text PDF) entirely, so that resources can be devoted to adapting the software to the new OCR engine.
The end date for the DTIC Phase IIIB project has been moved from March 31, 2009 to June 30, 2009. We are absorbing the cost of this extension and are not requesting any additional funding.
  1. Template Creation Tool
    1. Develop tool - Jan 1, 2009

      Developed and included in V3.4 releases. The current version has seen extensive internal use by the development team.

  2. Text PDF
    1. Implement an enhanced version of the text PDF support system, Software upgrade - Jan 1, 2009
      The Text PDF module is now targeted for the next phase.
  3. Template Set
    1. Template Language enhancement, Language spec (internal) - June 1, 2008
    2. Engine enhancement, Software upgrade

      Implemented all language features described in the Language spec (item 3.1).

      New tasks included in this release that were not part of the original schedule:
      • Added support for Luratech OCR software as an input source to the extraction program.
      • Developed network access and caching subsystems, for internal use, to manage team access to the Luratech package, which by license and design cannot be shared by multiple developers working on different machines on the local network.
    3. Template development, Template set (included in the software release, item 3.2 above) - March 31, 2009
      This task was originally focused on the development of templates for non-form documents. At DTIC's request, the focus was shifted to improvement of form templates and associated post-processing, in anticipation of an accelerated deployment of the form-based capability. The bulk of the development of templates for non-form documents is deferred to the next phase.
  4. Training
    1. Template writing training using Template Creation Tool at ODU, Training Seminar - March 31, 2009
      After negotiation with DTIC, focus was shifted to providing a remote operation and testing capability for DTIC staff via the use of ODU-hosted virtual machines.

      A Virtual PC was set up at extractxp3.seven.research.odu.edu, and training materials were provided on virtual machine access and operation of the software.

MONTHLY REPORTS

Old Dominion University Digital Library Group. extract@cs.odu.edu