Monthly Report for December 2006



Project No.: 260671

Funding Agency: Defense Logistics Agency - Defense Technical Information Center

Award No.: SP4700-05-P-0148

Project Title: Tools for Automatic Extraction of Metadata from DTIC Electronic Documents Collections - Phase II

Project Period: 09/19/05 - 03/14/07


Work Accomplishments during period





Completed the testing work on DTIC documents and the demo is available at:


















We started the classification work for the NASA document set. As a first step, we are building statistics from the NASA documents, which can then be used by the validation approach for classification.
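The idea of collecting statistics and using them to validate a classification can be sketched as follows. This is only an illustrative sketch, not the project's implementation: the report does not say which statistics are collected, so the example assumes per-field word-count statistics, and every name in it (build_field_stats, validation_score, classify, the tolerance parameter) is hypothetical.

```python
from statistics import mean, stdev


def build_field_stats(samples):
    """Collect (mean, std dev) of the word count for each metadata field.

    samples: list of dicts mapping a field name to its text value.
    Assumption: word count is the statistic of interest.
    """
    lengths = {}
    for record in samples:
        for field, value in record.items():
            lengths.setdefault(field, []).append(len(value.split()))
    stats = {}
    for field, values in lengths.items():
        # stdev needs at least two data points; fall back to 0.0 otherwise.
        sd = stdev(values) if len(values) > 1 else 0.0
        stats[field] = (mean(values), sd)
    return stats


def validation_score(candidate, stats, tolerance=2.0):
    """Fraction of known fields whose word count is close to the mean.

    A field passes when it lies within `tolerance` standard deviations
    of the mean (the deviation is floored at 1.0 word to avoid a zero band).
    """
    checked = 0
    passed = 0
    for field, value in candidate.items():
        if field not in stats:
            continue
        checked += 1
        mu, sd = stats[field]
        if abs(len(value.split()) - mu) <= tolerance * max(sd, 1.0):
            passed += 1
    return passed / checked if checked else 0.0


def classify(candidates, stats):
    """Pick the template whose extracted metadata validates best.

    candidates: dict mapping a template name to the metadata it extracted.
    """
    return max(candidates, key=lambda name: validation_score(candidates[name], stats))
```

In this sketch, each candidate template is applied to a document, the resulting metadata is scored against the collected statistics, and the best-scoring template is taken as the document's class.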










Metadata Extraction Software Packaging


Packaged the latest software with the validation approach for classification. The workflow for the latest software is shown below.


















Figure 1. System Overview





Figure 2. Input Processing












Figure 3. Form Processing








Figure 4. Post Processing










Figure 5. Non-Form Processing









Figure 6. Post-Hoc Classification



Problem Areas and Corrective Actions




Deviations in Cost/Schedule




Work to be Accomplished Next Period

1) Continue work on the NASA document set.

2) Conduct more extensive experimentation with the validation approach for classification.

3) Work on validating the extracted metadata.