Monthly Report for January 2007



Project No.: 260671/72

Funding Agency: Defense Logistics Agency - Defense Technical Information Center/NASA

Award No.: SP4700-05-P-0148

Project Title: Tools for Automatic Extraction of Metadata from DTIC Electronic Documents Collections - Phase II

Project Period: 09/19/05 - 3/31/06


Work Accomplishments during period






Completed the evaluation of post-hoc classification and concluded that it is a viable approach, though work will continue on alternatives.

Created demos of post-hoc classification for the DTIC and NASA collections and added them to the web site.


Began integration testing with the post-hoc classifier added to non-form extraction.


Conducted preliminary experiments in clustering based on visual recognition algorithms. These show promise as an alternative to, or a pre-filter for, post-hoc classification.



Metadata Extraction general:


Continued review of possible engine enhancements for the next-generation extractor.



Packaging and Deliverables:


Updated README and installation directions to describe application to alternate collections (e.g., NASA instead of DTIC).



Problem Areas and Corrective Actions




Deviations in Cost/Schedule




Papers and Reports


Detailed validation paper written for future conference submission, available at Deliverable Page.


Work to be Accomplished Next Period

1) Continue testing of the integrated system

2) Tune the post-hoc validation approach

3) Continue review of engine enhancements and related template language features