Monthly Report for January 2007
Project No.: 260671/72
Funding Agency: Defense Logistics Agency - Defense Technical Information Center/NASA
Award No.: SP4700-05-P-0148
Project Title: Tools for Automatic Extraction of Metadata from DTIC Electronic Documents Collections - Phase II
Work Accomplishments during period
Completed evaluation of the post-hoc approach; concluded that it is viable, though work will continue on alternatives.
Created demos of post-hoc classification for the DTIC and NASA collections and added them to the web site.
Began integration testing with post-hoc classifier added to non-form extraction.
Conducted preliminary experiments in clustering based on visual recognition algorithms. These show promise as an alternative to, or a pre-filter for, post-hoc classification.
Metadata Extraction general:
Continued review of possible engine enhancements for next generation extractor.
Packaging and Deliverables:
Updated README and installation directions to describe application to alternate collections (e.g., NASA instead of DTIC).
Problem Areas and Corrective Actions
Deviations in Cost/Schedule
Papers and Reports
Wrote a detailed validation paper for future conference submission; available at Deliverable Page.
Work to be Accomplished Next Period
1) Continue testing of the integrated system
2) Tune the post-hoc validation approach
3) Continue review of engine enhancements and related template language features