Monthly Report for May 2006
Project No.: 260671
Agency: Defense Logistics Agency
Award No.: SP4700-05-P-0148
Project Title: Tools for Automatic Extraction of Metadata from DTIC Electronic Documents Collections - Phase II
Project Period: 09/19/05 - 09/18/06
Work Accomplishments during period
No Action this month
NASA Collection Characteristics:
186 files from NASA - 43 forms, 143 non-forms
43 form files: 3 unresolved (the form was an image)
143 non-form files: classified manually into 5 classes with 75 docs
Investigated ‘blade’ PDF extraction software as a means of obtaining tabular information, as an alternative to obtaining it from OCR output. Will compare its performance on form pages that are problematic for OCR.
Problem Areas and Corrective Actions
Deviations in Cost/Schedule
Work to be Accomplished Next Period
Prepare presentation for NASA summarizing current characterization of their collection based on two samples of 200 documents each.
Prepare prototype software that will color visual display of point pages.
Prepare prototype software for classification based on authority files.
Investigate the use of oracle-based validation as an alternative to classification.
Work on feature enhancements to the template engine.