Monthly Report for March 2007
Project No.: 260671
Funding Agency: Defense Logistics Agency -
Award No.: SP4700-05-P-0148
Project Title: Tools for Automatic Extraction of Metadata from DTIC Electronic Documents Collections - Phase II
Project Period: 09/19/05 - 04/30/07
Work Accomplished during Period
· Completed stress testing of the integrated software. One problem we found: when a PDF document is restricted (password-protected), the pdftk process hangs waiting for a password. We added code to detect such documents and exclude them from the metadata extraction process.
· Identified the need for a time-out mechanism to detect hung processes, and for a standard logging package such as log4j to record exceptions.
· Evaluated the software on the DTIC documents that originally had no form.
· Revisited illegal-character handling so that numeric character entities are generated for any character that is not valid in 8-bit-encoded XML.
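A minimal sketch of how restricted PDFs might be screened out before pdftk is invoked. It assumes a simple byte-level heuristic: an encrypted PDF carries an /Encrypt entry in its trailer (or cross-reference stream) dictionary, so the raw file is scanned for that key. The names looks_encrypted and split_restricted are hypothetical, not the project's actual code, and a PDF library that parses the trailer properly would be more robust than this scan.

```python
# Hypothetical helpers; a byte-scan heuristic, not a full PDF parser.
ENCRYPT_KEY = b"/Encrypt"


def looks_encrypted(pdf_path, chunk_size=1 << 20):
    """Heuristic: True if the raw PDF bytes contain an /Encrypt key."""
    overlap = len(ENCRYPT_KEY) - 1  # bytes kept so a key split across chunks is still found
    tail = b""
    with open(pdf_path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if ENCRYPT_KEY in window:
                return True
            tail = window[-overlap:]


def split_restricted(paths):
    """Partition paths into (extractable, restricted) before running pdftk."""
    extractable, restricted = [], []
    for path in paths:
        (restricted if looks_encrypted(path) else extractable).append(path)
    return extractable, restricted
```

Only the extractable list would then be passed on to pdftk; the restricted list could be logged for manual handling.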
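One way the character handling above could work, sketched under the assumption that the 8-bit encoding is ISO-8859-1: characters outside that charset become decimal numeric character entities via the codec's xmlcharrefreplace error handler. Characters that XML 1.0 forbids outright (most C0 control characters, surrogates, U+FFFE/U+FFFF) are not legal even as character references, so this sketch simply drops them; that choice, and the function name to_8bit_xml_text, are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Characters XML 1.0 forbids even as character references: C0 controls other
# than tab, LF and CR, lone surrogates, and the noncharacters U+FFFE/U+FFFF.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def to_8bit_xml_text(text, encoding="iso-8859-1"):
    """Return XML character data encoded in an 8-bit charset (assumed ISO-8859-1),
    writing every character outside that charset as a numeric character entity."""
    cleaned = _XML_ILLEGAL.sub("", text)  # drop characters XML cannot carry at all
    escaped = escape(cleaned)             # &, <, > become &amp; &lt; &gt;
    return escaped.encode(encoding, errors="xmlcharrefreplace")
```

For example, a Greek alpha (U+03B1) in extracted text would be written as `&#945;`, while an accented Latin-1 letter passes through as a single byte.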
Problem Areas and Corrective Actions
Deviations in Cost/Schedule
Work to be Accomplished Next Period
1) Repeat the form and stress tests using NASA documents.
2) Complete adding the logging software.
3) Complete the time-out mechanism to avoid hung processes.
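The planned time-out and logging work could take roughly the following shape. This sketch uses Python's subprocess.run timeout and the standard logging module as stand-ins for the project's own process handling and log4j; run_with_timeout and the logger name are hypothetical. Redirecting standard input from the null device also means a tool that reads a password from standard input sees end-of-file instead of waiting, with the time-out as a backstop for any other kind of hang.

```python
import logging
import subprocess

log = logging.getLogger("metadata_extraction")  # hypothetical logger name


def run_with_timeout(cmd, timeout_s=60.0):
    """Run an external tool (e.g. pdftk); return its stdout, or None if it hangs or fails."""
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # never wait on interactive input
            capture_output=True,
            timeout=timeout_s,
            check=True,                # non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        # run() has already killed and reaped the child before re-raising.
        log.error("Timed out after %.1f s: %s", timeout_s, cmd)
        return None
    except (subprocess.CalledProcessError, OSError):
        # Non-zero exit, or the executable could not be started.
        log.exception("Command failed: %s", cmd)
        return None
    return result.stdout
```

Calling `logging.basicConfig(filename="extraction.log", level=logging.INFO)` once at start-up would send these records to a file; the time-out value would need tuning against the largest documents in the collection.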