Monthly Report for March 2007

 

 

Project No.: 260671

Funding Agency: Defense Logistics Agency - Defense Technical Information Center

Award No.: SP4700-05-P-0148

Project Title: Tools for Automatic Extraction of Metadata from DTIC Electronic Documents Collections - Phase II

Project Period: 09/19/05 - 04/30/07

 

Work Accomplishments During Period

 

        Completed stress testing of the integrated software. One problem we found is that when a PDF document is restricted, the pdftk process hangs while waiting for a password. We added code to detect such documents and exclude them from the metadata extraction process.
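        The report does not say how restricted documents are detected. As an illustration only, the sketch below uses one possible heuristic: an encrypted PDF carries an /Encrypt entry in its trailer dictionary, so the raw bytes can be scanned for that key before the file is handed to pdftk. The function names looks_encrypted and partition_documents are hypothetical, Python is used only for brevity, and the check can give false positives if the same byte sequence happens to appear elsewhere in the file.

```python
from pathlib import Path


def looks_encrypted(pdf_path):
    """Heuristic check: True if the raw file bytes contain an /Encrypt key.

    Assumption: this is an illustrative test, not the project's actual
    detection code. It reads the whole file into memory.
    """
    return b"/Encrypt" in Path(pdf_path).read_bytes()


def partition_documents(paths):
    """Split paths into (processable, restricted) lists, preserving order."""
    processable, restricted = [], []
    for path in paths:
        if looks_encrypted(path):
            restricted.append(path)
        else:
            processable.append(path)
    return processable, restricted
```

        Only the first list would then be passed on to metadata extraction; the second can be reported for manual handling.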

        Identified the need for a time-out mechanism to detect hung processes, and for a standard logging package such as log4j to keep track of exceptions.
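        A time-out wrapper around an external process, combined with logging, could look roughly like the sketch below. It is not the project's implementation: it is written in Python, with the standard logging module standing in for log4j, and the name run_with_timeout and the logger name "extraction" are hypothetical. The child's standard input is closed so that a tool prompting for a password cannot block on it.

```python
import logging
import subprocess

log = logging.getLogger("extraction")  # hypothetical logger name


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit code, or None if it exceeded timeout_s.

    subprocess.run kills the child process when the time-out expires and
    then raises TimeoutExpired, so a hung process is not left behind.
    """
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # never wait for interactive input
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        log.error("process timed out after %s s: %s", timeout_s, cmd)
        return None
    if result.returncode != 0:
        log.warning("process exited with code %s: %s", result.returncode, cmd)
    return result.returncode
```

        A caller would treat a None result as a hung document and skip it, leaving the logged entry for later review.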

 

        Evaluated the software on the DTIC documents that originally had no form.

 

        Revisited illegal character handling to generate numeric character entities for any character that is not valid in 8-bit encoded XML.
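        The idea of replacing unrepresentable characters with numeric character entities can be sketched as follows. The report does not name the 8-bit encoding, so ISO-8859-1 is assumed here, and the function name to_8bit_xml is hypothetical. The sketch relies on Python's built-in xmlcharrefreplace error handler; it does not escape markup characters such as & and <, and it does not handle control characters that XML 1.0 disallows outright.

```python
def to_8bit_xml(text, encoding="iso-8859-1"):
    """Encode text for an 8-bit XML document.

    Characters the encoding cannot represent become decimal numeric
    character references of the form &#N;. The default encoding is an
    assumption, not something stated in the report.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


# Usage: U+00E9 fits in ISO-8859-1 and is kept as one byte, while
# U+2014 does not fit and is written as a numeric reference.
encoded = to_8bit_xml("caf\u00e9 \u2014 report")
```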

 

 

Problem Areas and Corrective Actions

 

None

 

Deviations in Cost/Schedule

 

None

 

Work to be Accomplished Next Period

1) Repeat the form/stress test using NASA documents.

2) Complete the integration of the logging software.

3) Complete the time-out approach to avoid hanging processes.