Extracting Metadata And Structure
Project sponsored by DTIC, GPO, and NASA
DELIVERABLES FOR DTIC PHASE II 2006-2007
  1. Document Classification
    1. Exploration of Alternatives - Report on experiments - Jan. 19, 2006
      See Item 5. Papers and Reports
    2. Classifier design - Design report - May 5, 2006
      Document Classification (html, doc) May 5, 2006
    3. Classifier development - Software packaged - Sep. 19, 2006
      Demo available - Nov. 15, 2006
  2. Metadata Extraction - RDP
    1. Engine enhancement design - Engine design, Template specification - Nov. 19, 2005
      See 2B
    2. Engine enhancement development - Software, packaged - Apr. 19, 2006
      RDP Software and Readme file (Version 1) - April 19, 2006
      RDP Software and Readme file (Version 2.0) - August 30, 2006
      RDP Software and Readme file (Version 2.1) - March 14, 2007
      Demo of RDP Documents
  3. Metadata Extraction - General
    1. Engine enhancement design - Engine design, Template specification - Report - Jul. 19, 2006
    2. Engine enhancement development - Software, packaged - Jan. 19, 2007
    3. Template Development - Template set - Jan. 19, 2007
      Form Templates (DTIC), Non-form Templates (DTIC)
  4. Transition
    1. DTIC process study - Domain model (DTIC process) - Dec. 19, 2005
      Based on our discussion with DTIC about their document-handling process, we have developed a metadata extraction process that can be easily integrated into the DTIC process. For details, see the Readme file in deliverable 2B.
    2. Develop documentation standards - Oct. 19, 2005
      ODU Computer Science Dlib Team: Software Development Process (html, doc) - Oct. 19, 2005
    3. Document configuration and front-end processing - Configuration documentation, Clean XML specification - Mar. 19, 2007
      Metadata Extraction software (Version 3.0) and Readme file (Version 3.0) - May 16, 2007
      Metadata Extraction software (Version 3.1) - Jun. 19, 2007
    4. Training - Tutorial, Workshop - Mar. 19, 2007
    5. Final Report
      DTIC Final Report
  5. NASA Collection Characteristics
    1. Feasibility study to identify the NASA document types - Report - May 31, 2006
    2. Form identification and template development - Template set - Aug 31, 2006
    3. Enhance classification algorithm for two specific classes - Software packaged - Oct. 31, 2006
      Demo available - Nov. 15, 2006
      Non-form templates (NASA)
    4. Process study for inter-organizational collections - Configuration software - Dec. 1, 2006
    5. Enhance engine to recognize two major classes - Dec. 15, 2006
      Software packaged; see 3B
    6. Evaluation of extraction process - Final Report - Jun. 21, 2007
      NASA Final Report
  6. Papers and Reports
    1. Automated Building of OAI Compliant Repository from Legacy Collection (DOC) - ELPUB 2006, International Conference on Electronic Publishing, Bansko, Bulgaria, June 14-16, 2006
    2. Using Statistical Models for Dynamic Validation of a Metadata Extraction System (PDF)
    3. Automated Template-Based Metadata Extraction Architecture (DOC) - ICADL 2007, the 10th International Conference on Asian Digital Libraries
    4. A Machine Learning Approach for Automatic Text Categorization (DOC) - ICSIIT 2007, Bali, Indonesia, July 2007
    5. A Scriptable, Statistical Oracle for a Metadata Extraction System (PDF) - STEV07, Portland, Oregon, Oct. 2007
MONTHLY REPORTS

Old Dominion University Digital Library Group. extract@cs.odu.edu