Extracting Metadata And Structure
Project sponsored by DTIC, GPO, and NASA
DELIVERABLES FOR GPO PHASE IV
  1. GPO Document Characterization and Feasibility Study of EPA Documents - Nov 1, 2007

    Delayed because samples from the collection were not available.

  2. Document Characterization and Selection, validation statistics - Feb 1, 2009
  3. Test Metadata Extraction and Metadata Delivery
    1. Document classification, enhanced software - April 1, 2008
    2. Form development, form templates - Feb 1, 2008
      • No usable forms were found among 1,000 GPO documents, so there are no templates to deliver.
    3. Template refinement for enhanced engine, templates - Jul 1, 2008
    4. Engine enhancement, software releases - Dec 1, 2008
  4. Evaluation of process
    1. with EPA documents, available via ftp.cs.odu.edu as extract_eval_2008_09.zip - Sept 1, 2008
    2. with second set of documents, available via ftp.cs.odu.edu as congressDeliverable20090714_*.zip - July 1, 2009
  5. Final report on EPA documents and demonstration
    1. Demonstration of process on EPA at GPO - Sept 1, 2008
    2. Demonstration of process on 2nd collection at GPO - July 1, 2009
  6. Feasibility study for second set
    1. Category development and assessment for second set of documents, report - Nov 1, 2008
    2. New templates and language features - Feb 1, 2009
    3. Human intervention (metadata correction) interface - April 1, 2009
    4. New engine enhancement, software - Aug 1, 2009
  7. Cost estimates and final report: report on the cost of running GPO's other major document types through the automated process - Oct 1, 2009
MONTHLY REPORTS

Old Dominion University Digital Library Group. extract@cs.odu.edu