Extract Project Developers' Guide

Basic information for developers on the support tools & procedures for the Extract project. Updated versions of this manual can be found at http://www.cs.odu.edu/~extract/developersWiki/doku.php?id=extract:manuals

1. Overview

For developers working with this code, the overall plan is:

  1. Install Eclipse on your chosen machine. Then install the Ivy plugin for Eclipse (see Ivy & IvyDE). If you want the option to work from the command line as well (recommended), install Ant.
  2. Load the entire project from CVS (see Initial Build from CVS) or use the source distribution zip file (see Initial Build from DTIC Source Distribution).
  3. Create Eclipse working projects within the checked-out directories. These are where you will do most of your work.
  4. Run Ant to compile (see below) and test current code.
  5. Make your changes. Repeat 4 & 5 as necessary.
  6. Commit your changes into CVS only when they pass compile and test. (Do not “break the build” for everyone else!)

1.1 Subproject Structure

The Extract project at the moment is structured as a single main project with several subprojects:

  • generatedSource: code produced by running 3rd-party tools (currently the Sun JAXB processor)
  • extractor: this currently contains all code and data that is not collection-specific
    • Depends on: generatedSource
  • extractor-dtic: code and data specific to the DTIC collection
    • Depends on: generatedSource, extractor
  • extractor-nasa: code and data specific to the NASA collection
    • Depends on: generatedSource, extractor
  • extractor-gpo_epa: code and data specific to the EPA collection from the GPO
    • Depends on: generatedSource, extractor
  • extractor-gpo_congress: code and data specific to the Congressional documents collection from the GPO
    • Depends on: generatedSource, extractor, extractor-gpo_epa
  • installer: code for the installation program

The main project serves as a container for these subprojects and supports the construction of a .jar file that can be used to install the entire system.

The directory structure for the project reflects the project structure. Each subproject occupies a separate directory. The subproject directories are gathered together under the main project directory.

It is reasonable to expect that, as new clients are acquired, the collection of “extractor-collectionName” subprojects will also grow.
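
The dependency list above also determines the order in which subprojects have to be built. The following is a minimal sketch of that reasoning, not part of the Extract code base: the class SubprojectBuildOrder and its methods are hypothetical names, and the dependency table is simply copied from the list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: derives a build order from the subproject dependency list. */
final class SubprojectBuildOrder {

    /** Dependencies exactly as listed in this section (installer lists none). */
    static Map<String, List<String>> dependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<String, List<String>>();
        deps.put("generatedSource", Collections.<String>emptyList());
        deps.put("extractor", Arrays.asList("generatedSource"));
        deps.put("extractor-dtic", Arrays.asList("generatedSource", "extractor"));
        deps.put("extractor-nasa", Arrays.asList("generatedSource", "extractor"));
        deps.put("extractor-gpo_epa", Arrays.asList("generatedSource", "extractor"));
        deps.put("extractor-gpo_congress",
                 Arrays.asList("generatedSource", "extractor", "extractor-gpo_epa"));
        deps.put("installer", Collections.<String>emptyList());
        return deps;
    }

    /** Returns the subprojects to build, dependencies first, ending with the target. */
    static List<String> buildOrder(String target, Map<String, List<String>> deps) {
        Set<String> ordered = new LinkedHashSet<String>();
        visit(target, deps, ordered, new LinkedHashSet<String>());
        return new ArrayList<String>(ordered);
    }

    /** Depth-first walk: a subproject is added only after everything it depends on. */
    private static void visit(String name, Map<String, List<String>> deps,
                              Set<String> ordered, Set<String> inProgress) {
        if (ordered.contains(name)) {
            return;
        }
        if (!deps.containsKey(name)) {
            throw new IllegalArgumentException("Unknown subproject: " + name);
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Dependency cycle at " + name);
        }
        for (String dependency : deps.get(name)) {
            visit(dependency, deps, ordered, inProgress);
        }
        inProgress.remove(name);
        ordered.add(name);
    }

    public static void main(String[] args) {
        // Prints [generatedSource, extractor, extractor-gpo_epa, extractor-gpo_congress]
        System.out.println(buildOrder("extractor-gpo_congress", dependencies()));
    }
}
```

A new “extractor-collectionName” subproject would only need one more entry in the table for the order to be recomputed.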

1.2 Software Tools

A number of support tools and associated practices have been adopted as standards for this project. All developers working on the project will need to be familiar with these:

  • Java (Sun JDK) – The project uses version 1.5 or later.
  • CVS – A CVS repository is used to mediate simultaneous development by different individuals and teams. It also provides an archiving capability.
  • Ant – Apache Ant is the build manager.
  • JUnit – Support for automated unit testing. Developers are expected to provide a self-checking unit test suite for each new class they provide. When a bug is detected, a test that reproduces it should, if possible, be added to the appropriate unit test suite before the problem is debugged and corrected.
  • XMLUnit – JUnit extension supporting tests involving XML outputs.
  • Eclipse – GUI/IDE used by most developers. Eclipse provides useful interfaces to all of the above. Note, however, that the tools above are the “official” support for each function. For example, although Eclipse has built-in project build management, Eclipse builds and tests will generally fail until the Ant build has been completed successfully at least once.

2. Project Setup

2.1 Initial Build from CVS

The initial build is the starting point for each new project. It places all files for the Extract project on your machine. Even if you are only going to be working on a single subproject, you should begin by following this procedure to obtain a copy of the entire project. This initial build is a “General Project” from Eclipse’s point of view. Eclipse will not attempt to compile code from this project. It will, however, allow you to run Ant to compile code and build the complete project and it will allow you to synchronize any changes you make (and future changes that other developers make) via the CVS repository.

  1. In Eclipse, File→New→Project.
  2. Select “Projects from CVS”
  3. If necessary, create a new repository location: :pserver:yourLoginName@cvs.cs.odu.edu:/home/cvs/dlib
  4. Use an existing module. Log into the cvs.cs.odu.edu CVS server. Select the “Extract” module. Next.
  5. Check out as: a project configured using the New Project Wizard. Next.
  6. Unless you are working in a branch (a way of splitting off a set of changes so you can make a series of changes without affecting everyone else working on the main trunk), select HEAD and then Finish. This finishes giving Eclipse info about the CVS checkout. Now the New Project Wizard will run again to let you describe the project you are checking out.
  7. Select General Project
  8. You can name this project whatever you like. For the sake of example, I will assume you call it Extract-CVS. Do not use the default location. Put this somewhere outside your Eclipse Workspace, selecting any easily accessible folder location. Finish.
  9. Eclipse will now download the project from CVS.
  10. When it is done, you can browse through the project structure using the tree controls on the left of Eclipse. Among other things, you will note folders for each of the subprojects. You will also see a file named “build.xml”. Actually, there is one such file in the top-level project and another in each subproject. For now, we will only be concerned with the top-level one.
  11. In the Package Explorer, right-click on build.xml, Run As, Ant Build. Using Ivy, dependencies are resolved and a number of .jar files will be downloaded into the project’s lib directory.
  12. You now have a basic project build. From this point, you have various options:
    1. Running Ant Build once, or possibly twice, more will create a deployable version of the Extract software in a .jar file in the target folder. Be warned: the process of building the statistical databases can take a long time (one or more hours).
    2. If you are working on non-programming tasks, you might be able to begin editing/adding files in this project.
    3. If you are working on a programming task, you will next want to create a development subproject as described in the next section.


  • As you run Ant Build, a target directory is created and various folders and files placed there. In the main project and in all of the subprojects, target/ contains all the products of the build operations. The target directories, and anything in them, may be safely deleted as they can be rebuilt later.
  • As you make changes, note the ‘>’ markers by files and directories. These indicate that something has changed from the CVS version.
  • The CVS commands are accessible by right-clicking on a file or project and looking under “Team”.
  • Normally, only source code (stuff we produce via an editor) should be checked into CVS. It’s best to do a “clean” operation before doing a massive commit so that you do not accidentally check in .class or .jar files.
    • Do not check in .class files
    • Do not check in .exe files
    • Do not check in anything from a target/ or gen-src/ directory.

2.2 Initial Build from DTIC Source Distribution

The initial build is the starting point for each new project and the first step towards setting up the Extract project development environment in Eclipse. This top-level Extract project is a “General Project” with multiple Java subprojects under it. This section describes how to create the initial Eclipse general project from the source distribution file and build it to create the installable software. Eclipse does not attempt to compile code from this project. It will, however, allow you to run Ant to compile code and build the complete project. To start setting up the development environment, obtain a copy of the source distribution zip file delivered to DTIC and follow the steps below:

  1. Unzip the source distribution file into a directory on your computer.
  2. Start Eclipse and go to File→New→Project to start a new project.
  3. Select “Project” under “General” in the window and click Next.
  4. Enter a project name like “Extract” and browse to the unzipped source directory (from step 1).
  5. Click Finish. The project will be created.
  6. When it is done, you can browse through the project structure using the tree controls in the Package Explorer on the left side of the Eclipse window.
  7. You will note folders for each of the subprojects. Each of these subfolders contains the source code and related files in the “src” directory. You will also see a file named “build.xml”. Actually, there is one such file in the top-level project and another in each subproject. These files contain the Ant targets required to compile the Java classes, build the statistical databases, and deploy the project. For now, we will only be concerned with the top-level one, which deploys the whole Extract project.
  8. In the Package Explorer, right-click on build.xml, Run As, Ant Build. If the build fails with a warning, run the build file again.
  9. Running Ant Build once, or possibly twice, more will create a deployable version of the Extract software in a .jar file in the target sub-folder in the extractor-dtic subproject. Be warned: the process of building the statistical databases can take a long time (one or more hours). Now, you have built the installable jar from source code.
    1. If you are working on non-programming tasks, you might be able to begin editing/adding files in this project.
    2. If you are working on a programming task, you will next want to create Java development subprojects as described in Setting Up Working Projects. These projects will allow you to compile the Java classes, run the unit tests, and debug code.


  • As you run Ant Build, a target directory is created and various folders and files placed there. In the main project and in all of the subprojects, target/ contains all the products of the build operations. The target directories, and anything in them, may be safely deleted as they can be rebuilt later.
  • All required dependencies are packaged in the /lib/IvyCache folder of the source distribution file. Unless the dependencies are modified, the build does not need to access the online repositories to resolve them.
  • Changes made in this project cannot be synchronized with the CVS repository. If multiple developers are working on the project, it is recommended to use the source distribution to set up a CVS repository.

2.3 Setting Up Working Projects

If you are tasked with programming or testing duties, you will want to create an Eclipse Java project that supports the subproject in which you are working.

You must start with an Initial Build. The following instructions will create an Eclipse project that “overlaps” the files in the Initial Build. This new project will be a Java project from Eclipse’s point of view, and so will be able to exploit the significant support provided in Eclipse for Java programming.

The directory structure of the subprojects follows the conventions used in many Apache open-source projects. At the top-level you will find

  • a build.xml file – this contains the Ant instructions on how to compile, test, and build this subproject.
  • a src directory – this contains source code and data. Inside the src directory you will find
    • The main directory, which contains code and data for the actual deliverable project
    • A build directory, containing code and data that is used during the build process but that need not be retained within the delivered product. (Typically these are used to generate other data files that are part of the final project.)
    • The test directory, containing code and data used for unit testing of this subproject
    • Inside each of those three directories, you may find directories
      • java, containing Java source code. The official package name for this project is edu.odu.cs.extract, so the java directory always contains a directory edu, which contains a directory odu, and so on.
      • resources, containing data files that will be stored in the same .jar files used to deliver the compiled code, and so will be retrievable via Java’s resource loaders
      • data, containing data files that will be stored outside the .jar files, usually at locations designated by a “property”
  • A target directory – this contains files generated by the build process. A “clean” compile can be forced by deleting all contents of this directory before starting a build. The target directory may contain a variety of files and directories, often created as one-off inputs and outputs for tests, but the following directories are standard:
    • classes – contains compiled code and resources (from src/main). All files and directories here will be included in the deployable package.
    • test-classes – contains compiled code and resources (from src/test) used for unit testing. None of this will be included in the deployable package and other subprojects should not assume that any of this exists, as it will only be created if the unit tests have been run.
    • test_reports – contains reports from the unit tests. Look here for detailed information on test failures.
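
The layout conventions above can be summarized in a few lines of Java. This is only an illustrative sketch: the class SourceLayout and its methods are hypothetical names, and the resource file name used in main is made up for the example. Only the src/main and src/test output locations are mapped, because those are the ones stated in the list above.

```java
/** Hypothetical helper that spells out the subproject directory conventions. */
final class SourceLayout {

    /** Official package name for the project, as given in this section. */
    static final String BASE_PACKAGE = "edu.odu.cs.extract";

    /** Source directory (relative to the subproject root) for a package.
     *  The area is one of "main", "build", or "test". */
    static String sourceDir(String area, String packageName) {
        return "src/" + area + "/java/" + packageName.replace('.', '/');
    }

    /** Build output directory for compiled code and resources of an area. */
    static String outputDir(String area) {
        if ("main".equals(area)) {
            return "target/classes";
        }
        if ("test".equals(area)) {
            return "target/test-classes";
        }
        throw new IllegalArgumentException("No standard output directory documented for: " + area);
    }

    /** Absolute classpath name under which a file from a resources directory
     *  would be visible to Java's resource loaders. */
    static String resourceName(String packageName, String fileName) {
        return "/" + packageName.replace('.', '/') + "/" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(sourceDir("main", BASE_PACKAGE)); // src/main/java/edu/odu/cs/extract
        System.out.println(outputDir("test"));               // target/test-classes
        // "example.properties" is a made-up file name, used only for illustration.
        System.out.println(resourceName(BASE_PACKAGE, "example.properties"));
    }
}
```

A name produced by resourceName could then be passed to Class.getResourceAsStream, which is the standard way to read a file that was packaged into a .jar.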

To set up a development subproject in Eclipse, start from an Initial Build, possibly with other subprojects already set up. In particular, note the list of subprojects that indicates that some subprojects depend on others. If you want to work on a subproject A that depends on subproject B, you must first create the development subproject for B and build (compile) that subproject.

  1. File→New→Project
  2. select “Java Project”
  3. Create Project from Existing Source. Browse to the directory containing the build.xml for your desired subproject. This will be inside the larger project tree that you loaded from CVS as part of the Initial Build. Next.
  4. A dialog will open that allows you to specify the source and output directories for your new subproject. On the Source tab you should have an entry for src/main/java and src/test/java.
    1. If not, navigate through src to those directories and add them (Use as source folder).
    2. Check “Allow output folders…”. Change the default output folder, if necessary, from …/bin to …/target/classes
    3. Expand src/main/java. It should list the default for its output folder.
    4. Select src/test/java. Configure Output Folder Properties. Specific output folder: target/test-classes
    5. Select src/build/java (if it exists). Configure Output Folder Properties. Specific output folder: target/test-classes
    6. For the generatedSource subproject only: Select gen-src. Click on “Add folder … to build path”. If necessary, Configure Output Folder Properties. Specific output folder: target/classes
    7. If this project depends on others (e.g., extractor-dtic and extractor-nasa depend on extractor), then go to the Projects tab and select those projects this one depends on.
    8. Go to the Libraries tab. Add Libraries. Select “IvyDE Managed Dependencies” (make sure IvyDE is installed) and click “Next>”. On the “Main” tab, check “Enable project specific settings” and browse for the “Ivy Settings path:”. Select the file “ivy-settings” in the main Extract folder. Then, on the “Retrieve” tab, again check “Enable project specific settings” and also check “Do retrieve after resolve”. Specify “/target/JarFiles/[artifact]-[revision](-[classifier]).[ext]” as the “Retrieve pattern”. Click Finish.
    9. Select the ivy.xml[*] library and “Finish”.
    10. Finish
  5. Eclipse will try to compile everything and may report lots of problems. That’s OK.
  6. In the Package Explorer, right-click on the ivy.xml[*] library and select Resolve. Watch the pop-up output for reports that dependencies are being resolved.
  7. In the Package Explorer, right-click on the project and select “Refresh”. Any Eclipse markers (red X’s) due to compilation errors should now disappear.
  8. In the Package Explorer, right-click on build.xml, Run As, Ant Build.
  9. You now have the subproject build.
    1. Running Ant Build once, or possibly twice, more will create a deployable version of the Extract subproject software in a .jar file in the target folder.
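
The retrieve pattern entered on the Libraries tab determines the file name each dependency gets under target/JarFiles. The sketch below illustrates how such a pattern expands: bracketed tokens are replaced by their values, and a parenthesized part is dropped when the token inside it has no value. It is a simplified illustration written for this guide, not Ivy's own implementation; the class name RetrievePatternSketch is hypothetical and the junit 4.8.2 values are example data only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified illustration of how a retrieve pattern expands (not Ivy's own code). */
final class RetrievePatternSketch {

    private static final Pattern OPTIONAL = Pattern.compile("\\(([^()]*)\\)");
    private static final Pattern TOKEN = Pattern.compile("\\[([^\\[\\]]+)\\]");

    /** Expands [token] placeholders; a (...) part is kept only if its tokens have values. */
    static String expand(String pattern, Map<String, String> values) {
        // Step 1: keep or drop each optional part, removing its parentheses.
        Matcher optional = OPTIONAL.matcher(pattern);
        StringBuffer withoutOptionals = new StringBuffer();
        while (optional.find()) {
            String inner = optional.group(1);
            String kept = allTokensHaveValues(inner, values) ? inner : "";
            optional.appendReplacement(withoutOptionals, Matcher.quoteReplacement(kept));
        }
        optional.appendTail(withoutOptionals);

        // Step 2: substitute the remaining tokens; unknown tokens are left untouched.
        Matcher token = TOKEN.matcher(withoutOptionals.toString());
        StringBuffer result = new StringBuffer();
        while (token.find()) {
            String value = values.get(token.group(1));
            String text = (value == null) ? token.group(0) : value;
            token.appendReplacement(result, Matcher.quoteReplacement(text));
        }
        token.appendTail(result);
        return result.toString();
    }

    private static boolean allTokensHaveValues(String text, Map<String, String> values) {
        Matcher token = TOKEN.matcher(text);
        while (token.find()) {
            String value = values.get(token.group(1));
            if (value == null || value.length() == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String pattern = "/target/JarFiles/[artifact]-[revision](-[classifier]).[ext]";
        Map<String, String> values = new HashMap<String, String>();
        values.put("artifact", "junit");   // example values only
        values.put("revision", "4.8.2");
        values.put("ext", "jar");
        System.out.println(expand(pattern, values)); // /target/JarFiles/junit-4.8.2.jar
        values.put("classifier", "sources");
        System.out.println(expand(pattern, values)); // /target/JarFiles/junit-4.8.2-sources.jar
    }
}
```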

2.4 Applying Patch for DTIC Source Distribution

This section describes how to update the source distribution by applying a patch with the Patcher application. Follow the steps below:

Step 1: Obtain a copy of the following files and save them on your file system.

  • The full source distribution zip file provided during the previous delivery.
  • The update patch file (.jar) provided to you.

Step 2: Download the Patcher application JAR file and save it on your file system. Double click the JAR to start the application.

Step 3: Specify the location of the patch file. Click on the red circled button in the dialog box and select the patch file after browsing to the required location.


Step 4: Next, specify the location of the old source distribution file. Click on the circled button in the dialog box and select the source file after browsing to the required location.


Step 5: Finally, specify the folder where the updated source distribution JAR is to be placed on your file system. Click on the circled button in the dialog box and select the folder.


Step 6: Click the “Apply” button. The application updates the old source distribution file with the patch.


Step 7: Patching is complete.


The new source distribution JAR can be found in the folder specified during step 5.

3. Programming Support

3.1 Eclipse

Eclipse is the preferred development environment on this project. It supports not only Java programming, but also our use of Ant, Ivy, and CVS.

Eclipse comes in several variations. Most project members use the "Eclipse Classic" variant.

Launching Eclipse

Eclipse is a Java program. You can adjust the settings under which it runs by altering the command line when it is invoked or by adding them to the eclipse.ini file. See the Eclipse FAQ for details.

A common modification is to increase the heap size allowed by the Java VM for Eclipse. Add to the end of the command line:

-vmargs -Xmx256M

adjusting the number of megabytes up from 256 if necessary. If you already have -vmargs in the command line, do not add it a second time. (If you are launching Eclipse from a Windows shortcut, add these parameters in the “Target” field of the shortcut properties.)

This affects the program Eclipse itself, not programs of yours that you launch from inside Eclipse. See below for info on setting the heap size of programs launched from Eclipse.


Some notes on recommended settings follow. Most of these can be set either as defaults (Window→Preferences) or as project settings (Project→Properties):

  • Java→Installed JREs: Make sure that this points to the compiler for your system. This should be a full compiler (JDK), not just a JRE. Although Eclipse has its own internal Java compiler, you need a separate one when using Ant.
  • Java→Compiler: Set compliance level to 1.6.
  • Java→Debug: Be aware of the existence of the setting “Suspend execution on uncaught exceptions”. You will want to set/clear this depending on what you are actually doing. When you are trying to track down an odd exception, it's a great time-saver. But having it active all the time can lead to a lot of false alarms in code that is supposed to throw an exception and to catch it in other functions.

Launching Applications

To run an application, right-click on the class file containing the main() function and select “Run As”. Normally, you will run either as “Java Application” or as “JUnit Test”. By selecting one of the “…” options, you can adjust the runtime configuration:

  • The Arguments tab is particularly important.
    • In the Program Arguments box, place any command line parameters (one per line) you want fed to main(). Examples would be “-debugMode” or “-collection=dtic”.
    • In the VM Arguments box, add parameters that should be supplied to the Java engine (VM). Examples would include “-Djava.library.path=..\lib” (for access to DLLs in the lib directory) or “-Xmx256M” to increase the max heap size.
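
To check that a launch configuration took effect, a small program can echo what it received. The sketch below is an assumption-laden illustration: the class LaunchSettings and its parsing methods are hypothetical and do not describe how the Extract code itself reads its arguments. It only shows where Program Arguments and VM Arguments end up at run time.

```java
/** Hypothetical diagnostic class: shows what a launched program actually receives. */
final class LaunchSettings {

    /** Returns the value of a "-name=value" program argument, or null if absent. */
    static String option(String[] args, String name) {
        String prefix = "-" + name + "=";
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return arg.substring(prefix.length());
            }
        }
        return null;
    }

    /** Returns true if a bare "-name" flag is among the program arguments. */
    static boolean flag(String[] args, String name) {
        for (String arg : args) {
            if (arg.equals("-" + name)) {
                return true;
            }
        }
        return false;
    }

    /** Maximum heap the VM will try to use, in megabytes (controlled by -Xmx). */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Program Arguments arrive in args; VM Arguments change the VM itself.
        System.out.println("collection        = " + option(args, "collection"));
        System.out.println("debugMode         = " + flag(args, "debugMode"));
        System.out.println("max heap (MB)     = " + maxHeapMegabytes());
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    }
}
```

With “-Xmx256M” in the VM Arguments box, the reported maximum heap should be roughly 256 MB; the exact figure depends on the VM.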

3.2 Ant

Although Eclipse will compile your code automatically each time you save a Java file, keep in mind that the instructions in build.xml, executed by Ant, are the official way to compile, test, and build the project. The reason for this is that Eclipse’s built-in build mechanisms are fine for source code, but the Extract project requires steps including some automated generation of source code from XML schemas, copying of data files into appropriate locations, generation of databases and data files, etc., that can’t be done via the basic Eclipse built-in mechanisms.

  1. Right-click on the subproject’s build.xml file, Run As, Ant Build or Ant Build… In the first case, the default is to compile, unit test, and if tests succeed, package everything into a .jar file. In the second case, you can select your desired target. The most common ones are:
    • ant compile – compile all code associated with a subproject
    • ant test – compile and run unit tests
    • ant package – compile, run unit tests, and create the .jar and .zip files for this subproject
    • ant deploy – (main project only) compile and package all subprojects (bypasses unit tests), then package up the entire system as an executable .jar file that launches the installation program
    • ant clean – Remove all files produced by any of the above targets, leaving the original source code intact
  2. If tests fail, look in the target/test_reports directory for the detailed reports.
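
When many suites run, it can help to list only the reports that record a problem. The sketch below assumes the build writes JUnit results with Ant's XML formatter, which produces TEST-*.xml files whose root testsuite element carries failures and errors attributes. This guide does not state which formatter the Extract build uses, so treat that as an assumption; the class TestReportScan is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper: lists test report files that record failures or errors. */
final class TestReportScan {

    /** Parses an XML report, wrapping checked exceptions for brevity. */
    static Document parse(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse test report", e);
        }
    }

    /** Sum of the failures and errors attributes on the root element. */
    static int problemCount(Document report) {
        Element suite = report.getDocumentElement();
        return intAttribute(suite, "failures") + intAttribute(suite, "errors");
    }

    private static int intAttribute(Element element, String name) {
        String value = element.getAttribute(name); // empty string if the attribute is absent
        return value.length() == 0 ? 0 : Integer.parseInt(value);
    }

    /** Names of TEST-*.xml files in reportDir that report at least one problem. */
    static List<String> failingReports(File reportDir) {
        List<String> failing = new ArrayList<String>();
        File[] files = reportDir.listFiles();
        if (files == null) {
            return failing; // directory missing: the tests have not been run yet
        }
        Arrays.sort(files);
        for (File file : files) {
            String name = file.getName();
            if (name.startsWith("TEST-") && name.endsWith(".xml")) {
                Document report = parse(new InputSource(file.toURI().toString()));
                if (problemCount(report) > 0) {
                    failing.add(name);
                }
            }
        }
        return failing;
    }

    public static void main(String[] args) {
        // Run from a subproject directory after "ant test".
        System.out.println(failingReports(new File("target/test_reports")));
    }
}
```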

3.3 Ivy & IvyDE

Ivy is an open source dependency management tool closely integrated with Ant. It has the following features:

  • All published external dependencies of a project can be listed in an XML file commonly named ivy.xml. Specific information about each dependency, such as organization and revision, can be given there.
  • The location of these dependencies is then provided in a settings file. The location can be a single repository or a combination of local and online repositories. Ivy uses the Maven 2 repository by default.
  • Ivy resolves each dependency from one of the repositories at build time. It copies the required artifact and meta-data file into a local cache. Future builds only need to access this cache.
  • Ivy goes further to read the dependencies of the downloaded artifact from the meta-data file and transitively resolve them too.
  • Ivy also generates a report of the resolution process.

Please see the Ivy documentation from Apache for full information.

Installing Ivy & IvyDE

We use Ivy with Ant or Eclipse in this project. Ant can use tasks provided to download and install Ivy at build time. However, while using Ivy with Eclipse, the IvyDE plug-in for Eclipse must be installed.

There are two ways to install the IvyDE plug-in:

  1. Using IvyDE update site.
    1. Open the Update Manager. If you use Eclipse version “Galileo” then go to “Help > Install new Software”. Else it may be in “Help > Software Updates > Find and Install”.
    2. In the install dialog box click “Add”. Enter the name as IvyDE and the location as “http://www.apache.org/dist/ant/ivyde/updatesite” and click OK.
    3. A new entry “Apache Ivy update site” will appear in the list of update sites. Check the entry and complete the installation with defaults. It will require you to restart Eclipse.
    4. The process may vary slightly for different versions of Eclipse. See the IvyDE documentation for the detailed process. If any errors occur in the above process, try the manual install.
  2. IvyDE Manual installation
    1. Download the latest plugins (JAR files) for Ivy & IvyDE from http://www.trieuvan.com/apache/ant/ivyde/updatesite/plugins/ and copy them into the plugins directory in $ECLIPSE_HOME.
    2. Download the latest feature files for Ivy & IvyDE from http://www.trieuvan.com/apache/ant/ivyde/updatesite/features/, unzip any zip files, and copy them into the features directory in $ECLIPSE_HOME.
    3. Restart Eclipse with the clean option. Note: to restart Eclipse with the clean option, add “-clean” as the first line in the file $ECLIPSE_HOME/eclipse.ini.
    4. Check the installation in “Window > Preferences > Ivy”. For detailed instructions, see Apache's Ant/Ivy installation instructions.

IvyDE documentation can be found at http://ant.apache.org/ivy/ivyde/index.html.

4. Change Management

As you make changes, note the ‘>’ markers by files and directories. These indicate that something has changed from the CVS version. No one else will see your changes until you check them into CVS. CVS keeps a history of all changes, so, as a general rule, any changes made in CVS are reversible. If you lose a file entirely, you can always retrieve the last-checked-in version from the CVS repository.

4.1 General Guidelines for Working with CVS

  • Don’t let weeks go by without checking things in. That’s an invitation to accidents.
  • Don’t “break the build”. Never check in code that does not compile. Generally, you should not check in code that fails unit tests. Doing these things will interfere with the work of others.
    • If you find that this guideline conflicts with the previous one, you are probably trying to tackle too much at once. Restructure your work into pieces that can be tested and checked in separately.
    • Sometimes you may find yourself working on a set of changes that have far-reaching implications and would “break” a lot of other code if checked in a bit at a time. In those circumstances, consult with one of the more experienced team members about the possibility of setting up a project “branch” where you can check in changes without affecting the main stream of code until such time as you are ready to “merge” all the changes in your branch onto the main “trunk”.
  • Keep up to date with the changes being checked in by other project members. You should “update” from the CVS repository on a regular basis. If two of you are changing the same file, CVS will detect this as a “conflict” and Eclipse has nice facilities for resolving conflicts, but this is easiest to do if the conflict is caught early.
  • Nothing from the target or gen-src directories should be checked into CVS.
  • Normally, only source (stuff we produce via an editor) should be checked into CVS. Binaries (.class files, .jar files, or executables) do not fare well in CVS.
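
The last two guidelines amount to a simple rule that can be checked mechanically before a commit. The following is a minimal sketch of such a check, not an existing project tool: the class CommitFilter is a hypothetical name, the file paths in main are made-up examples, and “executables” is approximated here by the .exe extension.

```java
import java.util.Locale;

/** Hypothetical pre-commit check encoding the "source only" guidelines. */
final class CommitFilter {

    /** Returns false for build products and binaries that should stay out of CVS. */
    static boolean shouldCommit(String relativePath) {
        String path = relativePath.replace('\\', '/');

        // Nothing from a target or gen-src directory, at any depth.
        for (String segment : path.split("/")) {
            if (segment.equals("target") || segment.equals("gen-src")) {
                return false;
            }
        }

        // No binaries: compiled classes, jar files, or executables.
        String lower = path.toLowerCase(Locale.ENGLISH);
        return !(lower.endsWith(".class") || lower.endsWith(".jar") || lower.endsWith(".exe"));
    }

    public static void main(String[] args) {
        // Example paths are invented for illustration.
        System.out.println(shouldCommit("extractor/src/main/java/edu/odu/cs/extract/Example.java")); // true
        System.out.println(shouldCommit("extractor/target/classes/Example.class"));                  // false
        System.out.println(shouldCommit("generatedSource/gen-src/Example.java"));                    // false
    }
}
```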

4.2 Eclipse and CVS

The CVS commands are accessible by right-clicking on a file or folder and looking under “Team”. The most important commands are:

  • update: checks the CVS repository for changes checked in by other people and fetches the newer version if it exists. Although Eclipse allows you to run this command on files or folders, I strongly recommend you only use it on single files.
  • commit: checks whether your local copy of this file or folder differs from the one last checked into CVS. If so, it checks your copy in so that it becomes the new version for everyone else (once they do “update”). You will be prompted to supply a description of the changes. Although Eclipse allows you to run this command on files or folders, I strongly recommend you only use it on single files.
  • synchronize: This is the preferred way to do commits and updates across an entire folder or an entire (sub)project. The synchronize command scans the selected folder (including subfolders) noting all differences with the CVS repository. It then takes you to an interface that allows you to step from one changed file to another and even to single step from one change to the next within the changed files. You can then decide to “update” or “commit” changes as appropriate in each instance. (Right-click on files to get these options.)
    • Blue arrows indicate incoming changes that you can choose to update into your local copy of the file. Blue arrows with plus or minus signs indicate new files that you do not even have a copy of, or files that have been marked as deleted (i.e., they are still in the repository and can be retrieved if you want, but are no longer considered necessary as part of the project).
    • Black arrows indicate changes that you have made that you can commit to the repository. Again, plus or minus signs in the arrows indicate new files that you have created but that do not yet exist in the CVS repository, or files that you deleted from your copy of the project, presumably because you think they are no longer needed. (Be careful – don’t delete files just because they are unrelated to your personal task and then commit those deletions into the repository.)
    • Red arrows indicate conflicts. You have changed the file, but someone else has checked in changes that you do not have. You will need to resolve these carefully (and may need to talk to the other person involved.)
    • Double-clicking on any of these files will bring up a display that shows you both your copy and the CVS version, allowing you to step from one change to the next. You can even choose to copy individual changes from the CVS version into your local copy.
    • One reason for recommending the use of “synchronize” rather than directly doing “commit” or “update” on entire directories is just to encourage people to look, file by file, at what they are about to do with the repository. Another reason will become clear very quickly – Eclipse creates a number of files with names starting with “.”, such as .project and .classpath, which should generally not be committed or updated because they contain machine-specific locations used only by Eclipse.
  • Closely related to the synchronize command are the “Compare” commands. You can compare your local copy of the file to the last one checked in (useful if you aren’t sure if you remember what changes you have made, or if you want to throw out some of those changes). If you are working in a CVS branch, you can also compare your files to the main trunk or to other branches.
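
The arrow colors in the Synchronize view follow from two questions: has the file changed locally, and has it changed in the repository? The sketch below restates that logic in code. It is purely illustrative: the class SyncStateSketch and its names are hypothetical and are not Eclipse's API.

```java
/** Hypothetical restatement of the Synchronize view arrows (not Eclipse's API). */
final class SyncStateSketch {

    enum State { IN_SYNC, INCOMING, OUTGOING, CONFLICT }

    /** Classifies a file from the two facts the Synchronize view compares. */
    static State classify(boolean changedLocally, boolean changedInRepository) {
        if (changedLocally && changedInRepository) {
            return State.CONFLICT;   // red arrow
        }
        if (changedInRepository) {
            return State.INCOMING;   // blue arrow
        }
        if (changedLocally) {
            return State.OUTGOING;   // black arrow
        }
        return State.IN_SYNC;        // not listed in the view
    }

    /** The action the guidelines in this section suggest for each state. */
    static String suggestedAction(State state) {
        switch (state) {
            case INCOMING: return "review, then update";
            case OUTGOING: return "review, then commit";
            case CONFLICT: return "resolve carefully, possibly with the other developer";
            default:       return "nothing to do";
        }
    }

    public static void main(String[] args) {
        State state = classify(true, true);
        System.out.println(state + ": " + suggestedAction(state));
    }
}
```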

4.3 Working With Branches

CVS allows team members to create “branches” in which they can try out changes that would affect large portions of the system without affecting other team members.

Checking Out Code From Branches

When following the procedure for checking out a project from CVS via Eclipse, you are eventually presented with the option of which branch to check out. Normally, you would select HEAD, the main trunk of the development “tree”.

But if you want to check out a different branch, look for it under the “branches” item. Choose it instead, and then proceed with the checkout.

Very often, you will find that the branch you want is not actually listed. Sometimes the “Refresh Tags” button will add it to the branches list. If not, you should also see a “Configure Tags” button. Press that. Then select one of the files checked in under the missing branch. Eclipse will look at that file and note what branches & tags it has available. You can then use “Add Checked Tags” to add those to your list. You may need to try a few different files to find one that was checked in under the missing branch.

Creating a New Branch

Before doing this, you should probably make sure that you have checked in all the latest changes in your current working project.

In Eclipse, right-click on the project root directory and select Team→Branch…

Enter a name for your branch. Leave “Start working in the branch” checked. Click OK.

All the files under that root directory will be marked as belonging to a new branch. Note, however, that until you change some of those files and check in the changes, all those files are really just defaulting back to the main trunk (HEAD) or whatever other version you were in when you created the branch.

From this point on, things you check in to your branch will not affect the main trunk or other branches. Likewise, any new changes people check in on the main trunk will not be offered for update on your branch. (If there's an update that you decide you need, you can right-click on a file and select “Compare With”→“Another Branch or Version”, merge the desired changes into your file, and save it.)

Merging a Branch Back Into the Main Trunk

  1. Make sure that your copy of the branch is up-to-date. Synchronize the branch with the repository and be sure you have checked out all changes on the branch and that all changes you have made on the branch have been checked in.
  2. If you want to allow for continued use of the branch after the merge, add a tag on the branch. Right-click on a project with an up-to-date copy of the branch, and select “Team→Tag as Version…”
  3. To begin the actual merge, you will start with an up-to-date copy of the main trunk (HEAD). You may already have one, or you can right-click on the branch project, select “Team→Replace with→Branch or Version” and select HEAD. This will actually check out a copy of the HEAD, replacing code in the branch. That should not be a problem, since you made sure that you had checked in all changes in the 1st step. But it is a bit disconcerting, so I prefer to simply work from a separate project that is already checked out from the HEAD branch.
  4. You might consider placing a “tag” on the main branch (HEAD) to mark the spot on the branch just prior to the merge. That way, if you make a mistake when resolving conflicts (see below), it will be easy to get back to the current version. To place a tag, right-click on a project with an up-to-date copy of the HEAD, and select “Team→Tag as Version…”
  5. Right-click on the HEAD project and select “Team→Merge…”. You will be asked what branch you want to merge with. Select the appropriate branch to be merged into HEAD. You will be asked to supply the common base version from which the branch was split - usually Eclipse will guess this correctly. You can also decide whether to have CVS automatically make any non-conflicting changes or to let you preview them. (If there are any conflicts - files that have been updated in both the branch and the HEAD since the two were split - you will still have the opportunity to review these.)
  6. CVS will compare the checked-in copy of the branch against the files in your HEAD project. You should see the familiar Synchronization view, in which you can preview changes and resolve conflicts. The goal of this process is to leave you with a local copy of all the desired changes. This is not a commit of the merged changes to the HEAD branch.
  7. Having resolved all changes, you now have a copy of the project, associated with the HEAD branch in CVS, with a merged copy of code from the former HEAD and from the other branch. You can test it out, and then decide whether to commit the changes into the HEAD. Only after you do another (ordinary) synchronize and commit will the merged code appear in the repository for others to check out.

5. Testing & Debugging

5.1 Unit Tests

Good unit tests are essential to the development process. We employ self-checking unit tests – people should not have to actually view output and determine for themselves if it is correct. At the end of this section are references describing a systematic process for writing self-checking tests for most abstract data types, for working with the JUnit testing package, and for applying JUnit to the systematic process.
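As a minimal sketch of what “self-checking” means, the example below compares actual results against expected values and fails on a mismatch, so nobody has to read the output. It is written in plain Java so that it runs without the JUnit jar; in the project itself the same checks would normally be JUnit test methods using JUnit's assertEquals. The class TitleNormalizer and the helper checkEquals are hypothetical names invented for this illustration and are not part of the Extract code base.

```java
/** Self-checking tests: each one decides pass or fail without human inspection. */
final class TitleNormalizerTest {

    /** Plain-Java stand-in for JUnit's assertEquals(expected, actual). */
    static void checkEquals(String expected, String actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError(
                    "expected <" + expected + "> but was <" + actual + ">");
        }
    }

    static void testCollapsesInternalWhitespace() {
        checkEquals("Annual Report 2010",
                TitleNormalizer.normalize("Annual   Report \t 2010"));
    }

    static void testTrimsLeadingAndTrailingWhitespace() {
        checkEquals("Annual Report",
                TitleNormalizer.normalize("  Annual Report\n"));
    }

    /** Runs every test and returns the number of failures. */
    static int runAll() {
        Runnable[] tests = {
            TitleNormalizerTest::testCollapsesInternalWhitespace,
            TitleNormalizerTest::testTrimsLeadingAndTrailingWhitespace
        };
        int failures = 0;
        for (Runnable test : tests) {
            try {
                test.run();
            } catch (AssertionError e) {
                failures++;
                System.err.println("FAILED: " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // A non-zero exit status lets a build script detect the failure.
        System.exit(runAll() == 0 ? 0 : 1);
    }
}

/** Hypothetical class under test; it is not part of the Extract code base. */
final class TitleNormalizer {
    /** Trims both ends and collapses each run of whitespace to one space. */
    static String normalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }
}
```

A test that merely printed the normalized titles would require someone to read them and judge correctness each time; here the expected values are part of the test, so a failure is reported automatically.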

Required Practices on the Extract Project

  • With the exception of GUI classes and a few other exceptional classes that, by their nature, do not lend themselves to self-checking, every new class should be accompanied by a corresponding unit test class.
  • Unit tests should be checked in together with the actual target class.

Recommended Practices on the Extract Project

  • When developing a new class as part of the project, the unit tests should be written together with, preferably before, the new target class itself. (This is one of the key practices rediscovered by the proponents of extreme programming, http://www.extremeprogramming.org/rules/testfirst.html).
  • When fixing a bug in a class, first add a unit test that exhibits the bug (by failing the test). Then start on the fix. That way you will
    1. know when you have succeeded in fixing the problem.
    2. have an easily accessible starting point for any debugging activities you need to perform while developing the fix.

Running Tests From Eclipse

It’s often convenient to run single tests or test suites directly via Eclipse rather than via Ant. This is particularly useful for debugging.

Make sure, though, that you have run unit tests via Ant at least once. Often, the ant script sets up files required by the tests, and the procedures described below, though convenient, will not know to do this. If you change data files used in the tests, you will need to run the Ant build again to copy the changes into the appropriate locations.

  1. Open src/test/java. Select a test class. Right-click and look at Run As… (or Debug As…). You should have an option “JUnit Test” in each case. Select that to run (or debug) that specific unit test suite.
  2. To run a whole batch of tests, go to the Run drop-down menu in the toolbar, select “Run…” Choose Run all tests in the selected … folder. Search for the folder if necessary. Give this run configuration a Name, and run it.

Unit Test References

Zeil, Testing ADTs in C++, http://cocoon.cs.odu.edu/cocoon/~cs330web/dbook/testingcpp/, (CS330 lecture, log in as “guest”)

JUnit programming team, JUnit Test Infected, http://junit.sourceforge.net/doc/testinfected/testing.htm

Zeil, Testing ADTs in Java, http://cocoon.cs.odu.edu/cocoon/~cs330web/dbook/testingjava/, (CS330 lecture, log in as “guest”)

Wolski & Hennebrueder, Eclipse Junit testing tutorial, http://www.laliluna.de/eclipse-junit-testing-tutorial.html

Williams, Ho, and Smith, Unit Testing in Eclipse Using JUnit, http://open.ncsu.edu/se/tutorials/junit/

5.2 System Tests

A system test is a test that is run on the entire program to see if it works or not.

Before running any system test, you should really think about why you are testing. If you have just made a change and want to see if that change works, you would probably do much better to develop one or more unit tests to demonstrate the effect of your change.

Certainly if you are debugging, you are better off finding (or creating) a unit test that illustrates the bug, then debugging from that unit test, as it is almost always easier to isolate the cause of a bug when dealing with a small fraction of the total program.

When you do want to conduct a system test, you can

  1. Install the system and run the main program (metadataextraction.bat)
    • This is very time consuming and offers minimal support for debugging.
  2. In Eclipse, launch edu.odu.cs.extract.control.ExtractGUI from src/main/java by right-clicking on that class, selecting “Run as…” (or “Debug as…”) “Java application”.
    • The program will be run using the settings for the special “test” collection.
  3. To run a system test or debug using a “real” collection, go to the src/test/java directory in the collection subproject, open the package edu.odu.cs.extract.debug, and look for a class named Debug_collection (e.g., Debug_dtic). Right-click on that class, “Run as…” (or “Debug as…”) “Java application”.
    • The program will be run using the settings for that particular collection.
  • The instructions above for running in Eclipse assume you have done a complete ant build of the subproject. This is necessary to create the various data files required for execution of the full system.
  • If this is the first time you have run or debugged this program, instead of doing “Java Application” from your “Run as…” (or “Debug as…”), select “Run Configurations” (or “Debug Configurations”) and, on the Arguments tab, enter -Xmx768M in the VM Arguments box. You may also need to add -Djava.library.path="../lib" on a separate line in that box.

5.3 Regression Tests

Most testing is done to determine if the system is working. Regression testing is somewhat different. Regression testing is performed to see whether the behavior of the system has changed. Regression tests combine inputs to the system with the outputs that a prior version of the system was able to produce. Most regression tests will be cases where the system produced correct output. We run these tests mainly to be sure that changes to the system have not broken things that used to work. It’s not unusual, though, for regression tests to include test cases where the system produced incorrect output. These tests allow us to see if future fixes actually correct known bad behaviors.

In most projects, the main difficulty with regression tests is getting people to actually run them. Some companies solve this problem by scheduling a daily batch run of all regression tests late at night. The next problem, in practice, is getting people to actually look at the results of the nightly regression run.

On the ODU Extract project, we have taken a slightly different tack towards regression testing. For a collection subproject, src/rtest contains one or more “suites” of documents (PDF or IDM) to serve as regression inputs, together with the most recent expected output for each. A regression test is built into the system that, when activated, compares all metadata outputs against the current expected value. If a difference is detected, a window pops up showing the expected and actual outputs and describing the difference found. The programmer can elect to ignore the difference for now or may instead elect to save it, in which case it becomes the expected value for future regression tests.
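The compare-then-ignore-or-save cycle described above can be sketched roughly as follows. This is only an illustration of the idea, not the Extract implementation: the class RegressionSuite, its methods, and the sample document name are all hypothetical, expected outputs are held in memory rather than under src/rtest, and a returned message stands in for the pop-up window.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of expected-versus-actual regression checking. */
final class RegressionSuite {

    /** Expected output lines, keyed by input document name. */
    private final Map<String, List<String>> expected = new HashMap<>();

    /** Saves the given output as the expected value for future checks. */
    void accept(String document, List<String> output) {
        expected.put(document, new ArrayList<>(output));
    }

    /** Returns a description of the first difference, or empty if none. */
    Optional<String> check(String document, List<String> actual) {
        List<String> want = expected.get(document);
        if (want == null) {
            return Optional.of("no expected output recorded for " + document);
        }
        int common = Math.min(want.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!want.get(i).equals(actual.get(i))) {
                return Optional.of("line " + (i + 1) + ": expected <"
                        + want.get(i) + "> but was <" + actual.get(i) + ">");
            }
        }
        if (want.size() != actual.size()) {
            return Optional.of("expected " + want.size()
                    + " lines but was " + actual.size());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        RegressionSuite suite = new RegressionSuite();
        suite.accept("report.pdf",
                Arrays.asList("<title>Annual Report</title>", "<date>2010</date>"));

        // Output from a later version of the system differs on the second line.
        List<String> actual =
                Arrays.asList("<title>Annual Report</title>", "<date>2011</date>");
        Optional<String> difference = suite.check("report.pdf", actual);
        difference.ifPresent(System.out::println);

        // "Saving" the difference makes the new output the expected value.
        suite.accept("report.pdf", actual);
    }
}
```

Ignoring a difference corresponds to simply not calling accept, so the same difference is reported again on the next run.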

Important: The expected outputs are stored as part of the src/rtest directory structure and are considered “source code” for future tests. As such, these outputs will be under CVS source control. This is deliberate - it means that changes in the regression suite (inputs and/or outputs) can be automatically communicated to other project members and that evolutionary changes in expected outputs over time can be examined via the CVS version history.

There are two ways to activate the regression test.

  1. Within the src/test/java code, a debug package is always provided to launch the main Extract program within a debugging environment (Eclipse). One of the launch points provided is the class Regression_collection, which activates regression checking. When using this approach, the programmer should take care to load only input files from the regression test suites.
  2. When running ant with the collection build.xml, use the “regression” target. This runs the main program with regression testing active on each input in the regression suites.
extract/devguide/developers_guide.txt · Last modified: 2011/07/13 09:46 by zeil