Authors: Sam Ratcliff, David R. White and John A. Clark
This paper investigated the use of GP to generate invariants for existing source code. We used a set of examples taken from Gries’ “Science of Programming” and a few from online programming repositories. We used the ECJ Toolkit. Please note that Version 19 of the toolkit was used. We then used muJava to create mutants of the original program, which were used to create a heuristic estimating how “interesting” a particular invariant was.
This webpage contains online resources designed to enable other researchers to recreate and build upon our work.
We actively encourage feedback, debate, suggestions and constructive criticism: please feel free to contact us.
The code written for this work is not very general. It assumes that you are using Linux, and existing file structures and constants are hard-coded. We make no apologies for this – it is prototypical code. It is probably best to follow our instructions exactly until you are clear as to how the system works. Alterations can subsequently be made. If you improve the code, please let us know! The source code is a mess: there are lots of extra files, and some classes are very poorly documented.
We recommend running these experiments on a machine with at least 4GB RAM, due to the unnecessarily inefficient way data is passed from Daikon’s trace output to the evolutionary search. This results in significant slowdowns in running the experiments, which are separate from the actual time it takes to search the invariant space. We do apologise for this poor design.
Download the Archive
This archive contains the directory structure and files required to repeat our work.
If you’re interested in our work, please contact us so we can keep track of who is working in this area, and whether it is worthwhile spending some time cleaning up and documenting the source code properly, as well as making the tool more general.
File System Structure
Decompressing the archive should create a folder structure as follows, where “exp_root” is the containing directory:
You will need to add the Daikon and ECJ jar files. Hence lib should contain the following:
- daikon.jar – version 4.6.4 of Daikon.
- ecj-19.jar – version 19 of the ECJ toolkit.
- invariants.jar – the jar file included with the archive.
- openjava2005.jar – OpenJava, required (and supplied) by muJava.
You’ll also need to locate tools.jar for your JDK, so that you can add it to your classpath in the script used in Experiment C.
The contents of each experiment directory are detailed in the relevant sections below.
The work consisted of three experiments: A, B and C. The details of how to run each experiment follow.
1. Create the Program Traces
First, traces must be created for each example program. The programs can be found in exp_root/experiment_a/examples.
Change to the experiment_a directory and run “create_traces”. This will do some house-keeping and place the resulting traces in the exp_root/experiment_a/traces directory.
For each example, a display will show the progress of creating the 20 traces. This may take some time, and the traces produced are large. Note that the master seed used in the actual experimentation is still present in the script (taken from random.org).
2. Run Daikon
Generate Daikon’s invariants for comparison to the output of search.
Run “run_daikon” to do this. The results can be found in daikon_invariants.
Again, progress will be reported as this runs.
3. Run Chicory Translate
Run “chicory_translate”, which will translate the trace files from the Chicory front-end into ECJ parameter files. This is a slow process, as it is implemented inefficiently and dealing with large data. It was a poor design decision to require this translation, but for historical reasons it remains. To be removed in the future.
The output will be placed in the exp_root/experiment_a/ecj_param directory, containing parameter files for each program point in the traces of the example programs.
4. Run ECJ
Run “run_ecj” to actually execute the evolutionary search using the parameter files just generated. This script was originally designed to be executed on the Sun Grid Engine (SGE). To execute it on a single PC, ensure the line “SGE_TASK_ID=$1” is uncommented, and then supply an integer as a command line argument to the script. You’ll need to run the script with integers 1..10.
Again, due to the very inefficient way that the data is passed around, there may be significant delays. Whilst the evolutionary search takes only seconds, parsing the data from the parameter file brings a large overhead in space and time. This is a legacy issue. As a result, you’ll need at least 4GB of RAM to be sure of completing all runs of ECJ successfully.
5. Compare Daikon to ECJ Results
This required some manual work, due to the different output formats from the two tools. We took the output in daikon_invariants and manually wrote the files in the “expected” directory. We translated the invariants from Daikon format to a format comparable to the output of evolutionary search. These can be checked by hand, or you can assume we have done this correctly! Within the “expected” files, the format is simply one invariant per line, and in most cases we have also listed simple syntactic variants of the same invariants.
In order to compare the expected invariants, those found by Daikon, to the invariants found using search, run “check_daikon”. This uses the “check_against_daikon.py” script. As well as the commandline summary, a useful CSV format listing is produced named “expa.csv”. This can be used to recreate Tables 4 and 5 in the paper.
6. Examining invariant growth versus Computation Effort
To recreate Figure 3 from the paper, which examines the number of invariants found using search as the computational effort is increased, first run “chicory_growth” to create the ECJ input files. Then run “ecj_growth” to run the experiments. Again, this was written to be run on the Sun Grid Engine, so if you’re not using the SGE you’ll need to ensure SGE_TASK_ID=$1 is uncommented, and then run the file with integer arguments 1-10. The level of RAM your platform has will probably prove even more crucial here.
The output files are placed in ecj_invariants_growth, and wc -l can be used to produce the results given in Figure 3 of the paper.
In Experiment B, we attempt to find the invariants written by Gries and not those produced by Daikon.
1. Generate ECJ parameter files
First, run “chicory_translate” to produce the ECJ input files specific for the Gries invariants. This will automatically add in the functions from the bottom of Table 2 in the paper, those functions relevant to the Gries problems only. These will be placed in the ecj_param directory. Again, this could take some time.
2. Run ECJ
Run “run_ecj X” with arguments from 1-10 to execute the search with the extended function set, as well as larger population and generation sizes (of 250).
3. Compare ECJ Output to Gries
The invariants specified by Gries have been manually translated to a convenient format and placed in the “expected” directory. This allows a similar check to Experiment A, using python and bash scripts. Run “./check_gries” to produce human-readable stdout information, and a file “expb.csv” with the data in CSV format. These results are summarised in Table 6 of the paper.
In Experiment C, we re-run the 7th repetition from Experiment B (as it was the most successful in finding Gries invariants), and then attempt to order the invariants by interest using a “mutant fragility test”.
At the moment, we are unable to provide a simple way to recreate the mutants we used, as we took advantage of the muJava source code kindly provided by Jeff Offut. Instead we provide the mutants, and if you wish to download muJava and generate them, you are welcome to do so. The process is transparent – you can see the generated mutants in mujava/result.
Create Mutant Traces
Run “create_mutant_traces” to generate trace files for each mutant. The mnemonics represent muJava mutant types. The progress through each mutant is recorded in parentheses, so you can gauge the progress through each set of mutants for each program point.
Create ECJ parameter files
Run “chicory_translate” to create input files for ECJ.
Run “run_ecj” to repeat the 7th repetition from Experiment B. At the end of the run, a mutant fragility test will be performed on the invariants before the results files are written.
The results were analysed by opening the invariant CSV files in ecj_invariants, and sorting the list by the mutation score. Finding a particular invariant that has not been easily highlighted by the fragility test is a little tricky, the simplest method being to use the stdout output from Experiment B’s “check_gries” script to locate where a match was found, such that the particular syntactic form of the invariant is identified. The CSV files can then be searched for that string.