Generating Benchmarks from Change History Data to Support Evaluation of Software Maintenance Tasks - Online Appendix

This web page is a companion to our submission to the 10th Working Conference on Mining Software Repositories (MSR 2013), entitled "Generating Benchmarks from Change History Data to Support Evaluation of Software Maintenance Tasks" [pdf] [slides].

This web page supersedes our original datasets at http://www.cs.wm.edu/semeru/data/benchmarks/. It contains two new datasets, ArgoUML0.24 and ArgoUML0.26.2, as well as the suite of Java tools used to generate these benchmarks and two Matlab scripts that use the Vector Space Model (VSM) and Latent Semantic Indexing (LSI) to compute the similarities between a query and the methods of a system (i.e., the corpus).
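The Matlab scripts are not reproduced here. As a rough illustration of what a VSM similarity computation involves, the Java sketch below weights terms with tf-idf (idf = log(N/df)) and scores each method document by its cosine similarity to the query. The class name VsmSketch, the naive tokenizer, and the toy corpus are hypothetical; the provided scripts may use different preprocessing and weighting. LSI, which additionally projects the term-document matrix onto a truncated singular value decomposition, is not shown.

```java
import java.util.*;

// Hypothetical VSM sketch (not the provided Matlab scripts): tf-idf weighting plus
// cosine similarity between a query and each method document in a corpus.
public class VsmSketch {

    // Naive tokenizer: lowercase, split on any non-letter; camelCase is NOT split.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Raw term frequencies of one document.
    static Map<String, Integer> termFreq(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // tf-idf vector with idf = log(N / df); terms unseen in the corpus are dropped.
    static Map<String, Double> tfidf(Map<String, Integer> tf, Map<String, Integer> df, int n) {
        Map<String, Double> v = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            Integer d = df.get(e.getKey());
            if (d != null) v.put(e.getKey(), e.getValue() * Math.log((double) n / d));
        }
        return v;
    }

    // Cosine of the angle between two sparse vectors; 0 if either is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double x : b.values()) nb += x * x;
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // One similarity per method document, in corpus order.
    static double[] similarities(String query, List<String> corpus) {
        int n = corpus.size();
        List<Map<String, Integer>> tfs = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String doc : corpus) {
            Map<String, Integer> tf = termFreq(tokenize(doc));
            tfs.add(tf);
            for (String term : tf.keySet()) df.merge(term, 1, Integer::sum);
        }
        Map<String, Double> q = tfidf(termFreq(tokenize(query)), df, n);
        double[] sims = new double[n];
        for (int i = 0; i < n; i++) sims[i] = cosine(q, tfidf(tfs.get(i), df, n));
        return sims;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
            "save diagram to file",
            "load diagram from file",
            "render class node");
        System.out.println(Arrays.toString(similarities("save file", corpus)));
    }
}
```

Ranking the methods by these scores, highest first, gives the kind of ordered list that a feature location technique would then compare against a gold set.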

Datasets

Dataset (size) | Source code URL [Webpage] | Period | Issues [URL to Issue Tracking System] | Trace type (format) | Number of gold set methods
ArgoUML0.22 (462 MB) | Source Code [ArgoUML] | 0.20-0.22 | 74 Defects, 10 Enhancements, 2 Features, 5 Patches (91 total) [URL Issues] | Full (TPTP) | 701
ArgoUML0.24 (206 MB) | Source Code [ArgoUML] | 0.22-0.24 | 32 Defects, 4 Enhancements, 15 Patches, 1 Task (52 total) [URL Issues] | Full (TPTP) | 357
ArgoUML0.26.2 (921 MB) | Source Code [ArgoUML] | 0.24-0.26.2 | 181 Defects, 19 Enhancements, 2 Features, 4 Patches, 3 Tasks (209 total) [URL Issues] | Full (TPTP) | 1,560
JabRef2.6 (22 MB) | Source Code [JabRef] | 2.0-2.6 | 36 Defects, 3 Features (39 total) [URL Issues] | Full (TPTP) | 280
jEdit4.3 (34 MB) | Source Code [jEdit] | 4.2-4.3 | 86 Bugs, 34 Features, 30 Patches (150 total) [URL Issues] | Marked (JPDA) | 748
muCommander0.8.5 (278 MB) | Source Code [muCommander] | 0.8.0-0.8.5 | 81 Defects, 11 Enhancements (92 total) [URL Issues] | Full (TPTP) | 717

Tools

In Eclipse, click "File->Import...". Under "General", select "Existing Projects into Workspace" and click "Next". Choose "Select archive file" and point to the EclipseProjects.zip archive (34 MB), which contains all the Eclipse projects. Select the projects you want to include in your workspace, then click "Finish". In each of these projects, the main class has "Main" in its name.

Data Format Details

Traces Format

TPTP traces are stored in XML and are largely self-explanatory.

The format of a JPDA trace is as follows:


<thread name>  <pipes ("|"), one per level of call stack depth>  <methodName>  --  <ClassNameWithFullPath$InnerClass>

Example:


main:0:| 5:2  processOptions  --  org.mozilla.javascript.tools.shell.Main
main:0:| 5:2  init  --  org.mozilla.javascript.tools.shell.Global
main:0:| | 5:2  <init>  --  org.mozilla.javascript.tools.shell.Global$1
main:0:| | 5:2  call  --  org.mozilla.javascript.ContextFactory
main:0:| | 5:2  call  --  org.mozilla.javascript.ContextFactory
main:0:| | 5:2  <init>  --  org.mozilla.javascript.ScriptableObject$Slot
main:0:| | | 5:2  <clinit>  --  org.mozilla.javascript.Context
main:0:| | | | 5:2  <clinit>  --  org.mozilla.javascript.ScriptRuntime
main:0:| | | | | 5:2  classOrNull  --  org.mozilla.javascript.Kit
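A trace line can be split into its parts with a few string operations. The Java sketch below assumes the layout seen in the example: the thread name ends at the first ':', the call depth is the number of '|' characters, the method name is the last token before the "--" separator, and the class name follows it. The ":0:" and "5:2" fields are not documented on this page, so the sketch skips them. The class JpdaTraceLine is hypothetical and is not part of the provided tools.

```java
// Hypothetical parser for one JPDA trace line, assuming the layout in the example:
// "<thread>:0:<pipes> 5:2  <method>  --  <class>". Undocumented fields are ignored.
public class JpdaTraceLine {
    public final String thread;
    public final int depth;         // number of '|' characters, i.e., call stack depth
    public final String method;     // e.g., "processOptions", "<init>", "<clinit>"
    public final String className;  // fully qualified; '$' marks nested/anonymous classes

    JpdaTraceLine(String thread, int depth, String method, String className) {
        this.thread = thread;
        this.depth = depth;
        this.method = method;
        this.className = className;
    }

    static JpdaTraceLine parse(String line) {
        // Method and class are separated by "--" with whitespace on both sides.
        int sep = line.indexOf(" -- ");
        if (sep < 0) {
            throw new IllegalArgumentException("no ' -- ' separator in: " + line);
        }
        String left = line.substring(0, sep).trim();
        String className = line.substring(sep + 4).trim();

        // Assumption: the thread name itself contains no ':'.
        int colon = left.indexOf(':');
        String thread = colon < 0 ? left : left.substring(0, colon);

        // Each '|' adds one level of call depth.
        int depth = 0;
        for (char c : left.toCharArray()) {
            if (c == '|') depth++;
        }

        // The method name is the last whitespace-separated token before the separator.
        String[] tokens = left.split("\\s+");
        String method = tokens[tokens.length - 1];
        return new JpdaTraceLine(thread, depth, method, className);
    }

    public static void main(String[] args) {
        String[] sample = {
            "main:0:| 5:2  processOptions  --  org.mozilla.javascript.tools.shell.Main",
            "main:0:| | 5:2  <init>  --  org.mozilla.javascript.tools.shell.Global$1",
            "main:0:| | | | | 5:2  classOrNull  --  org.mozilla.javascript.Kit"
        };
        for (String s : sample) {
            JpdaTraceLine t = parse(s);
            System.out.println(t.thread + " depth=" + t.depth + " " + t.className + "." + t.method);
        }
    }
}
```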

Remarks

  • $1 (a numeric suffix after $) denotes an anonymous class; a named suffix such as $Slot denotes an inner class
  • <init> denotes a constructor and should be replaced with the name of the class (e.g., org.mozilla.javascript.tools.shell.Global.<init> becomes org.mozilla.javascript.tools.shell.Global.Global)
  • <clinit> denotes static initialization (static blocks and static field initializers) and can be discarded
  • the trace does not capture method signatures, so overloaded methods of the same class cannot be distinguished
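The remarks above amount to a small normalization step. The Java sketch below, built around a hypothetical class TraceMethodNames, drops <clinit> entries and replaces <init> with the simple class name (the text after the last '.', then after the last '$'). For an anonymous class such as Global$1 this yields the numeric suffix "1"; whether that suits a given gold set is an assumption left to the user. Because signatures are absent, repeated and overloaded calls collapse into one name, which the sketch makes explicit by collecting names in a set.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper applying the remarks: discard <clinit>, rename <init> to the class name.
public class TraceMethodNames {

    // Returns "ClassName.method", or empty for <clinit> entries, which can be discarded.
    static Optional<String> normalize(String method, String className) {
        if (method.equals("<clinit>")) {
            return Optional.empty();
        }
        String name = method;
        if (method.equals("<init>")) {
            // Simple name: text after the last '.', then after the last '$' (nested classes).
            // Assumption: for anonymous classes (e.g., Global$1) this gives "1".
            String simple = className.substring(className.lastIndexOf('.') + 1);
            name = simple.substring(simple.lastIndexOf('$') + 1);
        }
        return Optional.of(className + "." + name);
    }

    public static void main(String[] args) {
        String[][] entries = {
            {"<init>", "org.mozilla.javascript.tools.shell.Global"},
            {"call", "org.mozilla.javascript.ContextFactory"},
            {"call", "org.mozilla.javascript.ContextFactory"},
            {"<clinit>", "org.mozilla.javascript.Context"},
            {"<init>", "org.mozilla.javascript.ScriptableObject$Slot"}
        };
        // Without signatures, repeated or overloaded calls map to one name; a set keeps each once.
        Set<String> methods = new LinkedHashSet<>();
        for (String[] e : entries) {
            normalize(e[0], e[1]).ifPresent(methods::add);
        }
        methods.forEach(System.out::println);
    }
}
```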

Participants


We gratefully acknowledge financial support from the NSF on this research project.