Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks - Online Appendix

This web page is a companion to Bogdan Dit's dissertation, entitled "Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks".

B. Dit, Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks, Doctoral dissertation, Computer Science Department, The College of William and Mary, Williamsburg, VA, USA, 2015.

Chapter 2 Preprocessing Techniques – Splitting Identifiers

The material from Chapter 2 was originally published in the proceedings of the 19th IEEE International Conference on Program Comprehension (ICPC 2011)

Dit, B., Guerrouj, L., Poshyvanyk, D., and Antoniol, G., "Can Better Identifier Splitting Techniques Help Feature Location?", in Proc. of 19th IEEE International Conference on Program Comprehension (ICPC'11), Kingston, Ontario, Canada, June 22 - June 24 2011, pp. 11-20 (24% acceptance rate)

Results

The spreadsheet EffectivenessRhinojEdit.xls contains the effectiveness measure of the two feature location techniques (i.e., IR and IRDyn) using the three splitting algorithms: CamelCase, Samurai and Oracle. The spreadsheet also contains information about the effectiveness measure for the four datasets (i.e., RhinoFeatures, RhinoBugs, jEditFeatures and jEditBugs). The spreadsheet's worksheets are color coded as follows:

The yellow worksheets display the box plots (see Figure 1 and Figure 2)
The red worksheets show the effectiveness measures of the FLT from the column for the feature/bug from the right
The blue worksheets contain the data for the percentages of times the effectiveness of the FLT from the row is higher than the effectiveness of the FLT from the column (see Table 4 and Table 5)
The green worksheets contain the p-values of the Wilcoxon signed-rank test (see Table 6 and Table 7)
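
As an illustration of how the percentages in the blue worksheets could be recomputed from paired effectiveness values, here is a minimal sketch. It is not the script used to produce the spreadsheet. The function name `percent_higher` and the `better` parameter are hypothetical. This page does not state whether a "higher" effectiveness corresponds to a numerically larger or smaller value of the measure, so the comparison is left configurable and defaults to "numerically larger" as an assumption.

```python
import operator
from typing import Callable, Sequence


def percent_higher(
    row_values: Sequence[float],
    column_values: Sequence[float],
    better: Callable[[float, float], bool] = operator.gt,
) -> float:
    """Percentage of paired cases in which the row FLT beats the column FLT.

    row_values[i] and column_values[i] are the effectiveness measures of the
    two techniques for the same feature/bug. `better` decides what "beats"
    means; the default (strictly greater) is an assumption, not a claim
    about how the effectiveness measure is defined.
    """
    if len(row_values) != len(column_values):
        raise ValueError("both techniques need one value per feature/bug")
    if not row_values:
        raise ValueError("at least one feature/bug is required")
    wins = sum(1 for r, c in zip(row_values, column_values) if better(r, c))
    return 100.0 * wins / len(row_values)
```

If a smaller value of the measure means a better result in your data, pass `better=operator.lt` instead. Ties count for neither technique in this sketch.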

Participants

Bogdan Dit, The College of William and Mary
E-mail: bdit at cs dot wm dot edu
Latifa Guerrouj, École Polytechnique de Montréal (now at McGill University)
E-mail: latifa dot guerrouj at polymtl dot ca
Denys Poshyvanyk, The College of William and Mary
E-mail: denys at cs dot wm dot edu
Giuliano Antoniol, École Polytechnique de Montréal
E-mail: giuliano dot antoniol at polymtl dot ca

Chapter 3 and Chapter 4

Configuring Latent Dirichlet Allocation: LDA-GA and Configuring and Assembling IR Techniques: IR-GA

The material from Chapter 3 was originally published in the proceedings of the 35th IEEE/ACM International Conference on Software Engineering (ICSE'13) and in the proceedings of the 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE'13)

Panichella, A., Dit, B., Oliveto, R., Di Penta, M., Poshyvanyk, D., and De Lucia, A., "How to Effectively Use Topic Models for Software Engineering Tasks? An Approach based on Genetic Algorithms", in Proceedings of 35th IEEE/ACM International Conference on Software Engineering (ICSE'13), San Francisco, CA, May 18-26, 2013, pp. 522-531 (18.5% acceptance rate)

Dit, B., Panichella, A., Moritz, E., Oliveto, R., Di Penta, M., Poshyvanyk, D., and De Lucia, A., "Configuring Topic Models for Software Engineering Tasks in TraceLab", in Proceedings of 7th ICSE'13 International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE'13), San Francisco, California, May 19, 2013, 105-109

Chapter 3

Object systems

Raw Data

Chapter 4

Object systems

The preprocessed corpora can be downloaded from the following links:

Raw Data

Raw data and results can be downloaded from the following link: rawdata.rar.

Participants

Annibale Panichella, University of Salerno, Italy (now at Delft University of Technology)
Bogdan Dit, The College of William and Mary
Rocco Oliveto, University of Molise, Italy
Massimiliano Di Penta, University of Sannio, Italy
Denys Poshyvanyk, The College of William and Mary
Andrea De Lucia, University of Salerno, Italy
Evan Moritz, The College of William and Mary

Chapter 5 Preprocessing Techniques – Splitting Identifiers

The material from Chapter 5 was originally published in the proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM'13) and in the journal Empirical Software Engineering

Dit, B., Moritz, E., Linares-Vásquez, M., and Poshyvanyk, D., "Supporting and Accelerating Reproducible Research in Software Maintenance using TraceLab Component Library", in Proceedings of 29th IEEE International Conference on Software Maintenance (ICSM'13), Eindhoven, the Netherlands, September 22-28, 2013, pp. 330-339 (22% acceptance rate) - Best Paper Award

Dit, B., Moritz, E., Linares-Vásquez, M., Poshyvanyk, D., and Cleland-Huang, J., "Supporting and Accelerating Reproducible Empirical Research in Software Evolution and Maintenance using TraceLab Component Library", Empirical Software Engineering (EMSE), accepted, pp. to appear

Installing TraceLab

TraceLab can be downloaded from the TraceLab download page on the CoEST website. Detailed installation instructions can be found here. You may need to create a free account in order to download TraceLab and your TraceLab key file. Follow the instructions of the installer, then download your unique TraceLab key and place it in your [USER_FOLDER]/Documents/TraceLab directory.

These experiments require the TraceLab Component Library, which can be downloaded from the files below and unzipped. Once it is unzipped, copy the DLLs in the Components directory to your TraceLab components directory (typically [USER_FOLDER]/Documents/TraceLab/Components). Do the same for the DLLs in the Types directory, copying them to your TraceLab types directory.
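
If you prefer to script the copy step, the following is a minimal sketch and not part of TraceLab or the Component Library. The helper name `copy_dlls` is hypothetical, and both directories are passed in explicitly because the unzip location and the types directory depend on your installation.

```python
import shutil
from pathlib import Path
from typing import List, Union


def copy_dlls(source_dir: Union[str, Path], target_dir: Union[str, Path]) -> List[str]:
    """Copy every .dll file directly inside source_dir into target_dir.

    The target directory is created if it does not exist yet. Existing files
    with the same name are overwritten. Returns the copied file names.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for entry in sorted(source.iterdir()):
        # Match the extension case-insensitively (.dll, .DLL, ...).
        if entry.is_file() and entry.suffix.lower() == ".dll":
            shutil.copy2(entry, target / entry.name)
            copied.append(entry.name)
    return copied
```

For example, `copy_dlls("ComponentLibrary/Components", Path.home() / "Documents" / "TraceLab" / "Components")` would copy the component DLLs, where `ComponentLibrary` is a hypothetical unzip location. Call it a second time with the Types directory and the types directory of your installation.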

Additionally, some experiments require the TraceLab RPlugin components. Download the package from this page. Once downloaded, double-click the package file to automatically install it in TraceLab.

File	Description
TraceLab	TraceLab installation file (external link, requires registration)
Component Library	Component Library and Component Development Kit (built under TraceLab 0.5.1.0)
Mapping Study table	Complete paper-by-component table results of the mapping study
Datasets and experiments	Collection of data and TraceLab experiments containing the motivating example, new ideas in feature location, and reproduced approaches from the mapping study.

How to Run the LDA-GA experiment in TraceLab

Open the experiment in TraceLab and specify the settings for the experiment and datasets.

Data

Open the info pane on the "Source Artifacts" component and set the configuration to the source artifacts directory of the dataset.
Open the info pane on the "Target Artifacts" component and set the configuration to the target artifacts directory of the dataset.
Open the info pane on the "Oracle" component and set the configuration to the oracle file of the dataset.

Dependencies

Open the info pane on the "LDA-GA Configuration" component and set the "RScript executable" configuration to the location of RScript.exe on your computer. This is usually C:\Program Files\R\R-X.XX.X\bin\RScript.exe. A script will attempt to install any R libraries you are missing; this will require your permission.
Repeat for the "Configured LDA" component.
Repeat for the "Baseline LDA" component.

Participants

Bogdan Dit, The College of William and Mary
Evan Moritz, The College of William and Mary
Mario Linares Vásquez, The College of William and Mary
Denys Poshyvanyk, The College of William and Mary
Jane Cleland-Huang, DePaul University

Appendix A

Generating Benchmarks for Feature Location

The material from Appendix A was originally published in the proceedings of the 10th IEEE Working Conference on Mining Software Repositories (MSR'13)

Dit, B., Holtzhauer, A., Poshyvanyk, D., and Kagdi, H., "A Dataset from Change History to Support Evaluation of Software Maintenance Tasks", in Proceedings of 10th Working Conference on Mining Software Repositories (MSR'13), Data Track, San Francisco, CA, 2013, pp. 131-134 (55.6% acceptance ratio)

Datasets

Dataset (size)	Source code URL [Webpage]	Period	Issues [URL to Issue Tracking System]	Trace Type (Format)	Number of Gold Set Methods
ArgoUML0.22 (462 MB)	Source Code [ArgoUML]	0.20-0.22	74 Defects 10 Enhancements 2 Features 5 Patches (91 Total) [URL Issues]	Full (TPTP)	701
ArgoUML0.24 (206 MB)	Source Code [ArgoUML]	0.22-0.24	32 Defects 4 Enhancements 15 Patches 1 Task (52 Total) [URL Issues]	Full (TPTP)	357
ArgoUML0.26.2 (921 MB)	Source Code [ArgoUML]	0.24-0.26.2	181 Defects 19 Enhancements 2 Features 4 Patches 3 Task (209 Total) [URL Issues]	Full (TPTP)	1,560
JabRef2.6 (22 MB)	Source Code [JabRef]	2.0-2.6	36 Defects 3 Features (39 Total) [URL Issues]	Full (TPTP)	280
jEdit4.3 (34 MB)	Source Code [jEdit]	4.2-4.3	86 Bugs 34 Features 30 Patches (150 Total) [URL Issues]	Marked (JPDA)	748
muCommander0.8.5 (278 MB)	Source Code [muCommander]	0.8.0-0.8.5	81 Defects 11 Enhancements (92 Total) [URL Issues]	Full (TPTP)	717

Tools

In Eclipse, click "File->Import...". Under "General", select "Existing Projects into Workspace" and click Next. Choose "Select archive file" and point to the EclipseProjects.zip (34 MB) archive file, which contains all the Eclipse projects. Select the ones you want to include in your workspace, then click Finish. In each of these Eclipse projects, the main class contains "Main" in its name.

Data Format Details: Traces Format

TPTP traces are stored in XML, and the format is largely self-explanatory.

The format of a JPDA trace is as follows, where the number of pipes ("|") denotes the call stack depth:

threadName  | | ...  methodName  --  ClassNameWithFullPath$InnerClass

Example:


main:0:| 5:2  processOptions  --  org.mozilla.javascript.tools.shell.Main
main:0:| 5:2  init  --  org.mozilla.javascript.tools.shell.Global
main:0:| | 5:2  <init>  --  org.mozilla.javascript.tools.shell.Global$1
main:0:| | 5:2  call  --  org.mozilla.javascript.ContextFactory
main:0:| | 5:2  call  --  org.mozilla.javascript.ContextFactory
main:0:| | 5:2  <init>  --  org.mozilla.javascript.ScriptableObject$Slot
main:0:| | | 5:2  <clinit>  --  org.mozilla.javascript.Context
main:0:| | | | 5:2  <clinit>  --  org.mozilla.javascript.ScriptRuntime
main:0:| | | | | 5:2  classOrNull  --  org.mozilla.javascript.Kit

Remarks

$1 denotes an anonymous class
<init> is the class constructor, and should be replaced with the actual name of the class (e.g., from org.mozilla.javascript.tools.shell.Global.<init> to org.mozilla.javascript.tools.shell.Global.Global)
<clinit> is for static block or class initialization (can be discarded)
the trace does not capture the signature of the methods
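
The format and remarks above can be turned into a small parser. The following is a minimal sketch and not the tool used to build the datasets. The names `TraceEntry`, `parse_jpda_line` and `qualified_method` are hypothetical. The sketch assumes that the thread name is the text before the first ":", that the method name is the last token before "--", and it ignores the remaining numeric fields in the example lines, which are not described on this page. Inner and anonymous class names (e.g., Global$1) are left untouched because their handling is not specified here.

```python
import re
from typing import NamedTuple, Optional


class TraceEntry(NamedTuple):
    thread: str      # thread name, e.g. "main"
    depth: int       # call stack depth = number of "|" characters
    method: str      # method name as it appears in the trace
    class_name: str  # fully qualified class name, possibly with $InnerClass


def parse_jpda_line(line: str) -> Optional[TraceEntry]:
    """Parse one JPDA trace line; return None if it does not match the format."""
    parts = re.split(r"\s+--\s+", line.strip(), maxsplit=1)
    if len(parts) != 2:
        return None
    left, class_name = parts
    tokens = left.split()
    if not tokens or not class_name:
        return None
    thread = left.split(":", 1)[0]   # assumption: name ends at the first ":"
    depth = left.count("|")
    method = tokens[-1]              # assumption: last token before "--"
    return TraceEntry(thread, depth, method, class_name)


def qualified_method(entry: TraceEntry) -> Optional[str]:
    """Apply the remarks: drop <clinit>, rename <init> to the class name."""
    if entry.method == "<clinit>":
        return None                  # static initializer, can be discarded
    name = entry.method
    if name == "<init>":
        # Constructor: use the last dot-separated segment of the class name.
        name = entry.class_name.rsplit(".", 1)[-1]
    return f"{entry.class_name}.{name}"
```

Because the trace does not capture method signatures, overloaded methods map to the same qualified name in this sketch.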

Participants

Bogdan Dit, The College of William and Mary
Andrew Holtzhauer, The College of William and Mary
Denys Poshyvanyk, The College of William and Mary
Huzefa Kagdi, Wichita State University

We gratefully acknowledge financial support from the NSF on this research project.

Software Engineering Maintenance and Evolution Research Unit

at the College of William and Mary
