Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks - Online Appendix
This web page is a companion to Bogdan Dit's Dissertation entitled "Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks"
B. Dit, Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks, Doctoral, Computer Science Department, The College of William and Mary, Williamsburg, VA, USA, 2015.
Chapter 2
Preprocessing Techniques – Splitting Identifiers
The material from Chapter 2 was originally published in the proceedings of the 19th IEEE International Conference on Program Comprehension (ICPC 2011)
Dit, B., Guerrouj, L., Poshyvanyk, D., and Antoniol, G., "Can Better Identifier Splitting Techniques Help Feature Location?", in Proc. of 19th IEEE International Conference on Program Comprehension (ICPC'11), Kingston, Ontario, Canada, June 22 - June 24 2011, pp. 11-20 (24% acceptance rate)
Results
The spreadsheet EffectivenessRhinojEdit.xls contains the effectiveness measure of the two feature location techniques (i.e., IR and IRDyn) using the three splitting algorithms: CamelCase, Samurai and Oracle. The spreadsheet also contains information about the effectiveness measure for the four datasets (i.e., RhinoFeatures, RhinoBugs, jEditFeatures and jEditBugs). The spreadsheet's worksheets are color coded as follows:
- The yellow worksheets display the box plots (see Figure 1 and Figure 2)
- The red worksheets show the effectiveness measures of the feature location technique (FLT) from the column for the feature/bug from the right
- The blue worksheets contain the data for the percentages of times the effectiveness of the FLT from the row is higher than the effectiveness of the FLT from the column (see Table 4 and Table 5)
- The green worksheets contain the p-values of the Wilcoxon signed-rank test (see Table 6 and Table 7)
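The percentages in the blue worksheets can be illustrated with a minimal Python sketch. It is not the procedure used to build the spreadsheet: the function name `percent_higher`, the sample values, and the treatment of ties (counted as "not higher") are assumptions made only for illustration.

```python
def percent_higher(row_values, column_values):
    """Percentage of paired observations where the row FLT's value
    is strictly higher than the column FLT's value.

    Both lists hold one effectiveness value per feature/bug, in the
    same order. Ties count as "not higher" (an assumption).
    """
    if len(row_values) != len(column_values):
        raise ValueError("both FLTs need one value per feature/bug")
    if not row_values:
        raise ValueError("at least one feature/bug is required")
    wins = sum(1 for r, c in zip(row_values, column_values) if r > c)
    return 100.0 * wins / len(row_values)


# Hypothetical effectiveness values for four features (not real data).
flt_in_row = [10, 4, 7, 3]
flt_in_column = [8, 4, 9, 1]
print(percent_higher(flt_in_row, flt_in_column))  # prints 50.0
```

The spreadsheet remains the authoritative source for the reported values; the sketch only shows the kind of pairwise comparison the worksheet summarizes.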
Participants
- Bogdan Dit, The College of William and Mary
  E-mail: bdit at cs dot wm dot edu
- Latifa Guerrouj, École Polytechnique de Montréal (now at McGill University)
  E-mail: latifa dot guerrouj at polymtl dot ca
- Denys Poshyvanyk, The College of William and Mary
  E-mail: denys at cs dot wm dot edu
- Giuliano Antoniol, École Polytechnique de Montréal
  E-mail: giuliano dot antoniol at polymtl dot ca
Chapter 3 and Chapter 4
Configuring Latent Dirichlet Allocation: LDA-GA and Configuring and Assembling IR Techniques: IR-GA
The material from Chapter 3 was originally published in the proceedings of the 35th IEEE/ACM International Conference on Software Engineering (ICSE'13) and in the proceedings of the 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE'13)
Panichella, A., Dit, B., Oliveto, R., Di Penta, M., Poshyvanyk, D., and De Lucia, A., "How to Effectively Use Topic Models for Software Engineering Tasks? An Approach based on Genetic Algorithms", in Proceedings of 35th IEEE/ACM International Conference on Software Engineering (ICSE'13), San Francisco, CA, May 18-26, 2013, pp. 522-531 (18.5% acceptance rate)
Dit, B., Panichella, A., Moritz, E., Oliveto, R., Di Penta, M., Poshyvanyk, D., and De Lucia, A., "Configuring Topic Models for Software Engineering Tasks in TraceLab", in Proceedings of 7th ICSE'13 International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE'13), San Francisco, California, May 19, 2013, 105-109
Chapter 3
Object systems
- Used in the "Feature location" experiment
- Used in the "Labeling" experiment
- Used in the "Traceability recovery" experiment
Raw Data
Chapter 4
Object systems
The preprocessed corpora can be downloaded from the following links:
- EasyClinicITA (used in the Traceability Link Recovery experiment)
- eTourITA (used in the Traceability Link Recovery experiment)
- iTrustENG (used in the Traceability Link Recovery experiment)
- JabRef2.6 (used in the Feature Location experiment)
- jEdit4.3 (used in the Feature Location experiment)
- Eclipse3.0 (used in the Identification of Duplicate Bug Reports experiment)
Raw Data
Raw data and results can be downloaded from the following link: rawdata.rar.
Participants
- Annibale Panichella, University of Salerno, Italy (now at Delft University of Technology)
- Bogdan Dit, The College of William and Mary
- Rocco Oliveto, University of Molise, Italy
- Massimiliano Di Penta, University of Sannio, Italy
- Denys Poshyvanyk, The College of William and Mary
- Andrea De Lucia, University of Salerno, Italy
- Evan Moritz, The College of William and Mary
Chapter 5
Preprocessing Techniques – Splitting Identifiers
The material from Chapter 5 was originally published in the proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM'13) and Empirical Software Engineering
Dit, B., Moritz, E., Linares-Vásquez, M., and Poshyvanyk, D., "Supporting and Accelerating Reproducible Research in Software Maintenance using TraceLab Component Library", in Proceedings of 29th IEEE International Conference on Software Maintenance (ICSM'13), Eindhoven, the Netherlands, September 22-28, 2013, pp. 330-339 (22% acceptance rate) - Best Paper Award
Dit, B., Moritz, E., Linares-Vásquez, M., Poshyvanyk, D., and Cleland-Huang, J., "Supporting and Accelerating Reproducible Empirical Research in Software Evolution and Maintenance using TraceLab Component Library", Empirical Software Engineering (EMSE), accepted, pp. to appear
Installing TraceLab
TraceLab can be downloaded from the TraceLab download page on the CoEST website. Installation instructions can be found here. You may need to create a free account in order to download TraceLab and your TraceLab key file. Follow the instructions of the installer, then download your unique TraceLab key and place it in your [USER_FOLDER]/Documents/TraceLab directory.
These experiments require the TraceLab Component Library, which can be downloaded from the files below and unzipped. Once it is unzipped, copy the DLLs in the Components directory to your TraceLab components directory (typically [USER_FOLDER]/Documents/TraceLab/Components). Do the same for the DLLs in the Types directory, copying them to your TraceLab types directory.
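The copy step can also be scripted. The following Python sketch is one possible way to do it, not part of the TraceLab distribution: `copy_dlls` is a hypothetical helper, and the folder names in the commented usage (in particular the unzipped location and the Types target directory) are assumptions that should be adjusted to your installation.

```python
import shutil
from pathlib import Path


def copy_dlls(source_dir, target_dir):
    """Copy every *.dll file from source_dir into target_dir.

    Returns the sorted list of copied file names. The target directory
    is created if it does not exist yet.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for dll in sorted(source.glob("*.dll")):
        shutil.copy2(dll, target / dll.name)  # copy2 keeps timestamps
        copied.append(dll.name)
    return copied


# Example usage (paths are assumptions; uncomment and adapt):
# library = Path.home() / "Downloads" / "ComponentLibrary"
# tracelab = Path.home() / "Documents" / "TraceLab"
# copy_dlls(library / "Components", tracelab / "Components")
# copy_dlls(library / "Types", tracelab / "Types")
```

Note that the `*.dll` pattern is matched case-sensitively on case-sensitive file systems, which normally does not matter on Windows.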
Additionally, some experiments require the TraceLab RPlugin components. Download the package from this page. Once downloaded, double click the package file to automatically install it in TraceLab.
File | Description |
---|---|
TraceLab | TraceLab installation file (external link, requires registration) |
Component Library | Component Library and Component Development Kit (built under TraceLab 0.5.1.0) |
Mapping Study table | Complete paper-by-component table results of the mapping study |
Datasets and experiments | Collection of data and TraceLab experiments containing the motivating example, new ideas in feature location, and reproduced approaches from the mapping study. |
How to Run the LDA-GA experiment in TraceLab
Open the experiment in TraceLab and specify the settings for the experiment and datasets.
- Data
  - Open the info pane on the "Source Artifacts" component and set the configuration to the source artifacts directory of the dataset.
  - Open the info pane on the "Target Artifacts" component and set the configuration to the target artifacts directory of the dataset.
  - Open the info pane on the "Oracle" component and set the configuration to the oracle file of the dataset.
- Dependencies
  - Open the info pane on the "LDA-GA Configuration" component and set the "RScript executable" configuration to the location of RScript.exe on your computer. This is usually C:\Program Files\R\R-X.XX.X\bin\RScript.exe. A script will attempt to install any R libraries you are missing - this will require your permission.
  - Repeat for the "Configured LDA" component.
  - Repeat for the "Baseline LDA" component.
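If you are unsure where RScript.exe is installed, a short Python sketch like the one below can list candidate paths to paste into the "RScript executable" setting. It is only a convenience under stated assumptions: `find_rscript` is a hypothetical helper, and it assumes R was installed into version-named folders (R-X.XX.X) under a single root such as C:\Program Files\R.

```python
from pathlib import Path


def find_rscript(r_root):
    """Return sorted paths of Rscript executables found under r_root.

    Looks for <r_root>/R-*/bin/<name>, where <name> equals
    "rscript.exe" ignoring case. Returns an empty list if r_root
    does not exist.
    """
    root = Path(r_root)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.glob("R-*/bin/*")
        if path.name.lower() == "rscript.exe"
    )


# Example usage (the root folder is an assumption; adapt as needed):
# for candidate in find_rscript(r"C:\Program Files\R"):
#     print(candidate)
```

If several R versions are installed, the list contains one entry per version, and any of them can be used as long as the same path is set on all three components.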
Participants
- Bogdan Dit, The College of William and Mary
- Evan Moritz, The College of William and Mary
- Mario Linares Vásquez, The College of William and Mary
- Denys Poshyvanyk, The College of William and Mary
- Jane Cleland-Huang, DePaul University
Appendix A
Generating Benchmarks for Feature Location
The material from Appendix A was originally published in the proceedings of the 10th IEEE Working Conference on Mining Software Repositories (MSR'13)
Dit, B., Holtzhauer, A., Poshyvanyk, D., and Kagdi, H., "A Dataset from Change History to Support Evaluation of Software Maintenance Tasks", in Proceedings of 10th Working Conference on Mining Software Repositories (MSR'13), Data Track, San Francisco, CA, 2013, pp. 131-134 (55.6% acceptance ratio)
Datasets
Dataset (size) | Source code URL [Webpage] | Period | Issues [URL to Issue Tracking System] | Trace Type (Format) | Number of Gold Set Methods |
---|---|---|---|---|---|
ArgoUML0.22 (462 MB) | Source Code [ArgoUML] | 0.20-0.22 | 74 Defects, 10 Enhancements, 2 Features, 5 Patches (91 Total) [URL Issues] | Full (TPTP) | 701 |
ArgoUML0.24 (206 MB) | Source Code [ArgoUML] | 0.22-0.24 | 32 Defects, 4 Enhancements, 15 Patches, 1 Task (52 Total) [URL Issues] | Full (TPTP) | 357 |
ArgoUML0.26.2 (921 MB) | Source Code [ArgoUML] | 0.24-0.26.2 | 181 Defects, 19 Enhancements, 2 Features, 4 Patches, 3 Task (209 Total) [URL Issues] | Full (TPTP) | 1,560 |
JabRef2.6 (22 MB) | Source Code [JabRef] | 2.0-2.6 | 36 Defects, 3 Features (39 Total) [URL Issues] | Full (TPTP) | 280 |
jEdit4.3 (34 MB) | Source Code [jEdit] | 4.2-4.3 | 86 Bugs, 34 Features, 30 Patches (150 Total) [URL Issues] | Marked (JPDA) | 748 |
muCommander0.8.5 (278 MB) | Source Code [muCommander] | 0.8.0-0.8.5 | 81 Defects, 11 Enhancements (92 Total) [URL Issues] | Full (TPTP) | 717 |
Tools
In Eclipse, click "File->Import...". Under "General", select "Existing Projects into Workspace" and click Next. Choose "Select archive file" and point to the EclipseProjects.zip (34MB) archive file, which contains all the Eclipse projects. Select the ones you want to include in your workspace, then click Finish. In each of these Eclipse projects, the main class contains "Main" in its name.
Data Format Details: Traces Format
TPTP traces are stored in XML and are largely self-explanatory.
The format of a JPDA trace is as follows: each line starts with the thread name, followed by a number of pipes ("|") that denotes the call stack depth, and ends with
methodName -- ClassNameWithFullPath$InnerClass
Example:
main:0:| 5:2 processOptions -- org.mozilla.javascript.tools.shell.Main
main:0:| 5:2 init -- org.mozilla.javascript.tools.shell.Global
main:0:| | 5:2 <init> -- org.mozilla.javascript.tools.shell.Global$1
main:0:| | 5:2 call -- org.mozilla.javascript.ContextFactory
main:0:| | 5:2 call -- org.mozilla.javascript.ContextFactory
main:0:| | 5:2 <init> -- org.mozilla.javascript.ScriptableObject$Slot
main:0:| | | 5:2 <clinit> -- org.mozilla.javascript.Context
main:0:| | | | 5:2 <clinit> -- org.mozilla.javascript.ScriptRuntime
main:0:| | | | | 5:2 classOrNull -- org.mozilla.javascript.Kit
Remarks
- $1 denotes an anonymous class
- <init> is the class constructor, and should be replaced with the actual name of the class (e.g., from org.mozilla.javascript.tools.shell.Global.<init> to org.mozilla.javascript.tools.shell.Global.Global)
- <clinit> is for static block or class initialization (can be discarded)
- the trace does not capture the signature of the methods
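The remarks above can be turned into a small parser. The Python sketch below is an illustration only and is not shipped with the dataset: `TraceCall` and `parse_jpda_line` are hypothetical names. It assumes that the thread name contains no colon, ignores the numeric fields between the thread name and the method name, discards <clinit> entries, and renames <init> to the class name (for an inner class, the part after the last "$"). Leaving <init> unchanged for anonymous classes is a further assumption, since they have no named constructor.

```python
from typing import NamedTuple, Optional


class TraceCall(NamedTuple):
    thread: str       # thread name, e.g. "main"
    depth: int        # call stack depth = number of "|" characters
    method: str       # method name, with <init> renamed where possible
    class_name: str   # fully qualified class name as found in the trace
    anonymous: bool   # True for classes such as Global$1


def parse_jpda_line(line: str) -> Optional[TraceCall]:
    """Parse one JPDA trace line; return None for <clinit> entries
    and for lines that do not contain the " -- " separator."""
    left, sep, class_name = line.strip().partition(" -- ")
    if not sep or not left.split():
        return None
    thread = left.split(":", 1)[0]
    depth = left.count("|")
    method = left.split()[-1]
    class_name = class_name.strip()
    if method == "<clinit>":
        return None  # static initializer: can be discarded
    simple_name = class_name.rsplit(".", 1)[-1]
    inner_name = simple_name.rsplit("$", 1)[-1]
    anonymous = "$" in simple_name and inner_name.isdigit()
    if method == "<init>" and not anonymous:
        method = inner_name  # constructor takes the class name
    return TraceCall(thread, depth, method, class_name, anonymous)


sample = [
    "main:0:| 5:2 processOptions -- org.mozilla.javascript.tools.shell.Main",
    "main:0:| | 5:2 <init> -- org.mozilla.javascript.ScriptableObject$Slot",
    "main:0:| | | 5:2 <clinit> -- org.mozilla.javascript.Context",
]
for entry in sample:
    print(parse_jpda_line(entry))
```

Because the trace does not capture method signatures, overloaded methods map to the same (class, method) pair in this representation.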
Participants
- Bogdan Dit, The College of William and Mary
- Andrew Holtzhauer, The College of William and Mary
- Denys Poshyvanyk, The College of William and Mary
- Huzefa Kagdi, Wichita State University
We gratefully acknowledge financial support from the NSF on this research project.