Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software - Online Appendix

This web page is a companion to our Empirical Software Engineering submission entitled "Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software".

Dit, B., Revelle, M., and Poshyvanyk, D., "Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software", Empirical Software Engineering (EMSE), accepted [site]

Data

	Eclipse 3.0	Rhino	jEdit 4.3
Corpora	Corpus-Eclipse3.0.zip	Corpus-Rhino.zip	Corpus-jEdit4.3.zip
Features	Eclipse 3.0 features	Rhino features	jEdit 4.3 features
Queries	Queries-Eclipse3.0.zip	Queries-Rhino.zip	Queries-jEdit4.3.zip

Download all the Eclipse 3.0 data (8.37MB), Rhino data (1.42MB) and jEdit 4.3 data (0.8MB).

Results

Download the data used to compute the effectiveness measure for the feature location techniques that combine information retrieval, dynamic information and web mining (IR+Dyn+WebMining), and for those that combine information retrieval, static information and web mining (IR+Static+WebMining), when filtering the top X% and the bottom X% of methods (X = 0, 10, …, 100):
EclipseIRDynStaticAllTopAndBottomFilters.xls
RhinoIRDynStaticAllTopAndBottomFilters.xls
jEditIRDynAllTopAndBottomFilters.xls - (No IR+Static+WebMining)

Notes:
- The numbers in the red worksheets represent the ranks of the methods in the gold set, after filtering X% of methods (X is denoted by the column header). Each red tab (that contains the raw data) has a corresponding tab which visualizes the data using box plots.
- Where the word "Binary" does not appear in a worksheet name, the results were obtained using frequency weights (as opposed to binary weights).
- "Dyn" stands for dynamic data extracted from traces, whereas "static" stands for data extracted from a static call graph.
- The suffix "BR" stands for "Best Rank" (i.e., for each feature we report the best rank of the methods from the gold set of that feature); the suffix "AR" stands for "All Ranks" (i.e., for each feature we report the ranks of all the methods from the gold set of that feature).
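
As a rough illustration of the notes above, the following Python sketch shows how the ranks after filtering, and the "BR" and "AR" summaries, might be computed for one feature. The function names, the list and dictionary representations, and the use of a single link-analysis score per method are assumptions made for illustration; they are not taken from the spreadsheets or from the tooling used in the paper.

```python
def filter_methods(ranked, scores, percent, side="top"):
    """Remove `percent`% of the methods and return the remaining ranked list.

    ranked:  method names ordered by IR relevance, best first (assumed input)
    scores:  method name -> link-analysis score (hypothetical representation)
    side:    "top" removes the highest-scoring methods, "bottom" the lowest
    """
    by_score = sorted(ranked, key=lambda m: scores[m], reverse=(side == "top"))
    cut = len(by_score) * percent // 100
    removed = set(by_score[:cut])
    # The surviving methods keep their relative IR order.
    return [m for m in ranked if m not in removed]


def all_ranks(ranked, gold):
    """AR: 1-based ranks of every gold-set method still present in `ranked`."""
    return [i for i, m in enumerate(ranked, start=1) if m in gold]


def best_rank(ranked, gold):
    """BR: the best (smallest) rank among the gold-set methods, or None."""
    ranks = all_ranks(ranked, gold)
    return min(ranks) if ranks else None


# Toy data, invented for this example only.
ranked = ["a", "b", "c", "d", "e"]
scores = {"a": 0.1, "b": 0.5, "c": 0.2, "d": 0.9, "e": 0.3}
gold = {"c", "e"}

kept = filter_methods(ranked, scores, 40, side="top")
print(kept, all_ranks(kept, gold), best_rank(kept, gold))
```

In this sketch, a gold-set method that gets filtered out simply has no rank; how the spreadsheets treat that case is not specified here.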
The results of comparing all the feature location techniques based on their effectiveness can be downloaded here:
EffectivenessEclipse3.0.xls
EffectivenessRhino.xls
EffectivenessjEdit4.3.xls
The ranks of all methods from the gold set for the standalone web mining feature location techniques can be downloaded here:
EclipseAllRanksStandaloneTechniques.xls
RhinoAllRanksStandaloneTechniques.xls
jEditAllRanksStandaloneTechniques.xls
The results of applying simple heuristics, such as filtering "getter" and "setter" methods, as well as filtering methods based on their fan-in values, can be downloaded here:
EclipseGetterSettersFanIn.xls
RhinoGetterSettersFanIn.xls
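
The kind of heuristic mentioned above could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the name-based getter/setter check, the `fan_in` dictionary, and the choice to drop methods whose fan-in exceeds a threshold are all hypothetical and are not confirmed by this page.

```python
def is_getter_or_setter(method_name):
    """Name-based check (an assumption): 'get' or 'set' followed by an uppercase letter."""
    for prefix in ("get", "set"):
        rest = method_name[len(prefix):]
        if method_name.startswith(prefix) and rest[:1].isupper():
            return True
    return False


def apply_heuristics(ranked, fan_in, max_fan_in):
    """Drop getters/setters and methods whose fan-in exceeds `max_fan_in`.

    ranked:     method names in ranked order (assumed input)
    fan_in:     method name -> number of callers (hypothetical representation)
    max_fan_in: hypothetical threshold; the direction of the cut is an assumption
    """
    return [
        m for m in ranked
        if not is_getter_or_setter(m) and fan_in.get(m, 0) <= max_fan_in
    ]


# Toy data, invented for this example only.
methods = ["getName", "parse", "setValue", "log", "settle"]
callers = {"parse": 2, "log": 50, "settle": 1}
print(apply_heuristics(methods, callers, max_fan_in=10))
```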

Participants

Bogdan Dit
E-mail: bdit at cs dot wm dot edu
Meghan Revelle
E-mail: meghan at cs dot wm dot edu
Denys Poshyvanyk
E-mail: denys at cs dot wm dot edu

We gratefully acknowledge financial support from the NSF on this research project.

Software Engineering Maintenance and Evolution Research Unit

at the College of William and Mary
