Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software - Online Appendix
This web page is a companion to our Empirical Software Engineering submission entitled "Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software".
Dit, B., Revelle, M., and Poshyvanyk, D., "Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software", Empirical Software Engineering (EMSE), accepted.
Data
Artifact | Eclipse 3.0 | Rhino | jEdit 4.3 |
---|---|---|---|
Corpora | Corpus-Eclipse3.0.zip | Corpus-Rhino.zip | Corpus-jEdit4.3.zip |
Features | Eclipse 3.0 features | Rhino features | jEdit 4.3 features |
Queries | Queries-Eclipse3.0.zip | Queries-Rhino.zip | Queries-jEdit4.3.zip |
Download all the data for Eclipse 3.0 (8.37 MB), Rhino (1.42 MB) and jEdit 4.3 (0.8 MB).
Results
Download the data used to compute the effectiveness measure for the feature location techniques that combine information retrieval, dynamic information and web mining (IR+Dyn+WebMining), and for those that combine information retrieval, static information and web mining (IR+Static+WebMining), when the top x% and the bottom x% of the methods are filtered (x = 0%, 10%, …, 100%).
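The filtering step described above can be illustrated with a minimal Python sketch. It assumes that a method is filtered when its web mining score (for example, a PageRank or HITS score computed over the call graph) falls in the top or bottom x% of all scored methods, and that the remaining methods keep their IR order. The function name `filter_by_web_mining` and its parameters are hypothetical and are not taken from the spreadsheets.

```python
import math


def filter_by_web_mining(ir_ranking, wm_scores, percent, which="top"):
    """Hypothetical sketch: drop the top or bottom `percent`% of methods,
    ordered by web mining score, from an IR ranking.

    ir_ranking: method names ordered from best to worst IR match.
    wm_scores:  method name -> web mining score (assumed precomputed).
    percent:    share of scored methods to filter, from 0 to 100.
    which:      "top" drops the highest-scored methods, "bottom" the lowest.
    """
    if which not in ("top", "bottom"):
        raise ValueError("which must be 'top' or 'bottom'")
    # Order scored methods from highest to lowest score; ties broken by name.
    by_score = sorted(wm_scores, key=lambda m: (-wm_scores[m], m))
    k = math.floor(len(by_score) * percent / 100)
    if which == "top":
        removed = set(by_score[:k])
    else:
        removed = set(by_score[len(by_score) - k:])
    # Surviving methods keep their relative IR order; methods without a
    # web mining score are never removed in this sketch.
    return [m for m in ir_ranking if m not in removed]


ranking = ["a", "b", "c", "d", "e"]
scores = {"a": 0.5, "b": 0.1, "c": 0.3, "d": 0.05, "e": 0.05}
print(filter_by_web_mining(ranking, scores, 20, "top"))     # ['b', 'c', 'd', 'e']
print(filter_by_web_mining(ranking, scores, 40, "bottom"))  # ['a', 'b', 'c']
```

Filtering a percentage rather than a fixed count lets the same x values (0%, 10%, …, 100%) be applied to systems of very different sizes.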
EclipseIRDynStaticAllTopAndBottomFilters.xls
RhinoIRDynStaticAllTopAndBottomFilters.xls
jEditIRDynAllTopAndBottomFilters.xls (IR+Dyn+WebMining only; no IR+Static+WebMining results)
Notes:
- The numbers in the red worksheets are the ranks of the gold set methods after x% of the methods have been filtered (x is given in the column header). Each red worksheet, which contains the raw data, has a corresponding worksheet that visualizes the data as box plots.
- If a worksheet name does not explicitly contain the word "Binary", the results were obtained with frequency weights (as opposed to binary weights).
- "Dyn" stands for dynamic data extracted from execution traces, whereas "Static" stands for data extracted from a static call graph.
- The suffix "BR" stands for "Best Rank" (i.e., for each feature we report the best rank among the methods in the gold set of that feature); the suffix "AR" stands for "All Ranks" (i.e., for each feature we report the ranks of all the methods in the gold set of that feature).
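To make the "Binary" versus frequency distinction concrete, here is a minimal Python sketch of a weighted PageRank, one link analysis algorithm, run over a call graph. It assumes that the weights are call-graph edge weights: a frequency weight counts how often a caller invokes a callee (for example, in an execution trace), while a binary weight only records that the call occurs. The function names, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions, not parameters taken from the spreadsheets.

```python
from collections import Counter, defaultdict


def call_graph_weights(calls, binary=False):
    """Map each (caller, callee) edge to a weight.

    calls:  iterable of (caller, callee) pairs, e.g. one pair per call
            event observed in an execution trace.
    binary: if True every edge gets weight 1; otherwise the weight is the
            number of times the pair occurs (frequency weight).
    """
    counts = Counter(calls)
    return {edge: 1 if binary else n for edge, n in counts.items()}


def weighted_pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank in which a node's rank flows along its
    outgoing edges in proportion to the edge weights."""
    nodes = sorted({node for edge in weights for node in edge})
    if not nodes:
        return {}
    out_total = defaultdict(float)  # sum of outgoing edge weights per node
    incoming = defaultdict(list)    # node -> [(caller, weight), ...]
    for (src, dst), w in weights.items():
        out_total[src] += w
        incoming[dst].append((src, w))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_total[v] == 0)
        rank = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] * w / out_total[u] for u, w in incoming[v])
                         + dangling / n)
            for v in nodes
        }
    return rank


trace = [("main", "parse"), ("main", "parse"), ("main", "eval"), ("parse", "eval")]
freq = weighted_pagerank(call_graph_weights(trace))
binary = weighted_pagerank(call_graph_weights(trace, binary=True))
print(round(freq["parse"], 3), round(binary["parse"], 3))
```

Because `main` calls `parse` twice, frequency weights send a larger share of `main`'s rank to `parse` than binary weights do, so the two weighting schemes can produce different scores and therefore different filtering outcomes.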
The results of comparing all the feature location techniques based on their effectiveness can be downloaded here:
EffectivenessEclipse3.0.xls
EffectivenessRhino.xls
EffectivenessjEdit4.3.xls
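The BR and AR values described in the notes, and a simple way to compare two techniques feature by feature, can be sketched in Python as follows. The sketch assumes that the effectiveness of a technique for a feature is the best (lowest) rank of any gold set method, consistent with the "BR" worksheets; the win/loss/tie count in `compare` is only an illustration and not necessarily the analysis used in the effectiveness spreadsheets. All function names are hypothetical.

```python
def all_ranks(ranking, gold_set):
    """AR: sorted 1-based ranks of the gold set methods found in `ranking`.
    Gold set methods missing from the ranking (e.g. filtered out) are skipped."""
    position = {method: i for i, method in enumerate(ranking, start=1)}
    return sorted(position[m] for m in gold_set if m in position)


def best_rank(ranking, gold_set):
    """BR: rank of the highest-ranked gold set method, or None if none is ranked."""
    ranks = all_ranks(ranking, gold_set)
    return ranks[0] if ranks else None


def compare(br_a, br_b):
    """Count (wins, losses, ties) of technique A against technique B over the
    features both map to a best rank; a lower rank is better."""
    wins = losses = ties = 0
    for feature in br_a.keys() & br_b.keys():
        a, b = br_a[feature], br_b[feature]
        if a is None or b is None:
            continue
        if a < b:
            wins += 1
        elif a > b:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties


ranking = ["m1", "m2", "m3", "m4", "m5"]
gold = {"m4", "m2"}
print(all_ranks(ranking, gold))  # [2, 4]
print(best_rank(ranking, gold))  # 2
print(compare({"f1": 2, "f2": 5, "f3": 3}, {"f1": 4, "f2": 1, "f3": 3}))  # (1, 1, 1)
```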
The ranks of all methods from the gold set for the standalone web mining feature location techniques can be downloaded here:
EclipseAllRanksStandaloneTechniques.xls
RhinoAllRanksStandaloneTechniques.xls
jEditAllRanksStandaloneTechniques.xls
The results of applying simple heuristics, such as filtering out "getter" and "setter" methods or filtering methods based on their fan-in values, can be downloaded here:
EclipseGetterSettersFanIn.xls
RhinoGetterSettersFanIn.xls
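The two heuristics can be sketched in Python as follows. The sketch assumes that getters and setters are recognized by name alone (`get`, `set` or `is` followed by an uppercase letter) and that fan-in is the number of distinct callers of a method in a list of (caller, callee) pairs. The regular expression, the `max_fan_in` threshold and the function names are hypothetical, not the criteria used to produce the spreadsheets.

```python
import re

# Hypothetical accessor naming rule: getX, setX or isX (simple method names).
ACCESSOR = re.compile(r"(get|set|is)[A-Z]")


def is_accessor(method_name):
    """Name-only getter/setter check; a stricter check would inspect the body."""
    return ACCESSOR.match(method_name) is not None


def fan_in(calls):
    """Number of distinct callers of each callee in (caller, callee) pairs."""
    callers = {}
    for caller, callee in calls:
        callers.setdefault(callee, set()).add(caller)
    return {callee: len(srcs) for callee, srcs in callers.items()}


def apply_heuristics(ranking, calls, max_fan_in):
    """Remove accessor-named methods and methods whose fan-in exceeds
    max_fan_in, keeping the remaining methods in ranking order."""
    fan_in_by_method = fan_in(calls)
    return [
        m for m in ranking
        if not is_accessor(m) and fan_in_by_method.get(m, 0) <= max_fan_in
    ]


ranking = ["getName", "parse", "setValue", "log", "isEmpty", "issue"]
calls = [("a", "log"), ("b", "log"), ("c", "log"), ("a", "parse")]
print(apply_heuristics(ranking, calls, max_fan_in=2))  # ['parse', 'issue']
```

Requiring an uppercase letter after the prefix keeps names such as `issue` from being mistaken for accessors.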
Participants
- Bogdan Dit
E-mail: bdit at cs dot wm dot edu
- Meghan Revelle
E-mail: meghan at cs dot wm dot edu
- Denys Poshyvanyk
E-mail: denys at cs dot wm dot edu
We gratefully acknowledge financial support from the NSF for this research project.