Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software - Online Appendix

This web page is a companion to our Empirical Software Engineering submission entitled "Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software".

Dit, B., Revelle, M., and Poshyvanyk, D., "Integrating Information Retrieval, Execution and Link Analysis Algorithms to Improve Feature Location in Software", Empirical Software Engineering (EMSE), accepted [site]

Data

          Eclipse 3.0              Rhino               jEdit 4.3
Corpora   Corpus-Eclipse3.0.zip    Corpus-Rhino.zip    Corpus-jEdit4.3.zip
Features  Eclipse 3.0 features     Rhino features      jEdit 4.3 features
Queries   Queries-Eclipse3.0.zip   Queries-Rhino.zip   Queries-jEdit4.3.zip

Results

  • Download the data used to compute the effectiveness measure for the feature location techniques that combine information retrieval, dynamic information and web mining (IR+Dyn+WebMining), as well as information retrieval, static information and web mining (IR+Static+WebMining), when we filter the top X% and the bottom X% of methods (X = 0, 10, …, 100).
    EclipseIRDynStaticAllTopAndBottomFilters.xls
    RhinoIRDynStaticAllTopAndBottomFilters.xls
    jEditIRDynAllTopAndBottomFilters.xls - (No IR+Static+WebMining)

    Notes:

    • The numbers in the red worksheets represent the ranks of the methods in the gold set after filtering X% of the methods (X is given by the column header). Each red tab, which contains the raw data, has a corresponding tab that visualizes the data using box plots.

    • Where the word "Binary" does not appear in a worksheet name, the frequency weights were used (as opposed to the binary weights).

    • "Dyn" stands for dynamic data extracted from traces, whereas "Static" stands for data extracted from a static call graph.

    • The suffix "BR" stands for "Best Rank" (i.e., for each feature we report the best rank of the methods from the gold set of that feature); the suffix "AR" stands for "All Ranks" (i.e., for each feature we report the ranks of all the methods from the gold set of that feature).

  • The results of comparing all the feature location techniques based on their effectiveness can be downloaded here:
    EffectivenessEclipse3.0.xls
    EffectivenessRhino.xls
    EffectivenessjEdit4.3.xls

  • The ranks of all methods from the gold set for the standalone web mining feature location techniques can be downloaded here:
    EclipseAllRanksStandaloneTechniques.xls
    RhinoAllRanksStandaloneTechniques.xls
    jEditAllRanksStandaloneTechniques.xls

  • The results of applying simple heuristics, such as filtering "getter" and "setter" methods, as well as methods based on their fan-in values, can be downloaded here:
    EclipseGetterSettersFanIn.xls
    RhinoGetterSettersFanIn.xls
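
The "BR" (Best Rank) and "AR" (All Ranks) suffixes described in the notes above, together with the idea of filtering methods before ranks are read off, can be illustrated with a minimal Python sketch. It is not the code used to produce the spreadsheets. The function names, the method identifiers and the toy data are all hypothetical, and the set of methods to filter is assumed to be supplied by the caller (for example, the top or bottom X% of methods according to some other ranking).

```python
from typing import List, Optional, Set


def filter_methods(ranked_methods: List[str], to_remove: Set[str]) -> List[str]:
    """Drop the given methods from a ranked list, preserving the order of the rest.

    Assumption: the caller has already decided which methods to filter.
    """
    return [m for m in ranked_methods if m not in to_remove]


def gold_set_ranks(ranked_methods: List[str], gold_set: Set[str]) -> List[int]:
    """'All Ranks' (AR): 1-based ranks of every gold-set method still in the list."""
    return [pos for pos, m in enumerate(ranked_methods, start=1) if m in gold_set]


def best_rank(ranks: List[int]) -> Optional[int]:
    """'Best Rank' (BR): the smallest rank, or None if no gold-set method remains."""
    return min(ranks) if ranks else None


# Hypothetical ranked list for one feature (most relevant method first).
ranked = ["m3", "m7", "m1", "m9", "m4"]
gold = {"m1", "m4"}

all_ranks = gold_set_ranks(ranked, gold)           # AR before filtering
best = best_rank(all_ranks)                        # BR before filtering

filtered = filter_methods(ranked, {"m3"})          # hypothetical filter set
all_ranks_filtered = gold_set_ranks(filtered, gold)
best_filtered = best_rank(all_ranks_filtered)

print(all_ranks, best)
print(all_ranks_filtered, best_filtered)
```

In this toy example, removing a method ranked above the gold-set methods moves each of them up by one position, which is the kind of effect the filtered worksheets record. If a filter removes a gold-set method, that method no longer contributes a rank in this sketch.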

Participants


We gratefully acknowledge financial support from the NSF on this research project.