Machine Learning-Based Detection of Open Source License Exceptions - ICSE 2017 Online Appendix

This web page is a companion to our ICSE paper entitled "Machine Learning-Based Detection of Open Source License Exceptions".


1. Data


License Files

The canonical license files: canonical licenses (the licenses can also be found at SPDX or the Open Source Initiative).

License Exceptions

The canonical license exceptions: canonical exceptions (the license exceptions can also be found at SPDX).

Classifier Datasets

Synthetic Datasets - the training/testing datasets for each sample size: Synthetic Datasets (compressed). (Warning: ~2.5GB compressed; decompresses to ~11GB.)

Real Dataset - the dataset built from real exceptions: real dataset. (Note: the empty folders are required by Weka to represent the classes. Without them, the results do not align properly in the output.)
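The note about empty folders can be illustrated with a minimal sketch. It assumes a one-folder-per-class layout (the kind of directory structure that Weka's TextDirectoryLoader reads, where each subdirectory name is a class label); the appendix does not state which loader was used, and the function name `ensure_class_folders` is hypothetical.

```python
from pathlib import Path


def ensure_class_folders(dataset_root, class_names):
    """Make sure every class has a folder under dataset_root.

    Folders for classes without any sample are created empty, so the
    set of classes stays the same across datasets. Returns the names
    of the folders that had to be created.
    """
    root = Path(dataset_root)
    created = []
    for name in class_names:
        folder = root / name
        if not folder.is_dir():
            # parents=True also creates dataset_root if it is missing.
            folder.mkdir(parents=True)
            created.append(name)
    return created
```

Calling the function with the full list of class labels before loading a dataset keeps the class set identical between the synthetic and real data, which is what lets the per-class output columns line up.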

2. Results

Classifiers

Raw result outputs from the classifiers: synthetic results and real results.

Synthetic Results

F-Measure vs dataset size for machine learners and baseline approach: pdf and png.




Confidence intervals (95%) with mean F-Measure for machine learners and baseline: pdf and png.
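As a rough illustration of what a 95% confidence interval around a mean F-Measure involves, the sketch below uses a normal approximation over a list of per-run scores. This is an assumption made for illustration only: the appendix does not say how the plotted intervals were computed, and the function name `mean_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation.

    Assumes the values are independent scores from repeated runs.
    For small samples a Student's t interval would be wider.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(values) / sqrt(n)
    return m, m - half_width, m + half_width
```

The same computation applies unchanged to the mean ROC values; only the input list differs.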




ROC vs dataset size for machine learners (baseline omitted as it has 0 FPR): pdf and png.




Confidence intervals (95%) with mean ROC for machine learners: pdf and png.




True Positive Rate and False Positive Rate Data (Note: Baseline does not yield false positives and it is omitted since FPR = 0): TP.csv and FP.csv.
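For readers who want to recompute the metrics, the standard definitions can be sketched as follows. The function name `classification_rates` and the counts in the usage line are hypothetical; the layout of TP.csv and FP.csv is not described here, so the sketch takes raw confusion-matrix counts rather than parsing those files.

```python
def classification_rates(tp, fp, tn, fn):
    """Compute TPR, FPR and F-Measure from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + tpr:
        f_measure = 2 * precision * tpr / (precision + tpr)
    else:
        f_measure = 0.0
    return {"tpr": tpr, "fpr": fpr, "f_measure": f_measure}


# Hypothetical counts for an approach that never yields a false positive.
print(classification_rates(tp=8, fp=0, tn=10, fn=2))
```

With `fp=0` the false positive rate is 0 regardless of the other counts, which is why an approach with no false positives adds no information to an FPR-based plot.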

Real Results

F-Measure vs dataset size for machine learners and baseline approach: pdf and png.




Confidence intervals (95%) with mean F-Measure for machine learners and baseline: pdf and png.




ROC vs dataset size for machine learners (baseline omitted as it has 0 FPR): pdf and png.




Confidence intervals (95%) with mean ROC for machine learners: pdf and png.






Authors

  • Christopher Vendome - The College of William and Mary, VA, USA.
    E-mail: cvendome at cs dot wm dot edu
  • Mario Linares-Vásquez - Universidad de los Andes, Bogota, Colombia.
    E-mail: m dot linaresv at uniandes dot edu dot co
  • Gabriele Bavota - Free University of Bolzano, Italy.
    E-mail: gabriele.bavota at unibz dot it
  • Massimiliano Di Penta - University of Sannio, Benevento, Italy.
    E-mail: dipenta at unisannio dot it
  • Daniel German - University of Victoria.
    E-mail: dmg at cs dot uvic dot ca
  • Denys Poshyvanyk - The College of William and Mary.
    E-mail: denys at cs dot wm dot edu