License Usage and Changes: A Large-Scale Study on GitHub - EMSE Special Issue Online Appendix

This web page is a companion to our EMSE paper entitled "License Usage and Changes: A Large-Scale Study on GitHub". This work extends our prior ICPC'15 paper.


1. Data


GitHub Projects

The list of Java projects and their corresponding GitHub URLs is available as a CSV file.

Commit and Issue Tracker Data


Sampled Commit Notes

The Java commit notes are contained in the following archive: Java Commits. Each file in the archive corresponds to one commit note.

The commit notes below are formatted as: [project],[commit_hash],[commit_note]

The Java commit notes corresponding to commits with atomic license changes are contained in the following CSV file: Java License Changes Commits

The C commit notes are in the following CSV file: C Commits

The C++ commit notes are in the following CSV file: C++ Commits

The C# commit notes are in the following CSV file: C# Commits

The JavaScript commit notes are in the following CSV file: Javascript Commits

The Python commit notes are in the following CSV file: Python Commits

The Ruby commit notes are in the following CSV file: Ruby Commits
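
The [project],[commit_hash],[commit_note] layout described above could be read with a short Python helper such as the sketch below. It is illustrative only and not part of the study's tooling. It assumes one record per line, no quoting, and that only the commit note (the last field) may itself contain commas. Multi-line notes are not handled. The names CommitNote and parse_commit_notes are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class CommitNote(NamedTuple):
    """One record in the assumed [project],[commit_hash],[commit_note] layout."""
    project: str
    commit_hash: str
    note: str


def parse_commit_notes(lines: Iterable[str]) -> Iterator[CommitNote]:
    """Yield one CommitNote per well-formed line.

    Assumption: fields are unquoted and only the note may contain commas,
    so each line is split on the first two commas only.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(",", 2)
        if len(parts) != 3:
            continue  # skip lines that do not have all three fields
        project, commit_hash, note = parts
        yield CommitNote(project.strip(), commit_hash.strip(), note)
```

For example, passing an open file handle for one of the CSV files to parse_commit_notes yields the records one at a time. Splitting on the first two commas only keeps any commas inside the note text intact.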

Sampled Issue Tracker URLs

The C issue tracker URLs are in the following CSV file: C Issues

The C++ issue tracker URLs are in the following CSV file: C++ Issues

The C# issue tracker URLs are in the following CSV file: C# Issues

The Java issue tracker URLs are in the following CSV file: Java Issues

The JavaScript issue tracker URLs are in the following CSV file: Javascript Issues

The Python issue tracker URLs are in the following CSV file: Python Issues

The Ruby issue tracker URLs are in the following CSV file: Ruby Issues

2. Scripts

Analysis Scripts

RQ1: The following script and text file were used to generate our license usage data: License Usage Script and License List (this list was extracted from our dataset)

Usage: >./lic_usage.sh [ProjectsList] [LicenseList] [PathToProjectResultsFolder]

Note: The script requires bash 4, so it cannot be run using "sh".

RQ2: The following script was used to generate the atomic license changes: Generate Sequences

Usage: >python AtomicLicenseChanges.py [MARKOS_result_file] [output_file]


3. Results

MARKOS License Analyzer Results

The raw MARKOS License Analyzer results for our Java dataset can be found in the following archive: MARKOS Results (~4 GB).



Authors

  • Christopher Vendome - The College of William and Mary, VA, USA.
    E-mail: cvendome at cs dot wm dot edu
  • Gabriele Bavota - Free University of Bolzano, Italy.
    E-mail: gabriele.bavota at unibz dot it
  • Massimiliano Di Penta - University of Sannio, Benevento, Italy.
    E-mail: dipenta at unisannio dot it
  • Mario Linares-Vásquez - The College of William and Mary, VA, USA.
    E-mail: mlinarev at cs dot wm dot edu
  • Daniel German - University of Victoria.
    E-mail: dmg at cs dot uvic dot ca
  • Denys Poshyvanyk - The College of William and Mary.
    E-mail: denys at cs dot wm dot edu