License Usage and Changes: A Large-Scale Study of Java Projects on GitHub - ICPC 2015 Online Appendix
This web page is a companion to our ICPC paper entitled "License Usage and Changes: A Large-Scale Study of Java Projects on GitHub".
1. Data
GitHub Projects
The list of projects and their corresponding GitHub URLs is available as a CSV file.
Commit and Issue Tracker Data
For access to mined commit messages and issue tracker discussion data, please send us an email.
2. Scripts
Analysis Scripts
RQ1: The following script and text file were used to generate our license usage data: License Usage Script and License List (this list was extracted from our dataset)
Usage: >./lic_usage.sh [ProjectsList] [LicenseList] [PathToProjectResultsFolder]
Note: The script requires bash 4, so it cannot be run using "sh".
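The bash 4 requirement can be checked before launching the script. The following is a minimal sketch, not part of the original scripts: the helper names `bash_major_version` and `is_supported_bash` are hypothetical and exist only for illustration.

```shell
# Hypothetical helpers for illustration; not part of lic_usage.sh.

# Prints the major version of the running bash, or 0 under another shell
# (where BASH_VERSION is unset).
bash_major_version() {
    major="${BASH_VERSION%%.*}"
    echo "${major:-0}"
}

# Succeeds only when the given major version is 4 or newer.
is_supported_bash() {
    [ "$1" -ge 4 ]
}

if is_supported_bash "$(bash_major_version)"; then
    echo "bash version OK"
else
    echo "bash 4 or newer is required" >&2
fi
```

If the check passes, invoking the script explicitly through bash, as in `bash ./lic_usage.sh [ProjectsList] [LicenseList] [PathToProjectResultsFolder]`, avoids accidentally running it under "sh".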
RQ2: The following script was used to generate the atomic license changes: Generate Sequences
Usage: >python AtomicLicenseChanges.py [MARKOS_result_file] [output_file]
3. Results
Grounded Theory Taxonomy: The following table lists the categories from our Grounded Theory analysis, each with an explanation:
Categories | Explanation |
---|---|
Adjust Licensing | Made a minor modification to the license |
Compliance | Changes to licensing to improve license compliance |
Copyright Update | Changes to copyright header (subcategories: updated year and updated authors) |
Dependency License Added | Adding a license due to dependency constraint |
Dependency License Issue | Licensing issue introduced by a dependency |
False Positive | Unrelated Messages |
Fix Licensing | Fixed an error in the licensing or incorrect licensing |
License Added | States that some license was attributed |
License Declaration | Declares license in message |
License Rollback | Related to reverting to some previously used license |
License Change | Generic discussion of changing license |
Removed Feature for Compliance | Remove code due to conflicting licenses |
Removed Licensing | Licensing removed (subcategories: removed dual-license and licensed files removed) |
Renamed License File | License file was renamed, but no change to licensing |
Textual License Update | Updated the text of the license, but no license change |
Unclear | Possibly related to licensing, but its relationship is unclear (possibly a false positive) |
Update License | License file updated (generic update to license) |
Verify Licensing | Making sure the licensing was correct |
License Clarification | Discussion asking about the actual license of the project |
Reuse | Discussion related to reusing the code from some project |
Choosing License | Discussion related to picking the project's license |
License Compatibility | Discussion related to conflicting licensing or about the compatibility of two licenses |
License Agreement | Discussion related to a Contributor License Agreement for someone new to contribute to the project |
License Clarifications | Discussion related to the actual terms/implications of the license |
MARKOS License Analyzer Results
For access to the results of the MARKOS License Analyzer, please send us an email.
Authors
- Christopher Vendome - The College of William and Mary, VA, USA.
  E-mail: cvendome at cs dot wm dot edu
- Mario Linares-Vásquez - The College of William and Mary, VA, USA.
  E-mail: mlinarev at cs dot wm dot edu
- Gabriele Bavota - Free University of Bolzano, Italy.
  E-mail: gabriele.bavota at unibz dot it
- Massimiliano Di Penta - University of Sannio, Benevento, Italy.
  E-mail: dipenta at unisannio dot it
- Daniel German - University of Victoria.
  E-mail: dmg at cs dot uvic dot ca
- Denys Poshyvanyk - The College of William and Mary.
  E-mail: denys at cs dot wm dot edu