License Usage and Changes: A Large-Scale Study of Java Projects on GitHub - ICPC 2015 Online Appendix

This web page is a companion to our ICPC paper entitled "License Usage and Changes: A Large-Scale Study of Java Projects on GitHub".

1. Data

GitHub Projects

The list of projects and their corresponding GitHub URLs is available as a CSV file.

Commit and Issue Tracker Data

For access to mined commit messages and issue tracker discussion data, please send us an email.

2. Scripts

Analysis Scripts

RQ1: The following script and text file were used to generate our license usage data: License Usage Script and License List (this list was extracted from our dataset)

Usage: >./ [ProjectsList] [LicenseList] [PathToProjectResultsFolder]

Note: The script requires bash 4 and therefore cannot be run using "sh".
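
To illustrate the kind of tally that license usage data implies, here is a minimal sketch in Python. It is not the script linked above: the input layout (a mapping from project name to the license labels detected in it), the function name tally_license_usage, and all project and license names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def tally_license_usage(project_licenses, known_licenses):
    """Count, for each known license, how many projects use it at least once.

    project_licenses: dict mapping a project name to a list of license labels
                      (hypothetical layout, not the format used by the study).
    known_licenses:   iterable of license labels to report on.
    """
    known = set(known_licenses)
    usage = Counter()
    for licenses in project_licenses.values():
        # A project counts once per license, however many files carry it.
        for name in set(licenses) & known:
            usage[name] += 1
    return usage


# Hypothetical example data.
projects = {
    "alice/foo": ["Apache-2.0", "Apache-2.0", "MIT"],
    "bob/bar": ["GPL-3.0"],
    "carol/baz": ["MIT", "Unknown"],
}
licenses = ["Apache-2.0", "GPL-3.0", "MIT"]

usage = tally_license_usage(projects, licenses)
for name in licenses:
    print(name, usage[name])
```

Labels that are not in the license list (such as "Unknown" in the example) are ignored rather than counted.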

RQ2: The following script was used to generate the atomic license changes: Generate Sequences

Usage: >python [MARKOS_result_file] [output_file]
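
As a rough illustration of the idea, the sketch below derives license changes from the commit-ordered license history of a single file. It is not the script linked above, and it does not read the MARKOS result format, which this page does not describe. It assumes that an atomic license change is a pair of consecutive snapshots of the same file whose license labels differ; the function name and the labels are hypothetical.

```python
def atomic_changes(history):
    """Return (old, new) pairs for consecutive snapshots whose license differs.

    history: list of license labels for one file, ordered by commit
             (an assumed in-memory layout, not the MARKOS output format).
    """
    return [(old, new) for old, new in zip(history, history[1:]) if old != new]


# Hypothetical history of one file across four commits.
history = ["None", "GPL-2.0", "GPL-2.0", "Apache-2.0"]

for old, new in atomic_changes(history):
    print(old, "->", new)
```

Consecutive snapshots with the same label produce no pair, so an unchanged file yields an empty list.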

3. Results

Grounded Theory Taxonomy: The following table lists the categories from our Grounded Theory analysis, each with an explanation:

Category                        Explanation
Adjust Licensing                Made a minor modification to the license
Compliance                      Changes to licensing to improve license compliance
Copyright Update                Changes to copyright header (subcategories: updated year and updated authors)
Dependency License Added        Adding a license due to a dependency constraint
Dependency License Issue        Licensing issue introduced by a dependency
False Positive                  Unrelated messages
Fix Licensing                   Fixed an error in the licensing or incorrect licensing
License Added                   States that some license was attributed
License Declaration             Declares license in message
License Rollback                Related to reverting to some previously used license
License Change                  Generic discussion of changing license
Removed Feature for Compliance  Removed code due to conflicting licenses
Removed Licensing               Licensing removed (subcategories: removed dual-license and licensed files removed)
Renamed License File            License file was renamed, but no change to licensing
Textual License Update          Updated the text of the license, but no license change
Unclear                         Possibly related to licensing, but its relationship is unclear (possibly a false positive)
Update License                  License file updated (generic update to license)
Verify Licensing                Making sure the licensing was correct
License Clarification           Discussion asking about the actual license of the project
Reuse                           Discussion related to reusing the code from some project
Choosing License                Discussion related to picking the project's license
License Compatibility           Discussion related to conflicting licensing or about the compatibility of two licenses
License Agreement               Discussion related to a Contributor License Agreement for new contributors to the project
License Clarifications          Discussion related to the actual terms/implications of the license

MARKOS License Analyzer Results

For access to the results of the MARKOS License Analyzer, please send us an email.


  • Christopher Vendome - The College of William and Mary, VA, USA.
    E-mail: cvendome at cs dot wm dot edu
  • Mario Linares-Vásquez - The College of William and Mary, VA, USA.
    E-mail: mlinarev at cs dot wm dot edu
  • Gabriele Bavota - Free University of Bolzano, Italy.
    E-mail: gabriele.bavota at unibz dot it
  • Massimiliano Di Penta - University of Sannio, Benevento, Italy.
    E-mail: dipenta at unisannio dot it
  • Daniel German - University of Victoria.
    E-mail: dmg at cs dot uvic dot ca
  • Denys Poshyvanyk - The College of William and Mary.
    E-mail: denys at cs dot wm dot edu