TY - DATA
T1 - Data underlying the BSc project: "An analysis of Java release practices on GitHub"
PY - 2024/01/29
AU - Vivian Roest
UR -
DO - 10.4121/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d.v1
KW - GitHub
KW - Java
KW - POM.xml
KW - Maven
KW - Scraper
N2 -

This dataset contains the following inside a tar.zst file:

  1. A list of all Java repositories on GitHub, in CSV format
  2. The POM.xml file from each of those repositories that has one at the root of the repo
  3. A sample of 500 000 repositories that have been searched recursively for POM.xml files
  4. Of those that have a POM.xml file, an 'effective' POM.xml generated for each
  5. Of those that have distribution repositories configured, their GitHub workflow files, if they exist
  6. A report.json file that contains aggregate information about the sample
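
As an illustration of what "distribution repositories configured" can mean for a POM.xml, the following minimal Python sketch lists the URLs declared under Maven's `<distributionManagement>` element. It is not taken from the included scraper. The function name `distribution_repositories`, the sample POM, and the example.com URL are hypothetical, and the sketch assumes the POM declares the standard Maven namespace.

```python
import xml.etree.ElementTree as ET

# Default XML namespace declared by standard Maven POM files.
# Assumption: POMs without this namespace are not handled here.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def distribution_repositories(pom_xml: str) -> list[str]:
    """Return the URLs listed under <distributionManagement>, if any."""
    root = ET.fromstring(pom_xml)
    urls = []
    # Maven allows a release repository and a snapshot repository.
    for tag in ("repository", "snapshotRepository"):
        node = root.find(f"m:distributionManagement/m:{tag}/m:url", POM_NS)
        if node is not None and node.text:
            urls.append(node.text.strip())
    return urls


# Hypothetical POM used only for illustration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>1.0.0</version>
  <distributionManagement>
    <repository>
      <id>releases</id>
      <url>https://repo.example.com/releases</url>
    </repository>
  </distributionManagement>
</project>"""

print(distribution_repositories(SAMPLE_POM))
```

An effective POM is a natural input for a check like this, because it resolves settings inherited from parent POMs that the raw file does not show.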


The scraper written to retrieve this data is also included.


This dataset was created for a Computer Science Bachelor Research Project titled "An analysis of Java release practices on GitHub" by Vivian Roest.

ER -