%0 Computer Program %A Roest, Vivian %D 2024 %T Data underlying the BSc project: "An analysis of Java release practices on GitHub" %U %R 10.4121/67a790fe-b65a-4c30-aae0-c5b2dc7e5d4d.v1 %K GitHub %K Java %K POM.xml %K Maven %K Scraper %X

This dataset contains the following, packaged in a tar.zst archive:

  1. A list of all Java repositories on GitHub in a CSV format
  2. The POM.xml file at the root of each of those repositories, where one exists
  3. A sample of 500 000 repositories that have been searched recursively for POM.xml files
  4. For the sampled repositories that contain a POM.xml file, an 'effective' POM.xml
  5. For those with distribution repositories configured, their GitHub workflow files, if they exist
  6. A report.json file that contains aggregate information about the sample
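The filter on configured distribution repositories can be illustrated with a minimal sketch. It assumes, since the record does not state the exact criterion, that "distribution repositories configured" means the effective POM (for example, the output of Maven's `help:effective-pom` goal) declares a `<repository>` or `<snapshotRepository>` under `<distributionManagement>`. The function name `has_distribution_repository` is hypothetical and is not taken from the included scraper.

```python
import xml.etree.ElementTree as ET

# Default namespace declared by most Maven POM files.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def has_distribution_repository(pom_xml: str) -> bool:
    """Hypothetical check: True if the POM declares a release or snapshot
    repository under <distributionManagement> (assumed criterion)."""
    root = ET.fromstring(pom_xml)
    # Try the Maven namespace first, then un-namespaced POMs.
    for ns in (POM_NS, ""):
        dm = root.find(f"{ns}distributionManagement")
        if dm is not None:
            return (dm.find(f"{ns}repository") is not None
                    or dm.find(f"{ns}snapshotRepository") is not None)
    return False
```

Running the check on the effective POM rather than on the raw POM.xml matters because `<distributionManagement>` is often inherited from a parent POM and would not appear in the child file itself.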


The scraper written to retrieve this data is also included.


This dataset was created for a Computer Science Bachelor Research Project titled "An analysis of Java release practices on GitHub" by Vivian Roest.

%I 4TU.ResearchData