The Maven Dependency Dataset
datasetposted on 10.01.2013 by Steven Raemaekers, A. (Arie) van Deursen, Joost Visser
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
The Maven Dependency Dataset contains the data as described in the paper "Mining Metrics, Changes and Dependencies from the Maven Dependency Dataset". NOTE: See the README.TXT file for more information on the data in this dataset. The dataset consists of multiple parts: A snapshot of the Maven repository dated July 30, 2011 (maven.tar.gz), a MySQL database (complete.tar.gz) containing information on individual methods, classes and packages of different library versions, a Berkeley DB database (berkeley.tar.gz) containing metrics on all methods, classes and packages in the repository, a Neo4j graph database (graphdb.tar.gz) containing a call graph of the entire repository, scripts and analysis files (scriptsAndData.tar.gz), Source code and a binary package of the analysis software (fullmaven.jar and fullmaven-sources.jar), and text dumps of data in these databases (graphdump.tar.gz, processed.tar.gz, calls.tar.gz and units.tar.gz).