Benchmarking logs to test scalability of process discovery algorithms
Dataset posted on 12.10.2017 by Wil van der Aalst
The included event logs are intended to support the evaluation of the performance of process discovery algorithms. The largest event logs in this data set have millions of events. If you need even larger logs, you can generate them yourself using the included CPN Tools source files (*.cpn). Each file has two parameters: nofcases (i.e., the number of process instances) and nofdupl (i.e., the number of times a process is replicated with unique new names).
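To give a feel for how the two parameters could affect log size, the following Python sketch builds a toy log. It is not the CPN model itself: the base activities, the name suffix scheme, the function generate_log, and the assumption that every case passes through each replica in turn are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical base process; the real activities are defined in the *.cpn files.
BASE_ACTIVITIES = ["a", "b", "c"]


def generate_log(nofcases: int, nofdupl: int) -> List[Tuple[str, str]]:
    """Return (case_id, activity) events for a toy log.

    Assumption: each case passes through every replica in turn, and each
    replica renames the base activities with a unique numeric suffix.
    """
    events = []
    for case in range(1, nofcases + 1):
        for dupl in range(1, nofdupl + 1):
            for activity in BASE_ACTIVITIES:
                events.append((f"case_{case}", f"{activity}_{dupl}"))
    return events


log = generate_log(nofcases=2, nofdupl=3)
print(len(log))                      # 2 cases * 3 replicas * 3 activities = 18
print(len({act for _, act in log}))  # 3 replicas * 3 activities = 9 distinct names
```

Under these assumptions, nofcases scales the number of events while nofdupl scales both the number of events and the number of distinct activity names, which is the dimension that typically stresses discovery algorithms.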