Data underlying the publication: A Ground Truth Approach for Assessing Process Mining Techniques
DOI: 10.4121/bc43e334-74e1-44ff-abf1-ed32847250c9
Dataset
Licence CC BY-NC 4.0
This folder contains a synthetically generated dataset (process models and event logs) for a synthetically designed package delivery process, as described in [1]. Each event log is a simulation of a process model with one incorporated issue. The issue is either a behavioral deviation, where the process is executed differently from the expected behavior described by the process model, or a recording error, where the execution is recorded differently from how it actually took place. Each issue is added to the process model through a model transformation, which provides ground truth for the discrepancies introduced in the simulated event log.
The package delivery process starts with the choice of home or depot delivery, after which the package queues for a warehouse employee to pick it and load it into a van. In case of home delivery, a courier drives off and rings at the door, after which the courier either hands over the package immediately or, after registration, delivers it to the corresponding depot, where it is left for collection. For depot delivery, "ringing", and therefore also "deliver at home", is omitted from the subprocess.
models/delivery_base_model.json contains the specification of the process model that incorporates this "expected behavior", and is depicted in models/delivery_base_model.pdf.
On top of this, six patterns of behavioral deviations (BI) and six patterns of recording errors (RI) are applied to the base model:
BI5: Overtaking in the FIFO queue for picking packages;
BI7: Switching roles from a courier to that of a warehouse employee;
BI10: Batching is ignored, leaving with a delivery van before it was fully loaded;
BI3: Skipping the activity of ringing, modeling behavior where, e.g., the door was already open upon arrival;
BI9: Different resource memory where the package is delivered to a different depot than where it is registered;
BI2: Multitasking of couriers during the delivery of multiple packages, modeling interruption of a delivery;
RI1: Incorrect event, recording an order for depot delivery when it was intended for home delivery;
RI2: Incorrect event, vice versa, i.e., recording an order for home delivery when it was intended for depot delivery;
RI3: Missing event for the activity of loading a package in a truck;
RI4: Missing object of the involved van for loading, e.g., due to a temporary connection failure of a recording device;
RI5: Incorrect object of the involved courier when ringing, e.g., due to not logging out by the courier on the previous shift;
RI6: Missing positions for the recording of the delivery and the collection at a depot, e.g., due to coarse timestamp logging.
The behavior of each deviation pattern is added separately to the base model, resulting in twelve process models, accordingly named models/package_delivery_<deviation>.json.
Each model is simulated, resulting in twelve logs, accordingly named logs/package_delivery_<deviation>.json. Each log is a partially ordered set of transition firings. The firings are given as the list M, and the partial order relation is given by the matrix r, such that r[i][j] = 1 iff M[i] < M[j]. A transition firing in M is formatted as follows: [transition_name, transition_label, binding, subtracted_marking, added_marking, timestamp]. Note that the event log itself consists only of the labels of the labeled transition firings, i.e., those with transition_label != null. The complete execution of the process model, including transition names, provides the ground truth for which issues are introduced in the simulated event log.
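The log format described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader (their own scripts are in the repository linked below). It assumes that M and r are stored under top-level JSON keys named "M" and "r", which is not confirmed here. The transition names, activity labels, and empty binding/marking values in the miniature log are hypothetical placeholders.

```python
import json

# Hypothetical miniature log. The top-level keys "M" and "r" are an assumption,
# and the names, labels, and empty bindings/markings are placeholders.
raw = json.dumps({
    "M": [
        ["t_order", "order home delivery", {}, {}, {}, 0.0],
        ["t_silent", None, {}, {}, {}, 1.0],  # unlabeled firing (null label)
        ["t_pick", "pick package", {}, {}, {}, 2.0],
    ],
    "r": [
        [0, 1, 1],
        [0, 0, 1],
        [0, 0, 0],
    ],
})

# Field order of one transition firing, as given in the description above.
FIELDS = ("transition_name", "transition_label", "binding",
          "subtracted_marking", "added_marking", "timestamp")


def load_firings(text):
    """Parse a log and return (list of firing dicts, order matrix r)."""
    data = json.loads(text)
    firings = [dict(zip(FIELDS, firing)) for firing in data["M"]]
    return firings, data["r"]


def labeled_indices(firings):
    """Indices of firings that appear in the event log (label is not null)."""
    return [i for i, f in enumerate(firings) if f["transition_label"] is not None]


def precedes(r, i, j):
    """True iff M[i] < M[j] in the partial order, i.e. r[i][j] = 1."""
    return r[i][j] == 1


firings, r = load_firings(raw)
visible = labeled_indices(firings)
event_labels = [firings[i]["transition_label"] for i in visible]
```

Keeping the original indices in `visible`, rather than building a filtered list, means that `precedes(r, i, j)` can still be queried for the labeled firings without re-indexing the matrix.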
All models and corresponding generated logs with the applied patterns are also available at gitlab.com/dominiquesommers/mira/-/tree/main/mira/simulation, which additionally includes scripts to load and process the data.
We refer to [1] for more information on the dataset.
[1] Dominique Sommers, Natalia Sidorova, Boudewijn F. van Dongen. A ground truth approach for assessing process mining techniques. arXiv preprint, https://doi.org/10.48550/arXiv.2501.14345, 2025.
History
- 2025-02-04 first online, published, posted
Publisher
4TU.ResearchData
Format
application/json
Organizations
TU Eindhoven, Department of Mathematics and Computer Science
DATA
Files (3)
- readme.txt - 4,125 bytes - MD5: 3e5b5678948174a6996d94ca4e82af6b
- logs.zip - 14,718 bytes - MD5: 2a37fbee7eb70e2614547bda334fd11a
- models.zip - 562,745 bytes - MD5: 61b0e15c66c4aa285a34c9011df37393
Total: 581,588 bytes unzipped
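The MD5 checksums listed above can be used to check downloaded files. The following is a minimal sketch using Python's standard hashlib module. It assumes that each checksum in the listing belongs to the file name that follows it (readme.txt, logs.zip, models.zip, in that order). The helper names md5_of_file and verify are illustrative and not part of the dataset.

```python
import hashlib

# Checksums from the file listing. The pairing of checksum and file name is an
# assumption based on the order in which they appear in the listing.
EXPECTED_MD5 = {
    "readme.txt": "3e5b5678948174a6996d94ca4e82af6b",
    "logs.zip": "2a37fbee7eb70e2614547bda334fd11a",
    "models.zip": "61b0e15c66c4aa285a34c9011df37393",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """True iff the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

For example, after downloading logs.zip into the working directory, `verify("logs.zip", EXPECTED_MD5["logs.zip"])` should return True if the file is intact and the assumed pairing is correct.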