Introduction:
Generated Petri Net Markup Language (PNML) models and log traces (in XES)
Using PTandLogGenerator[1], 4320 Petri net models with specific characteristics
were generated (in PNML), along with a noisy log trace (in XES) per model.
Generation:
The models and logs were generated as follows:
- We specify the number of distinct activities in the Petri net to be on
average 25, 50, or 75 (ACT). The resulting Petri nets contain on average
89, 256, or 442 transitions and 84, 242, or 410 places, respectively.
- We set the process operators to what is considered a standard (STD) setting
[1]. The operators are as follows:
  - the probability for sequence operators: 45% (SEQ),
  - the probability for XOR operators: 20% (XOR),
  - the probability for parallel operators: 20% (PAR),
  - the probability for loop operators: 10% (LOOP),
  - the probability for OR operators: 5% (OR).
- We also consider variants in which the probability for sequence operators is
25% instead of 45% and the probability for one of the other operators is
increased by 20 percentage points (so that the total remains 100%):
  - the probability for XOR: 40% (+XOR),
  - the probability for parallel: 40% (+PAR),
  - the probability for loop: 30% (+LOOP),
  - the probability for OR: 25% (+OR).
We also consider another variant (ALT) with sequence, parallel, XOR, loop,
and OR respectively set to 46%, 35%, 19%, 0%, 0%, to resemble standard models
without loops [2], [3].
- We also set additional features, for which the standard (STD) setting is as
follows:
  - the probability for silent activities: 20% (SIL),
  - the probability for duplicate activities: 20% (DUP),
  - the probability for long-term dependencies: 20% (LONG).
For each of these parameters, we consider a variant with one feature set to
0% (–SIL, –DUP, –LONG) and a variant with one feature set to 50% (+SIL, +DUP,
+LONG). We only use these variants for standard process operators and use the
standard additional features setting when we change one of the process
operator settings, such that only one aspect is changed.
With the above we obtain 3 x (6+6) = 36 different settings: three activity
counts, each combined with six process operator settings and six additional
feature variants. For each setting, we generate four models and a single
random log trace per model. To the four log traces we add 10%, 30%, 50%, and
70% noise, respectively, by adding, removing, and swapping events. This gives
36 x 4 = 144 different models and log traces. We repeat this procedure 30
times (EXP) to obtain 144 x 30 = 4320 different experiments.
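The counts above can be reproduced with a short sketch. The dictionary and
function names below are illustrative and not part of PTandLogGenerator; only
the percentages and counts come from the description above. Operator tuples
are written in the order SEQ, XOR, PAR, LOOP, OR.

```python
from itertools import product

# Operator settings as (SEQ, XOR, PAR, LOOP, OR) percentages.
OPERATORS = {
    "STD": (45, 20, 20, 10, 5),
    "+XOR": (25, 40, 20, 10, 5),
    "+PAR": (25, 20, 40, 10, 5),
    "+LOOP": (25, 20, 20, 30, 5),
    "+OR": (25, 20, 20, 10, 25),
    "ALT": (46, 19, 35, 0, 0),
}

# Additional feature settings as (SIL, DUP, LONG) percentages.
STD_FEATURES = (20, 20, 20)
FEATURE_VARIANTS = {
    "-SIL": (0, 20, 20),
    "+SIL": (50, 20, 20),
    "-DUP": (20, 0, 20),
    "+DUP": (20, 50, 20),
    "-LONG": (20, 20, 0),
    "+LONG": (20, 20, 50),
}

ACTIVITIES = (25, 50, 75)   # ACT
NOISE = (10, 30, 50, 70)    # one model and log trace per noise level
REPETITIONS = 30            # EXP


def settings():
    """Return all (operators, features, activities) settings.

    Only one aspect changes at a time: operator variants use the standard
    features, and feature variants use the standard operators.
    """
    combos = [(ops, STD_FEATURES) for ops in OPERATORS.values()]
    combos += [(OPERATORS["STD"], feats) for feats in FEATURE_VARIANTS.values()]
    return [(ops, feats, act) for (ops, feats), act in product(combos, ACTIVITIES)]


if __name__ == "__main__":
    n_settings = len(settings())
    n_pairs = n_settings * len(NOISE)
    print(n_settings, n_pairs, n_pairs * REPETITIONS)
```

Every operator tuple sums to 100, which is a quick sanity check on the
variants listed above.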
File structure:
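Each model-log pair is in a separate folder. The name of the folder describes
the settings used for the generation, in the following form:
"g-[SEQ]-[XOR]-[PAR]-[LOOP]-[OR]--[SIL]-[DUP]-[LONG]--[NOISE]--[ACT]--[EXP]"
As an example, consider the following folder name:
"g-25-20-20-30-5--20-20-20--50--50--18"
Here, there is a 25% probability for sequence operators, 20% for XOR
operators, 20% for parallel operators, 30% for loop operators, and 5% for OR
operators (the +LOOP variant). Silent activities, duplicate activities, and
long-term dependencies each have the standard 20% probability, the log trace
contains 50% noise, the model has on average 50 activities, and the folder
belongs to repetition 18.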
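A folder name in this form can be decoded with a regular expression. The
following is a minimal sketch; the `Setting` type and the `parse_folder_name`
function are hypothetical helpers and are not shipped with the data set.

```python
import re
from typing import NamedTuple


class Setting(NamedTuple):
    """Generation settings encoded in a folder name (percentages, except act/exp)."""
    seq: int
    xor: int
    par: int
    loop: int
    or_: int
    sil: int
    dup: int
    long: int
    noise: int
    act: int
    exp: int


# g-[SEQ]-[XOR]-[PAR]-[LOOP]-[OR]--[SIL]-[DUP]-[LONG]--[NOISE]--[ACT]--[EXP]
FOLDER_PATTERN = re.compile(
    r"^g-(\d+)-(\d+)-(\d+)-(\d+)-(\d+)"   # process operators
    r"--(\d+)-(\d+)-(\d+)"                # additional features
    r"--(\d+)--(\d+)--(\d+)$"             # noise, activities, repetition
)


def parse_folder_name(name: str) -> Setting:
    """Parse a folder name into its settings, or raise ValueError."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a generated-model folder name: {name!r}")
    return Setting(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    print(parse_folder_name("g-25-20-20-30-5--20-20-20--50--50--18"))
```

Single hyphens separate values within a group and double hyphens separate the
groups, so the pattern is anchored at both ends to reject partial matches.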
Each folder contains a file model.pnml and a file log.xes, which are the
generated model and log pairs. The log file contains exactly one log trace.
Most folders also contain a model.dot file, which is a DOT representation of
the Petri net.
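To inspect a folder without a process mining library, the two XML files can
be summarized with the Python standard library. This sketch assumes that
places and transitions appear as `place` and `transition` elements in
model.pnml and that the log uses `trace` and `event` elements in log.xes, as
is usual for PNML and XES. The `summarize` function is a hypothetical helper,
and XML namespaces are ignored by comparing local tag names only.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}event' down to 'event'."""
    return tag.rsplit("}", 1)[-1]


def count_elements(root: ET.Element, name: str) -> int:
    """Count all descendant elements (and the root) with the given local name."""
    return sum(
        1
        for element in root.iter()
        if isinstance(element.tag, str) and local_name(element.tag) == name
    )


def summarize(folder) -> dict:
    """Count places, transitions, traces, and events in one model-log folder."""
    folder = Path(folder)
    model = ET.parse(str(folder / "model.pnml")).getroot()
    log = ET.parse(str(folder / "log.xes")).getroot()
    return {
        "places": count_elements(model, "place"),
        "transitions": count_elements(model, "transition"),
        "traces": count_elements(log, "trace"),
        "events": count_elements(log, "event"),
    }
```

For any folder in this data set, the "traces" count is expected to be 1,
since each log file contains exactly one log trace.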
References:
[1] T. Jouck and B. Depaire, “PTandLogGenerator: A Generator for Artificial
Event Data,” in Proceedings of the BPM Demo Track 2016 Co-located with the
14th International Conference on Business Process Management (BPM 2016), Rio de
Janeiro, Brazil, September 21, 2016, pp. 23–27, 2016.
[2] M. Kunze, A. Luebbe, M. Weidlich, and M. Weske, Towards Understanding
Process Modeling – The Case of the BPM Academic Initiative, pp. 44–58. Berlin,
Heidelberg: Springer Berlin Heidelberg, 2011.
[3] S. J. van Zelst, A. Bolt, and B. F. van Dongen, “Tuning Alignment
Computation: An Experimental Evaluation,” in Proceedings of the International
Workshop on Algorithms & Theories for the Analysis of Event Data, ATAED 2017,
Zaragoza, Spain, June 25-30, 2017, pp. 1–15, 2017.