Event Graph of BPI Challenge 2017
doi:10.4121/14169584.v1
The doi above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future.
For a link that will always point to the latest version, please use
doi: 10.4121/14169584
Datacite citation style:
Fahland, Dirk; Esser, Stefan (2021): Event Graph of BPI Challenge 2017. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/14169584.v1
licence
CC BY 4.0
Business process event data modeled as labeled property graphs
Data Format
-----------
The dataset comprises one labeled property graph in two different file formats.
#1) Neo4j .dump format
A Neo4j (https://neo4j.com) database dump that contains the entire graph. It can be imported into a fresh Neo4j database instance using the following command (see also the Neo4j documentation: https://neo4j.com/docs/):
/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=
The .dump was created with Neo4j v3.5.
#2) .graphml format
A .zip file containing a .graphml file of the entire graph
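The .graphml variant can be inspected without a Neo4j installation. The following is a minimal sketch, assuming the archive contains a single .graphml file that uses the standard GraphML `node` and `edge` elements; the function name `count_graphml_elements` is hypothetical. It streams the file so that the whole graph never has to fit in memory at once.

```python
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter


def count_graphml_elements(zip_path):
    """Count <node> and <edge> elements of the .graphml file inside a zip archive."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds exactly one .graphml member.
        member = next(n for n in archive.namelist() if n.endswith(".graphml"))
        with archive.open(member) as stream:
            stack = []  # currently open elements, innermost last
            for event, elem in ET.iterparse(stream, events=("start", "end")):
                if event == "start":
                    stack.append(elem)
                    continue
                stack.pop()
                # Strip the XML namespace: "{http://...}node" -> "node".
                tag = elem.tag.rsplit("}", 1)[-1]
                if tag in ("node", "edge"):
                    counts[tag] += 1
                    if stack:
                        # Detach the finished element from its parent to keep memory flat.
                        stack[-1].remove(elem)
    return counts["node"], counts["edge"]
```

Calling `count_graphml_elements("neo4j-bpic17-2021-02-17.graphml.zip")` on the downloaded archive gives a quick way to compare the result against the node and relationship counts stated under Data Size.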
Data Schema
-----------
The graph is a labeled property graph over business process event data. It uses the following concepts:
:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"
:Entity nodes - each entity node describes an entity (e.g., an object or a user); it has an EntityType and an identifier (attribute "ID")
:Log nodes - each log node describes a collection of events that were recorded together; most graphs contain only one log node
:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed; :Class nodes group events into sets of identical observations
:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities
:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.
:HAS relationship - from a :Log to an :Event node, describes which events were recorded in which event log
:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph
:REL relationship - placeholder for any structural relationship between two :Entity nodes
The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
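To make the relation between :CORR and :DF concrete, here is a small in-memory sketch. It assumes that the events correlated to one entity are ordered by their timestamp and that each pair of consecutive events yields one :DF relationship for that entity. The `Event` class, the `event_id` field, the function `derive_df` and the toy values are hypothetical and not taken from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    event_id: str        # hypothetical identifier, for illustration only
    activity: str        # corresponds to the "Activity" attribute
    timestamp: datetime  # corresponds to the "timestamp" attribute


def derive_df(events, corr):
    """Return (entity, source_event_id, target_event_id) triples.

    events: iterable of Event
    corr:   iterable of (event_id, entity) pairs modeling :CORR,
            where entity is an (EntityType, ID) tuple
    """
    by_id = {e.event_id: e for e in events}
    per_entity = defaultdict(list)
    for event_id, entity in corr:
        per_entity[entity].append(by_id[event_id])

    df = []
    for entity, correlated in per_entity.items():
        # Assumption: directly-follows is determined by timestamp order per entity.
        ordered = sorted(correlated, key=lambda e: e.timestamp)
        for source, target in zip(ordered, ordered[1:]):
            df.append((entity, source.event_id, target.event_id))
    return df


# Toy data: e2 and e3 are each correlated to two entities.
events = [
    Event("e1", "Create", datetime(2016, 1, 1, 9, 0)),
    Event("e2", "Offer", datetime(2016, 1, 1, 9, 5)),
    Event("e3", "Accept", datetime(2016, 1, 1, 9, 10)),
]
corr = [
    ("e1", ("Application", "A1")),
    ("e2", ("Application", "A1")),
    ("e2", ("Offer", "O1")),
    ("e3", ("Offer", "O1")),
    ("e3", ("Application", "A1")),
]
df = derive_df(events, corr)
```

In this toy example the pair e2, e3 is connected by two :DF relationships, one for the Application and one for the Offer, which illustrates why each :DF relationship is tied to the entity both of its events are correlated to.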
Data Contents
-------------
neo4j-bpic17-2021-02-17 (.dump|.graphml.zip)
An integrated graph describing the raw event data of the entire BPI Challenge 2017 dataset.
van Dongen, B.F. (Boudewijn) (2017): BPI Challenge 2017. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b
This event log pertains to a loan application process of a Dutch financial institute. The data contains all applications filed through an online system in 2016 and their subsequent events until February 1st 2017, 15:11. The company providing the data and the process under consideration are the same as in doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f. However, the system supporting the process has changed in the meantime. In particular, the system now allows for multiple offers per application. These offers can be tracked through their IDs in the log.
The data contains the following entities and their events:
- Application - a credit application document submitted by a customer to a Dutch financial institute
- Offer - a loan offer document created by the institute and sent to the customer
- Workflow - a logical grouping of activities by the case management system supporting workers at the financial institute to handle applications and offers
- Case_R - a user or worker of the financial institute
- Case_AO - a derived entity describing the reified relation between an offer and its related application
- Case_AW - a derived entity describing the reified relation between the workflow and its related application
- Case_WO - a derived entity describing the reified relation between an offer and its related workflow
Data Size
---------
BPIC17, nodes: 1425995, relationships: 10300197
history
- 2021-04-22 first online, published, posted
publisher
4TU.ResearchData
format
zipped graphml
Neo4j database dump (binary)
associated peer-reviewed publication
Multi-Dimensional Event Data in Graph Databases
organizations
TU Eindhoven, Department of Mathematics and Computer Science
DATA
files (3)
- readme_bpic17.txt - 4,144 bytes - MD5: 78538984f38e5d44858bdd17b17d2b06
- neo4j-bpic17-2021-02-17.dump - 399,080,542 bytes - MD5: f194a554137c015190ff3ffbd05a4842
- neo4j-bpic17-2021-02-17.graphml.zip - 127,866,297 bytes - MD5: 6513b2315588d59c72ae0b23766e3e5f
total: 526,950,983 bytes unzipped
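The MD5 checksums listed for the three files can be used to check a download for corruption. A minimal sketch using only the Python standard library follows; the helper names `md5_of_file` and `verify` are hypothetical, and the mapping of checksums to file names is read off the file listing on this page.

```python
import hashlib
import os

# Checksums as listed for the three files of this dataset.
EXPECTED_MD5 = {
    "readme_bpic17.txt": "78538984f38e5d44858bdd17b17d2b06",
    "neo4j-bpic17-2021-02-17.dump": "f194a554137c015190ff3ffbd05a4842",
    "neo4j-bpic17-2021-02-17.graphml.zip": "6513b2315588d59c72ae0b23766e3e5f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the checksum listed for its name."""
    return md5_of_file(path) == expected[os.path.basename(path)]
```

For example, `verify("neo4j-bpic17-2021-02-17.dump")` returns True only if the local copy matches the listed checksum. Reading in chunks keeps memory use small even for the roughly 400 MB dump file.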