Event Graph of BPI Challenge 2017

doi: 10.4121/14169584.v1
The doi above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
doi: 10.4121/14169584
Datacite citation style:
Fahland, Dirk; Esser, Stefan (2021): Event Graph of BPI Challenge 2017. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/14169584.v1
Dataset
Business process event data modeled as labeled property graphs

Data Format
-----------

The dataset comprises one labeled property graph in two different file formats.

#1) Neo4j .dump format

A Neo4j (https://neo4j.com) database dump that contains the entire graph. It can be imported into a fresh Neo4j database instance using the following command; see also the Neo4j documentation: https://neo4j.com/docs/

/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

The .dump was created with Neo4j v3.5.

#2) .graphml format

A .zip file containing a .graphml file of the entire graph
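
For readers who want to inspect the .graphml export without a Neo4j instance, the following is a minimal sketch that streams the file out of the .zip archive and counts nodes and relationships. It assumes the export follows the common Neo4j/APOC GraphML convention of a "labels" attribute on node elements and a "label" attribute on edge elements; this is an assumption, not something stated in the dataset description. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML namespace, in ElementTree's "{uri}" tag notation.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml(stream):
    """Stream a GraphML document and count nodes and edges per label.

    Assumption: nodes carry a "labels" attribute and edges a "label"
    attribute (Neo4j/APOC-style export). Missing attributes count as "?".
    """
    nodes, edges = Counter(), Counter()
    for _, elem in ET.iterparse(stream, events=("end",)):
        tag = elem.tag.replace(GRAPHML_NS, "")  # also accepts files without a namespace
        if tag == "node":
            nodes[elem.get("labels", "?")] += 1
            elem.clear()  # drop the element's children and attributes once counted
        elif tag == "edge":
            edges[elem.get("label", "?")] += 1
            elem.clear()
    return nodes, edges


def count_graphml_zip(zip_path):
    """Open the first .graphml member of a .zip archive and count its contents."""
    with zipfile.ZipFile(zip_path) as archive:
        name = next(n for n in archive.namelist() if n.endswith(".graphml"))
        with archive.open(name) as stream:
            return count_graphml(stream)
```

Calling count_graphml_zip on the downloaded archive and summing the two counters gives totals that can be compared against the figures listed under Data Size.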


Data Schema
-----------

The graph is a labeled property graph over business process event data. It uses the following concepts:

:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

:Log nodes - each log node describes a collection of events that were recorded together; most graphs contain only one log node

:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed; :Class nodes group events into sets of identical observations

:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

:HAS relationship - from a :Log to an :Event node, describes which events were recorded in which event log

:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

:REL relationship - placeholder for any structural relationship between two :Entity nodes

The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
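
To illustrate how :Event, :Entity, :CORR and :DF fit together, here is a minimal in-memory sketch that derives directly-follows pairs from correlated events. It assumes that :DF edges connect events that are consecutive in timestamp order among the events correlated to one entity; this is one plausible reading of the constraint stated above, not necessarily the procedure used to build this dataset. The function name, event identifiers and activity names are illustrative.

```python
from collections import defaultdict
from datetime import datetime


def derive_df(events, corr):
    """Derive directly-follows triples (source, target, entity) per entity.

    events: dict mapping event id -> (activity, timestamp)
    corr:   iterable of (event id, entity id) pairs, mirroring :CORR
    Assumption: events of one entity are ordered by timestamp; ties keep
    their input order because Python's sort is stable.
    """
    by_entity = defaultdict(list)
    for event_id, entity_id in corr:
        by_entity[entity_id].append(event_id)

    df = []
    for entity_id, event_ids in by_entity.items():
        ordered = sorted(event_ids, key=lambda e: events[e][1])
        # Each consecutive pair shares the entity, as the schema requires.
        for src, dst in zip(ordered, ordered[1:]):
            df.append((src, dst, entity_id))
    return df


# Illustrative data: one application (A1) with one offer (O1).
events = {
    "e1": ("Create Application", datetime(2016, 1, 1, 9, 0)),
    "e2": ("Create Offer", datetime(2016, 1, 1, 10, 0)),
    "e3": ("Send Offer", datetime(2016, 1, 1, 11, 0)),
    "e4": ("Accept Application", datetime(2016, 1, 2, 9, 0)),
}
corr = [("e1", "A1"), ("e2", "A1"), ("e2", "O1"), ("e3", "O1"), ("e4", "A1")]
df_edges = derive_df(events, corr)
```

In this example, event e2 is correlated to both the application and the offer, so it takes part in two separate directly-follows chains, one per entity.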


Data Contents
-------------

neo4j-bpic17-2021-02-17 (.dump|.graphml.zip)

An integrated graph describing the raw event data of the entire BPI Challenge 2017 dataset.
van Dongen, B.F. (Boudewijn) (2017): BPI Challenge 2017. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:5f3067df-f10b-45da-b98b-86ae4c7a310b

This event log pertains to a loan application process of a Dutch financial institute. The data contains all applications filed through an online system in 2016 and their subsequent events until February 1st 2017, 15:11. The company providing the data and the process under consideration are the same as in doi:10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f. However, the system supporting the process has changed in the meantime. In particular, the system now allows for multiple offers per application. These offers can be tracked through their IDs in the log.

The data contains the following entities and their events:

- Application - a credit application document submitted by a customer to a Dutch financial institute
- Offer - a loan offer document created by the institute and sent to the customer
- Workflow - a logical grouping of activities by the case management system supporting workers at the financial institute to handle applications and offers
- Case_R - a user or worker of the financial institute
- Case_AO - a derived entity describing the reified relation between an offer and its related application
- Case_AW - a derived entity describing the reified relation between the workflow and its related application
- Case_WO - a derived entity describing the reified relation between an offer and its related workflow


Data Size
---------

BPIC17, nodes: 1425995, relationships: 10300197

history
  • 2021-04-22 first online, published, posted
publisher
4TU.ResearchData
format
zipped graphml; Neo4j database dump (binary)
associated peer-reviewed publication
Multi-Dimensional Event Data in Graph Databases
organizations
TU Eindhoven, Department of Mathematics and Computer Science
