Event Graph of BPI Challenge 2014

doi: 10.4121/14169494.v1
The doi above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
doi: 10.4121/14169494
Datacite citation style:
Fahland, Dirk; Esser, Stefan (2021): Event Graph of BPI Challenge 2014. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/14169494.v1
Other citation styles (APA, Harvard, MLA, Vancouver, Chicago, IEEE) available at Datacite
Business process event data modeled as labeled property graphs

Data Format

The dataset comprises one labeled property graph in two different file formats.

#1) Neo4j .dump format

A neo4j (https://neo4j.com) database dump that contains the entire graph and can be imported into a fresh neo4j database instance using the following command, see also the neo4j documentation: https://neo4j.com/docs/

/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

The .dump was created with Neo4j v3.5.

#2) .graphml format

A .zip file containing a .graphml file of the entire graph

Data Schema

The graph is a labeled property graph over business process event data. Each graph uses the following concepts

:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

:Entity nodes - each entity node describes an entity (e.g., an object or a user), it has an EntityType and an identifier (attribute "ID")

:Log nodes - describes a collection of events that were recorded together, most graphs only contain one log node

:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed, :Class nodes group events into sets of identical observations

:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

:HAS relationship - from a :Log to an :Event node, describes which events had been recorded in which event log

:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

:REL relationship - placeholder for any structural relationship between two :Entity nodes

The concepts a further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552

Data Contents

neo4j-bpic14-2021-02-17 (.dump|.graphml.zip)

An integrated graph describing the raw event data of the entire BPI Challenge 2014 dataset.
van Dongen, B.F. (Boudewijn) (2014): BPI Challenge 2014. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:c3e5d162-0cfd-4bb0-bd82-af5268819c35

BPI Challenge 2014: Similar to other ICT companies, Rabobank Group ICT has to implement an increasing number of software releases, while the time to market is decreasing. Rabobank Group ICT has implemented the ITIL-processes and therefore uses the Change-proces for implementing these so called planned changes. Rabobank Group ICT is looking for fact-based insight into sub questions, concerning the impact of changes in the past, to predict the workload at the Service Desk and/or IT Operations after future changes. The challenge is to design a (draft) predictive model, which can be used to implement in a BI environment. The purpose of this predictive model will be to support Business Change Management in implementing software releases with less impact on the Service Desk and/or IT Operations. We have prepared several case-files with anonymous information from Rabobank Netherlands Group ICT for this challenge. The files contain record details from an ITIL Service Management tool called HP Service Manager.

The original data had the information as extracts in CSV with the Interaction-, Incident- or Change-number as case ID. Next to these case-files, we provide you with an Activity-log, related to the Incident-cases. There is also a document detailing the data in the CSV file and providing background to the Service Management tool. All this information is integrated in the labeled property graph in this dataset.

The data contains the following entities and their events

- ServiceComponent - an IT hardware or software component in a financial institute
- ConfigurationItem - an part of a ServiceComponent that can be configured, changed, or modified
- Incident - a problem or issue that occurred at a configuration item of a service component
- Interaction - a logical grouping of activities performed for investigating an incident and identifying a solution for the incident
- Change - a logical grouping of activities performed to change or modify one or more configuration items
- Case_R - a user or worker involved in any of the steps
- KM - an entry in the knowledge database used to resolve incidents

Data Size

BPIC14, nodes: 919838, relationships: 6682386

  • 2021-04-22 first online, published, posted
zipped graphml Neo4j database dump (binary)
associated peer-reviewed publication
Multi-Dimensional Event Data in Graph Databases
TU Eindhoven, Department of Mathematics and Computer Science


files (3)