
Event Graph of BPI Challenge 2016

posted on 22.04.2021, 06:58 by Dirk Fahland, Stefan Esser
Business process event data modeled as labeled property graphs

Data Format

The dataset comprises one labeled property graph in two different file formats.

#1) Neo4j .dump format

A Neo4j (https://neo4j.com) database dump that contains the entire graph. It can be imported into a fresh Neo4j database instance using the following command; see also the Neo4j documentation: https://neo4j.com/docs/

/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=

The .dump was created with Neo4j v3.5.

#2) .graphml format

A .zip file containing a .graphml file of the entire graph.
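
To sanity-check the GraphML export before loading it into a graph tool, the archive can be streamed without unpacking it to disk. The following is a minimal Python sketch, not part of the dataset: the function names are hypothetical, and it assumes the archive contains at least one .graphml member that uses the standard GraphML node and edge elements.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def count_graphml_elements(stream):
    """Stream through a GraphML document and count <node> and <edge> elements."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name in ("node", "edge"):
            counts[name] += 1
            # Drop the element's children and text to limit memory use.
            elem.clear()
    return counts


def count_in_zip(zip_path):
    """Count nodes and edges in the first .graphml member of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the first member ending in .graphml holds the graph.
        members = [n for n in archive.namelist() if n.endswith(".graphml")]
        with archive.open(members[0]) as stream:
            return count_graphml_elements(stream)
```

For example, count_in_zip("neo4j-bpic16-2021-02-17.graphml.zip") (file name assumed from the Data Contents section) returns a Counter whose "node" and "edge" entries can be compared against the figures under Data Size.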

Data Schema

The graph is a labeled property graph over business process event data. The graph uses the following concepts:

:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute "Activity" that occurred at the given "timestamp"

:Entity nodes - each entity node describes an entity (e.g., an object or a user); it has an EntityType and an identifier (attribute "ID")

:Log nodes - each log node describes a collection of events that were recorded together; most graphs contain only one log node

:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed; :Class nodes group events into sets of identical observations

:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities

:DF relationships - "directly-followed by" between two :Event nodes describes which event is directly-followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.

:HAS relationship - from a :Log to an :Event node, describes which events were recorded in which event log

:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph

:REL relationship - placeholder for any structural relationship between two :Entity nodes

The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
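
The interplay of :Event, :Entity, :CORR, and :DF can be illustrated outside the database. The Python sketch below derives directly-follows pairs per entity from a toy set of events. The event identifiers, activity names, entity identifiers, and the function derive_df are hypothetical, and ordering events by timestamp per entity is an assumption made for illustration, not a description of how this graph was built.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical :Event nodes: event id -> (Activity, timestamp).
events = {
    "e1": ("Visit page", datetime(2016, 1, 1, 9, 0)),
    "e2": ("Send message", datetime(2016, 1, 1, 9, 5)),
    "e3": ("Call", datetime(2016, 1, 1, 9, 10)),
}

# Hypothetical :CORR relationships: (event id, entity id).
# An event may be correlated to several entities.
corr = [
    ("e1", "customer-1"),
    ("e2", "customer-1"),
    ("e3", "customer-1"),
    ("e1", "session-A"),
    ("e3", "session-A"),
]


def derive_df(events, corr):
    """Return (from_event, to_event, entity) triples, one per :DF edge."""
    by_entity = defaultdict(list)
    for event_id, entity_id in corr:
        by_entity[entity_id].append(event_id)

    df = []
    for entity_id, event_ids in by_entity.items():
        # Order the entity's events by timestamp; ties keep input order.
        ordered = sorted(event_ids, key=lambda e: events[e][1])
        # Each consecutive pair is "directly followed by" for this entity.
        for earlier, later in zip(ordered, ordered[1:]):
            df.append((earlier, later, entity_id))
    return df


for earlier, later, entity_id in derive_df(events, corr):
    print(f"{earlier} -[:DF {entity_id}]-> {later}")
```

Note that e1 is directly followed by e2 from the customer's perspective but by e3 from the session's perspective, which is why each :DF relationship is tied to one entity.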

Data Contents

neo4j-bpic16-2021-02-17 (.dump|.graphml.zip)

An integrated graph describing the raw event data of the entire BPI Challenge 2016 dataset.
Dees, Marcus; van Dongen, B.F. (Boudewijn) (2016): BPI Challenge 2016. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:360795c8-1dd6-4a5b-a443-185001076eab

UWV (Employee Insurance Agency) is an autonomous administrative authority (ZBO) and is commissioned by the Ministry of Social Affairs and Employment (SZW) to implement employee insurances and provide labour market and data services in the Netherlands. The Dutch employee insurances are provided for via laws such as the WW (Unemployment Insurance Act), the WIA (Work and Income according to Labour Capacity Act, which contains the IVA (Full Invalidity Benefit Regulations) and the WGA (Return to Work (Partially Disabled) Regulations)), the Wajong (Disablement Assistance Act for Handicapped Young Persons), the WAO (Invalidity Insurance Act), the WAZ (Self-employed Persons Disablement Benefits Act), the Wazo (Work and Care Act) and the Sickness Benefits Act.
The data in this collection pertains to customer contacts over a period of 8 months, and UWV is looking for insights into their customers' journeys. Data has been collected from several different sources, namely: 1) click data from the site www.werk.nl collected from visitors that were not logged in, 2) click data from the customer-specific part of the site www.werk.nl (a link is made with the customer that logged in), 3) Werkmap message data, showing when customers contacted the UWV through a digital channel, 4) call data from the call center, showing when customers contacted the call center by phone, and 5) complaint data, showing when customers complained. All data is accompanied by data fields with anonymized information about the customer as well as data about the site visited or the contents of the call and/or complaint. The texts in the dataset are provided in both Dutch and English where applicable. URLs are included based on the structure of the site during the period in which the data was collected.
UWV is interested in insights into how their channels are being used, when and why customers move from one contact channel to the next, and whether there are clear customer profiles to be identified in the behavioral data. Furthermore, recommendations are sought on how to serve customers without the need to change the contact channel.

The data contains the following entities and their events:

- Customer - customer of a Dutch public agency for handling unemployment benefits
- Office_U - user or worker involved in an activity handling a customer interaction
- Office_W - user or worker involved in an activity handling a customer interaction
- Complaint - a complaint document handed in by a customer
- ComplaintDossier - a collection of complaints by the same customer
- Session - browser-session identifier of a user browsing the website of the agency
- IP - IP address of a user browsing the website of the agency

Data Size

BPIC16, nodes: 8109680, relationships: 86833139







TU Eindhoven, Department of Mathematics and Computer Science