cff-version: 1.2.0
abstract: "Business process event data modeled as labeled property graphs
Data Format
-----------
The dataset comprises one labeled property graph in two different file formats.
#1) Neo4j .dump format
A Neo4j (https://neo4j.com) database dump that contains the entire graph. It can be imported into a fresh Neo4j database instance using the following command; see also the Neo4j documentation: https://neo4j.com/docs/
/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=
The .dump was created with Neo4j v3.5.
#2) .graphml format
A .zip file containing a .graphml file of the entire graph.
Data Schema
-----------
The graph is a labeled property graph over business process event data. It uses the following concepts:
:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by attribute 'Activity' that occurred at the given 'timestamp'
:Entity nodes - each entity node describes an entity (e.g., an object or a user); it has an EntityType and an identifier (attribute 'ID')
:Log nodes - each log node describes a collection of events that were recorded together; most graphs contain only one log node
:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed; :Class nodes group events into sets of identical observations
:CORR relationships - from :Event to :Entity nodes, describes whether an event is correlated to a specific entity; an event can be correlated to multiple entities
:DF relationships - 'directly-followed-by' between two :Event nodes, describes which event is directly followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships form a directed acyclic graph.
:HAS relationship - from a :Log to an :Event node, describes which events were recorded in which event log
:OBSERVES relationship - from an :Event to a :Class node, describes to which event class an event belongs, i.e., which activity was observed in the graph
:REL relationship - placeholder for any structural relationship between two :Entity nodes
The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
Data Contents
-------------
neo4j-bpic16-2021-02-17 (.dump|.graphml.zip)
An integrated graph describing the raw event data of the entire BPI Challenge 2016 dataset.
Dees, Marcus; van Dongen, B.F. (Boudewijn) (2016): BPI Challenge 2016. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:360795c8-1dd6-4a5b-a443-185001076eab
UWV (Employee Insurance Agency) is an autonomous administrative
authority (ZBO) and is commissioned by the Ministry of Social Affairs
and Employment (SZW) to implement employee insurances and provide labour
market and data services in the Netherlands. The Dutch employee
insurances are provided for via laws such as the WW (Unemployment
Insurance Act), the WIA (Work and Income according to Labour Capacity
Act), which contains the IVA (Full Invalidity Benefit Regulations) and the WGA
(Return to Work (Partially Disabled) Regulations), the Wajong
(Disablement Assistance Act for Handicapped Young Persons), the WAO
(Invalidity Insurance Act), the WAZ (Self-employed Persons Disablement
Benefits Act), the Wazo (Work and Care Act) and the Sickness Benefits
Act. The data in this collection pertains to customer contacts over a
period of 8 months and UWV is looking for insights into their customers'
journeys. Data has been collected from several different sources,
namely: 1) Clickdata from the site www.werk.nl collected from visitors
that were not logged in, 2) Clickdata from the customer specific part of
the site www.werk.nl (a link is made with the customer that logged in),
3) Werkmap Message data, showing when customers contacted the UWV
through a digital channel, 4) Call data from the call center, showing
when customers contacted the call center by phone, and 5) Complaint data
showing when customers complained. All data is accompanied by data
fields with anonymized information about the customer as well as data
about the site visited or the contents of the call and/or complaint. The
texts in the dataset are provided in both Dutch and English where
applicable. URLs are included based on the structure of the site during
the period in which the data was collected. UWV is interested in insights
on how their channels are being used, when customers move from one
contact channel to the next and why, and whether there are clear customer
profiles to be identified in the behavioral data. Furthermore,
recommendations are sought on how to serve customers without the need to
change the contact channel.
The data contains the following entities and their events:
- Customer - customer of a Dutch public agency for handling unemployment benefits
- Office_U - user or worker involved in an activity handling a customer interaction
- Office_W - user or worker involved in an activity handling a customer interaction
- Complaint - a complaint document handed in by a customer
- ComplaintDossier - a collection of complaints by the same customer
- Session - browser-session identifier of a user browsing the website of the agency
- IP - IP address of a user browsing the website of the agency
Data Size
---------
BPIC16, nodes: 8109680, relationships: 86833139
"
authors:
- family-names: Fahland
  given-names: Dirk
- family-names: Esser
  given-names: Stefan
title: "Event Graph of BPI Challenge 2016"
keywords:
version: 1
identifiers:
- type: doi
  value: 10.4121/14164220.v1
license: CC-BY-4.0
date-released: 2021-04-22