Event Graph of BPI Challenge 2016
doi: 10.4121/14164220.v1
The doi above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future.
For a link that will always point to the latest version, please use
doi: 10.4121/14164220
Datacite citation style:
Fahland, Dirk; Esser, Stefan (2021): Event Graph of BPI Challenge 2016. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/14164220.v1
Dataset
usage stats: 2300 views, 1 share, 4 citations, 709 downloads
licence: CC BY 4.0
Business process event data modeled as labeled property graphs
Data Format
-----------
The dataset comprises one labeled property graph in two different file formats.
#1) Neo4j .dump format
A Neo4j (https://neo4j.com) database dump that contains the entire graph. It can be imported into a fresh Neo4j database instance using the following command (see also the Neo4j documentation: https://neo4j.com/docs/):
/bin/neo4j-admin.(bat|sh) load --database=graph.db --from=
The .dump was created with Neo4j v3.5.
#2) .graphml format
A .zip file containing a .graphml file of the entire graph
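The .graphml export can be inspected without a Neo4j installation. The following is a minimal Python sketch, assuming the file follows the standard GraphML layout of <node> and <edge> elements; the function name and the tiny SAMPLE document are hypothetical and only serve as illustration. It streams the XML instead of loading it at once, since the full graph is large.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature GraphML document used only to exercise the function.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"/>
    <node id="n1"/>
    <edge id="e0" source="n0" target="n1"/>
  </graph>
</graphml>"""


def count_graphml_elements(source):
    """Stream a GraphML document and count its <node> and <edge> elements.

    `source` is a file name or a binary file-like object.
    """
    counts = {"node": 0, "edge": 0}
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop the XML namespace prefix, if any, before comparing tag names.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag in counts:
            counts[tag] += 1
            # Discard the element's attributes and children to limit memory use.
            elem.clear()
    return counts


print(count_graphml_elements(io.BytesIO(SAMPLE)))
```

For the real archive, a file-like object obtained from zipfile.ZipFile(...).open(...) could be passed in the same way; the name of the .graphml member inside the .zip is not stated here and would have to be looked up first.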
Data Schema
-----------
The graph is a labeled property graph over business process event data. It uses the following concepts:
:Event nodes - each event node describes a discrete event, i.e., an atomic observation described by the attribute "Activity" that occurred at the given "timestamp"
:Entity nodes - each entity node describes an entity (e.g., an object or a user); it has an EntityType and an identifier (attribute "ID")
:Log nodes - each log node describes a collection of events that were recorded together; most graphs contain only one log node
:Class nodes - each class node describes a type of observation that has been recorded, e.g., the different types of activities that can be observed; :Class nodes group events into sets of identical observations
:CORR relationships - from :Event to :Entity nodes, describe which entity an event is correlated to; an event can be correlated to multiple entities
:DF relationships - "directly-followed by" relationships between two :Event nodes, describe which event is directly followed by which other event; both events in a :DF relationship must be correlated to the same entity node. All :DF relationships together form a directed acyclic graph.
:HAS relationships - from a :Log to an :Event node, describe which events were recorded in which event log
:OBSERVES relationships - from an :Event to a :Class node, describe to which event class an event belongs, i.e., which activity was observed in the graph
:REL relationships - placeholder for any structural relationship between two :Entity nodes
The concepts are further defined in Stefan Esser, Dirk Fahland: Multi-Dimensional Event Data in Graph Databases. CoRR abs/2005.14552 (2020) https://arxiv.org/abs/2005.14552
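To illustrate how :CORR and :DF relate, the following Python sketch rebuilds "directly-followed by" pairs from correlation data held in memory. It assumes that the :DF edges of an entity are obtained by ordering the events correlated to that entity by timestamp; this reading is an assumption made for illustration, and the cited paper gives the exact construction. The Event class, the derive_df function, and all sample identifiers, activities, and timestamps are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    activity: str
    timestamp: str  # ISO 8601 text, which sorts chronologically as a string


def derive_df(events, corr):
    """Return (source_event, target_event, entity) triples, one per :DF edge.

    `corr` is an iterable of (event_id, entity_id) pairs, i.e. the :CORR edges.
    """
    by_id = {e.event_id: e for e in events}
    per_entity = defaultdict(list)
    for event_id, entity_id in corr:
        per_entity[entity_id].append(by_id[event_id])

    df = []
    for entity_id, correlated in per_entity.items():
        # Order the events of one entity in time; ties are broken by event id.
        ordered = sorted(correlated, key=lambda e: (e.timestamp, e.event_id))
        # Each consecutive pair shares this entity, as the schema requires.
        for src, dst in zip(ordered, ordered[1:]):
            df.append((src.event_id, dst.event_id, entity_id))
    return df


# Hypothetical sample: three events, one customer, one browser session.
EVENTS = [
    Event("e1", "visit page", "2015-09-01T10:00:00"),
    Event("e2", "send message", "2015-09-01T10:05:00"),
    Event("e3", "visit page", "2015-09-01T10:07:00"),
]
CORR = [
    ("e1", "Customer:c1"), ("e2", "Customer:c1"), ("e3", "Customer:c1"),
    ("e1", "Session:s1"), ("e3", "Session:s1"),
]

for edge in derive_df(EVENTS, CORR):
    print(edge)
```

In this sample, e1 is directly followed by e2 from the customer's point of view but by e3 from the session's point of view, which is why an event can take part in several :DF relationships, one per correlated entity.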
Data Contents
-------------
neo4j-bpic16-2021-02-17 (.dump|.graphml.zip)
An integrated graph describing the raw event data of the entire BPI Challenge 2016 dataset.
Dees, Marcus; van Dongen, B.F. (Boudewijn) (2016): BPI Challenge 2016. 4TU.ResearchData. Collection. https://doi.org/10.4121/uuid:360795c8-1dd6-4a5b-a443-185001076eab
UWV (Employee Insurance Agency) is an autonomous administrative
authority (ZBO) and is commissioned by the Ministry of Social Affairs
and Employment (SZW) to implement employee insurances and provide labour
market and data services in the Netherlands. The Dutch employee
insurances are provided for via laws such as the WW (Unemployment
Insurance Act), the WIA (Work and Income according to Labour Capacity
Act), which contains the IVA (Full Invalidity Benefit Regulations) and
the WGA (Return to Work (Partially Disabled) Regulations), the Wajong
(Disablement Assistance Act for Handicapped Young Persons), the WAO
(Invalidity Insurance Act), the WAZ (Self-employed Persons Disablement
Benefits Act), the Wazo (Work and Care Act) and the Sickness Benefits
Act. The data in this collection pertains to customer contacts over a
period of 8 months; UWV is looking for insights into their customers'
journeys. Data has been collected from several different sources,
namely: 1) click data from the site www.werk.nl collected from visitors
that were not logged in, 2) click data from the customer-specific part
of the site www.werk.nl (a link is made with the customer that logged
in), 3) Werkmap message data, showing when customers contacted the UWV
through a digital channel, 4) call data from the call center, showing
when customers contacted the call center by phone, and 5) complaint
data, showing when customers complained. All data is accompanied by
data fields with anonymized information about the customer as well as
data about the site visited or the contents of the call and/or
complaint. The texts in the dataset are provided in both Dutch and
English where applicable. URLs are included based on the structure of
the site during the period in which the data was collected. UWV is
interested in insights into how their channels are being used, when and
why customers move from one contact channel to the next, and whether
there are clear customer profiles to be identified in the behavioral
data. Furthermore, recommendations are sought on how to serve customers
without the need to change the contact channel.
The data contains the following entities and their events
- Customer - customer of a Dutch public agency for handling unemployment benefits
- Office_U - user or worker involved in an activity handling a customer interaction
- Office_W - user or worker involved in an activity handling a customer interaction
- Complaint - a complaint document handed in by a customer
- ComplaintDossier - a collection of complaints by the same customer
- Session - browser-session identifier of a user browsing the website of the agency
- IP - IP address of a user browsing the website of the agency
Data Size
---------
BPIC16, nodes: 8109680, relationships: 86833139
history: 2021-04-22 first online, published, posted
publisher: 4TU.ResearchData
format: zipped graphml; Neo4j database dump (binary)
associated peer-reviewed publication: Multi-Dimensional Event Data in Graph Databases
organizations: TU Eindhoven, Department of Mathematics and Computer Science
DATA
files (3)
- readme_bpic16.txt - 5,479 bytes, MD5: 7a74d3c634ddc1d6ead575669b565dcf
- neo4j-bpic16-2021-02-17.dump - 2,534,105,083 bytes, MD5: 8b0e815cd239afaab7970e5dd888106b
- neo4j-bpic16-2021-02-17.graphml.zip - 895,874,363 bytes, MD5: ddff0388fa59969f3f3673647766d85a
- all files (zip) - 3,429,984,925 bytes unzipped
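Given the size of the downloads, it may be worth checking a file against its listed MD5 checksum before importing it. The following Python sketch shows one way to do so, assuming the files keep their listed names after download; the function names are hypothetical, and the checksums are copied from the file listing above.

```python
import hashlib
import os

# Checksums as given in the file listing of this dataset.
EXPECTED_MD5 = {
    "readme_bpic16.txt": "7a74d3c634ddc1d6ead575669b565dcf",
    "neo4j-bpic16-2021-02-17.dump": "8b0e815cd239afaab7970e5dd888106b",
    "neo4j-bpic16-2021-02-17.graphml.zip": "ddff0388fa59969f3f3673647766d85a",
}


def md5_of_stream(stream, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file at `path` matches the checksum listed for its name."""
    expected = EXPECTED_MD5[os.path.basename(path)]
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected
```

Reading in chunks keeps memory use constant, which matters for the multi-gigabyte .dump file. A call such as verify_download("neo4j-bpic16-2021-02-17.dump") would then report whether the local copy is intact.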