Dataset and Analyses for Extracting Schemas from Thought Records using Natural Language Processing

doi: 10.4121/16685347.v1
The doi above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
doi: 10.4121/16685347
Datacite citation style:
Franziska Burger (2021): Dataset and Analyses for Extracting Schemas from Thought Records using Natural Language Processing. Version 1. 4TU.ResearchData. dataset.
This dataset contains all data and analysis scripts pertaining to the research conducted for the PLOS ONE paper “Natural language processing for cognitive therapy: extracting schemas from thought records.” The cognitive approach to psychotherapy aims to change patients' maladaptive schemas, that is, overly negative views of themselves, the world, or the future. To become aware of these views, patients record their thought processes in situations that caused pathogenic emotional responses. To date, the schemas underlying such thought records have largely been identified manually. Using recent advances in natural language processing, we take this one step further by automatically extracting schemas from thought records. We used the Amazon Mechanical Turk crowdsourcing platform to collect a set of 1600 thought records. In total, these thought records contain 5747 thoughts of various depth levels, with the automatic thought constituting the shallowest level and the core belief the deepest level.
The dataset comprises:

1. a natural language dataset: the thoughts delineated by participants in the scenario-based and open thought records
2. reliability analyses: the first author labeled all thoughts with respect to the degree to which they reflect each of a set of 9 possible schemas. An independent second coder also labeled a sample of the thoughts.
3. analyses to determine whether automatic identification of the schemas underlying the thoughts is possible.
4. additional materials (scenarios, instruction videos, Qualtrics survey, OSF preregistration form) that could assist in the replication of the study.
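As an illustration of the kind of check a reliability analysis (item 2) involves, the sketch below computes a quadratically weighted Cohen's kappa between two coders who rate the same thoughts on an ordinal scale. This is not necessarily the statistic used in the accompanying scripts; the function name, the rating scale 0 to 3, and the example ratings are all assumptions made for illustration only.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories):
    """Quadratically weighted Cohen's kappa for two coders.

    `categories` lists the ordinal rating levels from lowest to highest.
    Hypothetical helper, not taken from the dataset's analysis scripts.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two rating levels are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Joint and marginal counts of the two coders' ratings.
    joint = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    marg_a = Counter(index[a] for a in ratings_a)
    marg_b = Counter(index[b] for b in ratings_b)

    def weight(i, j):
        # Disagreement weight grows with the squared distance between levels.
        return (i - j) ** 2 / (k - 1) ** 2

    observed = sum(weight(i, j) * c / n for (i, j), c in joint.items())
    expected = sum(
        weight(i, j) * marg_a[i] * marg_b[j] / n ** 2
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both coders use a single identical level")
    return 1 - observed / expected


# Made-up ratings for one schema on an assumed 0-3 scale.
coder_1 = [0, 1, 2, 3, 0, 2]
coder_2 = [0, 1, 2, 3, 1, 2]
kappa = weighted_kappa(coder_1, coder_2, categories=[0, 1, 2, 3])
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. With 9 schemas, such a statistic would be computed once per schema on the sample labeled by both coders.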
  • 2021-09-29 first online, published, posted
csv, docx, html, ipynb, pdf, png, rnw
  • 4TU research centre for Humans and Technology
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science; 4TU research centre for Humans and Technology

