1. Introductory Information:
o  Title of the dataset: Complexity Origins Scoping Review
Authors: G.A. Garza Morales, K. Nizamis, G.M. Bonnema
Systems Engineering and Multidisciplinary Design Group, Design, Production, and Management Department, Faculty of Engineering Technology, University of Twente
Corresponding author: K. Nizamis

Contact Information:
k.nizamis@tudelft.nl
University of Twente - Faculty of Engineering Technology
Horst Complex building, number 20
PO Box 217
7500 AE Enschede
The Netherlands

*** General introduction ***
This dataset contains the data from the identification, screening, and selection of the papers for a Systematic Scoping Review into Complexity Origins in the context of Systems Engineering and Engineering Design. It also contains the coding scheme and detailed code-document analysis reports.
o  Content: 4 files:
	a) Paper analysis_ComplexityOrigins_Final.xlsx --> Records of the paper identification, screening and selection process. 
	b) Code-Document Report and Coding Scheme Complexity Origins.xlsx ---> The Code-Document report lists the 72 documents and, in binary representation, the final code groups assigned to each one. This information was used to quantify the identified complexity taxonomy. Descriptions of the individual codes are given in the main publication, Why is there complexity in engineering? A scoping review on complexity origins (10.1109/SysCon53073.2023.10131068), and in detail in the sheet called "info".
	The remaining tabs contain all the quotations per code group, together with information on each quotation's document of origin. These quotations constitute the binarized results shown in the Code-Document tab.
	c) ComplexityOrigins_ScopingReview.qdpx ---> Project file exported from the ATLAS.ti QDA (qualitative data analysis) software. This file contains all the documents and coding used for our review. The .qdpx format is the QDA exchange standard (REFI-QDA Project), which allows entire projects to be moved from one software package to another. Because the exchange file is XML-based, it can be opened not only by specialised QDA software but by any software able to process XML.
	d) PRISMA-ScR-Checklist_ComplexityOrigins.pdf ---> The checklist contains 20 essential reporting items and 2 optional items to include when completing a scoping review. For more information about this document, see: https://www.prisma-statement.org/scoping

2. Methodological Information:
	a) Review type selection. We conducted a scoping review following the PRISMA extension for scoping reviews (PRISMA-ScR)[1]. As directed by the PRISMA-ScR checklist, we followed the scoping review approach of Arksey and O’Malley.
	b) Objectives and Research Questions Definition. Our work aimed to map the diverse origins of complexity described in the existing literature on systems engineering and engineering design.
	c) Database selection. To enhance comprehensiveness, we selected four well-known engineering databases: Web of Science, Scopus, Springer Link, and IEEE Xplore. Where the database allowed it, we applied the query to the title, abstract, and keywords.
	d) Identifying relevant studies and study selection. We defined several terms related to the concept of origins of complexity (including plural and word-order variations):
	* Driver
	* Source
	* Factor
	* Cause
	* Challenge
	* Type
	* Root cause
	* Influence
	These terms were combined with our fields of interest: systems engineering and engineering design.
	To improve the relevance of our results, we defined inclusion criteria. These criteria were applied in three steps: before screening (by automation tools, i.e., the databases’ search engines), during screening by title and abstract, and during full-text screening.
	Applying the inclusion criteria before screening resulted in a total of 133 documents. 
	Subsequently, screening the resulting documents by title and abstract using the eligibility criteria yielded 80 documents. 
	Finally, the full text of those 80 documents was screened. 
	Eight documents were excluded at this stage: 1) one document belonged to the field of economics, 2) four documents did not discuss origins or classifications of complexity, 3) two documents had incorrect years recorded in the database, and 4) one document was a complete book.
	In total, 72 documents were included.
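As a quick consistency check, the counts reported for each selection stage can be reconciled arithmetically. The following is a minimal Python sketch; the numbers come from the text above, while the variable names and the shortened exclusion-reason labels are illustrative only.

```python
# Minimal sketch: reconcile the record counts reported for each selection stage.
# Counts are taken from the README; names and labels are illustrative.

after_automation_filters = 133  # inclusion criteria applied by the databases' search engines
after_title_abstract = 80       # documents that passed title and abstract screening

# Documents excluded during full-text screening, by reason
full_text_exclusions = {
    "field of economics": 1,
    "no discussion of complexity origins or classifications": 4,
    "incorrect year recorded in database": 2,
    "complete book": 1,
}

excluded_title_abstract = after_automation_filters - after_title_abstract
excluded_full_text = sum(full_text_exclusions.values())
included = after_title_abstract - excluded_full_text

print(f"Excluded at title/abstract: {excluded_title_abstract}")
print(f"Excluded at full text:      {excluded_full_text}")
print(f"Included:                   {included}")
```

The final figure matches the 72 documents listed in the FinalSelection tab.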
	
	
3. Data-specific Information: 
	FILE: Paper analysis_ComplexityOrigins_Final.xlsx
	Tab: TitleAbsScreening --> The column "Screened by title and abstract" uses INCLUDE, STRONG INCLUDE, and RESERVATION INCLUDE for the papers that passed this round. The second column, "Reasons", briefly explains the reasoning for inclusion or exclusion.
	Tab: FullTextAnalysis --> The columns "Discuss complexity origins?" and "Discuss complexity classification?" can be filtered to show the selected papers (OR condition: a paper was selected if it discusses complexity origins or complexity classification). Rejected papers have a red background.
	Tab: FinalSelection --> List of the 72 papers in the final selection.
	Other tabs: Calculations used to create the graphs and quantification for the main publication.
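The inclusion logic of the two screening tabs can be sketched as simple filters. This is a minimal Python illustration, not the procedure used to build the spreadsheet: the column names follow this README, but the rows, the "Title" column, the "EXCLUDE" label, and the "Yes"/"No" cell values are hypothetical assumptions.

```python
# Minimal sketch of the two screening filters described for the spreadsheet.
# Column names follow the README; rows, "Title", "EXCLUDE", and "Yes"/"No"
# values are hypothetical placeholders, not records from the dataset.

PASS_LABELS = {"INCLUDE", "STRONG INCLUDE", "RESERVATION INCLUDE"}

title_abs_rows = [
    {"Title": "Paper A", "Screened by title and abstract": "STRONG INCLUDE"},
    {"Title": "Paper B", "Screened by title and abstract": "EXCLUDE"},
    {"Title": "Paper C", "Screened by title and abstract": "RESERVATION INCLUDE"},
]

full_text_rows = [
    {"Title": "Paper A", "Discuss complexity origins?": "Yes",
     "Discuss complexity classification?": "No"},
    {"Title": "Paper C", "Discuss complexity origins?": "No",
     "Discuss complexity classification?": "No"},
]


def passed_title_abstract(row):
    """A paper passes the round if it carries any of the three include labels."""
    return row["Screened by title and abstract"].strip().upper() in PASS_LABELS


def selected_full_text(row):
    """OR condition: selected if it discusses origins OR classification."""
    return (row["Discuss complexity origins?"] == "Yes"
            or row["Discuss complexity classification?"] == "Yes")


screened = [r["Title"] for r in title_abs_rows if passed_title_abstract(r)]
selected = [r["Title"] for r in full_text_rows if selected_full_text(r)]
print(screened)  # ['Paper A', 'Paper C']
print(selected)  # ['Paper A']
```

Treating the three include labels as one set mirrors the README's statement that all of them mark papers that passed the title and abstract round.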
	
	FILE: Code-Document Report and Coding Scheme Complexity Origins.xlsx
	All information about the coding and quantification of the data can be found in the "info" Tab.
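To illustrate how a binary code-document matrix can be turned into frequencies for the taxonomy, here is a minimal Python sketch. It assumes each cell is 1 when a document has at least one quotation in a code group and 0 otherwise; the group names, document labels, and values are hypothetical, and the actual layout of the Code-Document tab may differ (see the "info" tab).

```python
# Minimal sketch: quantify code groups from a binary code-document matrix.
# Group names, document labels, and 0/1 values are hypothetical; the real
# Code-Document tab has one row per document (72 in total).

code_groups = ["Group 1", "Group 2", "Group 3"]  # hypothetical group names

# One row per document: 1 = at least one quotation in that group (assumed), 0 = none
matrix = {
    "Doc 1": [1, 0, 1],
    "Doc 2": [1, 1, 0],
    "Doc 3": [0, 0, 1],
    "Doc 4": [1, 0, 0],
}

n_docs = len(matrix)
# Column sums: number of documents in which each code group appears
counts = [sum(row[i] for row in matrix.values()) for i in range(len(code_groups))]

for group, count in zip(code_groups, counts):
    share = 100 * count / n_docs
    print(f"{group}: {count}/{n_docs} documents ({share:.0f}%)")
```

Because the representation is binary, a document with many quotations in one group still counts once for that group, so the percentages describe how widely a group appears across the literature rather than how often it is quoted.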
	
	FILE: ComplexityOrigins_ScopingReview.qdpx
	The “.qdpx” exchange file is a ZIP archive.
	It contains a folder “sources” with the source documents and a file called “project.qde”, which is an XML document.
	The XML file contains a <codebook> element, which lists all the <codes> used for annotation.
	The <sources> element contains the list of documents (<TextSource> elements) and the <codings> within each document.
	Each coding is defined by its start and end positions in the linked document.
	In addition, <note> elements hold memos, and <sets> elements hold document groups and variables.
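The archive structure described above can be inspected with Python's standard library alone. The sketch below is a hedged example, assuming project.qde sits at the root of the archive: it opens the .qdpx with zipfile, parses project.qde with xml.etree.ElementTree, and counts code, text-source, and coding elements. Tag names are matched case-insensitively and without namespace, because the capitalisation in the actual file may differ from how this README writes them. The helper names and the sample project are hypothetical.

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace ("{uri}Name" -> "Name") and lower-case the result."""
    return tag.rsplit("}", 1)[-1].lower()


def summarize_qdpx(path):
    """Count codes, text sources, and codings in the project.qde of a .qdpx archive.

    Assumes project.qde sits at the root of the ZIP archive.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("project.qde"))
    counts = {"code": 0, "textsource": 0, "coding": 0}
    for element in root.iter():  # walks the root and all descendants
        name = local_name(element.tag)
        if name in counts:
            counts[name] += 1
    return counts


# Hypothetical minimal project used only to exercise the function;
# it is not taken from the dataset.
SAMPLE_QDE = (
    '<Project xmlns="urn:QDA-XML:project:1.0">'
    '<CodeBook><Codes><Code guid="c1" name="Example code"/></Codes></CodeBook>'
    '<Sources><TextSource guid="s1" name="Doc 1">'
    '<PlainTextSelection guid="p1" startPosition="0" endPosition="42">'
    '<Coding guid="k1"><CodeRef targetGUID="c1"/></Coding>'
    '</PlainTextSelection></TextSource></Sources></Project>'
)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.qdpx")
        with zipfile.ZipFile(path, "w") as archive:
            archive.writestr("project.qde", SAMPLE_QDE)
        print(summarize_qdpx(path))  # {'code': 1, 'textsource': 1, 'coding': 1}
```

To apply it to this dataset, one could call summarize_qdpx("ComplexityOrigins_ScopingReview.qdpx") from the folder containing the file. Reading the archive in memory avoids extracting the "sources" folder when only the XML is needed.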
	
4. Sharing and Access information.
	Public Domain Dedication (CC0)
	CC0 (Creative Commons Zero) enables scientists and other creators and owners of copyright- or database-protected content to waive those interests in their works. This means that they place their work as completely as possible in the public domain, so that others may freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law. 4TU.ResearchData has adopted CC0 as the default means for researchers to share their datasets. In many cases, it can be difficult to ascertain whether a dataset is subject to copyright law, as many types of data aren’t copyrightable in many jurisdictions. Putting a dataset in the public domain under CC0 is a way to remove any legal doubt about whether researchers can use the data in their projects. This leads to the enrichment of open datasets and further dissemination of knowledge.
	Attribution: Although CC0 doesn’t legally require users of the data to cite the source, it's best practice and good science to give proper credit to the original creator(s). Be aware that not citing the research data you’re using could be considered plagiarism, which would compromise your reputation and the credibility of your work.