Data supporting the thesis “Exploring Hybrid Intelligence for Topic Interpretation in Colorectal Cancer Research: A Comparative Study of GPT-3.5 and Human Expertise”

doi: 10.4121/a7e63b3f-18f5-4ae4-8750-255528f82178.v1
The DOI above is for this specific version of the dataset, which is currently the latest. Newer versions may be published in the future. For a link that always points to the latest version, use:
doi: 10.4121/a7e63b3f-18f5-4ae4-8750-255528f82178
DataCite citation style:
Patandin, Ayush (2023): Data supporting the thesis “Exploring Hybrid Intelligence for Topic Interpretation in Colorectal Cancer Research: A Comparative Study of GPT-3.5 and Human Expertise”. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/a7e63b3f-18f5-4ae4-8750-255528f82178.v1
Other citation styles (APA, Harvard, MLA, Vancouver, Chicago, IEEE) are available at DataCite.
Dataset

The research objective of this thesis is to bridge the gap between human and machine intelligence in interpreting colorectal cancer patient experiences extracted from patient web forums. This Computer Science thesis was carried out in collaboration with colorectal cancer experts from Erasmus MC.

To obtain patient experiences for both the human experts and GPT-3.5 to interpret, nearly 300k patient web forum posts were scraped from the Colorectal Cancer forum of the American platform Cancer Survivors Network USA. Selenium WebDriver was used to collect the page URLs of each discussion thread, and BeautifulSoup4 (bs4) was used to process each page URL and parse the HTML elements of every type of forum post (main post, comment and reply), which were then stored in a local dataset. The dataset stores the following attributes for each post:
  • URL
  • username (i.e. the author of the post)
  • userposts (i.e. the number of posts written by the author)
  • time (i.e. when the post was made)
  • title
  • post (i.e. text containing unstructured colorectal cancer patient experiences)
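The scraping and storage code itself is not part of this description. As a minimal sketch, assuming Python's standard `csv` module rather than the author's actual implementation, the attributes listed above could be kept as one record per post and written to a semicolon-separated file. The `ForumPost` class, the `write_posts` helper and the lowercase column names are hypothetical, and the example data is made up.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

# Hypothetical record type mirroring the attributes listed above;
# the column names in the published CSV may differ.
@dataclass
class ForumPost:
    url: str
    username: str   # author of the post
    userposts: int  # number of posts written by the author
    time: str       # when the post was made, kept as the raw scraped string
    title: str
    post: str       # unstructured patient-experience text

def write_posts(posts, fileobj):
    """Write ForumPost records as a semicolon-separated CSV with a header row."""
    writer = csv.DictWriter(
        fileobj,
        fieldnames=[f.name for f in fields(ForumPost)],
        delimiter=";",
        quoting=csv.QUOTE_MINIMAL,  # quote only fields containing ';', quotes or newlines
    )
    writer.writeheader()
    for p in posts:
        writer.writerow(asdict(p))

# Usage with an in-memory buffer and made-up example data
buf = io.StringIO()
write_posts([ForumPost("https://example.org/thread/1", "user_a", 12,
                       "2020-01-01 10:00", "First post",
                       "Text; with a semicolon")], buf)
print(buf.getvalue())
```

Quoting matters here because free-text posts can contain the delimiter itself; `QUOTE_MINIMAL` wraps only those fields in double quotes, so the file stays readable while remaining unambiguous to a CSV parser.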

history
  • 2023-09-04 first online, published, posted
publisher
4TU.ResearchData
format
*.csv, with columns separated by semicolons (';')
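Because post text can itself contain semicolons, a file in this format is safest to read with a CSV parser that honours quoting rather than by splitting each line on ';'. A minimal sketch using Python's standard `csv` module, assuming the file has a header row and that text fields containing ';' are quoted; `load_posts` is a hypothetical helper and the sample row is made up:

```python
import csv
import io

def load_posts(fileobj):
    """Read a semicolon-separated CSV into a list of dicts.

    Assumes a header row; keys are whatever column names the file declares.
    All values are returned as strings.
    """
    reader = csv.DictReader(fileobj, delimiter=";")
    return list(reader)

# Made-up sample: the quoted post field contains the delimiter
sample = (
    "url;username;userposts;time;title;post\n"
    'https://example.org/t/1;user_a;12;2020-01-01;Hello;"I had surgery; recovery went well"\n'
)
rows = load_posts(io.StringIO(sample))
print(rows[0]["post"])  # I had surgery; recovery went well
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` (the encoding is an assumption) and pass the file object to `load_posts`; `userposts` then needs an explicit `int()` conversion. With pandas, `pd.read_csv(path, sep=";")` is the equivalent call.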
organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Computer Science (EEMCS/EWI)

DATA

files (1)