Supplementary data for the paper 'Can ChatGPT pass high school exams on English language comprehension?'

DOI: 10.4121/545f8ead-235a-4eb6-8f32-aebb030dbbad.v1

The DOI displayed above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
DOI: 10.4121/545f8ead-235a-4eb6-8f32-aebb030dbbad

DataCite citation style

de Winter, Joost (2023): Supplementary data for the paper 'Can ChatGPT pass high school exams on English language comprehension?'. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/545f8ead-235a-4eb6-8f32-aebb030dbbad.v1

Dataset

Usage statistics

315 views; 216 downloads

Keywords

Educational assessment; GPT-3.5; GPT-4; Large language model; Reading comprehension

Licence

CC BY 4.0

by Joost de Winter

Launched in late November 2022, ChatGPT, a large language model chatbot, has garnered considerable attention. However, ongoing questions remain regarding its capabilities. In this study, ChatGPT was used to complete national high school exams in the Netherlands on the topic of English reading comprehension. In late December 2022, we submitted the exam questions through the ChatGPT web interface (GPT-3.5). According to official norms, ChatGPT achieved a mean grade of 7.3 on the Dutch scale of 1 to 10—comparable to the mean grade of all students who took the exam in the Netherlands, 6.99. However, ChatGPT occasionally required re-prompting to arrive at an explicit answer; without these nudges, the overall grade was 6.5. In March 2023, API access was made available, and a new version of ChatGPT, GPT-4, was released. We submitted the same exams to the API, and GPT-4 achieved a score of 8.3 without a need for re-prompting. Additionally, employing a bootstrapping method that incorporated randomness through ChatGPT’s ‘temperature’ parameter proved effective in self-identifying potentially incorrect answers. Finally, a re-assessment conducted with the GPT-4 model updated as of June 2023 showed no substantial change in the overall score. The present findings highlight significant opportunities but also raise concerns about the impact of ChatGPT and similar large language models on educational assessment.
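The self-identification idea mentioned in the description can be illustrated with a minimal Python sketch. This is not the author's implementation (the dataset's own scripts are MATLAB files in the archive); it only assumes that each question was answered several times at a nonzero temperature and that low agreement among the sampled answers marks a potentially incorrect answer. The function names, the 0.8 agreement threshold, and the example data are hypothetical.

```python
from collections import Counter


def summarize_samples(samples):
    """Return the most frequent answer and the fraction of samples that agree with it.

    `samples` is a non-empty list of answers (e.g. "A", "B") obtained by
    submitting the same question repeatedly at a temperature above zero.
    """
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def flag_uncertain(samples_by_question, min_agreement=0.8):
    """Flag questions whose sampled answers agree less often than `min_agreement`.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    flagged = {}
    for question, samples in samples_by_question.items():
        answer, agreement = summarize_samples(samples)
        if agreement < min_agreement:
            flagged[question] = (answer, agreement)
    return flagged


# Hypothetical sampled answers for two multiple-choice questions.
example = {
    "Q1": ["A"] * 10,
    "Q2": ["A"] * 5 + ["B"] * 3 + ["C"] * 2,
}
uncertain = flag_uncertain(example)
```

In this made-up example only Q2 is flagged, because just half of its sampled answers agree on the majority choice. The archive's scripts and logs document how the procedure was actually carried out.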

History

2023-09-12 first online, published, posted

Publisher

4TU.ResearchData

Format

scripts/.m; data/.mat; logs/.txt; scores/.xlsx; figure/.pptx; other documentation/.pdf; other documentation/.docx

Associated peer-reviewed publication

Can ChatGPT pass high school exams on English language comprehension?

Organizations

Delft University of Technology, Faculty of Mechanical, Maritime and Materials Engineering, Department of Cognitive Robotics

DATA

Files (1)

Scripts Inputs and Outputs - ChatGPT English Language Exams.zip (15,989,825 bytes; MD5: 45793c85296234d07c28af72d03beb65)
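The file size and MD5 checksum listed above can be used to check that a download is complete and unmodified. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and the local path passed to `verify_download` is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path

# Values taken from the file listing of this record.
EXPECTED_MD5 = "45793c85296234d07c28af72d03beb65"
EXPECTED_SIZE = 15_989_825  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file at `path` has the expected size and MD5 digest."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify_download("Scripts Inputs and Outputs - ChatGPT English Language Exams.zip")` should return `True` for an intact copy saved under that name in the working directory.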