Data underlying the research between ASR and MT quality of Automatic Subtitling Platforms

doi: 10.4121/7cfa296a-72b7-4460-acd4-86193b43701e.v1
The doi above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
doi: 10.4121/7cfa296a-72b7-4460-acd4-86193b43701e
DataCite citation style:
Li, Mingming (2023): Data underlying the research between ASR and MT quality of Automatic Subtitling Platforms. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/7cfa296a-72b7-4460-acd4-86193b43701e.v1
Other citation styles (APA, Harvard, MLA, Vancouver, Chicago, IEEE) available at DataCite
Dataset

In the first experiment, comparing ASR accuracy, one set of speech-to-text data (hereafter Veed 0 and Iflyrec 0) is generated by submitting the “Qantas Safety video” to “Iflyrec” and “Veed”. The reference speech-to-text data is transcribed from Qantas’ official channel on YouTube.
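
As a rough sketch of how these transcripts could be compared, the snippet below computes word error rate (WER) between each platform output and the reference transcript using the Python jiwer library. The file names and the normalization choices are illustrative assumptions, not the actual file names in this dataset or the evaluation procedure used in the study.

    import re
    import jiwer  # pip install jiwer

    def load_and_normalize(path: str) -> str:
        # Read a transcript and apply a simple normalization (lowercase,
        # drop punctuation, collapse whitespace) so that formatting
        # differences are not counted as recognition errors.
        with open(path, encoding="utf-8") as f:
            text = f.read().lower()
        text = re.sub(r"[^\w\s']", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    # Hypothetical file names; substitute the actual files from this dataset.
    reference = load_and_normalize("qantas_reference_transcript.txt")
    for name in ("veed_0.txt", "iflyrec_0.txt"):
        hypothesis = load_and_normalize(name)
        print(f"{name}: WER = {jiwer.wer(reference, hypothesis):.3f}")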

In the second experiment, comparing automatic subtitling translation quality, three sets of data are collected and analyzed. The author first uses the original speech-to-text data from “Iflyrec” and “Veed” to generate one set of automatic subtitling translations (hereafter Veed 1 and Iflyrec 1), and then inputs the speech-to-text data into these two platforms to generate the final automatic subtitling translation versions (hereafter Veed 2 and Iflyrec 2). As the human translation reference, this paper uses a translation provided by a tutor affiliated with the Civil Aviation University of China.
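
In the same spirit, a minimal sketch of scoring the machine translations against the human reference is shown below, using corpus-level BLEU from the sacrebleu package. The file names and the assumption that the files are aligned line by line (one subtitle segment per line) are purely illustrative; they do not describe how the study itself evaluated translation quality.

    import sacrebleu  # pip install sacrebleu

    def load_lines(path: str) -> list[str]:
        # One subtitle segment per line, aligned with the reference file
        # (an assumption made for this sketch only).
        with open(path, encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]

    reference = load_lines("human_reference_translation.txt")  # hypothetical name
    for name in ("veed_1.txt", "iflyrec_1.txt", "veed_2.txt", "iflyrec_2.txt"):
        hypothesis = load_lines(name)
        # If the target language is Chinese, passing tokenize="zh" to
        # corpus_bleu would be more appropriate.
        bleu = sacrebleu.corpus_bleu(hypothesis, [reference])
        print(f"{name}: BLEU = {bleu.score:.1f}")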

history
  • 2023-10-18 first online, published, posted
publisher
4TU.ResearchData
format
txt
funding
  • “Chunhui Plan” of the Ministry of Education (China), grant code HZKY20220001 (Lei Jing)
organizations
Minzu University of China, School of Foreign Studies

DATA

files (8)