Data underlying the research between ASR and MT quality of Automatic Subtitling Platforms
doi: 10.4121/7cfa296a-72b7-4460-acd4-86193b43701e
In the first experiment, an ASR accuracy comparison, one set of speech-to-text data (hereafter Veed 0 and Iflyrec 0) is generated by submitting the “Qantas Safety video” to “Iflyrec” and “Veed”. The reference speech-to-text data is transcribed from Qantas’ official channel on YouTube.
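As an illustration of how ASR output can be compared with a reference transcript, the sketch below computes word error rate (WER) using a word-level edit distance. This description does not state which accuracy metric the study used, so WER is an assumption here, as are the function names and the simple lowercase, whitespace-based tokenization.

```python
def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

In practice the two arguments would be the text of the reference file and of one platform's ASR file (for example 00-Human-ASR.txt and 00-Veed-ASR.txt), after stripping any subtitle timestamps or numbering those files may contain.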
In the second experiment, a comparison of automatic subtitling translation, three sets of data are collected and analyzed. The author uses the original speech-to-text data of “Iflyrec” and “Veed” to generate one set of automatic subtitling translations (hereafter Veed 1 and Iflyrec 1), and then inputs the speech-to-text data into these two platforms to generate the final automatic subtitling translation version (hereafter Veed 2 and Iflyrec 2). The human translation reference is a translation by a tutor affiliated with the Civil Aviation University of China.
- 2023-10-18 first online, published, posted
- “Chunhui Plan” of the Ministry of Education (China) (grant code HZKY20220001) Lei Jing
DATA
- 00-Human-ASR.txt - 5,113 bytes - MD5: 55fd8c9b3becd390f7d1be469e57f014
- 00-Iflyrec-ASR.txt - 5,153 bytes - MD5: ae97a95f087797e806acef6136e3167e
- 00-Veed-ASR.txt - 5,153 bytes - MD5: 3bdcad86ef3c54ca60a72b6eca484253
- 01-Iflyrec-MT-Original.txt - 4,730 bytes - MD5: 726a41389f80d0f255abb4fd75a5c2dd
- 01-VEED-MT-Original.txt - 5,275 bytes - MD5: 398c4699374214a2d8eb1d304494e479
- 02-Human Translation.txt - 4,331 bytes - MD5: c4fb59193ddebdf899714000cea75236
- 02-Iflyrec-MT-Final.txt - 4,544 bytes - MD5: 26eb50fabf8215d4e248a227d9308069
- 02-VEED-MT-Final.txt - 4,917 bytes - MD5: 4f9578e032f180b8a0357809d9479274
- download all files (zip) - 39,216 bytes unzipped
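The MD5 values in the file listing can be used to check that downloaded files are intact. The following is a minimal sketch using Python's standard hashlib module; the function names and the shape of the `expected` mapping are assumptions made for illustration, and the digests themselves should be copied from the listing.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(directory, expected):
    """Compare files in `directory` against a {filename: md5_hex} mapping.

    Returns {filename: True | False | None}, where None means the file
    was not found in the directory.
    """
    results = {}
    for name, expected_md5 in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = None
        else:
            results[name] = md5_of_file(path) == expected_md5.lower()
    return results
```

Calling `verify_files("downloads", {"00-Human-ASR.txt": "<MD5 from the listing>"})` would then report whether that file matches its published checksum; the directory name is a placeholder.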