Data and code underlying the paper: "Can we predict the Most Replayed data of video streaming platforms?"

DOI: 10.4121/0ca18691-3fef-4c9c-9080-12b20daae62a.v1

The DOI displayed above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
DOI: 10.4121/0ca18691-3fef-4c9c-9080-12b20daae62a

DataCite citation style:

Duico, Alessandro; Strafforello, Ombretta; van Gemert, Jan (2024): Data and code underlying the paper: "Can we predict the Most Replayed data of video streaming platforms?". Version 1. 4TU.ResearchData. software. https://doi.org/10.4121/0ca18691-3fef-4c9c-9080-12b20daae62a.v1

Other citation styles (APA, Harvard, MLA, Vancouver, Chicago, IEEE) are available at DataCite.

Software

Keywords

Computer vision, Most Replayed Data, Video streaming, Video understanding

Licence

CC0

by Alessandro Duico, Ombretta Strafforello, Jan van Gemert

Predicting which specific parts of a video users will replay is important for several applications, including targeted advertisement placement on video platforms and assisting video creators. In this work, we explore whether it is possible to predict the Most Replayed (MR) data from YouTube videos. To this end, we curate a large video benchmark, the YTMR500 dataset, which comprises 500 YouTube videos with MR data annotations. We evaluate Deep Learning (DL) models of varying complexity on our dataset and perform an extensive ablation study. In addition, we conduct a user study to estimate human performance on MR data prediction. Our results show that all the evaluated DL models outperform random predictions, although by a narrow margin. They also exceed human-level accuracy. This suggests that predicting the MR data is a difficult task that can be enhanced through the assistance of DL. In this repository, we provide our code and dataset. The code includes our trained and tested models, our user studies, and our results analysis. The YTMR500 dataset is provided as an H5 file.

History

2024-05-24 first online, published, posted

Publisher

4TU.ResearchData

Format

Code: Python and HTML. Dataset: H5.
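
Because the dataset ships as a single H5 (HDF5) file, it can help to list its groups and datasets before loading anything into memory. The following is a minimal sketch, assuming the third-party h5py package is installed and that the file has been downloaded to the working directory as YTMR_500.h5. The helper name describe_h5 is illustrative and not part of the authors' code, and no assumption is made about the keys stored inside the file.

```python
import os

import h5py  # third-party package (assumption: installed via `pip install h5py`)


def describe_h5(path):
    """Return one (name, kind, shape, dtype) tuple per group or dataset in the file."""
    entries = []

    def visit(name, obj):
        # visititems calls this for every object below the root group.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape, str(obj.dtype)))
        else:
            entries.append((name, "group", None, None))

    # Open read-only so the downloaded file is never modified.
    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries


if __name__ == "__main__":
    h5_path = "YTMR_500.h5"  # assumed local download location
    if os.path.exists(h5_path):
        for name, kind, shape, dtype in describe_h5(h5_path):
            print(name, kind, shape, dtype)
    else:
        print(f"{h5_path} not found; download it first")
```

Only metadata (names, shapes, dtypes) is read here, so the listing stays cheap even though the file is several gigabytes.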

Associated peer-reviewed publication

Can we predict the Most Replayed data of video streaming platforms?

References

https://arxiv.org/pdf/2309.06102

Organizations

TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems, Computer Vision Lab
TNO, Netherlands Organisation for Applied Scientific Research, Intelligent Imaging Group

DATA

To access the source code, use the following command:

git clone https://data.4tu.nl/v3/datasets/49e15d64-83de-474b-bc86-e29ca7551898.git

Or download the latest commit as a ZIP.

Files (1)

YTMR_500.h5 (2,737,958,544 bytes, MD5: 20c193f1192d28f0a2f103498064f29c)
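
The size and MD5 digest listed for YTMR_500.h5 can be used to check that a download completed without corruption. The following is a minimal sketch using only the Python standard library. The local path and the helper name md5_of_file are assumptions for illustration, and the file is read in chunks so that the roughly 2.7 GB download does not need to fit in memory.

```python
import hashlib
import os

# Values copied from the file listing on this page.
EXPECTED_SIZE = 2_737_958_544
EXPECTED_MD5 = "20c193f1192d28f0a2f103498064f29c"

H5_PATH = "YTMR_500.h5"  # assumed local download location


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if not os.path.exists(H5_PATH):
        print(f"{H5_PATH} not found; download it first")
    elif os.path.getsize(H5_PATH) != EXPECTED_SIZE:
        print("Size mismatch: the download is probably incomplete")
    elif md5_of_file(H5_PATH) != EXPECTED_MD5:
        print("MD5 mismatch: the file may be corrupted")
    else:
        print("Size and MD5 match the published values")
```

The cheap size comparison runs first, so a truncated download is reported without hashing the whole file.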