Code underlying the publication: "Video BagNet: short temporal receptive fields increase robustness in long-term action recognition"

DOI: 10.4121/dc5e2fb8-6005-40cd-9afa-ff03c57d0a23.v1

The DOI displayed above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
DOI: 10.4121/dc5e2fb8-6005-40cd-9afa-ff03c57d0a23

Datacite citation style

Strafforello, Ombretta; Liu, Xin; van Gemert, Jan; Schutte, Klamer (2024): Code underlying the publication: "Video BagNet: short temporal receptive fields increase robustness in long-term action recognition". Version 1. 4TU.ResearchData. software. https://doi.org/10.4121/dc5e2fb8-6005-40cd-9afa-ff03c57d0a23.v1

Other citation styles (APA, Harvard, MLA, Vancouver, Chicago, IEEE) available at Datacite

Software

Keywords

3D-CNN, Action recognition, BagNet, Computer vision, Temporal receptive field

Licence

CC0

by Ombretta Strafforello, Xin Liu, Jan van Gemert, Klamer Schutte

Previous work on long-term video action recognition relies on deep 3D-convolutional models that have a large temporal receptive field (RF). We argue that these models are not always the best choice for temporal modeling in videos. A large temporal receptive field allows the model to encode the exact sub-action order of a video, which decreases performance when test videos have a different sub-action order. In this work, we investigate whether we can improve model robustness to the sub-action order by shrinking the temporal receptive field of action recognition models. To this end, we design Video BagNet, a variant of the 3D ResNet-50 model with the temporal receptive field size limited to 1, 9, 17 or 33 frames. We analyze Video BagNet on synthetic and real-world video datasets and experimentally compare models with varying temporal receptive fields. We find that models with short temporal receptive fields are robust to sub-action order changes, while models with larger temporal receptive fields are sensitive to the sub-action order. In this repository, we provide our code, including the implementation of Video BagNet.
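
To make the notion of a limited temporal receptive field concrete, the sketch below computes how many input frames a single output position can see after a stack of temporal convolutions. It uses the standard receptive-field recurrence and is not taken from the repository: the function name `temporal_receptive_field` and the example layer stacks are hypothetical, dilation is assumed to be 1, and the stacks are chosen only because they happen to yield 1, 9, 17 and 33 frames. They should not be read as the actual Video BagNet layer configurations.

```python
from typing import Iterable, Tuple


def temporal_receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Return the temporal receptive field, in frames, of a layer stack.

    Each layer is a (temporal_kernel_size, temporal_stride) pair, listed
    from input to output. Dilation is assumed to be 1 (an assumption of
    this sketch).
    """
    rf = 1    # a single position initially covers one input frame
    jump = 1  # spacing, in input frames, between adjacent positions
    for kernel, stride in layers:
        # A kernel of size k reaches (k - 1) extra positions, each of
        # which is `jump` input frames apart at the current depth.
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks for illustration only.
print(temporal_receptive_field([(1, 1)] * 16))  # 1: no temporal mixing
print(temporal_receptive_field([(3, 1)] * 4))   # 9
print(temporal_receptive_field([(3, 1)] * 8))   # 17
print(temporal_receptive_field([(3, 1)] * 16))  # 33
```

Under these assumptions, replacing a temporal kernel of size 3 with a kernel of size 1 removes that layer's contribution to the receptive field, which is one generic way to cap the number of frames a model can relate to each other.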

History

2024-05-24 first online, published, posted

Publisher

4TU.ResearchData

Format

GitHub repository with Python code

Associated peer-reviewed publication

Video BagNet: short temporal receptive fields increase robustness in long-term action recognition

Organizations

TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems, Computer Vision Lab
TNO, Netherlands Organisation for Applied Scientific Research, Intelligent Imaging Group

To access the source code, use the following command:

git clone https://data.4tu.nl/v3/datasets/9f4b04e3-81a5-4d03-a0da-ccb8d7d7d311.git

Or download the latest commit as a ZIP.