Code underlying the publication: "Long-term behaviour recognition in videos with actor-focused region attention"

DOI: 10.4121/0dd08a4e-cab6-49e2-98e4-f00f7d3cfccb.v1

The DOI displayed above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
DOI: 10.4121/0dd08a4e-cab6-49e2-98e4-f00f7d3cfccb

Datacite citation style

Strafforello, Ombretta; Schutte, Klamer (2024): Code underlying the publication: "Long-term behaviour recognition in videos with actor-focused region attention". Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/0dd08a4e-cab6-49e2-98e4-f00f7d3cfccb.v1

Keywords

Computer vision I3D Long-term behaviour recognition Video understanding

Licence

CC0

by Ombretta Strafforello, Klamer Schutte

Long-term activities involve humans performing complex, minutes-long actions. Unlike in traditional action recognition, complex activities are normally composed of a set of sub-actions that can appear in different orders, durations, and quantities. These aspects introduce a large intra-class variability that can be hard to model. Our approach aims to adaptively capture and learn the importance of spatial and temporal video regions for minutes-long activity classification. Inspired by previous work on Region Attention, our architecture embeds the spatio-temporal features from multiple video regions into a compact fixed-length representation. These features are extracted with a specially fine-tuned 3D convolutional backbone. Additionally, driven by the prior assumption that the most discriminative locations in the videos are centered around the human who is carrying out the activity, we introduce an Actor Focus mechanism to enhance the feature extraction in both the training and inference phases. Our experiments show that the Multi-Regional fine-tuned 3D-CNN, topped with Actor Focus and Region Attention, largely improves the performance of baseline 3D architectures, achieving state-of-the-art results on Breakfast, a well-known long-term activity recognition benchmark. In this repository, we provide our code implementation.
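The idea of embedding features from multiple video regions into one fixed-length representation can be illustrated with attention-weighted pooling. The following is a minimal sketch in plain Python, not the code shipped in the archive: it assumes a single scoring vector that rates each region, and the names softmax, region_attention_pool, and scoring_vector are hypothetical. The actual architecture, backbone, and Actor Focus mechanism in the repository may differ substantially.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_attention_pool(
    region_features: Sequence[Sequence[float]],
    scoring_vector: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool N region feature vectors (length D each) into one length-D vector.

    Each region gets a scalar score (dot product with the hypothetical
    scoring_vector); the scores are normalised with softmax and used as
    weights for a weighted sum of the region features.
    """
    scores = [
        sum(f * w for f, w in zip(feat, scoring_vector))
        for feat in region_features
    ]
    weights = softmax(scores)
    dim = len(scoring_vector)
    pooled = [
        sum(wt * feat[d] for wt, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return pooled, weights


# Three hypothetical regions with 4-dimensional features (illustrative values).
features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
scoring = [2.0, 0.0, 0.0, 0.0]  # assumed learned parameters; favours region 0
pooled, weights = region_attention_pool(features, scoring)
```

Whatever the number of regions, the pooled vector keeps the feature dimension D, which is what makes the representation fixed-length; the weights show which regions contributed most.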

History

2024-05-24 first online, published, posted

Publisher

4TU.ResearchData

Format

Zip file containing Python code.

Associated peer-reviewed publication

Long-term behaviour recognition in videos with actor-focused region attention

Organizations

TNO, Netherlands Organisation for Applied Scientific Research, Intelligent Imaging Group
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems, Computer Vision Lab

DATA

Files (1)

long-term-behavior-recognition.zip (985,785,506 bytes, MD5: 92f91388965a9deb460795bf3f346386)
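Because the archive is close to 1 GB, it can be worth checking a download against the size and MD5 checksum listed above. The sketch below uses only the Python standard library; the function names md5_of_file and verify_download are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# Values taken from the file listing on this page.
EXPECTED_MD5 = "92f91388965a9deb460795bf3f346386"
EXPECTED_SIZE = 985_785_506  # bytes


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True if the file at path matches the expected size and checksum."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5


if __name__ == "__main__":
    # Assumed location: the archive saved in the current working directory.
    archive = Path("long-term-behavior-recognition.zip")
    if archive.exists():
        print("OK" if verify_download(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The size check runs first because it is cheap and catches truncated downloads before the full file is hashed.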