RecapNet: Action Proposal Generation Mimicking Human Cognitive Process

Tian Wang, Yang Chen, Zhiwei Lin, Aichun Zhu, Yong Li, Hichem Snoussi, Hui Wang

Research output: Contribution to journal › Article

Abstract

Generating action proposals in untrimmed videos is a challenging task, since video sequences usually contain a large amount of irrelevant content and the duration of an action instance is arbitrary. The quality of action proposals is key to action detection performance. Previous methods mainly rely on sliding windows or anchor boxes to cover all ground-truth actions, which is infeasible and computationally inefficient. To this end, this paper proposes RecapNet, a novel framework for generating action proposals by mimicking the human cognitive process of understanding video content. Specifically, RecapNet includes a residual causal convolution module to build a short memory of past events, on top of which a joint-probability actionness density ranking mechanism is designed to retrieve the action proposals. RecapNet can handle videos of arbitrary length and, more importantly, needs only a single pass over a video sequence to generate all action proposals. The experiments show that the proposed RecapNet outperforms the state of the art under all metrics on the benchmark THUMOS14 and ActivityNet-1.3 datasets.
Language: English
Number of pages: 11
Journal: IEEE Transactions on Cybernetics
ISSN: 2168-2267
Publication status: Accepted/In press - 2 Jan 2020
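
The abstract describes two components: a residual causal convolution module that keeps a short memory of past frames, and a joint-probability actionness density ranking that retrieves proposals from per-frame predictions. The sketch below illustrates the general idea in PyTorch; it is a minimal sketch under assumed design choices (the module name ResidualCausalConv, the kernel size, and the toy scoring formula are illustrative, not the paper's exact architecture).

import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualCausalConv(nn.Module):
    # Causal 1-D convolution with a residual connection: the output at
    # frame t depends only on frames <= t, i.e. a short memory of past events.
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); pad only on the left (the past).
        out = F.pad(x, (self.left_pad, 0))
        out = F.relu(self.conv(out))
        return x + out  # residual connection, sequence length unchanged


def joint_proposal_score(start_p, action_p, end_p, s, e):
    # Toy joint-probability score for a candidate proposal [s, e]:
    # start probability at s, end probability at e, and the mean actionness
    # density inside the interval (illustrative only, not the paper's formula).
    density = action_p[s:e + 1].mean()
    return start_p[s] * end_p[e] * density


if __name__ == "__main__":
    feats = torch.randn(1, 64, 200)   # per-frame features: (batch, channels, time)
    hidden = ResidualCausalConv(channels=64)(feats)

    # Pretend per-frame probabilities produced by downstream prediction heads.
    start_p, action_p, end_p = torch.rand(200), torch.rand(200), torch.rand(200)
    print(joint_proposal_score(start_p, action_p, end_p, s=40, e=90))

Because the convolution is causal, the whole feature sequence can be processed in a single forward pass regardless of video length, which matches the single-pass property claimed in the abstract.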

Cite this

Wang, T., Chen, Y., Lin, Z., Zhu, A., Li, Y., Snoussi, H., & Wang, H. (Accepted/In press). RecapNet: Action Proposal Generation Mimicking Human Cognitive Process. IEEE Transactions on Cybernetics.
