TY - JOUR
T1 - Spatial enhancement and temporal constraint for weakly supervised action localization
AU - Qin, Xiaolei
AU - Ge, Yongxin
AU - Yu, Hui
AU - Chen, Feiyu
AU - Yang, Dan
N1 - Funding Information:
Manuscript received July 12, 2020; revised August 5, 2020; accepted August 12, 2020. Date of publication August 27, 2020; date of current version September 7, 2020. This work was supported in part by the Graduate Research and Innovation Foundation of Chongqing, China under Grant CYS18065, in part by the Chongqing Research Program of Basic Science & Frontier Technology under Grant cstc2018jcyjAX0410, and in part by the Fundamental Research Funds for the Central Universities under Grant 2019CDCGRJ217 and Grant 2019CDXYRJ0011. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. V. Sanchez. (Corresponding author: Yongxin Ge.) Xiaolei Qin, Yongxin Ge, Feiyu Chen, and Dan Yang are with the School of Big Data & Software Engineering, Chongqing University, Chongqing 401331, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/8/27
Y1 - 2020/8/27
N2 - Weakly supervised temporal action localization (WSTAL) is a practical but challenging issue in video understanding. However, most existing methods have to activate background snippets or deactivate action snippets in cases of no boundary annotations, which inevitably affects the localization of action instances. In this letter, we propose a spatial enhancement and temporal constraint (SETC) model to address this problem from three aspects. Specifically, we first propose a spatial enhancement module to enhance the discrimination of the extracted features. Then we leverage the instance sparse constraint to restrain drastic fluctuation of the class activation sequence (CAS). Finally, we use the confidence connectivity enhancement to connect the snippets that are broken up by mistake. Experiments on the THUMOS'14 and ActivityNet datasets validate the efficacy of SETC against existing state-of-the-art WSTAL algorithms.
AB - Weakly supervised temporal action localization (WSTAL) is a practical but challenging issue in video understanding. However, most existing methods have to activate background snippets or deactivate action snippets in cases of no boundary annotations, which inevitably affects the localization of action instances. In this letter, we propose a spatial enhancement and temporal constraint (SETC) model to address this problem from three aspects. Specifically, we first propose a spatial enhancement module to enhance the discrimination of the extracted features. Then we leverage the instance sparse constraint to restrain drastic fluctuation of the class activation sequence (CAS). Finally, we use the confidence connectivity enhancement to connect the snippets that are broken up by mistake. Experiments on the THUMOS'14 and ActivityNet datasets validate the efficacy of SETC against existing state-of-the-art WSTAL algorithms.
KW - Confidence connectivity enhancement
KW - Instance sparse constraint
KW - Spatial enhancement
KW - Weakly supervised temporal action localization
UR - http://www.scopus.com/inward/record.url?scp=85103073797&partnerID=8YFLogxK
U2 - 10.1109/LSP.2020.3018914
DO - 10.1109/LSP.2020.3018914
M3 - Article
AN - SCOPUS:85103073797
SN - 1070-9908
VL - 27
SP - 1520
EP - 1524
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
ER -