TY - JOUR
T1 - A review on the attention mechanism of deep learning
AU - Niu, Zhaoyang
AU - Zhong, Guoqiang
AU - Yu, Hui
N1 - Funding Information:
This work was supported by the National Key Research and Development Program of China under Grant No. 2018AAA0100400, the Joint Fund of the Equipments Pre-Research and Ministry of Education of China under Grant No. 6141A020337, the Natural Science Foundation of Shandong Province under Grant No. ZR2020MF131, and the Science and Technology Program of Qingdao under Grant No. 21-1-4-ny-19-nsh.
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/9/10
Y1 - 2021/9/10
N2 - Attention has arguably become one of the most important concepts in the deep learning field. It is inspired by the human biological system, which tends to focus on distinctive parts when processing large amounts of information. With the development of deep neural networks, the attention mechanism has been widely used in diverse application domains. This paper aims to give an overview of the state-of-the-art attention models proposed in recent years. Toward a better general understanding of attention mechanisms, we define a unified model that is suitable for most attention structures. Each step of the attention mechanism implemented in the model is described in detail. Furthermore, we classify existing attention models according to four criteria: the softness of attention, the form of the input feature, the input representation, and the output representation. In addition, we summarize network architectures used in conjunction with the attention mechanism and describe some typical applications of attention mechanisms. Finally, we discuss the interpretability that attention brings to deep learning and present its potential future trends.
AB - Attention has arguably become one of the most important concepts in the deep learning field. It is inspired by the human biological system, which tends to focus on distinctive parts when processing large amounts of information. With the development of deep neural networks, the attention mechanism has been widely used in diverse application domains. This paper aims to give an overview of the state-of-the-art attention models proposed in recent years. Toward a better general understanding of attention mechanisms, we define a unified model that is suitable for most attention structures. Each step of the attention mechanism implemented in the model is described in detail. Furthermore, we classify existing attention models according to four criteria: the softness of attention, the form of the input feature, the input representation, and the output representation. In addition, we summarize network architectures used in conjunction with the attention mechanism and describe some typical applications of attention mechanisms. Finally, we discuss the interpretability that attention brings to deep learning and present its potential future trends.
KW - Attention mechanism
KW - Computer vision applications
KW - Convolutional Neural Network (CNN)
KW - Deep learning
KW - Encoder-decoder
KW - Natural language processing applications
KW - Recurrent Neural Network (RNN)
KW - Unified attention model
UR - http://www.scopus.com/inward/record.url?scp=85105870630&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2021.03.091
DO - 10.1016/j.neucom.2021.03.091
M3 - Article
AN - SCOPUS:85105870630
SN - 0925-2312
VL - 452
SP - 48
EP - 62
JO - Neurocomputing
JF - Neurocomputing
ER -