TY - JOUR
T1 - Multi-instance semantic similarity transferring for knowledge distillation
AU - Zhao, Haoran
AU - Sun, Xin
AU - Dong, Junyu
AU - Yu, Hui
AU - Wang, Gaige
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China (Nos. 61971388 and U1706218) and the Alexander von Humboldt Foundation.
Funding Information:
We acknowledge the support of the National Natural Science Foundation of China (Nos. 61971388, U1706218, 61976123, 61601427); the Taishan Young Scholars Program of Shandong Province; and the Alexander von Humboldt Foundation, Germany.
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/11/28
Y1 - 2022/11/28
N2 - Knowledge distillation is a popular paradigm for learning portable neural networks by transferring the knowledge from a large model into a smaller one. Most existing approaches enhance the student model by utilizing the instance-level similarity information between categories provided by the teacher model. However, these works ignore the similarity correlation between different instances, which plays an important role in confidence prediction. To tackle this issue, we propose a novel method in this paper, called multi-instance semantic similarity transferring for knowledge distillation (STKD), which aims to fully utilize the similarities between the categories of multiple samples. Furthermore, we propose to better capture the similarity correlation between different instances via the mixup technique, which creates virtual samples by weighted linear interpolation. Note that our distillation loss can fully utilize the similarities among incorrect classes through the mixed labels. The proposed approach promotes the performance of the student model, as a virtual sample created from multiple images produces similar probability distributions in the teacher and student networks. Experiments and ablation studies on several public classification datasets, including CIFAR-10, CIFAR-100, CINIC-10 and Tiny-ImageNet, verify that this lightweight method can effectively boost the performance of the compact student model. The results show that STKD substantially outperforms vanilla knowledge distillation and achieves superior accuracy over state-of-the-art knowledge distillation methods.
AB - Knowledge distillation is a popular paradigm for learning portable neural networks by transferring the knowledge from a large model into a smaller one. Most existing approaches enhance the student model by utilizing the instance-level similarity information between categories provided by the teacher model. However, these works ignore the similarity correlation between different instances, which plays an important role in confidence prediction. To tackle this issue, we propose a novel method in this paper, called multi-instance semantic similarity transferring for knowledge distillation (STKD), which aims to fully utilize the similarities between the categories of multiple samples. Furthermore, we propose to better capture the similarity correlation between different instances via the mixup technique, which creates virtual samples by weighted linear interpolation. Note that our distillation loss can fully utilize the similarities among incorrect classes through the mixed labels. The proposed approach promotes the performance of the student model, as a virtual sample created from multiple images produces similar probability distributions in the teacher and student networks. Experiments and ablation studies on several public classification datasets, including CIFAR-10, CIFAR-100, CINIC-10 and Tiny-ImageNet, verify that this lightweight method can effectively boost the performance of the compact student model. The results show that STKD substantially outperforms vanilla knowledge distillation and achieves superior accuracy over state-of-the-art knowledge distillation methods.
KW - Deep neural networks
KW - Image classification
KW - Knowledge distillation
KW - Model compression
UR - http://www.scopus.com/inward/record.url?scp=85138042496&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2022.109832
DO - 10.1016/j.knosys.2022.109832
M3 - Article
AN - SCOPUS:85138042496
SN - 0950-7051
VL - 256
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 109832
ER -