Variational Bayesian Group-Level Sparsification for Knowledge Distillation

Yue Ming, Hao Fu, Yibo Jiang, Hui Yu

    Research output: Contribution to journal › Article › peer-review


    Abstract

    Deep neural networks are capable of learning powerful representations, but they are often limited by heavy network architectures and high computational cost. Knowledge distillation (KD) is an effective way to perform model compression and inference acceleration, but the resulting student models still contain redundant parameters. To tackle these issues, we propose a novel approach, called Variational Bayesian Group-level Sparsification for Knowledge Distillation (VBGS-KD), to distill a large teacher network into a small and sparse student network while preserving accuracy. We impose a sparsity-inducing prior on groups of parameters in the student model and introduce a variational Bayesian approximation to learn structural sparsity, which effectively prunes most of the weights. The pruning threshold is learned during training without extra fine-tuning. The proposed method learns robust student networks that achieve satisfying accuracy and compact sizes compared with state-of-the-art methods. We validated our method on the MNIST and CIFAR-10 datasets, observing 90.3% sparsity with a 0.19% accuracy boost on MNIST. Extensive experiments on the CIFAR-10 dataset demonstrate the efficiency of the proposed approach.
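    To make the general recipe described in the abstract more concrete, below is a minimal, hypothetical PyTorch sketch: a student layer whose output groups carry multiplicative Gaussian gates with a sparsity-inducing (log-uniform) prior, trained with a standard distillation loss plus the variational KL term. The class and function names (GroupVariationalLinear, distillation_step), the channel-wise grouping, the choice of log-uniform prior, and all hyper-parameters (T, alpha, beta, the pruning threshold on log alpha) are illustrative assumptions, not the exact VBGS-KD formulation from the paper.

```python
# Sketch only: group-level variational sparsification of a student layer,
# trained jointly with a Hinton-style distillation loss. Details are
# assumptions, not the authors' exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupVariationalLinear(nn.Module):
    """Linear layer with a multiplicative Gaussian gate per output group.

    Each output unit (the 'group') gets a gate z ~ N(mu, sigma^2). Groups whose
    learned dropout rate alpha = sigma^2 / mu^2 grows large can be pruned after
    training, yielding structural sparsity.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.gate_mu = nn.Parameter(torch.ones(out_features))
        self.gate_log_sigma2 = nn.Parameter(torch.full((out_features,), -6.0))

    def forward(self, x):
        out = F.linear(x, self.weight, self.bias)
        if self.training:
            # Sample one noisy gate per group (shared across the batch for simplicity).
            sigma = torch.exp(0.5 * self.gate_log_sigma2)
            z = self.gate_mu + sigma * torch.randn_like(self.gate_mu)
        else:
            # At test time, zero out groups whose dropout rate exceeds the threshold.
            z = self.gate_mu * (self.log_alpha() < 3.0).float()
        return out * z

    def log_alpha(self):
        return self.gate_log_sigma2 - 2.0 * torch.log(torch.abs(self.gate_mu) + 1e-8)

    def kl(self):
        # Approximate KL divergence to a log-uniform prior (Molchanov et al., 2017),
        # used here as a stand-in sparsity-inducing prior over the group gates.
        la = self.log_alpha()
        k1, k2, k3 = 0.63576, 1.8732, 1.48695
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1
        return -neg_kl.sum()


def distillation_step(student, teacher, x, y, optimizer, T=4.0, alpha=0.7, beta=1e-4):
    """One training step: distillation loss + hard-label loss + group-sparsity KL."""
    student.train()
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(s_logits, y)
    kl = sum(m.kl() for m in student.modules()
             if isinstance(m, GroupVariationalLinear))
    loss = alpha * kd + (1 - alpha) * ce + beta * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

    After training, groups whose learned dropout rate exceeds the threshold contribute (near-)zero output and can be removed, which is how the learned structural sparsity would translate into a smaller student model in this sketch.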
    Original language: English
    Pages (from-to): 126628-126636
    Journal: IEEE Access
    Volume: 8
    DOIs
    Publication status: Published - 13 Jul 2020

