Real-time 3D facial tracking via cascaded compositional learning

Jianwen Lou, Xiaoxu Cai, Junyu Dong, Hui Yu

Research output: Contribution to journal › Article › peer-review



We propose to learn a cascade of globally-optimized modular boosted ferns (GoMBF) to solve multi-modal facial motion regression for real-time 3D facial tracking from a monocular RGB camera. GoMBF is a deep composition of multiple regression models, each of which is a set of boosted ferns initially trained to predict partial motion parameters of the same modality; these models are then concatenated via a global optimization step to form a single strong boosted-ferns regressor that can effectively handle the whole regression target. GoMBF explicitly copes with the variety of modalities in the output variables, while exhibiting greater fitting power and faster learning than conventional boosted ferns. By further cascading a sequence of GoMBFs (GoMBF-Cascade) to regress facial motion parameters, we achieve competitive tracking performance on a variety of in-the-wild videos compared with state-of-the-art methods, which either have higher computational complexity or require much more training data. GoMBF-Cascade provides a robust and elegant solution to real-time 3D facial tracking using a small set of training data, which makes it more practical in real-world applications.
We further investigate in depth the effect of synthesized facial images on training non-deep-learning methods such as GoMBF-Cascade for 3D facial tracking. We apply three types of synthetic images with various levels of naturalness to train two different tracking methods, and compare the performance of the tracking models trained on real data, on synthetic data, and on a mixture of the two. The experimental results indicate that i) a model trained purely on synthetic facial imagery can hardly generalize to unconstrained real-world data, and ii) including synthetic faces in training benefits tracking in certain scenarios but degrades the tracking model’s generalization ability. These two insights could benefit a range of non-deep-learning facial image analysis tasks where labelled real data is difficult to acquire.
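The compositional idea in the abstract (per-modality boosted ferns, followed by a global optimization over the concatenated model) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the features, the fern construction, or the form of the global step, so everything below is an assumption made for illustration: the names `Fern`, `fit_boosted_ferns`, `fit_gombf_like`, `indicator` and `predict` are hypothetical, the ferns use random threshold tests on generic feature vectors, and the global step is modelled as a ridge-regularized least-squares refit of all fern bin outputs against the full target. Cascading several such stages is not shown.

```python
import numpy as np

class Fern:
    """A random fern: n_tests threshold tests select one of 2**n_tests bins."""
    def __init__(self, n_tests, rng):
        self.n_tests, self.rng = n_tests, rng

    def fit(self, X, R, shrink=1.0):
        # Assumption: tests are random (feature, threshold) pairs.
        self.feat = self.rng.integers(0, X.shape[1], size=self.n_tests)
        lo, hi = X[:, self.feat].min(0), X[:, self.feat].max(0)
        self.thresh = self.rng.uniform(lo, hi)
        bins = self.bin_index(X)
        self.out = np.zeros((2 ** self.n_tests, R.shape[1]))
        for b in np.unique(bins):
            m = bins == b
            # Shrunken mean of the residuals that fall into this bin.
            self.out[b] = R[m].sum(0) / (m.sum() + shrink)
        return self

    def bin_index(self, X):
        bits = (X[:, self.feat] > self.thresh).astype(int)
        return bits @ (1 << np.arange(self.n_tests))

    def predict(self, X):
        return self.out[self.bin_index(X)]

def fit_boosted_ferns(X, Y, n_ferns, n_tests, rng, lr=0.5):
    """Conventional boosting: each fern fits the current residual."""
    ferns, R = [], Y.copy()
    for _ in range(n_ferns):
        f = Fern(n_tests, rng).fit(X, R)
        f.out *= lr
        R = R - f.predict(X)
        ferns.append(f)
    return ferns

def indicator(X, ferns):
    """One-hot encoding of the bin each sample reaches in every fern."""
    n_bins = 2 ** ferns[0].n_tests
    Phi = np.zeros((X.shape[0], len(ferns) * n_bins))
    rows = np.arange(X.shape[0])
    for k, f in enumerate(ferns):
        Phi[rows, k * n_bins + f.bin_index(X)] = 1.0
    return Phi

def fit_gombf_like(X, Y, groups, n_ferns, n_tests, rng, ridge=1e-3):
    """Train one boosted-ferns module per output group, then refit jointly."""
    ferns = []
    for cols in groups:  # each group holds the columns of one modality
        ferns += fit_boosted_ferns(X, Y[:, cols], n_ferns, n_tests, rng)
    # Assumed global step: keep the fern structures, relearn every bin
    # output against the whole target with ridge least squares.
    Phi = indicator(X, ferns)
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    W = np.linalg.solve(A, Phi.T @ Y)
    return ferns, W

def predict(X, ferns, W):
    return indicator(X, ferns) @ W
```

In this sketch, `groups` is a list of column-index lists, one per modality, for example `fit_gombf_like(X, Y, groups=[[0], [1, 2]], n_ferns=5, n_tests=3, rng=np.random.default_rng(0))`. After the joint refit, the per-module bin outputs are discarded and only the fern structures and the matrix `W` are used for prediction, which is one plausible reading of forming a single regressor for the whole target.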
Original language: English
Pages (from-to): 3844-3857
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Publication status: Published - 18 Mar 2021


  • UKRI
  • EP/N025849/1
  • 3D facial tracking
  • compositional learning
  • boosted ferns
  • synthetic training imagery

