Me
Zhiwen Shao (邵志文)
Associate Professor at CUMT
Learn from You
My research interests lie in the fields of Computer Vision and Deep Learning. I am always glad to discuss and exchange ideas with researchers in these fields.

Biography

He is now an Associate Professor and a Graduate Advisor at the School of Computer Science and Technology, China University of Mining and Technology (CUMT), as well as a Postdoctoral Fellow at the Department of Computer Science and Engineering, Shanghai Jiao Tong University (SJTU). He received the Ph.D. degree in Computer Science and Technology from SJTU in 2020, advised by Prof. Lizhuang Ma. From 2017 to 2018, he was a joint Ph.D. student at the Multimedia and Interactive Computing Lab, Nanyang Technological University (NTU), advised by Prof. Jianfei Cai. Before that, he received the B.Eng. degree in Computer Science and Technology from Northwestern Polytechnical University (NPU) in 2015. His research has been supported by funding programs such as the Young Scientists Fund of the National Natural Science Foundation of China, the High-Level Talent Program for Innovation and Entrepreneurship (ShuangChuang Doctor) of Jiangsu Province, the Talent Program for Deputy General Manager of Science and Technology of Jiangsu Province, and the Young Scientists Fund of the Fundamental Research Funds for the Central Universities. He has published more than 30 academic papers in leading journals and conferences, and serves as a program committee member or reviewer for top venues such as IEEE TPAMI, IJCV, IEEE TIP, IEEE CVPR, IJCAI, and AAAI. His research interests lie in the fields of Computer Vision and Deep Learning. The official faculty websites can be found here and here. [ Résumé ]

News

Jan. 2023: I am a recipient of the Outstanding Young Teacher Program of China University of Mining and Technology.

Jan. 2023: I serve as a Publication Chair for Computer Graphics International (CGI) 2023.

Dec. 2022: Our paper Facial Action Unit Detection Using Attention and Relation Learning is selected as an ESI Highly Cited Paper.

July 2022: I am a recipient of the Talent Program for Deputy General Manager of Science and Technology of Jiangsu Province.

June 2022: Our paper TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask is accepted by IEEE TMM.

Aug. 2021: I am a recipient of the Young Scientists Fund of the National Natural Science Foundation of China.

July 2021: I am a recipient of the High-Level Talent Program for Innovation and Entrepreneurship (ShuangChuang Doctor) of Jiangsu Province.

June 2021: Our paper Unconstrained Facial Action Unit Detection via Latent Feature Domain is accepted by IEEE TAFFC.

Apr. 2021: Our paper Explicit Facial Expression Transfer via Fine-Grained Representations is accepted by IEEE TIP.

Sept. 2020: I serve as a PC member for AAAI 2021 and IJCAI 2021.

Aug. 2020: Our paper JÂA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention is accepted by IJCV.

Oct. 2019: Our paper Facial Action Unit Detection Using Attention and Relation Learning is accepted by IEEE TAFFC.

July 2018: Our paper Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment is accepted by ECCV 2018.

Nov. 2017: I am now studying at the Multimedia and Interactive Computing Lab of Nanyang Technological University (NTU), Singapore, as a Research Assistant, advised by Prof. Jianfei Cai.

Timeline

Jan. 2023 -
Tenured Associate Professor @ CUMT
Computer Vision, Deep Learning
Dec. 2022 -
Postdoctoral Fellow @ SJTU
Fine-Grained Visual Affective Analysis
Aug. 2020 - Dec. 2022
Tenure-Track Associate Professor @ CUMT
Computer Vision, Deep Learning
Nov. 2017 - Nov. 2018
Research Assistant @ NTU
Facial Action Unit Recognition, Face Alignment
Sept. 2015 - Aug. 2020
Ph.D. candidate @ SJTU
Face Analysis, Deep Learning
Mar. 2015 - Sept. 2016
Intern @ Tencent YouTu Lab
Face Analysis, Deep Learning
Sept. 2011 - July 2015
Undergraduate student @ NPU
B.Eng. in Computer Science and Technology
Thesis Advisor: Prof. Dongmei Jiang

Sponsored Projects

2023: Outstanding Young Teacher Program of China University of Mining and Technology, Principal Investigator

2022: Participation in Computer Graphics International (CGI) 2022 Supported by the K.C. Wong Education Foundation, Principal Investigator

2022: Patent License Project for Method and Device of Facial Action Unit Recognition Based on Joint Learning and Optical Flow Estimation, Principal Investigator

2022: Talent Program for Deputy General Manager of Science and Technology of Jiangsu Province, Principal Investigator

2021: Young Scientists Fund of the National Natural Science Foundation of China, Principal Investigator

2021: High-Level Talent Program for Innovation and Entrepreneurship (ShuangChuang Doctor) of Jiangsu Province, Principal Investigator

2021: Young Scientists Fund of the Fundamental Research Funds for the Central Universities, Principal Investigator

2020: Start-Up Grant of China University of Mining and Technology, Principal Investigator

Selected Publications

IterativePFN: True Iterative Point Cloud Filtering
Dasith de Silva Edirimuni, Xuequan Lu, Zhiwen Shao, Gang Li, Antonio Robles-Kelly, Ying He

We propose IterativePFN (iterative point cloud filtering network), which consists of multiple IterationModules that model the true iterative filtering process internally, within a single network. We train IterativePFN using a novel loss function that utilizes an adaptive ground-truth target at each iteration (sketched below) to capture the relationship between intermediate filtering results during training. This ensures that filtered results converge faster to the clean surface.

CVPR 2023 (CCF A) in Vancouver, Canada
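For intuition, here is a minimal, hypothetical PyTorch rendering of such an iteration-wise loss, where each IterationModule's target is blended from the noisy input toward the clean ground truth; the linear blending schedule is an assumption for illustration, not the paper's exact formulation.

import torch

def adaptive_iteration_loss(outputs, noisy, clean):
    # outputs: list of (B, N, 3) point sets from successive IterationModules
    # noisy, clean: (B, N, 3) noisy input and clean ground-truth point sets
    T = len(outputs)
    loss = noisy.new_zeros(())
    for t, pred in enumerate(outputs, start=1):
        alpha = t / T  # intermediate targets anneal toward the clean surface
        target = (1.0 - alpha) * noisy + alpha * clean
        loss = loss + torch.mean((pred - target) ** 2)
    return loss / T
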
Identity-Invariant Representation and Transformer-Style Relation for Micro-Expression Recognition
Zhiwen Shao, Feiran Li, Yong Zhou, Hao Chen, Hancheng Zhu, Rui Yao

We propose a novel MER method based on identity-invariant representation learning and transformer-style relational modeling. Specifically, we propose to disentangle the identity information from the input via an adversarial training strategy (sketched below). Considering the coherent relationships between AUs and MEs, we further employ AU recognition as an auxiliary task to learn AU representations that capture ME information. Moreover, we introduce a transformer to achieve MER by modeling the correlations among AUs. MER and AU recognition are jointly trained, so that the two correlated tasks contribute to each other.

Applied Intelligence (APIN), 2023 (CCF C, SCI Q2)
[ site ]
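Adversarial identity disentanglement of this kind is commonly realized with a gradient reversal layer; the PyTorch sketch below shows that general technique, and is not necessarily the paper's exact training strategy.

import torch

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; negated, scaled gradient in the
    # backward pass, so the encoder is pushed to remove identity cues.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def identity_adversarial_logits(features, id_classifier, lam=1.0):
    # The identity classifier trains normally, while the reversed
    # gradients make `features` identity-invariant.
    return id_classifier(GradReverse.apply(features, lam))
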
Facial Action Unit Detection Using Attention and Relation Learning

We propose an end-to-end deep learning based attention and relation learning framework for AU detection with only AU labels, which has not been explored before. In particular, multi-scale features shared by each AU are learned first, and then both channel-wise and spatial attentions are adaptively learned (sketched below) to select and extract AU-related local features. Moreover, pixel-level relations for AUs are further captured to refine the spatial attentions so as to extract more relevant local features. Without changing the network architecture, our framework can be easily extended to AU intensity estimation.

IEEE Transactions on Affective Computing (TAFFC), 2022 (CCF B, SCI Q2)
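As a concrete illustration of channel-wise plus spatial attention, here is a minimal PyTorch module; the reduction ratio and layer choices are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):  # x: (B, C, H, W) AU feature map
        w = self.channel_fc(x.mean(dim=(2, 3)))   # channel attention weights
        x = x * w[:, :, None, None]               # select AU-related channels
        s = torch.sigmoid(self.spatial_conv(x))   # spatial attention map
        return x * s                              # emphasize AU-related regions
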
TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask
Yuchen Su†, Zhiwen Shao†*, Yong Zhou, Fanrong Meng, Hancheng Zhu, Bing Liu, Rui Yao

We propose a novel lightweight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode text masks as compact vectors (sketched below). Further, considering the imbalanced number of training samples among pyramid layers, we employ only a single-level head for top-down prediction. To model multi-scale texts in a single-level head, we introduce a novel positive sampling strategy that treats the shrunk text region as positive samples, and design a feature awareness module (FAM) for spatial awareness and scale awareness by fusing rich contextual information and focusing on more significant features. Moreover, we propose a segmented non-maximum suppression (S-NMS) method that can filter low-quality mask regressions.

IEEE Transactions on Multimedia (TMM), 2022 (CCF B, SCI Q2)
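The DCT mask encoding above can be sketched in a few lines of NumPy/SciPy: transform the binary mask and keep only the low-frequency coefficients as the compact vector. The coefficient count and the square-block selection (rather than a zig-zag ordering) are illustrative assumptions.

import numpy as np
from scipy.fft import dctn, idctn

def encode_mask(mask, k=300):
    # Keep the k lowest-frequency 2D-DCT coefficients of a (H, W) binary mask.
    coeffs = dctn(mask.astype(np.float64), norm='ortho')
    h = int(np.ceil(np.sqrt(k)))
    return coeffs[:h, :h].flatten()[:k]

def decode_mask(vec, shape, k=300):
    # Reconstruct an approximate mask from the compact DCT vector.
    h = int(np.ceil(np.sqrt(k)))
    block = np.zeros(h * h)
    block[:k] = vec
    coeffs = np.zeros(shape)
    coeffs[:h, :h] = block.reshape(h, h)
    return idctn(coeffs, norm='ortho') > 0.5
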
Facial Action Unit Detection via Hybrid Relational Reasoning
Zhiwen Shao, Yong Zhou, Bing Liu, Hancheng Zhu, Wen-Liang Du, Jiaqi Zhao

We propose a novel hybrid relational reasoning (HRR) framework for AU detection. In particular, we propose to adaptively reason about pixel-level correlations of each AU, under the constraint of regional correlations predefined by facial landmarks, as well as the supervision of AU detection. Moreover, we propose to adaptively reason about AU-level correlations using a graph convolutional network (sketched below), considering both predefined AU relationships and learnable relationship weights. Our framework thus integrates the advantages of correlation predefinition and correlation learning.

The Visual Computer (TVC), 2022 (CCF C, SCI Q4)
[ site ]
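The AU-level reasoning can be pictured as a graph convolution whose adjacency mixes a predefined AU-relationship matrix with learnable weights; the PyTorch sketch below is an assumed form of that idea, not the paper's exact layer.

import torch
import torch.nn as nn

class RelationalGCNLayer(nn.Module):
    def __init__(self, num_aus, dim, predefined_adj):
        super().__init__()
        # predefined_adj: (num_aus, num_aus) tensor of prior AU relationships
        self.register_buffer('A_prior', predefined_adj)
        self.A_learn = nn.Parameter(torch.zeros(num_aus, num_aus))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, num_aus, dim) per-AU features
        # Mix the fixed prior with learnable relationship weights.
        A = torch.softmax(self.A_prior + self.A_learn, dim=-1)
        return torch.relu(torch.matmul(A, self.proj(x)))
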
Survey of Expression Action Unit Recognition Based on Deep Learning
Zhiwen Shao, Yong Zhou, Xin Tan, Lizhuang Ma, Bing Liu, Rui Yao

Expression action unit (AU) recognition based on deep learning is a hot topic in the fields of computer vision and affective computing. Each AU describes a local facial expression action, and combinations of AUs can quantitatively represent any expression. Current AU recognition mainly faces three challenging factors: scarcity of labels, difficulty of feature capture, and imbalance of labels. Accordingly, this paper categorizes existing research into transfer learning based, region learning based, and relation learning based methods, and reviews and summarizes representative methods of each category. Finally, this paper compares and analyzes the different methods, and further discusses future research directions for AU recognition.

Acta Electronica Sinica, 2022 (CCF Chinese A)
[ site ]
Unconstrained Facial Action Unit Detection via Latent Feature Domain

We propose an end-to-end unconstrained facial AU detection framework based on domain adaptation, which transfers accurate AU labels from a constrained source domain to an unconstrained target domain by exploiting labels of AU-related facial landmarks. Specifically, we map a labeled source image and an unlabeled target image into a latent feature domain by combining the source landmark-related feature with the target landmark-free feature. Due to the combination of source AU-related information and target AU-free information, the latent feature domain with the transferred source label can be learned by maximizing target-domain AU detection performance. Moreover, we introduce a novel landmark adversarial loss to disentangle the landmark-free feature from the landmark-related feature by treating the adversarial learning as a multi-player minimax game.

IEEE Transactions on Affective Computing (TAFFC), 2022 (CCF B, SCI Q2)
Show, Deconfound and Tell: Image Captioning with Causal Inference
Bing Liu, Dong Wang, Xu Yang, Yong Zhou, Rui Yao, Zhiwen Shao, Jiaqi Zhao

We first use Structural Causal Models (SCMs) to show how two confounders damage image captioning. Then we apply the backdoor adjustment (illustrated below) to propose a novel causal inference based image captioning (CIIC) framework, which consists of an interventional object detector (IOD) and an interventional transformer decoder (ITD) to jointly confront both confounders. In the encoding stage, the IOD disentangles the region-based visual features by deconfounding the visual confounder. In the decoding stage, the ITD introduces causal intervention into the transformer decoder and deconfounds the visual and linguistic confounders simultaneously. The two modules collaborate with each other to eliminate the spurious correlations caused by the unobserved confounders.

CVPR 2022 (CCF A) in New Orleans, USA
[ pdf ]
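For context, the backdoor adjustment that such deconfounding builds on replaces the confounded estimate P(Y | X) with P(Y | do(X)) = sum_z P(Y | X, z) P(z), averaging over the confounder z with its prior instead of letting it bias the estimate. A toy numeric example with a binary confounder:

import numpy as np

p_z = np.array([0.7, 0.3])           # prior over confounder values z
p_y_given_xz = np.array([0.9, 0.2])  # P(Y=1 | X=x, z) for z in {0, 1}

# Intervene on X: marginalize z with its prior p(z), rather than with
# the confounded conditional distribution P(z | X=x).
p_y_do_x = float(np.sum(p_y_given_xz * p_z))
print(p_y_do_x)  # 0.69
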
Weakly Supervised Few-Shot Semantic Segmentation via Pseudo Mask Enhancement and Meta Learning
Man Zhang, Yong Zhou, Bing Liu, Jiaqi Zhao, Rui Yao, Zhiwen Shao, Hancheng Zhu

We propose a weakly supervised few-shot semantic segmentation model based on the meta learning framework, which utilizes prior knowledge and adjusts itself according to new tasks. The proposed network thus achieves both high efficiency and strong generalization to new tasks. In the pseudo mask generation stage, we develop a WRCAM method with a channel-spatial attention mechanism to refine the coverage of targets in pseudo masks. In the few-shot semantic segmentation stage, an optimization based meta learning method is used to realize few-shot semantic segmentation by virtue of the refined pseudo masks.

IEEE Transactions on Multimedia (TMM), 2022 (CCF B, SCI Q2)
[ site ]
Unsupervised RGB-T Object Tracking with Attentional Multi-Modal Feature Fusion
Shenglan Li, Rui Yao, Yong Zhou, Hancheng Zhu, Bing Liu, Jiaqi Zhao, Zhiwen Shao

We propose a framework for visual tracking based on attention-mechanism fusion of multi-modal and multi-level features. This fusion method fully exploits the advantages of multi-level and multi-modal information. Specifically, we use a feature fusion module to fuse features from different levels and different modalities at the same time. We use cycle consistency based on a correlation filter to train the model without supervision, reducing the cost of data annotation.

Multimedia Tools and Applications, 2023 (CCF C, SCI Q4)
[ site ]
GeoConv: Geodesic Guided Convolution for Facial Action Unit Recognition

We propose a novel geodesic guided convolution (GeoConv) for AU recognition by embedding 3D manifold information into 2D convolutions. Specifically, the kernel of GeoConv is weighted by our introduced geodesic weights (sketched below), which are negatively correlated to geodesic distances on a coarsely reconstructed 3D morphable face model. Moreover, based on GeoConv, we further develop an end-to-end trainable framework named GeoCNN for AU recognition.

Pattern Recognition (PR), 2022 (CCF B, SCI Q2)
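A simplified PyTorch sketch of a geodesic-guided convolution: geodesic distances from the kernel center modulate the filter weights. Here a single weight map is shared across positions and the exponential decay is an assumed form; in the paper the weights derive from a coarsely reconstructed 3D morphable face model and vary over the face.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GeoConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)

    def forward(self, x, geo_dist):
        # geo_dist: (k, k) geodesic distances from the kernel center,
        # e.g. looked up from a reconstructed 3D morphable face model.
        geo_w = torch.exp(-geo_dist)  # nearer on the manifold -> larger weight
        k = self.weight.shape[-1]
        return F.conv2d(x, self.weight * geo_w, padding=k // 2)
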
Explicit Facial Expression Transfer via Fine-Grained Representations
Zhiwen Shao, Hengliang Zhu, Junshu Tang, Xuequan Lu, Lizhuang Ma

We propose to explicitly transfer facial expressions by directly mapping two unpaired input images to two synthesized images with swapped expressions. Specifically, considering that AUs semantically describe fine-grained expression details, we propose a novel multi-class adversarial training method to disentangle input images into two types of fine-grained representations: AU-related features and AU-free features. Then, we can synthesize new images with preserved identities and swapped expressions by combining AU-free features with swapped AU-related features (sketched below). Moreover, to obtain reliable expression transfer results for the unpaired input, we introduce a swap consistency loss that makes the synthesized images and self-reconstructed images indistinguishable.

IEEE Transactions on Image Processing (TIP), 2021 (CCF A, SCI Q1)
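The feature-swap step itself is simple to sketch: each output pairs one image's AU-free (identity) feature with the other image's AU-related (expression) feature before decoding. The decoder and the concatenation layout below are assumptions for illustration.

import torch

def swap_expressions(au_a, free_a, au_b, free_b, decoder):
    # au_*: AU-related (expression) features; free_*: AU-free (identity)
    # features of images a and b. Returns two images with preserved
    # identities and swapped expressions.
    img_a_expr_b = decoder(torch.cat([free_a, au_b], dim=1))
    img_b_expr_a = decoder(torch.cat([free_b, au_a], dim=1))
    return img_a_expr_b, img_b_expr_a
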
EGGAN: Learning Latent Space for Fine-Grained Expression Manipulation

We propose an end-to-end expression-guided generative adversarial network (EGGAN), which synthesizes an image with expected expression given continuous expression label and structured latent code. In particular, an adversarial autoencoder is used to translate a source image into a structured latent space. The encoded latent code and the target expression label are input to a conditional GAN to synthesize an image with the target expression. Moreover, a perceptual loss and a multi-scale structural similarity loss are introduced to preserve facial identity and global shape during expression manipulation.

IEEE Multimedia (MM), 2021 (SCI Q2)
[ code ]
JÂA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention

We propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned first, and high-level face alignment features are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module that refines the attention map of each AU adaptively. Finally, the assembled local features are integrated with face alignment features and global features for AU detection.

International Journal of Computer Vision (IJCV), 2021 (CCF A, SCI Q1)
Deep Multi-Center Learning for Face Alignment
Zhiwen Shao, Hengliang Zhu, Xin Tan, Yangyang Hao, Lizhuang Ma

We propose a novel deep learning framework named Multi-Center Learning with multiple shape prediction layers for face alignment. In particular, each shape prediction layer emphasizes the detection of a certain cluster of semantically relevant landmarks. Challenging landmarks are focused on first, and then each cluster of landmarks is further optimized respectively. Moreover, to reduce model complexity, we propose a model assembling method (sketched below) to integrate the multiple shape prediction layers into one shape prediction layer.

Neurocomputing, 2020 (CCF C, SCI Q2)
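The model-assembling step can be pictured as follows: each shape prediction layer is a linear layer specialized for one landmark cluster, and the merged layer copies every landmark's (x, y) output rows from its own cluster's layer. A minimal PyTorch sketch; the row layout is an assumed convention, not necessarily the paper's.

import torch
import torch.nn as nn

def assemble(layers, clusters, in_dim, num_landmarks):
    # layers[i]: nn.Linear(in_dim, 2 * num_landmarks) tuned for clusters[i],
    # a list of landmark indices. Returns one merged prediction layer whose
    # output rows (2i, 2i+1) hold landmark i's (x, y) coordinates.
    merged = nn.Linear(in_dim, 2 * num_landmarks)
    with torch.no_grad():
        for layer, idxs in zip(layers, clusters):
            for i in idxs:
                rows = [2 * i, 2 * i + 1]
                merged.weight[rows] = layer.weight[rows]
                merged.bias[rows] = layer.bias[rows]
    return merged
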
SiTGRU: Single-Tunnelled Gated Recurrent Unit for Abnormality Detection

We propose a novel version of the Gated Recurrent Unit (GRU), called Single-Tunnelled GRU, for abnormality detection. In particular, the Single-Tunnelled GRU discards the heavyweight reset gate of standard GRU cells, which overlooks the importance of past content by favouring only the current input, to obtain an optimized single-gated cell model (sketched below). Moreover, we substitute the hyperbolic tangent activation in standard GRUs with sigmoid activation, as the former suffers from performance loss in deeper networks.

Information Sciences (INS), 2020 (CCF B, SCI Q2)
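The single-tunnelled cell is easy to write down: the reset gate is dropped and the candidate activation uses a sigmoid instead of tanh, following the description above. A minimal PyTorch sketch; layer shapes and initialization are assumptions.

import torch
import torch.nn as nn

class SiTGRUCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.update = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h):
        xh = torch.cat([x, h], dim=-1)
        z = torch.sigmoid(self.update(xh))           # single (update) gate
        h_tilde = torch.sigmoid(self.candidate(xh))  # sigmoid, not tanh
        return (1 - z) * h + z * h_tilde             # no reset gate
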
Fine-Grained Expression Manipulation via Structured Latent Space

We propose an end-to-end expression-guided generative adversarial network (EGGAN), which utilizes structured latent codes and continuous expression labels as input to generate images with expected expressions. Specifically, we adopt an adversarial autoencoder to map a source image into a structured latent space. Then, given the source latent code and the target expression label, we employ a conditional GAN to generate a new image with the target expression. Moreover, we introduce a perceptual loss and a multi-scale structural similarity loss to preserve identity and global shape during generation.

ICME 2020 (CCF B, oral) in London, United Kingdom
Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment

We propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned first, and high-level face alignment features are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module that refines the attention map of each AU adaptively. Finally, the assembled local features are integrated with face alignment features and global features for AU detection.

ECCV 2018 (CCF B, Tsinghua A) in Munich, Germany
Learning a Multi-Center Convolutional Network for Unconstrained Face Alignment
Zhiwen Shao, Hengliang Zhu, Yangyang Hao, Min Wang, Lizhuang Ma

We propose a novel multi-center convolutional neural network for unconstrained face alignment. To utilize structural correlations among different facial landmarks, we determine several clusters based on their spatial positions. We pre-train our network to learn generic feature representations, and further fine-tune the pre-trained model to emphasize locating a certain cluster of landmarks. Fine-tuning helps search for an optimal solution smoothly without deviating excessively from the pre-trained model. We obtain an excellent solution by combining the multiple fine-tuned models.

ICME 2017 (CCF B, oral) in Hong Kong
Face Alignment by Deep Convolutional Network with Adaptive Learning Rate
Zhiwen Shao, Shouhong Ding, Hengliang Zhu, Chengjie Wang, Lizhuang Ma

We propose a novel data augmentation strategy, and design an innovative training algorithm with an adaptive learning rate for two iterative procedures, which helps the network search for an optimal solution. Our convolutional network can learn global high-level features and directly predict the coordinates of facial landmarks.

ICASSP 2016 (CCF B, oral) in Shanghai, China

Granted Patents

2022: A Method and Device of User Personality Characteristic Prediction Based on Multi-Modal Information Fusion, The Third Inventor, ZL202111079044.4

2021: A Method and Device of Facial Action Unit Recognition Based on Joint Learning and Optical Flow Estimation, The First Inventor, ZL202110360938.4

2018: Identity Verification System V1.0 Based on Face Recognition, Software Copyright, The Second Inventor, 2018SR160441

Awards

2022: Honorable Mention for Teaching Competition at the School of Computer Science and Technology, China University of Mining and Technology

2021: Excellent Headteacher of China University of Mining and Technology

2020: Outstanding Prize for Scientific and Technological Progress of Shanghai Municipality, 11/18

2020: One of the Top 10 Scientific Advances of Shanghai Jiao Tong University, 8/9

2019: Super AI Leader (SAIL) TOP 30 Project at the World Artificial Intelligence Conference, 6/13

2016-2019: KoGuan Endeavor Scholarship, Suzhou Yucai Scholarship

2015: Outstanding Graduate of Northwestern Polytechnical University

2012-2015: Outstanding Student of Northwestern Polytechnical University

2012-2015: National Endeavor Scholarship, Samsung China Scholarship, Wu Yajun Scholarship

Teaching Experiences

2022: Image Processing and Computer Vision, Lecturer (Principal of Course)

2022: Practice for Python Programming, Lecturer

2021: Computational Thinking and Artificial Intelligence Foundation, Teaching Assistant

2020: Practice for Computational Thinking and Artificial Intelligence Foundation, Lecturer

2020: Technology of Cloud Computing and Big Data, Teaching Assistant

2020: Introduction to Information Science, Teaching Assistant

Academic Services

Publication Chair: Computer Graphics International (CGI) 2023

Session Chair: Computer Graphics International (CGI) 2022, Shanghai Cross-Media Intelligence and Computer Vision Forum 2019

Member in Chinese Association for Artificial Intelligence (CAAI): Professional Committee of Pattern Recognition, Professional Committee of Knowledge Engineering and Distributed Intelligence, Professional Committee of Machine Learning

Member in China Society of Image and Graphics (CSIG): Professional Committee of Machine Vision, Professional Committee of Animation and Digital Entertainment

Program Committee Member/Conference Reviewer: CVPR, IJCAI, AAAI, ACM MM, CGI, ICONIP, ICIG, NCIIP

Journal Reviewer: IEEE TPAMI, IJCV, IEEE TIP, IEEE TAFFC, IEEE TMM, Signal Processing, SPIC, IET IP, TVC, Computers & Graphics, IEEE Sensors, Journal of Electronic Imaging, Frontiers in Computer Science
