Me
Zhiwen Shao (邵志文)
Associate Professor [at] CUMT
Learn from You
My research interests lie in the fields of Computer Vision and Deep Learning. I welcome discussions with researchers working in these fields.

Biography

I am currently a Tenure-Track Associate Professor at the School of Computer Science and Technology, China University of Mining and Technology (CUMT). I received my Ph.D. degree in Computer Science and Technology from Shanghai Jiao Tong University (SJTU), advised by Prof. Lizhuang Ma. From 2017 to 2018, I was a joint Ph.D. student at the Multimedia and Interactive Computing Lab, Nanyang Technological University (NTU), advised by Prof. Jianfei Cai. Before that, I received my B.Eng. degree in Computer Science and Technology from Northwestern Polytechnical University (NPU). My research interests lie in the fields of Computer Vision and Deep Learning. My official faculty website can be found here. [ Résumé ]

News

Aug. 2021: I serve as a PC member for AAAI 2022 and IJCAI 2022.

Jun. 2021: Our paper Unconstrained Facial Action Unit Detection via Latent Feature Domain is accepted by IEEE TAFFC.

Apr. 2021: Our paper Explicit Facial Expression Transfer via Fine-Grained Representations is accepted by IEEE TIP.

Sept. 2020: I serve as a PC member for AAAI 2021 and IJCAI 2021.

Aug. 2020: Our paper JÂA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention is accepted by IJCV.

Oct. 2019: Our paper Facial Action Unit Detection Using Attention and Relation Learning is accepted by IEEE TAFFC.

Jul. 2018: Our paper Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment is accepted by ECCV 2018.

Nov. 2017: I am now studying at the Multimedia and Interactive Computing Lab of Nanyang Technological University (NTU), Singapore, as a Research Assistant, advised by Prof. Jianfei Cai.

Timeline

Aug. 2020 - Present
Tenure-Track Associate Professor @ CUMT
Computer Vision, Deep Learning
Nov. 2017 - Nov. 2018
Research Assistant @ NTU
Facial Action Unit Recognition, Face Alignment
Sept. 2015 - Aug. 2020
Ph.D. candidate @ SJTU
Face Analysis, Deep Learning
Mar. 2015 - Sept. 2016
Intern @ Tencent YouTu Lab
Face Analysis, Deep Learning
Sept. 2011 - Jul. 2015
Undergraduate student @ NPU
B.Eng. in Computer Science and Technology
Thesis Advisor: Prof. Dongmei Jiang

Selected Publications

GeoConv: Geodesic Guided Convolution for Facial Action Unit Recognition

We propose a novel geodesic guided convolution (GeoConv) for AU recognition by embedding 3D manifold information into 2D convolutions. Specifically, the kernel of GeoConv is weighted by our introduced geodesic weights, which are negatively correlated to geodesic distances on a coarsely reconstructed 3D morphable face model. Moreover, based on GeoConv, we further develop an end-to-end trainable framework named GeoCNN for AU recognition.

Pattern Recognition, 2021 (CCF B, SCI Q2)
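The geodesic weighting idea above can be sketched in a few lines. This is only an illustration: the exponential weight function, single-channel kernel, and per-position distance tensor here are my assumptions, not the paper's exact formulation.

```python
import numpy as np

def geoconv2d(x, kernel, geo_dist, alpha=1.0):
    """Single-channel 2D convolution whose kernel taps are re-weighted by
    geodesic weights that decay with geodesic distance (negative correlation).
    geo_dist[i, j] holds the geodesic distances, estimated from a coarsely
    reconstructed 3D face model, from output position (i, j) to each tap."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            w = np.exp(-alpha * geo_dist[i, j])  # larger distance -> smaller weight
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * kernel * w)
    return out
```

With all geodesic distances set to zero, every weight is 1 and the operation reduces to an ordinary convolution, which makes the modulation easy to sanity-check.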
Unconstrained Facial Action Unit Detection via Latent Feature Domain

We propose an end-to-end unconstrained facial AU detection framework based on domain adaptation, which transfers accurate AU labels from a constrained source domain to an unconstrained target domain by exploiting labels of AU-related facial landmarks. Specifically, we map a labeled source image and an unlabeled target image into a latent feature domain by combining the source landmark-related feature with the target landmark-free feature. Due to the combination of source AU-related information and target AU-free information, the latent feature domain with the transferred source label can be learned by maximizing the target-domain AU detection performance. Moreover, we introduce a novel landmark adversarial loss to disentangle the landmark-free feature from the landmark-related feature by treating the adversarial learning as a multi-player minimax game.

IEEE Transactions on Affective Computing, 2021 (CCF B, SCI Q2)
Sketch-to-Photo Face Generation Based on Semantic Consistency Preserving and Similar Connected Component Refinement
Luying Li, Junshu Tang, Zhiwen Shao*, Xin Tan, Lizhuang Ma*

We propose a two-stage sketch-to-photo generative adversarial network for face generation. In the first stage, we propose a semantic loss to maintain semantic consistency. In the second stage, we define the similar connected component and propose a color refinement loss to generate fine-grained details. Moreover, we introduce a multi-scale discriminator and design a patch-level local discriminator. We also propose a texture loss to enhance the local fidelity of synthesized images.

The Visual Computer (TVC), 2021 (CCF C, SCI Q4)
Explicit Facial Expression Transfer via Fine-Grained Representations
Zhiwen Shao, Hengliang Zhu, Junshu Tang, Xuequan Lu, Lizhuang Ma

We propose to explicitly transfer facial expression by directly mapping two unpaired input images to two synthesized images with swapped expressions. Specifically, considering AUs semantically describe fine-grained expression details, we propose a novel multi-class adversarial training method to disentangle input images into two types of fine-grained representations: AU-related feature and AU-free feature. Then, we can synthesize new images with preserved identities and swapped expressions by combining AU-free features with swapped AU-related features. Moreover, to obtain reliable expression transfer results of the unpaired input, we introduce a swap consistency loss to make the synthesized images and self-reconstructed images indistinguishable.

IEEE Transactions on Image Processing (TIP), 2021 (CCF A, SCI Q1)
EGGAN: Learning Latent Space for Fine-Grained Expression Manipulation

We propose an end-to-end expression-guided generative adversarial network (EGGAN), which synthesizes an image with expected expression given continuous expression label and structured latent code. In particular, an adversarial autoencoder is used to translate a source image into a structured latent space. The encoded latent code and the target expression label are input to a conditional GAN to synthesize an image with the target expression. Moreover, a perceptual loss and a multi-scale structural similarity loss are introduced to preserve facial identity and global shape during expression manipulation.

IEEE MultiMedia (MM), 2021 (SCI Q2)
[ code ]
JÂA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention

We propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, a multi-scale shared feature is learned first, and the high-level feature of face alignment is fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively. Finally, the assembled local features are integrated with the face alignment feature and the global feature for AU detection.

International Journal of Computer Vision (IJCV), 2021 (CCF A, SCI Q1)
CPCS: Critical Points Guided Clustering and Sampling for Point Cloud Analysis
Wei Wang, Zhiwen Shao*, Wencai Zhong, Lizhuang Ma*

We introduce the Expectation-Maximization Attention module, to find the critical subset points and cluster the other points around them. Moreover, we explore a point cloud sampling strategy to sample points based on the critical subset.

ICONIP 2020 (CCF C) in Bangkok, Thailand
"Forget" the Forget Gate: Estimating Anomalies in Videos Using Self-Contained Long Short-Term Memory Networks

We introduce a bi-gated, lightweight LSTM cell by discarding the forget gate and introducing sigmoid activation. Specifically, the proposed LSTM architecture fully sustains content from the previous hidden state, enabling the trained model to be robust and to make context-independent decisions during evaluation. Removing the forget gate results in a simpler LSTM cell with improved performance and computational efficiency.

CGI 2020 (CCF C, oral) in Geneva, Switzerland
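A single step of such a forget-gate-free cell can be sketched as follows. The gate layout, stacking of parameters, and the exact placement of the sigmoid activations are assumptions for illustration, not the paper's precise equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sc_lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a bi-gated LSTM cell without a forget gate (a sketch).
    W, U, b stack the input gate, output gate, and candidate (3H rows)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # pre-activations, shape (3H,)
    i = sigmoid(z[:H])                    # input gate
    o = sigmoid(z[H:2*H])                 # output gate
    g = sigmoid(z[2*H:])                  # candidate, sigmoid instead of tanh
    c = c_prev + i * g                    # previous cell state fully sustained
    h = o * sigmoid(c)
    return h, c
```

Because the candidate contribution `i * g` is always non-negative, the cell state never shrinks, which is one concrete sense in which previous content is fully sustained.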
Deep Multi-Center Learning for Face Alignment
Zhiwen Shao, Hengliang Zhu, Xin Tan, Yangyang Hao, Lizhuang Ma

We propose a novel deep learning framework named Multi-Center Learning with multiple shape prediction layers for face alignment. In particular, each shape prediction layer emphasizes the detection of a certain cluster of semantically relevant landmarks. Challenging landmarks are focused on first, and each cluster of landmarks is then optimized separately. Moreover, to reduce the model complexity, we propose a model assembling method to integrate the multiple shape prediction layers into a single shape prediction layer.

Neurocomputing, 2020 (CCF C, SCI Q2)
SiTGRU: Single-Tunnelled Gated Recurrent Unit for Abnormality Detection

We propose a novel version of Gated Recurrent Unit (GRU), called Single-Tunnelled GRU for abnormality detection. Particularly, the Single-Tunnelled GRU discards the heavy-weighted reset gate from GRU cells that overlooks the importance of past content by only favouring current input to obtain an optimized single-gated-cell model. Moreover, we substitute the hyperbolic tangent activation in standard GRUs with sigmoid activation, as the former suffers from performance loss in deeper networks.

Information Sciences (INS), 2020 (CCF B, SCI Q2)
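A single step of the single-tunnelled cell described above can be sketched as follows; the parameterization (separate weight matrices per gate) is an assumption for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sitgru_step(x, h_prev, Wz, Uz, bz, Wh, Uh, bh):
    """Single-Tunnelled GRU step (a sketch): the reset gate is discarded and
    the candidate uses a sigmoid in place of the standard tanh."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)         # update gate, the single tunnel
    h_tilde = sigmoid(Wh @ x + Uh @ h_prev + bh)   # candidate, no reset gate
    h = (1.0 - z) * h_prev + z * h_tilde           # convex blend of old and new
    return h
```

Since the new state is a convex combination of the previous state and a sigmoid candidate, the hidden activations stay bounded, unlike a tanh candidate whose sign can flip.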
Fine-Grained Expression Manipulation via Structured Latent Space

We propose an end-to-end expression-guided generative adversarial network (EGGAN), which utilizes structured latent codes and continuous expression labels as input to generate images with expected expressions. Specifically, we adopt an adversarial autoencoder to map a source image into a structured latent space. Then, given the source latent code and the target expression label, we employ a conditional GAN to generate a new image with the target expression. Moreover, we introduce a perceptual loss and a multi-scale structural similarity loss to preserve identity and global shape during generation.

ICME 2020 (CCF B, oral) in London, United Kingdom
Facial Action Unit Detection Using Attention and Relation Learning

We propose an end-to-end deep learning based attention and relation learning framework for AU detection with only AU labels, which has not been explored before. In particular, multi-scale features shared by each AU are learned first, and then both channel-wise and spatial attentions are adaptively learned to select and extract AU-related local features. Moreover, pixel-level relations for AUs are further captured to refine the spatial attentions so as to extract more relevant local features. Without changing the network architecture, our framework can be easily extended to AU intensity estimation.

IEEE Transactions on Affective Computing, 2019 (CCF B, SCI Q2)
Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment

We propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned first, and high-level features of face alignment are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively. Finally, the assembled local features are integrated with face alignment features and global features for AU detection.

ECCV 2018 (CCF B, Tsinghua A) in Munich, Germany
Learning a Multi-Center Convolutional Network for Unconstrained Face Alignment
Zhiwen Shao, Hengliang Zhu, Yangyang Hao, Min Wang, Lizhuang Ma

We propose a novel multi-center convolutional neural network for unconstrained face alignment. To utilize structural correlations among different facial landmarks, we determine several clusters based on their spatial positions. We pre-train our network to learn generic feature representations. We then fine-tune the pre-trained model to emphasize locating a certain cluster of landmarks. Fine-tuning contributes to searching for an optimal solution smoothly without deviating excessively from the pre-trained model. We obtain an excellent solution by combining multiple fine-tuned models.

ICME 2017 (CCF B, oral) in Hong Kong
Learning Deep Representation from Coarse to Fine for Face Alignment
Zhiwen Shao, Shouhong Ding, Yiru Zhao, Qinchuan Zhang, Lizhuang Ma

We propose a novel face alignment method that trains a deep convolutional network from coarse to fine. It divides the given landmarks into a principal subset and an elaborate subset. We first assign a large weight to the principal subset so that our network primarily predicts their locations while only slightly taking the elaborate subset into account. Next, the weight of the principal subset is gradually decreased until the two subsets have equivalent weights. This process helps learn a good initial model and search for the optimal model smoothly, avoiding the loss of fairly good intermediate models in subsequent procedures.

ICME 2016 (CCF B) in Seattle, USA
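The coarse-to-fine weighting schedule above can be sketched as a weighted landmark loss whose principal-subset weight decays toward 1 over training. The specific initial weight, linear decay, and loss normalization here are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def weighted_landmark_loss(pred, gt, principal_idx, w_principal):
    """Weighted L2 landmark loss: principal landmarks carry weight
    w_principal, elaborate landmarks carry weight 1 (a sketch)."""
    w = np.ones(len(gt))
    w[principal_idx] = w_principal
    return np.sum(w[:, None] * (pred - gt) ** 2) / np.sum(w)

def principal_weight(epoch, total_epochs, w0=4.0):
    """Coarse-to-fine schedule: start at w0, decay linearly to 1 so the
    two subsets eventually have equivalent weights."""
    return 1.0 + (w0 - 1.0) * max(0.0, 1.0 - epoch / total_epochs)
```

Early in training the principal subset dominates the gradient; by the end both subsets contribute equally, matching the schedule described in the abstract.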
Face Alignment by Deep Convolutional Network with Adaptive Learning Rate
Zhiwen Shao, Shouhong Ding, Hengliang Zhu, Chengjie Wang, Lizhuang Ma

We propose a novel data augmentation strategy, and we design an innovative training algorithm with an adaptive learning rate for two iterative procedures, which helps the network search for an optimal solution. Our convolutional network learns global high-level features and directly predicts the coordinates of facial landmarks.

ICASSP 2016 (CCF B, oral) in Shanghai, China
FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Clouds
Jie Zhou, Xin Tan, Zhiwen Shao*, Lizhuang Ma

We propose a novel framework called FVNet for 3D front-view proposal generation and object detection from point clouds. It consists of two stages: generation of front-view proposals and estimation of 3D bounding box parameters. We first project point clouds onto a cylindrical surface to generate front-view feature maps which retain rich information. We then introduce a proposal generation network to predict 3D region proposals from the generated maps and further extrude objects of interest from the whole point cloud. Finally, we present another network to extract point-wise features from the extruded object points and regress the final 3D bounding box parameters in canonical coordinates.

CISP-BMEI 2019 in Huaqiao, China
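The cylindrical projection step above can be sketched as rasterizing each point's azimuth and elevation into a 2D range map. The fields of view, map resolution, and the choice to store only range (rather than multiple channels) are illustrative assumptions, not FVNet's exact configuration.

```python
import numpy as np

def front_view_map(points, h_fov=(-np.pi/4, np.pi/4),
                   v_fov=(-np.pi/12, np.pi/36), shape=(64, 512)):
    """Project an (N, 3) point cloud onto a cylindrical surface and
    rasterize a front-view range map (a simplified sketch)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arctan2(y, x)                                   # azimuth
    phi = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1, 1))   # elevation
    rows = (v_fov[1] - phi) / (v_fov[1] - v_fov[0]) * (shape[0] - 1)
    cols = (theta - h_fov[0]) / (h_fov[1] - h_fov[0]) * (shape[1] - 1)
    fmap = np.zeros(shape)
    valid = (rows >= 0) & (rows < shape[0]) & (cols >= 0) & (cols < shape[1])
    fmap[rows[valid].astype(int), cols[valid].astype(int)] = r[valid]
    return fmap
```

A 2D detector can then predict front-view proposals on this map, and the pixels inside each proposal index back into the original cloud to extrude the corresponding object points.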