SCIENCES
Image, Field Director – Laure Blanc-Féraud
Information Seeking in Images and Videos, Subject Head – Hichem Sahbi
Face Analysis Under Uncontrolled Conditions
From Face Detection to Expression Recognition
Coordinated by
Romain Belmonte
Benjamin Allaert
First published 2022 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St Georges Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2022
The rights of Romain Belmonte and Benjamin Allaert to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s), contributor(s) or editor(s) and do not necessarily reflect the views of ISTE Group.
Library of Congress Control Number: 2022941463
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78945-111-5
ERC code:
PE6 Computer Science and Informatics
PE6_9 Human computer interaction and interface, visualisation and natural language processing
PE6_10 Web and information systems, database systems, information retrieval and digital libraries, data fusion
PE6_11 Machine learning, statistical data processing and applications using signal processing (e.g. speech, image, video)
Preface
Romain BELMONTE1 and Benjamin ALLAERT2
1University of Lille, France
2IMT Nord Europe, Lille, France
Face analysis is essential for a large number of applications such as human-computer interaction or multimedia (e.g. content indexing and retrieval). Although many approaches have been proposed, performance under uncontrolled conditions is still not satisfactory. The variations that may impact facial appearance (e.g. pose, expression, illumination, occlusion, motion blur) make it a difficult problem to solve.
This book is composed of two parts based on the recent PhD work of Belmonte (P1) and Allaert (P2). The focus is on an updated review of the literature. Some experiments and benchmarks are also included.
The first part (P1) addresses the modeling of temporal information. A benchmark of various architectures is proposed to determine the best design for video-based facial landmark detection. It also helps us to understand the complementarity between spatial and temporal information as well as local and global motion. The conclusion discusses possible research perspectives.
For the second part (P2), research perspectives are likewise discussed in the conclusion.
Although landmark detection is used for expression recognition, the two parts of this book are independent and can be read in either order. Each chapter is also relatively self-contained. To go further, we invite interested readers to explore other facial analysis tasks, to move towards more practical content, and to try implementing the algorithms mentioned in this book and, why not, transposing them into useful applications.
We would like to thank Hichem Sahbi for inviting us to contribute to the SCIENCES project. We would like to thank our PhD supervisor Chaabane Djeraba, who trusted us and always made himself available. We also thank our co-supervisors Ioan Marius Bilasco (P1 and P2), Pierre Tirilly (P1) and Nacim Ihaddadene (P1). We are sincerely grateful for the time they spent with us, their patience and the extensive advice they gave us. We would also like to thank our PhD reviewers (P1: Jean-Luc Dugelay and Hichem Sahbi; P2: Monique Noirhomme-Fraiture and Jenny Benois-Pineau) and examiners (P1: Karine Zeitouni and Nicu Sebe; P2: Moncef Gabbouj), who agreed to be part of this work and greatly contributed to its improvement before its publication as a book. Many thanks to all our colleagues at the University of Lille and ISEN-Lille, who have always been very kind and supportive. Finally, we would like to thank our family and friends for their constant support.
June 2022
PART 1
Facial Landmark Detection
Introduction to Part 1
Romain BELMONTE1, Pierre TIRILLY1, Ioan Marius BILASCO1, Nacim IHADDADENE2 and Chaabane DJERABA1
1University of Lille, France
2Junia ISEN, Lille, France
I1.1. Background
Over recent years, the ubiquity of sensors (e.g. in smartphones, computers and public spaces) has led to an explosion of data, especially visual data. Each day, massive amounts of visual data are produced. Every minute, 500 hours of video and 66,000 images are uploaded to YouTube and Instagram, respectively. Hence, it is critical to develop algorithms capable of understanding visual data, which is the purpose of computer vision. One of the most popular research topics within computer vision is the understanding of human behavior, since visual data represents a considerable source of information about people and their behavior. The face makes it possible to recognize people and to estimate their age, gender and emotional state. Hence, a large amount of research is focused on the automatic analysis of facial images, which aims to extract such information. Applications cover a wide variety of domains, for example, education, transport, entertainment, security, surveillance or medicine, to name just a few. Concrete examples are engagement recognition in e-learning environments to improve teacher-student interaction and stimulate learning (Gupta et al. 2016), driver inattention or drowsiness detection to prevent potential accidents (Reddy et al. 2017), or mental health assessment for objective diagnosis (Zhu et al. 2017). Beyond the analysis of affective states, facial analysis also enables image editing, which can provide entertainment on social networks or improve the customer experience in retail (Natsume et al. 2018).
Facial landmark detection, also known as face alignment, is a fundamental step for many face analysis tasks (e.g. face recognition, facial expression recognition, 3D face reconstruction, eye gaze tracking, face frontalization). As shown in the accompanying figure, this task defines the facial geometry, so that the structure of the face can be identified and its different components characterized. Facial landmark detection is useful for most facial analysis tasks and, more broadly, in human-computer interaction, animation, video compression, image super-resolution, indexing and retrieval, etc.
Identifying the facial structure is trivial for humans. For an algorithm, however, an image is merely an array of pixels. From these pixels, the aim is to build enough abstraction to reach a semantic level at which facial landmarks can be retrieved, and this kind of high-level understanding is difficult for an algorithm to achieve. To bridge this semantic gap, two solutions have been proposed (see the accompanying figure). The first, the traditional approach, uses domain-specific knowledge to transform raw data into features (i.e. feature engineering). These features provide useful input for the prediction task, which is performed by Machine Learning (ML) algorithms. The second, the Deep Learning (DL) approach, does not rely on handcrafted features but lets the algorithm discover the features needed for the prediction task directly from the raw data (i.e. feature learning). DL involves a series of computational layers, which generally extract features ranging from low-level ones such as edges and curves up to more abstract concepts. Such models are called Deep Neural Networks (DNNs). Thanks to the increase in computational power and available data, DNNs have shown impressive capabilities over the past few years. They have completely disrupted the field of computer vision (LeCun et al. 2015). In most tasks, the community has shifted from feature engineering to DNN architecture engineering.
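To make the contrast between feature engineering and feature learning more concrete, the following Python sketch may help. It treats a tiny grayscale image as an array of pixels, applies a handcrafted edge filter to it, and then recovers a comparable filter from data by gradient descent. The toy image, the names `IMAGE`, `correlate_rows` and `learn_kernel`, and the learning rate and number of steps are all assumptions chosen for illustration; they do not come from this book, and a real DNN would stack many such learned filters with nonlinearities rather than fit a single one.

```python
"""Toy contrast between feature engineering and feature learning (illustrative only)."""

# Assumption: a 3x4 grayscale image with a dark-to-bright vertical edge.
IMAGE = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Feature engineering: a human chooses the filter (a finite-difference edge detector).
HANDCRAFTED_KERNEL = [-1.0, 1.0]


def correlate_rows(image, kernel):
    """Slide a 1D kernel along each row (no padding) and return the response map."""
    k = len(kernel)
    return [
        [sum(kernel[j] * row[i + j] for j in range(k)) for i in range(len(row) - k + 1)]
        for row in image
    ]


def learn_kernel(image, target, size=2, lr=0.5, steps=500):
    """Feature learning: fit the kernel weights by gradient descent on the mean squared error."""
    kernel = [0.0] * size
    for _ in range(steps):
        response = correlate_rows(image, kernel)
        grad = [0.0] * size
        count = 0
        for r, row in enumerate(image):
            for i in range(len(row) - size + 1):
                err = response[r][i] - target[r][i]
                for j in range(size):
                    # Derivative of err**2 with respect to kernel[j].
                    grad[j] += 2.0 * err * row[i + j]
                count += 1
        kernel = [w - lr * g / count for w, g in zip(kernel, grad)]
    return kernel


if __name__ == "__main__":
    edge_map = correlate_rows(IMAGE, HANDCRAFTED_KERNEL)
    learned = learn_kernel(IMAGE, edge_map)
    print("handcrafted response:", edge_map)
    print("learned kernel:", [round(w, 3) for w in learned])
```

In this sketch the learned filter is supervised by the handcrafted response only to keep the example small; in practice the supervision would come from the prediction task itself (for instance, annotated landmark positions), which is what allows the features to be discovered rather than designed.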