Deep Learning in Computer Vision
Digital Imaging and Computer Vision Series
Series Editor
Rastislav Lukac
Foveon, Inc./Sigma Corporation San Jose, California, U.S.A.
Dermoscopy Image Analysis
by M. Emre Celebi, Teresa Mendonça, and Jorge S. Marques
Semantic Multimedia Analysis and Processing
by Evaggelos Spyrou, Dimitris Iakovidis, and Phivos Mylonas
Microarray Image and Data Analysis: Theory and Practice
by Luis Rueda
Perceptual Digital Imaging: Methods and Applications
by Rastislav Lukac
Image Restoration: Fundamentals and Advances
by Bahadir Kursat Gunturk and Xin Li
Image Processing and Analysis with Graphs: Theory and Practice
by Olivier Lézoray and Leo Grady
Visual Cryptography and Secret Image Sharing
by Stelvio Cimato and Ching-Nung Yang
Digital Imaging for Cultural Heritage Preservation: Analysis, Restoration, and Reconstruction of Ancient Artworks
by Filippo Stanco, Sebastiano Battiato, and Giovanni Gallo
Computational Photography: Methods and Applications
by Rastislav Lukac
Super-Resolution Imaging
by Peyman Milanfar
Deep Learning in Computer Vision
Principles and Applications
Edited by
Mahmoud Hassaballah and Ali Ismail Awad
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2020 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
International Standard Book Number-13: 978-1-138-54442-0 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Names: Hassaballah, Mahmoud, editor. | Awad, Ali Ismail, editor.
Title: Deep learning in computer vision : principles and applications / edited by M. Hassaballah and Ali Ismail Awad.
Description: First edition. | Boca Raton, FL : CRC Press/Taylor and Francis, 2020. | Series: Digital imaging and computer vision | Includes bibliographical references and index.
Identifiers: LCCN 2019057832 (print) | LCCN 2019057833 (ebook) | ISBN 9781138544420 (hardback ; acid-free paper) | ISBN 9781351003827 (ebook)
Subjects: LCSH: Computer vision. | Machine learning.
Classification: LCC TA1634 .D437 2020 (print) | LCC TA1634 (ebook) | DDC 006.3/7--dc23
LC record available at https://lccn.loc.gov/2019057832
LC ebook record available at https://lccn.loc.gov/2019057833
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Kamel Abdelouahab, Maxime Pelcat, and François Berry
Kaidong Li, Wenchi Ma, Usman Sajid, Yuanwei Wu, and Guanghui Wang
Khan Muhammad, Salman Khan, and Sung Wook Baik
Alaa S. Al-Waisy, Shumoos Al-Fahdawi, and Rami Qahwaji
Amin Ullah, Khan Muhammad, Tanveer Hussain, Miyoung Lee, and Sung Wook Baik
Hazem Rashed, Senthil Yogamani, Ahmad El-Sallab, Mahmoud Hassaballah, and Mohamed ElHelw
Ahmed Nassar and Mohamed ElHelw
Javier Ruiz-del-Solar and Patricio Loncomilla
Mahmoud Khaled Abd-Ellah, Ali Ismail Awad, Ashraf A. M. Khalaf, and Hesham F. A. Hamed
Mohammed A. Al-masni, Mugahed A. Al-antari, and Tae-Seong Kim
Khalid M. Hosny, Mohamed A. Kassem, and Mohamed M. Foaud
Deep learning, while it has multiple definitions in the literature, can be defined as the inference of model parameters for decision making in a process that mimics understanding in the human brain; or, in short, brain-like model identification. Deep learning can be regarded as a way of data inference in machine learning, and the two together are among the main tools of modern artificial intelligence. Novel technologies developed outside traditional academic research have fueled R&D in convolutional neural networks (CNNs); companies like Google, Microsoft, and Facebook ignited the art of data manipulation, and the term deep learning became almost synonymous with decision making.
Various CNN structures have been introduced and invoked in many computer vision-related applications, with the greatest success in face recognition, autonomous driving, and text processing. The reality is that deep learning is an art, not a science. This state of affairs will remain until its developers establish the theory behind its functionality, which would lead to cracking its code and explaining why it works and how it can be structured as a function of the information gained from data. In fact, with deep learning, there is good and bad news. The good news is that industry, not necessarily academia, has adopted it and is pushing its envelope. The bad news is that industry does not share its secrets. Indeed, industries are never interested in procedural and textbook-style descriptions of knowledge.
This book, Deep Learning in Computer Vision: Principles and Applications, as a journey through the progress made in deep learning by academia, confines itself to deep learning for computer vision, a domain that studies sensory information used by computers for decision making, and that has had its impacts and drawbacks for nearly 60 years. Computer vision has been and continues to be a system: sensors, computer, analysis, decision making, and action. This system takes various forms, and the flow of information within its components is not necessarily in tandem. The linkages between computer vision and machine learning, and between it and artificial intelligence, are very fuzzy, as is the linkage between computer vision and deep learning. Computer vision has moved forward, showing amazing progress in its short history. During the sixties and seventies, computer vision dealt mainly with capturing and interpreting optical data. In the eighties and nineties, geometric computer vision added science (geometry plus algorithms) to computer vision. During the first decade of the new millennium, modern computing contributed to the evolution of object modeling using multimodality and multiple imaging. By the end of that decade, a lot of data became available, and so the term deep learning crept into computer vision, as it did into machine learning, artificial intelligence, and other domains.