Machine Learning
A Journey to Deep Learning
with Exercises and Answers
Andreas Wichert
Luis Sa-Couto
Instituto Superior Técnico - Universidade de Lisboa, Portugal
& INESC-ID, Portugal
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
MACHINE LEARNING A JOURNEY TO DEEP LEARNING
with Exercises and Answers
Copyright © 2021 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 978-981-123-405-7 (hardcover)
ISBN 978-981-123-406-4 (ebook for institutions)
ISBN 978-981-123-407-1 (ebook for individuals)
For any available supplementary material, please visit
https://www.worldscientific.com/worldscibooks/10.1142/12201#t=suppl
Printed in Singapore
Andreas:
To the memory of my father Andrzej Wichert
Luís:
In loving memory of Titi
Preface
Deep learning has achieved tremendous results, and it is now common to identify artificial intelligence with deep learning rather than with symbol-manipulating systems. This results from the paradox of artificial intelligence, a discipline whose principal purpose is its own definition, since the terms "intelligence" and "intelligent human behavior" are not well defined or understood.
This book tells a story that leads from the perceptron to deep learning, illustrated with concrete examples. It discusses some core ideas for the development and implementation of machine learning from three different perspectives: the statistical perspective, the artificial neural network perspective and the deep learning methodology. The book provides a solid foundation in machine learning and should prepare the reader to apply and understand machine learning algorithms as well as to invent new machine learning methods.
The notes on which the book is based evolved in the Machine Learning course taught in the years 2018–2021 at the Department of Computer Science and Engineering, Instituto Superior Técnico, University of Lisbon. Our research benefited from discussions with Ana Paiva, Manuel Lopes, Eugénio Ribeiro, João Rico, Rui Henriques, Claudia Antunes, Diogo Ferreira, Mikolas Janota and Luisa Coheur.
Most of the practical exercises were developed by Luís.
We would like to thank Senior Editor Steven Patt at World Scientific for his support.
Finally, we would like to thank our families; without their encouragement the book would never have been finished.
Andreas Wichert and Luís Sá-Couto
Chapter 1
Introduction
1.1 What is Machine Learning
It is difficult to define learning in general. There are some parallels between human learning and machine learning. During learning, humans attempt to gain knowledge in order to adjust their behavioral tendencies through experience.
Many machine learning techniques are derived from the efforts of psychologists and biologists to make sense of human learning through computational models [Anderson (1995)]. In this book, we cover statistical machine learning, such as linear regression, clustering, kernel machines and artificial neural networks. We will not cover symbolical machine learning, which was popular between 1970 and 1990. Symbolical machine learning includes inductive learning, knowledge learning and analogical learning [Winston (1992)].
To understand the difference between the two approaches, we provide an example of symbolical machine learning in the next section, followed by examples of statistical machine learning. The approaches differ mainly in how the information is represented: either by symbols or by vectors.
1.1.1 Symbolical Learning
Symbols are constructs of the human mind that simplify the process of problem solving. Symbols are used to denote or refer to something other than themselves, namely other things in the world (according to the pioneering work of Tarski [Tarski (1956)]). They are defined by their occurrence in a structure and by a formal language, which manipulates these structures [Simon (1991); Newell (1990)]. In this context, it is not possible to measure a meaningful similarity between symbols, only between the real-world objects that they represent.
In symbolic concept acquisition, the system learns a symbolic representation by analyzing positive and negative examples of a concept. For example, the ARCH program learns concepts from examples represented by symbols in a structural domain of the block-world [Winston (1992)]. A scene is described by three blocks. A symbolical learning procedure could use background knowledge to produce a unified graph representation of the concept.
Fig. 1.1 (a) Arch with a brick on top and (b) arch with a pyramid on top.
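As a rough illustration of the idea, and not a reproduction of the ARCH program itself, the two scenes of Fig. 1.1 can be written as sets of symbolic relations and merged with the help of background knowledge. The part names, the relation names, the `IS_A` table and the `generalize` helper are all assumptions made for this sketch.

```python
# Hypothetical symbolic descriptions of the two arches in Fig. 1.1.
# Each scene is a set of (subject, relation, object) triples.
arch_with_brick = {
    ("left", "is-a", "brick"),
    ("right", "is-a", "brick"),
    ("top", "is-a", "brick"),
    ("left", "supports", "top"),
    ("right", "supports", "top"),
}
arch_with_pyramid = {
    ("left", "is-a", "brick"),
    ("right", "is-a", "brick"),
    ("top", "is-a", "pyramid"),
    ("left", "supports", "top"),
    ("right", "supports", "top"),
}

# Assumed background knowledge: both shapes are kinds of block.
IS_A = {"brick": "block", "pyramid": "block"}


def generalize(scene_a, scene_b):
    """Merge two positive examples into one description.

    Triples shared by both scenes are kept. Where the scenes disagree only
    on the class of a part, the class is replaced by the common parent
    from IS_A, if one exists.
    """
    concept = scene_a & scene_b
    for subj, rel, obj in scene_a - scene_b:
        for subj_b, rel_b, obj_b in scene_b - scene_a:
            same_slot = (subj, rel) == (subj_b, rel_b)
            parent = IS_A.get(obj)
            if same_slot and parent is not None and parent == IS_A.get(obj_b):
                concept.add((subj, rel, parent))
    return concept


arch_concept = generalize(arch_with_brick, arch_with_pyramid)
```

In this sketch the symbols are only ever compared for equality or looked up in the background table; no graded similarity between "brick" and "pyramid" is computed, which reflects the remark above that similarity between symbols is not directly measurable.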
1.1.2 Statistical Machine Learning
Another approach is to represent the objects directly. One way to do this is to look to biology. In this approach, we represent a pattern in a way that mirrors how the biological sense organs describe the world. Since sense organs perceive the world through receptors, we can create a vector where each dimension corresponds to a certain value in a receptor [Wichert (2009)].
Besides biology, we can justify the use of vectors with the idea of features. Let us imagine that we want to describe two species of fish, the sea bass and the salmon, using their features [Duda et al. (2000)]. Each fish can be represented by a vector where each dimension corresponds to a feature and stores its value or presence.
Representing objects in this way, one can measure the dissimilarity between two objects by measuring the distance between the two D-dimensional vectors that represent them. Concretely, for two vectors $\mathbf{x}$ and $\mathbf{y}$, one can measure this distance through the Euclidean distance function

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2}.$$
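As a minimal sketch, the Euclidean distance between two D-dimensional vectors can be computed as follows. The function name `euclidean_distance` and the example vectors are assumptions chosen for illustration; the values do not come from the text.

```python
import math


def euclidean_distance(x, y):
    """Return the Euclidean distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Made-up feature vectors (values are illustrative only).
fish_a = [3.0, 4.0]
fish_b = [0.0, 0.0]
distance = euclidean_distance(fish_a, fish_b)  # 5.0
```

A small distance means the two objects are similar with respect to the chosen features, and a distance of zero means their feature vectors are identical.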
The process of choosing the correct features to represent an object is called feature extraction. In our example, only two features are chosen, width and lightness. This allows us to plot each fish as a point in a two-dimensional coordinate system. A sample of fish can then be plotted in feature space, where each salmon is marked with a dot and each sea bass with a cross.
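The feature extraction step can be sketched as a function that maps a raw description of a fish to the two chosen features. The `Fish` record, its `length` field and all numerical values below are invented for this illustration and carry no empirical meaning.

```python
from dataclasses import dataclass


@dataclass
class Fish:
    """Hypothetical raw measurements of one fish."""
    species: str
    length: float
    width: float
    lightness: float


def extract_features(fish):
    """Keep only the two chosen features: (width, lightness)."""
    return (fish.width, fish.lightness)


# Invented sample; the numbers carry no empirical meaning.
sample = [
    Fish("salmon", 60.0, 10.0, 2.5),
    Fish("salmon", 55.0, 11.0, 3.0),
    Fish("sea bass", 70.0, 15.0, 6.0),
    Fish("sea bass", 65.0, 14.0, 5.5),
]

# Group the 2-D points by species, as one would before plotting them
# with a different marker per class.
points = {}
for fish in sample:
    points.setdefault(fish.species, []).append(extract_features(fish))
```

Each entry of `points` is a list of two-dimensional coordinates, so every fish becomes one point in the feature space spanned by width and lightness.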