Introduction to Modeling Cognitive Processes
Tom Verguts
The MIT Press
Cambridge, Massachusetts
London, England
© 2022 The Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.
Library of Congress Cataloging-in-Publication Data
Names: Verguts, Tom, author.
Title: Introduction to modeling cognitive processes / Tom Verguts.
Description: Cambridge, Massachusetts : The MIT Press, 2022. | Includes bibliographical references and index.
Identifiers: LCCN 2021016752 | ISBN 9780262045360 (hardcover)
Subjects: LCSH: Cognition--Data processing. | Cognition--Computer simulation. | Cognitive learning. | Cognitive neuroscience. | Human information processing.
Classification: LCC BF311 .V4467 2022 | DDC 153.9/3--dc23
LC record available at https://lccn.loc.gov/2021016752
Contents
List of Figures
Levels (or scales) of modeling. The spatiotemporal scales of several real-world phenomena that can be modeled are indicated on the graph.
Theodosius Dobzhansky (1900-1975), one of the founders of the modern synthesis theory of biology (and inventor of great quotes). Reprinted with permission from the American Philosophical Society.
The steepest descent on the function f(x) = (x - 1)^2.
(a) Pet detector. To avoid clutter, only two of the weights are labeled. (b) Two trajectories of the cat and dog units for a particular (doglike) input activation pattern. (c) RT histogram.
The linearity principle in a synaptic junction.
Part of the IAM of letter perception. This figure was reproduced with permission from McClelland and Rumelhart (1981).
The Hopfield model; active units are indicated by gray shading.
Energy function with two local minima.
The diffusion model. This figure was reproduced and adapted with permission from White et al. (2011).
Fast time scale of activation (bottom) and slower time scale of learning (top).
Two orthonormal (orthogonal and length 1) input vectors.
Any pattern is attracted toward x_1. This figure was reproduced and adapted with permission from www.codeproject.com.
Small perturbations of x_i are attracted toward x_i. This figure was reproduced and adapted with permission from www.codeproject.com.
Row 1: After training on 2 items, starting in a random activation pattern leads to one of the stored digits (7). Row 2: After training on 5 items, starting in a random activation pattern leads to an unreadable digit.
Illustration of the context model of memory (only a few items are labeled). The activation at the context layer is the pointer x_3; pasta is the third item on the list given in this example. Unit shading indicates activation (darker means more active). The arrow shows the weight that will be most updated after this pair of patterns is presented.
A generic two-layer model.
Plots of activation functions: (a) linear; (b) hard threshold; (c) soft threshold (or sigmoid); (d) Gaussian.
The geometry of the delta rule: (a) threshold activation function in 2D; (b) logistic activation function in 2D.
(a) Threshold illustration in a 1D function; (b) threshold illustration in a 2D function.
Word naming model. Semantic features map to lexical units, and the connections are trained with the delta rule. This figure was reproduced with permission from Oppenheim et al. (2010).
Geometric representation of three logical problems: (a) the linearly separable OR problem; (b) the linearly separable AND problem; (c) the linearly non-separable XOR problem.
Geometric intuitions for multilayer models: cats and dogs in feature space.
A generic three-layer network.
(a) An AND of linear functions (dots) is a convex set. Therefore, separating a convex set from its complement requires just three layers. (b) An OR of convex sets (dots). This construction allows for fitting very complex input-output mappings.
Global and local minima: (a) function with one (global) minimum; (b) function with several local minima.
Backpropagating errors in a four-layer network model. External feedback is injected at the final layer z, from which the prediction error δz can be computed. Based on δz, the prediction error δy in layer y can be computed, and so on. Each transformation is roughly a multiplicative function, leading to either a vanishing gradient (if the multiplication constant is smaller than 1) or an exploding gradient (if the multiplication constant is larger than 1).
Convolutional neural networks: (a) model architecture. The leftmost layer provides the pixel input, the next layer contains feature maps, and the next layer implements subsampling. This two-step process (convolution-subsampling) is repeated, followed by all-to-all connectivity. (b) Classification accuracy on a picture by a convolutional network. Figure 5.6a was reproduced with permission from LeCun et al. (1998). Figure 5.6b was reproduced with permission from Ren et al. (2017).
An RBF responds to just a limited part of the input space [here, centered on point (1, 0)].
Internal (hidden unit) representations after training the semantic cognition model. This figure is reproduced with permission from McClelland and Rogers (2003).
Exploring the parameter space: (a) the neural network converges for virtually every network size in the tested range; (b) different parts of the parameter space yield qualitatively different properties. Figure 6.1a was reproduced with permission from Rombouts et al. (2015). Figure 6.1b was reproduced with permission from Steegen et al. (2017).
(a) Taking the logarithm of a function does not change the optimal (here, the minimum) point of that function. (b) Log-likelihood functions for the coin-tossing example.
The four-armed bandit, here depicted as four one-armed bandits, each with its own payoff probability p_i.
Contour plot for the estimation of learning rate and temperature in the four-armed bandit case. (a) 100 trials; (b) 1,000 trials; (c) 3,000 trials.
Empirical RT histograms for two subjects (NH and JF), in two conditions, for two separate responses (probability of giving each response is shown next to each distribution). Reproduced with permission from Ratcliff and Rouder (1998).
Model residuals (the mean per plot is zero by definition). (a) No dependency: each data point is sampled from an independent normal distribution. (b) Dependency: each data point is 0.8 times the previous one plus random noise. (c) Strong dependency: each data point is the previous one plus random noise. Note that the y-axis scale is different for the three panels.
Two possible RL architectures: (a) two-layer model; (b) three-layer model.
The effect of gamma on performance (average reward reaped) in the four-armed bandit problem. Gamma = 0.01, 0.8, 5, and 10 in panels (a)-(d), respectively.