Doing Bayesian Data Analysis
A Tutorial with R, JAGS, and Stan
Second Edition
John K. Kruschke
Dept. of Psychological and Brain Sciences, Indiana University, Bloomington
Copyright
Academic Press is an imprint of Elsevier
32 Jamestown Road, London NW1 7BY, UK
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
Copyright 2015, 2011 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangement with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-405888-0
Library of Congress Cataloging-in-Publication Data
Kruschke, John K.
Doing Bayesian data analysis : a tutorial with R, JAGS, and Stan / John K. Kruschke. 2E [edition].
pages cm
Includes bibliographical references.
ISBN 978-0-12-405888-0
1. Bayesian statistical decision theory. 2. R (Computer program language) I. Title.
QA279.5.K79 2014
519.5'42--dc23
2014011293
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
For information on all Academic Press publications visit our website at store.elsevier.com
Dedication
Dedicated to my mother, Marilyn A. Kruschke, and to the memory of my father, Earl R. Kruschke, both of whom brilliantly exemplified and taught sound reasoning.
And, in honor of my father, who dedicated his first book to his children,
I also dedicate this book to mine:
Claire A. Kruschke and Loren D. Kruschke.
Chapter 1
What's in This Book (Read This First!)
Oh honey I'm searching for love that is true,
But driving through fog is so dang hard to do.
Please paint me a line on the road to your heart,
I'll rev up my pick up and get a clean start.
1.1 Real people can read this book
This book explains how to actually do Bayesian data analysis, by real people (like you), for realistic data (like yours). The book starts at the basics, with elementary notions of probability and programming. You do not need to already know statistics and programming. The book progresses to advanced hierarchical models that are used in realistic data analysis. This book is speaking to a person such as a first-year graduate student or advanced undergraduate in the social or biological sciences: Someone who grew up in Lake Wobegon, but who is not the mythical being that has the previous training of a nuclear physicist and then decided to learn about Bayesian statistics. (After the publication of the first edition, one of those mythical beings contacted me about the book! So, even if you do have the previous training of a nuclear physicist, I hope the book speaks to you too.)
Details of prerequisites and the contents of the book are presented below. But first things first: As you may have noticed from the beginning of this chapter, the chapters commence with a stanza of elegant and insightful verse composed by a famous poet. The quatrains
If you do not find them to be all that funny,
If they leave you wanting back all of your money,
Well, honey, some waltzing's a small price to pay, for
All the good learning you'll get if you stay.
1.1.1 Prerequisites
There is no avoiding mathematics when doing data analysis. On the other hand, this book is definitely not a mathematical statistics textbook, in that it does not emphasize theorem proving or formal analyses. But I do expect that you are coming to this book with a dim knowledge of basic calculus. For example, if you understand expressions like $\int x\,dx = \frac{1}{2}x^2$, you're probably good to go. Notice the previous sentence said "understand" the statement of the integral, not "generate" the statement on your own. When mathematical derivations are helpful for understanding, they will usually be presented with a thorough succession of intermediate steps, so you can actually come away feeling secure and familiar with the trip and destination, rather than just feeling car sick after being thrown blindfolded into the back seat and driven around curves at high speed.
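As a quick self-check of the level of calculus assumed, here is a brief worked illustration of that integral. It is offered only as a sketch; the constant of integration $C$ and the limits 0 and 2 are chosen here for illustration and do not come from the text.

$$
\int x \, dx = \frac{1}{2}x^2 + C,
\qquad \text{because} \qquad
\frac{d}{dx}\left(\frac{1}{2}x^2 + C\right) = x .
$$

$$
\int_0^2 x \, dx = \frac{1}{2}(2)^2 - \frac{1}{2}(0)^2 = 2 .
$$

If you can follow why each equality holds, even without being able to produce it yourself, that is the kind of understanding meant here.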
The beginnings of your journey will go more smoothly if you have had some basic experience programming a computer, but previous programming experience is not crucial. A computer program is just a list of commands that the computer can execute. For example, if you've ever typed an equal sign in an Excel spreadsheet cell, you've written a programming command. If you've ever written a list of commands in Java, C, Python, Basic or any other computer programming language, then you're set. We will be using programming languages called R, JAGS, and Stan, which are free and thoroughly explained in this book.
1.2 What's in this book
This book has three major parts. The first part covers foundations: The basic ideas of Bayesian reasoning, models, probabilities, and programming in R.
The second part covers all the crucial ideas of modern Bayesian data analysis while using the simplest possible type of data, namely dichotomous data such as agree/disagree, remember/forget, male/female, etc. Because the data are so simplistic, the focus can be on Bayesian techniques. In particular, the modern techniques of Markov chain Monte Carlo (MCMC) are explained thoroughly and intuitively. Because the data are kept simple in this part of the book, intuitions about the meaning of hierarchical models can be developed in glorious graphic detail. This second part of the book also explores methods for planning how much data will be needed to achieve a desired degree of precision in the conclusions, broadly known as power analysis.