Predictive Analytics
Microsoft Excel
Conrad Carlberg
800 East 96th Street,
Indianapolis, Indiana 46240
USA
Predictive Analytics: Microsoft Excel
Copyright 2013 by Pearson Education, Inc.
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained herein.
ISBN-13: 978-0-7897-4941-3
ISBN-10: 0-7897-4941-6
Library of Congress Cataloging-in-Publication data is on file.
Printed in the United States of America
First Printing: July 2012
Editor-in-Chief
Greg Wiegand
Acquisitions Editor
Loretta Yates
Development Editor
Charlotte Kughen
Managing Editor
Sandra Schroeder
Senior Project Editor
Tonya Simpson
Copy Editor
Water Crest Publishing
Indexer
Tim Wright
Proofreader
Debbie Williams
Technical Editor
Bob Umlas
Publishing Coordinator
Cindy Teeters
Book Designer
Anne Jones
Compositor
Nonie Ratcliff
Trademarks
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Que Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.
Microsoft is a registered trademark of Microsoft Corporation.
Warning and Disclaimer
Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information provided is on an as is basis. The author and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.
Bulk Sales
Que Publishing offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales. For more information, please contact
U.S. Corporate and Government Sales
1-800-382-3419
For sales outside the United States, please contact
International Sales
About the Author
Counting conservatively, this is Conrad Carlbergs eleventh book about quantitative analysis using Microsoft Excel, which he still regards with a mix of awe and exasperation. A look back at the About the Author paragraph in Carlbergs first book, published in 1995, shows that the only word that remains accurate is He. Scary.
Dedication
For Sweet Sammy and Crazy Eddie. Welcome to the club, guys.
Acknowledgments
Once again I thank Loretta Yates of Que for backing her judgment. Charlotte Kughen for her work on guiding this book through development, and Sarah Kearns for her skillful copy edit. Bob Umlas, of course, a.k.a. The Excel Trickster, for his technical edit, which kept me from veering too far off course. And Que in general, for not being Wiley.
We Want to Hear from You!
As the reader of this book, you are our most important critic and commentator. We value your opinion and want to know what were doing right, what we could do better, what areas youd like to see us publish in, and any other words of wisdom youre willing to pass our way.
As an editor-in-chief for Que Publishing, I welcome your comments. You can email or write me directly to let me know what you did or didnt like about this bookas well as what we can do to make our books better.
Please note that I cannot help you with technical problems related to the topic of this book. We do have a User Services group, however, where I will forward specific technical questions related to the book.
When you write, please be sure to include this books title and author as well as your name, email address, and phone number. I will carefully review your comments and share them with the author and editors who worked on the book.
Email:
Mail: Greg Wiegand
Editor-in-Chief
Que Publishing
800 East 96th Street
Indianapolis, IN 46240 USA
Reader Services
Visit our website and register this book at quepublishing.com/register for convenient access to any updates, downloads, or errata that might be available for this book.
Introduction
A few years ago, a new word started to show up on my personal reading lists: analytics. It threw me for a while because I couldnt quite figure out what it really meant.
In some contexts, it seemed to mean the sort of numeric analysis that for years my compatriots and I had referred to as stats or quants. Ours is a living language and neologisms are often welcome. McJob. Tebowing. Yadda yadda yadda.
Welcome or not, analytics has elbowed its way into our jargon. It does seem to connote quantitative analysis, including both descriptive and inferential statistics, with the implication that what is being analyzed is likely to be web traffic: hits, conversions, bounce rates, click paths, and so on. (That implication seems due to Googles Analytics software, which collects statistics on website traffic.)
Furthermore, there are at least two broad, identifiable branches to analytics: decision and predictive:
Decision analytics has to do with classifying (mainly) people into segments of interest to the analyst. This branch of analytics depends heavily on multivariate statistical analyses, such as cluster analysis and multidimensional scaling. Decision analytics also uses a method called logistic regression to deal with the special problems created by dependent variables that are binary or nominal, such as buys versus doesnt buy and survives versus doesnt survive.
Predictive analytics deals with forecasting, and often employs techniques that have been used for decades. Exponential smoothing (also termed exponentially weighted moving averages or EMWA) is one such technique, as is autoregression. Box-Jenkins analysis dates to the middle of the twentieth century and comprises the moving average and regression approaches to forecasting.
Of course, these two broad branches arent mutually exclusive. Theres not a clear dividing line between situations in which you would use one and not the other, although thats often the case. But you can certainly find yourself asking questions such as these:
Ive classified my current database of prospects into likely buyers and likely non-buyers, according to demographics such as age, income, ZIP Code, and education level. Can I create a credible quarterly forecast of purchase volume if I apply the same classification criteria to a data set consisting of past prospects?
Ive extracted two principal components from a set of variables that measure the weekly performance of several product lines over the past two years. How do I forecast the performance of the products for the next quarter using the principal components as the outcome measures?