Statistics for Data Science
Leverage the power of statistics for Data Analysis, Classification, Regression, Machine Learning, and Neural Networks
James D. Miller
BIRMINGHAM - MUMBAI
Copyright 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2017
Production reference: 1151117
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78829-067-8
www.packtpub.com
Credits
Author: James D. Miller
Copy Editor: Tasneem Fatehi
Reviewer: James C. Mott
Project Coordinator: Manthan Patel
Commissioning Editor: Veena Pagare
Proofreader: Safis Editing
Acquisition Editor: Tushar Gupta
Indexer: Aishwarya Gangawane
Content Development Editor: Snehal Kolte
Graphics: Tania Dutta
Technical Editor: Sayli Nikalje
Production Coordinator: Deepika Naik
About the Author
James D. Miller is an IBM-certified expert, creative innovator, and accomplished director, senior project leader, and application/system architect with more than 35 years of application and system design and development experience across multiple platforms and technologies. His experience includes introducing customers to new and sometimes disruptive technologies and platforms; integrating with IBM Watson Analytics, Cognos BI, and TM1; web architecture design; systems analysis; GUI design and testing; database modelling; and the design and development of OLAP, client/server, web, and mainframe applications and systems using IBM Watson Analytics, IBM Cognos BI and TM1 (TM1 rules, TI, TM1Web, and Planning Manager), Cognos Framework Manager, dynaSight-ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, PERL, SPLUNK, WebSuite, MS SQL Server, ORACLE, SYBASE Server, and so on.
Responsibilities have also included all aspects of Windows and SQL solution development and design, including analysis; GUI (and website) design; data modelling; table, screen/form, and script development; SQL (and remote stored procedure and trigger) development and testing; test preparation and management; and training of programming staff. Other experience includes the development of Extract, Transform, and Load (ETL) infrastructure, such as data transfer automation between mainframe systems (DB2, Lawson, Great Plains, and so on) and client/server SQL Server and web-based applications, and the integration of enterprise applications and data sources.
Mr. Miller has acted as Internet Applications Development Manager, responsible for the design, development, QA, and delivery of multiple websites, including online trading applications, warehouse process control and scheduling systems, and administrative and control applications. He was also responsible for the design, development, and administration of a web-based financial reporting system for a 450-million-dollar organization, reporting directly to the CFO and his executive team.
He has also been responsible for managing and directing multiple resources in various management roles including project and team leader, lead developer and applications development director.
He has authored the following books published by Packt:
- Mastering Predictive Analytics with R Second Edition
- Big Data Visualization
- Learning IBM Watson Analytics
- Implementing Splunk Second Edition
- Mastering Splunk
- IBM Cognos TM1 Developer's Certification Guide
He has also authored a number of white papers on best practices, such as Establishing a Center of Excellence, and continues to blog on a number of relevant topics based on personal experience and industry best practices.
He is a perpetual learner who continues to pursue experiences and certifications, and currently holds the following technical certifications:
- IBM Certified Developer Cognos TM1
- IBM Certified Analyst Cognos TM1
- IBM Certified Administrator Cognos TM1
- IBM Cognos TM1 Master 385 Certification
- IBM Certified Advanced Solution Expert Cognos TM1
- IBM OpenPages Developer Fundamentals C2020-001-ENU
- IBM Cognos 10 BI Administrator C2020-622
- IBM Cognos 10 BI Author C2090-620-ENU
- IBM Cognos BI Professional C2090-180-ENU
- IBM Cognos 10 BI Metadata Model Developer C2090-632
- IBM Certified Solution Expert - Cognos BI
Specialties: The evaluation and introduction of innovative and disruptive technologies, cloud migration, IBM Watson Analytics, big data, data visualizations, Cognos BI and TM1 application design and development, OLAP, Visual Basic, SQL Server, forecasting and planning, international application development, business intelligence, project development and delivery, and process improvement.
To Nanette L. Miller:
"Like a river flows surely to the sea, darling so it goes, some things are meant to be."
About the Reviewer
James Mott, Ph.D., is a senior education consultant with extensive experience in teaching statistical analysis, modeling, data mining, and predictive analytics. He has over 30 years of experience using SPSS products in his own research, including IBM SPSS Statistics, IBM SPSS Modeler, and IBM SPSS Amos. He has also been actively teaching these products to IBM/SPSS customers for over 30 years. In addition, he is an experienced historian with expertise in the research and teaching of 20th-century United States political history and quantitative methods. His specialties are data mining, quantitative methods, statistical analysis, teaching, and consulting.