R Data Mining
Implement data mining techniques through practical use cases and real-world datasets
Andrea Cirillo
BIRMINGHAM - MUMBAI
R Data Mining
Copyright 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2017
Production reference: 1271117
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78712-446-2
www.packtpub.com
Credits
Author Andrea Cirillo | Copy Editors Safis Editing, Vikrant Phadkay |
Reviewers Enrico Pegoraro, Doug Ortiz, Radovan Kavicky, Oleg Okun | Project Coordinator Nidhi Joshi |
Commissioning Editor Amey Varangaonkar | Proofreader Safis Editing |
Acquisition Editor Varsha Shetty | Indexer Tejal Daruwale Soni |
Content Development Editor Mayur Pawanikar | Graphics Tania Dutta |
Technical Editor Karan Thakkar | Production Coordinator Aparna Bhagat |
About the Author
Andrea Cirillo is currently working as an audit quantitative analyst at Intesa Sanpaolo Banking Group. He gained financial and external audit experience at Deloitte Touche Tohmatsu and internal audit experience at FNM, a listed Italian company. His main responsibilities involve the evaluation of credit risk management models and their enhancement, mainly within the field of the Basel III capital agreement. He is married to Francesca and is the father of Tommaso, Gianna, Zaccaria, and Filippo. Andrea has written and contributed to a few useful R packages such as updateR, ramazon, and paletteR, and regularly shares insightful advice and tutorials on R programming. His research and work mainly focus on the use of R in the fields of risk management and fraud detection, largely by modeling custom algorithms and developing interactive applications.
Andrea has previously authored RStudio for R Statistical Computing Cookbook for Packt Publishing.
To Cesca, Tommaso, Gianna, Zaccaria and Filippo.
About the Reviewers
Enrico Pegoraro graduated in statistics from the University of Padua, Italy, more than 20 years ago. He says that he has experienced firsthand the fast-growing worlds of computer science and statistics. He has worked on projects involving databases, software development, programming languages, data integration, Linux, Windows, and cloud computing. He is currently working as a freelance statistician and data scientist.
Enrico has more than 10 years of experience in training and consulting on R and other statistical software, with a special focus on Six Sigma, industrial statistical analysis, and corporate training courses. He is also a partner of the main company supporting the MilanoR Italian community. In this company, he works as a freelance principal data scientist, as well as a teacher of training courses on statistical models and data mining with R.
In his first job, Enrico collaborated with Italian medical institutions, contributing to some regional projects and publications on nosocomial infections. His main expertise is in consulting and teaching statistical modeling, data mining, data science, medical statistics, predictive models, SPC, and industrial statistics. Enrico is planning to develop an Italian-language website dedicated to R (www.r-project.it).
Enrico can be contacted at pego.enrico@tiscalil.it.
I would like to thank all the people who support me and my activities, particularly my partner, Sonja, and her son, Gianluca.
Doug Ortiz is an enterprise cloud, big data, data analytics, and solutions architect who has been architecting, designing, developing, and integrating enterprise solutions throughout his career. Organizations that leverage his skillset have been able to rediscover and reuse their underutilized data via existing and emerging technologies such as Amazon Web Services, Microsoft Azure, Google Cloud, Microsoft BI Stack, Hadoop, Spark, NoSQL databases, and SharePoint along with related toolsets and technologies.
He is also the founder of Illustris, LLC and can be reached at .
Some interesting aspects of his profession are:
- Experience in integrating multiple platforms and products
- Big data, data science, R, and Python certifications
- He helps organizations gain a deeper understanding of the value of their current investments in data and existing resources, turning them into useful sources of information
- He has improved, salvaged, and architected projects by utilizing unique and innovative techniques
- He regularly reviews books on Amazon Web Services, data science, machine learning, R, and cloud technologies
His hobbies are yoga and scuba diving.
I would like to thank my wonderful wife, Mila, for all her help and support, as well as Maria, Nikolay, and our wonderful children.
Radovan Kavicky is the principal data scientist and president at GapData Institute, based in Bratislava, Slovakia, where he harnesses the power of data and the wisdom of economics for the public good. He is a macroeconomist by education and a consultant and analyst by profession (8+ years of experience in consulting for clients from the public and private sectors), with strong mathematical and analytical skills. He is able to deliver top-level research and analytical work. From MATLAB, SAS, and Stata, he switched to Python, R, and Tableau.
Radovan is an evangelist of open data and a member of the Slovak Economic Association (SEA), Open Budget Initiative, Open Government Partnership, and the global Tableau #DataLeader network (2017). He is the founder of PyData Bratislava, R <- Slovakia, and the SK/CZ Tableau User Group (skczTUG). He has been a speaker at @TechSummit (Bratislava, 2017) and @PyData (Berlin, 2017).
You can follow him on Twitter at @radovankavicky, @GapDataInst or @PyDataBA. His full profile and experience are available at https://www.linkedin.com/in/radovankavicky/ and https://github.com/radovankavicky.