R Data Analysis Cookbook
Second Edition
A journey from data computation to data-driven insights
Kuntal Ganguly
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: May 2015
Second Edition: September 2017
Production reference: 1150917
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78712-447-9
www.packtpub.com
Credits
Author: Kuntal Ganguly | Copy Editor: Manisha Sinha
Reviewers: Davor Lozić, Daniel Alvarez Rojas | Project Coordinator: Manthan Patel
Commissioning Editor: Amey Varangaonkar | Proofreader: Safis Editing
Acquisition Editor: Tushar Gupta | Indexer: Tejal Daruwale Soni
Content Development Editor: Tejas Limkar | Graphics: Tania Dutta
Technical Editor: Sagar Sawant | Production Coordinator: Deepika Naik
About the Author
Kuntal Ganguly is a big data analytics engineer focused on building large-scale data-driven systems using big data frameworks and machine learning. He has around 7 years of experience building big data and machine learning applications.
Kuntal provides solutions to AWS customers in building real-time analytics systems using AWS services and open source Hadoop ecosystem technologies such as Spark, Kafka, Storm, and Flink, along with machine learning and deep learning frameworks.
Kuntal enjoys hands-on software development, and has single-handedly conceived, architected, developed, and deployed several large-scale distributed applications. Besides being an open source contributor, he is a machine learning and deep learning practitioner and is very passionate about building intelligent applications.
I am grateful to my mother, Chitra Ganguly, and father, Gopal Ganguly, for their love and support and for teaching me much about hard work; even the little I have absorbed has helped me immensely throughout my life. I would also like to thank all the friends, colleagues, and mentors I've had over the years.
You can reach Kuntal on LinkedIn at https://in.linkedin.com/in/kuntal-ganguly-59564088
I believe that data science and artificial intelligence will give us superpowers.
About the Reviewers
Davor Lozić is a senior software engineer interested in various subjects, especially computer security, algorithms, and data structures. He manages teams of 15+ engineers and is a part-time assistant professor who lectures about database systems, Java, and interoperability. You can visit his website at http://warriorkitty.com and contact him there. He likes cats! If you want to talk about any aspect of technology or if you have funny pictures of cats, feel free to contact him.
Daniel Alvarez Rojas is currently a data scientist at Hova Health, an IT/consulting company in the health sector. With experience in statistics, marketing, and BI, Daniel holds a BA in Business and Marketing and works in government consulting, helping health managers and directors make data-driven decisions to solve industry challenges. He has spent years as an analyst in logistics companies, working on optimization and predictive models.
I extend my deepest gratitude to my family: my parents, Daniel and Sara, for always supporting me and my brothers; and Abdiel and Amy, for urging me to be the best example that I can be for them. To Alessandra, for all the love and wisdom. To Hector, for being more than a friend, a mentor. To Adrian, for opening the doors to a new stage.
Preface
Data analytics with R has emerged as a very important topic for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, without a deep mathematical background, to unleash powerful and detailed examinations of their data. This book empowers you by showing you ways to use R to generate professional analysis reports. The book also teaches you how to quickly adapt the example code for your own needs and save yourself the time needed to construct code from scratch.
What this book covers
Acquire and Prepare the Ingredients - Reading Your Data provides recipes to acquire, format, and cleanse data from multiple formats. It also covers handling missing values, standardizing datasets, and transforming between numerical and categorical data.
What's in There? Exploratory Data Analysis