Use R!
Series Editors
Robert Gentleman
23andMe Inc., South San Francisco, USA
Kurt Hornik
Department of Finance, Accounting and Statistics, WU Wirtschaftsuniversität Wien, Vienna, Austria
Giovanni Parmigiani
Dana-Farber Cancer Institute, Boston, USA
This series of inexpensive and focused books on R will publish shorter books aimed at practitioners. Books can discuss the use of R in a particular subject area (e.g., epidemiology, econometrics, psychometrics) or as it relates to statistical topics (e.g., missing data, longitudinal data). In most cases, books will combine LaTeX and R so that the code for figures and tables can be put on a website. Authors should assume a background as supplied by Dalgaard's Introductory Statistics with R or other introductory books so that each book does not repeat basic material.
More information about this series at http://www.springer.com/series/6991
Alfonso Zamora Saiz, Carlos Quesada González, Lluís Hurtado Gil and Diego Mondéjar Ruiz
An Introduction to Data Analysis in R
Hands-on Coding, Data Mining, Visualization and Statistics from Scratch
1st ed. 2020
Alfonso Zamora Saiz
Department of Mathematics Applied to ICT, Technical University of Madrid, Madrid, Spain
Carlos Quesada González
Department of Applied Mathematics and Statistics, Universidad San Pablo CEU, Madrid, Spain
Lluís Hurtado Gil
eDreams ODIGEO, Barcelona, Spain
Diego Mondéjar Ruiz
Department of Applied Mathematics and Statistics, Universidad San Pablo CEU, Madrid, Spain
ISSN 2197-5736 e-ISSN 2197-5744
ISBN 978-3-030-48996-0 e-ISBN 978-3-030-48997-7
https://doi.org/10.1007/978-3-030-48997-7
Mathematics Subject Classification (2010): 62-07 68N15 68N20 68P05 62-01 68-01 62-09 62J05
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
A decade ago, the rise of big data was already a fact, and several successful data-driven companies were already operating. Nonetheless, many still questioned its true importance and repercussions, especially whether the new set of tools could be used not only by huge companies but also in a variety of areas of life, with an impact on society beyond increasing business profits.
Today, the usefulness of big data beyond large-scale analysis is taken for granted. In combination with other advances in mathematics, robotics, automation, and communication, data analysis has become the cornerstone of every new trend in business and society. Self-driving vehicles, facial recognition, and natural language processing are some examples of data-driven technologies that are about to change the world (if they have not already).
Even though this book does not dive into the latest machine learning methods that explain exactly how a car is able to drive on its own, it is evident that in this new data-oriented environment, the demand for graduates with proficient skills in the growing field of data analysis has increased markedly. The need for STEM professionals is clear, but students from other areas such as economics, business, social sciences, or law are equally important, since the applications of data analysis affect all sectors of society. Typical STEM programs include course syllabi with a strong core in mathematics, statistics, or even computing and programming, but other degree programs frequently suffer from a lack of quantitative courses.
The archetypal job position of the next decade will require candidates to show proficiency in drawing conclusions from data with computing software, regardless of their field or area of expertise. Of course, not all of them will be computer engineers or software developers in the strict sense, but they will need to know the basics of coding to develop data-oriented scripts and techniques for analyzing information. This implies the need to prepare students from other areas to handle data in their future jobs.
As professors, we have noticed a growing interest in data analysis among colleagues from areas other than mathematics. Undergraduate students also express their wish to take extra credits in data-related subjects. Moreover, through our links to industry, we have been encouraged to train our students in analytical skills so that they can face the era of data-driven change. It is time to rethink undergraduate programs, especially in the social sciences, and include courses addressing these competencies.
This text is conceived to enable students with no prior knowledge of data science to start their path in computer-aided statistical analysis with a broad scope. The book is the result, on the one hand, of our academic experience in recent years teaching several undergraduate courses in business management, economics, and business intelligence, as well as in master's programs, and, on the other hand, of our professional perspective as data scientists. It covers the basics of programming and a series of mathematical and statistical tools that lead the reader from scratch to understanding the analysis of a database through the whole process of acquisition, preparation, study, and presentation of conclusions.