CHAPTER 1
INTRODUCTION TO BIG DATA ANALYTICS
1. Overview
In this introductory chapter we will discuss the need for big data, what big data analytics is, and the role of the data scientist.
1.1 What is Data Science?
There is much debate among scholars and practitioners about what data science is and what it is not. Data science involves analyzing and extracting knowledge from large volumes of data using automated methods. Its techniques and theories are drawn from many fields, including statistics, computer science, applied mathematics and visualization. It can turn vast amounts of data into new knowledge. In areas of intellectual inquiry, data science offers a powerful new approach to making discoveries. Data science affects academic and applied research in many domains, including agriculture, the biological sciences, medical informatics, health care, social sciences and the humanities. It heavily influences economics, business and finance. From the business perspective, data science is an integral part of competitive intelligence, a newly emerging field that encompasses a number of activities, such as data mining and data analysis.
The term "data science" (originally used interchangeably with "datalogy") has existed for over thirty years. Peter Naur used it in 1960, primarily as a substitute for computer science. In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the data processing methods then in use across a wide range of applications. Anjul Bhambhri, vice president of big data products at IBM, says, "A data scientist is somebody who is probing, who can gaze at data and spot trends. It's almost like a Renaissance individual who really wants to learn and bring change to an organization."
A practitioner of data science is known as a data scientist. The data scientist is responsible for designing and implementing processes and layouts for complex, large-scale data sets used for modeling, data mining and research purposes. Data scientists often take part in marketing and planning processes, identifying useful insights and deriving statistical data for planning, executing and monitoring result-driven marketing strategies. A traditional data analyst may look only at data from a single source (a CRM system, for example), whereas a data scientist will most likely explore and examine data from multiple sources. The data scientist will go through all incoming data with the goal of discovering a previously concealed insight, which in turn can offer a competitive advantage or address a vital business problem. A data scientist does not simply gather and report on data, but also looks at it from many angles, determines what it means, and then recommends ways to apply the data. Let's look at some of the skills required of a data scientist.
Learning the application field - The data scientist must quickly learn how the data will be used in a particular environment.
Communicating with data users - A data scientist must possess strong skills for learning the requirements and preferences of users. Translating back and forth between the technical terms of computing and statistics and the vocabulary of the application domain is a significant skill.
Seeing the big picture of a multifaceted system - After developing an understanding of the application domain, the data scientist must visualize how data will move among all of the related systems and people.
Knowing how data can be represented - Data scientists must have a clear understanding of how data can be stored and linked, as well as of "metadata" (data that describes how other data are arranged).
Data conversion and analysis - When data become available for the use of decision makers, data scientists must know how to transform, summarize, and make inferences from the data. As noted above, being able to communicate the results of analyses to users is also a significant skill here.
Visualization and presentation - Although numbers often have the edge in precision and detail, a good data display (e.g., a bar chart) can often be a more efficient means of communicating results to data users.
Attention to quality - No matter how good a set of data may be, there is no such thing as perfect data. Data scientists must know the boundaries of the data they work with, know how to measure its accuracy, and be able to make suggestions for improving the quality of the data in the future.
Ethical reasoning - If data are essential enough to collect, they are often important enough to influence people's lives. Data scientists must understand important ethical issues such as privacy, and must be able to communicate the limitations of data to try to avoid misuse of data or analytical results.
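To make the skills of data conversion, analysis and attention to quality a little more concrete, the following is a minimal Python sketch rather than a prescribed method. It links records from two sources, transforms raw text into numbers, summarizes the result, and reports how much of the data is missing. The records, the field names (customer_id, region, amount) and the helper functions to_amount and summarize are all hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical records from two sources: a CRM export and an order log.
crm_rows = [
    {"customer_id": 1, "region": "north"},
    {"customer_id": 2, "region": "south"},
    {"customer_id": 3, "region": "north"},
]
order_rows = [
    {"customer_id": 1, "amount": "120.50"},
    {"customer_id": 2, "amount": ""},  # missing value in the source data
    {"customer_id": 3, "amount": "80.00"},
    {"customer_id": 1, "amount": "40.00"},
]


def to_amount(text):
    """Transform: parse a raw string into a float, or None if it is empty."""
    text = text.strip()
    return float(text) if text else None


def summarize(crm, orders):
    """Link orders to regions, then summarize amounts and missing data per region."""
    region_of = {row["customer_id"]: row["region"] for row in crm}
    by_region = {}
    for order in orders:
        region = region_of.get(order["customer_id"], "unknown")
        by_region.setdefault(region, []).append(to_amount(order["amount"]))

    summary = {}
    for region, amounts in by_region.items():
        present = [a for a in amounts if a is not None]
        summary[region] = {
            "orders": len(amounts),
            # Mean of the amounts that are present; None if all are missing.
            "mean_amount": mean(present) if present else None,
            # Simple quality measure: fraction of orders with no amount.
            "missing_share": (len(amounts) - len(present)) / len(amounts),
        }
    return summary


summary = summarize(crm_rows, order_rows)
for region, stats in sorted(summary.items()):
    print(region, stats)
```

Reporting the share of missing values next to each average is one simple way to communicate the boundaries of the data along with the result, so that a decision maker can see that a region with no usable amounts has no meaningful average at all.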
Data scientists are curious: exploring, asking questions, doing "what if" analysis, and questioning existing assumptions and processes. Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization's leadership structure. As an interdisciplinary subject, data science draws scientific inquiry from a wide range of academic subject areas. Some areas of research are:
Data mining and knowledge discovery (KDD)
Cloud computing
Databases and information integration
Signal processing
Deep learning, natural language processing and information extraction
Knowledge discovery in social and information networks
Visualization
Ranking organizations with big data
Data Science Automation