Volume 94
Studies in Big Data
Series Editor
Janusz Kacprzyk
Polish Academy of Sciences, Warsaw, Poland
The series Studies in Big Data (SBD) publishes new developments and advances in the various areas of Big Data, quickly and with high quality. The intent is to cover the theory, research, development, and applications of Big Data, as embedded in the fields of engineering, computer science, physics, economics and life sciences. The books of the series refer to the analysis and understanding of large, complex, and/or distributed data sets generated from recent digital sources coming from sensors or other physical instruments as well as simulations, crowd sourcing, social networks or other internet transactions, such as emails or video click streams, and others. The series contains monographs, lecture notes and edited volumes in Big Data spanning the areas of computational intelligence including neural networks, evolutionary computation, soft computing, fuzzy systems, as well as artificial intelligence, data mining, modern statistics and operations research, as well as self-organizing systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output.
The books of this series are reviewed in a single blind peer review process.
Indexed by SCOPUS, SCIMAGO and zbMATH.
All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/11970
Parikshit Narendra Mahalle
Department of Artificial Intelligence and Data Science, Vishwakarma Institute of Information Technology, Pune, India
Department of Computer Engineering, STES's Smt. Kashibai Navale College of Engineering, Pune, Maharashtra, India
Gitanjali Rahul Shinde
Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India
Priya Dudhale Pise
Dr. D. Y. Patil Biotechnology and Bioinformatics Institute, Pune, India
Jyoti Yogesh Deshmukh
Department of Computer Engineering, JSPM's Bhivarabai Sawant Institute of Technology and Research, Pune, India
ISSN 2197-6503 e-ISSN 2197-6511
ISBN 978-981-16-5159-5 e-ISBN 978-981-16-5160-1
https://doi.org/10.1007/978-981-16-5160-1
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
Senses (Sense-Organs) are purified by water; Mind is purified by truth; Soul is purified by learning and penance; while intelligence is purified by knowledge.
Sanskrit Shubhashitani
The book Foundations of Data Science for Engineering Problem Solving is envisioned to present a detailed and comprehensive overview of data science foundations, including the evolution of data science, data collection, data preparation, analysis of data using machine learning algorithms, data visualization, and how data science can yield better insights into various use cases in science and engineering. Over the last decade, advances in very-large-scale integration technology and the semiconductor industry have made wearable electronic devices and Wi-Fi-enabled devices cheaper and smaller, while equipping them with sensing, computing and communication capabilities. In addition, Internet access has become faster and more affordable than in the past. For these reasons, the volume of data generated by these devices and posted to the cloud is increasing very rapidly. The data has become big in terms of volume, variety, velocity and complexity, and information technology leaders are facing the problem of how to deal with this big data.
The book focuses on how data science can enrich applications in the science and engineering domain and make them smarter. The main objective of this book is to help readers understand how the evolving field of data science will be useful in forecasting, prediction, estimation and recommendation. The book explores the foundations of data science from the basics to applications, followed by case studies in science and engineering. It is divided into three main parts. The first part deals with big data and its emergence in today's context, the basics of data science, its evolution and why it is needed today, and its various applications. Data collection and preparation form the main part of any data science application; this covers data exploration, the various types of datasets, their classification by source and type, and the phases of data preprocessing and the tasks involved. Web scraping tools such as Beautiful Soup, Scrapy and urllib are also presented and discussed in this part of the book.
The next part of the book covers the important topic of data visualization, including the need for it, its challenges and the relevant tools, as well as the modelling of data. Visualization tools such as Tableau, Matplotlib, Looker, Seaborn, Power BI and IBM Cognos Analytics, and how they work, are discussed in detail in this part of the book. The process of data modelling, the impact of modelling on outcomes, the decision-making process and the role of data science in engineering problem solving are also elucidated here. The main objective of this part is to focus on the emerging tools and techniques adopted in industry for data science applications to enhance business intelligence.