Editors
Max Garzon
375 Dunn Hall, The University of Memphis, Memphis, TN, USA
Ching-Chi Yang
375 Dunn Hall, The University of Memphis, Memphis, TN, USA
Deepak Venugopal
375 Dunn Hall, The University of Memphis, Memphis, TN, USA
Nirman Kumar
375 Dunn Hall, The University of Memphis, Memphis, TN, USA
Kalidas Jana
Memphis, TN, USA
Lih-Yuan Deng
375 Dunn Hall, The University of Memphis, Memphis, TN, USA
ISBN 978-3-031-05370-2 e-ISBN 978-3-031-05371-9
https://doi.org/10.1007/978-3-031-05371-9
The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Data science is about solving problems based on observations and data collected in the real world. Problems may range from the mundane to difficult scientific questions, for example, rating movies for recommendation systems, understanding the earning power of American taxpayers, increasing revenue for a business, spam, controlling the spread of misinformation through the internet, global warming, or the expansion of our universe. In many fields, our ability to generate, gather, and store volumes of data on the order of terabytes and exabytes daily has far outpaced our ability to derive useful information from it with available computational resources. The overarching goal of this book is to provide a practical and fairly complete, but not encyclopedic, review of Data Science (DS) through the lens of Dimensionality Reduction (DR).
The intended audience consists of professionals and students in any domain science who need to solve problems and answer questions about their domain based on data. Domain science is a fairly vague term that refers to a specialized area of human knowledge characterized by specific questions about a certain aspect of reality (for example, what is motion in physics, what are physical objects made of in chemistry, what is life in biology, and so forth). In addition to the well-established sciences, domain sciences include just about any area where data can be recorded and analyzed to answer questions concerning the population of individuals or objects that the data describes.
Data science presents a singular approach to problem solving when compared to more established sciences. Traditional sciences are motivated by understanding our world in order to survive and thrive. That requires a degree of analysis and theorizing to understand the specific phenomena involved and to enable predictive power. By contrast, with the advent of computer science and its abstractions into the information age (as embodied, for example, by the internet and the web), tools have been created that can be used regardless of the specific domain. Once this threshold is crossed, it is a natural next step to synergistically combine mathematics and statistics with the powerful computational tools developed by computer science to create a new science that is more than the sum of its parts, hence data science.
We have strived to leave our niche hats at the door and present an intuitive, integrative and synergistic approach that captures the best of the three worlds. That is the pervasive thread that readers will discover through examples and methods throughout the book. Sections begin with intuitive examples of a problem to be solved by the (perhaps new) concepts and results being described in the section. A professional with an undergraduate degree in any science, particularly a quantitative one, should be able to easily follow this part. These motivating examples are then followed by precise definitions of the technical concepts and presentation of the results in general situations. These require a degree of abstraction that can be followed by re-interpreting the concepts as in the original example(s). Finally, each section closes with solutions to the original problem(s) afforded by these techniques, perhaps in various ways, to compare and contrast the advantages and disadvantages of the various DR techniques based on quantitative and qualitative assessments back in the real world.
We gratefully acknowledge support for this project from various sources. First, we acknowledge support from the University of Memphis CoRS (Communities of Research Scholars) program (through Deborah Hernandez) to start up research projects. These facilitated the initial interactions that eventually led to the interdisciplinary collaboration that produced this book, among other works. The program also provided access to the High Performance Cluster (HPC) used in the development and testing of most results herein. Second, Lih-Yuan acknowledges support from the National Science Foundation under an award for The Learner Data Institute (NSF-1934745; the opinions, findings, and results are solely the authors' and do not reflect those of the funding agency). Third, Max acknowledges faculty (Professor Luis Fernando Nino) and student support (especially Alfredo Bayuelo for TA support and help with cartoon design and rendering) at the National University of Colombia for their active feedback and project outcomes at the XI International Cathedra on Data Science for Bioinformatics in 2017, where several of the approaches, views, and some results included in this book were conceived.
Max Garzon
Ching-Chi Yang
Deepak Venugopal
Nirman Kumar
Kalidas Jana
Lih-Yuan Deng
Memphis, TN, USA
November 2021