Julián Luengo - Big Data Preprocessing: Enabling Smart Data

Year: 2020. Publisher: Springer International Publishing.

Big Data Preprocessing: Enabling Smart Data: summary, description and annotation

This book offers a comprehensible overview of Big Data preprocessing, including a formal description of each problem. It also focuses on the most relevant proposed solutions and illustrates actual implementations of algorithms that help the reader deal with these problems.
This book stresses the gap between big, raw data and the quality data that businesses demand. Such quality data is called Smart Data, and preprocessing is a key step in achieving it: it handles imperfections, integration tasks, and other processes that eliminate superfluous information. The authors present the concept of Smart Data through data preprocessing in Big Data scenarios and connect it with the emerging paradigms of IoT and edge computing, where the end points generate Smart Data without completely relying on the cloud.
Finally, this book presents some novel areas of study that are attracting growing attention in Big Data preprocessing. Specifically, it considers the relation with Deep Learning (a technique that also relies on large volumes of data), the difficulty of finding the appropriate selection and concatenation of preprocessing techniques, and some other open problems.
Practitioners and data scientists who work in this field and want an introduction to preprocessing in large-data-volume scenarios will want to purchase this book. Researchers in this field who want to know which algorithms are currently implemented to support their investigations may also be interested in it.

Julián Luengo, Diego García-Gil, Sergio Ramírez-Gallego, Salvador García and Francisco Herrera
Big Data Preprocessing
Enabling Smart Data
Julián Luengo
Department of Computer Science and AI, University of Granada, Granada, Spain
Diego García-Gil
Department of Computer Science and AI, University of Granada, Granada, Spain
Sergio Ramírez-Gallego
DOCOMO Digital España, Madrid, Madrid, Spain
Salvador García
Department of Computer Science and AI, University of Granada, Granada, Spain
Francisco Herrera
Department of Computer Science and AI, University of Granada, Granada, Spain
ISBN 978-3-030-39104-1 e-ISBN 978-3-030-39105-8
https://doi.org/10.1007/978-3-030-39105-8
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

This book is dedicated to all the people with whom we have worked over the years and who have made it possible to reach this moment. Thanks to the members of the Andalusian Research Institute in Data Science and Computational Intelligence.

To our families.

Preface

Recent years have seen massive growth in the scale of data, a key factor in the Big Data scenario. Big Data can be defined as data of high volume, velocity, and variety that requires new forms of high-performance processing. Addressing Big Data is a challenging and time-demanding task that requires a large computational infrastructure to ensure successful data processing and analysis. Because it is a very common scenario in real-life applications, the interest of researchers and practitioners in the topic has grown significantly in recent years. Among Big Data disciplines, data mining is a key topic, enabling the user to extract knowledge from enormous amounts of raw data. However, this raw data is not always in the best condition to be treated, analyzed, and surveyed. Applying preprocessing techniques is a must in real-world applications to ensure quality data, or Smart Data, for proper treatment and analysis. The term Smart Data refers to the challenge of transforming raw data into quality data that can be appropriately exploited to obtain valuable insights.

This book aims to offer a general and comprehensible overview of data preprocessing in Big Data, enabling Smart Data. It contains a comprehensive description of the topic and focuses on its main features and the most relevant proposed solutions. Additionally, it considers the different Big Data scenarios in which applying data preprocessing techniques can pose a real challenge. Data preprocessing is a multifaceted discipline that includes data preparation, comprising integration, cleaning, normalization, and transformation of data; data reduction tasks such as feature selection, instance selection, and discretization; and resampling techniques to deal with imbalanced data.

This book stresses the gap between standard data preprocessing techniques and their Big Data equivalents, showing the difficulties involved in developing the latter. It also covers the different approaches that have been traditionally applied and the latest proposals in Big Data preprocessing. Specifically, it reviews data reduction methods, imperfect data approaches, discretization techniques, and imbalanced data preprocessing solutions. Finally, this book describes the most popular Big Data libraries for machine learning, focusing on their data preprocessing algorithms and utilities.

Julián Luengo, Granada, Spain
Diego García-Gil, Granada, Spain
Sergio Ramírez-Gallego, Madrid, Spain
Salvador García, Granada, Spain
Francisco Herrera, Granada, Spain
June 2019
Acronyms
BSP   Bulk Synchronous Parallel
DAG   Directed Acyclic Graph
DM    Data Mining
FS    Feature Selection
HDFS  Hadoop Distributed File System
HPC   High-Performance Computing
IG    Instance Generation
IS    Instance Selection
KNN   K-Nearest Neighbors
ML    Machine Learning
MPI   Message Passing Interface
MV    Missing Values
PCA   Principal Components Analysis
PG    Prototype Generation
PR    Prototype Reduction
PS    Prototype Selection
RDD   Resilient Distributed Dataset
SVM   Support Vector Machine
UCI   UC Irvine Machine Learning Repository
YARN  Yet Another Resource Negotiator

J. Luengo et al. Big Data Preprocessing https://doi.org/10.1007/978-3-030-39105-8_1
1. Introduction
Julián Luengo
(1) Department of Computer Science and AI, University of Granada, Granada, Spain
(2) DOCOMO Digital España, Madrid, Madrid, Spain
1.1 Big Data
We are immersed in the Information Age, where vast amounts of data are available. Petabytes of data are generated and stored every day, resulting in a huge volume of information; this information arrives at high velocity and must be processed in real time; it comes in many formats, such as structured, semi-structured, or unstructured data, implying variety; it also needs to be cleaned in order to maintain veracity; finally, it must provide value to the organization. Together, these five concepts form one of the most widespread definitions of Big Data. While the volume, velocity, and variety aspects refer to the data generation process and to how the data is captured and stored, the veracity and value aspects deal with the quality and usefulness of the data. These last two aspects become crucial in any Big Data process, where the extraction of useful and valuable knowledge is strongly influenced by the quality of the data used.
Fig. 1.1 Big Data Vs
It is predicted that by 2020 the digital universe will ...