
Edwin Diday (editor) - Advances in Data Science: Symbolic, Complex, and Network Data (Innovation, Entrepreneurship, Management; Big Data, Intelligence and Data Analysis)

Year: 2020; publisher: Wiley-ISTE.


Data science unifies statistics, data analysis and machine learning to achieve a better understanding of the masses of data which are produced today, and to improve prediction. Special kinds of data (symbolic, network, complex, compositional) are increasingly frequent in data science. These data require specific methodologies, but there is a lack of reference work in this field.

Advances in Data Science fills this gap. It presents a collection of up-to-date contributions by eminent scholars following two international workshops held in Beijing and Paris. The 10 chapters are organized into four parts: Symbolic Data, Complex Data, Network Data and Clustering. They include fundamental contributions, as well as applications to several domains, including business and the social sciences.


Big Data, Artificial Intelligence and Data Analysis Set

coordinated by
Jacques Janssen

Volume 4

Advances in Data Science
Symbolic, Complex and Network Data

Edited by

Edwin Diday

Rong Guan

Gilbert Saporta

Huiwen Wang


First published 2020 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd

27-37 St Georges Road

London SW19 4EU

UK

www.iste.co.uk

John Wiley & Sons, Inc.

111 River Street

Hoboken, NJ 07030

USA

www.wiley.com

© ISTE Ltd 2020

The rights of Edwin Diday, Rong Guan, Gilbert Saporta and Huiwen Wang to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2019951813

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-78630-576-3

Preface

This book contains a selection of papers presented at two recent international workshops devoted to progress in the analysis of complex data.

The first workshop, ADS16, short for Advances in Data Science, was held in October 2016 at Beihang University, Beijing, China, at the initiative of Professor Huiwen Wang.

The second workshop, entitled Data Science: New Data and Classes, was held a few months later in January 2017 at Paris-Dauphine University, Paris, France, at the invitation of Professor Edwin Diday.

Several members of the Scientific Committees and participants were common to both. Each workshop gathered about 50 participants by invitation only.

After the workshops, we decided that some of the papers presented deserved to be made available to a wider audience, and we asked their authors to prepare revised versions. Most of them agreed, and the 10 papers collected in this volume underwent blind review by referees, were revised, and were finally edited.

The papers are grouped into four sections: symbolic data, complex data, network data, and clustering.

For their dedication, we thank Paula Brito, Francisco de A.T. de Carvalho, Jie Gu, George Hébrail, Yves Lechevallier, Wen Long, Monique Noirhomme, Francesco Palumbo, Ming Ye, and Jichang Zhao.

We would also like to thank the sponsors of both meetings:

  • ADS16, Beijing: the School of Economics and Management and the Complex Data Analysis Research Center of Beihang University, and the School of Statistics and Mathematics of the Central University of Finance and Economics. The Beijing workshop Advances in Data Science was financially supported by the NSFC Major International Joint Research Project (Grant number 71420107025) and co-organized by Professor Huiwen Wang and Professor Gilbert Saporta.
  • Data Science: New Data and Classes, Paris: the Lamsade and Ceremade labs of Paris-Dauphine University, the French Statistical Society (SfdS), the French-Speaking Society for Classification (SFC), and the Society for Knowledge Discovery (EGC).

Edwin DIDAY

Rong GUAN

Gilbert SAPORTA

Huiwen WANG

October 2019

Part 1
Symbolic Data

Explanatory Tools for Machine Learning in the Symbolic Data Analysis Framework

The aim of this chapter is mainly to provide explanatory tools for understanding standard, complex and big data. First, we recall some basic notions in Data Science: what are complex data? What are classes, and classes of complex data? Which kinds of internal class variability can be considered? Then, we define symbolic data and symbolic data tables, which express the within-class variability, and we outline some advantages of this kind of class description. In practice, the classes are often given. When they are not, they can be built by clustering with the Dynamic Clustering Method (DCM), from which DCM regression, DCM canonical analysis, DCM mixture decomposition and similar methods can be derived. Aggregating the descriptions of these classes yields a symbolic data table. We say that the description of a class is much more explanatory when it is expressed by symbolic variables (closer to the natural language of the users) than by its usual analytical multidimensional description. The explanatory and characteristic power of classes can then be measured by criteria based on the symbolic data description of these classes, and these criteria also provide a way to compare clustering methods by their explanatory power. The criteria are defined in a Symbolic Data Analysis framework for categorical variables, based on three random variables defined on the ground population. Tools are then given for ranking individuals, classes and their symbolic descriptive variables from the most to the least characteristic. These characteristics are not only explanatory but can also express the concordance or discordance of a class with the other classes. We suggest several directions of research, mainly on parametric aspects of these criteria and on improving the explanatory power of Machine Learning tools. Finally, we present our conclusions and the wide range of potential applications in socio-demography, medicine, web security and other fields.
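The abstract only names the Dynamic Clustering Method. As a rough illustration, and not the chapter's own implementation, the sketch below assumes the simplest DCM setting: each class is represented by its mean and units are allocated by squared Euclidean distance. In that special case, DCM's alternation of an allocation step and a representation step reduces to k-means. The function name `dynamic_clustering` and its parameters are hypothetical.

```python
import random


def dynamic_clustering(points, k, n_iter=100, seed=0):
    """Hypothetical minimal DCM sketch (the k-means special case).

    Representatives are class means; units are allocated to the closest
    representative by squared Euclidean distance.
    """
    rng = random.Random(seed)
    # Initial representatives: k distinct units drawn at random.
    reps = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Allocation step: assign each unit to its closest representative.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, reps[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # the partition is stable, so the representatives are too
        labels = new_labels
        # Representation step: each representative becomes its class mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous representative if a class empties
                reps[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, reps


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    labels, reps = dynamic_clustering(pts, k=2)
    print(labels, reps)
```

In broad terms, other DCM variants keep the same two-step loop but change the kind of representative (for example a probability density or a regression model) and the allocation criterion accordingly.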

1.1. Introduction

A Data Scientist is someone who is able to extract new knowledge from Standard, Big and Complex Data. Here we consider complex data as data that cannot be expressed in terms of a standard data table, where units are described by quantitative and qualitative variables. Complex data arise in the case of unstructured data, unpaired samples and multisource data (such as mixtures of numerical, textual, image and social network data). Such data can be aggregated, fused and summarized into classes of row units, which are then considered as new units. Classes can be obtained by unsupervised learning, giving a concise and structured view of the data. In supervised learning, classes are used to provide efficient rules for allocating new units to a class. A third way is to consider classes as new units described by symbolic variables whose values are symbols such as intervals, probability distributions, weighted sequences of numbers or categories, functions and the like, in order to express their within-class variability. For example, regions express the variability of their inhabitants, companies express the variability of their web intrusions, and species express the variability of their specimens. One of the advantages of this approach is that unstructured data and unpaired samples at the level of row units become structured and paired at the level of classes (see ).
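To make the idea of classes as new units more concrete, here is a minimal Python sketch, assuming individuals are stored as dictionaries holding a class label, one quantitative variable and one categorical variable. It aggregates each class into three symbolic values: a MinMax interval, an interquartile interval and a bar chart of relative frequencies. The helper `symbolic_table`, the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) and the example variables `region`, `age` and `job` are illustrative assumptions, not taken from the chapter.

```python
from collections import Counter, defaultdict
from statistics import quantiles


def symbolic_table(rows, class_key, num_var, cat_var):
    """Hypothetical helper: one symbolic description per class.

    Each class gets a MinMax interval and an interquartile interval for
    `num_var`, and a bar chart (relative frequencies) for `cat_var`.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_key]].append(row)
    table = {}
    for cls, members in groups.items():
        values = sorted(m[num_var] for m in members)
        if len(values) >= 2:
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = values[0]  # a single unit gives a degenerate interval
        counts = Counter(m[cat_var] for m in members)
        total = sum(counts.values())
        table[cls] = {
            "minmax": (values[0], values[-1]),
            "interquartile": (q1, q3),
            "bar_chart": {cat: c / total for cat, c in sorted(counts.items())},
        }
    return table


rows = [
    {"region": "North", "age": 20, "job": "farmer"},
    {"region": "North", "age": 30, "job": "clerk"},
    {"region": "North", "age": 40, "job": "farmer"},
    {"region": "North", "age": 50, "job": "farmer"},
    {"region": "South", "age": 35, "job": "clerk"},
]
table = symbolic_table(rows, "region", "age", "job")
# table["North"]["minmax"] == (20, 50)
# table["North"]["bar_chart"] == {"clerk": 0.25, "farmer": 0.75}
```

The five rows are an unstructured list at the level of individuals; after aggregation, every region is described by the same three symbolic variables, so the regions can be compared directly in one symbolic data table.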

Three principles guide this chapter, in conformity with the Data Science framework. First, new tools are needed to transform huge databases intended for management into databases usable by Data Science tools. This transformation leads to the construction of new statistical units described by aggregated data in terms of symbols, as single-valued data are not suitable: they cannot incorporate the additional information on data structure available in symbolic data. Second, we work on symbolic data as they are given in databases, not as we might wish them to be given. For example, if the data contain intervals, we work on them even if within-interval uniformity is statistically unsatisfactory. Moreover, by considering MinMax intervals, we can obtain useful knowledge that is complementary to the knowledge obtained without the uniformity assumption. Hence, considering that the MinMax or interquartile where the aim is to extract useful knowledge from the data and not only to infer models (even if inferring models, as in standard statistics, can certainly give complementary knowledge). Third, we describe classes marginally, by vectors of univariate symbols, rather than jointly, by multivariate symbols: 99% of users would say that a joint distribution describing a class often contains too many low or zero values and therefore has poor explanatory power compared with the marginal distributions describing the same class. For example, with 10 variables of 5 categories each, the joint multivariate distribution leads to a sparse symbolic data table in which each class is described by a single bar-chart symbolic variable value containing 5^10 (9,765,625) categories and taking, for each class, 5^10 low or zero values. By contrast, the 10 marginal bar-chart symbolic variables describe each class by a vector of 10 bar charts of 5 categories each, which is easy to interpret and to compare between classes.
Nevertheless, a compromise can be reached by using joint distributions instead of marginals for the most dependent variables.
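The sparsity argument above is simple arithmetic, and a few lines of Python make the orders of magnitude explicit. The sketch assumes every variable has the same number of categories and models the compromise by grouping variables into blocks, each block described by one joint bar chart (a block of a single variable being an ordinary marginal). Which variables to group is taken as given, since the text does not say how dependence is measured. All function names are hypothetical.

```python
def joint_cells(n_vars, n_cats):
    """Categories in a single joint bar chart over all variables."""
    return n_cats ** n_vars


def marginal_cells(n_vars, n_cats):
    """Total categories across one marginal bar chart per variable."""
    return n_vars * n_cats


def compromise_cells(n_cats, blocks):
    """Total categories when variables are grouped into blocks of the given
    sizes, each block described by one joint bar chart (size 1 = marginal)."""
    return sum(n_cats ** size for size in blocks)


def max_nonzero_share(class_size, n_cells):
    """Upper bound on the share of cells that a class of `class_size` units
    can make non-zero: each unit falls into exactly one joint cell."""
    return min(class_size, n_cells) / n_cells


if __name__ == "__main__":
    print(joint_cells(10, 5))                    # 9765625
    print(marginal_cells(10, 5))                 # 50
    # One dependent pair described jointly, the other 8 variables marginally.
    print(compromise_cells(5, [2] + [1] * 8))    # 65
    # A class of 1,000 units fills at most about 0.01% of the joint cells.
    print(max_nonzero_share(1000, joint_cells(10, 5)))
```

Joining one dependent pair only raises the description from 50 to 65 categories, whereas the full joint description needs almost ten million, most of which any realistic class leaves empty.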
