
Witold Pedrycz - Data Science and Big Data: An Environment of Computational Intelligence



This book presents a comprehensive and up-to-date treatise on a range of methodological and algorithmic issues. It also discusses implementations and case studies, identifies the best design practices, and assesses data analytics business models and practices in industry, health care, administration and business. Data science and big data go hand in hand, constitute a rapidly growing area of research, and have attracted the attention of industry and business alike. The area itself has opened up promising new directions of fundamental and applied research and has led to interesting applications, especially those addressing the immediate need to deal with large repositories of data and to build tangible, user-centric models of relationships in data. Data is the lifeblood of today's knowledge-driven economy. Numerous data science models are oriented towards end users and, along with the regular requirements for accuracy (which are present in any modeling), come the requirements for the ability to process huge and varying data sets as well as robustness, interpretability, and simplicity (transparency). Computational intelligence, with its underlying methodologies and tools, helps address data analytics needs. The book is of interest to researchers and practitioners involved in data science, Internet engineering, computational intelligence, management, operations research, and knowledge-based systems.

Part I
Fundamentals
Springer International Publishing AG 2017
Witold Pedrycz and Shyi-Ming Chen (eds.), Data Science and Big Data: An Environment of Computational Intelligence, Studies in Big Data, DOI 10.1007/978-3-319-53474-9_1
Large-Scale Clustering Algorithms
Rocco Langone (1) (corresponding author), Vilen Jumutc, Johan A. K. Suykens
(1) KU Leuven ESAT-STADIUS, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Abstract
Computational tools in modern data analysis must be scalable to satisfy business and research time constraints. In this regard, two alternatives are possible: (i) adapt available algorithms or design new approaches so that they can run in a distributed computing environment; (ii) develop model-based learning techniques that can be trained efficiently on a small subset of the data and still make reliable predictions. In this chapter, two recent algorithms following these different directions are reviewed. In the first part, a scalable in-memory spectral clustering algorithm is described. This technique relies on a kernel-based formulation of the spectral clustering problem, also known as kernel spectral clustering. More precisely, a finite-dimensional approximation of the feature map via the Nyström method is used to solve the primal optimization problem, which decreases the computational time from cubic to linear. In the second part, a distributed clustering approach with a fixed computational budget is illustrated. This method extends the k-means algorithm by applying regularization at the level of prototype vectors. An optimal stochastic gradient descent scheme for learning with $l_1$ and $l_2$ norms is utilized, which makes the approach less sensitive to the influence of outliers while computing the prototype vectors.
Keywords
Data clustering · Big data · Kernel methods · Nyström approximation · Stochastic optimization · K-means · Map-Reduce · Regularization · In-memory algorithms · Scalability
Introduction
Data clustering partitions a set of points into groups, called clusters, whose members are as similar as possible. It plays a key role in computational intelligence because of its diverse applications in various domains. Examples include collaborative filtering and market segmentation, where clustering is used to provide personalized recommendations to users; trend detection, which makes it possible to discover key trends and events in streaming data; community detection in social networks; and many others [].
With the advent of the big data era, a key challenge for data clustering lies in its scalability, that is, how to speed up a clustering algorithm without affecting its performance. To this purpose, two main directions have been explored [] for some recent surveys on clustering algorithms for big data.
In this chapter two algorithms for large-scale data clustering are reviewed. The first one, named fixed-size kernel spectral clustering (FSKSC), is a sampling-based spectral clustering method. Spectral clustering (SC) [
The remainder of the chapter is organized as follows. Section . Finally some conclusions are given.
Notation
$x^T$: Transpose of the vector $x$
$A^T$: Transpose of the matrix $A$
$I_N$: $N \times N$ identity matrix
$1_N$: $N \times 1$ vector of ones
$\mathcal{D}_{tr} = \{x_i\}_{i=1}^{N_{tr}}$: Training sample of $N_{tr}$ data points
$\varphi(\cdot)$: Feature map
$\mathcal{F}$: Feature space of dimension $d_h$
$\{\mathcal{C}_p\}_{p=1}^{k}$: Partitioning composed of $k$ clusters
$|\cdot|$: Cardinality of a set
$\|\cdot\|_p$: $p$-norm of a vector
$\nabla f$: Gradient of function $f$
Standard Clustering Approaches
3.1 Spectral Clustering
Spectral clustering represents a solution to the graph partitioning problem. More precisely, it divides a graph into weakly connected sub-graphs by making use of the spectral properties of the graph Laplacian matrix [].
A graph (or network) $G = (V, E)$ is a mathematical structure used to model pairwise relations between certain objects. It consists of a set of $N$ vertices or nodes $V = \{v_i\}_{i=1}^{N}$ and a collection of edges $E$ that connect pairs of vertices. If the edges are provided with weights, the corresponding graph is weighted; otherwise it is referred to as an unweighted graph. The topology of a graph is described by the similarity or affinity matrix, which is an $N \times N$ matrix $S$, where $S_{ij}$ indicates the link between the vertices $i$ and $j$. Associated to the similarity matrix there is the degree matrix $D$
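As a rough illustration of the objects introduced above, the following sketch builds a similarity matrix, its degree matrix, and a graph Laplacian for a toy data set, then splits the graph in two using the eigenvector of the second-smallest eigenvalue. Several choices here are assumptions and are not taken from the chapter: the Gaussian (RBF) similarity with bandwidth `sigma`, the unnormalized Laplacian $L = D - S$ with $D$ the diagonal matrix of row sums of $S$, the sign-based split, the toy points, and all function names.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """N x N similarity matrix with S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The RBF kernel is an assumed choice of similarity, used only for illustration.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-loops
    return S

def graph_laplacian(S):
    """Return the degree matrix D (row sums of S on the diagonal) and L = D - S."""
    D = np.diag(S.sum(axis=1))
    return D, D - S

def spectral_bipartition(S):
    """Split the graph into two groups by the sign of the second eigenvector of L."""
    _, L = graph_laplacian(S)
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    second = eigvecs[:, 1]          # eigenvector of the second-smallest eigenvalue
    return (second > 0).astype(int)

# Toy data (hypothetical): two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
S = rbf_affinity(X, sigma=1.0)
D, L = graph_laplacian(S)
labels = spectral_bipartition(S)
print(labels)
```

Points within each group receive the same label, while the two groups receive different labels; which group is labeled 0 or 1 depends on the arbitrary sign of the eigenvector. The dense $N \times N$ matrix and full eigendecomposition used here are exactly what limits this plain approach on large data sets.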