Mirkin - Core Concepts in Data Analysis: Summarization, Correlation and Visualization

Boris Mirkin, Core Concepts in Data Analysis: Summarization, Correlation and Visualization, Undergraduate Topics in Computer Science, DOI 10.1007/978-0-85729-287-2_1, Springer-Verlag London Limited 2011

1. Introduction: What Is Core

Boris Mirkin (1, 2)

(1) Research University Higher School of Economics, School of Applied Mathematics and Informatics, 11 Pokrovsky Boulevard, Moscow, RF, Russia

(2) Department of Computer Science, Birkbeck University of London, Malet Street, London, UK

Abstract

This is an introductory chapter in which:

(i) The goals of data analysis, as a tool helping to enhance and augment knowledge of the domain, are outlined. Since knowledge is represented by concepts and statements of relation between them, the two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations.

(ii) A set of seven cases involving small datasets and related data analysis problems is presented. The datasets are taken from various fields such as monitoring market towns, computer security protocols, bioinformatics, and cognitive psychology.

(iii) An overview of data visualization, its goals and some techniques is given.

1.1 Summarization and Correlation: Two Main Goals of Data Analysis

The term Data Analysis has been used for quite a while, even before the advent of the computer era, as an extension of mathematical statistics, starting from developments in cluster analysis and other multivariate techniques before WWII and eventually bringing forth the concepts of exploratory data analysis and confirmatory data analysis in statistics (see, for example, Tukey).

The situation can be looked at as follows. Classical statistics views data as a vehicle to fit and test mathematical models of the phenomena the data refer to. The data mining and knowledge discovery discipline uses data to add new knowledge in any format. It is sensible, then, to look at methods at an intermediate level: those that contribute to theoretical knowledge of the phenomenon rather than to knowledge of any kind. These focus on ways of augmenting or enhancing theoretical knowledge of the specific domain to which the data being analyzed refer.

The term knowledge encompasses many diverse layers and forms of information, from individual facts to those of literary characters to major scientific laws. But when focusing on the particular domain the dataset in question comes from, its theoretical knowledge structure can be considered as comprising just two types of elements: (i) concepts and (ii) statements relating them. Concepts are terms referring to aggregations of similar entities, such as apples or plums, or similar categories such as fruit comprising both apples and plums, among others. When created over data objects or features, these are referred to, in data analysis, as clusters or factors, respectively. Statements of relation between concepts express regularities relating different categories. Two features are said to correlate when a co-occurrence of specific patterns in their values is observed as, for instance, when one feature's value tends to be the square of the other's. Observing a correlation pattern can sometimes lead to investigation of a broader structure behind the pattern, which may further lead to finding or developing a theoretical framework for the phenomenon in question from which the correlation follows.

It is useful to distinguish between quantitative correlations, such as functional dependencies between features, and categorical ones expressed conceptually, for example, as logical production rules or more complex structures such as decision trees. Correlations may be used for both understanding and prediction. In applications, the latter is by far the more important. Moreover, the prediction problem is much easier to make sense of operationally, so the sciences so far have paid much attention to it.
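The case of one feature being the square of another can be sketched numerically. The snippet below is only an illustration under stated assumptions: the feature values are made up, NumPy is assumed to be available, and the quadratic fit is one possible way to express a quantitative functional dependency, not a method prescribed by the text. It shows that the usual linear correlation coefficient can be practically zero even though the second feature is fully determined by the first.

```python
import numpy as np

# Hypothetical feature values, symmetric around zero.
x = np.linspace(-3.0, 3.0, 61)
# The second feature is exactly the square of the first.
y = x ** 2

# The Pearson coefficient measures only the linear part of the relation;
# for this symmetric sample it is zero up to rounding error.
pearson = np.corrcoef(x, y)[0, 1]

# A least-squares fit of y = a*x^2 + b*x + c recovers the dependency.
coeffs = np.polyfit(x, y, deg=2)
max_residual = np.max(np.abs(np.polyval(coeffs, x) - y))

print("Pearson coefficient:", pearson)
print("Fitted coefficients (a, b, c):", coeffs)
print("Largest fitting error:", max_residual)
```

The fitted leading coefficient comes out close to 1 and the fitting error close to 0, so the dependency is captured by the functional form rather than by the linear coefficient.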

What is said above suggests that there are two main pathways for augmenting knowledge: (i) developing new concepts by "summarizing" data and (ii) deriving new relations between concepts by analyzing "correlation" between various aspects of the data. The quotation marks are used here to point out that each of the terms, summarization and correlation, is extended well beyond its conventional meaning. Indeed, while everybody would agree that the average mark does summarize the marking scores on test papers, it would be more daring to see in the same light the derivation of students' hidden talent scores by approximating their test marks on various subjects, or the finding of a cluster of similarly performing students. Still, the mathematical structures behind each of these three activities (calculating the average, finding a hidden factor, and designing a cluster structure) are analogous, which suggests that classing them all under the summarization umbrella may be reasonable. Similarly, the term correlation, which is conventionally used in statistics only to express the extent of linear relationship between two or more variables, is understood here in its generic sense, as a supposed affinity between two or more aspects of the same data that can be variously expressed, not necessarily by a linear equation or by a quantitative expression at all.
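The three summarization activities named above can be put side by side in a short sketch. Everything specific here is an assumption made for illustration: the marks are invented, the hidden factor is computed as a rank-one approximation via singular value decomposition, and the clusters are found with a plain two-means procedure. One way to read the analogy, offered as an interpretation rather than as the author's statement, is that each step fits a simple structure to the data by least squares.

```python
import numpy as np

# Hypothetical marks: rows are students, columns are subjects.
marks = np.array([
    [90.0, 85.0, 88.0],
    [80.0, 82.0, 79.0],
    [45.0, 50.0, 40.0],
    [50.0, 42.0, 48.0],
])

# 1. Average: a single summary score per student.
average = marks.mean(axis=1)

# 2. Hidden factor: best rank-one approximation marks ~ outer(talent, loadings).
u, s, vt = np.linalg.svd(marks, full_matrices=False)
sign = 1.0 if vt[0].sum() >= 0 else -1.0  # fix the arbitrary sign of the SVD
talent = sign * u[:, 0] * s[0]            # hidden "talent" score per student
loadings = sign * vt[0]                   # weight of each subject


# 3. Clusters: two groups of similarly performing students (plain two-means).
def two_means(data, n_iter=20):
    totals = data.sum(axis=1)
    # Seed the centers with the strongest and the weakest student.
    centers = data[[int(np.argmax(totals)), int(np.argmin(totals))]]
    for _ in range(n_iter):
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Assumes neither cluster becomes empty, which holds for this toy data.
        centers = np.array([data[labels == k].mean(axis=0) for k in range(2)])
    return labels, centers


labels, centers = two_means(marks)

print("Average marks:", average)
print("Hidden talent scores:", talent)
print("Cluster labels:", labels)
```

On this toy data the average, the talent score and the cluster labels all separate the first two students from the last two, each summarizing the same table at a different level of detail.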

It would be useful to spell out the view of data as a subject of computational data analysis that is adhered to here. Typically, in the sciences and in statistics, a problem comes first, and then the investigator turns to data that might be useful in advancing towards a solution. In computational data analysis, this may sometimes be the case too. Yet the situation is frequently reversed. Typical questions then would be: Take a look at this data set: what sense can be made of it? Is there any structure in the data set? Can these features help in predicting those? This is more reminiscent of a traveler's view of the world than that of a scientist. The scientist sits at a desk, gets reproducible signals from the universe and tries to accommodate them into the great model of the universe that science has been developing. The traveler deals with whatever comes their way. Helping the traveler make sense of data is the task of data analysis. It should be pointed out that this view differs considerably from the conventional scientific method, in which the main goal is to identify a pre-specified model of the world, and data is but a vehicle for achieving this goal. It is that view that underlies the development of data mining, though the aspect of data being available as a database, quite important in data mining, is rather tangential to data analysis.

Any data set comprises two parts: data entries and metadata entries. Data entries are the set of measurements taken, whereas metadata is the most straightforward relation between knowledge and measurements. Metadata usually involves names for the entities and features as well as indications of the measurement scales for the latter. Depending on the data domain, entities may be alternatively but synonymously referred to as individuals, objects, cases, instances, patterns, or observations. Data features may be synonymously referred to as variables, attributes, states, or characters. Depending on the way they are assigned to entities, the features can be of elementary structure [e.g., age, sex, or income of individuals] or complex structure [e.g., an image or a statement or a cardiogram]. Metadata may also involve relations between entities and other relevant information.
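The split between data entries and metadata can be made concrete with a small data structure. This is a minimal sketch, not a format defined in the text: the class names `Feature` and `DataSet`, the scale labels and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Feature:
    """Metadata for one feature: its name and measurement scale."""
    name: str
    scale: str  # illustrative labels, e.g. "quantitative" or "nominal"


@dataclass
class DataSet:
    """Data entries together with the metadata needed to interpret them."""
    entities: List[str]       # metadata: entity names (rows)
    features: List[Feature]   # metadata: feature names and scales (columns)
    values: List[List[Any]]   # data: one row of measurements per entity

    def __post_init__(self) -> None:
        # The metadata must match the shape of the data entries.
        if len(self.values) != len(self.entities):
            raise ValueError("one row of values is needed per entity")
        for row in self.values:
            if len(row) != len(self.features):
                raise ValueError("each row needs one value per feature")

    def column(self, name: str) -> List[Any]:
        """Return all measurements of the feature with the given name."""
        index = [f.name for f in self.features].index(name)
        return [row[index] for row in self.values]


# Hypothetical example with two elementary features.
people = DataSet(
    entities=["person_1", "person_2"],
    features=[Feature("age", "quantitative"), Feature("sex", "nominal")],
    values=[[34, "F"], [51, "M"]],
)
print(people.column("age"))
```

Keeping the scale next to the feature name lets later processing decide, for instance, whether averaging a column is meaningful.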

The two-fold goal clearly delineates the place of the data analysis core within the set of approaches involving various data analysis tasks. Here is a list of some popular approaches:
