• Complain

Charu C. Aggarwal - Data Mining

Here you can read online Charu C. Aggarwal - Data Mining full text of the book (entire story) in english for free. Download pdf and epub, get meaning, cover and reviews about this ebook. year: 0, publisher: Springer International Publishing, Cham, genre: Computer. Description of the work, (preface) as well as reviews are available. Best literature library LitArk.com created for fans of good reading and offers a wide selection of genres:

Romance novel Science fiction Adventure Detective Science History Home and family Prose Art Politics Computer Non-fiction Religion Business Children Humor

Choose a favorite category and find really read worthwhile books. Enjoy immersion in the world of imagination, feel the emotions of the characters or learn something new for yourself, make an fascinating discovery.

Charu C. Aggarwal Data Mining
  • Book:
    Data Mining
  • Author:
  • Publisher:
    Springer International Publishing, Cham
  • Genre:
  • Year:
    0
  • Rating:
    5 / 5
  • Favourites:
    Add to favourites
  • Your mark:
    • 100
    • 1
    • 2
    • 3
    • 4
    • 5

Data Mining: summary, description and annotation

We offer to read an annotation, description, summary or preface (depends on what the author of the book "Data Mining" wrote himself). If you haven't found the necessary information about the book — write in the comments, we will try to find it.

Charu C. Aggarwal: author's other books


Who wrote Data Mining? Find out the surname, the name of the author of the book and a list of all author's works by series.

Data Mining — read online for free the complete book (whole text) full work

Below is the text of the book, divided by pages. System saving the place of the last page read, allows you to conveniently read the book "Data Mining" online for free, without having to search again every time where you left off. Put a bookmark, and you can go to the page where you finished reading at any time.

Light

Font size:

Reset

Interval:

Bookmark:

Make
Springer International Publishing Switzerland 2015
Charu C. Aggarwal Data Mining 10.1007/978-3-319-14142-8_1
1. An Introduction to Data Mining
Charu C. Aggarwal 1
(1)
IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Charu C. Aggarwal
Email:
1.1
1.2
1.2.1
1.2.2
1.3
1.3.1
1.3.1.1
1.3.1.2
1.3.1.3
1.3.1.4
1.3.2
1.3.2.1
1.3.2.2
1.3.2.3
1.3.2.4
1.3.2.5
1.4
1.4.1
1.4.2
1.4.3
1.4.4
1.4.5
1.4.5.1
1.4.5.2
1.4.5.3
1.4.5.4
1.5
1.6
1.6.1
1.6.2
1.6.3
1.6.4
1.7
1.8
1.9
Education is not the piling on of learning, information, data, facts, skills, or abilities thats training or instruction but is rather making visible what is hidden as a seed. Thomas More
1.1 Introduction
Data mining is the study of collecting, cleaning, processing, analyzing, and gaining useful insights from data. A wide variation exists in terms of the problem domains, applications, formulations, and data representations that are encountered in real applications. Therefore, data mining is a broad umbrella term that is used to describe these different aspects of data processing.
In the modern age, virtually all automated systems generate some form of data either for diagnostic or analysis purposes. This has resulted in a deluge of data, which has been reaching the order of petabytes or exabytes. Some examples of different kinds of data are as follows:
  • World Wide Web: The number of documents on the indexed Web is now on the order of billions, and the invisible Web is much larger. User accesses to such documents create Web access logs at servers and customer behavior profiles at commercial sites. Furthermore, the linked structure of the Web is referred to as the Web graph , which is itself a kind of data. These different types of data are useful in various applications. For example, the Web documents and link structure can be mined to determine associations between different topics on the Web. On the other hand, user access logs can be mined to determine frequent patterns of accesses or unusual patterns of possibly unwarranted behavior.
  • Financial interactions: Most common transactions of everyday life, such as using an automated teller machine (ATM) card or a credit card, can create data in an automated way. Such transactions can be mined for many useful insights such as fraud or other unusual activity.
  • User interactions: Many forms of user interactions create large volumes of data. For example, the use of a telephone typically creates a record at the telecommunication company with details about the duration and destination of the call. Many phone companies routinely analyze such data to determine relevant patterns of behavior that can be used to make decisions about network capacity, promotions, pricing, or customer targeting.
  • Sensor technologies and the Internet of Things: A recent trend is the development of low-cost wearable sensors, smartphones, and other smart devices that can communicate with one another. By one estimate, the number of such devices exceeded the number of people on the planet in 2008 [30]. The implications of such massive data collection are significant for mining algorithms.
The deluge of data is a direct result of advances in technology and the computerization of every aspect of modern life. It is, therefore, natural to examine whether one can extract concise and possibly actionable insights from the available data for application-specific goals. This is where the task of data mining comes in. The raw data may be arbitrary, unstructured, or even in a format that is not immediately suitable for automated processing. For example, manually collected data may be drawn from heterogeneous sources in different formats and yet somehow needs to be processed by an automated computer program to gain insights.
To address this issue, data mining analysts use a pipeline of processing, where the raw data are collected, cleaned, and transformed into a standardized format. The data may be stored in a commercial database system and finally processed for insights with the use of analytical methods. In fact, while data mining often conjures up the notion of analytical algorithms, the reality is that the vast majority of work is related to the data preparation portion of the process. This pipeline of processing is conceptually similar to that of an actual mining process from a mineral ore to the refined end product. The term mining derives its roots from this analogy.
From an analytical perspective, data mining is challenging because of the wide disparity in the problems and data types that are encountered. For example, a commercial product recommendation problem is very different from an intrusion-detection application, even at the level of the input data format or the problem definition. Even within related classes of problems, the differences are quite significant. For example, a product recommendation problem in a multidimensional database is very different from a social recommendation problem due to the differences in the underlying data type. Nevertheless, in spite of these differences, data mining applications are often closely connected to one of four superproblems in data mining: association pattern mining, clustering, classification, and outlier detection. These problems are so important because they are used as building blocks in a majority of the applications in some indirect form or the other. This is a useful abstraction because it helps us conceptualize and structure the field of data mining more effectively.
The data may have different formats or types . The type may be quantitative (e.g., age), categorical (e.g., ethnicity), text, spatial, temporal, or graph-oriented. Although the most common form of data is multidimensional, an increasing proportion belongs to more complex data types. While there is a conceptual portability of algorithms between many data types at a very high level, this is not the case from a practical perspective. The reality is that the precise data type may affect the behavior of a particular algorithm significantly. As a result, one may need to design refined variations of the basic approach for multidimensional data, so that it can be used effectively for a different data type. Therefore, this book will dedicate different chapters to the various data types to provide a better understanding of how the processing methods are affected by the underlying data type.
A major challenge has been created in recent years due to increasing data volumes. The prevalence of continuously collected data has led to an increasing interest in the field of data streams . For example, Internet traffic generates large streams that cannot even be stored effectively unless significant resources are spent on storage. This leads to unique challenges from the perspective of processing and analysis. In cases where it is not possible to explicitly store the data, all the processing needs to be performed in real time.
This chapter will provide a broad overview of the different technologies involved in preprocessing and analyzing different types of data. The goal is to study data mining from the perspective of different problem abstractions and data types that are frequently encountered. Many important applications can be converted into these abstractions.
This chapter is organized as follows. Section gives a summary.
Next page
Light

Font size:

Reset

Interval:

Bookmark:

Make

Similar books «Data Mining»

Look at similar books to Data Mining. We have selected literature similar in name and meaning in the hope of providing readers with more options to find new, interesting, not yet read works.


Reviews about «Data Mining»

Discussion, reviews of the book Data Mining and just readers' own opinions. Leave your comments, write what you think about the work, its meaning or the main characters. Specify what exactly you liked and what you didn't like, and why you think so.