LitArk » Books » Computer

Byron Ellis - Real-Time Analytics Techniques to Analyze and Visualize Streaming Data

Here you can read online Byron Ellis - Real-Time Analytics Techniques to Analyze and Visualize Streaming Data full text of the book (entire story) in english for free. Download pdf and epub, get meaning, cover and reviews about this ebook. year: 2014, publisher: Wiley, genre: Computer. Description of the work, (preface) as well as reviews are available. Best literature library LitArk.com created for fans of good reading and offers a wide selection of genres:

Romance novel Science fiction Adventure Detective Science History Home and family Prose Art Politics Computer Non-fiction Religion Business Children Humor

Choose a favorite category and find really read worthwhile books. Enjoy immersion in the world of imagination, feel the emotions of the characters or learn something new for yourself, make an fascinating discovery.

Book:
Real-Time Analytics Techniques to Analyze and Visualize Streaming Data
Author:
Byron Ellis
Publisher:
Wiley
Genre:
Books / Computer
Year:
2014
Rating:
3 / 5
Favourites:
Add to favourites
Your mark:
- 60
- 1
- 2
- 3
- 4
- 5

Description
Author's other books
Similar books

Real-Time Analytics Techniques to Analyze and Visualize Streaming Data: summary, description and annotation

We offer to read an annotation, description, summary or preface (depends on what the author of the book "Real-Time Analytics Techniques to Analyze and Visualize Streaming Data" wrote himself). If you haven't found the necessary information about the book — write in the comments, we will try to find it.

Construct a robust endtoend solution for analyzing and visualizing streaming data Realtime analytics is the hottest topic in data analytics today. In RealTime Analytics: Techniques to Analyze and Visualize Streaming Data , expert Byron Ellis teaches data analysts technologies to build an effective realtime analytics platform. This platform can then be used to make sense of the constantly changing data that is beginning to outpace traditional batchbased analysis platforms. The author is among a very few leading experts in the field. He has a prestigious background in research, development, analytics, realtime visualization, and Big Data streaming and is uniquely qualified to help you explore this revolutionary field. Moving from a description of the overall analytic architecture of realtime analytics to using specific tools to obtain targeted results, RealTime Analytics leverages open source and modern commercial tools to construct robust, efficient systems that can provide realtime analysis in a costeffective manner. The book includes: A deep discussion of streaming data systems and architectures Instructions for analyzing, storing, and delivering streaming data Tips on aggregating data and working with sets Information on data warehousing options and techniques RealTime Analytics includes indepth case studies for website analytics, Big Data, visualizing streaming and mobile data, and mining and visualizing operational data flows. The books recipe layout lets readers quickly learn and implement different techniques. All of the code examples presented in the book, along with their related data sets, are available on the companion website.

Byron Ellis: author's other books

Who wrote Real-Time Analytics Techniques to Analyze and Visualize Streaming Data? Find out the surname, the name of the author of the book and a list of all author's works by series.

Real-Time Analytics Techniques to Analyze and Visualize Streaming Data — read online for free the complete book (whole text) full work

Below is the text of the book, divided by pages. System saving the place of the last page read, allows you to conveniently read the book "Real-Time Analytics Techniques to Analyze and Visualize Streaming Data" online for free, without having to search again every time where you left off. Put a bookmark, and you can go to the page where you finished reading at any time.

Light

Font size:

↓

↑

Reset

Interval:

↓

↑

Bookmark:

Make

Chapter 1 Introduction to Streaming Data It seems like the world moves at a - photo 1

Chapter 1
Introduction to Streaming Data

It seems like the world moves at a faster pace every day. People and places become more connected, and people and organizations try to react at an ever-increasing pace. Reaching the limits of a human's ability to respond, tools are built to process the vast amounts of data available to decision makers, analyze it, present it, and, in some cases, respond to events as they happen.

The collection and processing of this data has a number of application areas, some of which are discussed in the next section. These applications, which are discussed later in this chapter, require an infrastructure and method of analysis specific to streaming data. Fortunately, like batch processing before it, the state of the art of streaming infrastructure is focused on using commodity hardware and software to build its systems rather than the specialized systems required for real-time analysis prior to the Internet era. This, combined with flexible cloud-based environment, puts the implementation of a real-time system within the reach of nearly any organization. These commodity systems allow organizations to analyze their data in real time and scale that infrastructure to meet future needs as the organization grows and changes over time.

The goal of this book is to allow a fairly broad range of potential users and implementers in an organization to gain comfort with the complete stack of applications. When real-time projects reach a certain point, they should be agile and adaptable systems that can be easily modified, which requires that the users have a fair understanding of the stack as a whole in addition to their own areas of focus. Real time applies as much to the development of new analyses as it does to the data itself. Any number of well-meaning projects have failed because they took so long to implement that the people who requested the project have either moved on to other things or simply forgotten why they wanted the data in the first place. By making the projects agile and incremental, this can be avoided as much as possible.

This chapter is divided into sections that cover three topics. The first section, Sources of Streaming Data, is some of the common sources and applications of streaming data. They are arranged more or less chronologically and provide some background on the origin of streaming data infrastructures. Although this is historically interesting, many of the tools and frameworks presented were developed to solve problems in these spaces, and their design reflects some of the challenges unique to the space in which they were born. Kafka, a data motion tool covered in Chapter 4, Flow Management for Streaming Analysis, for example, was developed as a web applications tool, whereas Storm, a processing framework covered in Chapter 5, Processing Streaming Data, was developed primarily at Twitter for handling social media data.

The second section, Why Streaming Data is Different, covers three of the important aspects of streaming data: continuous data delivery, loosely structured data, and high-cardinality datasets. The first, of course, defines a system to be a real-time streaming data environment in the first place. The other two, though not entirely unique, present a unique challenge to the designer of a streaming data application. All three combine to form the essential streaming data environment.

The third section, Infrastructures and Algorithms, briefly touches on the significance of how infrastructures and algorithms are used with streaming data.

Sources of Streaming Data

There are a variety of sources of streaming data. This section introduces some of the major categories of data. Although there are always more and more data sources being made available, as well as many proprietary data sources, the categories discussed in this section are some of the application areas that have made streaming data interesting. The ordering of the application areas is primarily chronological, and much of the software discussed in this book derives from solving problems in each of these specific application areas.

The data motion systems presented in this book got their start handling data for website analytics and online advertising at places like LinkedIn, Yahoo!, and Facebook. The processing systems were designed to meet the challenges of processing social media data from Twitter and social networks like LinkedIn.

Google, whose business is largely related to online advertising, makes heavy use of the advanced algorithmic approaches similar to those presented in Chapter 11. Google seems to be especially interested in a technique called deep learning, which makes use of very large-scale neural networks to learn complicated patterns.

These systems are even enabling entirely new areas of data collection and analysis by making the Internet of Things and other highly distributed data collection efforts economically feasible. It is hoped that outlining some of the previous application areas provides some inspiration for as-of-yet-unforeseen applications of these technologies.

Operational Monitoring

Operational monitoring of physical systems was the original application of streaming data. Originally, this would have been implemented using specialized hardware and software (or even analog and mechanical systems in the pre-computer era). The most common use case today of operational monitoring is tracking the performance of the physical systems that power the Internet.

These datacenters house thousandspossibly even tens of thousandsof discrete computer systems. All of these systems continuously record data about their physical state from the temperature of the processor, to the speed of the fan and the voltage draw of their power supplies. They also record information about the state of their disk drives and fundamental metrics of their operation, such as processor load, network activity, and storage access times.

To make the monitoring of all of these systems possible and to identify problems, this data is collected and aggregated in real time through a variety of mechanisms. The first systems tended to be specialized ad hoc mechanisms, but when these sorts of techniques started applying to other areas, they started using the same collection systems as other data collection mechanisms.

Web Analytics

The introduction of the commercial web, through e-commerce and online advertising, led to the need to track activity on a website. Like the circulation numbers of a newspaper, the number of unique visitors who see a website in a day is important information. For e-commerce sites, the data is less about the number of visitors as it is the various products they browse and the correlations between them.

To analyze this data, a number of specialized log-processing tools were introduced and marketed. With the rise of Big Data and tools like Hadoop, much of the web analytics infrastructure shifted to these large batch-based systems. They were used to implement recommendation systems and other analysis. It also became clear that it was possible to conduct experiments on the structure of websites to see how they affected various metrics of interest. This is called A/B testing becausein the same way an optometrist tries to determine the best prescriptiontwo choices are pitted against each other to determine which is best. These tests were mostly conducted sequentially, but this has a number of problems, not the least of which is the amount of time needed to conduct the study.

As more and more organizations mined their website data, the need to reduce the time in the feedback loop and to collect data on a more continual basis became more important. Using the tools of the system-monitoring community, it became possible to also collect this data in real time and perform things like A/B tests in parallel rather than in sequence. As the number of dimensions being measured and the need for appropriate auditing (due to the metrics being used for billing) increased, the analytics community developed much of the streaming infrastructure found in this book to safely move data from their web servers spread around the world to processing and billing systems.

Light

Font size:

↓

↑

Reset

Interval:

↓

↑

Bookmark:

Make

Similar books «Real-Time Analytics Techniques to Analyze and Visualize Streaming Data»

Look at similar books to Real-Time Analytics Techniques to Analyze and Visualize Streaming Data. We have selected literature similar in name and meaning in the hope of providing readers with more options to find new, interesting, not yet read works.

Pethuru Raj

Streaming Analytics: Concepts, architectures, platforms, use cases and applications

Jim Lindell

Analytics and Big Data for Accountants (AICPA)

Himansu Das

Real-Time Data Analytics for Large Scale Sensor Data

Czygan Martin

Python: Data Analytics and Visualization

IBM Paul Zikopoulos

Understanding big data: analytics for enterprise class Hadoop and streaming data

Bengfort Benjamin

Data analytics with Hadoop: an introduction for data scientists

Anindita Basak

Stream Analytics with Microsoft Azure: Real-time data processing for quick insights using Azure Stream Analytics

Nataraj Dasgupta

Practical Big Data Analytics

Gollapudi

Getting started with Greenplum for big data analytics

Dr. Arvind Sathi

Big Data Analytics: Disruptive Technologies for Changing the Game

Vignesh Prajapati

Big Data Analytics with R and Hadoop

Frank J. Ohlhorst

Big data analytics: turning big data into big money

Reviews about «Real-Time Analytics Techniques to Analyze and Visualize Streaming Data»

Discussion, reviews of the book Real-Time Analytics Techniques to Analyze and Visualize Streaming Data and just readers' own opinions. Leave your comments, write what you think about the work, its meaning or the main characters. Specify what exactly you liked and what you didn't like, and why you think so.