Chen Min - Traffic Measurement for Big Network Data

City: Cham, year: 2017;2018, publisher: Springer International Publishing.
Springer International Publishing AG 2017
Shigang Chen, Min Chen and Qingjun Xiao, Traffic Measurement for Big Network Data, Wireless Networks, DOI 10.1007/978-3-319-47340-6_1
1. Introduction
Shigang Chen (1), Min Chen (1) and Qingjun Xiao (2)
(1) Department of Computer & Information Science, University of Florida, Gainesville, FL, USA
(2) School of Computer Science and Engineering, Southeast University of China, Nanjing, Jiangsu, China
Keywords
Big Network Data; Per-Flow Size Measurement; Cardinality Estimation; Persistent Spread Estimation
1.1 Big Network Data
There is hardly any other data set whose size can rival the big network data that flows on the Internet. The annual global IP traffic is expected to pass one zettabyte by 2016 [], and major online retailers such as Alibaba process over a billion sales annually. As these data accumulate day by day and year by year, mining them for knowledge becomes a daunting task that requires tremendous resources. This book aims to develop new compact and fast online measurement methods that reduce big network data to measurement summaries orders of magnitude smaller than those produced by traditional methods. The new methods hold the promise of allowing routers to measure high-volume network traffic in real time using the small cache memory on network processors, allowing enterprise systems to store their traffic records (in the form of summaries) over a far longer time frame, and allowing users with ordinary computing resources to analyze big network data.
1.2 Online Challenge
Modern routers forward packets from incoming ports to outgoing ports via a switching fabric. To process packets in real time, online modules for traffic measurement, packet scheduling, access control, and quality of service are implemented on network processors, bypassing main memory and the CPU almost entirely []. The cache memory commonly used on network processor chips is SRAM, typically a few megabytes. Increasing on-chip memory to more than 10 MB is technically feasible, but it comes at a much higher price and with a longer access time. There is therefore a strong incentive to keep on-chip memory small, because smaller memory can be made faster and cheaper. Off-chip SRAM or embedded DRAM (built on 3-D stacking interconnect or packaged on the same module) can be made larger. However, it is slower to access, and the bandwidth between a network processor and its off-chip memory can become a performance bottleneck. Hence, on-chip memory remains the first choice for online network functions that are designed to match the line speed.
To make matters more challenging, the limited on-chip memory may have to be shared among routing, performance, measurement, and security functions implemented on the same chip, so each function can use only a fraction of the available space. Depending on their relative importance, some functions may be allocated tiny portions of the on-chip memory, even though the amount of data they have to process and store can be extremely large in high-speed networks. This great disparity between memory demand and supply requires us to implement online functions, including real-time traffic measurement, as compactly as possible. For example, if a traffic measurement function is allocated 1 Mb of on-chip memory but there are 1M concurrent flows, only 1 bit is available per flow: can we still perform per-flow traffic measurement? What about 10M concurrent flows with the same memory allocation, i.e., 0.1 bit per flow? This is what we want to achieve through this book.
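To make the memory gap concrete, the following Python sketch compares the per-flow budget with what a naive exact table would need. The entry layout is an assumption for illustration only (a 104-bit IPv4 5-tuple flow identifier plus a 32-bit counter, with hash-table overhead ignored), and 1 Mb is read as 10^6 bits.

```python
# Assumed sizes for a naive exact table: an IPv4 5-tuple flow ID plus a 32-bit counter.
FLOW_ID_BITS = 32 + 32 + 16 + 16 + 8  # src IP, dst IP, src port, dst port, protocol = 104
COUNTER_BITS = 32
MEMORY_BITS = 1_000_000  # the 1 Mb allocation from the text, read as 10^6 bits


def bits_per_flow(memory_bits: int, flows: int) -> float:
    """Memory available to each flow if the allocation is split evenly."""
    return memory_bits / flows


def naive_table_bits(flows: int) -> int:
    """Bits for one (flow ID, counter) entry per flow, ignoring hash-table overhead."""
    return flows * (FLOW_ID_BITS + COUNTER_BITS)


for flows in (1_000_000, 10_000_000):
    print(f"{flows:,} flows: {bits_per_flow(MEMORY_BITS, flows):.1f} bit/flow available, "
          f"naive table needs {naive_table_bits(flows) / MEMORY_BITS:.0f}x the allocation")
```

Under these assumptions an exact table needs 136 bits per flow, i.e., 136 times the 1 Mb allocation for 1M flows and 1,360 times for 10M flows, which illustrates why storing explicit per-flow entries is out of reach at this budget.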
1.3 Offline Challenge
The space problem also exists offline, where disks are used to store network traffic data over time for long-term analysis. Because such data are produced constantly, there is a limit on how long they can be stored. With a given amount of disk space, the more we can reduce the traffic data, the longer we can keep it before it has to be removed.
The space issue also arises when we analyze big data. Suppose an analyst who has access to web search records wants to profile the number of searches for each keyword, phrase, question, or sentence. This information is useful for studies of online social, economic, and opinion trends []. Similarly, various data analysis systems at Google, such as Sawzall, Dremel, and PowerDrill, estimate the cardinalities of very large data sets on a daily basis, which strains computational resources, and memory in particular: for the PowerDrill system, a non-negligible fraction of queries historically could not be computed because they exceeded the available memory.
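A common way to trade exactness for memory in cardinality estimation is linear counting: hash each element to one bit of an m-bit bitmap and infer the number of distinct elements from the fraction of bits that remain zero. The Python sketch below is a generic textbook version chosen for illustration; it is not the method used by the Google systems named above or the estimators developed in this book, and the bitmap size, the SHA-1 hash, and the keyword workload are all assumptions.

```python
import hashlib
import math


class LinearCounter:
    """Estimate the number of distinct items using an m-bit bitmap (linear counting)."""

    def __init__(self, m: int):
        self.m = m
        self.bits = bytearray((m + 7) // 8)  # packed bitmap: m bits in total

    def _index(self, item: str) -> int:
        # Deterministic hash of the item, reduced to a bit position in [0, m).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # A duplicate item hashes to the same bit, so repeats do not inflate the estimate.
        i = self._index(item)
        self.bits[i >> 3] |= 1 << (i & 7)

    def estimate(self) -> float:
        ones = sum(bin(b).count("1") for b in self.bits)
        zeros = self.m - ones
        if zeros == 0:
            return float("inf")  # bitmap saturated: m is too small for this data set
        # The expected fraction of zero bits is about exp(-n/m); solve for n.
        return -self.m * math.log(zeros / self.m)


# Hypothetical workload: 10,000 search records over 3,000 distinct keywords.
counter = LinearCounter(m=8192)  # 8,192 bits = 1 KiB of state
for i in range(10_000):
    counter.add(f"keyword-{i % 3000}")
print(round(counter.estimate()))
```

Because repeated keywords set the same bit, the 10,000 records cost no more state than their 3,000 distinct values; with 8,192 bits the estimate should land within a few percent of 3,000, and the estimator breaks down (returns infinity) once every bit is set.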
As another example, consider an analyst with access to billions of sale records from an online retailer who wants to analyze purchase associations. Each association is defined as the purchase of one product followed by the purchase of another product by the same client. Profiling the frequency of each association helps the retailer follow up with product recommendations to its clients after they make purchases. However, such analysis requires pairing up the sale records, and this multiplicative pairing may produce an extraordinary number of purchase associations, far larger than the number of sale records. Although the analyst may resort to a datacenter for the needed resources, it would certainly be welcome if the same job could be done on a regular laptop, even when the number of available memory bits on the laptop is far smaller than the number of purchase associations. (The same is true for the previous example of profiling search records.) This is what we want to achieve through this book.
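The multiplicative effect of pairing can be seen in a small exact computation. The Python sketch below is a hypothetical illustration: the (client, timestamp, product) record layout is assumed, and it counts every ordered pair of different products bought by the same client, earlier purchase first, using memory proportional to the number of distinct associations.

```python
from collections import Counter, defaultdict
from itertools import combinations


def purchase_associations(sales):
    """Count (earlier product, later product) pairs bought by the same client.

    `sales` is an iterable of (client_id, timestamp, product) tuples; this
    record layout is a hypothetical stand-in for real retailer data.
    """
    by_client = defaultdict(list)
    for client, ts, product in sales:
        by_client[client].append((ts, product))

    assoc = Counter()
    for purchases in by_client.values():
        purchases.sort()  # chronological order per client
        # combinations() preserves input order, so `first` was bought before `second`.
        for (_, first), (_, second) in combinations(purchases, 2):
            if first != second:
                assoc[(first, second)] += 1
    return assoc


sales = [
    ("alice", 1, "camera"), ("alice", 2, "sd-card"), ("alice", 3, "tripod"),
    ("bob", 1, "camera"), ("bob", 5, "sd-card"),
]
print(purchase_associations(sales))
```

Here five sale records already yield four association instances, and a client with k purchases of distinct products contributes k(k-1)/2 pairs, so the association count grows quadratically in the per-client history rather than linearly in the number of records.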
1.4 Fundamental Primitives
In this book, we model network data as a set of flows, each of which abstracts a subset of the data defined by the measurement requirement. For example, we may treat all packets from the same source address as a flow, i.e., a per-source flow; in this case, the flow identifier is the source address in the packet header. Similarly, we may define per-destination flows, per-source/destination flows, TCP flows, WWW flows, P2P flows, or other application-specific flows. We also need to define the elements in the flows to be measured. Depending on the application needs, the elements may be destination addresses, source addresses, ports, or even keywords that appear in the packets of a flow.
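The flow abstraction amounts to two choices: a function that maps a packet to its flow identifier and a function that maps it to the element being measured. The Python sketch below is a hypothetical illustration of that idea; the `Packet` fields and the helper names (`per_source`, `group_into_flows`, and so on) are assumptions, not part of the book.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Only the header fields needed for this illustration.
    src: str
    dst: str
    sport: int
    dport: int
    proto: str


# Flow-identifier functions: each one defines a different kind of flow.
def per_source(p: Packet):
    return p.src


def per_destination(p: Packet):
    return p.dst


def per_src_dst(p: Packet):
    return (p.src, p.dst)


# Element function: what is measured inside a flow.
def destination_element(p: Packet):
    return p.dst


def group_into_flows(packets, flow_key, element):
    """Map each flow identifier to the list of elements observed in that flow."""
    flows = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(element(p))
    return dict(flows)


packets = [
    Packet("10.0.0.1", "192.0.2.7", 40000, 80, "TCP"),
    Packet("10.0.0.1", "192.0.2.9", 40001, 443, "TCP"),
    Packet("10.0.0.2", "192.0.2.7", 50000, 80, "TCP"),
]
# Per-source flows whose elements are destination addresses.
print(group_into_flows(packets, per_source, destination_element))
```

Swapping the two functions changes the measurement without changing the grouping logic; for instance, `group_into_flows(packets, per_destination, per_source)` yields per-destination flows whose elements are source addresses.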
Big network data consists of millions or even billions of flows. We may measure the flow size (as NetFlow [] does) in number of bytes or packets; here, each byte (or packet) is considered an element to be counted. We may measure the flow cardinality (as firewalls often do), i.e., the number of distinct elements in each flow. This is a harder problem because, to remove duplicate elements in the flow, we need a way to remember which elements we have seen in the past. Or we may measure the persistent spread of a flow: over a certain number of consecutive periods, if an element of a flow appears in each period, we call it a persistent element. The persistent spread of the flow over a given number of periods is defined as the number of distinct persistent elements in the flow. This book presents three important fundamental online functions: per-flow size measurement, persistent spread measurement, and per-flow cardinality measurement.
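The three primitives can be pinned down with exact, memory-hungry reference computations. The Python sketch below is a hypothetical illustration, not the book's method: it assumes each period's traffic is given as a list of (flow identifier, element) records, and it keeps full sets of elements per flow, which is exactly the per-flow state that compact online methods are meant to avoid.

```python
from collections import Counter, defaultdict


def flow_sizes(records):
    """Per-flow size: every (flow, element) record counts, duplicates included."""
    return Counter(flow for flow, _ in records)


def distinct_elements(records):
    """Map each flow to the set of distinct elements seen in it (the costly state)."""
    seen = defaultdict(set)
    for flow, element in records:
        seen[flow].add(element)
    return seen


def flow_cardinalities(records):
    """Per-flow cardinality: number of distinct elements in each flow."""
    return {flow: len(elems) for flow, elems in distinct_elements(records).items()}


def persistent_spreads(periods):
    """Per-flow persistent spread: distinct elements present in every period.

    `periods` is a non-empty list holding one list of (flow, element) records per period.
    """
    per_period = [distinct_elements(records) for records in periods]
    flows = set().union(*(p.keys() for p in per_period))
    return {
        flow: len(set.intersection(*(p.get(flow, set()) for p in per_period)))
        for flow in flows
    }


# Hypothetical per-destination flows whose elements are source addresses.
period1 = [("d1", "a"), ("d1", "b"), ("d1", "a"), ("d2", "x")]
period2 = [("d1", "a"), ("d1", "c"), ("d2", "x")]
period3 = [("d1", "a"), ("d1", "b"), ("d2", "x")]
print(dict(flow_sizes(period1)))                         # packets per flow, period 1
print(flow_cardinalities(period1))                       # distinct sources per flow, period 1
print(persistent_spreads([period1, period2, period3]))   # sources seen in all 3 periods
```

In this toy input, flow d1 has size 3 and cardinality 2 in the first period, but only source a appears in all three periods, so its persistent spread is 1.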
1.5 Scalable Counter Architectures for Per-Flow Size Measurement
Measuring flow size has many important applications. We may measure the number of packets in each TCP flow, the data rate of each voice-over-IP session, the number of bytes that each host downloads, the number of SYN packets from each source address, or the number of ACK packets sent to each address. Such information is very useful for service provisioning, capacity planning, accounting and billing, and anomaly detection []. … and use per-flow information to compile the list of candidate bots that contribute to the change, helping to narrow down the scope for further investigation.