
Donald Miner - MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems

Year: 2012. Publisher: O'Reilly Media. Genre: Computer.

Donald Miner MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems
  • Book:
    MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems
  • Author:
    Donald Miner, Adam Shook
  • Publisher:
    O'Reilly Media
  • Genre:
    Computer
  • Year:
    2012

MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems: summary, description and annotation


Until now, design patterns for the MapReduce framework have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you're using. Each pattern is explained in context, with pitfalls and caveats clearly identified to help you avoid common design mistakes when modeling your big data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. All code examples are written for Hadoop.

MapReduce Design Patterns
Donald Miner
Adam Shook
Published by O'Reilly Media


Beijing Cambridge Farnham Köln Sebastopol Tokyo

Dedication

For William

Preface

Welcome to MapReduce Design Patterns! This book will be unique in some ways and familiar in others. First and foremost, this book is about design patterns, which are templates or general guides to solving problems. We looked to earlier design patterns books for inspiration, particularly Design Patterns: Elements of Reusable Object-Oriented Software, by Gamma et al. (1995), which is commonly referred to as "The Gang of Four" book. For each pattern, you'll see a template that we reuse over and over and that we loosely based on their book. Repeatedly seeing a similar template will help you get to the specific information you need. This will be especially useful in the future when using this book as a reference.

This book is a bit more open-ended than a book in the "cookbook" series of texts, as we don't call out specific problems. However, like the cookbooks, the lessons in this book are short and categorized. You'll have to go a bit further than just copying and pasting our code to solve your problems, but we hope that you will find a pattern to get you at least 90% of the way for just about all of your challenges.

This book is mostly about the analytics side of Hadoop or MapReduce. We intentionally try not to dive into too much detail on how Hadoop or MapReduce works or talk too long about the APIs that we are using. These topics have been written about quite a few times, both online and in print, so we decided to focus on analytics.

In this preface, we'll talk about how to read this book, since its format might be a bit different from most books you've read.

Intended Audience

Our motivation for writing this book was to fill a gap we saw among many new MapReduce developers. They had learned how to use the system and had become comfortable writing MapReduce jobs, but they lacked the experience to know how to do things right or well. The intent of this book is to spare you from making some of those mistakes yourself by showing you how experts have solved problems with MapReduce. So, in some ways, this book can be viewed as an intermediate or advanced MapReduce developer resource, but we think both beginners and gurus will find it useful.

This book is also intended for anyone wanting to learn more about the MapReduce paradigm. The book goes deeply into the technical side of MapReduce with code examples and detailed explanations of the inner workings of a MapReduce system, which will help software engineers develop MapReduce analytics. However, quite a bit of time is spent discussing the motivation of some patterns and the common use cases for these patterns, which could be interesting to someone who just wants to know what a system like Hadoop can do.

To get the most out of this book, we suggest you have some knowledge of Hadoop, as all of the code examples are written for Hadoop and many of the patterns are discussed in a Hadoop context. A brief refresher will be given in the first chapter, along with some suggestions for additional reading material.

Pattern Format

The patterns in this book follow a single template format so they are easier to read in succession. Some patterns will omit some of the sections if they don't make sense in the context of that pattern.

Intent

This section is a quick description of the problem the pattern is intended to solve.

Motivation

This section explains why you would want to solve this problem or where it would appear. Some use cases are typically discussed in brief.

Applicability

This section contains a set of criteria that must be true to be able to apply this pattern to a problem. Sometimes these are limitations in the design of the pattern and sometimes they help you make sure this pattern will work in your situation.

Structure

This section explains the layout of the MapReduce job itself. It explains what the map phase does, what the reduce phase does, and whether the job uses any custom partitioners, combiners, or input formats. This is the meat of the pattern and explains how to solve the problem.
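To make the vocabulary of this section concrete, here is a minimal in-memory sketch, in plain Java with no Hadoop dependency, of how a map function, a combiner, a partitioner, and a reduce function fit together in a word count. Every name in it (`MiniMapReduce`, `map`, `partition`, `reduce`, `run`) is hypothetical and chosen for illustration; none of them is a Hadoop API, and a real job would distribute these steps across a cluster rather than run them in one loop.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration; these are not Hadoop classes.
class MiniMapReduce {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token.toLowerCase(), 1));
            }
        }
        return out;
    }

    // Partitioner: decide which reducer receives a given key.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum the values for one key. Because addition is
    // associative and commutative, the same function can serve as a combiner.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: treats each input line as the output of one mapper.
    static Map<String, Integer> run(List<String> lines, int numReducers) {
        // One grouped buffer per reducer stands in for the shuffle.
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }

        for (String line : lines) {
            // Group this mapper's output locally so the combiner can run.
            Map<String, List<Integer>> local = new TreeMap<>();
            for (Map.Entry<String, Integer> kv : map(line)) {
                local.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            // Combiner: pre-aggregate, then route each key to its reducer.
            for (Map.Entry<String, List<Integer>> e : local.entrySet()) {
                int combined = reduce(e.getKey(), e.getValue());
                int target = partition(e.getKey(), numReducers);
                partitions.get(target)
                        .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(combined);
            }
        }

        // Each reducer processes the keys in its own partition.
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, List<Integer>> part : partitions) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
        }
        return result;
    }
}
```

Calling `MiniMapReduce.run(Arrays.asList("a b a", "b c"), 2)` counts `a` twice, `b` twice, and `c` once. The combiner step changes only how much data crosses the simulated shuffle, not the final result, which is the property that makes a combiner safe to use.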

Consequences

This section is short and simply describes the output the pattern produces, which is the end goal of the job.

Resemblances

For readers who have some experience with SQL or Pig, this section shows analogies of how the problem would be solved in those languages. You may even find yourself reading this section first, as it gets straight to the point of what the pattern does.

Sometimes, SQL, Pig, or both are omitted if what we are doing with MapReduce is truly unique.

Known Uses

This section outlines some common use cases for this pattern.

Performance Analysis

This section explains the performance profile of the analytic produced by the pattern. Understanding this is important because every MapReduce analytic needs to be tweaked and configured properly to maximize performance. Without the knowledge of what resources it is using on your cluster, it would be difficult to do this.

The Examples in This Book

All of the examples in this book are written for Hadoop version 1.0.3. MapReduce is a paradigm that is seen in a number of open source and commercial systems these days, but we had to pick one to make our examples consistent and easy to follow, so we picked Hadoop. Hadoop was a logical choice since it is a widely used system, but we hope that users of MongoDB's MapReduce and other MapReduce implementations will be able to extrapolate the examples in this text to their particular system of choice.

Caution

In general, we try to use the newer mapreduce API for all of our examples, not the deprecated mapred API. Be careful when mixing code from this book with other sources, as plenty of people still use mapred and the two APIs are not compatible.

Our examples generally omit any sort of error handling, mostly to make the code more terse. In real-world big data systems, you can expect your data to be malformed and you'll want to be proactive in handling those situations in your analytics.
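As a rough illustration of what handling malformed data proactively can look like, the sketch below sums one integer per input line and skips, but counts, any line that fails to parse. The class `TolerantSum` and its `malformed` field are hypothetical names made up for this sketch. The plain field stands in for the kind of counter a real job would report, and the one-integer-per-line input is an assumption chosen only to keep the example small.

```java
import java.util.List;

// Hypothetical example; not taken from the book's code.
class TolerantSum {

    // Number of records skipped because they could not be parsed.
    int malformed = 0;

    // Sums the integer on each line, skipping lines that are null or unparsable
    // instead of letting one bad record fail the whole run.
    long sum(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            if (line == null) {
                malformed++;
                continue;
            }
            try {
                total += Integer.parseInt(line.trim());
            } catch (NumberFormatException e) {
                // Record the problem and move on to the next record.
                malformed++;
            }
        }
        return total;
    }
}
```

Counting the skipped records, rather than silently dropping them, makes it possible to notice when an unexpectedly large share of the input is bad.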

We use the same data set throughout this text: a dump of StackOverflow's databases. StackOverflow is a popular website on which software developers can ask and answer questions about any coding topic (including Hadoop). This data set was chosen because it is reasonable in size, yet not so big that you can't use it on a single node. This data set also contains human-generated natural language text as well as structured elements like usernames and dates.

Throughout the examples in this book, we try to break out parsing logic of this data set into helper functions to clearly distinguish what code is specific to this data set and which code is general and part of the pattern. Since the XML is pretty simple, we usually avoid using a full-blown XML parser and just parse it with some string operations in our Java code.
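A helper of this kind might look like the following sketch, which pulls `name="value"` attribute pairs out of a single-line XML element using only string operations. The class and method names (`XmlRowParser`, `parseAttributes`) are hypothetical, as is the sample record in the usage note; this is not the book's actual helper. The sketch assumes double-quoted attributes separated by whitespace and does not unescape entities such as `&quot;`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; a sketch under the assumptions stated above.
class XmlRowParser {

    // Returns the attributes of a one-line XML element as a name -> value map,
    // in the order they appear. Returns an empty map if none are found.
    static Map<String, String> parseAttributes(String xml) {
        Map<String, String> attrs = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            // Every attribute value is introduced by the two characters ="
            int eq = xml.indexOf("=\"", pos);
            if (eq < 0) {
                break;
            }
            // Walk backward from '=' to the start of the attribute name.
            int nameStart = eq;
            while (nameStart > pos && !Character.isWhitespace(xml.charAt(nameStart - 1))) {
                nameStart--;
            }
            int valueStart = eq + 2;
            int valueEnd = xml.indexOf('"', valueStart);
            if (valueEnd < 0) {
                break; // unterminated value: stop rather than guess
            }
            attrs.put(xml.substring(nameStart, eq), xml.substring(valueStart, valueEnd));
            pos = valueEnd + 1;
        }
        return attrs;
    }
}
```

For an assumed record such as `<row Id="7" PostId="42" Text="a=b, ok" />`, the returned map holds `Id`, `PostId`, and `Text`. Keeping this logic in one helper means a mapper only ever sees a `Map<String, String>`, so the pattern code stays independent of the data set's format.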

The data set contains five tables, of which we only use three: comments, posts, and users. All of the data is in well-formed XML, with one record per line.

We use the following three StackOverflow tables in this book:

comments