
Alex Holmes - Hadoop in Practice



  • Book:
    Hadoop in Practice
  • Author:
    Alex Holmes
  • Publisher:
    Manning Publications
  • Year:
    2012

Hadoop in Practice: summary, description and annotation


Hadoop in Practice collects 85 Hadoop examples and presents them in a problem/solution format. Each technique addresses a specific task you'll face, like querying big data using Pig or writing a log file loader. You'll explore each problem step by step, learning both how to build and deploy that specific solution along with the thinking that went into its design. As you work through the tasks, you'll find yourself growing more comfortable with Hadoop and at home in the world of big data.

Hadoop in Practice
Alex Holmes


Copyright

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department Manning Publications Co. 20 Baldwin Road PO Box 261 Shelter Island, NY 11964 Email: orders@manning.com

© 2012 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Development editor: Cynthia Kane
Copyeditors: Bob Herbstman, Tara Walsh
Proofreader: Katie Tennant
Typesetter: Gordan Salinovic
Illustrator: Martin Murtonen
Cover designer: Marija Tudor

ISBN 9781617290237

Printed in the United States of America

1 2 3 4 5 6 7 8 9 10 MAL 17 16 15 14 13 12

Dedication

To Michal, Marie, Oliver, Ollie, Mish, and Anch

Preface

I first encountered Hadoop in the fall of 2008 when I was working on an internet crawl and analysis project at Verisign. My team was making discoveries similar to those that Doug Cutting and others at Nutch had made several years earlier regarding how to efficiently store and manage terabytes of crawled and analyzed data. At the time, we were getting by with our home-grown distributed system, but the influx of a new data stream and requirements to join that stream with our crawl data couldn't be supported by our existing system in the required timelines.

After some research we came across the Hadoop project, which seemed to be a perfect fit for our needs: it supported storing large volumes of data and provided a mechanism to combine them. Within a few months we'd built and deployed a MapReduce application encompassing a number of MapReduce jobs, woven together with our own MapReduce workflow management system onto a small cluster of 18 nodes. It was a revelation to observe our MapReduce jobs crunching through our data in minutes. Of course we couldn't anticipate the amount of time that we'd spend debugging and performance-tuning our MapReduce jobs, not to mention the new roles we took on as production administrators; the biggest surprise in this role was the number of disk failures we encountered during those first few months supporting production!

As our experience and comfort level with Hadoop grew, we continued to build more of our functionality using Hadoop to help with our scaling challenges. We also started to evangelize the use of Hadoop within our organization and helped kick-start other projects that were also facing big data challenges.

The greatest challenge we faced when working with Hadoop (and specifically MapReduce) was relearning how to solve problems with it. MapReduce is its own flavor of parallel programming, which is quite different from the in-JVM programming that we were accustomed to. The biggest hurdle was the first one: training our brains to think "MapReduce," a topic which the book Hadoop in Action by Chuck Lam (Manning Publications, 2010) covers well.

After you're used to thinking in MapReduce, the next challenge is typically related to the logistics of working with Hadoop, such as how to move data in and out of HDFS, and effective and efficient ways to work with data in Hadoop. These areas of Hadoop haven't received much coverage, and that's what attracted me to the potential of this book: that of going beyond the fundamental word-count Hadoop usages and covering some of the more tricky and dirty aspects of Hadoop.

As I'm sure many authors have experienced, I went into this project confidently believing that writing this book was just a matter of transferring my experiences onto paper. Boy, did I get a reality check, but not altogether an unpleasant one, because writing introduced me to new approaches and tools that ultimately helped better my own Hadoop abilities. I hope that you get as much out of reading this book as I did writing it.

Acknowledgments

First and foremost, I want to thank Michael Noll, who pushed me to write this book. He also reviewed my early chapter drafts and helped mold the organization of the book. I can't express how much his support and encouragement have helped me throughout the process.

I'm also indebted to Cynthia Kane, my development editor at Manning, who coached me through writing this book and provided invaluable feedback on my work. Among many notable "Aha!" moments I had while working with Cynthia, the biggest one was when she steered me into leveraging visual aids to help explain some of the complex concepts in this book.

I also want to say a big thank you to all the reviewers of this book: Aleksei Sergeevich, Alexander Luya, Asif Jan, Ayon Sinha, Bill Graham, Chris Nauroth, Eli Collins, Ferdy Galema, Harsh Chouraria, Jeff Goldschrafe, Maha Alabduljalil, Mark Kemna, Oleksey Gayduk, Peter Krey, Philipp K. Janert, Sam Ritchie, Soren Macbeth, Ted Dunning, Yunkai Zhang, and Zhenhua Guo.

Jonathan Seidman, the primary technical editor, did a great job reviewing the entire book shortly before it went into production. Many thanks to Josh Wills, the creator of Crunch, who kindly looked over the chapter that covers that topic. And more thanks go to Josh Patterson, who reviewed my Mahout chapter.

All of the Manning staff were a pleasure to work with, and a special shout-out goes to Troy Mott, Katie Tennant, Nick Chase, Tara Walsh, Bob Herbstman, Michael Stephens, Marjan Bace, and Maureen Spencer.

Finally, a special thanks to my wife, Michal, who had to put up with a cranky husband working crazy hours. She was a source of encouragement throughout the entire process.

About this Book

Doug Cutting, Hadoop's creator, likes to call Hadoop the kernel for big data, and I'd tend to agree. With its distributed storage and compute capabilities, Hadoop is fundamentally an enabling technology for working with huge datasets. Hadoop, to me, provides a bridge between structured (RDBMS) and unstructured (log files, XML, text) data, and allows these datasets to be easily joined together. This has evolved from traditional use cases, such as combining OLTP and log files, to more sophisticated uses, such as using Hadoop for data warehousing (exemplified by Facebook) and the field of data science, which studies and makes new discoveries about data.

This book collects a number of intermediary and advanced Hadoop examples and presents them in a problem/solution format. Each of the 85 techniques addresses a specific task you'll face, like using Flume to move log files into Hadoop or using Mahout for predictive analysis. Each problem is explored step by step and, as you work through them, you'll find yourself growing more comfortable with Hadoop and at home in the world of big data.
