LitArk » Books » Computer

Eric Sammer - Hadoop Operations

Here you can read online Eric Sammer - Hadoop Operations full text of the book (entire story) in english for free. Download pdf and epub, get meaning, cover and reviews about this ebook. year: 2012, publisher: OReilly Media, genre: Computer. Description of the work, (preface) as well as reviews are available. Best literature library LitArk.com created for fans of good reading and offers a wide selection of genres:

Romance novel Science fiction Adventure Detective Science History Home and family Prose Art Politics Computer Non-fiction Religion Business Children Humor

Choose a favorite category and find really read worthwhile books. Enjoy immersion in the world of imagination, feel the emotions of the characters or learn something new for yourself, make an fascinating discovery.

Book:
Hadoop Operations
Author:
Eric Sammer
Publisher:
OReilly Media
Genre:
Books / Computer
Year:
2012
Rating:
4 / 5
Favourites:
Add to favourites
Your mark:
- 80
- 1
- 2
- 3
- 4
- 5

Description
Author's other books
Similar books

Hadoop Operations: summary, description and annotation

We offer to read an annotation, description, summary or preface (depends on what the author of the book "Hadoop Operations" wrote himself). If you haven't found the necessary information about the book — write in the comments, we will try to find it.

If youve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.
Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.

Get a high-level overview of HDFS and MapReduce: why they exist and how they work
Plan a Hadoop deployment, from hardware and OS selection to network requirements
Learn setup and configuration details with a list of critical properties
Manage resources by sharing a cluster across multiple groups
Get a runbook of the most common cluster maintenance tasks
Monitor Hadoop clusters--and learn troubleshooting with the help of real-world war stories
Use basic tools and techniques to handle backup and catastrophic failure

Eric Sammer: author's other books

Who wrote Hadoop Operations? Find out the surname, the name of the author of the book and a list of all author's works by series.

Hadoop Operations — read online for free the complete book (whole text) full work

Below is the text of the book, divided by pages. System saving the place of the last page read, allows you to conveniently read the book "Hadoop Operations" online for free, without having to search again every time where you left off. Put a bookmark, and you can go to the page where you finished reading at any time.

Light

Font size:

↓

↑

Reset

Interval:

↓

↑

Bookmark:

Make

Hadoop Operations

Eric Sammer

Published by OReilly Media

Beijing Cambridge Farnham Kln Sebastopol Tokyo For Aida Preface Conventions - photo 1

Beijing Cambridge Farnham Kln Sebastopol Tokyo

For Aida.

Preface

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This icon signifies a tip, suggestion, or general note.

Caution

This icon indicates a warning or caution.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless youre reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from OReilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your products documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: Hadoop Operations by Eric Sammer (OReilly). Copyright 2012 Eric Sammer, 978-1-449-32705-7.

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

Safari Books Online

Note

Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the worlds leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of product mixes and pricing programs for organizations, government agencies, and individuals. Subscribers have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like OReilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and dozens more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

OReilly Media, Inc.

1005 Gravenstein Highway North

Sebastopol, CA 95472

800-998-9938 (in the United States or Canada)

707-829-0515 (international or local)

707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://oreil.ly/hadoop_operations.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

I want to thank Aida Escriva-Sammer, my wife, best friend, and favorite sysadmin, for putting up with me while I wrote this.

None of this was possible without the support and hard work of the larger Apache Hadoop community and ecosystem projects. I want to encourage all readers to get involved in the community and open source in general.

Matt Massie gave me the opportunity to do this, along with OReilly, and then cheered me on the whole way. Both Matt and Tom White coached me through the proposal process. Mike Olson, Omer Trajman, Amr Awadallah, Peter Cooper-Ellis, Angus Klein, and the rest of the Cloudera management team made sure I had the time, resources, and encouragement to get this done. Aparna Ramani, Rob Weltman, Jolly Chen, and Helen Friedland were instrumental throughout this process and forgiving of my constant interruptions of their teams. Special thanks to Christophe Bisciglia for giving me an opportunity at Cloudera and for the advice along the way.

Many people provided valuable feedback and input throughout the entire process, but especially Aida Escriva-Sammer, Tom White, Alejandro Abdelnur, Amina Abdulla, Patrick Angeles, Paul Battaglia, Will Chase, Yanpei Chen, Eli Collins, Joe Crobak, Doug Cutting, Joey Echeverria, Sameer Farooqui, Andrew Ferguson, Brad Hedlund, Linden Hillenbrand, Patrick Hunt, Matt Jacobs, Amandeep Khurana, Aaron Kimball, Hal Lee, Justin Lintz, Todd Lipcon, Cameron Martin, Chad Metcalf, Meg McRoberts, Aaron T. Myers, Kay Ousterhout, Greg Rahn, Henry Robinson, Mark Roddy, Jonathan Seidman, Ed Sexton, Loren Siebert, Sunil Sitaula, Ben Spivey, Dan Spiewak, Omer Trajman, Kathleen Ting, Erik-Jan van Baaren, Vinithra Varadharajan, Patrick Wendell, Tom Wheeler, Ian Wrigley, Nezih Yigitbasi, and Philip Zeyliger. To those whom I may have omitted from this list, please forgive me.

The folks at OReilly have been amazing, especially Courtney Nash, Mike Loukides, Maria Stallone, Arlette Labat, and Meghan Blanchette.

Jaime Caban, Victor Nee, Travis Melo, Andrew Bayer, Liz Pennell, and Michael Demetria provided additional administrative, technical, and contract support.

Finally, a special thank you to Kathy Sammer for her unwavering support, and for teaching me to do exactly what others say you cannot.

Portions of this book have been reproduced or derived from software and documentation available under the Apache Software License, version 2.

Chapter 1. Introduction

Over the past few years, there has been a fundamental shift in data storage, management, and processing. Companies are storing more data from more sources in more formats than ever before. This isnt just about being a data packrat but rather building products, features, and intelligence predicated on knowing more about the world (where the world can be users, searches, machine logs, or whatever is relevant to an organization). Organizations are finding new ways to use data that was previously believed to be of little value, or far too expensive to retain, to better serve their constituents. Sourcing and storing data is one half of the equation. Processing that data to produce information is fundamental to the daily operations of every modern business.

Data storage and processing isnt a new problem, though. Fraud detection in commerce and finance, anomaly detection in operational systems, demographic analysis in advertising, and many other applications have had to deal with these issues for decades. What has happened is that the volume, velocity, and variety of this data has changed, and in some cases, rather dramatically. This makes sense, as many algorithms benefit from access to more data. Take, for instance, the problem of recommending products to a visitor of an ecommerce website. You could simply show each visitor a rotating list of products they could buy, hoping that one would appeal to them. Its not exactly an informed decision, but its a start. The question is what do you need to improve the chance of showing the right person the right product? Maybe it makes sense to show them what you think they like, based on what theyve previously looked at. For some products, its useful to know what they already own. Customers who already bought a specific brand of laptop computer from you may be interested in compatible accessories and upgrades.[] One of the most common techniques is to cluster users by similar behavior (such as purchase patterns) and recommend products purchased by similar users. No matter the solution, all of the algorithms behind these options require data and generally improve in quality with more of it. Knowing more about a problem space generally leads to better decisions (or algorithm efficacy), which in turn leads to happier users, more money, reduced fraud, healthier people, safer conditions, or whatever the desired result might be.

Light

Font size:

↓

↑

Reset

Interval:

↓

↑

Bookmark:

Make

Similar books «Hadoop Operations»

Look at similar books to Hadoop Operations. We have selected literature similar in name and meaning in the hope of providing readers with more options to find new, interesting, not yet read works.

Roos

Hadoop for dummies: [Understand the value of big data and how Hadoop can help manage it ; navigate the Hadoop 2 ecosystem and create clusters ; use applications for data mining, problem-solving,

White

Hadoop

Achari

Hadoop essentials: delve into the key concepts of Hadoop and get a thorough understanding of the Hadoop ecosystem

Sam R. Alapati

Expert Hadoop Administration: Managing, Tuning, and Securing Spark, YARN, and HDFS

Ben Spivey

Hadoop Security: Protecting Your Big Data Platform

Shiva Achari

Hadoop Essentials

Tom White

Hadoop: The Definitive Guide

Vignesh Prajapati

Big Data Analytics with R and Hadoop

Danil Zburivsky

Hadoop Cluster Deployment

Dirk deRoos

Hadoop For Dummies

Shumin Guo

Hadoop Operations and Cluster Management Cookbook

Garry Turkington

Hadoop beginners guide

Reviews about «Hadoop Operations»

Discussion, reviews of the book Hadoop Operations and just readers' own opinions. Leave your comments, write what you think about the work, its meaning or the main characters. Specify what exactly you liked and what you didn't like, and why you think so.