
Lars George - HBase: The Definitive Guide

Year: 2015. Publisher: O'Reilly Media.


HBase: The Definitive Guide: summary, description and annotation


If you're looking for a scalable storage solution to accommodate a virtually endless amount of data, this updated edition shows you how Apache HBase can meet your needs. Modeled after Google's BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant.

Fully revised for HBase 1.0, this second edition brings you up to speed on the new HBase client API, as well as security features and new case studies that demonstrate HBase use in the real world. Whether you just started to evaluate this non-relational database, or plan to put it into practice right away, this book has your back.

  • Launch into basic, advanced, and administrative features of HBase's new client-facing API
  • Use new classes to integrate HBase with Hadoop's MapReduce framework
  • Explore HBase's architecture, including the storage format, write-ahead log, and background processes
  • Dive into advanced usage, such as extended client and server options
  • Learn cluster sizing, tuning, and monitoring best practices
  • Design schemas, copy tables, import bulk data, decommission nodes, and other tasks
  • Go deeper into HBase security, including Kerberos and encryption at rest

HBase - The Definitive Guide - 2nd Edition
Lars George

Beijing Cambridge Farnham Köln Sebastopol Tokyo

Foreword: Michael Stack
Michael Stack
HBase Project Janitor

The HBase story begins in 2006, when the San Francisco-based startup Powerset was trying to build a natural language search engine for the Web. Their indexing pipeline was an involved multistep process that produced an index about two orders of magnitude larger, on average, than your standard term-based index. The datastore that they'd built on top of the then nascent Amazon Web Services to hold the index intermediaries and the webcrawl was buckling under the load ("Ring. Ring. Hello! This is AWS. Whatever you are running, please turn it off!"). They were looking for an alternative. The Google Bigtable paper[1] had just been published.

Chad Walters, Powerset's head of engineering at the time, reflects back on the experience as follows:

Building an open source system to run on top of Hadoop's Distributed Filesystem (HDFS) in much the same way that Bigtable ran on top of the Google File System seemed like a good approach because: 1) it was a proven scalable architecture; 2) we could leverage existing work on Hadoop's HDFS; and 3) we could both contribute to and get additional leverage from the growing Hadoop ecosystem.

After the publication of the Google Bigtable paper, there were on-again, off-again discussions around what a Bigtable-like system on top of Hadoop might look like. Then, in early 2007, out of the blue, Mike Cafarella dropped a tarball of thirty-odd Java files into the Hadoop issue tracker: "I've written some code for HBase, a Bigtable-like file store. It's not perfect, but it's ready for other people to play with and examine." Mike had been working with Doug Cutting on Nutch, an open source search engine. He'd done similar drive-by code dumps there to add features such as a Google File System clone so the Nutch indexing process was not bounded by the amount of disk you attach to a single machine. (This Nutch distributed filesystem would later grow up to be HDFS.)

Jim Kellerman of Powerset took Mike's dump and started filling in the gaps, adding tests and getting it into shape so that it could be committed as part of Hadoop. The first commit of the HBase code was made by Doug Cutting on April 3, 2007, under the contrib subdirectory. The first HBase working release was bundled as part of Hadoop 0.15.0 in October 2007.

Not long after, Lars, the author of the book you are now reading, showed up on the #hbase IRC channel. He had a big-data problem of his own, and was game to try HBase. After some back and forth, Lars became one of the first users to run HBase in production outside of the Powerset home base. Through many ups and downs, Lars stuck around. I distinctly remember a directory listing Lars made for me a while back on his production cluster at WorldLingo, where he was employed as CTO, sysadmin, and grunt. The listing showed ten or so HBase releases from Hadoop 0.15.1 (November 2007) on up through HBase 0.20, each of which he'd run on his 40-node cluster at one time or another during production.

Of all those who have contributed to HBase over the years, it is poetic justice that Lars is the one to write this book. Lars was always dogging HBase contributors that the documentation needed to be better if we hoped to gain broader adoption. Everyone agreed, nodded their heads in assent, amen'd, and went back to coding. So Lars started writing critical how-tos and architectural descriptions in between jobs and his intra-European travels as unofficial HBase European ambassador. His Lineland blogs on HBase gave the best description, outside of the source, of how HBase worked, and at a few critical junctures, carried the community across awkward transitions (e.g., an important blog explained the labyrinthian HBase build during the brief period we thought an Ivy-based build to be a good idea). His luscious diagrams were poached by one and all wherever an HBase presentation was given.

HBase has seen some interesting times, including a period of sponsorship by Microsoft, of all things. Powerset was acquired in July 2008, and after a couple of months during which Powerset employees were disallowed from contributing while Microsoft's legal department vetted the HBase codebase to see if it impinged on SQLServer patents, we were allowed to resume contributing (I was a Microsoft employee working near full time on an Apache open source project). The times ahead look promising, too, whether it's the variety of contortions HBase is being put through at Facebook (as the underpinnings for their massive Facebook mail app or fielding millions of hits a second on their analytics clusters) or more deploys along the lines of Yahoo!'s 1k node HBase cluster used to host their snapshot of Microsoft's Bing crawl. Other developments include HBase running on filesystems other than Apache HDFS, such as MapR.

But plain to me though is that none of these developments would have been possible were it not for the hard work put in by our awesome HBase community driven by a core of HBase committers. Some members of the core have only been around a year or so (Todd Lipcon, Gary Helmling, and Nicolas Spiegelberg), and we would be lost without them, but a good portion have been there from close to project inception and have shaped HBase into the (scalable) general datastore that it is today. These include Jonathan Gray, who gambled his startup streamy.com on HBase; Andrew Purtell, who built an HBase team at Trend Micro long before such a thing was fashionable; Ryan Rawson, who got StumbleUpon (which became the main sponsor after HBase moved on from Powerset/Microsoft) on board, and who had the sense to hire Jean-Daniel Cryans, now a power contributor but just a bushy-tailed student at the time. And then there is Lars, who during the bug fixes, was always about documenting how it all worked. Of those of us who know HBase, there is no better man qualified to write this first, critical HBase book.



[1] "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al.

Foreword: Carter Page
Carter Page
Engineering Manager, Bigtable Team, Google

In late 2003, Google had a problem: We were continually building our web index from scratch, and each iteration was taking an entire month, even with all the parallelization we had at our disposal. What's more, the web was growing geometrically, and we were expanding into many new product areas, some of which were personalized. We had a filesystem, called GFS, which could scale to these sizes, but it lacked the ability to update records in place, or to insert or delete new records in sequence.

It was clear that Google needed to build a new database.

There were only a few people in the world who knew how to solve a database design problem at this scale, and fortunately, several of them worked at Google. On November 4, 2003, Jeff Dean and Sanjay Ghemawat committed the first 5 source code files of what was to become Bigtable. Joined by seven other engineers in Mountain View and New York City, they built the first version, which went live in 2004.

To this day, the biggest applications at Google rely on Bigtable: Gmail, search, Google Analytics, and hundreds of other applications. A Bigtable cluster can hold many hundreds of petabytes and serve over a terabyte of data each second. Even so, we're still working each year to push the limits of its scalability.

The book you have in your hands, or on your screen, will tell you all about how to use and operate HBase, the open-source re-creation of Bigtable. I'm in the unusual position to know the deep internals of both systems; and the engineers who, in 2006, set out to build an open source version of Bigtable created something very close in design and behavior.
