Naidu Siddartha - Google BigQuery Analytics

Here you can read Naidu Siddartha's Google BigQuery Analytics online: the full text of the book, free of charge. Download the PDF and EPUB, and find the description, cover, and reviews of this ebook. City: Indianapolis; year: 2014; publisher: John Wiley & Sons, Inc; genre: Computer. A description of the work (preface) as well as reviews are available. LitArk.com, a literature library created for fans of good reading, offers a wide selection of genres:

Romance novel, Science fiction, Adventure, Detective, Science, History, Home and family, Prose, Art, Politics, Computer, Non-fiction, Religion, Business, Children, Humor

Choose a favorite category and find books truly worth reading. Enjoy immersion in a world of imagination, feel the emotions of the characters, or learn something new for yourself and make a fascinating discovery.

Naidu Siddartha - Google BigQuery Analytics
  • Book:
    Google BigQuery Analytics
  • Author:
    Naidu Siddartha
  • Publisher:
    John Wiley & Sons, Inc
  • Genre:
    Computer
  • Year:
    2014
  • City:
    Indianapolis
  • Rating:
    3 / 5

Google BigQuery Analytics: summary, description and annotation

We offer an annotation, description, summary, or preface here, depending on what the author of "Google BigQuery Analytics" wrote. If you haven't found the information you need about the book, write in the comments and we will try to find it.

How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets

Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and walks through streaming ingestion, transformation via Hadoop in Google Compute Engine, App Engine Datastore integration, and using GViz with Tableau to generate charts of query results. In addition to the mechanics of BigQuery, the book also covers the architecture of the underlying Dremel query engine, providing a thorough understanding that leads to better query results.

  • Features a companion website that includes all code and data sets from the book
  • Uses real-world examples to explain everything analysts need to know to effectively use BigQuery
  • Includes web application examples coded in Python (a minimal query sketch follows this list)
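
The book's own samples target the 2014-era API, but for orientation, here is a minimal, hypothetical sketch of the kind of query the book teaches, written against the modern google-cloud-bigquery Python client and a public dataset; the client library, dataset, and query are illustrative assumptions and do not come from the book.

    # Minimal sketch (assumptions: the google-cloud-bigquery client library,
    # a Google Cloud project with the BigQuery API enabled, and
    # application-default credentials already configured).
    from google.cloud import bigquery

    client = bigquery.Client()  # picks up the default project and credentials

    query = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE state = 'TX'
        GROUP BY name
        ORDER BY total DESC
        LIMIT 5
    """

    for row in client.query(query).result():  # blocks until the job completes
        print(row.name, row.total)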

Naidu Siddartha: author's other books


Who wrote Google BigQuery Analytics? Find out the author's full name and browse a list of all the author's works, organized by series.

Google BigQuery Analytics — read the complete book (full text) online for free

Below is the text of the book, divided into pages. The system saves the place of the last page you read, so you can conveniently read the book "Google BigQuery Analytics" online for free without having to search for where you left off each time. Set a bookmark, and you can return to the page where you finished reading at any time.

Part I
BigQuery Fundamentals
In This Part
  1. Chapter 1: The Story of Big Data at Google
  2. Chapter 2: BigQuery Fundamentals
  3. Chapter 3: Getting Started with BigQuery
  4. Chapter 4: Understanding the BigQuery Object Model
Chapter 1
The Story of Big Data at Google

Since its founding in 1998, Google has grown by multiple orders of magnitude in several different dimensions: how many queries it handles, the size of the search index, the amount of user data it stores, the number of services it provides, and the number of users who rely on those services. From a hardware perspective, the Google search engine has gone from a server sitting under a desk in a lab at Stanford to hundreds of thousands of servers located in dozens of datacenters around the world.

The traditional approach to scaling (outside of Google) has been to scale the hardware up as the demands on it grow. Instead of running your database on a small blade server, run it on a Big Iron machine with 64 processors and a terabyte of RAM. Instead of relying on inexpensive disks, the traditional scaling path moves critical data to costly network-attached storage (NAS).

There are some problems with the scale-up approach, however:

  • Scaled-up machines are expensive. If you need one that has twice the processing power, it might cost you five times as much.
  • Scaled-up machines are single points of failure. You might need to get more than one expensive server in case of a catastrophic problem, and each one usually ends up being built with so many backup and redundant pieces that you're paying for a lot more hardware than you actually need.
  • Scale-up has limits. At some point, you lose the ability to add more processors or RAM; you've bought the most expensive and fastest machine that is made (or that you can afford), and it still might not be fast enough.
  • Scale-up doesn't protect you against software failures. If you have a Big Iron server that has a kernel bug, that machine will crash just as easily (and as hard) as your Windows laptop.

Google, from an early point in time, rejected scale-up architectures. It didn't, however, do this because it saw the limitations more clearly or because it was smarter than everyone else. It rejected scale-up because it was trying to save money. If the hardware vendor quotes you $1 million for the server you need, you could buy 200 $5,000 machines instead. Google engineers thought, "Surely there is a way we could put those 200 servers to work so that the next time we need to increase the size, we just need to buy a few more cheap machines, rather than upgrade to the $5 million server." Their solution was to scale out, rather than scale up.

Big Data Stack 1.0

Between 2000 and 2004, armed with a few principles, Google laid the foundation for its Big Data strategy:

  • Anything can fail, at any time, so write your software expecting unreliable hardware. At most companies, when a database server crashes, it is a serious event. If a network switch dies, it will probably cause downtime. By running in an environment in which individual components fail often, you paradoxically end up with a much more stable system because your software is designed to handle those failures. You can quantify your risk beyond blindly quoting statistics such as mean time between failures (MTBF) or service-level agreements (SLAs).
  • Use only commodity, off-the-shelf components. This has a number of advantages: You don't get locked into a particular vendor's feature set; you can always find replacements; and you don't experience big price discontinuities when you upgrade to the bigger version.
  • The cost for twice the amount of capacity should not be considerably more than the cost for twice the amount of hardware. This means the software must be built to scale out, rather than up. However, this also imposes limits on the types of operations that you can do. For instance, if you scale out your database, it may be difficult to do a JOIN operation, since you'd need to join data that lives on different machines; a toy sketch after this list illustrates the shuffle this forces.
  • A foolish consistency is the hobgoblin of little minds. If you abandon the C (consistency) in ACID database operations, it becomes much easier to parallelize operations. This has a cost, however; loss of consistency means that programmers have to handle cases in which reading data they just wrote might return a stale (inconsistent) copy. This means you need smart programmers.
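
To make the JOIN limitation in the third bullet concrete, here is a toy, single-process Python model of the repartition (shuffle) idea; the table contents, machine count, and helper names are invented for illustration and do not describe any real system's implementation.

    # Toy model: a JOIN across sharded tables. Matching rows may live on
    # different machines, so both tables are first repartitioned (shuffled)
    # by the join key; only then can each machine join its keys locally.
    from collections import defaultdict

    N_MACHINES = 2

    users_shards = [
        [(1, "alice"), (3, "carol")],  # machine 0's slice of users(id, name)
        [(2, "bob")],                  # machine 1's slice
    ]
    orders_shards = [
        [(2, "book")],                 # machine 0's slice of orders(user_id, item)
        [(1, "lamp"), (3, "desk")],    # machine 1's slice
    ]

    def repartition(shards, key_index):
        """Route every row to the machine that owns hash(key) % N_MACHINES."""
        buckets = [defaultdict(list) for _ in range(N_MACHINES)]
        for shard in shards:
            for row in shard:
                key = row[key_index]
                buckets[hash(key) % N_MACHINES][key].append(row)
        return buckets

    users_by_machine = repartition(users_shards, 0)
    orders_by_machine = repartition(orders_shards, 0)

    # After the shuffle, each machine joins locally with no further traffic.
    for m in range(N_MACHINES):
        for key, user_rows in users_by_machine[m].items():
            for _, name in user_rows:
                for _, item in orders_by_machine[m].get(key, []):
                    print(name, "ordered", item)

The expensive part is the repartition step: every row may have to cross the network, a cost a scaled-up, single-machine database never pays.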

These principles, along with a cost-saving necessity, inspired new computation architectures. Over a short period of time, Google produced three technologies that inspired the Big Data revolution:

  • Google File System (GFS): A distributed, cluster-based filesystem. GFS assumes that any disk can fail, so data is stored in multiple locations, which means that data is still available even when a disk that it was stored on crashes.
  • MapReduce: A computing paradigm that divides problems into easily parallelizable pieces and orchestrates running them across a cluster of machines (a toy word-count sketch follows this list).
  • Bigtable: A forerunner of the NoSQL database, Bigtable enables structured storage to scale out to multiple servers. Bigtable is also replicated, so failure of any particular tablet server doesn't cause data loss.
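
As a concrete (if toy) illustration of the MapReduce bullet above, here is a single-process word count in Python; the function names and data are invented stand-ins for phases that a real cluster would run in parallel across many machines.

    # Toy, single-process sketch of the MapReduce paradigm: count word
    # occurrences across a set of documents.
    from collections import defaultdict

    documents = ["big data at google", "google file system", "big table"]

    def map_phase(doc):
        for word in doc.split():
            yield (word, 1)              # emit one (key, value) pair per word

    def shuffle(pairs):
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)    # group all values by key
        return groups

    def reduce_phase(key, values):
        return (key, sum(values))        # aggregate each key's group

    pairs = (pair for doc in documents for pair in map_phase(doc))
    counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
    print(counts)  # {'big': 2, 'data': 1, 'at': 1, 'google': 2, ...}

Because each map call and each reduce call touches independent data, a framework can scatter them across thousands of machines; the hard parts, as the next section notes, are the shuffle in the middle and recovering from partial failures.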

What's more, Google published papers on these technologies, which enabled others to emulate them outside of Google. Doug Cutting and other open source contributors integrated the concepts into a tool called Hadoop. Although Hadoop is considered to be primarily a MapReduce implementation, it also incorporates GFS and Bigtable clones, which are called HDFS and HBase, respectively.

Armed with these three technologies, Google replaced nearly all the off-the-shelf software usually used to run a business. It didn't need (with a couple of exceptions) a traditional SQL database; it didn't need an e-mail server because its Gmail service was built on top of these technologies.

Big Data Stack 2.0 (and Beyond)

The three technologies (GFS, MapReduce, and Bigtable) made it possible for Google to scale out its infrastructure. However, they didn't make it easy. Over the next few years, a number of problems emerged:

  • MapReduce is hard. It can be difficult to set up and difficult to decompose your problem into Map and Reduce phases. If you need multiple MapReduce rounds (which is common for many real-world problems), you face the issue of how to deal with state in between phases and how to deal with partial failures without having to restart the whole thing.
  • MapReduce can be slow. If you want to ask questions of your data, you have to wait minutes or hours to get the answers. Moreover, you have to write custom C++ or Java code each time you want to change the question that you're asking.
  • GFS, while improving durability of the data (since it is replicated multiple times), can suffer from reduced availability, since the metadata server is a single point of failure.
  • Bigtable has problems in a multidatacenter environment. Most services run in multiple locations; Bigtable replication between datacenters is only eventually consistent (meaning that data that gets written out will show up everywhere, but not immediately). Individual services spend a lot of redundant effort babysitting the replication process.
  • Programmers (even Google programmers) have a really difficult time dealing with eventual consistency. The same problem occurred when Intel engineers tried to improve CPU performance by relaxing the memory model to be eventually consistent; it caused lots of subtle bugs because the hardware stopped matching people's mental model of how it operated. A toy model follows this list.
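
The read-after-write hazard in the last two bullets can be made concrete with a toy Python model; this is purely illustrative and is not Bigtable's actual replication protocol.

    # Toy model of eventual consistency: a write is acknowledged once one
    # replica has it and reaches the second replica only when a replication
    # step runs; a read routed to the lagging replica sees stale data.
    import random

    replicas = [{}, {}]           # two datacenter copies of the same table
    replication_log = []          # writes waiting to reach replica 1

    def write(key, value):
        replicas[0][key] = value              # acknowledged after one replica
        replication_log.append((key, value))  # shipped to the other one later

    def replicate_one():
        if replication_log:
            key, value = replication_log.pop(0)
            replicas[1][key] = value

    def read(key):
        return random.choice(replicas).get(key)  # client may hit either DC

    write("row1", "new")
    print(read("row1"))   # sometimes 'new', sometimes None: a stale read
    replicate_one()
    print(read("row1"))   # now 'new' from either replica: convergence

Code that assumes the first read always returns 'new' has a bug that surfaces only under replication lag, which is why eventual consistency demands such careful programmers.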

Similar books «Google BigQuery Analytics»

Take a look at books similar to Google BigQuery Analytics. We have selected literature similar in name and meaning, in the hope of giving readers more options for finding new, interesting, as-yet-unread works.


Reviews about «Google BigQuery Analytics»

Discussion and reviews of the book Google BigQuery Analytics, along with readers' own opinions. Leave your comments and write what you think about the work, its meaning, and its main ideas. Say what exactly you liked and what you didn't, and why you think so.