Praise for Google BigQuery: The Definitive Guide
This book is essential to the rapidly growing list of businesses that are migrating their existing enterprise data warehouses from legacy technology stacks to Google Cloud. Lak and Jordan provide comprehensive coverage of BigQuery so that you can use it not only as your Enterprise Data Warehouse for business analytics, but also use SQL to query real-time data streams; access BigQuery from managed Hadoop and Spark clusters; and use machine learning to automatically categorize and run forecasting and predictions on your data.
Thomas Kurian, CEO, Google Cloud
Every once in a great while a piece of software or service comes along that changes everything. BigQuery has changed the way enterprises can think about their data, all of it. Designed from the beginning to handle the world's largest datasets, BigQuery has gone on to be one of the best platforms for analyzing and learning from data. Announced in June 2016, Standard SQL is one of the most clean, complete, powerful implementations of SQL ever designed. Powerful features include deeply nested data, user-defined functions in JavaScript and SQL, geospatial data, integrated machine learning, and URL-addressable data sharing, just to name a few. There is no better place to learn about BigQuery than from this book by Jordan and Lak, two of the people who know BigQuery best.
Lloyd Tabb, Cofounder and CTO, Looker
Even though I've been using BigQuery for over seven years, I was pleased to discover that this book taught me things I never knew about it! It provides invaluable insights into best practices and techniques, and explains concepts in an easy-to-understand fashion. The code examples are a great way to follow the content in a practical, hands-on manner, and they kept the book fun and engaging. This book will undoubtedly become the go-to reference for BigQuery users.
Graham Polley, Managing Consultant, Servian
BigQuery can handle a lot of data very fast and at a low cost. The platform is there to help you get all your data in one place for faster insights. This book is a deep dive into key parts of BigQuery. In this quest, along with two prominent legendary Googlers, Lak Lakshmanan and Jordan Tigani, you'll learn the essentials of BigQuery as well as advanced topics like machine learning. I'm a huge BigQuery advocate. Having used the tool firsthand, I can say that it will easily make your big data life a lot easier. This was an amazing read and now the BigQuery journey starts for you! Jump in!
Mikhail Berlyant, SVP Technology, Viant Inc.
Google BigQuery: The Definitive Guide
by Valliappa Lakshmanan and Jordan Tigani
Copyright 2020 Valliappa Lakshmanan and Jordan Tigani. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Editor: Nicole Taché
- Production Editor: Kristen Brown
- Copyeditor: Octal Publishing, LLC
- Proofreader: Arthur Johnson
- Indexer: Ellen Troutman-Zaig
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Rebecca Demarest
- October 2019: First Edition
Revision History for the First Edition
- 2019-10-23: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492044468 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Google BigQuery: The Definitive Guide, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-04446-8
[LSI]
Preface
Enterprises are becoming increasingly data driven, and a key component of any enterprise's data strategy is a data warehouse, a central repository of integrated data from all across the company. Traditionally, the data warehouse was used by data analysts to create analytical reports. But now it is also increasingly used to populate real-time dashboards, to make ad hoc queries, and to provide decision-making guidance through predictive analytics. Because of these business requirements for advanced analytics and a trend toward cost control, agility, and self-service data access, many organizations are moving to cloud-based data warehouses such as Google BigQuery.
In this book, we provide a thorough tour of BigQuery, a serverless, highly scalable, low-cost enterprise data warehouse that is available on Google Cloud. Because there is no infrastructure to manage, enterprises can focus on analyzing data to find meaningful insights using familiar SQL.
Our goal with BigQuery has been to build a data platform that provides leading-edge capabilities, takes advantage of the many great technologies that are now available in cloud environments, and supports tried-and-true data technologies that are still relevant today. For example, on the leading edge, Google's BigQuery is a serverless compute architecture that decouples compute and storage. This enables diverse layers of the architecture to perform and scale independently, and it gives data developers flexibility in design and deployment. Somewhat uniquely, BigQuery supports native machine learning and geospatial analysis. With Cloud Pub/Sub, Cloud Dataflow, Cloud Bigtable, Cloud AI Platform, and many third-party integrations, BigQuery interoperates with both traditional and modern systems, at a wide range of desired throughput and latency. And on the tried-and-true front, BigQuery supports ANSI-standard SQL, columnar optimization, and federated queries, which are key to the self-service ad hoc data exploration that many users demand.
Who Is This Book For?
This book is for data analysts, data engineers, and data scientists who want to use BigQuery to derive insights from large datasets. Data analysts can interact with BigQuery through SQL and via dashboarding tools like Looker, Data Studio, and Tableau. Data engineers can integrate BigQuery with data pipelines written in Python or Java and using frameworks such as Apache Spark and Apache Beam. Data scientists can build machine learning models in BigQuery, run TensorFlow models on data in BigQuery, and delegate distributed, large-scale operations to BigQuery from within a Jupyter notebook.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.