
Rukmani Gopalan - The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture. Early Release



More organizations than ever understand the importance of data lake architectures for deriving value from their data. Building a robust, scalable, and performant data lake remains a complex proposition, however, with a buffet of tools and options that need to work together to provide a seamless end-to-end pipeline from data to insights.

This book provides a concise yet comprehensive overview of the setup, management, and governance of a cloud data lake. Author Rukmani Gopalan, product management leader at Microsoft, guides data architects and engineers through the major aspects of working with a cloud data lake, from design considerations and best practices to data format optimizations, performance optimization, cost management, and governance.

  • Learn the benefits of a cloud-based big data strategy for your organization
  • Get guidance and best practices for designing performant and scalable data lakes
  • Examine architecture and design choices, and data governance principles and strategies
  • Build a data strategy that scales as your organizational and business needs increase
  • Implement a scalable data lake in the cloud
  • Use cloud-based advanced analytics to gain more value from your data

The Cloud Data Lake

by Rukmani Gopalan

Copyright 2022 Rukmani Gopalan. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Editors: Andy Kwan and Jill Leonard
  • Production Editor: Ashley Stussy
  • Interior Designer: David Futato
  • Cover Designer: Karen Montgomery
  • Illustrator: Kate Dullea
  • March 2023: First Edition
Revision History for the Early Release
  • 2022-05-03: First Release
  • 2022-06-16: Second Release
  • 2022-07-15: Third Release
  • 2022-08-18: Fourth Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098116583 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. The Cloud Data Lake, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author(s), and do not represent the publisher's views. While the publisher and the author(s) have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author(s) disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-11652-1

Chapter 1. Big Data - Beyond the Buzz
A Note for Early Release Readers

With Early Release ebooks, you get books in their earliest form (the author's raw and unedited content as they write) so you can take advantage of these technologies long before the official release of these titles.

This will be the 1st chapter of the final book.

If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the author at .

Without big data, you are blind and deaf and in the middle of a freeway.

Geoffrey Moore

If we were playing workplace Bingo, there is a high chance you would win a full house by crossing off all these words you have heard in your organization in the past three months: digital transformation, data strategy, transformational insights, data lake, warehouse, data science, machine learning, and intelligence. It is now common knowledge that data is a key ingredient for organizations to succeed, and that organizations that rely on data and AI clearly outperform their competitors. According to an IDC study sponsored by Seagate, the amount of data that is captured, collected, or replicated is expected to grow to 175 ZB by the year 2025. This captured, collected, or replicated data is referred to as the Global Datasphere, and it comes from three classes of sources:

  • The core - traditional or cloud based datacenters.

  • The edge - hardened infrastructure, such as cell towers.

  • The endpoints - PCs, tablets, smartphones, and IoT devices.

This study also predicts that 49% of this Global Datasphere will reside in public cloud environments by the year 2025.
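To make the 49% projection concrete, here is the back-of-the-envelope arithmetic in Python, using only the two figures quoted from the IDC study above. The variable names are illustrative.

```python
# Figures quoted from the IDC study cited in the text; nothing else is assumed.
global_datasphere_zb = 175   # projected Global Datasphere by 2025, in zettabytes
public_cloud_share = 0.49    # projected share residing in public cloud by 2025

# Portion of the total expected in public cloud, in zettabytes (1 ZB = 10**21 bytes).
public_cloud_zb = global_datasphere_zb * public_cloud_share
print(f"About {public_cloud_zb:.2f} ZB in public cloud")  # About 85.75 ZB in public cloud
```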

If you have ever wondered, "Why does this data need to be stored? What is it good for?", the answer is simple: think of all this data as bits and pieces of words strewn around the globe in different languages and scripts, each sharing a sliver of information, like a piece of a puzzle. Stitching them together in a meaningful fashion tells a story that not only informs, but could also transform businesses, people, and even how the world runs. Most successful organizations already leverage data to understand the growth drivers for their businesses and the perceived customer experience, and to take the right action; metrics along the funnel of customer acquisition, adoption, engagement, and retention are now largely the lingua franca of funding product investments. These types of data processing and analysis are referred to as business intelligence, or BI, and are classified as offline insights. Essentially, the data and the insights are crucial for presenting the growth trends that business leaders act on; however, this workstream is separate from the core business logic that is used to run the business itself. As the maturity of the data platform grows, an inevitable signal we get from all customers is that they start getting requests to run more scenarios on their data lake, truly living up to the "Data is the new oil" idiom.
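The acquisition, adoption, engagement, and retention funnel mentioned above is typically read as stage-to-stage conversion rates. The following is a minimal Python sketch of that calculation, assuming hypothetical per-stage sets of user IDs. The function name `funnel_conversion`, the rule that a user counts at a stage only if they reached every earlier stage, and the sample data are illustrative choices, not something the book prescribes.

```python
from typing import Dict, List, Optional, Set, Tuple

# Funnel stages as named in the text, in order.
STAGES = ["acquisition", "adoption", "engagement", "retention"]

def funnel_conversion(
    users_by_stage: Dict[str, Set[str]],
) -> List[Tuple[str, int, Optional[float]]]:
    """Return (stage, users, conversion from previous stage) for each stage.

    A user is counted at a stage only if they also reached every earlier
    stage, so each stage's population is a subset of the previous one.
    """
    results: List[Tuple[str, int, Optional[float]]] = []
    reached: Set[str] = set()
    prev_count: Optional[int] = None
    for i, stage in enumerate(STAGES):
        users = users_by_stage.get(stage, set())
        reached = set(users) if i == 0 else reached & users
        count = len(reached)
        # No rate for the first stage, or when the previous stage was empty.
        rate = count / prev_count if prev_count else None
        results.append((stage, count, rate))
        prev_count = count
    return results

# Hypothetical sample data for illustration only.
sample = {
    "acquisition": {"u1", "u2", "u3", "u4"},
    "adoption": {"u1", "u2", "u3"},
    "engagement": {"u1", "u2", "u5"},  # u5 skipped earlier stages, so not counted
    "retention": {"u1"},
}

for stage, count, rate in funnel_conversion(sample):
    shown = "-" if rate is None else f"{rate:.0%}"
    print(stage, count, shown)
```

With this sample, adoption converts at 75%, engagement at 67%, and retention at 50%. Intersecting with the previous stage is one possible definition; some teams instead count each stage independently.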

Organizations leverage data to understand the growth drivers for their business and the perceived customer experience. They can then use data to set targets and drive improvements in customer experience with better support and newer features. They can additionally create better marketing strategies to grow their business, and drive efficiencies that lower the cost of building their products and running their organizations. Starbucks, the coffee chain present around the globe, uses data in every place possible to continuously measure and improve its business. It uses the data from its mobile applications and correlates it with its ordering system to better understand customer usage patterns and send targeted marketing campaigns. It uses sensors on its coffee machines that emit health data every few seconds, and this data is analyzed to drive improvements in predictive maintenance; it also uses these connected coffee machines to download recipes without human intervention. As the world is just learning to cope with the pandemic, organizations are leveraging data heavily not just to transform their businesses, but also to measure the health and productivity of their organizations, helping their employees feel connected and minimizing burnout. Data is also used for world-saving initiatives, such as Project Zamba, which leverages artificial intelligence for wildlife research and conservation in the remote jungles of Africa, and for using IoT and data science to create a circular economy that promotes environmental sustainability.

1.1 What is Big Data?

The examples above have a few things in common:

  • Data can come in all kinds of shapes and formats: it could be a few bytes emitted from an IoT sensor, social media data dumps, files from line-of-business (LOB) systems and relational databases, and sometimes even audio and video content.

  • The processing scenarios for this data are vastly different, whether it is data science, SQL-like queries, or any other custom processing.

  • As studies show, this data is not just high volume, but can also arrive at various speeds: as one large dump, like data ingested in batches from relational databases, or continuously streamed, like clickstream or IoT data.
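The difference between the two arrival patterns in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical IoT readings that carry a `device` field: the batch function sees the whole dump at once, while the streaming function keeps running state and emits an updated result after every event. Both reach the same final answer; what differs is when results become available.

```python
from collections import Counter
from typing import Iterable, Iterator, List

def batch_counts(events: List[dict]) -> Counter:
    """Batch: the whole dump is available up front, so count it in one pass."""
    return Counter(e["device"] for e in events)

def streaming_counts(events: Iterable[dict]) -> Iterator[Counter]:
    """Streaming: events arrive one at a time, so keep running state and
    yield an up-to-date snapshot after each event."""
    running: Counter = Counter()
    for e in events:
        running[e["device"]] += 1
        yield running.copy()

# Hypothetical IoT readings; field names are illustrative.
events = [
    {"device": "sensor-a", "temp_c": 21.5},
    {"device": "sensor-b", "temp_c": 19.0},
    {"device": "sensor-a", "temp_c": 21.7},
]

batch = batch_counts(events)
snapshots = list(streaming_counts(iter(events)))
print(batch)                   # Counter({'sensor-a': 2, 'sensor-b': 1})
print(snapshots[0])            # Counter({'sensor-a': 1}), available after the first event
print(snapshots[-1] == batch)  # True
```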

These are some of the characteristics of big data. Big data processing refers to the set of tools and technologies used to store, manage, and analyze data without imposing any restrictions or assumptions on the source, the format, or the size of the data.
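The "no restrictions or assumptions on the source, the format, or the size" property can be illustrated with a toy in-memory store. This is a minimal Python sketch, not any cloud storage API: the `RawStore` and `RawObject` classes, the `raw/<source>/<n>` path layout, the `parse_csv` helper, and the sample payloads are all hypothetical. Data is kept as the exact bytes received, and a format is chosen only when a reader supplies a parser.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class RawObject:
    source: str   # free-form label for where the data came from
    path: str     # hypothetical storage path
    data: bytes   # stored exactly as received; no format is assumed

    @property
    def size(self) -> int:
        return len(self.data)

@dataclass
class RawStore:
    """Toy in-memory store that accepts any bytes from any source."""
    objects: Dict[str, RawObject] = field(default_factory=dict)

    def put(self, source: str, data: bytes) -> str:
        # Nothing about the payload is inspected or validated here.
        path = f"raw/{source}/{len(self.objects):04d}"
        self.objects[path] = RawObject(source, path, data)
        return path

    def read(self, path: str, parser: Callable[[bytes], Any]) -> Any:
        # Interpretation happens only at read time, chosen by the reader.
        return parser(self.objects[path].data)

def parse_csv(b: bytes) -> list:
    return list(csv.DictReader(io.StringIO(b.decode("utf-8"))))

store = RawStore()
iot = store.put("iot", b'{"device": "sensor-a", "temp_c": 21.5}')
lob = store.put("lob", b"order_id,amount\n1,9.99\n2,4.50\n")
media = store.put("media", bytes([0x00, 0xFF, 0x10]))  # opaque binary payload

print(store.read(iot, json.loads)["temp_c"])    # 21.5
print(store.read(lob, parse_csv)[1]["amount"])  # 4.50
print(store.objects[media].size)                # 3
```

Deferring the parse to read time keeps ingestion independent of format; the same stored bytes could later be read by a different parser for a different processing scenario.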
