
Sev Leonard - Cost-Effective Data Pipelines



  • Book:
    Cost-Effective Data Pipelines
  • Author:
    Sev Leonard
  • Publisher:
    O'Reilly Media, Inc.
  • Genre:
    Computer
  • Year:
    2022

Cost-Effective Data Pipelines: summary, description and annotation


The low cost of getting started with cloud services can easily evolve into a significant expense down the road. That's challenging for teams developing data pipelines, particularly when rapid changes in technology and workload require a constant cycle of redesign. How do you deliver scalable, highly available products while keeping costs in check?

With this practical guide, author Sev Leonard provides a holistic approach to designing scalable data pipelines in the cloud. Intermediate data engineers, software developers, and architects will learn how to navigate cost/performance trade-offs and how to choose and configure compute and storage. You'll also pick up best practices for code development, testing, and monitoring.

By focusing on the entire design process, you'll be able to deliver cost-effective, high-quality products. This book helps you:

  • Reduce cloud spend with lower cost cloud service offerings and smart design strategies
  • Minimize waste without sacrificing performance by rightsizing compute resources
  • Drive pipeline evolution, head off performance issues, and quickly debug with effective monitoring
  • Set up development and test environments that minimize cloud service dependencies
  • Create data pipeline code bases that are testable and extensible, fostering rapid development and evolution
  • Improve data quality and pipeline operation through validation and testing

Cost-Effective Data Pipelines

by Sev Leonard

Copyright 2023 MXLeonard LLC. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Acquisitions Editor: Aaron Black
  • Development Editor: Virginia Wilson
  • Production Editor: Christopher Faucher
  • Interior Designer: David Futato
  • Cover Designer: Karen Montgomery
  • September 2023: First Edition
Revision History for the Early Release
  • 2022-09-21: First Release
  • 2022-12-08: Second Release
  • 2023-01-24: Third Release
  • 2023-03-15: Fourth Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492098645 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Cost-Effective Data Pipelines, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-492-09858-4

Preface
A Note for Early Release Readers

With Early Release ebooks, you get books in their earliest form (the author's raw and unedited content as they write) so you can take advantage of these technologies long before the official release of these titles.

This will be the Preface of the final book. Please note that the GitHub repo is available here: https://github.com/gizm00/oreilly_dataeng_book.

If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the editor at vwilson@oreilly.com.

In my work on data pipelines, the biggest cost impact I have seen to date was due to a bug. For months our pipeline was incorrectly transforming data, undetected until our customers noticed the data was wrong.

We could have caught this bug with schema validation tests, which you'll learn about in this book. Instead, we spent a significant chunk of our annual cloud bill recomputing the bad data. It cost us the trust of our customers as well, to the point that the validity of the project as a whole was questioned. Sure, the cloud bill was bad, but the cost of inadequate development practices, which nearly scuttled the project, was worse.

If you search the web for cloud cost optimization strategies, you may read horror stories about an Amazon Web Services (AWS) Lambda function gone awry, or get vague advice that you should right-size your compute resources without any specifics on how to do it. These are important strategies that you will learn to apply in this book, but there's more to it than that.

My experience has been that costs in cloud data pipeline development come from the difficulties of wrangling a system that spans unknown third-party data sources, cloud services, extremely sophisticated big data processing engines, and multiple code bases. Couple this with a fast-paced production environment and you can quickly devolve into a reactive work mode where code turns into spaghetti, pipelines become difficult to evolve and test, and no one knows what's really going on because there's insufficient monitoring.

Altogether this can create an environment where change is hard, resulting in longer lead times for bringing new functionality onboard. Bugs and burnout are common, eroding customer trust and adoption. These issues hit a company's bottom line in more ways than just the cloud bill.

I wrote this book because this isn't the way it has to be. With a focus on effective monitoring, software development best practices, and targeted advice on designing cloud compute and storage, this book will get you set up for success from the outset and enable you to manage the evolution of data pipelines in a cost-effective way.

I've used these approaches in batch and streaming systems, handling anywhere from a few thousand rows to petabytes of data, running the gamut from well-defined, structured data to semi-structured sources that change frequently.

Who this book is for

I've geared the content toward an intermediate to advanced audience, assuming you have some familiarity with software development best practices, some basics about working with cloud compute and storage, and a general idea about how batch and streaming data pipelines operate.

This book is written from my experience in the day-to-day development of data pipelines. If this is work you do already or aspire to do in the future, you can consider this book a virtual mentor, advising you of common pitfalls and providing guidance honed from working on a variety of data pipeline projects.

If you're coming from a data analysis background, you'll find advice on software best practices to help you build testable, extendable pipelines. This will aid you in connecting analysis with data acquisition and storage to create end-to-end systems.

Developer velocity and cost-conscious design are areas everyone from individual contributors to technical leads should have on their minds. In this book you'll find advice on how to build quality into the development process, make efficient use of cloud resources, and reduce costs. You'll also learn the elements that go into monitoring, not only to keep tabs on system health and performance but also to gain insight into where redesign should be considered.

If you manage data engineering teams, you'll find helpful tips on effective development practices, areas where costs can escalate, and an overall approach to putting the right practices in place to help your team succeed.

What you will learn

If you would like to learn or improve at the following, this book will be a useful guide:

  • Drive pipeline evolution, head off performance issues, and quickly debug with effective monitoring

  • Reduce cloud spend with lower cost cloud service offerings and smart design strategies

  • Minimize waste without sacrificing performance by right-sizing compute resources

  • Set up development and test environments that minimize cloud service dependencies

  • Create data pipeline codebases that are testable and extensible, fostering rapid development and evolution

  • Improve data quality and pipeline operation through validation and testing

What this book is not

This is not an architecture book. There are aspects of the guidance I provide that can tie back into architecture and system requirements, but I will not be discussing different architectural approaches or trade-offs. I do not cover topics such as data governance, data cataloging, or data lineage.
