Tye Rattenbury - Principles of Data Wrangling: Practical Techniques for Data Preparation

Year: 2017. Publisher: O'Reilly Media.

A key task that any aspiring data-driven organization needs to learn is data wrangling, the process of converting raw data into something truly useful. This practical guide provides business analysts with an overview of various data wrangling techniques and tools, and puts the practice of data wrangling into context by asking, "What are you trying to do and why?"

Wrangling data consumes roughly 50-80% of an analyst's time before any kind of analysis is possible. Written by key executives at Trifacta, this book walks you through the wrangling process by exploring several factors (time, granularity, scope, and structure) that you need to consider as you begin to work with data. You'll learn a shared language and a comprehensive understanding of data wrangling, with an emphasis on recent agile analytic processes used by many of today's data-driven organizations.

Appreciate the importance, and the satisfaction, of wrangling data the right way.

  • Understand what kind of data is available
  • Choose which data to use and at what level of detail
  • Meaningfully combine multiple sources of data
  • Decide how to distill the results to a size and shape that can drive downstream analysis

Principles of Data Wrangling

by Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel, and Connor Carreras

Copyright © 2017 Trifacta, Inc. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Editor: Shannon Cutt
  • Production Editor: Kristen Brown
  • Copyeditor: Bob Russell, Octal Publishing, Inc.
  • Proofreader: Christina Edwards
  • Interior Designer: David Futato
  • Cover Designer: Karen Montgomery
  • Illustrator: Rebecca Demarest
  • May 2017: First Edition

Revision History for the First Edition

  • 2017-04-25: First Release
  • 2017-06-27: Second Release

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Principles of Data Wrangling, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-93892-8

[LSI]

Foreword

Through the last decades of the twentieth century and into the twenty-first, data was largely a medium for bottom-line accounting: making sure that the books were balanced, the rules were followed, and the right numbers could be rolled up for executive decision-making. It was an era focused on a select group of IT staff engineering the "golden master" of organizational data; an era in which mantras like "garbage in, garbage out" captured the attitude that only carefully engineered data was useful.

Attitudes toward data have changed radically in the past decade, as new people, processes, and technologies have come forward to define the hallmarks of a data-driven organization. In this context, data is a medium for top-line value generation, providing evidence and content for the design of new products, new processes, and ever more efficient operation. Today's data-driven organizations have analysts working broadly across departments to find methods to use data creatively. It is an era in which new mantras like "extracting signal from the noise" capture a different attitude of agile experimentation and exploitation of large, diverse sources of data.

Of course, accounting still needs to get done in the twenty-first century, and the need remains to curate select datasets. But the data sources and processes for accountancy are relatively small and slow to change. The data that drives creative and exploratory analyses represents an (exponentially!) growing fraction of the data in most organizations, driving widespread rethinking of processes for data and computing, including the way that IT organizations approach their traditional tasks.

The phrase "data wrangling," born in the modern context of agile analytics, is meant to describe the lion's share of the time people spend working with data. There is a common misperception that data analysis is mostly a process of running statistical algorithms on high-performance data engines. In practice, this is just the final step of a longer and more complex process; 50 to 80 percent of an analyst's time is spent wrangling data to get it to the point at which this kind of analysis is possible. Not only does data wrangling consume most of an analyst's workday, it also represents much of the analyst's professional process: it captures activities like understanding what data is available; choosing what data to use and at what level of detail; understanding how to meaningfully combine multiple sources of data; and deciding how to distill the results to a size and shape that can drive downstream analysis. These activities represent the hard work that goes into both traditional data curation and modern data analysis. And in the context of agile analytics, these activities also capture the creative and scientific intuition of the analyst, which can dictate different decisions for each use case and data source.

We have been working on these issues with data-centric folks of various stripes: from the IT professionals who fuel data infrastructure in large organizations, to professional data analysts, to data-savvy enthusiasts in roles from marketing to journalism to science and social causes. Much is changing across the board here. This book is our effort to wrangle the lessons we have learned in this context into a coherent overview, with a specific focus on the more recent and quickly growing agile analytic processes in data-driven organizations. Hopefully, some of these lessons will help to clarify the importance (and yes, the satisfaction) of data wrangling done well.

Chapter 1. Introduction

Let's begin with the most important question: why should you read this book? The answer is simple: you want more value from your data. To put a little more meat on that statement, our objective in writing this book is to help the variety of people who manage the analysis or application of data in their organizations. The data might or might not be "yours," in the strict sense of ownership. But the pains in extracting value from this data are.

We're focused on two kinds of readers. First are people who manage the analysis and application of data indirectly: the managers of teams or directors of data projects. Second are people who work with data directly: the analysts, engineers, architects, statisticians, and scientists.

If you're reading this book, you're interested in extracting value from data. We can categorize this value into two types along a temporal dimension: near-term value and long-term value. In the near term, you likely have a sizable list of questions that you want to answer using your data. Some of these questions might be vague; for example, "Are people really shifting toward interacting with us through their mobile devices?" Other questions might be more specific: "When will our customers' interactions primarily originate from mobile devices instead of from desktops or laptops?"

What is stopping you from answering these questions? The most common answer we hear is "time." You know the questions, you know how to answer them, but you just don't have enough hours in the day to wrangle your data into the right form.

Beyond the list of known questions related to the near-term value of your data is the optimism that your data has greater potential long-term value. Can you use it to forecast important seasonal changes? What about risks in your supply chain due to weather or geopolitical shifts? Can you understand how the move to mobile is affecting your customers' purchasing patterns? Organizations generally hire data scientists to take on these longer-term, exploratory analyses. But even if you have the requisite skills to tackle these kinds of analyses, you might still struggle to be allocated sufficient time and resources. After all, exploratory analytics projects can take months, and often contain a nontrivial risk of producing primarily negative or ambiguous results.
