Cleaning Excel Data With
Power Query
Straight to the Point
Oz du Soleil
Holy Macro! Books
PO Box 541731, Merritt Island FL 32953
Cleaning Excel Data with Power Query Straight to the Point
2019 by Tickling Keys, Inc.
All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information or storage retrieval system without written permission from the publisher.
All terms known in this book known to be trademarks have been appropriately capitalized. Trademarks are the property of their respective owners and are not affiliated with Holy Macro! Books
Every effort has been made to make this book as complete and accurate as possible, but no warranty or fitness is implied. The information is provided on an as is basis. The authors and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.
First Published: March 2019
Author: Oz du Soleil
Copyeditor: Kitty Wilson
Cover Design: Suat M. Ozgur
Cover Illustration: Shannon Mattiza, 6'4 Productions
Indexer: Nellie Jay
Tech Editor: Bill Jelen
Screen Reader Captions: Deb Govern
Compositor: Jill Cabot
Published by: Holy Macro! Books, PO Box 541731, Merritt Island, FL 32953
Distributed by Independent Publishers Group, Chicago, IL
ISBN 978-1-61547-250-5 PDF, 978-1-61547-371-7 ePub, 978-1-61547-149-2 Mobi
Table of Contents
About the Author
Oz du Soleil is an Excel MVP whos been working with Excel since 2001. Hes co-author, along with Bill Jelen, of Guerrilla Data Analysis , 2nd edition. Oz has several Excel courses on the LinkedIn Learning platform. Hes possibly best known for the dramatic and colorful Excel tutorials he posts on his YouTube channel Excel on Fire .
Data cleansing is Ozs area of specialization in Excel. From his earliest days with Excel, he has found himself constantly needing to fix names that are ALLCAPS, peel addresses away from phone numbers, clean up the messes that result from extracting data from PDF files,and fixing all the many other things that prevent data from being useful.
Most recently, Oz has presented Excel topics and master classes at conferences in Amsterdam; Sofia, Bulgaria; So Paulo, Brazil; Toronto, Canada; and cities around the United States.
When Oz isnt elbows-deep in the guts of a spreadsheet, he does storytelling around Portland, Oregon. He has told stories onstage for Risk! , Pants on Fire , Seven Deadly Sins , Pickathon , The Moth , and other storytelling shows.
Acknowledgments
Thanks to Bill Jelen for giving me the opportunity to share my favorite part of Excel with you. Thanks to the teams at Microsoft who built Power Query for Excel. Special thanks to Guy Hunkin, who has taken special interest in how we use Power Query and how it can be made even better.
Ken Puls and Miguel Escobar have been especially valuable in sharing knowledge about Power Query through blog posts and open challenges that help the community make better use of this tool.
Finally, thanks to the people who have been especially supportive of my efforts over the years, including my YouTube subscribers and the community of Excel MVPs. One day were going to get this worlds data clean!
INTRODUCTION
This book provides a quick Straight-to-the-Point introduction to Power Query. Theres a lot here, and theres also a lot thats not here. Reading this book will be like sitting down at a caf and striking up a conversation with the person at the next table, where you learn enough random stuff about that person to decide if you want to know more and keep in touch. In this case, of course, youll want to know more about Power Query. And there are lots of resources for digging deeper into it.
This book is designed to give you just what you need to experiment with Power Query because thats a great way to learn. I encourage you to play! Make up your own datasets and see what happens. To download the data sets used in this book, visit https://mrx.cl/ozdatapq.
So, What Is Power Query?
A crucial piece of data science, big data, data visualization, predictive analytics, machine learning, and other hot topics is that the dirty work of data cleansing must be performed before the glamorous work can start. Original data comes to us in a variety of messy formats, some of which are deliberate. Data may be sprinkled in multiple locations, such as stacked in columns; you might find a persons ID number in one source and her name in another source; columns of data may not match up; and you may encounter dozens of other messy situations that require cleanup.
Power Query helps you retrieve , cleanse , and transform data to make it useful. Power Query also automates tedious processes that might consume days each week if handled manually.
When I Became a Believer
One afternoon I was working with a clients project, and the data that came to me looked like the data you see below, on the left. This data shows hours worked at a law firm, and you can see that Mel worked 130 hours for Joram in February. Theres a lot of useful data here. The problem is that its in a report form for one specific view.
I wanted to convert the original matrix to sortable columns and rows, like what you see on the right. By making the data sortable, I knew I could access other things:
- Each attorneys totals by monthno matter for which law partner (which might be important if I needed to sum all the hours Mel worked for two partners over the course of three months)
- Who has worked over 550 hours, if anyone
- Each law partners total consumption of hours, by month
I thought about all the possible ways I could make this data sortable, such as by using clumsy formulas or dragging the data around manually. I remembered a YouTube video that MrExcel made on unwinding data, and I thought about looking that up. And then I remembered seeing the term unpivot somewhere in Power Query. Since the matrix looked like a pivot table that needed to be unpivoted, I wondered if thats what I needed. Yes! Unpivot was the solution, and it worked instantly!
Thats the day I embraced the gospel of Power Query.
FIRST, SOME BACKGROUND FOR POWER QUERY
Before we dive into Power Query examples, there are a few things you need to know.
Excel Versions
How you access Power Query depends on the version of Excel youre running:
- In Excel 2016, Power Query is in the Data tab of the ribbon: Its the unintuitively named Get Data option in the Get & Transform Data group. (I wont get into the history of why this ribbon option is not actually called Power Query, but Ill assure you that Microsoft is gradually working the term Power Query back into its products.)
- In Excel 2010 and 2013, Power Query is an add-in that must be downloaded from the Microsoft website: https://www.microsoft.com/en-us/download/details.aspx?id=39379.
- Currently, Power Query is not available for Mac versions of Excel.
Everything Happens via Queries
It is not likely that your source data is in Power Query. Data is typically imported into Power Query from various sources such as tables, other workbooks, folders, databases, Facebook, and SharePoint.