Pro Data Mashupfor Power BIPowering Up with Power Queryand the M Language to Find, Load,and Transform DataAdam AspinPro Data Mashup for Power BI: Powering Up with Power Query and the M Languageto Find, Load, and Transform Data Adam Aspin STAFFORD, UK ISBN-13 (pbk): 978-1-4842-8577-0 ISBN-13 (electronic): 978-1-4842-8578-7 https://doi.org/10.1007/978-1-4842-8578-7 Copyright 2022 by Adam Aspin This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made.
The publisher makes no warranty, express or implied, with respect to the material contained herein. Managing Director, Apress Media LLC: Welmoed Spahr Acquisitions Editor: Jonathan Gennick Development Editor: Laura Berendson Coordinating Editor: Jill Balzano Cover photo by Christopher Burns on Unsplash Distributed to the book trade worldwide by Springer Science+Business Media New York, 1 New York Plaza, Suite 4600, New York, NY 10004-1562, USA. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail orders-ny@ springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation. For information on translations, please e-mail booktranslations@springernature.com; for reprint, paperback, or audio rights, please e-mail bookpermissions@springernature.com.
Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at http://www.apress.com/bulk-sales. Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the books product page, located at www.apress.com. For more detailed information, please visit http://www.apress.com/source-code. Printed on acid-free paper Table of Contents About the Author xiii About the Technical Reviewer xv Acknowledgments xvii Introduction xix Chapter 1: Discovering and Loading Data with Power BI Desktop 1 The Data Load Process 2 Understanding Data Load 7 The Navigator Dialog 8 Searching for Usable Data 10 Display Options 11 Refresh 11 The Navigator Data Preview 12 Modifying Data 13 Data Sources 14 The Power BI Desktop Screen 16 The Get Data Dialog 16 Conclusion 19 Chapter 2: Discovering and Loading File-Based Data with Power BI Desktop 21 File Sources 21 Loading Data from Files 23 CSV Files 23 Text Files 29 Text and CSV Options 31 Simple XML Files 32 iii Table of ConTenTs Excel Files 34 Microsoft Access Databases 37 PDF Files 39 JSON Files 41 Conclusion 41 Chapter 3: Loading Data from Databases and Data Warehouses 43 Relational Databases 44 SQL Server 47 Automatically Loading Related Tables 52 Database Options 52 Searching for Tables 56 Oracle Databases 63 Other Relational Databases 68 Microsoft SQL Server Analysis Services Data Sources 71 Analysis Services Cube Tools 74 SSAS Tabular Data Warehouses 77 Import or Connect Live/DirectQuery 80 Other Database Connections 80 Conclusion 80 Chapter 4: DirectQuery and Connect Live 83 DirectQuery and Connect Live 83 Microsoft SQL Server Data 86 SQL Server Analysis Services Dimensional Data 93 Microsoft SQL Server Analysis Services Tabular Data Sources 96 DirectQuery with Non-Microsoft Databases 99 DirectQuery and In-Memory Tables 100 DirectQuery and Refreshing the Data 101 DirectQuery Optimization 101 iv Table of ConTenTs Modifying Connections 102 Changing Permissions 105 Conclusion 108 Chapter 5: Loading Data from the Web and the Cloud 109 Web and Cloud Services 110 Web Pages 110 Online Services 110 Microsoft Azure 112 Web Pages 113 Advanced Web Options 116 Table View or Web View 117 Salesforce 118 Loading Data from Salesforce Objects 118 Salesforce Reports 125 Microsoft Dynamics 365 126 Google Analytics 131 OData Feeds 134 OData Options 135 Azure SQL Database 136 Azure Synapse Analytics 142 Connecting to SQL Server on an Azure Virtual Machine 145 Azure Blob Storage 147 Azure Databricks 151 Azure Security 154 Conclusion 154 Chapter 6: Loading Data from Other Data Sources 155 Other Sources 155 Power BI Datasets 156 Power BI Dataflows 158 v Table of ConTenTs R Scripts 161 R Options 165 Python Scripts to Load Data 166 Python Options 169 Dataverse 170 ODBC Sources 171 Refreshing Data 181 Refreshing the Entire Data in the Power BI Desktop In- Memory Model 181 Refreshing an Individual Table 182 Adding Your Own Data 183 Conclusion 185 Chapter 7: Power Query 187 Power BI Desktop Queries 188 Editing Data After a Data Load 189 Transforming Data Before Loading 192 Transform or Load? 194 Power Query 195 The Applied Steps List 197 Power Query Ribbons 197 The View Ribbon 205 The Tools Ribbon 207 The Help Ribbon 208 Viewing a Full Record 208 Power Query Context Menus 209 Conclusion 211 Chapter 8: Structuring Data 213 Dataset Shaping 214 Renaming Columns 215 Reordering Columns 215 Removing Columns 217 vi Table of ConTenTs Choosing Columns 218 Merging Columns 220 Going to a Specific Column 223 Removing Records 225 Rows 226 Removing Duplicate Records 230 Sorting Data 231 Reversing the Row Order 233 Undoing a Sort Operation 233 Filtering Data 234 Selecting Specific Values 235 Finding Elements in the Filter List 236 Filtering Text Ranges 238 Filtering Numeric Ranges 238 Filtering Date and Time Ranges 239 Filtering Data 241 Applying Advanced Filters 243 Excluding Rows Where a Value Is Missing 245 Grouping Records 246 Simple Groups 247 Complex Groups 249 Saving Changes in Power Query 252 Exiting Power Query 252 Conclusion 253 Chapter 9: Shaping Data 255 Merging Data 256 Adding Data 256 Aggregating Data During a Merge Operation 261 Types of Join 265 Joining on Multiple Columns 266 Fuzzy Matching 268 vii Table of ConTenTs Fuzzy Matching Options 269 Merge As New Query 270 Preparing Datasets for Joins 271 Correct and Incorrect Joins 271 Examining Joined Data 272 Appending Data 274 Adding the Contents of One Query to Another 274 Appending the Contents of Multiple Queries 277 Changing the Data Structure 279 Unpivoting Tables 279 Pivoting Tables 282 Transposing Rows and Columns 284 Data Quality Analysis 284 Column Quality 285 Column Distribution 285 Column Profile 286 Profiling the Entire Dataset 288 Correcting Anomalies 289 Data Transformation Approaches 291 Conclusion 291 Chapter 10: Data Cleansing 293 Using the First Row As Headers 294 Changing Data Type 295 Detecting Data Types 298 Data Type Indicators 298 Switching Data Types 300 Data Type Using Locale 301 Replacing Values 302 Transforming Column Contents 305 Text Transformation 305 Adding a Prefix or a Suffix 307 viii Table of ConTenTs Removing Leading and Trailing Spaces 307 Removing Nonprinting Characters 308 Number Transformations 309 Conclusion 319 Chapter 11: Data Transformation 321 Filling Down Empty Cells 322 Extracting Part of a Columns Contents 326 Advanced Extract Options 327 Duplicating Columns 330 Splitting Columns 330 Splitting Column by a Delimiter 331 Advanced Options for Delimiter Split 334 Splitting Columns by Number of Characters 335 Splitting Columns by Character Switch 337 Merging Columns 337 Creating Columns from Examples 339 Adding Conditional Columns 341 Index Columns 344 Conclusion 345 Chapter 12: Complex Data Structures 347 Adding Multiple Files from a Source Folder 348 Filtering Source Files in a Folder 351 Displaying and Filtering File Attributes 355 The List Tools Transform Ribbon 357 Parsing XML Data from a Column 358 Parsing JSON Data from a Column 361 Complex JSON Files 363 Complex XML Files 365 Python and R Scripts 369 Using Python Scripts to Modify Data 370 ix Table of ConTenTs Using R Scripts to Modify Data 373 Convert a Column to a List 375 Query Folding 376 Reusing Data Sources 380 Pinning a Data Source 383 Copying Data from Power Query 384 Conclusion 385 Chapter 13: Organizing, Managing, and Parameterizing Queries 387 Managing the Transformation Process 388 Modifying a Step 389 Renaming a Step 390 Deleting a Step or a Series of Steps 390 Modifying an Existing Step 392 Adding a Step 395 Altering Process Step Sequencing 395 An Approach to Sequencing 395 Error Records 397 Removing Errors 397 Managing Queries 398 Organizing Queries 398 Grouping Queries 399 Duplicating Queries 402 Referencing Queries 402 Documenting Queries 405 Adding a Column As a New Query 406 Enabling Data Load 408 Enabling Report Refresh 409 Pending Changes 410 Parameterizing Queries 410 Creating a Simple Parameter 411 Creating a Set of Parameter Values 413 x Table of ConTenTs Creating a Query-Based Parameter 416 Modifying a Parameter 419 Applying a Parameter When Filtering Records 420 Modifying the Current Value of a Parameter 422 Applying a Parameter in a Search and Replace 423 Applying a Parameter to a Data Source 424 Other Uses for Parameters 427 Using Parameters in the Data Source Step 428 Applying a Parameter to a SQL Query 429 Query Icons 431 Power BI Templates with Parameters 431 Conclusion 432 Chapter 14: The M Language 433 What Is the M Language? 434 M and the Power Query Editor 435 Modifying the Code for a Step 436 M Expressions 438 Writing M by Adding Custom Columns 439 The Advanced Editor 441 Expressions in the Advanced Editor 442 The Let Statement 443 Modifying M in the Advanced Editor 444 Syntax Checking 445 Advanced Editor Options 446 Basic M Functions 447 Text Functions 448 M or DAX? 450 Number Functions 450 Date Functions 452 Time Functions 454 xi Table of ConTenTs Duration Functions 454 M Concepts 455 M Data Types 456 M Values 458 Defining Your Own Variables in M 458 Writing M Queries 459 M Autocomplete 460 Lists 462 Creating Lists Manually 462 Generating Sequences Using Lists 464 Accessing Values from a List 465 List Functions 465 Records 466 Tables 468 Other Function Areas 470 Custom Functions in M 470 Adding Comments to M Code 472 Single-Line Comments 473 Multiline Comments 473 Conclusion 473 Appendix A: Sample Data 475 Sample Data 475 Downloading the Sample Data 475 Index 477 xii
Next page