Contents in Detail
PRACTICAL SQL
A Beginners Guide to Storytelling with Data
by Anthony DeBarros
San Francisco
PRACTICAL SQL. Copyright 2018 by Anthony DeBarros.
All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.
ISBN-10: 1-59327-827-6
ISBN-13: 978-1-59327-827-4
Publisher: William Pollock
Production Editor: Janelle Ludowise
Cover Illustration: Josh Ellingson
Interior Design: Octopod Studios
Developmental Editors: Liz Chadwick and Annie Choi
Technical Reviewer: Josh Berkus
Copyeditor: Anne Marie Walker
Compositor: Janelle Ludowise
Proofreader: James Fraleigh
For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900;
www.nostarch.com
Library of Congress Cataloging-in-Publication Data
Names: DeBarros, Anthony, author.
Title: Practical SQL : a beginner's guide to storytelling with data / Anthony
DeBarros.
Description: San Francisco : No Starch Press, 2018. | Includes index.
Identifiers: LCCN 2018000030 (print) | LCCN 2017043947 (ebook) | ISBN
9781593278458 (epub) | ISBN 1593278454 (epub) | ISBN 9781593278274
(paperback) | ISBN 1593278276 (paperback) | ISBN 9781593278458 (ebook)
Subjects: LCSH: SQL (Computer program language) | Database design. | BISAC:
COMPUTERS / Programming Languages / SQL. | COMPUTERS / Database Management
/ General. | COMPUTERS / Database Management / Data Mining.
Classification: LCC QA76.73.S67 (print) | LCC QA76.73.S67 D44 2018 (ebook) |
DDC 005.75/6--dc23
LC record available at https://lccn.loc.gov/2018000030
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The information in this book is distributed on an As Is basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
About the Author
Anthony DeBarros is an award-winning journalist who has combined avid interests in data analysis, coding, and storytelling for much of his career. He spent more than 25 years with the Gannett company, including the Poughkeepsie Journal, USA TODAY, and Gannett Digital. He is currently senior vice president for content and product development for a publishing and events firm and lives and works in the Washington, D.C., area.
About the Technical Reviewer
Josh Berkus is a hacker emeritus for the PostgreSQL Project, where he served on the Core Team for 13 years. He was also a database consultant for 15 years, working with PostgreSQL, MySQL, CitusDB, Redis, CouchDB, Hadoop, and Microsoft SQL Server. Josh currently works as a Kubernetes community manager at Red Hat, Inc.
BRIEF CONTENTS
FOREWORD
When people ask which programming language I learned first, I often absent-mindedly reply, Python, forgetting that it was actually with SQL that I first learned to write code. This is probably because learning SQL felt so intuitive after spending years running formulas in Excel spreadsheets. I didnt have a technical background, but I found SQLs syntax, unlike that of many other programming languages, straightforward and easy to implement. For example, you run SELECT * on a SQL table to make every row and column appear. You simply use the JOIN keyword to return rows of data from different related tables, which you can then further group, sort, and analyze.
Im a graphics editor, and Ive worked as a developer and journalist at a number of publications, including POLITICO, Vox, and USA TODAY. My daily responsibilities involve analyzing data and creating visualizations from what I find. I first used SQL when I worked at The Chronicle of Higher Education and its sister publication, The Chronicle of Philanthropy. Our team analyzed data ranging from nonprofit financials to faculty salaries at colleges and universities. Many of our projects included as much as 20 years worth of data, and one of my main tasks was to import all that data into a SQL database and analyze it. I had to calculate the percent change in fundraising dollars at a nonprofit or find the median endowment size at a university to measure an institutions performance.
I discovered SQL to be a powerful language, one that fundamentally shaped my understanding of what you canand cantdo with data. SQL excels at bringing order to messy, large data sets and helps you discover how different data sets are related. Plus, its queries and functions are easy to reuse within the same project or even in a different database.
This leads me to Practical SQL. Looking back, I wish Id read , could be used to easily calculate a median in PostgresSQLwith the added bonus that it also finds your datas natural breaks or quantiles.
But at that stage in my career, I was only scratching the surface of SQLs capabilities. It wasnt until 2014, when I became a data developer at Gannett Digital on a team led by Anthony DeBarros, that I learned to use PostgreSQL. I began to understand just how enormously powerful SQL was for creating a reproducible and sustainable workflow.
When I met Anthony, he had been working at USA TODAY and other Gannett properties for more than 20 years, where he had led teams that built databases and published award-winning investigations. Anthony was able to show me the ins and outs of our teams databases in addition to teaching me how to properly build and maintain my own. It was through working with Anthony that I truly learned how to code.
One of the first projects Anthony and I collaborated on was the 2014 U.S. midterm elections. We helped build an election forecast data visualization to show USA TODAY readers the latest polling averages, campaign finance data, and biographical information for more than 1,300 candidates in more than 500 congressional and gubernatorial races. Building our data infrastructure was a complex, multistep process powered by a PostgreSQL database at its heart.
Anthony taught me how to write code that funneled all the data from our sources into a half-dozen tables in PostgreSQL. From there, we could query the data into a format that would power the maps, charts, and front-end presentation of our election forecast.
Around this time, I also learned one of my favorite things about PostgreSQLits powerful suite of geographic functions ( in this book). By adding the PostGIS extension to the database, you can create spatial data that you can then export as GeoJSON or as a shapefile, a format that is easy to map. You can also perform complex spatial analysis, like calculating the distance between two points or finding the density of schools or, as Anthony shows in the chapter, all the farmers markets in a given radius.