Take the Data Science Salary and Tools Survey
As data analysts and engineers, as professionals who like nothing better than petabytes of rich data, we find ourselves in a strange spot: we know very little about ourselves. But that's changing. This salary and tools survey is the third in an annual series. To keep the insights flowing, we need one thing: PEOPLE LIKE YOU TO TAKE THE SURVEY.
Anonymous and secure, the survey will continue to provide insight into the demographics, work environments, tools, and compensation of practitioners in our field. We hope you'll consider it a civic service. We hope you'll participate today.
2015 Data Science Salary Survey
Tools, Trends, What Pays (and What Doesn't) for Data Professionals
John King & Roger Magoulas
The authors gratefully acknowledge the contribution of Owen S. Robbins and Benchmark Research Technologies, Inc., who conducted the original 2012/2013 Data Science Salary Survey referenced in the article.
Editor: Shannon Cutt
Designer: Ellie Volckhausen
Production Manager: Dan Fauxsmith
Copyright 2015 O'Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles.
November 15, 2013: First Edition
November 13, 2014: Second Edition
September 2, 2015: Third Edition
REVISION HISTORY FOR THE THIRD EDITION
2015-09-02: First Release
While the publisher and the author(s) have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author(s) disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Executive Summary
NOW IN ITS THIRD EDITION, the 2015 version of the Data Science Salary Survey explores patterns in tools, tasks, and compensation through the lens of clustering and linear models. The research is based on data collected through an online 32-question survey, including demographic information, time spent on various data-related tasks, and the use/non-use of 116 software tools. Over 600 respondents from a variety of industries completed the survey, two-thirds of whom are based in the United States.
Key findings include:
The same four tools (SQL, Excel, R, and Python) remain at the top for the third year in a row
Spark (and Scala) use has grown tremendously from last year, and their users tend to earn more
Using last year's data for comparison, R is now used by more data professionals who otherwise tend to use commercial tools
Inversely, R is no longer used as frequently by data practitioners who use other open source tools such as Python or Spark
Salaries in the software industry are highest
Even when all other variables are held equal, women are paid thousands less than their male counterparts
Cloud computing (still) pays
About 40% of variation in respondents' salaries can be attributed to other pieces of data they provided
We invite you to not only read the report but participate: try plugging your own information into one of the linear models to predict your own salary. And, of course, the survey is open for the 2016 report. Spend just 5 to 10 minutes and take the anonymous salary survey here: http://www.oreilly.com/go/ds-salary-survey-2016. Thank you!
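As a rough illustration of what "plugging your own information into a linear model" involves, and of what it means for a model to account for a share of salary variation, here is a minimal Python sketch. It is not the report's model: the intercept, the coefficients, and the feature names (`years_experience`, `uses_tool_x`) are hypothetical placeholders chosen only to make the arithmetic visible, and the helper functions `predict_salary` and `r_squared` are names introduced here for illustration.

```python
# Illustrative only: every number below is a made-up placeholder,
# NOT a coefficient or salary taken from the survey.

def predict_salary(intercept, coefficients, respondent):
    """Add the intercept to coefficient * value for each feature supplied."""
    total = intercept
    for feature, value in respondent.items():
        # Features without a coefficient contribute nothing to the estimate.
        total += coefficients.get(feature, 0.0) * value
    return total


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical model: placeholder intercept and per-feature contributions (USD).
HYPOTHETICAL_INTERCEPT = 50_000.0
HYPOTHETICAL_COEFFICIENTS = {
    "years_experience": 1_000.0,  # added per year of experience
    "uses_tool_x": 5_000.0,       # added if the tool is used (1) vs. not (0)
}

respondent = {"years_experience": 4, "uses_tool_x": 1}
estimate = predict_salary(
    HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFFICIENTS, respondent
)
print(estimate)  # 59000.0
```

Under this reading, a model that explains about 40% of salary variation would give `r_squared` a value near 0.4 when its predictions are compared with respondents' reported salaries. The remaining variation is not captured by the survey variables.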
Introduction
FOR THE THIRD YEAR RUNNING, we at O'Reilly Media have collected survey data from data scientists, engineers, and others in the data space about their skills, tools, and salary. Some of the same patterns we saw last year are still present: newer, scalable open source tools in general correlate with higher salaries, and Spark in particular continues to establish itself as a top tool. Much of this is apparent from other sources: large software companies that traditionally produced only proprietary software have begun to embrace open source; Spark courses, training programs, and conference talks have sprung up in great numbers. But who actually uses which tools (and are the old ones really disappearing)? Which tools do the highest earners use, and is it fair to attribute a particular variation in salary to using a certain tool? We hope that the findings in this iteration of the Data Science Salary Survey will go beyond what is already obvious to any data scientist or Strata attendee.
Preliminaries
This report is based on an online survey open from November 2014 to July 2015, publicized to the O'Reilly audience but open to anyone who had the link. Of the 820 respondents who answered at least one question, about a quarter dropped out before completing the survey and have been excluded from all segments of analysis except for those showing responses to single questions. We should be careful when drawing conclusions from a self-selecting survey sample (it is a major assumption to claim it is an unbiased representation of all data scientists and engineers), but with a little knowledge about our audience, the information in this report should be sufficiently qualified to be useful. As is clear from the survey results, the O'Reilly audience tends to use more of the newer, open source tools, and underrepresents non-tech industries such as insurance and energy. O'Reilly content (in books, online, and at conferences) is focused on technology, in particular new technology, so it makes sense that our audience would tend to be early adopters of some of the newer tools.
A final word on the self-selecting nature of the sample: differences between results in this survey and other surveys may simply arise from the sample's idiosyncrasies and not from any meaningful difference. Findings from other salary survey reports (there have been a few recently in the data space) sometimes conflict directly with our findings, but this doesn't necessarily imply that one set of findings is erroneous. Likewise, discrepancies between our own salary surveys don't necessarily imply a trend. The methodology of this year's survey is close enough to last year's to allow us to draw some conclusions from year-to-year differences, but only when the numbers are very strong.
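The respondent counts quoted in this section can be cross-checked with a line of arithmetic. The sketch below takes "about a quarter" literally as 25%, which is an assumption made only for illustration. The result is consistent with the figure of over 600 completed responses.

```python
# Figure from the text: 820 respondents answered at least one question.
started = 820

# Assumption: "about a quarter dropped out" is treated as exactly 25%.
assumed_dropout_rate = 0.25

# Estimated number of respondents who completed the survey.
completed_estimate = round(started * (1 - assumed_dropout_rate))
print(completed_estimate)  # 615
```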