Practical Data Science with SAP
by Greg Foss and Paul Modderman
Copyright 2019 Greg Foss and Paul Modderman. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Editors: Jonathan Hassell and Nicole Tache
- Production Editor: Nan Barber
- Copyeditor: Jasmine Kwityn
- Proofreader: Charles Roumeliotis
- Indexer: WordCo Indexing Services, Inc.
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Rebecca Demarest
- September 2019: First Edition
Revision History for the First Edition
- 2019-09-17: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492046448 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Practical Data Science with SAP, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility ...
Preface
The future of data science and artificial intelligence has never looked brighter. AI now beats humans at games ranging from twitchy, reflexive Pong to deep, contemplative Go. Deep learning models recognize objects nearly as well as humans. Some even say self-driving cars perform better than their distracted human counterparts. The past decade's massive gains in data volume, storage capacity, and computing power have enabled rapid advances in data science.
And of course technology has spread into every facet of your business (from finance and sales to production and logistics). However, is each part of your business turbocharged by data science and AI? Likely not. As wonderful as these technologies might be, if you are not designing a self-driving car or predicting customer behavior, you are probably not using them.
Many organizations have access to business data from an enterprise resource planning (ERP) system such as SAP, and yours is likely no different. Data coming from a business system such as SAP is largely clean, because validations and checks are often in place before it is allowed to save to the database (and cleaning the data is one of the most essential and least rewarding tasks of a data scientist). This means ERP data in SAP is ripe for the picking, and data science is here to do the harvesting!
Let's take a hypothetical scenario. The SAP Team at Big Bonanza Warehouse is in a constant state of process improvement. They know how to configure their SAP system to do the tasks their users want, and they play that system like a fiddle, dutifully taking requests and delivering solutions. However, there is a bit of a problem with reporting and analytics; they have a data warehouse and a business intelligence system, but developing reports is a multimonth process. The team often resorts to using standard ALV (ABAP List Viewer) reports, which are quite limited in power because they require a developer to code; in addition, it is very hard to harness the wealth of public data that could be used in conjunction with SAP. Just like at countless other enterprises, SAP data at Big Bonanza Warehouse is an island, siloed within its own system. Teams that don't work with SAP have no idea what's in there, and the teams that do work with it spend so much time maintaining the systems that they don't get the chance to look outside them.
SAP data shouldn't be an island, though. The team knows their data, how to find it, and what they want to do with it. However, when it comes to analyzing that data, everyone's hands are tied by that multimonth report development process.
Sound familiar? It's the story at nearly every SAP shop we've ever worked with. And that's a lot in our combined 30+ years of experience.
We want to give that SAP team (and yours!) some modern insight: tools and techniques they can use without defining data cubes, data warehouse objects, or learning complex frontend reports. In this book, we'll present simple scenarios such as dumping data straight out of SAP into a flat file and into a reporting tool. This is useful for ad hoc reporting and investigations. We'll also consider more complex scenarios, including using extractor tools and neural network models in the cloud to analyze data in ways not possible within SAP or contemporary data warehouses.
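To give a feel for the simplest of these scenarios, here is a minimal Python sketch of working with a flat file once it is out of the system. It is only an illustration under stated assumptions: the tab-delimited layout, the column names (customer, material, net_value), and the sample rows are all invented for this example and do not come from a real SAP export, and the helper total_by_customer is a hypothetical name.

```python
import csv
import io
from collections import defaultdict

# Hypothetical flat-file export. The column names and values are invented
# for illustration; a real export would have its own layout.
SAMPLE_EXPORT = """customer\tmaterial\tnet_value
C100\tM-01\t250.00
C100\tM-02\t100.50
C200\tM-01\t75.25
"""


def total_by_customer(flat_file):
    """Sum the net_value column per customer from a tab-delimited file object."""
    totals = defaultdict(float)
    reader = csv.DictReader(flat_file, delimiter="\t")
    for row in reader:
        totals[row["customer"]] += float(row["net_value"])
    return dict(totals)


# io.StringIO stands in for an opened file so the sketch is self-contained.
totals = total_by_customer(io.StringIO(SAMPLE_EXPORT))
for customer, value in sorted(totals.items()):
    print(f"{customer}: {value:.2f}")
```

With a real file, the io.StringIO stand-in would be replaced by open() on the exported file, and the delimiter and column names adjusted to match whatever the export actually contains.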
How to Read This Book
You'll need to approach this book from a conceptual point of view. We present alternative techniques for analyzing business data. We ask (nay, we beg) the reader to think about business data (in particular SAP data) in new and interesting ways. This book is designed to awaken ideas around how to bridge the gap between your particular business data and the advances in data science. You need not be an expert in the complex algorithms that calculate gradient descent in a neural network, nor do you need to be an expert in your business data. But you do need to have a desire to straddle these two camps and have fun in the process.
From the data scientist's perspective, the data science principles in this book are an introduction. If you can spot a sigmoid, tanh, or ReLU activation function at fifty paces, you can skip those parts. But we're betting that if your guru level is that high in data science, you're a novice at the SAP stuff. Focus on the SAP stories, which show you how to pull data out of the system and how to work with the real business data in it.
From the SAP professional's perspective, you'll break out of traditional reporting and analytics models. You'll learn to think of business applications and reporting in machine and deep learning terms. This may sound mystical, but by the end of the book you will have the tools necessary to take this step. Along the way you'll automatically detect anomalies in sales data, predict the future from past data, process text as natural language, segment customers into smart groups, visualize all these things brilliantly, and teach bots to use business data.
In our world of AI and data science, asking the same old questions of your data is stale, naive, and (quite frankly) boring. We want you to ask questions of your data that you didn't even know you could ask. Maybe the price of tea in China really does have an outsize effect on your sales.
From the developer's perspective, you'll be inspired to learn wonderful programming languages like Python and R. We don't teach you these languages, but we challenge you to dip your toe into these warm and effervescent waters. If you are already an experienced R or Python developer, you're in good shape for the code sections. For the novice, we will point you to resources to get you started. Don't feel left out if you are inclined to use another language such as Java. The meta goal of this book is to get you to think about business data differently, and if that means you want to use Java, by all means do so.
Operationalizing data science is a whole book in itself. We'll frequently touch on how to operationalize ideas we present, but it is beyond the scope of this book to dive deep on creating robust pipelines.