
Khaled El Emam - Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data



Book:
Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data
Author:
Khaled El Emam / Lucy Mosquera / Richard Hoptroff
Publisher:
O'Reilly Media, Inc.
Genre:
Books / Politics
Year:
2020


Practical Synthetic Data Generation: Balancing Privacy and the Broad Availability of Data: summary, description and annotation


Building and testing machine learning models requires access to large and diverse data. But where can you find usable datasets without running into privacy issues? This practical book introduces techniques for generating synthetic data (fake data generated from real data) so you can perform secondary analysis to do research, understand customer behaviors, develop new products, or generate new revenue. Data scientists will learn how synthetic data generation provides a way to make such data broadly available for secondary purposes while addressing many privacy concerns. Analysts will learn the principles and steps for generating synthetic data from real datasets. And business leaders will see how synthetic data can help accelerate time to a product or solution.

This book describes:

- Steps for generating synthetic data using multivariate normal distributions
- Methods for distribution fitting covering different goodness-of-fit metrics
- How to replicate the simple structure of original data
- An approach for modeling data structure to consider complex relationships
- Multiple approaches and metrics you can use to assess data utility
- How analysis performed on real data can be replicated with synthetic data
- Privacy implications of synthetic data and methods to assess identity disclosure



Practical Synthetic Data Generation

by Khaled El Emam, Lucy Mosquera, and Richard Hoptroff

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Jonathan Hassell
Development Editor: Corbin Collins
Production Editor: Christopher Faucher
Copyeditor: Piper Editorial
Proofreader: JM Olejarz
Indexer: Potomac Indexing, LLC
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Jenny Bergman

May 2020: First Edition

Revision History for the First Edition

2020-05-19: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492072744 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Practical Synthetic Data Generation, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-492-07274-4

[LSI]

Preface

Interest in synthetic data has been growing rapidly over the last few years. This interest has been driven by two simultaneous trends. The first is the demand for large amounts of data to train and build artificial intelligence and machine learning (AIML) models. The second is recent work that has demonstrated effective methods for generating high-quality synthetic data. Both have resulted in the recognition that synthetic data can solve some difficult problems quite effectively, especially within the AIML community. Companies like NVIDIA, IBM, and Alphabet, as well as agencies such as the US Census Bureau, have adopted different types of data synthesis methodologies to support model building, application development, and data dissemination.

This book provides you with a gentle introduction to methods for the following: generating synthetic data, evaluating the data that has been synthesized, understanding the privacy implications of synthetic data, and implementing synthetic data within your organization. We show how synthetic data can accelerate AIML projects. Some of the problems that can be tackled by having synthetic data would be too costly or dangerous to solve using more traditional methods (e.g., training models controlling autonomous vehicles), or simply cannot be done otherwise. We also explain how to assess the privacy risks from synthetic data, even though they tend to be minimal if synthesis is done properly.

While we want this book to be an introduction, we also want it to be applied. Therefore, we will discuss some of the issues that will be encountered with real data, not curated or cleaned data. Real data is complex and messy, and data synthesis needs to be able to work within that context.

Our intended audience is analytics leaders who are responsible for enabling AIML model development and application within their organizations, as well as data scientists who want to learn how data synthesis can be a useful tool for their work. We will use examples of different types of data synthesis to illustrate the broad applicability of this approach. Our main focus here is on the synthesis of structured data.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

O'Reilly Online Learning

For more than 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/practical-synthetic-data-generation.

Email to comment or ask technical questions about this book.

For news and information about our books and courses, visit http://oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Watch us on YouTube: http://youtube.com/oreillymedia

Acknowledgments

The preparation of this book benefited from a series of interviews with subject matter experts. I would like to thank the following individuals for making themselves available to discuss their experiences and thoughts on the synthetic data market and technology: Fernanda Foertter, Jim Karkanias, Alexei Pozdnoukhov, Rev Lebaradian, John Ashley, Rob Csonger, and Simson Garfinkel.

Rob Csonger and his team provided the content for the section on autonomous vehicles.

Mike Hintze from Hintze Law LLC prepared the legal analysis in the identity disclosure chapter.

We wish to thank Janice Branson for reviewing earlier versions of the manuscript.

Our clients and collaborators, who often give us challenging problems, have been key to driving our innovations in the methods of data synthesis and the implementation of the technology in practice.

Chapter 1. Introducing Synthetic Data Generation

We start this chapter by explaining what synthetic data is and its benefits. Artificial intelligence and machine learning (AIML) projects run in various industries, and the use cases that we include in this chapter are intended to give a flavor of the broad applications of data synthesis. We define an AIML project quite broadly as well, to include, for example, the development of software applications that have AIML components.

Defining Synthetic Data

At a conceptual level, synthetic data is not real data, but data that has been generated from real data and that has the same statistical properties as the real data. This means that if an analyst works with a synthetic dataset, they should get analysis results similar to what they would get with real data. The degree to which a synthetic dataset is an accurate proxy for real data is a measure of


