Practical Weak Supervision
by Wee Hyong Tok, Amit Bahree, and Senja Filipi
Copyright 2022 Wee Hyong Tok, Amit Bahree, and Senja Filipi. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Acquisitions Editor: Jonathan Hassell
- Development Editor: Jeff Bleiel
- Production Editor: Kristen Brown
- Copyeditor:
- Proofreader:
- Indexer:
- Interior Designer: David Futato
- Cover Designer: Karen Montgomery
- Illustrator: Kate Dullea
- October 2021: First Edition
Revision History for the Early Release
- 2021-04-02: First Release
- 2021-05-11: Second Release
- 2021-06-22: Third Release
- 2021-07-23: Fourth Release
- 2021-08-13: Fifth Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492077060 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Practical Weak Supervision, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
978-1-492-07706-0
[LSI]
Preface
Getting quality labeled data for supervised learning is an important step toward training performant machine learning models. In many real-world projects, labeling data takes up a significant amount of time. Weak Supervision is emerging as an important catalyst that enables data science teams to fuse insights from heuristics and crowdsourcing to produce weakly labeled datasets, which can then be used as inputs for machine learning and deep learning tasks.
Who Should Read This Book
The primary audience for this book is professional and citizen data scientists who are already working on machine learning projects and face the typical challenge of getting good-quality labeled data for these projects. Readers should have a working knowledge of the Python programming language and be familiar with common machine learning libraries and tools.
Navigating This Book
This book is organized roughly as follows:
provides a basic introduction to the field of Weak Supervision, and how data scientists and machine learning engineers can use it as part of the data science process.
discusses how to get started with Snorkel for weak supervision and introduces the concepts of data programming with Snorkel.
describes how to use Snorkel for labeling, and provides code examples showing how to use Snorkel to label text and image datasets.
Chapters are included to give practitioners an end-to-end understanding of how to use a weakly labeled dataset for text and image classification.
discusses the practical considerations for using Snorkel with large datasets, and how to use Spark clusters to scale labeling.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
Tip
This element signifies a tip or suggestion.
Note
This element signifies a general note.
Warning
This element indicates a warning or caution.
Using Code Examples
All the code in the book is available in the following GitHub repository: https://bit.ly/WeakSupervisionBook. The code shown in the chapters is correct, but it is a subset of the overall codebase and is meant to outline the concepts. To run the code yourself, we encourage you to clone the book's GitHub repository.
If you have a technical question or a problem using the code examples, please send email to .
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: Practical Weak Supervision by Wee Hyong Tok, Amit Bahree, and Senja Filipi (O'Reilly). Copyright 2022 Wee Hyong Tok, Amit Bahree, and Senja Filipi, 978-1-492-07706-0.
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .
O'Reilly Online Learning
For more than 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, visit http://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
- O'Reilly Media, Inc.
- 1005 Gravenstein Highway North
- Sebastopol, CA 95472
- 800-998-9938 (in the United States or Canada)
- 707-829-0515 (international or local)
- 707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at