Hands-On Computer Vision with Detectron2
Develop object detection and segmentation models with a code and visualization approach
Van Vung Pham
BIRMINGHAM - MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Dhruv J. Kataria
Content Development Editor: Shreya Moharir
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Jyoti Chauhan
Marketing Coordinators: Shifa Ansari, Vinishka Kalra
First published: April 2023
Production reference: 1290323
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80056-162-5
www.packtpub.com
To my father, Pham Van Hung, and my mother, Pham Thi Doai, for their sacrifices and love. To my loving wife, Thi Hong Hanh Le, for her unwavering support during this exciting and also time-consuming endeavor, to my children, Le Tra My Pham and Le Ha Chi Pham, for checking on my progress, and to my little one, Liam Le Pham, who was born while I was writing this book and brought new excitement and a source of energy for me to complete it.
Foreword
I have known and worked with Van Vung Pham for more than 10 years and was also his academic advisor for his doctoral degree. Vung won several data visualization, computer vision, and machine learning challenges during his Ph.D. program, including one in which he used Detectron2 to detect and classify road damage. In this book, Hands-On Computer Vision with Detectron2, Vung takes you on a learning journey that starts with common computer vision tasks. He then walks you through the steps for developing computer vision applications using stunning deep learning models with simple code by utilizing pre-trained models on the Detectron2 Model Zoo.
Existing models, trained on huge datasets for the most common object types, can handle common computer vision tasks. However, this book also focuses on developing computer vision applications on a custom domain for specific business requirements. To illustrate this, Vung provides the steps to collect and prepare data, train models, and fine-tune models on brain tumor datasets for object detection and instance segmentation tasks.
In his presentations and examples, Vung provides code that can be conveniently executed on Google Colab, along with visualizations that help illustrate theoretical concepts. The ability to execute the code on Google Colab helps eliminate the burden of hardware and software setup, so you can get started quickly and conveniently. The visualizations allow you to easily grasp complicated computer vision concepts, better understand deep learning architectures for computer vision tasks, and become an expert in this area.
Beyond developing deep learning models for computer vision tasks, you will learn how to deploy the trained models to various environments. Vung explains different model formats, such as TorchScript and ONNX, and their respective execution platforms and environments, such as C++ servers, web browsers, and mobile and edge devices.
Become a developer and an expert in developing and deploying computer vision applications with Detectron2.
Tommy Dang
iDVL director and assistant professor, Texas Tech University
Contributors
About the author
Van Vung Pham is a passionate research scientist in machine learning, deep learning, data science, and data visualization. He has years of experience and numerous publications in these areas. He is currently working on projects that use deep learning to predict road damage from pictures or videos taken from roads. One of the projects uses Detectron2 and Faster R-CNN to predict and classify road damage and achieves state-of-the-art results for this task. Dr. Pham obtained his Ph.D. from the Computer Science Department at Texas Tech University, Lubbock, Texas, USA. He is currently an assistant professor in the Computer Science Department at Sam Houston State University, Huntsville, Texas, USA.
I want to thank the people who have been close to me and supported me, especially my wife, Hanh, my parents, my children, and my Ph.D. advisor (Dr. Tommy Dang from Texas Tech University).
About the reviewers
Yiqiao Yin is a senior data scientist at LabCorp, an S&P 500 company, developing AI-driven solutions for drug diagnostics and development. He has a BA in mathematics and a BSc in finance from the University of Rochester. He was a Ph.D. student in statistics at Columbia University and has a wide range of research interests in representation learning: feature learning, deep learning, computer vision, and natural language processing. He has held professional positions as an enterprise-level data scientist at Bayer, a EURO STOXX 50 company; a quantitative researcher at AQR, working on alternative quantitative strategies for portfolio management and factor-based trading; and an equity trader at T3 Trading on Wall Street.
Nikita Dalvi is a highly skilled and experienced technical professional, currently pursuing a master's degree in computing and data science at Sam Houston State University. With a background in information and technology, she has honed her skills in programming languages such as Java and Python over the past five years, having worked with prestigious organizations such as Deloitte and Tech Mahindra. Driven by her passion for programming, she has taught herself new languages and technologies over the years and stayed up to date with the latest industry trends and best practices.