Go Web Scraping Quick Start Guide
Implement the power of Go to scrape and crawl data from the web
Vincent Smith
BIRMINGHAM - MUMBAI
Go Web Scraping Quick Start Guide
Copyright 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Pavan Ramchandani
Acquisition Editor: Aditi Gour
Content Development Editor: Smit Carvalho
Technical Editor: Surabhi Kulkarni
Copy Editor: Safis Editing
Project Coordinator: Pragati Shukla
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Alishon Mendonsa
Production Coordinator: Jyoti Chauhan
First published: January 2019
Production reference: 1290119
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78961-570-8
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why subscribe?
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Packt.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the author
Vincent Smith has been a software engineer for 10 years, having worked in fields ranging from health and IT to machine learning and large-scale web scrapers. He has worked for both Fortune 500 companies and start-ups, and has sharpened his skills by taking the best from both worlds. While obtaining a degree in electrical engineering, he learned the foundations of writing good code through his Java courses. Early in his professional career, these basics led him into software development as a way to support his team. He fell in love with the process of teaching computers how to behave, which set him on the path he still walks today.
I would like to first and foremost thank my parents and my wife for supporting me in writing this book, and believing that I actually do have something to share. I would like to thank my co-workers, past and present, for being a shining example that impostor syndrome is all in your head and you should always share your knowledge. You were not born with your knowledge, so be the one that someone else can learn from.
About the reviewer
Ladjimi Chiheb Eddine is a professional Python/Django developer with extensive knowledge of Ethereum, Solidity, Go, PostgreSQL, and Bitcoin. He is an open source enthusiast who helps people on Stack Overflow and many Q&A forums by answering their questions.
Currently, he resides in Paris, where he works as a senior Python/Django developer.
I would like to thank my family and all those who love me for their support over the years. Without them, I would not have been able to find the strength to continue my work and improve my skills.
Packt is searching for authors like you
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Preface
The internet is a place full of interesting information and insights just waiting to be gleaned. Much like golden nuggets, these fragmented pieces of data can be collected, filtered, combined, and refined to produce extremely valuable products. Armed with the right knowledge, skills, and a little creativity, you can build a web scraper that can power multi-billion-dollar companies. To support this, you need to use the best tools for the job, starting with a programming language built for speed, simplicity, and safety.
The Go programming language combines the best ideas from its predecessors and cutting-edge ideology, leaving out the unnecessary fluff, to produce a razor-sharp set of tools and clean architecture. With the Go standard library and projects from open source contributors, you have everything you need to build a web scraper of any size.
Who this book is for
This book is for anyone with a little coding experience who is curious about how to build a web scraper that is fast and efficient.
What this book covers
Chapter 1, Introducing Web Scraping and Go, explains what web scraping is and how to install the Go programming language and tools.
Chapter 2, The Request/Response Cycle, outlines the structure of HTTP requests and responses, and explains how to use Go to make and process them.
Chapter 3, Web Scraping Etiquette, explains how to build a web scraper that follows best practices and recommendations for crawling the web efficiently while respecting others.
Chapter 4, Parsing HTML, shows how to use various tools to parse information from HTML pages.
Chapter 5, Web Scraping Navigation, demonstrates the best ways to navigate websites efficiently.
Chapter 6, Protecting Your Web Scraper, explains how to use various tools to navigate through the internet safely and securely.