fastText Quick Start Guide
Get started with Facebook's library for text representation and classification
Joydeep Bhattacharjee
BIRMINGHAM - MUMBAI
Copyright 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Sunith Shetty
Acquisition Editor: Reshma Raman
Content Development Editor: Aditi Gour
Technical Editor: Vaibhav Dwivedi
Copy Editor: Safis Editing
Project Coordinator: Hardik Bhinde
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Jason Monteiro
Production Coordinator: Deepika Naik
First published: July 2018
Production reference: 1240718
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78913-099-7
www.packtpub.com
"To my wife, Saionee, for patiently hearing my ideas and giving me advice, support, and motivation, to get this book done, and to my mom, father-in-law, and mother-in-law for their love and support."
- Joydeep Bhattacharjee
Contributors
About the author
Joydeep Bhattacharjee is a Principal Engineer who works for Nineleaps Technology Solutions. After graduating from the National Institute of Technology at Silchar, he started working in the software industry, where he stumbled upon Python. Through Python, he stumbled upon machine learning. Now he primarily develops intelligent systems that can parse and process data to solve challenging problems at work. He believes in sharing knowledge and loves mentoring in machine learning. He also maintains a machine learning blog on Medium.
I'd like to thank Sherin Thomas for help on PyTorch, and Deepayan Das and Kalyan Ram for their help on the Android sections.
Thanks to the Packt team for believing in me, to Reshma Raman for pushing me on the initial drafts, to Aditi Gour for the helpful advice, reviews, and for coordinating the whole process, and to Vaibhav Dwivedi for the final technical reviews and for bringing the book to publication.
About the reviewer
Krishna Modi is Managing Director at CodeAngle Technologies Pvt. Ltd.
Before starting his company, he worked as a consultant with enterprises such as Cisco Systems and Hewlett Packard Enterprise. He was also an SME for Cloud and Infrastructure Management at Fork Media Pvt. Ltd. Krishna has an interest in data science and systems programming. He is a registered teaching faculty member with ICAI and has been a speaker at various national and international technical sessions hosted by ICAI.
Krishna enjoys writing for technical magazines such as PC Quest and Open Source For You, and has contributed multiple articles on technology updates to AskMen India Edition.
I have to start by thanking my parents; gaining in-depth knowledge of the topic would not have been possible without their constant support and encouragement. Special thanks to the superwoman in my life, Roshni, my beloved wife. She was as important to this book getting done as I was.
Thanks to everyone on the publishing team for choosing me for this task.
Thank you, readers, for making this a success.
Preface
fastText is a state-of-the-art tool for performing text classification and building efficient word representations. It is open source and was designed at the Facebook Artificial Intelligence Research (FAIR) lab. It is written in C++, and Python wrappers are also available.
This book has the ambitious goal of covering all the techniques and know-how that you need to build NLP applications in the real world. It also covers the algorithms on which fastText is built, so that you will clearly understand the contexts in which you can expect the best results from fastText.
Who this book is for
This book will benefit you if you are a software developer or machine learning engineer trying to understand the state of the art in NLP. A large part of the book deals with real-life problems and considerations for creating an NLP pipeline. If you are an NLP researcher, there is a lot of value here because you will learn about the internal algorithms and the design considerations behind the fastText software. All the code examples are written in Jupyter Notebooks. I highly recommend you type them out, change them, and tinker with them. Keep the code handy so that you can use it later in your actual projects.
What this book covers
The chapter Introducing FastText presents fastText and the NLP context in which the library is useful. It outlines the motivations behind building the library, its intended usage, and the benefits its creators aimed to bring to NLP and the field of computational linguistics. It also gives specific instructions for installing fastText on your work machine. Upon completion of this chapter, you will have fastText installed and running on your computer.