Natural Language Processing with Java
Second Edition
Techniques for building machine learning and neural network models for NLP
Richard M. Reese
AshishSingh Bhatia
BIRMINGHAM - MUMBAI
Natural Language Processing with Java Second Edition
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Pravin Dhandre
Acquisition Editor: Divya Poojari
Content Development Editor: Eisha Dsouza
Technical Editor: Jovita Alva
Copy Editor: Safis Editing
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Jisha Chirayil
Production Coordinator: Shraddha Falebhai
First published: March 2015
Second edition: July 2018
Production reference: 1300718
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78899-349-4
www.packtpub.com
To my parents, Smt. Ravindrakaur Bhatia and S. Tej Singh Bhatia, and to my brother, S. Ajit Singh Bhatia, for guiding, motivating, and supporting me when it was required most. To my friends, who are always there, and especially to Mr. Mitesh Soni, for the support and inspiration to write.
AshishSingh Bhatia
About the authors
Richard M. Reese has worked in both industry and academia. For 17 years, he worked in the telephone and aerospace industries, serving in several capacities, including research and development, software development, supervision, and training. He currently teaches at Tarleton State University. Richard has written several Java books and a book on C pointers. He uses a concise and easy-to-follow approach to teaching. His Java books have addressed EJB 3.1, updates to Java 7 and 8, certification, functional programming, jMonkeyEngine, and natural language processing.
AshishSingh Bhatia is a learner, reader, seeker, and developer at heart. He has over 10 years of IT experience in different domains, including banking, ERP, and education. He is passionate about Python, Java, R, and web and mobile development, and he is always ready to explore new technologies.
I would like to first and foremost thank my loving parents and friends for their continued support, patience, and encouragement.
About the reviewers
Doug Ortiz is an experienced enterprise cloud, big data, data analytics, and solutions architect who has designed, developed, re-engineered, and integrated enterprise solutions. His other expertise is in Amazon Web Services, Azure, Google Cloud, business intelligence, Hadoop, Spark, NoSQL databases, and SharePoint, to mention just a few.
He is the founder of Illustris, LLC, and is reachable at dougortiz@illustris.org.
Huge thanks to my wonderful wife, Milla, as well as Maria, Nikolay, and our children for all their support.
Paraskevas V. Lekeas received his PhD and MS in CS from the NTUA, Greece, where he conducted his postdoc on algorithmic engineering, and he also holds degrees in math and physics from the University of Athens. He was a professor at the TEI of Athens and the University of Crete before taking an internship at the University of Chicago. He has extensive experience in knowledge discovery and engineering, having addressed many challenges for startups and for corporations using a diverse arsenal of tools and technologies. He leads the data group at H5, helping H5 advance in innovative knowledge discovery.
Preface
Natural Language Processing (NLP) allows you to take any sentence and identify patterns, special names, company names, and more. The second edition of Natural Language Processing with Java teaches you how to perform language analysis with the help of Java libraries, while constantly gaining insights from the outcomes.
You'll start by understanding how NLP and its various concepts work. Having got to grips with the basics, you'll explore important tools and libraries in Java for NLP, such as CoreNLP, OpenNLP, Neuroph, and Mallet. You'll then start performing NLP on different inputs and tasks, such as tokenization, model training, part-of-speech tagging, and parse trees. You'll also learn about statistical machine translation, summarization, dialog systems, complex searches, and supervised and unsupervised NLP.