Emil Hvitfeldt - Supervised Machine Learning for Text Analysis in R (Chapman & Hall/CRC Data Science Series)

Supervised Machine Learning for Text Analysis in R

First edition published 2022

by CRC Press

6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press

2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

© 2022 Taylor & Francis Group, LLC

CRC Press is an imprint of Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-0-367-55418-7 (hbk)

ISBN: 978-0-367-55419-4 (pbk)

ISBN: 978-1-003-09345-9 (ebk)

DOI: 10.1201/9781003093459

Typeset in LMR10 font

by KnowledgeWorks Global Ltd.

In loving memory of my mother-in-law Lisa, who was the first soul to hear about and fully encourage the idea that eventually became this book. E.H.

For Grace, Violet, and Lewis, who (thanks to the pandemic and remote school) had a front row seat to most of my work on this book. J.S.

Preface

Modeling as a statistical practice can encompass a wide variety of activities. This book focuses on supervised or predictive modeling for text, using text data to make predictions about the world around us. We use the tidymodels framework for modeling, a consistent and flexible collection of R packages developed to encourage good statistical practice.

Supervised machine learning using text data involves building a statistical model to estimate some output from input that includes language. The two types of models we train in this book are regression and classification. Think of regression models as predicting numeric or continuous outputs, such as predicting the year of a United States Supreme Court opinion from the text of that opinion. Think of classification models as predicting outputs that are discrete quantities or class labels, such as predicting whether a GitHub issue is about documentation or not from the text of the issue. Models like these can be used to make predictions for new observations, to understand what features or characteristics contribute to differences in the output, and more. We can evaluate our models using performance metrics to determine which are best, which are acceptable for our specific context, and even which are fair.
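The difference between the two model types can be sketched in a few lines. The example below is a minimal illustration, not the approach used in this book: it relies on base R's lm() and glm() instead of tidymodels so that it has no dependencies, and the data frames, column names, and counts are all invented. For brevity it scores each model on its own training data, which overstates performance.

```r
# Hypothetical toy data: one row per document, one count-based text feature.
# All values are invented for illustration.
opinions <- data.frame(
  word_count = c(0, 1, 4, 3, 0, 5, 1, 4, 2, 2),
  year       = c(1990, 1975, 2010, 2005, 1985, 2015, 1980, 2012, 1995, 2000)
)
issues <- data.frame(
  word_count = c(0, 1, 4, 3, 0, 5, 1, 4, 2, 2),
  is_doc     = c(0, 0, 1, 1, 0, 1, 1, 0, 0, 1)  # 1 = about documentation
)

# Regression: the output is numeric (the year of an opinion).
reg_fit  <- lm(year ~ word_count, data = opinions)
reg_pred <- predict(reg_fit, newdata = opinions)
rmse     <- sqrt(mean((opinions$year - reg_pred)^2))  # error in years

# Classification: the output is a class label (documentation or not).
clf_fit   <- glm(is_doc ~ word_count, data = issues, family = binomial())
clf_prob  <- predict(clf_fit, newdata = issues, type = "response")
clf_label <- ifelse(clf_prob > 0.5, "documentation", "other")
accuracy  <- mean((clf_label == "documentation") == (issues$is_doc == 1))
```

The regression model is scored with an error in the units of the output (RMSE, in years), while the classifier is scored by how often its predicted label matches the true label.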

Text data is important for many domains, from healthcare to marketing to the digital humanities, but specialized approaches are necessary to create features (predictors) for machine learning from language.

Natural language that we as speakers and/or writers use must be dramatically transformed to a machine-readable, numeric representation to be ready for computation. In this book, we explore typical text preprocessing steps from the ground up and consider the effects of these steps. We also show how to fluently use the textrecipes R package to prepare text data within a modeling pipeline.
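To make "numeric representation" concrete, here is a minimal sketch of one common transformation: lowercasing, tokenizing on whitespace, and counting terms per document. It deliberately uses only base R and does not show the textrecipes API. The three example sentences and all variable names are invented for illustration.

```r
# Three hypothetical documents.
docs <- c(
  "The court affirmed the decision.",
  "Update the README documentation!",
  "The documentation describes the court decision."
)

# 1. Normalize: lowercase, then replace anything that is not a-z or a space.
clean <- gsub("[^a-z ]", " ", tolower(docs))

# 2. Tokenize: split on runs of spaces and drop any empty strings.
tokens <- lapply(strsplit(clean, " +"), function(x) x[x != ""])

# 3. Build a sorted vocabulary of every distinct token.
vocab <- sort(unique(unlist(tokens)))

# 4. Count each vocabulary term in each document.
#    Result: one row per document, one column per term.
dtm <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
))
colnames(dtm) <- vocab
```

With these inputs, dtm is a 3 x 8 integer matrix whose rows can be used as predictors. Each choice made along the way (dropping punctuation, lowercasing, the definition of a token) changes the resulting features.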

Silge and Robinson (2017) provides a practical introduction to text mining with R using tidy data principles, based on the tidytext package. If you have already started on the path of gaining insight from your text data, a next step is using that text directly in predictive modeling. Text data contains within it latent information that can be used for insight, understanding, and better decision-making, and predictive modeling with text can bring that information and insight to light. If you have already explored how to analyze text as demonstrated in Silge and Robinson (2017), this book will move one step further to show you how to learn and make predictions from that text data with supervised models. If you are unfamiliar with this previous work, this book will still provide a robust introduction to how text can be represented in useful ways for modeling and a diverse set of supervised modeling approaches for text.


Outline

The book is divided into three sections. We make a (perhaps arbitrary) distinction between machine learning methods and deep learning methods by defining deep learning as any kind of multilayer neural network (LSTM, bi-LSTM, CNN) and machine learning as anything else (regularized regression, naive Bayes, SVM, random forest). We make this distinction both because these different methods use separate software packages and modeling infrastructure, and from a pragmatic point of view, it is helpful to split up the chapters this way.

  • Natural language features: How do we transform text data into a representation useful for modeling? In these chapters, we explore the most common preprocessing steps for text, when they are helpful, and when they are not.

  • Machine learning methods: We investigate the power of some of the simpler and more lightweight models in our toolbox.

  • Deep learning methods: Given more time and resources, we see what is possible once we turn to neural networks.

Some of the topics in the second and third sections overlap as they provide different approaches to the same tasks.

Throughout the book, we will demonstrate with examples and build models using a selection of text data sets. A description of these data sets can be found in .

We use three kinds of info boxes throughout the book to invite attention to notes and other ideas.

Some boxes call out warnings or possible problems to watch out for.

Boxes marked with hexagons highlight information about specific R packages and how they are used. We use bold for the names of R packages.

Topics this book will not cover

This book serves as a thorough introduction to prediction and modeling with text, along with detailed practical examples, but there are many areas of natural language processing we do not cover. The CRAN Task View on Natural Language Processing provides details on other ways to use R for computational linguistics. Specific topics we do not cover include:

  • Reading text data into memory: Text data may come to a data practitioner in any of a long list of heterogeneous formats. Text data exists in PDFs, databases, plain text files (single or multiple for a given project), websites, APIs, literal paper, and more. The skills needed to access and sometimes wrangle text data sets so that they are in memory and ready for analysis are so varied and extensive that we cannot hope to cover them in this book. We point readers to R packages such as
