
Charu C. Aggarwal - Machine Learning for Text


  • Book: Machine Learning for Text
  • Author: Charu C. Aggarwal
  • Publisher: Springer
  • Year: 2018

Machine Learning for Text: summary, description and annotation


Text analytics is a field that lies at the interface of information retrieval, machine learning, and natural language processing, and this textbook carefully covers a coherently organized framework drawn from these intersecting topics. The chapters of this textbook are organized into three categories:
  • Basic algorithms: Chapters 1 through 7 discuss the classical algorithms for machine learning from text, such as preprocessing, similarity computation, topic modeling, matrix factorization, clustering, classification, regression, and ensemble analysis.
  • Domain-sensitive mining: Chapters 8 and 9 discuss learning methods for text when combined with different domains, such as multimedia and the Web. The problem of information retrieval and Web search is also discussed in the context of its relationship with ranking and machine learning methods.
  • Sequence-centric mining: Chapters 10 through 14 discuss various sequence-centric and natural language applications, such as feature engineering, neural language models, deep learning, text summarization, information extraction, opinion mining, text segmentation, and event detection.
This textbook covers machine learning topics for text in detail. Since the coverage is extensive, multiple courses can be offered from the same book, depending on course level. Even though the presentation is text-centric, Chapters 3 to 7 cover machine learning algorithms that are often used in domains beyond text data. Therefore, the book can be used to offer courses not just in text analytics but also from the broader perspective of machine learning (with text as a backdrop). This textbook targets graduate students in computer science, as well as researchers, professors, and industrial practitioners working in these related fields. This textbook is accompanied by a solution manual for classroom teaching.


Springer International Publishing AG, part of Springer Nature 2018
1. Machine Learning for Text: An Introduction
Charu C. Aggarwal, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
The first forty years of life give us the text; the next thirty supply the commentary on it. (Arthur Schopenhauer)
1.1 Introduction
The extraction of useful insights from text with various types of statistical algorithms is referred to as text mining, text analytics, or machine learning from text. The choice of terminology largely depends on the base community of the practitioner. This book will use these terms interchangeably. Text analytics has become increasingly popular in recent years because of the ubiquity of text data on the Web, social networks, emails, digital libraries, and chat sites. Some common examples of sources of text are as follows:
  1. Digital libraries: Electronic content has outstripped the production of printed books and research papers in recent years. This phenomenon has led to the proliferation of digital libraries, which can be mined for useful insights. Some areas of research such as biomedical text mining specifically leverage the content of such libraries.
  2. Electronic news: An increasing trend in recent years has been the de-emphasis of printed newspapers and a move towards electronic news dissemination. This trend creates a massive stream of news documents that can be analyzed for important events and insights. In some cases, such as Google News, the articles are indexed by topic and recommended to readers based on past behavior or specified interests.
  3. Web and Web-enabled applications: The Web is a vast repository of documents that is further enriched with links and other types of side information. Web documents are also referred to as hypertext. The additional side information available with hypertext can be useful in the knowledge discovery process. In addition, many Web-enabled applications, such as social networks, chat boards, and bulletin boards, are a significant source of text for analysis.
    • Social media: Social media is a particularly prolific source of text because of the open nature of the platform, in which any user can contribute. Social media posts are unique in that they are often short and contain non-standard acronyms, which merit specialized mining techniques.
Numerous applications exist in the context of the types of insights one is trying to discover from a text collection. Some examples are as follows:
  • Search engines are used to index the Web and enable users to discover Web pages of interest. A significant amount of work has been done on crawling, indexing, and ranking tools for text data.
  • Text mining tools are often used to filter spam or identify interests of users in particular topics. In some cases, email providers might use the information mined from text data for advertising purposes.
  • Text mining is used by news portals to organize news items into relevant categories. Large collections of documents are often analyzed to discover relevant topics of interest, and these learned categories are then used to classify incoming streams of documents.
  • Recommender systems use text mining techniques to infer interests of users in specific items, news articles, or other content. These learned interests are used to recommend news articles or other content to users.
  • The Web enables users to express their interests, opinions, and sentiments in various ways. This has led to the important area of opinion mining and sentiment analysis. Such opinion mining and sentiment analysis techniques are used by marketing companies to make business decisions.
The area of text mining is closely related to that of information retrieval, although the latter topic focuses on the database management issues rather than the mining issues. Because of the close relationship between the two areas, this book will also discuss some of the information retrieval aspects that are either considered seminal or are closely related to text mining.
The ordering of words in a document provides a semantic meaning that cannot be inferred from a representation based on only the frequencies of words in that document. Nevertheless, it is still possible to make many types of useful predictions without inferring the semantic meaning. There are two feature representations that are popularly used in mining applications:
  1. Text as a bag-of-words: This is the most commonly used representation for text mining. In this case, the ordering of the words is not used in the mining process. The set of words in a document is converted into a sparse multidimensional representation, which is leveraged for mining purposes. Therefore, the universe of words (or terms) corresponds to the dimensions (or features) in this representation. For many applications such as classification, topic modeling, and recommender systems, this type of representation is sufficient.
  2. Text as a set of sequences: In this case, the individual sentences in a document are extracted as strings or sequences. Therefore, the ordering of words matters in this representation, although the ordering is often localized within sentence or paragraph boundaries. A document is often treated as a set of independent and smaller units (e.g., sentences or paragraphs). This approach is used by applications that require greater semantic interpretation of the document content. This area is closely related to that of language modeling and natural language processing. The latter is often treated as a distinct field in its own right.
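To make the contrast between the two representations concrete, here is a minimal Python sketch. It assumes a deliberately naive tokenizer (lowercase alphabetic runs matched by a regular expression) and a naive sentence splitter that breaks after `.`, `!`, or `?`; the helper names `bag_of_words` and `as_sentences` are hypothetical and are not taken from the book or from any library.

```python
import re
from collections import Counter


def bag_of_words(doc):
    """Map a document to a sparse {term: frequency} dictionary.

    Only terms that actually occur are stored, so the representation stays
    sparse no matter how large the vocabulary is. Word order is discarded.
    """
    tokens = re.findall(r"[a-z]+", doc.lower())  # naive tokenizer (assumption)
    return dict(Counter(tokens))


def as_sentences(doc):
    """Split a document into ordered token sequences, one per sentence."""
    # Naive splitter (assumption): break on whitespace that follows . ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    return [re.findall(r"[a-z]+", s.lower()) for s in sentences if s]


doc_a = "The dog bites the man. The man runs."
doc_b = "The man bites the dog. The man runs."

# The bags of words are identical: the ordering information is lost.
print(bag_of_words(doc_a) == bag_of_words(doc_b))  # True
# The sequence view keeps word order within each sentence, so they differ.
print(as_sentences(doc_a) == as_sentences(doc_b))  # False
print(as_sentences(doc_a))  # [['the', 'dog', 'bites', 'the', 'man'], ['the', 'man', 'runs']]
```

Storing only the observed terms in a dictionary is one simple way to keep the bag-of-words representation sparse; practical systems usually rely on a dedicated sparse matrix format and more careful tokenization.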
Text mining has traditionally focused on the first type of representation, although recent years have seen an increasing amount of attention on the second representation. This is primarily because of the increasing importance of artificial intelligence applications in which language semantics, reasoning, and understanding are required. For example, question-answering systems, which require a greater degree of understanding and reasoning, have become increasingly popular in recent years.
It is important to be cognizant of the sparse and high-dimensional characteristics of text when treating it as a multidimensional data set. This is because the dimensionality of the data equals the number of distinct words in the vocabulary, which is typically large. Furthermore, most of the word frequencies (i.e., feature values) are zero, because each document contains only a small subset of the vocabulary. Therefore, multidimensional mining methods need to account for the sparse and high-dimensional nature of the text representation for best results. The sparsity is not always a disadvantage. In fact, some models, such as the linear support vector machines discussed in Chap., are inherently suited to sparse and high-dimensional data.
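The degree of sparsity can be measured directly. The sketch below, assuming the same naive regex tokenizer and a hypothetical helper named `sparsity`, counts how many cells of the document-term matrix would be nonzero without ever building the matrix. The three toy documents are illustrative only.

```python
import re


def sparsity(docs):
    """Return (vocabulary size, fraction of zero cells) of the document-term matrix.

    The matrix is never materialized: the number of nonzero cells in a row is
    simply the number of distinct terms occurring in that document.
    """
    term_sets = [set(re.findall(r"[a-z]+", d.lower())) for d in docs]
    vocabulary = set().union(*term_sets)
    total_cells = len(docs) * len(vocabulary)
    if total_cells == 0:  # no documents or no terms: nothing to measure
        return 0, 0.0
    nonzero_cells = sum(len(terms) for terms in term_sets)
    return len(vocabulary), 1.0 - nonzero_cells / total_cells


docs = [
    "Text mining extracts insights from text.",
    "Search engines index and rank Web pages.",
    "Spam filters classify email messages.",
]
dim, zero_fraction = sparsity(docs)
print(dim, round(zero_fraction, 3))  # prints: 17 0.667
```

In this toy collection no term is shared between documents, so two thirds of the cells are already zero. By the same arithmetic, a document with a few hundred distinct terms over a vocabulary of 100,000 terms has more than 99% zero entries, which is why sparse storage and sparsity-aware models matter.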
This book will cover a wide variety of text mining algorithms, such as latent factor modeling, clustering, classification, retrieval, and various Web applications. The discussion in most of the chapters is self-sufficient, and it does not assume a background in data mining or machine learning other than a basic understanding of linear algebra and probability. In this chapter, we will provide an overview of the various topics covered in this book, and also provide a mapping of these topics to the different chapters.