1.1 Information Search in Times of Big Data
Information search constitutes an integral part of almost everybody's everyday life. Today's web search engines manage to rank the most relevant result highest for a large fraction of the information needs implied by search queries. Following Manning et al. (2008), an information need can be seen as a topic about which a user desires to know more. A result is relevant if it yields information that helps to fulfill the information need at hand.
Fig. 1.1
Screenshot of Pentaho Big Data Analytics as an example of enterprise software. The heat grid shown visualizes the vehicle sales of a company.
Instead of directly providing relevant information, however, state-of-the-art web search engines mostly return only links to web pages that may contain relevant information, often thousands or millions of them. This can make search time-consuming or even unsuccessful for queries where relevant information has to be derived (e.g. for the query "locations of search companies"), should be aggregated (e.g. "user opinions on bing"), or seems like a needle in a haystack (e.g. "if it isn't on google it doesn't exist"), and so forth.
For enterprise environments, big data analytics applications aim to infer such high-quality information, in the sense of relations, patterns, and hidden facts, from vast amounts of data (Davenport 2012). Figure 1.1 shows an example of such an application. As with this software, big data analytics is still only on the verge of including unstructured texts in its analyses, although such texts are assumed to make up 95% of all enterprise-relevant data (HP Labs 2010). To provide answers to a wide spectrum of information needs, relevant texts must be filtered and relevant information must be identified in these texts. We hence argue that search engines and big data analytics applications need to perform more text mining.
1.1.1 Text Mining to the Rescue
Text mining brings together techniques from the research fields of information retrieval, data mining, and natural language processing in order to infer structured high-quality information from usually large numbers of unstructured texts (Ananiadou and McNaught 2005). While information retrieval deals, at its heart, with indexing and searching unstructured texts, data mining targets the discovery of patterns in structured data. Natural language processing, finally, is concerned with algorithms and engineering issues for the understanding and generation of speech and human-readable text (Tsujii 2011). It bridges the gap between the other two fields by converting unstructured into structured information. Text mining is studied within the broad interdisciplinary field of computational linguistics, as it addresses computational approaches from computer science to the processing of data and information while operationalizing findings from linguistics.
According to Sarawagi (2008), the most important text mining techniques for identifying and filtering relevant texts and information within the three fields come from the areas of information extraction and text classification. The former aims at extracting entities, relations between entities, and events the entities participate in from mostly unstructured text. The latter denotes the task of assigning a text to one or more predefined categories, such as topics, genres, or sentiment polarities. Information extraction, text classification, and similar tasks are considered in both natural language processing and information retrieval. In this book, we summarize these tasks under the term text analysis. All text analyses have in common that they can significantly increase the velocity of information search in many situations.
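To make the two tasks concrete, the following sketch implements a toy version of each in Python. The lexicons and rules are our own illustrative assumptions, not resources from the cited literature; real systems use learned models or curated resources instead.

```python
import re

# Toy lexicons; in practice these would be learned or curated resources.
ORGANIZATIONS = {"Google", "Bing"}
POSITIVE = {"great", "helpful", "relevant"}
NEGATIVE = {"slow", "useless", "irrelevant"}

def extract_organizations(text):
    """Information extraction: spot known organization entities in a text."""
    return [t for t in re.findall(r"\w+", text) if t in ORGANIZATIONS]

def classify_polarity(text):
    """Text classification: assign one of three predefined polarity categories."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(extract_organizations("Google and Bing rank results differently."))
print(classify_polarity("The answers were great and helpful, not useless."))
```

Even this crude sketch shows the structural difference between the tasks: extraction returns spans of the input, whereas classification maps the whole text to a category.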
In our past research project ArguAna, the goal was to classify and summarize opinions on products and their features found in large numbers of review texts. To this end, we analyzed the sequence of local sentiments on certain product features in each review in order to account for the argumentation of the texts.
Fig. 1.2
Google result page for the query "Charles Babbage", showing an example of directly providing relevant information instead of returning only web links.
Of course, major search engines already use text analysis when addressing information needs (Pasca 2011). For example, a Google search in late 2014 for Charles Babbage, the author of this chapter's introductory quote, led to the results in Fig. 1.2. But, in accordance with Babbage's quote, the benefit of text mining arising from the increase of velocity becomes more striking when turning from predefined text analyses in frequent use to arbitrary and more complex text analysis processes.
1.2 A Need for Efficient and Robust Text Analysis Pipelines
Text mining deals with tasks that often entail complex text analysis processes, consisting of several interdependent steps that aim to infer sophisticated information types from collections and streams of natural language input texts (cf. Chap. for details). In the research project InfexBA, different entity types (e.g. organization names) and event types (e.g. forecasts) had to be extracted from input texts and correctly brought into relation before they could be normalized and aggregated. Such steps require syntactic annotations of texts, e.g. part-of-speech tags and parse tree labels (Sarawagi 2008). These in turn can only be added to a text that is segmented into lexical units, e.g. into tokens and sentences. Similarly, text classification often relies on so-called features (Manning et al. 2008) that are derived from lexical and syntactic annotations or even from entities, as in ArguAna.
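The dependency chain just described can be sketched as follows, with each step consuming the output of the previous one. The naive sentence splitter, tokenizer, and suffix-based tagger are illustrative stand-ins for trained components, not the projects' actual algorithms.

```python
import re

def split_sentences(text):
    # Segmentation step 1: sentence boundaries (naive, punctuation-based).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Segmentation step 2: lexical units within one sentence.
    return re.findall(r"\w+|[^\w\s]", sentence)

def pos_tag(tokens):
    # Syntactic annotation: only possible once tokens exist.
    # Crude heuristic tags; a real pipeline uses a trained tagger.
    def tag(t):
        if not t[0].isalnum():
            return "PUNCT"
        if t[0].isupper():
            return "NNP"
        if t.endswith("ly"):
            return "RB"
        return "X"
    return [(t, tag(t)) for t in tokens]

text = "Google announced a forecast. Revenues rose sharply."
for sentence in split_sentences(text):
    print(pos_tag(tokenize(sentence)))
```

The point is not the quality of the individual steps but their ordering constraint: the tagger cannot run before tokenization, and tokenization cannot run before sentence splitting.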
To realize the steps of a text analysis process, text analysis algorithms are employed that annotate new information types in a text or that classify, relate, normalize, or filter previously annotated information. Such algorithms perform analyses of different computational cost, ranging from the typically cheap evaluation of single rules and regular expressions, over the matching of lexicon terms and the statistical classification of text fragments, to complex syntactic analyses like dependency parsing (Bohnet 2010). Because of the interdependencies between analyses, the standard way to realize a text analysis process is in the form of a text analysis pipeline, which sequentially applies each employed text analysis algorithm to its input.
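A minimal version of such a pipeline can be sketched as below: each algorithm reads a shared document representation and adds its own annotations, so the order of application matters. The algorithm names and the dictionary-based annotation scheme are illustrative assumptions; frameworks such as Apache UIMA realize the same idea on an industrial scale.

```python
import re

def tokenizer(doc):
    # Must run first: later algorithms depend on the token annotations.
    doc["tokens"] = re.findall(r"\w+", doc["text"])

def organization_tagger(doc):
    # Depends on tokens; uses a toy lexicon (an illustrative assumption).
    lexicon = {"Google", "Pentaho"}
    doc["organizations"] = [t for t in doc["tokens"] if t in lexicon]

class TextAnalysisPipeline:
    """Sequentially applies each employed text analysis algorithm."""

    def __init__(self, algorithms):
        self.algorithms = algorithms

    def process(self, text):
        doc = {"text": text}
        for algorithm in self.algorithms:
            algorithm(doc)  # each algorithm may rely on earlier annotations
        return doc

pipeline = TextAnalysisPipeline([tokenizer, organization_tagger])
doc = pipeline.process("Google acquired a stake in Pentaho.")
print(doc["organizations"])  # -> ['Google', 'Pentaho']
```

Reversing the algorithm list would make the tagger fail for lack of tokens, which is exactly the interdependency that motivates the pipeline architecture.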