Overview
Documents such as the Book of Kells, a manuscript written around 800 AD, are impressive examples of how much time and effort went into creating early textual documents. Books were expensive to produce and, consequently, owning a large book collection was considered both a source of knowledge and a symbol of great power and wealth. This changed significantly in the fifteenth century, when the German goldsmith Johannes Gutenberg triggered the printing revolution with his invention of mechanical movable-type printing. Suddenly, distributing and owning text documents was no longer a privilege of the wealthier parts of society, as his invention allowed for the mass production and spread of printed documents. His invention is undeniably one of the most important events in modern history, since it laid the foundations for our knowledge-based society. Two more recent inventions changed the way we create, distribute, and interact with textual documents even further. The introduction of the computer allowed us to create documents in digital format, enabling us to make multiple copies of the same textual document without any loss of quality. The second important invention was the Internet, which allowed us to easily distribute these digital documents. Given that widespread access to the Internet allows almost everyone to create and share text, it is a logical consequence that we are facing an ever-increasing amount of information in textual form. In fact, as of September 2014, more than one billion websites, with even more webpages, were available online. This constant influx of information is often referred to as information overload, since the sheer amount of information being created is impossible for the average user to process. Therefore, approaches and methods need to be developed that support us in finding the right information in this ocean of data.
In the first part of this book, we present four use cases that center around helping users overcome the information overload they are facing. We focus on three different approaches: (1) analyzing textual documents to provide a summarized view of the documents' content, (2) providing semantically enriched access to information, and (3) easing access to information by aggregating documents.
In Chap. , Ploch approaches the information overload challenge in the context of online news. Today, there is hardly a major newspaper that does not maintain an online portal. Apart from saving the costs of printing and distributing traditional newspapers, the main advantage of distributing news online is that readers can be reached almost immediately. On the one hand, the wide range of news brings many benefits to readers: they can find comprehensive information and follow the news from different perspectives. On the other hand, the growing amount of news material makes it harder to handle, which calls for tools that facilitate the consumption of news articles. In addition to reporting facts, news articles also contain opinions, which may be very important for helping readers make decisions and for public figures to manage their perception in the media. Analyzing the large number of news articles manually is next to impossible. Ploch presents how online newspaper readers can be offered a structured overview of the news. Focusing on news published in German, she illustrates how news articles can be categorized by topic and time of publication. In addition, she illustrates means to track the development of news events over time and to track opinions and resonance in the media about popular topics, persons, or organizations.
Besides professionally edited content on news portals, various alternative information sources exist on the web. Social networks and services like Twitter offer a wealth of information, as millions of users publicly exchange information. These so-called microblogs give a voice to millions of people, who often use this technology to express their opinions about brands, products, and persons. Analyzing these opinions can be of high value to companies, since knowing where a brand is popular can be an important lead for their marketing strategy. Esiyok and Albayrak discuss in Chap.  how tweets can be analyzed to identify users' opinions about brands, and they present an application that displays the popularity of brands in specific locations on a map. This helps to identify trends and trendsetters and can support marketing decisions.
Chapter focuses on a specific type of information portal. As users increasingly turn to the Internet to inform themselves about all kinds of topics, healthcare providers and governments have started setting up education campaigns on the WWW. Although healthcare providers have a particular interest in providing health information services to all their clients, immigrants have been identified as a vulnerable population cohort that benefits less from existing healthcare systems, since language and cultural barriers prevent them from using existing prevention services. Plumbaum et al. present an online health assistant that consists of three parts: (1) a multilingual health information assistant, (2) a cooking assistant, and (3) a virtual trainer. Together, these assistants form a comprehensive approach to helping people live healthier lives by giving them information about health topics, supporting healthier eating, and encouraging them to get enough exercise.
In Chap. , Gunadi and Albayrak address the information overload challenge in the workplace. They argue that the bigger a company, the more complex its IT infrastructure and, consequently, the more resources exist where employees can store information. Examples include the company's web server and internal file servers, but also employees' personal desktop computers. In their chapter, they present an information aggregation system that eases employees' task of gathering information from distributed sources. They outline the challenges that distributed information causes and present different methods to aggregate retrieval results coming from these different sources.
1. Intelligent News Aggregator for German with Sentiment Analysis
Abstract
The comprehensive supply of information from different points of view, e.g., from the thousands of news articles published online every day, is a tremendous advantage of the digital era. However, the immense amount of news material poses a significant challenge to interested readers: it is hardly possible to fully digest this wealth of information, which creates a need for systems that support intelligent news consumption. This chapter describes an approach to automatically mining opinions from topically related news article clusters. We focus our work on the extraction of quotations from German news articles and on analyzing the quotations according to the sentiments they express. Our approach is realized as a news aggregation system capable of handling real-world news streams. We describe the architecture and interface of our news aggregator, and present a rule-based method for quotation extraction as well as our supervised approach to sentiment analysis. We evaluate the implemented models on two human-annotated datasets, which can be made available upon request.
As Many Heads, So Many Opinions (Horace)
Ever since her 15th birthday, Clara had dreamed of becoming a journalist. She worked for the school newspaper and was a member of the debating society. Now, after passing her exams, Clara had finally started living her dream. She remembered the day she received the good news that her application for an internship at the local newspaper had been accepted. "That's a great chance for you, don't ruin it," her father told her. What a typical statement from her dad, she thought. Why would she ruin it? And what is there to ruin in an internship anyway? In fact, her tasks so far had seemed quite straightforward. Even Dad would be able to do that, she thought with a grin on her face. At the beginning, her tasks were restricted to copying articles and typing e-mails, but her boss quickly recognized her talent and assigned her an important research task for a special issue on espionage amongst friends, a topic that gained attention after the revelations of US whistleblower Edward Snowden in 2013. In particular, Clara and her fellow intern Martha were supposed to analyze newspaper reports about the behavior of US President Barack Obama and German Chancellor Angela Merkel before and after the revelations. The interns were to identify the politicians' attitudes toward national security-related topics and whether these had changed after the publication of the secret documents by Snowden. How had the politicians commented on the revelations and on spying in general? Did they disagree on all points, or was there also agreement? Which statements about spying among allies were the most controversial? The interns were also asked to work out whether Snowden's actions had influenced Merkel's positions on crucial election topics during her campaign for the German federal election in 2013.