Our main goal is to show new ways and means to extract reliable and valuable knowledge from the blogosphere. After an abstract view of the blogosphere from two different angles, we dive deeper into the diverse varieties of blogs and introduce some interesting ones. We then continue our journey by collecting requirements for retrieving new knowledge and by showing the path from content collection to data mining and knowledge visualization. After this, we present a tool that actively supports the extraction of knowledge and demonstrate two of its mining functionalities. Finally, we discuss our expectations for future trends in the blogosphere and in social media analytics in general.
1.3 Dimensions of Web 2.0
Various scholars [.
Fig. 1.2
The four main dimensions determining Web 2.0 (Adapted from Hoeren et al. [])
The enormous improvements in availability, speed, reliability and network bandwidth made during the past 15 years are referred to as the first dimension, net infrastructure. This dimension also covers advances in programming and software, in particular extensions to client-side scripting, which brought forth technologies such as AJAX, as well as developments in server-side programming (see 1 in Fig.).
Based on technologies such as AJAX or programming languages like Ruby, the functional dimension (see 2 in Fig. and a migration of applications from the desktop to the web. Because users no longer need to install, update or patch software locally, software can now be obtained over the Internet as Software as a Service (SaaS). However, such services also mean that data resides on the web, with all the associated debates about data privacy.
The third dimension (see Fig. ].
Finally, the social dimension of Web 2.0 embraces different formats and services that support communication, interaction and collaboration between users (see 4 in Fig. ].
1.4 Web 2.0 and Weblogs
The terminology and visions of the early debates around the new Internet can in fact already be found in publications from the Internet's early popularization phase in the nineties. Even back then, the Internet, along with all the services and forms of communication it supported, was already recognized as a revolutionary phenomenon that could fundamentally change social communication. Those visions ranged from a possible revitalization of the public [] declared, with reference to the metaphor of the virtual agora, that such a development would enable the public to make many-voiced statements directly, without detours via any kind of intermediary.
The central arguments in this debate were largely similar to those in the discussion about participative journalism: the vision of a stronger involvement of recipients in all public communication, and the overcoming of intermediaries in favor of direct and unlimited communication. Brecht's utopia from the thirties, of turning the prevailing distribution system into a communication system in view of the then-emerging radio technology, has finally come true [].
In other words, the Web benefits heavily from user contributions and user-generated content (UGC) []. This development is closely associated with the frequently predicted collapse of traditional journalism, which is already becoming increasingly obsolete now that anyone can become their own reporter or commentator.
Particularly popular among Web 2.0 formats, and widely credited with the potential to provide direct and unlimited communication, are weblogs, commonly known as blogs. A blog has a journal-like structure containing several articles, called posts, ordered by their entry date. Each post consists of a title, a publication date, and its main content. The author of a blog, called a blogger, has two schemes for organizing the collection of posts: categories and tags. Categories introduce a hierarchical sorting schema that enables the author to group posts together. Tags, in contrast, are keywords attached to an individual post that highlight particular aspects of it and make it easier to find.
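The post, category and tag structure described above maps naturally onto a small data model. The following Python sketch is purely illustrative: the class and method names (`Category`, `Post`, `Blog`, `timeline`, `in_category`, `tagged`) are hypothetical, and the newest-first ordering of the timeline is an assumption about how the journal view is typically displayed, not something stated in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Category:
    """A node in the blogger's hierarchical category tree (hypothetical model)."""
    name: str
    parent: Optional[Category] = None

    def path(self) -> str:
        # Walk up to the root to render the full path, e.g. "Tech/Web 2.0"
        parts = []
        node: Optional[Category] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def is_within(self, other: Category) -> bool:
        # True if `other` is this category or one of its ancestors
        node: Optional[Category] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class Post:
    title: str
    published: date
    content: str
    category: Optional[Category] = None
    tags: set[str] = field(default_factory=set)  # flat keywords, no hierarchy


@dataclass
class Blog:
    blogger: str
    posts: list[Post] = field(default_factory=list)

    def timeline(self) -> list[Post]:
        # Journal-like view ordered by entry date, newest first (assumed order)
        return sorted(self.posts, key=lambda p: p.published, reverse=True)

    def in_category(self, cat: Category) -> list[Post]:
        # Includes posts filed under any subcategory of `cat`
        return [p for p in self.posts
                if p.category is not None and p.category.is_within(cat)]

    def tagged(self, tag: str) -> list[Post]:
        # Tags are matched exactly; they carry no hierarchy
        return [p for p in self.posts if tag in p.tags]


# Example usage with made-up posts
tech = Category("Tech")
web2 = Category("Web 2.0", parent=tech)
blog = Blog("alice", [
    Post("Hello", date(2014, 1, 5), "First post", category=tech, tags={"intro"}),
    Post("AJAX notes", date(2014, 3, 2), "Client-side scripting",
         category=web2, tags={"ajax", "web"}),
])
print([p.title for p in blog.timeline()])
print([p.title for p in blog.in_category(tech)])
print(web2.path())
```

Modelling categories as a parent-linked tree and tags as a flat set mirrors the distinction drawn in the text: a category query also returns posts filed under its subcategories, while a tag query matches only the exact keyword.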
Weblogging systems are specialized but easy-to-use Content Management Systems (CMS) with a strong focus on updatable content, social interaction, and interoperability with other web authoring systems. The technical solutions agreed upon among developers of weblogging systems are a fine example of how new, innovative conventions and best practices can be developed on top of existing standards set by the World Wide Web Consortium and the community. Applications like these, which offer a simpler mode of participation in today's Internet than earlier, traditional web applications, are now described as Web 2.0 applications, while the concurrently developing Participation Internet has so far been referred to as Web 2.0 [].
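One widely used example of such interoperability conventions is the RSS 2.0 feed, an XML document through which a weblogging system publishes its latest posts so that other tools can consume them. The Python sketch below, using only the standard library, reads such a feed into simple post records. The feed contents, the `blog.example.org` URLs and the `parse_rss` helper are hypothetical, and real-world feeds (including those in the Atom format) may need more defensive handling.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed of the kind a weblogging system publishes for each blog
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://blog.example.org/</link>
    <item>
      <title>AJAX notes</title>
      <link>https://blog.example.org/ajax-notes</link>
      <pubDate>Sun, 02 Mar 2014 10:00:00 +0000</pubDate>
      <category>Web 2.0</category>
    </item>
    <item>
      <title>Hello</title>
      <link>https://blog.example.org/hello</link>
      <pubDate>Sun, 05 Jan 2014 09:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text: str) -> list[dict]:
    """Turn the <item> elements of an RSS 2.0 feed into simple post records."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        return []
    posts = []
    for item in channel.findall("item"):
        raw_date = item.findtext("pubDate")
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS 2.0 dates follow RFC 822, which the email module can parse
            "published": parsedate_to_datetime(raw_date) if raw_date else None,
            "categories": [c.text for c in item.findall("category")],
        })
    return posts


for post in parse_rss(SAMPLE_FEED):
    print(post["published"].date(), post["title"], post["categories"])
```

Because every conforming blog exposes the same feed structure, a collector like this can gather posts from many different weblogging systems without platform-specific code, which is the kind of interoperability the paragraph describes.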
Since the end of the 1990s, weblogs have evolved into an essential component of today's cyber culture [.
Fig. 1.3
Blog writing usage trends, 2006-2014. Blogging shows no signs of slowing its growing prominence in popular culture and society. The number of blogs increased from 35.77 million (2006) to 260.47 million (2014) []
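To put the numbers in Fig. 1.3 into perspective, the two endpoints can be converted into an average yearly growth rate. This is only a rough illustration, assuming both figures count blogs in the same way and span eight years (2006 to 2014); it says nothing about how growth was distributed across the individual years.

```latex
\[
\text{CAGR} = \left(\frac{260.47}{35.77}\right)^{1/8} - 1
            \approx 7.28^{1/8} - 1
            \approx 0.28
\]
```

In other words, the number of blogs grew roughly sevenfold overall, corresponding to an average increase of about 28% per year.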
Meanwhile, the point of origin of weblogs is hard to define, since their potential areas of application are numerous: they range from personal diaries, through knowledge and activity management platforms in private and business contexts alike, to content-related and journalistic web offerings [].