Abstract
This book aims to present a systematic study of practices and theories for vertical search ranking. The studies in this book can be categorized into two major classes. One class is single-domain-related ranking that focuses on ranking for a specific vertical, such as news search ranking, medical domain search ranking, visual search ranking, mobile search ranking, and entity search ranking. Another class is multidomain-related ranking, which focuses on ranking that involves multiple verticals, such as multiaspect ranking, aggregating vertical search ranking, and cross-vertical ranking. This chapter discusses organization, audience, and further reading for this book.
Keywords
Vertical search ranking
news search ranking
medical domain search ranking
visual search ranking
mobile search ranking
multiaspect relevance ranking
entity ranking
aggregated vertical search
cross-vertical search ranking
1.1 Defining the Area
In the past decade, the impact of general Web search capabilities has been stunning. However, with exponential information growth on the Internet, it becomes more and more difficult for a general Web search engine to address the particular informational and research needs of niche users. As a response to the great need for deeper, more specific, more relevant search results, vertical search engines have emerged in various domains. By leveraging domain knowledge and focusing on specific user tasks, vertical search has great potential to serve users highly relevant search results from specific domains.
The core component of vertical search is relevance ranking, which has attracted more and more attention from both industry and academia during the past few years. This book aims to present a systematic study of practices and theories for vertical search ranking. The studies in this book can be categorized into two major classes. One class is single-domain-related ranking that focuses on ranking for a specific vertical, such as news search ranking and medical domain search ranking. However, in this book the term vertical has a more general meaning than topic. It refers to specific topics such as news and medical information, specific result types such as entities, and specific search interfaces such as mobile search. The second class of vertical search study covered in this book is multidomain-related ranking, which focuses on ranking involving multiple verticals, such as multiaspect ranking, aggregating vertical search ranking, and cross-vertical ranking.
1.2 The Content and Organization of This Book
This book aims to present an in-depth and systematic study of practices and theories related to vertical search ranking. The organization of this book is as follows.
The news search chapter introduces a few news search ranking approaches, including a learning-to-rank approach and a joint learning approach from clickthroughs. The chapter then describes a scalable clustering approach to group news search results.
The medical search chapter studies another important vertical search, the medical domain search. With the exponential growth of electronic health records (EHRs), it is imperative to identify effective means to help medical clinicians as well as administrators and researchers retrieve information from EHRs. Recent research advances in natural language processing (NLP) have provided improved capabilities for automatically extracting concepts from narrative clinical documents. However, before these NLP-based tools become widely available and versatile enough to handle vaguely defined information retrieval needs by EHR users, a convenient and cost-effective solution continues to be in great demand. In this chapter, we introduce the concept of medical information retrieval, which provides medical professionals a handy tool to search among unstructured clinical narratives via an interface similar to that of general-purpose Web search engines, e.g., Google. In the latter part of the chapter, we also introduce several advanced features, such as intelligent, ontology-driven medical search query recommendation services and a collaborative search feature that encourages sharing of medical search knowledge among end users of EHR search tools.
The visual search chapter introduces some fundamental and practical technologies as well as some major emerging trends in visual search ranking. The chapter first describes the generic visual search system, in which three categories of visual search are presented: text-based, query example-based, and concept-based visual search ranking. Then we describe the three categories in detail, including a review of various popular algorithms. To further improve the performance of initial search results, four paradigms of visual search reranking are presented: 1) self-reranking, which focuses on detecting relevant patterns from initial search results without any external knowledge; 2) example-based reranking, in which the query examples are provided by users so that the relevant patterns can be discovered from these examples; 3) crowd-reranking, which mines relevant patterns from crowd-sourcing information available on the Web; and 4) interactive reranking, which utilizes user interaction to guide the reranking process. In addition, we also discuss the relationship between learning and visual search, since most recent visual search ranking frameworks are developed based on machine learning technologies. Last, we conclude with several promising directions for future research.
The mobile search chapter introduces mobile search ranking. The wide availability of Internet access on mobile devices, such as phones and personal media players, has allowed users to search and access Web information while on the go. The availability of continuous fine-grained location information on these devices has enabled mobile local search, which employs user location as a key factor to search for local entities (e.g., a restaurant, store, gas station, or attraction), to take over a significant part of the query volume. This is also evidenced by the rising popularity of location-based search engines on mobile devices, such as Bing Local, Google Local, Yahoo! Local, and Yelp. The quality of any mobile local search engine is mainly determined by its ranking function, which formally specifies how we retrieve and rank local entities in response to a user's query. Acquiring effective ranking signals and heuristics to develop an effective ranking function is arguably the single most important research problem in mobile local search. This chapter first overviews the ranking signals in mobile local search (e.g., distance and customer rating score of a business), which have been recognized to be quite different from those of general Web search. We next present a recent data analysis that studies the behavior of mobile local search ranking signals using a large-scale query log, which reveals interesting heuristics that can be used to guide the exploitation of different signals to develop effective ranking features. Finally, we also discuss several interesting future research directions.
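As a rough illustration of what a ranking function over local-search signals might look like, the sketch below combines the two signals named above, distance and customer rating, into a single score. The linear form, the weights `w_rating` and `w_distance`, the 0-5 rating scale, and the `Business` type are all assumptions made for this example and are not taken from the chapter.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Business:
    """Hypothetical local entity with two ranking signals."""
    name: str
    lat: float
    lon: float
    rating: float  # customer rating, assumed to be on a 0-5 scale


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def score(business, user_lat, user_lon, w_rating=1.0, w_distance=0.5):
    """Illustrative linear score: reward rating, penalize distance.

    The weights are arbitrary placeholders, not tuned values.
    """
    distance = haversine_km(user_lat, user_lon, business.lat, business.lon)
    return w_rating * (business.rating / 5.0) - w_distance * distance


def rank(businesses, user_lat, user_lon):
    """Return businesses sorted from highest to lowest score."""
    return sorted(businesses,
                  key=lambda b: score(b, user_lat, user_lon),
                  reverse=True)
```

In practice such weights would be learned from data such as query logs rather than set by hand; the fixed values here only make the trade-off between the two signals visible.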
The entity ranking chapter is about entity ranking, a recent paradigm that refers to retrieving and ranking related objects and entities from different structured sources in various scenarios. Entities typically have associated categories and relationships with other entities. In this chapter, we introduce how to build a Web-scale entity ranking system based on machine-learned ranking models. Specifically, the entity ranking system usually takes advantage of structured knowledge bases, entity relationship graphs, and user data to derive useful features for facilitating semantic search with entities directly within the learning-to-rank framework. Similar to generic Web search ranking, entity pairwise preference can be leveraged to form the objective function of entity ranking. Beyond that, this chapter introduces ways to incorporate the categorization information and preference of related entities into the objective function for learning. This chapter further discusses how entity ranking is different from regular Web search in terms of presentation bias and the interaction of categories of query entities and result facets.
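To make the idea of a pairwise preference objective concrete, the following minimal sketch computes a logistic loss over preferred/non-preferred entity pairs. The logistic form is a common choice in learning to rank and is used here only as an assumption; the chapter's actual objective, including its category and related-entity terms, is not reproduced. The function name and the entity identifiers are hypothetical.

```python
import math


def pairwise_logistic_loss(scores, preferences):
    """Average logistic loss over entity preference pairs.

    scores: dict mapping an entity id to its model score.
    preferences: list of (preferred, other) entity id pairs.
    The loss is small when each preferred entity outscores the other.
    """
    total = 0.0
    for preferred, other in preferences:
        margin = scores[preferred] - scores[other]
        total += math.log1p(math.exp(-margin))
    return total / len(preferences)


# Hypothetical entity scores and one observed preference.
scores = {"entity_a": 2.0, "entity_b": 0.0}
loss_agree = pairwise_logistic_loss(scores, [("entity_a", "entity_b")])
loss_disagree = pairwise_logistic_loss(scores, [("entity_b", "entity_a")])
```

Here `loss_agree` is lower than `loss_disagree`, since the scores already agree with the first preference; minimizing this quantity over a model's parameters pushes preferred entities above the others.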