1. Introduction
1.1 Analytics and Business
The practice of business is changing. More and more companies are amassing larger and larger amounts of data, storing them in bigger and bigger databases. Every day, telephone companies collect several terabytes of data about whom we call, when we call them, and how long we talk to them. Every time we scan our loyalty card at a grocery store, we provide valuable information about the products we like, when we consume them, and the price we are willing to pay for them. In fact, data collection has become particularly valuable for understanding the relationship between price and demand. Large Consumer-to-Consumer (C2C) online auction sites (such as eBay or uBid) own immense treasure chests of price and demand data, as they observe individuals' willingness to pay (i.e., individuals' bids) as well as product supply (i.e., auction inventories) and demand (i.e., the proportion of auctions that transact), dispersed both geographically (i.e., across different markets and nations) and temporally (i.e., across economically or seasonally changing environments).
The Internet is a particularly convenient place for data collection: every time we click on a link or visit a new Website, we leave a digital footprint (e.g., in the form of cookies or other tracking devices), thus allowing marketers to assemble a complete picture of our browsing behavior (and, ultimately, our personality and purchasing preferences). While this trove of personal information has led to some concerns about consumers' privacy, it also has beneficial uses: mining such data is able to anticipate outbreaks earlier than conventional methods, which can help policy makers in epidemiology or health care make timelier and more accurate decisions.
Data mining is particularly important for companies that operate only online (such as Amazon or Netflix). The reason is that these companies never meet their customers in person and thus cannot observe their behavior or ask them directly about their needs. The ability to deduce customers' preferences from their browsing behavior is therefore key for online retailers. Indeed, Amazon carefully analyzes a user's past transactions (together with transactions from other users) in order to make recommendations about new products. For instance, it may recommend to us a new book (based on other books we have purchased in the past) or a product accessory (based on the accessories other customers have bought). If these recommendations match a user's preferences and needs, then there is a higher chance of a new transaction and increased sales for Amazon! Automated and data-driven recommendations (also known as recommendation engines
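To make the idea of recommending an accessory "based on the accessories other customers have bought" concrete, the following is a minimal sketch of a co-purchase counter. It is an illustration only and not a description of Amazon's actual system; the function names (build_co_purchase_counts, recommend) and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_co_purchase_counts(baskets):
    """For every item, count how often each other item shares a basket with it."""
    counts = {}
    for basket in baskets:
        # De-duplicate items within a basket so each pair counts once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(counts, purchased, k=2):
    """Rank items not yet purchased by how often they co-occur with purchased items."""
    scores = Counter()
    for item in purchased:
        for other, n in counts.get(item, {}).items():
            if other not in purchased:
                scores[other] += n
    # Highest score first; ties are broken alphabetically so results are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical transaction history (made-up data for illustration).
baskets = [
    ["camera", "memory card", "tripod"],
    ["camera", "memory card"],
    ["camera", "camera bag"],
    ["novel", "bookmark"],
]
counts = build_co_purchase_counts(baskets)
print(recommend(counts, {"camera"}))
```

In this toy history, the memory card ranks first for a camera buyer because it shares a basket with the camera more often than any other item. A production recommender would typically also normalize for item popularity and draw on far more data than raw pair counts.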
The collection and analysis of data is important not only on the Internet; it is equally important for more traditional (e.g., brick-and-mortar) businesses. Take the example of the credit card industry (or other credit-granting industries, such as mortgage and banking or the insurance industry). Credit card issuers often experience adverse selection in the sense that those consumers who want their products most eagerly are often the ones who also carry the highest risk. Indeed, the reason that a person is desperate for a new credit card may be that he has an extremely bad credit score and no other company is willing to issue him a credit card. On the other hand, people who already own two or three credit cards (and have a stellar credit score) may be rather unlikely to respond to a new credit card offer. So, do we want that person who responds to our offer in a rather eager and desperate fashion as our new customer? This is exactly the situation that Capital One faced several years ago when it entered the credit card market. As a new company, it wanted to gain market share quickly. However, there was also a danger that those customers who were willing to switch most quickly were also the most risky ones. In order to respond to these challenges, Capital One created a new (and innovative, at that time) information-based strategy in which it conducted thousands of laboratory-like experiments in order to better understand what characteristics distinguish good customers from bad. Moreover, it also carefully mined customers' behavior, such as the way in which a customer responded to a credit card offer. For instance, a customer responding via phone would be flagged as a little more risky than one who assembled a written response sent via regular mail.
Successful applications of data-driven decision making in business are plentiful and are increasing on a daily basis. Harrah's Casinos uses data analytics not only to record its customers' past activities but especially to predict future behavior. In fact, Harrah's can predict a customer's potential net worth (i.e., how much money they would be gambling per visit and how often they would be visiting over their lifetime) based on data mining techniques. Using that net worth analysis, it creates custom advertising messages and special offer packages for each customer. Data mining can also help tap into the pulse of the nation (or the consumer). By analyzing sentiments (e.g., positive vs. negative opinions) over thousands of blogs, companies can obtain real-time information about their brand image. This could be particularly important when products face problems (e.g., car recalls) or for identifying new product opportunities (e.g., sleeper movies at the box office).
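As a rough illustration of what "analyzing sentiments" over blog posts can mean in its simplest form, the sketch below tallies positive and negative words from a tiny word list. Everything in it is an assumption made for illustration: the word lists, the sample posts, and the function names (sentiment_score, brand_sentiment) are hypothetical, and real sentiment analysis generally relies on much richer lexicons or trained models.

```python
import re

# Tiny hypothetical word lists; real sentiment lexicons contain thousands of entries.
POSITIVE = {"love", "great", "reliable", "excellent"}
NEGATIVE = {"recall", "broken", "terrible", "unsafe"}


def sentiment_score(text):
    """Return (# positive words) - (# negative words) found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return positive_hits - negative_hits


def brand_sentiment(posts):
    """Count how many posts lean positive, negative, or neutral."""
    summary = {"positive": 0, "negative": 0, "neutral": 0}
    for post in posts:
        score = sentiment_score(post)
        if score > 0:
            summary["positive"] += 1
        elif score < 0:
            summary["negative"] += 1
        else:
            summary["neutral"] += 1
    return summary


# Made-up blog snippets about a car brand.
posts = [
    "I love this car, it is great and reliable.",
    "Another recall? The brakes are unsafe.",
    "Picked up the car on Tuesday.",
]
summary = brand_sentiment(posts)
print(summary)
```

Tracking such a summary over time would give a crude signal of how a brand's image shifts, for example around a product recall. The word-counting approach ignores negation and sarcasm, which is one reason practical systems go well beyond it.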
The list of successful data mining stories goes on. AT&T uses social network analysis (i.e., mining the links and nodes in a network) to identify fraud in its telephone network. Automated and data-driven fraud detection is also popular with credit card companies such as Visa and Mastercard. Large accounting companies (such as PriceWaterhouse) develop data-driven methods to unearth inconsistencies in accounting statements. Other companies (such as IBM) use internal as well as external data in order to predict a customer's wallet (i.e., their potential for purchasing additional services). More curious examples include human resource management at successful sports teams. For instance, both the Boston Red Sox (baseball) and the New England Patriots (football) are famous for using data analytics to make decisions about the composition of their teams. All of this shows that data can play a key role and can provide a competitive edge across many different sectors and in many different business processes (both internal and external).