Privacy in Social Networks
Synthesis Lectures on Data Mining and Knowledge Discovery
Editors
Jiawei Han, University of Illinois at Urbana-Champaign
Lise Getoor, University of Maryland
Wei Wang, University of North Carolina, Chapel Hill
Johannes Gehrke, Cornell University
Robert Grossman, University of Chicago
Synthesis Lectures on Data Mining and Knowledge Discovery is edited by Jiawei Han, Lise Getoor, Wei Wang, and Johannes Gehrke. The series publishes 50- to 150-page publications on topics pertaining to data mining, web mining, text mining, and knowledge discovery, including tutorials and case studies. The scope will largely follow the purview of premier computer science conferences, such as KDD. Potential topics include, but are not limited to, data mining algorithms, innovative data mining applications, data mining systems, mining text, web and semi-structured data, high performance and parallel/distributed data mining, data mining standards, data mining and knowledge discovery framework and process, data mining foundations, mining data streams and sensor data, mining multi-media data, mining social networks and graph data, mining spatial and temporal data, pre-processing and post-processing in data mining, robust and scalable statistical methods, security, privacy, and adversarial data mining, visual data mining, visual analytics, and data visualization.
Privacy in Social Networks
Elena Zheleva, Evimaria Terzi, and Lise Getoor 2012
Community Detection and Mining in Social Media
Lei Tang and Huan Liu 2010
Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions
Giovanni Seni and John F. Elder 2010
Modeling and Data Mining in Blogosphere
Nitin Agarwal and Huan Liu 2009
Copyright 2012 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other), except for brief quotations in printed reviews, without the prior permission of the publisher.
Privacy in Social Networks
Elena Zheleva, Evimaria Terzi, and Lise Getoor
www.morganclaypool.com
ISBN: 9781608458622 paperback
ISBN: 9781608458639 ebook
DOI 10.2200/S00408ED1V01Y201203DMK004
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY
Lecture #4
Series Editors: Jiawei Han, University of Illinois at Urbana-Champaign
Lise Getoor, University of Maryland
Wei Wang, University of North Carolina, Chapel Hill
Johannes Gehrke, Cornell University
Robert Grossman, University of Chicago
Series ISSN
Synthesis Lectures on Data Mining and Knowledge Discovery
Print 2151-0067 Electronic 2151-0075
Privacy in Social Networks
Elena Zheleva
LivingSocial
Evimaria Terzi
Boston University
Lise Getoor
University of Maryland, College Park
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY #4
ABSTRACT
This synthesis lecture provides a survey of work on privacy in online social networks (OSNs). This work encompasses the concerns of users as well as those of service providers and third parties. Our goal is to approach such concerns from a computer-science perspective and, building upon existing work on privacy, security, statistical modeling, and databases, to provide an overview of the technical and algorithmic issues related to privacy in OSNs. We start our survey by introducing a simple OSN data model and describing common statistical-inference techniques that can be used to infer potentially sensitive information. Next, we describe some privacy definitions and privacy mechanisms for data publishing. Finally, we describe a set of recent techniques for modeling, evaluating, and managing individual users' privacy risk within the context of OSNs.
KEYWORDS
privacy, social networks, affiliation networks, personalization, protection mechanisms, anonymization, privacy risk
Acknowledgments
The authors would like to thank Michael Hay and Ashwin Machanavajjhala for their invaluable and thorough feedback on this manuscript. We would also like to thank the LINQS group at the University of Maryland, College Park and the data-management group at Boston University. This manuscript was supported in part by NSF grant #IIS0746930 and NSF grant #1017529, and gifts from Microsoft, Yahoo!, and Google. Some of the images included in ].
Elena Zheleva, Evimaria Terzi, and Lise Getoor
February 2012
CHAPTER 1
Introduction
Social-networking sites and other online collaborative tools have emerged as immensely popular places for people to post and share information. Facebook, Google+, MySpace, LinkedIn, and similar services all have benefits, ranging from the practical (e.g., sharing a business document via Google Docs) to the purely social (e.g., communicating with distant friends via Facebook). At the same time, not surprisingly, information sharing poses real threats to user privacy. For example, in social-networking sites, documented threats include identity theft, digital stalking, and personalized spam. Thus, online presence in social networks involves a trade-off between the benefits of sharing information with colleagues, friends, and acquaintances, and the risk that personal information is used in unintended, and potentially harmful, ways.
In the last few years, with myriad social media and social network websites appearing online, there has been a renewed and growing interest in understanding the social phenomena arising from people's interactions and affiliations. These websites have thousands, and even millions, of users who voluntarily submit personal information in order to benefit from the services offered, such as maintaining friendships, blogging, and sharing photos, music, and articles. This rich information can also be used in a variety of ways to study people's personal preferences, patterns of communication, and the flow of information. Apart from facilitating sociological studies, the publicly available user information can be used to train predictive models, which can infer hidden information about individuals and possibly predict users' behavior. These models are widely used to improve the user experience within social-networking websites. For example, a model that predicts what a Facebook user considers important can be used to judiciously select the pieces of information shown on the user's feed.
Although such models can be used to develop better personalization strategies, they inevitably raise privacy concerns. On the one hand, users want to have the best possible experience within online social-networking sites. This means that strong user models need to be built, and strong models in turn require information about the user: demographic information, behavioral information, information about the user's social context, etc. On the other hand, access to this information poses serious privacy concerns. While many users are happy to benefit from the aggregate information collected from millions of users for improving search results and recommendations, many are less comfortable with sharing their own information for the purpose of, say, targeted advertising. And, when it comes to making use of personal information for unintended purposes (to sell to third parties, as the basis for denying access to resources, etc.), there are even greater concerns.