Cybersecurity and Applied Mathematics
Leigh Metcalf
William Casey
AMSTERDAM BOSTON HEIDELBERG LONDON
NEW YORK OXFORD PARIS SAN DIEGO
SAN FRANCISCO SINGAPORE SYDNEY TOKYO
Syngress is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, USA
© 2016 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-804452-0
Acquisition Editor: Brian Romer
Editorial Project Manager: Anna Valutkevich
Production Project Manager: Mohana Natarajan
Cover Designer: Mark Rogers
Typeset by SPi Global, India
Biography
Leigh Metcalf researches network security, game theory, formal languages, and dynamical systems. She is Editor in Chief of the Journal on Digital Threats and has a PhD in Mathematics.
William Casey is a Senior Research Member at the Carnegie Mellon University Software Engineering Institute. His work focuses on the design of scalable cyber-security within social technological systems. Casey has made contributions in the areas of cybersecurity, natural language processing, genomics, bioinformatics, and applied mathematics in academic, industry, and government settings. He has held appointments at the University of Warwick and New York University. Casey received his PhD in applied mathematics from the Courant Institute at New York University. He also holds an MS in mathematics from Southern Illinois University Carbondale, and an MA in mathematics from the University of Missouri Columbia. Casey is a member of the Association for Computing Machinery (ACM).
Introduction
The practice of cybersecurity involves diverse data sets, including DNS, malware samples, routing data, network traffic, user interaction, and more. There is no one-size-fits-all analysis scheme for this data; a new method must be created for each data set. The best methods have a mathematical basis.
A mathematical model of a system is an abstract description of that system that uses mathematical concepts. We want to take the systems in cybersecurity and create mathematical models of them so that we can analyze them, make predictions about them, derive information from them, or pursue other goals that depend on the system. This book is designed to give you a basic understanding of the mathematical concepts most relevant to designing models for cybersecurity.
Cybersecurity is often about finding the needle in the needlestack: finding the one bit that looks almost, but not quite, like everything else. In a network that can generate gigabytes of traffic a day, discovering the small amount of anomalous traffic associated with malware is a difficult proposition. Similarly, finding the one set of maliciously registered domains among hundreds of millions of domain names is not an easy process.
There is a wide variety of mathematical techniques that can be used to create methods for analyzing cybersecurity data. These techniques are the underpinnings that make such methods work. Statistics cares about the origin of the data, how it was collected, and what assumptions you can make about it. Other mathematical techniques, such as graph theory, are developed on an abstract structure (in this case, the graph) and work no matter what that structure is used to model. That is the beauty of math.
The point of this book is not to spend time going through proofs of why the various mathematical techniques work, but rather to give an introduction to the areas themselves. Care was taken in the chapters to describe what things are and how they work without overwhelming the reader with the why. The why is not always relevant to understanding the what or the how. This book is designed for the cybersecurity analyst who wishes to create new techniques that have a secure foundation in math.
Cybersecurity and Applied Mathematics. http://dx.doi.org/10.1016/B978-0-12-804452-0.00001-4
The content is designed to cover various areas of mathematics used in cybersecurity today. The aim is to give the reader a firm basis for understanding how these areas can be applied in creating new analysis methods, and to enable a greater understanding of current methods. The reader is expected to have studied calculus in order to understand the concepts in the book.
Metrics, similarity, and sets
The human eye can discern differences between two objects, but cannot necessarily quantify that difference. For example, a red apple and a green apple are obviously different, but still similar in that they are both apples. A red apple and a computer, on the other hand, are completely different. Without a way to quantify the difference, all we can say is "similar but different" or "obviously different."
We need to quantify this difference in a reasonable way. To this end, we create a framework that standardizes the properties a distance should satisfy. Before covering distance, we begin with the basics of set theory. Distances are inherently defined on a set, so some knowledge of set theory is useful. The chapter concludes with relevant examples of distances.
2.1 INTRODUCTION TO SET THEORY
We have many words for collections of things: a deck of cards, a flock of birds, a pod of whales, a murder of ravens, or a fleet of automobiles. The underlying theme of these words is a collection of things that have a property in common. In mathematical terms, this is a set of things, such as a set of birds or a set of domains. A set of domains would not contain a URL, because then we could not call it a set of domains. On the other hand, a set of domains could contain 127.0.0.1, because an IP address can act like a domain. The important notion within set theory is the idea that we can regard any collection of objects as a single thing.
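Python's built-in `set` type is a convenient way to experiment with these ideas. The sketch below is illustrative only. The domain names are hypothetical, and `looks_like_url` is a deliberately crude, hypothetical test for "this is a URL, not a domain," not a real validator. The example keeps only the elements with the defining property and shows that duplicates collapse. It also treats the resulting collection as a single object that can itself be an element of another set.

```python
# Hypothetical raw observations; the names are illustrative only.
observed = [
    "example.com",
    "http://example.com/login",   # a URL, so it does not belong in a set of domains
    "127.0.0.1",                  # an IP address can act like a domain
    "example.com",                # duplicate observation
    "mail.example.org",
]


def looks_like_url(s: str) -> bool:
    """Hypothetical, deliberately crude test: treat anything with a scheme
    separator or a path component as a URL rather than a domain."""
    return "://" in s or "/" in s


# Set comprehension: keep only the elements that have the defining property.
# Duplicates collapse automatically, because a set holds each element once.
domains = {s for s in observed if not looks_like_url(s)}

# The whole collection can be regarded as a single thing, e.g. an element of
# another set. Set elements must be hashable, so we freeze it first.
snapshot = frozenset(domains)
history = {snapshot, frozenset({"example.net"})}

if __name__ == "__main__":
    print(sorted(domains))          # ['127.0.0.1', 'example.com', 'mail.example.org']
    print("127.0.0.1" in domains)   # True
    print(len(history))             # 2
```

The `frozenset` step is a Python detail: ordinary sets are mutable and cannot be set elements. It mirrors the mathematical idea that a set is itself a single object.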