Cybersecurity and Applied Mathematics
First Edition
Leigh Metcalf
William Casey
Copyright
Syngress is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, USA
© 2016 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-804452-0
For information on all Syngress publications visit our website at https://www.elsevier.com/
Acquisition Editor: Brian Romer
Editorial Project Manager: Anna Valutkevich
Production Project Manager: Mohana Natarajan
Cover Designer: Mark Rogers
Typeset by SPi Global, India
Biography
Leigh Metcalf researches network security, game theory, formal languages, and dynamical systems. She is Editor in Chief of the Journal on Digital Threats and has a PhD in Mathematics.
William Casey is a Senior Research Member at the Carnegie Mellon University Software Engineering Institute. His work focuses on the design of scalable cybersecurity within social technological systems. Casey has made contributions in the areas of cybersecurity, natural language processing, genomics, bioinformatics, and applied mathematics in academic, industry, and government settings. He has held appointments at the University of Warwick and New York University. Casey received his PhD in applied mathematics from the Courant Institute at New York University. He also holds an MS in mathematics from Southern Illinois University Carbondale, and an MA in mathematics from the University of Missouri Columbia. Casey is a member of the Association for Computing Machinery (ACM).
Chapter 1
Introduction
Abstract
In this chapter we discuss the purpose of the book and its mathematical underpinnings.
Keywords
Introduction; Models; Mathematical techniques
The practice of cybersecurity involves diverse data sets, including DNS, malware samples, routing data, network traffic, user interaction, and more. There is no one-size-fits-all analysis scheme for this data; a new method must be created for each data set. The best methods have a mathematical basis.
A mathematical model of a system is an abstract description of that system that uses mathematical concepts. We want to take the systems in cybersecurity and create mathematical models of them in order to analyze the systems, make predictions about them, derive information about them, or pursue other goals, depending on the system. This book is designed to give you a basic understanding of the mathematical concepts that are most relevant to designing models for cybersecurity.
Cybersecurity is often about finding the needle in the needlestack: finding that one bit that looks almost, but not quite, like everything else. In a network that can generate gigabytes of traffic a day, discovering the small amount of anomalous traffic that is associated with malware is a difficult proposition. Similarly, finding the one set of maliciously registered domains among the hundreds of millions of domain names is not an easy process.
There is a wide variety of mathematical techniques that can be used to create methods to analyze cybersecurity data. These techniques are the underpinnings that make the analysis work. Statistics cares about the origin of the data, how it was collected, and what assumptions you can make about the data. Other mathematical techniques, such as graph theory, are developed on an abstract structure, in this case the graph, and work no matter what that structure is used to model. That is the beauty of math.
The point of this book is not to spend time going through proofs of why the various mathematical techniques work, but rather to give an introduction to the areas themselves. Careful consideration was taken in the chapters to describe what things are and how they work, but not to overwhelm the reader with the why. The why is not always relevant to understanding the what or the how. This book is designed for the cybersecurity analyst who wishes to create new techniques that have a secure foundation in math.
The content is designed to cover various areas that are used in cybersecurity today, to give the reader a firm basis in understanding how they can be applied in creating new analysis methods as well as to enable the reader to achieve greater understanding of current methods. The reader is expected to have studied calculus in order to understand the concepts in the book.
Chapter 2
Metrics, similarity, and sets
Abstract
In this chapter we cover an introduction to set theory, with common operations such as subset, intersection, union, set difference, complement, and symmetric difference, illustrated with examples from cybersecurity data. Set functions are also discussed, which leads us directly to the definition of a metric. We cover the variations of a metric, including the pseudometric, quasimetric, and semimetric. Similarities are also discussed. We then illustrate metrics on various kinds of data, including strings and sets, along with Internet-specific and cybersecurity-specific metrics.
Keywords
Sets; Set operations; Functions; Metric; Similarity
The human eye can discern differences between two objects, but cannot necessarily quantify those differences. For example, a red apple and a green apple are obviously different, but still similar in that they are both apples. If we consider a red apple and a computer, they are obviously completely different. Without a way to measure, we can only say "similar but different" or "obviously different."
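As a minimal sketch of what putting a number on such a comparison could look like, assume each object is described by a hand-picked set of attributes. The attribute sets and function names here are illustrative assumptions, not taken from the book. The Jaccard similarity, one common similarity on sets, divides the size of the intersection by the size of the union:

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|, a value between 0 and 1."""
    if not a and not b:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 minus the Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical attribute sets, chosen only for illustration.
red_apple = {"fruit", "round", "edible", "red"}
green_apple = {"fruit", "round", "edible", "green"}
computer = {"electronic", "keyboard", "screen"}

print(jaccard_similarity(red_apple, green_apple))  # 3 shared of 5 total -> 0.6
print(jaccard_similarity(red_apple, computer))     # nothing shared -> 0.0
```

Under these assumed attributes, the two apples score 0.6 and the apple and the computer score 0.0. This turns "similar but different" and "obviously different" into numbers that can be compared.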