Scott Mongeau and Andrzej Hajdasinski
Cybersecurity Data Science
Best Practices in an Emerging Profession
Foreword by Timothy Shimeall
1st ed. 2021
Scott Mongeau
Nyenrode Business Universiteit, Breukelen, Netherlands
Andrzej Hajdasinski
Nyenrode Business Universiteit, Breukelen, Netherlands
ISBN 978-3-030-74895-1 e-ISBN 978-3-030-74896-8
https://doi.org/10.1007/978-3-030-74896-8
Springer Nature Switzerland AG 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Frontispiece design: Andreas Kallipolitis, iamtraum.com
Ars longa, vita brevis, occasio praeceps, experimentum periculosum, iudicium difficile.
Life is short, the art long, opportunity fleeting, experiment treacherous, judgment difficult.
Hippocrates
Dedicated to Marloes, family, and friends
Foreword
While data science has been emerging as a profession since 2005, the professionalization of its application to cybersecurity is less mature. One reason for this relative immaturity is that both data science and cybersecurity have been undergoing extensive change and accepted practices are still evolving. Another reason is that, unlike many fields of data analysis, cybersecurity has intelligent opposition to its methods, specifically attackers who wish to intrude on computer systems and networks. To date, cybersecurity has been in a race against that opposition, and the state of data science for cybersecurity reflects that race. Under those conditions, accepted practices are rapidly challenged and modified.
Despite these challenges, the importance of cybersecurity data science is increasing due to a number of pressures. The velocity of cybersecurity data is high and increasing. A single, moderately busy server or firewall generates gigabytes of log entries every day. A network traffic log for a large network generates tens of billions of entries per day. Security event analysis systems only deal with some of the more immediate and easily recognized issues. Data science approaches that can efficiently categorize and focus attention on the most impactful streams within this fire hose of data are urgently needed. At the same time, the activities of the attackers are increasingly diverse in subtlety, impact, and targeting. While some are easily recognized and of immediate effect on a recognizable target within the perception of the defenders, others mimic desirable traffic, lie latent within the target until desired by the attacker, or hit outside of the defenders' perception, in unmonitored portions of their infrastructure or in the infrastructure of suppliers or vendors. By employing explicit feature engineering and sensitivity analysis, cybersecurity data science may focus on those features most revealing of even subtle activities and also provide the chance to secure systems on a community-wide basis. Federating and sharing data within even a tightly related community is often difficult due to the lack of common methods for data analysis and interpretation. Cybersecurity data science, with its explicit consideration of the characteristics of data and of analysis methods, offers an opportunity to bridge the federation and sharing difficulties.
This book thoroughly, if not exhaustively, documents the lack of maturity in data science applied to cybersecurity. More than identifying this lack of maturity, it uses mixed-mode data collection, both qualitative and quantitative, to point to how the gaps in cybersecurity data science can be filled as it emerges as a full profession. Using a multifaceted mix of detailed literature review, survey of experts, and modeling, Dr. Mongeau has carefully delineated both where this data science profession is currently lacking and how those gaps could be addressed in future work. A wide range of factors is included, and clear recommendations are provided.
The reader who comes to this volume from an interest in cybersecurity will gain much in understanding how data science methods apply in this space. The book refers to various methods of analysis, and how those methods lend insight into cybersecurity objectives. This book serves as a broad and useful introduction to how data science contributes to cybersecurity, as that science is practiced by modern professionals.
The reader who comes to this volume from an interest in data science will find that this book summarizes the state of data science as a profession (and the path forward) but then focuses directly on the specific needs of cybersecurity and how the profession would help to protect data in the modern world. The analysis problems (such as the chronic lack of ground truth) and methods to remediate those problems are covered thoroughly, and from a perspective that speaks to the data scientist.
The reader who comes to this volume from a managerial perspective, or one who seeks to understand the emergence of this field and the current capability of those practicing data science for cybersecurity, will find the clear description of the state of the field useful. This book offers a solid account of the state of the field and supports both a realistic appraisal of what can be gained from practitioners and an understanding of which emerging capabilities to look for in the near future. The interviews and quantitative analysis give an in-depth understanding of what is, and what is coming.
Taken in total, this book offers an extremely useful clarification of the emergence of cybersecurity data science as a specific profession, borrowing from both cybersecurity and general data science. Dr. Mongeau has brought a useful degree of clarity to these rapidly developing fields. From this basis, a variety of useful work will spring in the days to come.