ACKNOWLEDGMENTS
Thanks to Annie Choi, Laurel Chun, and Bill Pollock at No Starch Press and to my copyeditor, Bart Reed. In all justice, they should be regarded as co-authors of this book. Thanks in advance to the workers responsible for printing, transporting, and selling copies of this book, and the engineers responsible for its digital storage, transmission, and rendering. Thanks to Hillary Sanders for bringing her remarkable talents to the project exactly when they were needed. Gratitude to Gabor Szappanos for his excellent and exacting technical review.
Thanks to my two-year-old daughter Maya, who, I'm happy to share, slowed this project down dramatically. Thanks to Alen Capalik, Danny Hillis, Chris Greamo, Anup Ghosh, and Joe Levy for their mentorship over the past 10 years. Deep appreciation to the Defense Advanced Research Projects Agency (DARPA) and Timothy Fraser for supporting the research on which much of this book is based. Thanks to Mandiant, and Mila Parkour, for obtaining and curating the APT1 malware samples used for demonstration purposes in this book. Deep appreciation to the authors of Python, NetworkX, matplotlib, numpy, sklearn, Keras, seaborn, pefile, icoutils, malwr.com, CuckooBox, capstone, pandas, and sqlite for your contributions to free and open source security and data science software.
Tremendous gratitude to my parents, Maryl Gearhart and Geoff Saxe, for introducing me to computers, for tolerating my teenage hacker phase (and all the illegality that entailed), and for their boundless love and support. Thanks to Gary Glickman for his indispensable love and support. Finally, thanks to Ksenya Gurshtein, my partner in life, for supporting me in this endeavor completely and without hesitation.
Joshua Saxe
Thanks to Josh, for including me in this! Thanks to Ani Adhikari for being an amazing teacher. Thanks to Jacob Michelini, because he really wanted his name in a book.
Hillary Sanders
AN OVERVIEW OF DATASETS AND TOOLS
All data and code for this book are available for download at http://www.malwaredatascience.com/. Be warned: there is Windows malware in the data. If you unzip the data on a machine with an antivirus engine running on it, many of the malware examples will likely get deleted or quarantined.
NOTE
We have modified a few bytes in each malware executable to prevent it from executing. That being said, you can't be too careful about where you store it. We recommend storing it on a non-Windows machine that's isolated from your home or business network.
Ideally, you should only experiment with the code and data within an isolated virtual machine. For convenience, we've provided a VirtualBox Ubuntu instance at http://www.malwaredatascience.com/ that has the data and code preloaded onto it, along with all the necessary open source libraries.
Overview of Datasets
Now let's walk through the datasets that accompany each chapter of this book.
Chapter 1: Basic Static Malware Analysis
Recall that in Chapter 1 we walk through basic static analysis of a malware binary called ircbot.exe. This malware is an implant, meaning it hides on users' systems and waits for commands from an attacker, allowing the attacker to collect private data from a victim's computer or achieve malicious ends like erasing the victim's hard drive. This binary is available in the data accompanying this book at ch1/ircbot.exe.
We also use an example of fakepdfmalware.exe in this chapter (located at ch1/fakepdfmalware.exe). This is a malware program that has an Adobe Acrobat/PDF desktop icon to trick users into thinking they're opening a PDF document when they're actually running the malicious program and infecting their systems.
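As a minimal sketch of how you might take stock of these samples without ever executing them, the following Python reads each file in a directory as plain bytes and records its size and SHA-256 hash. The function names sha256_of_file and inventory_samples are hypothetical helpers invented for this illustration, not part of the book's code, and the ch1 path assumes the data has been unzipped into the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it as plain bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large sample never has to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory_samples(directory):
    """Map each regular file name in `directory` to (size in bytes, SHA-256)."""
    inventory = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            inventory[path.name] = (path.stat().st_size, sha256_of_file(path))
    return inventory


if __name__ == "__main__":
    # Assumption: the book's data is unzipped here, so ch1/ holds the samples.
    data_dir = Path("ch1")
    if data_dir.is_dir():
        for name, (size, digest) in inventory_samples(data_dir).items():
            print(f"{name}\t{size} bytes\t{digest}")
    else:
        print(f"{data_dir} not found; unzip the book's data first")
```

Hashing only reads the files, so it fits the advice above about handling the samples carefully. It is still best run inside the isolated virtual machine.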
Chapter 2: Beyond Basic Static Analysis: x86 Disassembly
In this chapter we explore a deeper topic in malware reverse engineering: analyzing x86 disassembly. We reuse the ircbot.exe example from Chapter 1 in this chapter.
Chapter 3: A Brief Introduction to Dynamic Analysis
For our discussion of dynamic malware analysis in s malware database for examples of ransomware.
Chapter 4: Identifying Attack Campaigns Using Malware Networks
Chapter 4 introduces the application of network analysis and visualization to malware. To demonstrate these techniques, we use a set of high-quality malware samples used in high-profile attacks, focusing our analysis on a set of malware samples likely produced by a group within the Chinese military known to the security community as Advanced Persistent Threat 1 (or APT1 for short).
These samples and the APT1 group that generated them were discovered and made public by cybersecurity firm Mandiant. In its report (excerpted here) titled APT1: Exposing One of China's Cyber Espionage Units (https://www.fireeye.com/content/dam/fireeye-www/services/pdfs/mandiant-apt1-report.pdf), Mandiant found the following:
- Since 2006, Mandiant has observed APT1 compromise 141 companies spanning 20 major industries.
- APT1 has a well-defined attack methodology, honed over years and designed to steal large volumes of valuable intellectual property.
- Once APT1 has established access, they periodically revisit the victim's network over several months or years and steal broad categories of intellectual property, including technology blueprints, proprietary manufacturing processes, test results, business plans, pricing documents, partnership agreements, and emails and contact lists from victim organizations' leadership.
- APT1 uses some tools and techniques that we have not yet observed being used by other groups including two utilities designed to steal email: GETMAIL and MAPIGET.
- APT1 maintained access to victim networks for an average of 356 days.
- The longest time period APT1 maintained access to a victim's network was 1,764 days, or four years and ten months.
- Among other large-scale thefts of intellectual property, we have observed APT1 stealing 6.5TB of compressed data from a single organization over a ten-month time period.
- In the first month of 2011, APT1 successfully compromised at least 17 new victims operating in 10 different industries.
As this excerpt of the report shows, the APT1 samples were used for high-stakes, nation-state-level espionage. These samples are available in the data accompanying this book at