Linux Observability with BPF
by David Calavera and Lorenzo Fontana
Copyright © 2020 David Calavera and Lorenzo Fontana. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
- Acquisitions Editor: John Devins
- Development Editor: Melissa Potter
- Production Editor: Katherine Tozer
- Copyeditor: Kim Wimpsett
- Proofreader: Octal Publishing, LLC
- Indexer: Ellen Troutman
- Interior Designer: David Futato
- Cover Designers: Karen Montgomery, Suzy Wiviott
- Illustrator: Rebecca Demarest
- October 2019: First Edition
Revision History for the First Edition
- 2019-10-15: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492050209 for release details.
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Linux Observability with BPF, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
This work is part of a collaboration between O'Reilly and Sysdig. See our statement of editorial independence.
978-1-492-05020-9
[LSI]
Foreword
As a programmer (and a self-confessed dweeb), I like to stay up to date on the latest additions to various kernels and research in computing. When I first played around with Berkeley Packet Filter (BPF) and Express Data Path (XDP) in Linux, I was in love. These are such nice tools, and I am glad this book is putting BPF and XDP on the center stage so that more people can begin using them in their projects.
Let me go into detail about my background and why I fell in love with these kernel interfaces. I was working as a Docker core maintainer, along with David. Docker, if you are not familiar, shells out to iptables for a lot of the filtering and routing logic for containers. The first patch I ever made to Docker was to fix a problem in which a version of iptables on CentOS didn't have the same command-line flags, so writing to iptables was failing. There were a lot of weird issues like this, and anyone who has ever shelled out to a tool in their software can likely commiserate. Not only that, but having thousands of rules on a host is not what iptables was built for and results in performance side effects.
Then I heard about BPF and XDP. This was like music to my ears. No longer would my scars from iptables bleed with another bug! The kernel community is even working on replacing iptables with BPF! Hallelujah! Cilium, a tool for container networking, is using BPF and XDP for the internals of its project as well.
But that's not all! BPF can do so much more than just fulfilling the iptables use case. With BPF, you can trace any syscall or kernel function as well as any user-space program. bpftrace gives users DTrace-like abilities in Linux from their command line. You can trace all the files that are being opened and the processes opening them, count the syscalls by the program calling them, trace the OOM killer, and more. The world is your oyster! BPF and XDP are also used in Cloudflare and Facebook's load balancer to prevent distributed denial-of-service attacks. I won't spoil why XDP is so great at dropping packets because you will learn about that in the XDP and networking chapters of this book!
I have had the privilege of knowing Lorenzo through the Kubernetes community. His tool, kubectl-trace, allows users to easily run their custom tracing programs within their Kubernetes clusters.
Personally, my favorite use case for BPF has been writing custom tracers to prove to other folks that the performance of their software is not up to par or that it makes a really expensive number of syscalls. Never underestimate the power of proving someone wrong with hard data. Don't fret, this book will walk you through writing your first tracing program so that you can do the same. The beauty of BPF lies in the fact that before now other tools used lossy queues to send sample sets to user space for aggregation, whereas BPF is great for production because it allows for constructing histograms and filtering directly at the source of events.
I have spent half of my career working on tools for developers. The best tools allow autonomy in their interfaces for developers like you to use them for things even the authors never imagined. To quote Richard Feynman, "I learned very early the difference between knowing the name of something and knowing something." Until now you might have only known the name BPF and that it might be useful to you.
What I love about this book is that it gives you the knowledge you need to be able to create all new tools using BPF. After reading and following the exercises, you will be empowered to use BPF like a superpower. You can keep it in your toolkit to use on demand when it's most needed and most useful. You won't just learn BPF; you will understand it. This book is a path to open your mind to the possibilities of what you can build with BPF.
This developing ecosystem is very exciting! I hope it will grow even larger as more people begin wielding BPF's power. I am excited to learn about what the readers of this book end up building, whether it's a script to track down a crazy software bug or a custom firewall or even infrared decoding. Be sure to let us all know what you build!
Jessie Frazelle
Preface
In 2015, David was working as a core developer for Docker, the company that made containers popular. His day-to-day work was divided between helping the community and growing the project. Part of his job was reviewing the firehose of pull requests that members of the community sent us; he also had to ensure that Docker worked for all kinds of scenarios, including high-performance workloads that were running and provisioning thousands of containers at any point in time.
To diagnose performance issues at Docker, we used flame graphs, which are advanced visualizations that help you navigate performance data easily. The Go programming language makes it really easy to measure and extract application performance data using an embedded HTTP endpoint and generate graphs based on that data. David wrote an article about Go's profiler capabilities and how you can use its data to generate flame graphs. A big pitfall about the way that Docker collects performance data is that the profiler is disabled by default, so if you're trying to debug a performance issue, the first action to take is to restart Docker. The main issue with this strategy is that by restarting the service, you'll probably lose the relevant data that you're trying to collect, and then you need to wait until the event you're trying to trace happens again. In David's article about Docker flame graphs, he mentioned this as a necessary step to measure Docker's performance, but noted that it didn't need to be this way. This realization made him start researching different technologies to collect and analyze any application's performance, which led him to discover BPF.