Logging in Action
With Fluentd, Kubernetes and more
Phil Wilkins
Foreword by Christian Posta and Anurag Gupta
Manning
Shelter Island
For more information on this and other Manning titles go to
www.manning.com
Copyright
For online information and ordering of these and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
© 2022 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964
Development editor: Katie Sposato Johnson
Technical development editor: Sam Zaydel
Review editor: Aleksandar Dragosavljević
Production editor: Andy Marinkovich
Copy editor: Carrie Andrews
Proofreader: Melody Dolab
Technical proofreader: Kerry Koitzsch
Typesetter and cover designer: Marija Tudor
ISBN: 9781617298356
front matter
foreword
Software is the lifeblood of most industries today and can be a differentiator for those companies that can iterate quickly and find customer value before their competitors. Some of the recent trends that allow large organizations to move fast include the adoption of cloud platforms and microservice architectures. While some of the trends have evolved, one thing has remained constant: when things go wrong, we need to quickly understand where to look to fix the problem. Microservices and ephemeral cloud infrastructure (containers, etc.) exacerbate this problem.
I vividly remember working on a particularly nasty distributed problem for a client a few years back wherein a set of services would communicate with each other to provide some business function, and after six days (almost on the dot!), the set of services would all come crashing down. The resulting outage caused significant revenue loss for this client. The client decided to restart all of the services one by one after four days to avoid the problem.
After observing the system for a few days, I noticed that the memory usage of all of the services involved in the call graph was growing significantly, so I worked with the client to safely capture memory and thread dumps to understand what was happening. I determined that a particular buffer was getting filled, but when looking through the code it was very difficult to identify why this was happening. The system included both blocking and nonblocking code on various threads, which made it difficult to work with. I had to turn to a tried-and-true foundation of working with distributed systems to help diagnose the issue: logging events.
After a few days spent diligently poring over many hundreds of thousands of log lines across the various services, I was able to see that a certain combination of messages that flowed through the system triggered a memory leak in all of the services, which would eventually cause an OOM or out-of-memory event in the services.
Although logging helped significantly in this endeavor, it was not easy. The logging was not consistent across the services, the timestamps were wrong, and the technology used to pull the logs from the machines would sometimes fail, crash, or corrupt the log files. We also lost valuable log data as the services were restarted after four days because the client could not take an outage. If the client had had a better logging and observability architecture, much of this would have been simpler, and pinpointing the OOM issue would have taken less time.
In this book, Phil Wilkins does an amazing job of conveying the principles of good logging patterns and demonstrates this with concrete technology and examples using a ubiquitous log collection and aggregation technology called Fluentd. Fluentd is used to collect, unify, and stream logging data from a variety of systems to a centralized data store, which can then be used for proper analysis. Phil walks the reader through building a logging system, taking into account such things as timestamps, structured human-readable data, and more complex things such as routing and massaging the logging data.
If you're building distributed systems such as microservices architectures, you will want to seriously consider your logging and observability architecture to support your day-to-day operations. This book will be a useful companion as you embark on your journey.
Christian Posta, VP, Global Field CTO at Solo.io
I started my Fluentd journey seven years ago by integrating the project as the core piece of Microsoft Azure's Log Analytics Linux agent. The initial learning curve was challenging; however, the benefits we received from a growing community, plugin ecosystem, and ease of extensibility made the project a favorite within Azure environments. I then jumped to Treasure Data, where I managed the project, and afterward joined Elastic, where I learned of other logging toolsets. After admiring Fluentd from afar, I finally left Elastic, started Calyptia, a company built around the Fluentd ecosystem, and became a project maintainer.
When starting as a maintainer, I immersed myself in the community, surveying users about their pains and where we could do better. The community highlighted their knowledge gaps on getting started and where to find in-depth explanations of certain topics, and asked for more concrete examples.
In a happy coincidence, I also met Phil Wilkins while chatting with the community and had the opportunity to read his work Logging in Action. Phil has immense talent for deciphering complex topics and providing easy-to-understand visuals and instruction. Logging in Action fills many of the community's gaps with architecture details and deep step-by-step explanations.