Chapter 1. An Overview of Explainability
A Note for Early Release Readers
With Early Release ebooks, you get books in their earliest form: the author's raw and unedited content as they write, so you can take advantage of these technologies long before the official release of these titles.
This will be the 2nd chapter of the final book. Please note that the GitHub repo will be made active later on.
If you have comments about how we might improve the content and/or examples in this book, or if you notice missing material within this chapter, please reach out to the editor at rfernando@oreilly.com.
Explainability has been a part of machine learning since the inception of AI. The very first AIs, rule-based chaining systems, were specifically constructed to provide a clear understanding of what led to a prediction. For many decades the field continued to pursue explainability as a key part of models, partly due to a focus on general AI but also to justify that the research was sound and on the right track, until the complexity of model architectures outpaced our ability to explain what was happening. After artificial neurons and neural networks rose to prominence in the 1980s, research into explainability waned: researchers focused on surviving the first AI winter by turning to methods that were explainable by construction because they relied solely on statistical techniques well proven in other fields. Explainability in its modern form (and what we largely focus on in this book) was revived, now as a distinct field of research, in the mid-2010s in response to the persistent question, "This model works really well, but how?"
In just a few years, the field has gone from obscurity to one of intense interest and investigation. Remarkably, many powerful explainability techniques have been invented, or repurposed from other fields, in the short time since. However, the rapid transition from theory to practice, and the increasing need for explainability from the people who interact with ML, such as end users and business stakeholders, has led to growing confusion about the capabilities and limits of different methods. Many fundamental terms of explainability are routinely used to represent different, even contradictory, ideas, and it is easy for explanations to be misunderstood when practitioners rush to provide assurance that ML is working as expected. Even the terms explainability and interpretability are routinely swapped, despite having very different focuses. For example, while writing this book, we were asked by a knowledgeable industry organization to describe the explainable and interpretable capabilities of a system, but the definitions of explainability and interpretability were flipped compared to how the rest of the industry defines these terms! Recognizing this confusion, the purpose of this chapter is to provide a background and a common language for the chapters that follow.
What Are Explanations?
When a model makes a prediction, Explainable AI methods generate an explanation that gives insight into the model's behavior as it arrived at that prediction. When we seek explanations, we are trying to understand "Why did X happen?" Figuring out this why helps us build a better comprehension of what influences a model, how that influence occurs, and where the model performs well (or fails). As part of building our own mental models, we often find a pure explanation unsatisfactory, so we are also interested in explanations that provide a counterfactual, or foil, to the original situation. Counterfactuals are scenarios that seek to provide an opposing, yet plausible, account of why X did not happen. If we are seeking to explain "Why did it rain today?" we may also try to find the counterfactual explanation for "Why did it not rain today [in a hypothetical world]?" While our primary explanation for why it rained might include temperature, barometric pressure, and humidity, it may be easier to explain that it did not rain because there were no clouds in the sky, implying that clouds are part of an explanation for why it does rain.
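To make this concrete, here is a minimal sketch of a counterfactual search against a toy "did it rain?" classifier. The synthetic weather data, feature names, and one-feature-at-a-time search are hypothetical illustrations (assuming scikit-learn and NumPy are available), not a production counterfactual method:

# Toy counterfactual sketch (hypothetical data and search strategy).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(seed=0)

# Synthetic weather observations: [humidity %, cloud cover %, pressure hPa].
X = rng.uniform(low=[20, 0, 990], high=[100, 100, 1040], size=(500, 3))
# Toy label: it "rains" only when humidity and cloud cover are both high.
y = ((X[:, 0] > 70) & (X[:, 1] > 60)).astype(int)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# A day for which the model predicts rain.
today = np.array([[85.0, 90.0, 1005.0]])
print("Predicted rain?", bool(model.predict(today)[0]))

# Naive counterfactual search: lower one feature at a time until the
# prediction flips, and report the smallest such change per feature.
feature_names = ["humidity", "cloud_cover", "pressure"]
for i, name in enumerate(feature_names):
    candidate = today.copy()
    for delta in range(0, 105, 5):
        candidate[0, i] = today[0, i] - delta
        if model.predict(candidate)[0] == 0:
            print(f"Lowering {name} by {delta} flips the prediction to 'no rain'")
            break

In this toy setup, you would typically see that modest decreases in humidity or cloud cover flip the prediction while changes to pressure do not, mirroring the intuition above that clouds are part of the explanation for rain.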
We also often seek explanations that are causal, in the form of "X was predicted because of Y." These explanations are attractive because they give an immediate sense of what a counterfactual prediction would be: remove Y and presumably the prediction will no longer be X. It certainly sounds more definitive to say "it rains because there are clouds in the sky." However, this is not always true; rain can occur even with clear skies in some circumstances. Establishing causality with data-focused explanations is extremely difficult (even for time-series data), and no explainability techniques have yet been proposed that are both useful in practice and offer strong guarantees in their analysis. Instead, if you want to establish causal relationships within your model or data, we recommend exploring the field of interpretable, causal models.
Explainability Consumers
Understanding and using the results of Explainable AI can look very different depending on who is receiving the explanation. As a practitioner, for example, your needs from an explanation are very different from those of a non-technical individual who may be receiving an explanation from a production ML system they may not even know exists!
Understanding the primary types of users, or personas, will be helpful as you learn about different techniques so you can assess which will best suit your audience's needs. In Chapter 7, we will go into more detail about how to build good explainability experiences for these different audiences.