ML Infra Best Practices: Explainability

Manjot Pahwa
Mar 2, 2020

What makes AI algorithms difficult to understand is often also what makes them great predictors. At the same time, machine learning is increasingly being used in critical areas such as healthcare, banking and criminal justice, where being able to explain a decision matters.

Regulations Galore

We’re already seeing ample regulation being introduced in various parts of the world:

  • General Data Protection Regulation (GDPR) Articles 13 and 22 were among the first regulations on automated decision-making, mandating that anyone subject to an automated decision, such as a loan rejection or an insurance claim denial, has the right to be informed and the right to a meaningful explanation.
  • On December 4th, the Federal Reserve Board, the Consumer Financial Protection Bureau (CFPB) and the Federal Deposit Insurance Corporation (FDIC) stressed the importance of consumer protection in the use of alternative sources of data (such as social media activity or a person's cash flows) across a wide range of banking operations like credit underwriting, fraud detection, marketing, pricing, servicing, and account management.

Take, for example, the medical field. What if there were a machine learning model that could accurately predict which diseases a person will get, but could not explain which factors are the most critical? For the most part, an algorithm that human experts cannot explain will get rejected by the FDA.

So what makes AI unexplainable?

In a report by Forrester, "Evoke Trust with Explainable AI" (November 2018), the challenges with explainability of neural networks are explicitly called out:

“In the case of neural networks, this is because they automatically extract features from data and weight those features through a process called backpropagation. Backpropagation continuously optimizes toward a specific goal, such as predictive accuracy, but the resulting logic gets obscured by the manifold interactions between neurons in the network’s hidden layers.”

Simpler models such as logistic regression and decision trees are intuitive: they are more interpretable, easier to validate, and easier to explain to a non-expert in data science.

Do explainable models mean poor performance? Is regulation the only reason why I need explainability?

There is typically a tradeoff between model interpretability and accuracy. However, there are cases where the reverse has been shown to be true, i.e. interpretable models matching or even beating black-box accuracy, as argued in this talk by Dr. Cynthia Rudin.

Model performance vs. interpretability (Source: https://www.datascience.com)

There are more reasons why explainability is seeing a surge in importance. Some of the benefits of interpretability and explainability include:

  • Bias: interpretability and explainability help ensure our algorithms don’t discriminate against certain groups.
  • Privacy: they help ensure that sensitive data is protected.
  • Debugging: neural networks are not infallible. If we cannot explain how a neural network functions, we will probably not be able to catch cases where third parties try to fool the algorithm itself, such as the one-pixel attack.
  • The human element of using a model: then there are fields like medical science. What if a model accurately predicted that certain patients were more likely to suffer from cancer but could not explain why? Technically you should inform the patient, but the best answer you’ll have is that the algorithm said so. Medicine is as much a human problem as a scientific one.
  • Improving robustness, data collection strategies and feature engineering: understanding which features drive predictions shows where better data or better features will help.

Note the use of the word interpretability alongside explainability in the list above, which brings me to the next popular question.

Wait, so interpretability is different from explainability?

Interpretability is about the extent to which a cause and effect can be observed within a system. Or, to put it another way, it is the extent to which you are able to predict what is going to happen, given a change in input or algorithmic parameters. Explainability, on the other hand, is the extent to which the importance of the various attributes or internal mechanics of the machine learning system can be explained.

To put it coarsely, interpretability operates at the surface level: you can tell how a model’s output will change depending on certain factors. Explainability, on the other hand, dives deeper into how the model actually works. Interpretability is thus a prerequisite for explainability.

So how does one go about building interpretable and explainable models?

Interpretable Models

The easiest way to get interpretability is to use an interpretable model in the first place, such as a linear model or a decision tree. Typical choices include linear regression, logistic regression, decision trees, kNN, Naive Bayes, rule fits and many more.
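
As a minimal sketch of what this looks like in practice, the snippet below trains a shallow decision tree with scikit-learn and prints its decision rules. The bundled Iris dataset is used purely as an illustrative stand-in for your own data.

```python
# A minimal sketch: train a shallow decision tree and print its decision rules.
# scikit-learn's bundled Iris dataset is used purely as an illustrative example.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()

# Keep the tree shallow so the learned rules stay human-readable.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(data.data, data.target)

# export_text renders the full decision logic as nested if/else rules.
print(export_text(model, feature_names=list(data.feature_names)))
```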

Feature Importance

Feature importance tells you the degree to which a feature is relevant for a particular model. Frameworks like scikit-learn and XGBoost support computing and retrieving feature importance through an API. Frameworks like Skater compute this based on an information-theoretic criterion, measuring the entropy in the change of predictions given a perturbation of a given feature.
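
As a rough illustration of that API, the sketch below fits a scikit-learn random forest and reads its feature_importances_ attribute; XGBoost models expose a very similar attribute. The Iris dataset is again only an illustrative assumption.

```python
# A rough sketch: read built-in feature importances from a fitted tree ensemble.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# feature_importances_ yields one score per feature (impurity-based for trees);
# higher means the feature mattered more to this particular model.
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Note that impurity-based importances can be biased toward high-cardinality features; permutation importance, also available in scikit-learn, is a common model-agnostic alternative.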

Model-agnostic Interpretation Tools

The other option is the use of model-agnostic interpretation tools that can be applied to any supervised machine learning model. There are broadly two categories for these tools — global and per data point. Here is an overview of some of them:

  • Partial Dependence Plots (PDP): partial dependence describes the marginal impact of a feature on the model prediction, holding the other features constant. PDPs can show the relationship between the target variable and a feature. Note that this is a global method, i.e. it averages over all data points (see the sketch after this list).
  • Individual Conditional Expectation (ICE): ICE plots show how an individual instance’s prediction changes as a feature changes, with one line per data point. It is the per-instance counterpart of the PDP and can reveal effects the PDP hides, since it does not average over all data points.
  • Accumulated Local Effects (ALE): a faster and unbiased alternative to PDPs; like the PDP, it is a global technique.
  • Global Surrogate Models: this is exactly what it sounds like, a global method in which a more interpretable surrogate model is trained to approximate the predictions of the black-box model.
  • Local Surrogate Models: LIME (Local Interpretable Model-agnostic Explanations) is a popular framework that employs a local (not global) interpretable surrogate model. Instead of trying to fit one global surrogate, LIME fits local surrogate models to explain why individual predictions were made.
  • SHAP: Shapley values come from game theory, where they are a method of fairly assigning the payout of a game among multiple players. In the machine learning world, the game is the prediction task for a single instance, the players are the features of the model, and the payout is each feature’s contribution to that prediction. SHAP explains individual predictions using these principles (see the sketch after this list).
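
To make the PDP and ICE bullets concrete, here is a small sketch using scikit-learn’s model-agnostic PartialDependenceDisplay, which can overlay the global PDP curve on per-instance ICE lines. The diabetes regression dataset, the gradient-boosting model and the chosen feature columns are illustrative assumptions, not requirements.

```python
# A small sketch of PDP and ICE plots via scikit-learn's model-agnostic tools.
# The diabetes dataset and feature choices are illustrative assumptions only.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

data = load_diabetes()
model = GradientBoostingRegressor(random_state=0).fit(data.data, data.target)

# kind="both" draws the averaged partial dependence curve (PDP) on top of
# one ICE line per data point for each requested feature (columns 2 and 8
# are "bmi" and "s5" in this dataset).
PartialDependenceDisplay.from_estimator(
    model,
    data.data,
    features=[2, 8],
    feature_names=data.feature_names,
    kind="both",
)
plt.show()
```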

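Similarly, here is a hedged sketch of computing SHAP values for a single prediction, assuming the shap package is installed; TreeExplainer computes Shapley values efficiently for tree ensembles, and each value is one feature’s contribution (its “payout”) to that instance’s prediction. LIME follows a similar explain-one-instance workflow through its own API.

```python
# A hedged sketch: explain one prediction with SHAP values.
# Assumes the shap package is installed; the diabetes data is illustrative.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

data = load_diabetes(as_frame=True)
X, y = data.data, data.target
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Per-feature contributions (the "payouts") for the first instance.
for name, value in zip(X.columns, shap_values[0]):
    print(f"{name}: {value:+.4f}")
```
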
In a future post I will explain the above methods in detail, along with a framework for deciding which method to use when.

Manjot Pahwa

VC at Lightspeed, ex-@Stripe India head, ex-@Google engineer and Product Manager for Kubernetes