Lectures
You can download the lectures here. We will try to upload lectures prior to their corresponding classes.
-
(xml-0) Introduction
tl;dr: Course logistics and motivation towards Explainability in Machine Learning.
[slides]
Suggested Readings:
- Ch. 3.1 of Interpretable Machine Learning by Christoph Molnar
- Towards A Rigorous Science of Interpretable Machine Learning, Doshi-Velez and Kim, 2017
- Interpretable machine learning: definitions, methods, and applications, Murdoch et al. 2019
- Explanation in artificial intelligence: Insights from the social sciences, Tim Miller, 2019
- Examples are not Enough, Learn to Criticize! Criticism for Interpretability, Kim et al. 2016
-
(xml-1) Taxonomy, Scope, and Evaluation of Explainability
tl;dr: Examining different notions and properties surrounding Explainability.
[slides]
Suggested Readings:
-
(xml-2) Inherently Interpretable Models (Linear and Logistic Regression)
tl;dr: Interpretability in basic linear models
[slides]
-
(xml-3) Inherently Interpretable Models (GLM, GAM, DT)
tl;dr: Interpretability in generalized linear models, generalized additive models, and decision trees
[slides]
Suggested Readings:
-
(xml-4) Model-Agnostic Methods - PDP and ALE
tl;dr: Global methods that describe the average behavior of a machine learning model: Partial Dependence Plots and Accumulated Local Effects
[slides]
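The partial-dependence idea from this lecture can be sketched in a few lines: force one feature to each grid value, average the model's predictions over the dataset, and plot the result. The toy model, dataset, and function names below are illustrative, not from the lecture slides.

```python
def model(x):
    # Toy "black box": nonlinear in feature 0, linear in feature 1.
    return x[0] ** 2 + 0.5 * x[1]

def partial_dependence(f, data, feature, grid):
    """Average f over the data with `feature` forced to each grid
    value -- the marginal effect that a PDP curve plots."""
    pd_values = []
    for v in grid:
        outputs = []
        for row in data:
            row = list(row)
            row[feature] = v  # intervene on one feature, keep the rest
            outputs.append(f(row))
        pd_values.append(sum(outputs) / len(outputs))
    return pd_values

data = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
grid = [0.0, 1.0, 2.0]
print(partial_dependence(model, data, 0, grid))  # [1.0, 2.0, 5.0]
```

Note how the PD curve for feature 0 recovers the quadratic shape of the toy model, averaged over feature 1.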
-
(xml-5) Model-Agnostic Methods - feature interactions, surrogates, etc.
tl;dr: More global methods that describe the average behavior of a machine learning model: feature interactions and global surrogate models
[slides]
Suggested Readings:
-
(xml-6) Local Model-Agnostic Methods - LIME
tl;dr: Locally approximating a black-box ML model with an interpretable surrogate
[slides]
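The core LIME recipe can be sketched in pure Python: sample perturbations around the instance, weight them by proximity, and fit a simple weighted linear surrogate. This is a one-feature toy (so the weighted least-squares fit stays closed-form); the function names and kernel width are illustrative assumptions, and real LIME handles many features with sparse surrogates.

```python
import math
import random

def black_box(x):
    # Toy nonlinear model to be explained locally.
    return math.sin(x)

def local_surrogate(f, x0, width=0.1, n=200, seed=0):
    """Fit a proximity-weighted linear model f(x) ~ a + b*x around x0,
    using a Gaussian kernel as the locality weight."""
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0, width) for _ in range(n)]      # perturbations
    ys = [f(x) for x in xs]                                # black-box labels
    ws = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw           # weighted means
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    b = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) / \
        sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    a = my - b * mx
    return a, b

a, b = local_surrogate(black_box, x0=0.0)
# Near x0 = 0, sin(x) behaves like x, so the fitted slope b
# should come out close to 1 and the intercept a close to 0.
```

The surrogate's coefficients (here, the slope) are the explanation: they describe the black box's behavior in the neighborhood of the instance, not globally.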
Suggested Readings:
-
(xml-7) Local Model-Agnostic Methods - Counterfactual Explanations
tl;dr: Imagining a hypothetical reality to explain predictions of individual instances
[slides]
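A counterfactual explanation answers "what is the smallest change to this instance that flips the prediction?" The toy loan model, feature names, and greedy single-feature search below are illustrative assumptions; real methods optimize a distance-plus-validity objective over all features.

```python
def classifier(income, debt):
    """Toy loan model: approve (1) when income - debt exceeds 50."""
    return 1 if income - debt > 50 else 0

def find_counterfactual(pred, x, step=1.0, max_iter=1000):
    """Greedily nudge one feature (income, assumed actionable) until
    the prediction flips; the result is the counterfactual instance."""
    income, debt = x
    target = 1 - pred(income, debt)  # the opposite outcome
    for _ in range(max_iter):
        if pred(income, debt) == target:
            return income, debt
        income += step               # smallest move along one feature
    return None

cf = find_counterfactual(classifier, (40.0, 10.0))
print(cf)  # (61.0, 10.0): the denied applicant is approved at income 61
```

The gap between the original instance (40, 10) and the counterfactual (61, 10) is the explanation: "you would have been approved with an income of 61."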
-
(xml-8) Interpreting Neural Networks (Feature Visualization)
tl;dr: Methods to understand the learned units in DNNs
[slides]
Suggested Readings:
-
(xml-9) Interpreting Neural Networks (Pixel Attribution)
tl;dr: Highlighting the input features (e.g., pixels) that are relevant to a DNN's prediction
[slides]
-
(xml-10) Interpreting Neural Networks (Concept-based Explanations)
tl;dr: Learning a low-dimensional representation that can faithfully explain the downstream task of the DNN
[slides]
-
(xml-11) Causal Inference and Explainability
tl;dr: Linking Causality (cause and effect) and Explainability in Machine Learning
[slides]
-
(xml-12) Attention and Explanations
tl;dr: Does the Attention mechanism provide Explanations for a model's predictions?
[slides]
Suggested Readings:
-
(xml-13) Generating Robust Counterfactuals (Algorithmic Recourse)
tl;dr: Guest lecture: What factors need to be considered when generating counterfactuals to achieve algorithmic recourse?
[slides]
Suggested Readings:
-
(xml-14) Causal Ante-hoc Explanations
tl;dr: Guest lecture: Causal associations in neural networks (direct, indirect, and total causal effects)
[slides]
Suggested Readings:
-
(xml-15) Mechanistic Interpretability
tl;dr: Guest lecture: On the internals of LLMs
[slides]
Suggested Readings: