Lectures
You can download the lectures here. We will try to upload lectures prior to their corresponding classes.
-
(xml-0) Introduction
tl;dr: Course logistics and motivation towards Explainability in Machine Learning.
[slides]
Suggested Readings:
- Ch. 3.1 of Interpretable Machine Learning by Christoph Molnar
- Towards A Rigorous Science of Interpretable Machine Learning, Doshi-Velez and Kim, 2017
- Interpretable machine learning: definitions, methods, and applications, Murdoch et al. 2019
- Explanation in artificial intelligence: Insights from the social sciences, Tim Miller, 2019
- Examples are not Enough, Learn to Criticize! Criticism for Interpretability, Kim et al. 2016
-
(xml-1) Taxonomy, Scope, and Evaluation of Explainability
tl;dr: Examining different notions and properties surrounding Explainability.
[slides]
Suggested Readings:
-
(xml-2) Inherently Interpretable Models (Linear and Logistic Regression)
tl;dr: Interpretability in basic linear models
[slides]
-
(xml-3) Inherently Interpretable Models (GLM, GAM, DT)
tl;dr: Interpretability in generalized linear models, generalized additive models, and decision trees
[slides]
Suggested Readings:
-
(xml-4) Model-Agnostic Methods - PDP and ALE
tl;dr: Global methods that describe the average behavior of a machine learning model: Partial Dependence Plots and Accumulated Local Effects
[slides]
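The partial-dependence idea from this lecture can be sketched in a few lines: force one feature to each grid value, average the model's predictions over the dataset, and plot the result. The toy model, dataset, and function names below are illustrative, not from the lecture slides.

```python
def model(x):
    # Toy "black box": nonlinear in feature 0, linear in feature 1.
    return x[0] ** 2 + 0.5 * x[1]

def partial_dependence(f, data, feature, grid):
    """Average f over the data with `feature` forced to each grid
    value -- the marginal effect that a PDP curve plots."""
    pd_values = []
    for v in grid:
        outputs = []
        for row in data:
            row = list(row)
            row[feature] = v  # intervene on one feature, keep the rest
            outputs.append(f(row))
        pd_values.append(sum(outputs) / len(outputs))
    return pd_values

data = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
grid = [0.0, 1.0, 2.0]
print(partial_dependence(model, data, 0, grid))  # [1.0, 2.0, 5.0]
```

Note how the PD curve for feature 0 recovers the quadratic shape of the toy model, averaged over feature 1.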
-
(xml-5) Model-Agnostic Methods - feature interactions, surrogates, etc.
tl;dr: More global methods that describe the average behavior of a machine learning model: feature interactions and global surrogate models
[slides]
Suggested Readings:
-
(xml-6) Local Model-Agnostic Methods - LIME
tl;dr: Locally approximating a black-box ML model with an interpretable surrogate
[slides]
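The core LIME recipe can be sketched in pure Python: sample perturbations around the instance, weight them by proximity, and fit a simple weighted linear surrogate. This is a one-feature toy (so the weighted least-squares fit stays closed-form); the function names and kernel width are illustrative assumptions, and real LIME handles many features with sparse surrogates.

```python
import math
import random

def black_box(x):
    # Toy nonlinear model to be explained locally.
    return math.sin(x)

def local_surrogate(f, x0, width=0.1, n=200, seed=0):
    """Fit a proximity-weighted linear model f(x) ~ a + b*x around x0,
    using a Gaussian kernel as the locality weight."""
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0, width) for _ in range(n)]      # perturbations
    ys = [f(x) for x in xs]                                # black-box labels
    ws = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw           # weighted means
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    b = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) / \
        sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    a = my - b * mx
    return a, b

a, b = local_surrogate(black_box, x0=0.0)
# Near x0 = 0, sin(x) behaves like x, so the fitted slope b
# should come out close to 1 and the intercept a close to 0.
```

The surrogate's coefficients (here, the slope) are the explanation: they describe the black box's behavior in the neighborhood of the instance, not globally.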
Suggested Readings:
-
(xml-7) Local Model-Agnostic Methods - Counterfactual Explanations
tl;dr: Imagining a hypothetical reality to explain predictions of individual instances
[slides]
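A counterfactual explanation answers "what is the smallest change to this instance that flips the prediction?" The toy loan model, feature names, and greedy single-feature search below are illustrative assumptions; real methods optimize a distance-plus-validity objective over all features.

```python
def classifier(income, debt):
    """Toy loan model: approve (1) when income - debt exceeds 50."""
    return 1 if income - debt > 50 else 0

def find_counterfactual(pred, x, step=1.0, max_iter=1000):
    """Greedily nudge one feature (income, assumed actionable) until
    the prediction flips; the result is the counterfactual instance."""
    income, debt = x
    target = 1 - pred(income, debt)  # the opposite outcome
    for _ in range(max_iter):
        if pred(income, debt) == target:
            return income, debt
        income += step               # smallest move along one feature
    return None

cf = find_counterfactual(classifier, (40.0, 10.0))
print(cf)  # (61.0, 10.0): the denied applicant is approved at income 61
```

The gap between the original instance (40, 10) and the counterfactual (61, 10) is the explanation: "you would have been approved with an income of 61."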
-
(xml-8) Interpreting Neural Networks (Feature Visualization)
tl;dr: Methods to understand the learned units in DNNs
[slides]
Suggested Readings:
-
(xml-9) Interpreting Neural Networks (Pixel Attribution)
tl;dr: Highlighting the input features (e.g., pixels) that are relevant to a DNN's prediction
[slides]
-
(xml-10) Interpreting Neural Networks (Concept-based Explanations)
tl;dr: Learning a low-dimensional representation that can faithfully explain the downstream task of the DNN
[slides]
-
(xml-11) Causal Inference and Explainability
tl;dr: Linking Causality (cause and effect) and Explainability in Machine Learning
[slides]
-
(xml-12) Attention and Explanations
tl;dr: Does the Attention mechanism provide Explanations for a model's predictions?
[slides]
Suggested Readings:
-
(xml-13) Generating Robust Counterfactuals (Algorithmic Recourse)
tl;dr: Guest lecture: What factors need to be considered when generating counterfactuals to achieve algorithmic recourse?
[slides]
Suggested Readings:
-
(xml-14) Causal Ante-hoc Explanations
tl;dr: Guest lecture: Causal associations in neural networks (direct, indirect, and total causal effects)
[slides]
Suggested Readings:
-
(xml-15) Mechanistic Interpretability
tl;dr: Guest lecture: On the internals of LLMs
[slides]
Suggested Readings: