Lectures
You can access the lecture slides here. We will try to upload them before their corresponding classes.
-
(xml-1) Introduction
tl;dr: An introduction to the different notions and properties surrounding Explainability.
[slides]
Suggested Readings:
- Chapter 3.1 of Interpretable Machine Learning by Christoph Molnar
- Towards A Rigorous Science of Interpretable Machine Learning, Doshi-Velez and Kim, 2017
- Interpretable machine learning: definitions, methods, and applications, Murdoch et al., 2019
- Explanation in artificial intelligence: Insights from the social sciences, Tim Miller, 2019
- Examples are not Enough, Learn to Criticize! Criticism for Interpretability, Kim et al., 2016
-
(xml-2) Taxonomy, Scope, and Evaluation of Explainability
tl;dr: A taxonomy of explainability methods, their scope, and how they can be evaluated.
[slides]
Suggested Readings:
-
(xml-4) Local Model-Agnostic Methods - LIME
tl;dr: Locally approximating a black-box ML model with an interpretable surrogate (see the sketch below).
[slides]
Suggested Readings:
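To make the local-surrogate idea concrete, here is a minimal LIME-style sketch for a single tabular instance. This is not the official lime package: the function name, the Gaussian perturbation scheme, and the kernel width are illustrative assumptions, and the black box is assumed to return class-1 probabilities for a batch of inputs.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_tabular_sketch(black_box_predict, x, n_samples=5000, kernel_width=0.75, seed=0):
    """Minimal LIME-style local surrogate for one tabular instance x (1-D array).

    black_box_predict: callable mapping an (n, d) array to class-1 probabilities.
    Returns one surrogate coefficient (importance weight) per feature.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # 1. Perturb the instance with Gaussian noise around it.
    Z = x + rng.normal(scale=1.0, size=(n_samples, d))

    # 2. Query the black box on the perturbed samples.
    y = black_box_predict(Z)

    # 3. Weight samples by proximity to x (exponential kernel on Euclidean distance).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))

    # 4. Fit a weighted, regularized linear surrogate around x.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=weights)
    return surrogate.coef_
```

The full method additionally maps inputs to an interpretable (e.g., binary) representation and selects a small number of features before fitting the surrogate; those steps are omitted here.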
-
(xml-5) Local Model-Agnostic Methods - Counterfactual Explanations
tl;dr: Imagining a hypothetical reality to explain the predictions for individual instances (see the sketch below).
[slides]
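As a rough illustration, the sketch below searches for a counterfactual by gradient descent, trading off the loss toward a desired target class against the L1 distance to the original instance, in the spirit of Wachter et al. (2017). It assumes a differentiable PyTorch classifier that returns logits; the function name, optimizer, and hyperparameters are illustrative choices, not the lecture's exact formulation.

```python
import torch

def counterfactual_search(model, x, target_class, lam=0.1, steps=500, lr=0.05):
    """Gradient-based counterfactual search (in the spirit of Wachter et al., 2017).

    Finds x_cf close to x (L1 distance) that `model` (a differentiable classifier
    mapping a batch to class logits) assigns to `target_class`.
    """
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])

    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x_cf.unsqueeze(0))
        # Trade off the target-class loss against closeness to the original instance.
        loss = torch.nn.functional.cross_entropy(logits, target) \
               + lam * (x_cf - x).abs().sum()
        loss.backward()
        optimizer.step()

    return x_cf.detach()
```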
-
(xml-6) Interpreting Neural Networks (Feature Visualization)
tl;dr: Methods to understand the units learned by DNNs (see the sketch below).
[slides]
Suggested Readings:
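A minimal activation-maximization sketch, assuming a PyTorch CNN: starting from random noise, it optimizes the input image so that one channel of a chosen layer fires strongly. Real feature-visualization pipelines add jitter, blurring, and other regularizers to obtain natural-looking images; the helper name here is our own.

```python
import torch

def activation_maximization(model, layer, channel, img_size=224, steps=256, lr=0.05):
    """Optimize a random input image to maximally activate one channel of `layer`.

    `layer` is a module inside `model`; a forward hook captures its output.
    """
    activations = {}
    handle = layer.register_forward_hook(
        lambda module, inp, out: activations.__setitem__("value", out))

    img = torch.randn(1, 3, img_size, img_size, requires_grad=True)
    optimizer = torch.optim.Adam([img], lr=lr)

    model.eval()
    for _ in range(steps):
        optimizer.zero_grad()
        model(img)
        # Negative mean activation of the chosen channel, so that gradient
        # descent on the loss performs gradient ascent on the activation.
        loss = -activations["value"][0, channel].mean()
        loss.backward()
        optimizer.step()

    handle.remove()
    return img.detach()
```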
-
(xml-7) Interpreting Neural Networks (Pixel Attribution)
tl;dr: Highlighting the input features (e.g., pixels) relevant to a DNN's prediction (see the sketch below).
[slides]
Suggested Readings:
- Visualising image classification models and saliency maps (ICLRW-2014)
- Visualizing and understanding CNNs (ECCV-2014)
- SmoothGrad (2017)
- Interpretation of Neural Networks is Fragile (AAAI-2019)
- Sanity Checks for Saliency Maps (NeurIPS-2018)
- The (un)reliability of saliency methods (Explainable AI-2019)
- Sanity Checks for Saliency Metrics (AAAI-2020)
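The sketch below combines two of the readings above: vanilla gradient saliency (Simonyan et al.) averaged over noisy copies of the input (SmoothGrad). It assumes a PyTorch image classifier that returns logits; the function name, sample count, and noise level are illustrative choices.

```python
import torch

def smoothgrad_saliency(model, x, target_class, n_samples=25, noise_std=0.15):
    """Gradient saliency averaged over noisy copies of the input (SmoothGrad).

    x: input image tensor of shape (C, H, W) with requires_grad=False.
    Returns an (H, W) saliency map.
    """
    model.eval()
    grads = torch.zeros_like(x)

    for _ in range(n_samples):
        # Fresh noisy copy each iteration; its input gradient accumulates in grads.
        noisy = (x + noise_std * torch.randn_like(x)).unsqueeze(0).requires_grad_(True)
        score = model(noisy)[0, target_class]
        score.backward()
        grads += noisy.grad[0]

    # Max over channels of the absolute averaged gradient, as in Simonyan et al.
    return (grads / n_samples).abs().max(dim=0).values
```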
-
(xml-8) Interpreting Neural Networks (Concept-based Explanations)
tl;dr: Learning a low-dimensional, human-understandable representation that can explain a DNN's inference (see the sketch below).
[slides]
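As a rough sketch of the concept-activation-vector (TCAV-style) idea: fit a linear probe separating the layer activations of concept examples from those of random examples, take the probe's normal as the concept direction, and measure the directional derivative of a class logit along it. The code assumes the activations and the gradient have already been extracted from the network; the function names are our own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Fit a linear probe separating concept vs. random activations (TCAV-style).

    concept_acts, random_acts: (n, d) arrays of flattened activations from one
    layer. Returns the unit-norm concept direction (the CAV).
    """
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    cav = probe.coef_.ravel()
    return cav / np.linalg.norm(cav)

def concept_sensitivity(grad_of_logit_wrt_layer, cav):
    """Directional derivative of a class logit along the concept direction.

    A positive value means the concept pushes the prediction toward that class;
    TCAV aggregates the sign of this quantity over many inputs.
    """
    return float(np.dot(grad_of_logit_wrt_layer.ravel(), cav))
```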
-
(xml-9) Attention and Explanations
tl;dr: Does the attention mechanism provide an explanation of the model's predictions? (See the sketch below.)
[slides]
Suggested Readings:
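To ground the discussion, the sketch below simply reads attention weights out of a pretrained Hugging Face transformer; whether such weights constitute an explanation is precisely what this lecture debates. The model name is just a common default, and averaging over heads is only one of several possible aggregation choices.

```python
import torch
from transformers import AutoModel, AutoTokenizer

def attention_map(text, model_name="bert-base-uncased"):
    """Return tokens and last-layer attention weights, averaged over heads.

    Note: a high attention weight is not by itself a faithful explanation;
    that question is exactly what this lecture examines.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions: one (batch, heads, seq_len, seq_len) tensor per layer.
    last_layer = outputs.attentions[-1][0]            # (heads, seq_len, seq_len)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return tokens, last_layer.mean(dim=0)             # head-averaged attention
```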