Lectures
You can access the lecture slides here. We will try to upload them before their corresponding classes.
-
(xml-1) Introduction
tl;dr: An introduction to the different notions and properties surrounding Explainability.
[slides]
Suggested Readings:
- Chapter 3.1 of Interpretable Machine Learning by Christoph Molnar
- Towards A Rigorous Science of Interpretable Machine Learning, Doshi-Velez and Kim, 2017
- Interpretable machine learning: definitions, methods, and applications, Murdoch et al., 2019
- Explanation in artificial intelligence: Insights from the social sciences, Tim Miller, 2019
- Examples are not Enough, Learn to Criticize! Criticism for Interpretability, Kim et al., 2016
-
(xml-2) Taxonomy, Scope, and Evaluation of Explainability
tl;dr: A taxonomy of explainability methods, their scope, and how they can be evaluated.
[slides]
Suggested Readings:
-
(xml-4) Local Model-Agnostic Methods - LIME
tl;dr: Locally approximating a black-box ML model with an interpretable surrogate (see the sketch below).
[slides]
Suggested Readings:
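To make the local-surrogate idea concrete, here is a minimal LIME-style sketch for a single tabular instance. This is not the official lime package: the function name, the Gaussian perturbation scheme, and the kernel width are illustrative assumptions, and the black box is assumed to return class-1 probabilities for a batch of inputs.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_tabular_sketch(black_box_predict, x, n_samples=5000, kernel_width=0.75, seed=0):
    """Minimal LIME-style local surrogate for one tabular instance x (1-D array).

    black_box_predict: callable mapping an (n, d) array to class-1 probabilities.
    Returns one surrogate coefficient (importance weight) per feature.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # 1. Perturb the instance with Gaussian noise around it.
    Z = x + rng.normal(scale=1.0, size=(n_samples, d))

    # 2. Query the black box on the perturbed samples.
    y = black_box_predict(Z)

    # 3. Weight samples by proximity to x (exponential kernel on Euclidean distance).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))

    # 4. Fit a weighted, regularized linear surrogate around x.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=weights)
    return surrogate.coef_
```

The full method additionally maps inputs to an interpretable (e.g., binary) representation and selects a small number of features before fitting the surrogate; those steps are omitted here.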
-
(xml-5) Local Model-Agnostic Methods - Counterfactual Explanations
tl;dr: Imagining a hypothetical reality to explain the predictions for individual instances (see the sketch below).
[slides]
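As a rough illustration, the sketch below searches for a counterfactual by gradient descent, trading off the loss toward a desired target class against the L1 distance to the original instance, in the spirit of Wachter et al. (2017). It assumes a differentiable PyTorch classifier that returns logits; the function name, optimizer, and hyperparameters are illustrative choices, not the lecture's exact formulation.

```python
import torch

def counterfactual_search(model, x, target_class, lam=0.1, steps=500, lr=0.05):
    """Gradient-based counterfactual search (in the spirit of Wachter et al., 2017).

    Finds x_cf close to x (L1 distance) that `model` (a differentiable classifier
    mapping a batch to class logits) assigns to `target_class`.
    """
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])

    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x_cf.unsqueeze(0))
        # Trade off the target-class loss against closeness to the original instance.
        loss = torch.nn.functional.cross_entropy(logits, target) \
               + lam * (x_cf - x).abs().sum()
        loss.backward()
        optimizer.step()

    return x_cf.detach()
```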
-
(xml-6) Interpreting Neural Networks (Feature Visualization)
tl;dr: Methods to understand the units learned by DNNs (see the sketch below).
[slides]
Suggested Readings:
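A minimal activation-maximization sketch, assuming a PyTorch CNN: starting from random noise, it optimizes the input image so that one channel of a chosen layer fires strongly. Real feature-visualization pipelines add jitter, blurring, and other regularizers to obtain natural-looking images; the helper name here is our own.

```python
import torch

def activation_maximization(model, layer, channel, img_size=224, steps=256, lr=0.05):
    """Optimize a random input image to maximally activate one channel of `layer`.

    `layer` is a module inside `model`; a forward hook captures its output.
    """
    activations = {}
    handle = layer.register_forward_hook(
        lambda module, inp, out: activations.__setitem__("value", out))

    img = torch.randn(1, 3, img_size, img_size, requires_grad=True)
    optimizer = torch.optim.Adam([img], lr=lr)

    model.eval()
    for _ in range(steps):
        optimizer.zero_grad()
        model(img)
        # Negative mean activation of the chosen channel, so that gradient
        # descent on the loss performs gradient ascent on the activation.
        loss = -activations["value"][0, channel].mean()
        loss.backward()
        optimizer.step()

    handle.remove()
    return img.detach()
```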
-
(xml-7) Interpreting Neural Networks (Pixel Attribution)
tl;dr: Highlighting the input features (e.g., pixels) relevant to a DNN's prediction (see the sketch below).
[slides]
Suggested Readings:
- Visualising image classification models and saliency maps (ICLRW-2014)
- Visualizing and understanding CNNs (ECCV-2014)
- SmoothGrad (2017)
- Interpretation of Neural Networks is Fragile (AAAI-2019)
- Sanity Checks for Saliency Maps (NeurIPS-2018)
- The (un)reliability of saliency methods (Explainable AI-2019)
- Sanity Checks for Saliency Metrics (AAAI-2020)
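The sketch below combines two of the readings above: vanilla gradient saliency (Simonyan et al.) averaged over noisy copies of the input (SmoothGrad). It assumes a PyTorch image classifier that returns logits; the function name, sample count, and noise level are illustrative choices.

```python
import torch

def smoothgrad_saliency(model, x, target_class, n_samples=25, noise_std=0.15):
    """Gradient saliency averaged over noisy copies of the input (SmoothGrad).

    x: input image tensor of shape (C, H, W) with requires_grad=False.
    Returns an (H, W) saliency map.
    """
    model.eval()
    grads = torch.zeros_like(x)

    for _ in range(n_samples):
        # Fresh noisy copy each iteration; its input gradient accumulates in grads.
        noisy = (x + noise_std * torch.randn_like(x)).unsqueeze(0).requires_grad_(True)
        score = model(noisy)[0, target_class]
        score.backward()
        grads += noisy.grad[0]

    # Max over channels of the absolute averaged gradient, as in Simonyan et al.
    return (grads / n_samples).abs().max(dim=0).values
```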
-
(xml-8) Interpreting Neural Networks (Concept-based Explanations)
tl;dr: Learning a low-dimensional, human-understandable representation that can explain a DNN's inference (see the sketch below).
[slides]
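As a rough sketch of the concept-activation-vector (TCAV-style) idea: fit a linear probe separating the layer activations of concept examples from those of random examples, take the probe's normal as the concept direction, and measure the directional derivative of a class logit along it. The code assumes the activations and the gradient have already been extracted from the network; the function names are our own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Fit a linear probe separating concept vs. random activations (TCAV-style).

    concept_acts, random_acts: (n, d) arrays of flattened activations from one
    layer. Returns the unit-norm concept direction (the CAV).
    """
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    cav = probe.coef_.ravel()
    return cav / np.linalg.norm(cav)

def concept_sensitivity(grad_of_logit_wrt_layer, cav):
    """Directional derivative of a class logit along the concept direction.

    A positive value means the concept pushes the prediction toward that class;
    TCAV aggregates the sign of this quantity over many inputs.
    """
    return float(np.dot(grad_of_logit_wrt_layer.ravel(), cav))
```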
-
(xml-9) Attention and Explanations
tl;dr: Does the attention mechanism provide an explanation of the model's predictions? (See the sketch below.)
[slides]
Suggested Readings:
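To ground the discussion, the sketch below simply reads attention weights out of a pretrained Hugging Face transformer; whether such weights constitute an explanation is precisely what this lecture debates. The model name is just a common default, and averaging over heads is only one of several possible aggregation choices.

```python
import torch
from transformers import AutoModel, AutoTokenizer

def attention_map(text, model_name="bert-base-uncased"):
    """Return tokens and last-layer attention weights, averaged over heads.

    Note: a high attention weight is not by itself a faithful explanation;
    that question is exactly what this lecture examines.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_attentions=True)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions: one (batch, heads, seq_len, seq_len) tensor per layer.
    last_layer = outputs.attentions[-1][0]            # (heads, seq_len, seq_len)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return tokens, last_layer.mean(dim=0)             # head-averaged attention
```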