Data-free Knowledge Extraction from Deep Neural Networks
|
Welcome to the tutorial on Data-free Knowledge Extraction to be held as part of the NCVPRIPG 2023 conference.
Abstract
Data-free Knowledge Extraction (DFKE) refers to extracting useful information from a trained deep neural network (DNN) without accessing the underlying data over which the DNN was trained.
The extracted information can be diverse: for instance, a replica of the DNN itself, sensitive information about the underlying training data, or patterns derived from it. DFKE can be a serious
concern, particularly in deployments like MLaaS (Machine Learning as a Service). Considering the amount of data, human expertise, and computational resources typically required to train sophisticated DNNs,
it is natural to regard them as intellectual property. Therefore, they need to be protected against such extraction attempts (referred to as attacks). On the other hand, philosophically, it is interesting
to (i) understand the utility of these trained models without their training data and (ii) formulate guarantees on the information leakage (or extraction). In this tutorial, I will first introduce the phenomenon
of data-free model extraction and discuss the different ways in which it can manifest, in both white-box and black-box scenarios. Later, I will focus on the threat of leaking sensitive
information about the training data to a dishonest user through different attacks. Finally, I will discuss some active directions for further investigation.
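
To make the replica-style extraction above concrete, here is a minimal, hypothetical sketch of black-box model extraction in PyTorch: an attacker queries a deployed model (observing only its output probabilities) and fits a student replica to the responses. The victim/student architectures, the random query inputs, and all hyperparameters are assumptions for illustration only, not material from the tutorial.

```python
# Hypothetical sketch: black-box model extraction by querying a deployed
# "victim" model and fitting a student (replica) to its soft predictions.
# Architectures, the random query inputs, and hyperparameters are assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, mobilenet_v2

victim = resnet18(num_classes=10).eval()   # stands in for the deployed model/API
student = mobilenet_v2(num_classes=10)     # attacker's replica
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def query_api(x):
    """Black-box access: only output probabilities come back, never gradients."""
    with torch.no_grad():
        return F.softmax(victim(x), dim=1)

for step in range(1000):
    # In practice the queries come from an arbitrary or weakly related transfer
    # set; plain noise is used here only to keep the sketch self-contained.
    queries = torch.randn(32, 3, 224, 224)
    targets = query_api(queries)
    preds = F.log_softmax(student(queries), dim=1)
    loss = F.kl_div(preds, targets, reduction="batchmean")  # match soft labels
    opt.zero_grad()
    loss.backward()
    opt.step()
```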
|
Topics to be discussed (not necessarily in this sequence)
- Introduction
- Deploying deep models post-training: what and what not?
- Knowledge Distillation
- Noise Optimization towards CNN visualization
- Generative Adversarial Networks (GAN)
- Data-free Knowledge Distillation (towards creating a replica of the target model)
- Via Noise Optimization (see the sketch after this list)
- Via Generative Reconstruction
- Adversarial Exploration
- Data-free attacks (towards extracting sensitive information about the training data)
- Conclusion and Future Directions
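
For the noise-optimization route flagged in the list above, the following hedged sketch illustrates the core idea behind Class/Data-Impression-style synthesis: starting from random noise, optimize the inputs so that a frozen teacher classifies them confidently into a chosen class; the resulting pseudo-samples can then serve as a data-free transfer set for distillation (e.g., in a loop like the one sketched after the abstract). The model, step counts, and regularizer weight here are illustrative assumptions, not the exact recipe from the referenced papers.

```python
# Hypothetical sketch: synthesizing "class impressions" by noise optimization.
# The frozen teacher's gradients shape random noise into inputs it classifies
# confidently; these pseudo-samples can replace real data during distillation.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

teacher = resnet18(num_classes=10).eval()   # frozen target model (assumed)
for p in teacher.parameters():
    p.requires_grad_(False)

def class_impressions(target_class, n=16, steps=200, lr=0.05, l2_weight=1e-4):
    x = torch.randn(n, 3, 224, 224, requires_grad=True)
    labels = torch.full((n,), target_class, dtype=torch.long)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Cross-entropy pulls the noise towards the teacher's decision region
        # for target_class; the small L2 term acts as a crude image prior.
        loss = F.cross_entropy(teacher(x), labels) + l2_weight * x.pow(2).mean()
        loss.backward()
        opt.step()
    return x.detach()

# One batch of pseudo-samples per class forms a data-free transfer set.
transfer_set = torch.cat([class_impressions(c) for c in range(10)])
```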
|
Schedule
- Session 1: 2 - 4 PM, July 21, 2023 (Friday)
- Session 2: 4:30 - 5:30 PM, July 21, 2023 (Friday)
|
Bibliography and References
- Knowledge Distillation
- Knowledge Distillation, Hinton et al. [NIPS 2014 Workshop]
- Survey on Knowledge Distillation [IJCV 2021]
- Noise Optimization
- Class Impressions, Mopuri et al. [ECCV 2018]
- Data Impressions, Mopuri et al. [ICML 2019]
- Mining Data Impressions as a Substitute for training data [TPAMI 2021]
- Data-free Knowledge Distillation in Deep Neural Networks, Lopes et al. [NeurIPS 2017 Workshop]
- [Deep Dream] by Google, 2015
- Deep Inversion, [CVPR 2020]
- Generative Reconstruction
- GANs by Ian Goodfellow et al. [NeurIPS 2014]
- DAFL [ICCV 2019]
- DFKA [CVPR 2021]
- Batch Normalization Statistics: [Luo et al. 2020] [Haroush et al. 2020] [Besnier et al. 2019]
- Arbitrary Transfer set (and/or Weakly related set)
- Data-enriching GANs, Addepalli et al. [AAAI 2020]
- Focus on diversity: [Fang et al. 2021] [Han et al. 2021]
- Noise and Arbitrary Data as the Transfer Set, Mopuri and Nayak et al. [WACV 2021]
- Learning Student Networks in the Wild, Chen et al. [CVPR 2021]
- Adversarial Exploration
- ZSKT, Micaelli et al. [NeurIPS 2019]
- DFAD, Fang et al. [2020]
- Other Applications of DFKE
- Object Detection and Semantic Segmentation: [DFAD, 2020] [Object Detection BMVC 2021]
- Domain Adaptation and Continual Learning: [TPAMI 2021]
- Black-box setting and other attacks
- Stealing ML models via prediction APIs [USENIX 2016] (attacks on online services of Amazon and BigML)
- Membership Inference attack, [Shokri et al. 2017]
- Model Inversion attack, [Zhang et al. 2020]
- DFME (soft label) [CVPR 2021]
- DFME (hard label) [CVPR 2022]