Deep Learning for Computer Vision / Aug-Nov 2022

Updates

  • 2022-11-15: New lecture is up: (dl4cv-20) Attention++ [slides]
  • 2022-11-11: New lecture is up: (dl4cv-19) Learning Efficient DL Models (Compressing the Models) [slides]
  • 2022-11-04: New lecture is up: (dl4cv-18) Adversarial Images [slides]
  • 2022-11-01: New lecture is up: (dl4cv-17c) Generative Adversarial Networks [example-code] [slides]
  • 2022-10-31: New lecture is up: (dl4cv-17b) Variational Autoencoder [slides]
  • 2022-10-28: New lecture is up: (dl4cv-17a) Autoencoders [slides]
  • 2022-10-21: New lecture is up: (dl4cv-17) Generative Models [slides]

Course Description

Computer vision deals with the algorithmic analysis and understanding of the visual world through images and videos. It plays an important role in applications such as healthcare, autonomy, security, entertainment, and manufacturing. The recent emergence of deep learning has significantly affected the field: deep learning techniques, trained on large datasets, have remarkably improved the performance of vision systems. This course does not focus on traditional computer vision techniques; instead, it discusses deep learning methods for computer vision. Starting with the basics, the course covers recent advances in computer vision driven by deep learning. Students are expected to have completed a course on machine learning, and preferably to have had some introduction to deep learning as well.

Course Contents

Part-1: Foundations of Deep Learning (implementing and training different types of neural networks)

  1. Starting from an artificial neuron, we aim to understand the feedforward and recurrent architectures of artificial neural networks. We cover neurons (McCulloch-Pitts, perceptron), MLPs, CNNs, and RNNs (LSTM and GRU), and learn how to train these models with gradient descent, using the backpropagation algorithm to compute the gradients.
  2. Realizing these architectures in the PyTorch framework. We study the underlying computational graph, automatic differentiation (Autograd), batch processing of data, etc., culminating in writing PyTorch modules.
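
As a rough illustration of the training procedure named in item 1, the sketch below trains a tiny feedforward network with gradient descent, computing the gradients by hand with backpropagation. It is written in plain Python rather than PyTorch so that every gradient term is visible. The network size (2 inputs, 2 hidden units, 1 output), the sigmoid activations, the squared-error loss, the initial weights, the learning rate, and the XOR data are all assumptions chosen for illustration; none of them are taken from the course material.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass: 2 inputs -> 2 sigmoid hidden units -> 1 sigmoid output."""
    w1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, y

def loss(params, data):
    """Mean squared error over the dataset."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)

def backprop(params, data):
    """Gradients of the mean squared error w.r.t. every parameter."""
    w1, b1, w2, b2 = params
    gw1 = [[0.0, 0.0], [0.0, 0.0]]
    gb1 = [0.0, 0.0]
    gw2 = [0.0, 0.0]
    gb2 = 0.0
    n = len(data)
    for x, t in data:
        h, y = forward(params, x)
        # Chain rule at the output: dL/dy * dy/dz, with dy/dz = y * (1 - y).
        d_out = 2.0 * (y - t) * y * (1.0 - y) / n
        gb2 += d_out
        for j in range(2):
            gw2[j] += d_out * h[j]
            # Propagate the error back through hidden unit j.
            d_hid = d_out * w2[j] * h[j] * (1.0 - h[j])
            gb1[j] += d_hid
            for i in range(2):
                gw1[j][i] += d_hid * x[i]
    return gw1, gb1, gw2, gb2

def gd_step(params, grads, lr):
    """One gradient descent update: parameter <- parameter - lr * gradient."""
    w1, b1, w2, b2 = params
    gw1, gb1, gw2, gb2 = grads
    w1 = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(w1, gw1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    w2 = [w - lr * g for w, g in zip(w2, gw2)]
    return w1, b1, w2, b2 - lr * gb2

# Illustrative data and starting point (assumptions, not course material).
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
INIT = ([[0.5, -0.4], [0.3, 0.8]], [0.0, 0.0], [0.7, -0.6], 0.1)

params = INIT
for _ in range(500):
    params = gd_step(params, backprop(params, XOR), lr=0.5)
print("loss before:", loss(INIT, XOR), "loss after:", loss(params, XOR))
```

In PyTorch, the hand-written `backprop` function would typically be replaced by automatic differentiation over the computational graph, which is the subject of item 2.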

Part-2: Applications in Computer Vision (with a research flavour)

  1. Object recognition, detection, semantic segmentation
  2. Vision and Language
  3. Generative models: GANs and VAEs
  4. Recent trends (Attention, Transformers, learning with limited supervision, etc.)

Logistics

Open Elective: Intended for senior (final-year) B.Tech, M.Tech, and Ph.D. students

Class Room: 5201 (CORE-5)

Timings: D1 slot (Mon and Tue 4-4:55 PM and Fri 3-3:55 PM)

Visit this page regularly for updates and information regarding the course.


Instructors

Teaching Assistants

Kamal Kumar (kkamal@iitg.ac.in)