General guidelines for completing the assignment:

Feel free to use Stack Overflow, Google, or any other resources to research questions and issues you run into. You are also welcome to use any programming language. The only thing we ask is that you complete this work independently, without help from family or friends. If the assignment requires coding, please include the documentation needed for someone else to run your code.

Question: Knowledge Distillation (1.5 hours)

  1. We have trained a large neural network for an image classification task and now want to deploy it in our mobile app. Unfortunately, the model is too large to run on mobile devices. We want to apply knowledge distillation to obtain a much smaller model that still achieves decent accuracy. If you're not familiar with knowledge distillation, have a look at this paper: Distilling the Knowledge in a Neural Network. We will work with the CIFAR-10 dataset. Download a pretrained ResNet model (e.g., for PyTorch or TensorFlow) and report the accuracy and inference time of the model (a rough evaluation sketch is given after this list).
  2. Set up a significantly smaller model with any architecture of your choice that can train quickly on your local machine. Train the small network on a combination of a distillation loss on the logits of the large network and a standard loss between the small network's output and the ground-truth labels; use a higher temperature for the softmax in the distillation loss (see the second sketch after this list). Train the smaller network for a while and report both the achieved accuracy and its inference time. (Don't worry too much about finding perfect hyperparameters for high accuracy or training for many epochs; we are more interested in seeing whether you developed a good training process for this problem, which could be used to find specific hyperparameters later on.)
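
A minimal sketch of the teacher-evaluation step in item 1, assuming PyTorch with torchvision's ImageNet-pretrained ResNet-50 as the teacher. The model, batch size, and input resize are illustrative choices, and in practice the replaced classification head would be fine-tuned on CIFAR-10 (or a CIFAR-10 checkpoint loaded) before accuracy is reported.

```python
import time
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# CIFAR-10 test set, resized to 224x224 so it matches the input size the
# ImageNet-pretrained ResNet expects (an assumption; a CIFAR-10-specific
# checkpoint would not need this resize).
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)

# Teacher: ResNet-50 with ImageNet weights, final layer replaced for 10 classes.
teacher = torchvision.models.resnet50(weights="IMAGENET1K_V2")
teacher.fc = torch.nn.Linear(teacher.fc.in_features, 10)
teacher = teacher.to(device).eval()

correct, total, elapsed = 0, 0, 0.0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        start = time.perf_counter()
        logits = teacher(images)
        if device.type == "cuda":
            torch.cuda.synchronize()  # make GPU timing meaningful
        elapsed += time.perf_counter() - start
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)

print(f"accuracy: {correct / total:.4f}")
print(f"inference time per image: {elapsed / total * 1000:.3f} ms")
```

The same accuracy/timing loop can be reused unchanged for the student model, which makes the teacher-vs-student comparison in item 2 straightforward.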
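
A sketch of the combined loss and training step in item 2, assuming PyTorch. The SmallCNN architecture, the temperature T=4, and the weighting alpha=0.7 are hypothetical placeholders; only the structure of the loss (softened teacher targets via KL divergence plus hard-label cross-entropy, with the T**2 rescaling from Hinton et al.) is the point.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """A deliberately small student network (hypothetical architecture)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)  # assumes 32x32 inputs

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7):
    """Weighted sum of the soft-target loss (softmax at temperature T) and the
    hard-label cross-entropy. The T**2 factor keeps the soft-target gradients
    on roughly the same scale as the hard-label gradients."""
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=1)
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(soft_student, soft_teacher, log_target=True,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Training loop sketch: teacher frozen, student trained on the combined loss.
# `train_loader`, `teacher`, and `device` are assumed to exist as in the teacher
# sketch; `train_loader` is assumed to yield 32x32 CIFAR-10 images, so if the
# teacher was fine-tuned at 224x224, resize its copy of the batch accordingly.
student = SmallCNN().to(device)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
student.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    with torch.no_grad():
        teacher_logits = teacher(images)
    loss = distillation_loss(student(images), teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```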