Different implementations of the ubiquitous convolution

Throughout the deep learning field, nn.Conv2d is used left, right, and center to build convolution layers in PyTorch, usually without much thought about how they are implemented under the hood. In this post, we will gain some insight into different convolution implementations, including a naive nested for-loop, Im2Col, Winograd, Strassen's algorithm, and the FFT, and weigh their pros and cons based on the latencies they incur on an N1 CPU and a T4 GPU. We will also relate Strassen's algorithm to DeepMind's recent matrix-multiplication breakthrough with AlphaTensor.
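As a baseline for the comparisons that follow, here is a minimal sketch of the naive nested for-loop approach (all names are illustrative; it assumes stride 1 and no padding, and like nn.Conv2d it actually computes cross-correlation, i.e. the kernel is not flipped):

```python
import numpy as np

def naive_conv2d(x, w):
    """Direct convolution via nested loops.

    x: input of shape (C_in, H, W)
    w: kernels of shape (C_out, C_in, kH, kW)
    Returns output of shape (C_out, H - kH + 1, W - kW + 1).
    """
    c_out, c_in, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1), dtype=x.dtype)
    for o in range(c_out):                # each output channel
        for i in range(out.shape[1]):     # each output row
            for j in range(out.shape[2]): # each output column
                # multiply-accumulate over all input channels and the kernel window
                out[o, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w[o])
    return out

# Sanity check against PyTorch's reference implementation:
import torch
import torch.nn.functional as F

x = np.random.rand(3, 8, 8).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)
ref = F.conv2d(torch.from_numpy(x)[None], torch.from_numpy(w))[0].numpy()
assert np.allclose(naive_conv2d(x, w), ref, atol=1e-5)
```

The three explicit loops (plus the implicit reductions inside np.sum) make the quadratic-in-output, linear-in-kernel cost of direct convolution obvious, which is exactly what the cleverer algorithms below try to beat.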