A couple of months ago I wrote my first RNN, LSTM and CNN using pytorch and Google Colab notebooks.
The main things I learnt from this experience:
GPT-4 was invaluable for my learning. It greatly reduces the barrier to entry because it can generate example code, explain what different parts of the code are doing, explain how to set up Google Colab notebooks, etc.
GPT-4 is not perfect. It made two or three subtle errors in its example code. What was impressive, though, was that when I asked about those bits of the code, it recognized its mistakes and corrected itself!
For the CNNs, I set myself the goal of getting 99% accuracy on the MNIST test set. The first few architectures I tried got around 97–98% accuracy. The thing that let me reach 99+% was introducing a learning rate scheduler that reduces the learning rate over time.
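A minimal sketch of what that scheduler wiring looks like in PyTorch. The model here is a placeholder rather than the CNN from the post, and the specific scheduler (`StepLR`) and hyperparameters (halving the rate every 5 epochs) are illustrative assumptions, not the exact setup used:

```python
import torch
import torch.nn as nn

# Placeholder model; a real MNIST CNN would go here.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# StepLR multiplies the learning rate by gamma every step_size epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(15):
    # ... inner loop over batches: forward pass, loss.backward(), ...
    optimizer.step()   # normally called once per batch
    scheduler.step()   # called once per epoch to decay the learning rate

# After 15 epochs the rate has been halved three times: 1e-3 * 0.5**3
print(optimizer.param_groups[0]["lr"])
```

Other schedulers such as `ReduceLROnPlateau` (which waits for the validation metric to stall) follow the same pattern of calling `scheduler.step()` once per epoch.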
I found it hard to know with high confidence whether I wrote the internals of the networks exactly correctly or not. It seems easy to introduce tiny hard-to-detect bugs.
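One concrete example of this bug class, shown here as an illustration rather than a bug I actually hit: tensor shape mismatches that silently broadcast instead of raising an error. Below, a `(4, 1)` output against a `(4,)` target makes `MSELoss` compare every output with every target, so training still runs but the loss is wrong:

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0], [2.0], [3.0], [4.0]])  # shape (4, 1)
target = torch.tensor([1.0, 2.0, 3.0, 4.0])          # shape (4,)

# Shapes (4, 1) and (4,) broadcast to (4, 4): every output is compared
# with every target, giving a nonzero loss even though the values match.
wrong = nn.MSELoss()(output, target)

# With matching shapes the loss is exactly zero, as expected.
right = nn.MSELoss()(output.squeeze(1), target)

print(wrong.item(), right.item())
```

Recent PyTorch versions emit a `UserWarning` in this situation, but the code still runs, which is exactly what makes these bugs hard to detect.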
PyTorch has various minor conventions that introduce friction into the learning process. The one that stood out most to me is that PyTorch's `CrossEntropyLoss` applies the softmax computation for you, so you pass it raw logits rather than probabilities. This is not intuitive at all, and it is one concrete example of how using GPT sped up my learning.
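A small sketch of the convention in question. `nn.CrossEntropyLoss` combines log-softmax and negative log-likelihood internally, which is why adding your own softmax layer before it is a mistake (the numbers here are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw network outputs, no softmax
target = torch.tensor([0])                 # correct class index

# CrossEntropyLoss expects logits and does the softmax itself.
loss = nn.CrossEntropyLoss()(logits, target)

# The same computation by hand: log-softmax, then negative log-likelihood
# of the target class.
manual = -F.log_softmax(logits, dim=1)[0, target[0]]

assert torch.isclose(loss, manual)
```

Applying `F.softmax` to the outputs before passing them in would softmax the data twice, which typically makes the network train poorly rather than fail loudly.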
Not new, but this experience reinforces the general fact that there is a lot of tacit knowledge that can only be gained by getting your hands dirty. For example, I had previously read and heard that training neural networks is more of an art than a science, but I can appreciate this a lot more now, having tried training some networks of my own.