In this guide, we’re going to continue improving our toy example of a linear regression model that learns the equation y = 2x + 1. We’re going to introduce five changes to our overall flow:

  1. Properly splitting data into train/test/validation (we won’t actually use the validation set since we don’t do any hyperparameter tuning)
  2. Putting data and models on the GPU for better performance
  3. Adding a learning rate optimizer
  4. Adding model checkpointing
  5. Adding Tensorboard

Step 1: Initializing the Data

GPUs are powerful devices capable of performing matrix operations at incredible speeds. However, they struggle with sequential or I/O-bound operations. As a consequence, we’ll preprocess our data on the CPU first and then put it on the GPU. For this example, preprocessing just means making the train/test/val splits.

import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
import numpy as np

SEED = 0

torch.manual_seed(SEED)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

X = np.array([[i] for i in range(1000)])
Y = np.array([[2 * i + 1] for i in range(1000)])

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=SEED)
X_test, X_val, Y_test, Y_val = train_test_split(X_test, Y_test, test_size=0.5, random_state=SEED)

X_train = torch.Tensor(X_train).to(device)
X_test = torch.Tensor(X_test).to(device)
X_val = torch.Tensor(X_val).to(device)

Y_train = torch.Tensor(Y_train).to(device)
Y_test = torch.Tensor(Y_test).to(device)
Y_val = torch.Tensor(Y_val).to(device)

There is a lot of redundant code in here with making everything a torch.Tensor and sending it to the device, but I opted for that instead of potentially confusing unpacking syntax. One thing to watch: .to(device) returns a new tensor rather than moving the original in place, so the result has to be assigned back (or chained onto the constructor, as above).
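
If you prefer the compact version, the unpacking syntax I was referring to looks roughly like this (a sketch of the exact same conversion as above, nothing new happens here):

# Same conversion as above, expressed as a single unpacking statement.
X_train, X_test, X_val, Y_train, Y_test, Y_val = (
    torch.Tensor(arr).to(device)
    for arr in (X_train, X_test, X_val, Y_train, Y_test, Y_val)
)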

Step 2: Making the Linear Regression

Nothing has changed about this.

class LinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.L1 = nn.Linear(1, 1, bias=True)

    def forward(self, X):
        return self.L1(X)
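
If you want a quick sanity check that the model and the data will live on the same device before training starts, something like this works (purely optional, and the names here are just for illustration):

# Optional sanity check: one untrained forward pass on the target device.
sanity_model = LinearRegression().to(device)
sample = torch.Tensor([[3.0]]).to(device)
print(sanity_model(sample))  # arbitrary output until the model is trained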

Step 3: The Training Loop

This is the step with the most additions by far; basically every single change on our list shows up in the training loop.

We’re going to add a SummaryWriter to log our loss to Tensorboard and a scheduler to control our learning rate as we go. We’ll use a simple StepLR: the model converges quickly, so we don’t need to get super cute with the way the learning rate changes.
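
For intuition, StepLR multiplies the learning rate by gamma every step_size scheduler steps, so the schedule is easy to work out by hand. A quick back-of-the-envelope sketch using the numbers from the code below:

# With lr=0.1, step_size=1000, gamma=0.99, the learning rate after n steps
# is 0.1 * 0.99 ** (n // 1000).
initial_lr, gamma, step_size, n_steps = 0.1, 0.99, 1000, 25000
final_lr = initial_lr * gamma ** (n_steps // step_size)
print(f'Final learning rate: {final_lr:.4f}')  # roughly 0.0778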

import os

from torch.optim.lr_scheduler import StepLR
from torch.nn import MSELoss
from torch.utils.tensorboard import SummaryWriter

model = LinearRegression()
model.to(device)

writer = SummaryWriter('./tensorboard')

# torch.save won't create directories for us, so make the checkpoint folder up front
os.makedirs('./model_checkpoints', exist_ok=True)

optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_function = MSELoss()
scheduler = StepLR(optimizer, step_size=1000, gamma=0.99)

N_STEPS = 25000

for i in range(N_STEPS):
    Y_pred_train = model(X_train)
    loss = loss_function(Y_pred_train, Y_train)
    model.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()

    if (i + 1) % 1000 == 0:
        print(f"Iteration {i + 1} @ LR {optimizer.param_groups[0]['lr']:.4f}: {loss.item()}")

        # Evaluate on the test set without building a computation graph
        with torch.no_grad():
            Y_pred_test = model(X_test)
            test_loss = loss_function(Y_pred_test, Y_test)

        writer.add_scalar('loss/train', loss.item(), i)
        writer.add_scalar('loss/test', test_loss.item(), i)

        torch.save(model.state_dict(), f'./model_checkpoints/checkpoint_{(i + 1):05d}.pth')

writer.flush()
writer.close()
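
Once training finishes, any of the saved checkpoints can be loaded back into a fresh model, and the logs can be browsed by pointing Tensorboard at the log directory. A minimal sketch, assuming the final checkpoint path produced by the loop above:

# To view the logged losses, run: tensorboard --logdir ./tensorboard
# Reload the last checkpoint into a fresh model instance for inference.
restored_model = LinearRegression().to(device)
restored_model.load_state_dict(torch.load('./model_checkpoints/checkpoint_25000.pth', map_location=device))
restored_model.eval()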