In this guide, we're going to continue improving our toy example of a linear regression model that learns the equation y = 2x + 1. We're going to introduce five changes to our overall flow: moving the data to the GPU, splitting it into train/test/val sets, logging loss to TensorBoard with a SummaryWriter, scheduling the learning rate with StepLR, and saving model checkpoints.
GPUs are powerful devices capable of performing matrix operations at incredible speeds, but they struggle with sequential or I/O-bound operations. As a consequence, we'll preprocess our data first and then put it on the GPU. Preprocessing here just means making the train/test/val splits for this example.
import torch
import torch.nn as nn
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 0
torch.manual_seed(SEED)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Synthetic data following y = 2x + 1
X = np.array([[i] for i in range(1000)])
Y = np.array([[2 * i + 1] for i in range(1000)])

# 80/10/10 train/test/val split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=SEED)
X_test, X_val, Y_test, Y_val = train_test_split(X_test, Y_test, test_size=0.5, random_state=SEED)

# .to(device) returns a new tensor rather than moving in place, so assign the result back
X_train = torch.Tensor(X_train).to(device)
X_test = torch.Tensor(X_test).to(device)
X_val = torch.Tensor(X_val).to(device)
Y_train = torch.Tensor(Y_train).to(device)
Y_test = torch.Tensor(Y_test).to(device)
Y_val = torch.Tensor(Y_val).to(device)
There is a lot of redundant code here, making everything a torch.Tensor and sending it to the device, but I opted for that instead of potentially confusing unpacking syntax. One thing to watch: .to(device) returns a new tensor rather than modifying the existing one, so you have to assign the result back.
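For reference, the unpacking version I'm avoiding would look something like this; it does exactly the same conversion and move in a single statement, it's just denser to read:

# Same Tensor conversion + device move, written with unpacking instead of six separate statements
X_train, X_test, X_val, Y_train, Y_test, Y_val = (
    torch.Tensor(arr).to(device)
    for arr in (X_train, X_test, X_val, Y_train, Y_test, Y_val)
)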
Nothing has changed about the model itself.
class LinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.L1 = nn.Linear(1, 1, bias=True)

    def forward(self, X):
        return self.L1(X)
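If you want to convince yourself the model really is as small as advertised, here's a quick sanity check (not part of the flow itself): the single Linear layer holds exactly one weight and one bias, which training should push toward 2 and 1.

# Standalone sanity check of the model's size and output shape
model = LinearRegression()
print(model.L1.weight.shape, model.L1.bias.shape)  # torch.Size([1, 1]) torch.Size([1])
print(model(torch.Tensor([[3.0]])))                # an untrained guess at 2 * 3 + 1 = 7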
This is the step with the most additions by far; basically every change is represented in the training loop. We're going to add a SummaryWriter to log our loss to TensorBoard and a scheduler to control our learning rate as we go. We'll use a simple StepLR; the model converges quickly, so we don't need to get super cute with the way the learning rate changes.
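If you haven't used StepLR before: it multiplies the optimizer's learning rate by gamma every step_size calls to scheduler.step(). With the settings below (lr=0.1, step_size=1000, gamma=0.99), 25,000 steps apply the decay 25 times, ending around 0.1 * 0.99**25 ≈ 0.078. A tiny standalone sketch of just that behavior:

# Isolated StepLR demo; the dummy parameter and optimizer exist only so the scheduler has something to drive
import torch
from torch.optim.lr_scheduler import StepLR

dummy = [torch.zeros(1, requires_grad=True)]
opt = torch.optim.Adam(dummy, lr=0.1)
sched = StepLR(opt, step_size=1000, gamma=0.99)
for _ in range(25000):
    opt.step()
    sched.step()
print(opt.param_groups[0]['lr'])  # roughly 0.1 * 0.99 ** 25 ≈ 0.078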
import os
from torch.optim.lr_scheduler import StepLR
from torch.nn import MSELoss
from torch.utils.tensorboard import SummaryWriter

model = LinearRegression()
model.to(device)

writer = SummaryWriter('./tensorboard')  # TensorBoard log directory, relative to the working directory
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_function = MSELoss()
scheduler = StepLR(optimizer, step_size=1000, gamma=0.99)
N_STEPS = 25000
os.makedirs('model_checkpoints', exist_ok=True)  # torch.save won't create the directory for us

for i in range(N_STEPS):
    Y_pred_train = model(X_train)
    loss = loss_function(Y_pred_train, Y_train)
    model.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()

    # Every 1000 steps: report progress, log to TensorBoard, and checkpoint the model
    if (i + 1) % 1000 == 0:
        print(f"Iteration {i + 1} @ LR {optimizer.param_groups[0]['lr']:.4f}: {loss.item()}")
        with torch.no_grad():
            Y_pred_test = model(X_test)
            test_loss = loss_function(Y_pred_test, Y_test)
        writer.add_scalar('loss/train', loss.item(), i)
        writer.add_scalar('loss/test', test_loss.item(), i)
        torch.save(model.state_dict(), f'model_checkpoints/checkpoint_{(i + 1):05d}.pth')

writer.flush()
writer.close()
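Two loose ends worth mentioning. First, the validation split never gets used inside the loop; here's a sketch of how you might evaluate on it once training is done and reload the final checkpoint to confirm the weights round-trip (the path just mirrors the naming scheme above). Second, you can watch the logged curves with tensorboard --logdir ./tensorboard.

# Final validation loss, computed without tracking gradients
with torch.no_grad():
    val_loss = loss_function(model(X_val), Y_val)
print(f"Validation loss: {val_loss.item()}")

# Reload the last checkpoint into a fresh model
restored = LinearRegression().to(device)
restored.load_state_dict(torch.load(f'model_checkpoints/checkpoint_{N_STEPS:05d}.pth', map_location=device))
restored.eval()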