Convolutional Neural Networks

Conv Layer

A convolution with a stride of 2 and max pooling both downsample an image by reducing its spatial dimensions. The key difference is that a strided convolution learns how to combine features from overlapping regions, while max pooling only keeps the maximum value within each window and can discard finer details in that region. Strided convolutions are therefore often preferred when you want the network to learn which spatial information to preserve.
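As a standalone illustration (plain PyTorch, not the library API documented below), both operations halve the spatial resolution, but only the strided convolution carries learnable parameters:

import torch
import torch.nn as nn

x = torch.rand(1, 3, 28, 28)  # (B, C, H, W)

# learnable downsampling: stride-2 convolution
conv = nn.Conv2d(3, 3, kernel_size=3, stride=2, padding=1)
# fixed downsampling: 2x2 max pooling
pool = nn.MaxPool2d(kernel_size=2, stride=2)

print(conv(x).shape)  # torch.Size([1, 3, 14, 14])
print(pool(x).shape)  # torch.Size([1, 3, 14, 14])
print(sum(p.numel() for p in conv.parameters()))  # 84 learnable weights; pooling has none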


source

ConvLayer

 ConvLayer (in_channels:int=3, out_channels:int=16, kernel_size:int=3,
            stride:int=2, bias:bool=True,
            normalization:Optional[Type[nn.Module]]=BatchNorm2d,
            activation:Optional[Type[nn.Module]]=ReLU)

*A 2D convolutional layer with optional batch normalization and activation.

This layer performs 2D convolution with stride 2 for downsampling, optionally followed by batch normalization and activation.*

| | Type | Default | Details |
|---|---|---|---|
| in_channels | int | 3 | input channels |
| out_channels | int | 16 | output channels |
| kernel_size | int | 3 | kernel size |
| stride | int | 2 | stride |
| bias | bool | True | If True, adds a learnable bias to the convolution |
| normalization | Optional | BatchNorm2d | Normalization layer to use after convolution |
| activation | Optional | ReLU | Activation function to use after normalization |
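The summaries and module printouts later on this page (Conv2d with padding=(1, 1), bias dropped when BatchNorm2d is used) pin down the structure fairly well. As a hedged sketch only, an equivalent layer could be written as:

import torch.nn as nn
from typing import Optional, Type

class SimpleConvLayer(nn.Module):
    """Illustrative sketch: Conv2d -> optional normalization -> optional activation."""
    def __init__(self, in_channels: int = 3, out_channels: int = 16, kernel_size: int = 3,
                 stride: int = 2, bias: bool = True,
                 normalization: Optional[Type[nn.Module]] = nn.BatchNorm2d,
                 activation: Optional[Type[nn.Module]] = nn.ReLU):
        super().__init__()
        if normalization is not None:
            bias = False  # the norm layer provides its own learnable shift
        layers = [nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride,
                            padding=kernel_size // 2, bias=bias)]
        if normalization is not None:
            layers.append(normalization(out_channels))
        if activation is not None:
            layers.append(activation())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)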

source

ConvLayer.forward

 ConvLayer.forward (x:torch.Tensor)

forward method of the ConvLayer

| | Type | Details |
|---|---|---|
| x | Tensor | input image tensor of dimension (B, C, H, W) |
| **Returns** | Tensor | output image tensor of dimension (B, C, H/2, W/2) |

Usage

B, C, H, W = 64, 1, 28, 28
X = torch.rand(B, C, H, W)
# stride-2 layer downsamples to (H/2, W/2)
net = ConvLayer(
    in_channels=C,
    out_channels=16,
    kernel_size=3,
    stride=2,
    normalization=None,
    )

print("Y: ", net(X).shape)
# flatten all dims except the batch dim
Y = torch.flatten(net(X), 1)
print(Y.shape)
summary(net, input_size=(B, C, H, W), depth=4)
Y:  torch.Size([64, 16, 14, 14])
torch.Size([64, 3136])
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
ConvLayer                                [64, 16, 14, 14]          --
├─Sequential: 1-1                        [64, 16, 14, 14]          --
│    └─Conv2d: 2-1                       [64, 16, 14, 14]          160
│    └─ReLU: 2-2                         [64, 16, 14, 14]          --
==========================================================================================
Total params: 160
Trainable params: 160
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 2.01
==========================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 1.61
Params size (MB): 0.00
Estimated Total Size (MB): 1.81
==========================================================================================

Configs

cfg = OmegaConf.load('../config/model/image/convlayer.yaml')
net = instantiate(cfg.defaults)
B, C, H, W = 64, 1, 28, 28
X = torch.rand(B, C, H,W)
print(summary(net))
print("Y: ",net(X).shape)
[16:42:19] WARNING - setting conv bias to False as Batchnorm is used
=================================================================
Layer (type:depth-idx)                   Param #
=================================================================
ConvLayer                                --
├─Sequential: 1-1                        --
│    └─Conv2d: 2-1                       144
│    └─BatchNorm2d: 2-2                  32
│    └─ReLU: 2-3                         --
=================================================================
Total params: 176
Trainable params: 176
Non-trainable params: 0
=================================================================
Y:  torch.Size([64, 16, 14, 14])
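The YAML file itself is not reproduced here, but judging from the ConvNet config printed later on this page, it likely follows the usual Hydra pattern of a _target_ plus hydra.utils.get_class entries for the layer classes. An equivalent config can be built inline; the key names and the ConvLayer _target_ path below are assumptions:

from omegaconf import OmegaConf
from hydra.utils import instantiate

# hypothetical inline equivalent of config/model/image/convlayer.yaml
cfg = OmegaConf.create({
    "defaults": {
        "_target_": "nimrod.models.conv.ConvLayer",   # assumed module path
        "in_channels": 1,
        "out_channels": 16,
        "kernel_size": 3,
        "stride": 2,
        "normalization": {"_target_": "hydra.utils.get_class", "path": "torch.nn.BatchNorm2d"},
        "activation": {"_target_": "hydra.utils.get_class", "path": "torch.nn.ReLU"},
    }
})
net = instantiate(cfg.defaults)  # recursive instantiation resolves the nested _target_ entries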

Deconv Layer


source

DeconvLayer

 DeconvLayer (in_channels:int=16, out_channels:int=3, kernel_size:int=3,
              bias:bool=True, normalization:Optional[Type[nn.Module]]=None,
              activation:Optional[Type[nn.Module]]=ReLU)

*An upsampling "deconvolution" layer: nearest-neighbor 2x upsampling followed by a 2D convolution, with optional normalization and activation.*
| | Type | Default | Details |
|---|---|---|---|
| in_channels | int | 16 | input channels |
| out_channels | int | 3 | output channels |
| kernel_size | int | 3 | kernel size |
| bias | bool | True | |
| normalization | Optional | None | |
| activation | Optional | ReLU | |
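The usage printout below shows the layer to be nearest-neighbor 2x upsampling followed by a stride-1 convolution; a minimal sketch of an equivalent module (padding assumed to be kernel_size // 2):

import torch.nn as nn
from typing import Optional, Type

class SimpleDeconvLayer(nn.Module):
    """Illustrative sketch: 2x nearest-neighbor upsampling -> Conv2d -> optional norm/activation."""
    def __init__(self, in_channels: int = 16, out_channels: int = 3, kernel_size: int = 3,
                 bias: bool = True,
                 normalization: Optional[Type[nn.Module]] = None,
                 activation: Optional[Type[nn.Module]] = nn.ReLU):
        super().__init__()
        layers = [
            nn.UpsamplingNearest2d(scale_factor=2),
            nn.Conv2d(in_channels, out_channels, kernel_size,
                      stride=1, padding=kernel_size // 2, bias=bias),
        ]
        if normalization is not None:
            layers.append(normalization(out_channels))
        if activation is not None:
            layers.append(activation())
        self._net = nn.Sequential(*layers)

    def forward(self, x):
        return self._net(x)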


Usage

B, C, H, W = 64, 3, 28, 28
X = torch.rand(B, C, H, W)
deconv = DeconvLayer(3, 8)
print(deconv)
print("Y: ",deconv(X).shape)
DeconvLayer(
  (_net): Sequential(
    (0): UpsamplingNearest2d(scale_factor=2.0, mode='nearest')
    (1): Conv2d(3, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): ReLU()
  )
)
Y:  torch.Size([64, 8, 56, 56])

ConvNet

A simple convolutional network for image recognition.


source

ConvNet

 ConvNet (n_features:List[int]=[1, 8, 16, 32, 64], num_classes:int=10,
          kernel_size:int=3, bias:bool=False,
          normalization:nn.Module=BatchNorm2d,
          activation:nn.Module=ReLU)

*A simple convolutional image classifier: a stack of stride-2 ConvLayers whose channel widths follow n_features, followed by a final stride-2 ConvLayer mapping to num_classes and a Flatten.*
| | Type | Default | Details |
|---|---|---|---|
| n_features | List | [1, 8, 16, 32, 64] | channel/feature expansion |
| num_classes | int | 10 | number of classes |
| kernel_size | int | 3 | kernel size |
| bias | bool | False | conv2d bias |
| normalization | Module | BatchNorm2d | normalization (before activation) |
| activation | Module | ReLU | activation function |
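The summaries below pin down the topology: one stride-2 ConvLayer per consecutive pair in n_features, then a final stride-2 ConvLayer (with bias, no normalization or activation) mapping to num_classes, and a Flatten. A rough sketch of that stack, reusing the ConvLayer documented above:

import torch.nn as nn
from typing import List

def make_convnet(n_features: List[int] = [1, 8, 16, 32, 64], num_classes: int = 10,
                 kernel_size: int = 3, bias: bool = False,
                 normalization=nn.BatchNorm2d, activation=nn.ReLU) -> nn.Sequential:
    """Illustrative sketch of the ConvNet topology shown in the summaries below."""
    layers = []
    # one stride-2 downsampling block per consecutive channel pair
    for c_in, c_out in zip(n_features[:-1], n_features[1:]):
        layers.append(ConvLayer(c_in, c_out, kernel_size, stride=2, bias=bias,
                                normalization=normalization, activation=activation))
    # final stride-2 conv maps to num_classes; no norm/activation before the logits
    layers.append(ConvLayer(n_features[-1], num_classes, kernel_size, stride=2,
                            bias=True, normalization=None, activation=None))
    layers.append(nn.Flatten(1))
    return nn.Sequential(*layers)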


Usage

B, C, H, W = 64, 1, 28, 28
X = torch.rand(B, C, H, W)
X.shape
torch.Size([64, 1, 28, 28])
# model instantiation
convnet = ConvNet(
            n_features=[1, 8, 16, 32, 64, 128], # channel/feature expansion
            num_classes=10, # num_classes
            kernel_size=3, # kernel size
            bias=False, # conv2d bias
            normalization=nn.BatchNorm2d, # normalization (before activation)
            activation=nn.ReLU,
)
out = convnet(X)
print(out.shape)
print(summary(convnet, depth=4))
# from config
cfg = OmegaConf.load('../config/model/image/convnet.yaml')
# print(cfg.defaults)
# convnet = instantiate(cfg.defaults)
print(cfg.batchnorm)
convnet = instantiate(cfg.baseline)

# print(convnet(X).shape)
torch.Size([64, 10])
=================================================================
Layer (type:depth-idx)                   Param #
=================================================================
ConvNet                                  --
├─Sequential: 1-1                        --
│    └─ConvLayer: 2-1                    --
│    │    └─Sequential: 3-1              --
│    │    │    └─Conv2d: 4-1             72
│    │    │    └─BatchNorm2d: 4-2        16
│    │    │    └─ReLU: 4-3               --
│    └─ConvLayer: 2-2                    --
│    │    └─Sequential: 3-2              --
│    │    │    └─Conv2d: 4-4             1,152
│    │    │    └─BatchNorm2d: 4-5        32
│    │    │    └─ReLU: 4-6               --
│    └─ConvLayer: 2-3                    --
│    │    └─Sequential: 3-3              --
│    │    │    └─Conv2d: 4-7             4,608
│    │    │    └─BatchNorm2d: 4-8        64
│    │    │    └─ReLU: 4-9               --
│    └─ConvLayer: 2-4                    --
│    │    └─Sequential: 3-4              --
│    │    │    └─Conv2d: 4-10            18,432
│    │    │    └─BatchNorm2d: 4-11       128
│    │    │    └─ReLU: 4-12              --
│    └─ConvLayer: 2-5                    --
│    │    └─Sequential: 3-5              --
│    │    │    └─Conv2d: 4-13            73,728
│    │    │    └─BatchNorm2d: 4-14       256
│    │    │    └─ReLU: 4-15              --
│    └─ConvLayer: 2-6                    --
│    │    └─Sequential: 3-6              --
│    │    │    └─Conv2d: 4-16            11,530
│    └─Flatten: 2-7                      --
=================================================================
Total params: 110,018
Trainable params: 110,018
Non-trainable params: 0
=================================================================
{'_target_': 'nimrod.models.conv.ConvNet', 'n_features': [1, 8, 16, 32, 64, 128], 'num_classes': 10, 'kernel_size': 3, 'bias': False, 'normalization': {'_target_': 'hydra.utils.get_class', 'path': 'torch.nn.BatchNorm2d'}, 'activation': {'_target_': 'hydra.utils.get_class', 'path': 'torch.nn.ReLU'}}

Training

Dataloaders

# data module config
cfg = OmegaConf.load('../config/data/image/fashion_mnist.yaml')

BATCH_SIZE = 512
datamodule = instantiate(cfg, batch_size=BATCH_SIZE)
datamodule.prepare_data()
datamodule.setup()

# one data point 
X,y = datamodule.test_ds[0]
print("X (C,H,W): ", X.shape, "y: ", y)

# a batch of data via dataloader
XX,YY = next(iter(datamodule.test_dataloader()))
print("XX (B,C,H,W): ", XX.shape, "YY: ", YY.shape)

print(len(datamodule.train_ds))
print(len(datamodule.train_ds) // BATCH_SIZE)
[16:58:01] INFO - Init ImageDataModule for fashion_mnist
[16:58:15] INFO - split train into train/val [0.8, 0.2]
[16:58:15] INFO - train: 48000 val: 12000, test: 10000
X (C,H,W):  torch.Size([1, 32, 32]) y:  9
XX (B,C,H,W):  torch.Size([512, 1, 32, 32]) YY:  torch.Size([512])
48000
93

Model & hardware

device = get_device()
print(device)
cfg = OmegaConf.load('../config/model/image/convnet.yaml')
# print(cfg.defaults)
# convnet = instantiate(cfg.defaults)
print(cfg.baseline)
convnet = instantiate(cfg.baseline)
model = convnet.to(device)

summary(model, input_size=(B, C, H, W), depth=4)
[16:58:15] INFO - Using device: cuda
cuda
{'_target_': 'nimrod.models.conv.ConvNet', 'n_features': [1, 8, 16, 32, 64], 'num_classes': 10, 'kernel_size': 3, 'bias': True, 'normalization': None, 'activation': {'_target_': 'hydra.utils.get_class', 'path': 'torch.nn.ReLU'}}
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
ConvNet                                  [64, 10]                  --
├─Sequential: 1-1                        [64, 10]                  --
│    └─ConvLayer: 2-1                    [64, 8, 14, 14]           --
│    │    └─Sequential: 3-1              [64, 8, 14, 14]           --
│    │    │    └─Conv2d: 4-1             [64, 8, 14, 14]           80
│    │    │    └─ReLU: 4-2               [64, 8, 14, 14]           --
│    └─ConvLayer: 2-2                    [64, 16, 7, 7]            --
│    │    └─Sequential: 3-2              [64, 16, 7, 7]            --
│    │    │    └─Conv2d: 4-3             [64, 16, 7, 7]            1,168
│    │    │    └─ReLU: 4-4               [64, 16, 7, 7]            --
│    └─ConvLayer: 2-3                    [64, 32, 4, 4]            --
│    │    └─Sequential: 3-3              [64, 32, 4, 4]            --
│    │    │    └─Conv2d: 4-5             [64, 32, 4, 4]            4,640
│    │    │    └─ReLU: 4-6               [64, 32, 4, 4]            --
│    └─ConvLayer: 2-4                    [64, 64, 2, 2]            --
│    │    └─Sequential: 3-4              [64, 64, 2, 2]            --
│    │    │    └─Conv2d: 4-7             [64, 64, 2, 2]            18,496
│    │    │    └─ReLU: 4-8               [64, 64, 2, 2]            --
│    └─ConvLayer: 2-5                    [64, 10, 1, 1]            --
│    │    └─Sequential: 3-5              [64, 10, 1, 1]            --
│    │    │    └─Conv2d: 4-9             [64, 10, 1, 1]            5,770
│    └─Flatten: 2-6                      [64, 10]                  --
==========================================================================================
Total params: 30,154
Trainable params: 30,154
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 14.52
==========================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 1.60
Params size (MB): 0.12
Estimated Total Size (MB): 1.92
==========================================================================================

LR finder

cfg = OmegaConf.load('../config/model/image/convnet.yaml')
model = instantiate(cfg.batchnorm)
print(summary(model, depth=4))


criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # weight_decay=1e-5

# Initialize LR Finder
lr_finder = LRFinder(model, optimizer, criterion, device=device)

# Run LR range test
lr_finder.range_test(
    datamodule.train_dataloader(),
    start_lr=1e-5,    # extremely small starting learning rate
    end_lr=10,        # large ending learning rate
    num_iter=100,     # number of iterations to test
    smooth_f=0.05,    # smoothing factor for the loss
    diverge_th=5,
)

# Plot the learning rate vs loss
_, lr_found = lr_finder.plot(log_lr=True)
print('Suggested lr:', lr_found)

lr_finder.reset()
=================================================================
Layer (type:depth-idx)                   Param #
=================================================================
ConvNet                                  --
├─Sequential: 1-1                        --
│    └─ConvLayer: 2-1                    --
│    │    └─Sequential: 3-1              --
│    │    │    └─Conv2d: 4-1             72
│    │    │    └─BatchNorm2d: 4-2        16
│    │    │    └─ReLU: 4-3               --
│    └─ConvLayer: 2-2                    --
│    │    └─Sequential: 3-2              --
│    │    │    └─Conv2d: 4-4             1,152
│    │    │    └─BatchNorm2d: 4-5        32
│    │    │    └─ReLU: 4-6               --
│    └─ConvLayer: 2-3                    --
│    │    └─Sequential: 3-3              --
│    │    │    └─Conv2d: 4-7             4,608
│    │    │    └─BatchNorm2d: 4-8        64
│    │    │    └─ReLU: 4-9               --
│    └─ConvLayer: 2-4                    --
│    │    └─Sequential: 3-4              --
│    │    │    └─Conv2d: 4-10            18,432
│    │    │    └─BatchNorm2d: 4-11       128
│    │    │    └─ReLU: 4-12              --
│    └─ConvLayer: 2-5                    --
│    │    └─Sequential: 3-5              --
│    │    │    └─Conv2d: 4-13            73,728
│    │    │    └─BatchNorm2d: 4-14       256
│    │    │    └─ReLU: 4-15              --
│    └─ConvLayer: 2-6                    --
│    │    └─Sequential: 3-6              --
│    │    │    └─Conv2d: 4-16            11,530
│    └─Flatten: 2-7                      --
=================================================================
Total params: 110,018
Trainable params: 110,018
Non-trainable params: 0
=================================================================
Stopping early, the loss has diverged
Learning rate search finished. See the graph with {finder_name}.plot()
LR suggestion: steepest gradient
Suggested LR: 2.66E-03

Suggested lr: 0.002656087782946686

1-cycle warm-up

# data module config
cfg_dm = OmegaConf.load('../config/data/image/fashion_mnist.yaml')
cfg_dm.batch_size = 512
datamodule = instantiate(cfg_dm)
datamodule.prepare_data()
datamodule.setup()

# device = 'cpu'
print(device)
cfg_mdl = OmegaConf.load('../config/model/image/convnet.yaml')
convnet = instantiate(cfg_mdl.batchnorm)
model = convnet.to(device)

N_EPOCHS = 5

# lr_found = 7e-3 # from lr finder

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
steps_per_epoch = len(datamodule.train_ds) // cfg_dm.batch_size
total_steps = steps_per_epoch * N_EPOCHS
print(f"size training set: {len(datamodule.train_ds)}, bs: {cfg_dm.batch_size}, steps/epoch: {steps_per_epoch}, total steps: {total_steps}")
# scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=steps_per_epochs, epochs=1)

scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=lr_found,  # Peak learning rate
        # total_steps=len(datamodule.train_ds) * N_EPOCHS,  # Total training iterations
        steps_per_epoch=steps_per_epoch,
        epochs=N_EPOCHS,
        pct_start=0.3,  # 30% of training increasing LR, 70% decreasing
        anneal_strategy='cos',  # Cosine annealing
        div_factor=10,  # Initial lr = max_lr / div_factor
        # final_div_factor=1e4,
        three_phase=False  # Two phase LR schedule (increase then decrease)
    )

################################


lrs = []
current_step = 0
train_loss_history = []
eval_loss_history = []
avg_train_loss_hist = []
avg_eval_loss_hist = []
max_acc = 0

for epoch in range(N_EPOCHS):
    i = 0
    model.train()
    for images, labels in datamodule.train_dataloader():
        if current_step >= total_steps:
            print(f"Reached total steps: {current_step}/{total_steps}")
            break
        optimizer.zero_grad()
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)        
        loss.backward()
        optimizer.step()
        scheduler.step()    
        current_step += 1
        train_loss_history.append(loss.item())
        # current_lr = scheduler.get_last_lr()[0]
        current_lr = optimizer.param_groups[0]['lr']
        lrs.append(current_lr)
        if not (i % 100):
            print(f"Loss {loss.item():.4f}, Current LR: {current_lr:.10f}, Step: {current_step}/{total_steps}")
        i += 1

    model.eval()
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in datamodule.val_dataloader():
            # model expects input (B, C, H, W)
            images = images.to(device)
            labels = labels.to(device)
            # Pass the input through the model
            outputs = model(images)
            # eval loss
            eval_loss = criterion(outputs, labels)
            eval_loss_history.append(eval_loss.item())
            # Get the predicted labels
            _, predicted = torch.max(outputs.data, 1)

            # Update the total and correct counts
            total += labels.size(0)
            correct += (predicted == labels).sum()
            acc = 100 * correct / total
            if acc > max_acc:
                max_acc = acc

    # Print the accuracy
    print(f"Epoch {epoch + 1}: Last training Loss {loss.item():.4f}, Last Eval loss {eval_loss.item():.4f} Accuracy = {100 * correct / total:.2f}% Best Accuracy: {max_acc:.2f}")
    # print(f'Current LR: {optimizer.param_groups[0]["lr"]:.5f}')

###################
plt.figure(1)
plt.subplot(211)
plt.ylabel('loss')
plt.xlabel('step')
plt.plot(train_loss_history)
plt.plot(eval_loss_history)
plt.subplot(212)
plt.ylabel('lr')
plt.xlabel('step')
plt.plot(lrs)
[22:24:51] INFO - Init ImageDataModule for fashion_mnist
[22:24:59] INFO - split train into train/val [0.8, 0.2]
[22:24:59] INFO - train: 48000 val: 12000, test: 10000
cuda
size training set: 48000, bs: 512, steps/epoch: 93, total steps: 465
Loss 2.3470, Current LR: 0.0002659163, Step: 1/465
Epoch 1: Last training Loss 0.3637, Last Eval loss 0.3208 Accuracy = 86.98% Best Accuracy: 89.84
Loss 0.3831, Current LR: 0.0021199485, Step: 95/465
Epoch 2: Last training Loss 0.2433, Last Eval loss 0.2295 Accuracy = 89.87% Best Accuracy: 91.80
Loss 0.2716, Current LR: 0.0025014400, Step: 189/465
Epoch 3: Last training Loss 0.2054, Last Eval loss 0.1951 Accuracy = 90.92% Best Accuracy: 92.19
Loss 0.1860, Current LR: 0.0015607708, Step: 283/465
Epoch 4: Last training Loss 0.1976, Last Eval loss 0.1941 Accuracy = 91.68% Best Accuracy: 92.77
Loss 0.1703, Current LR: 0.0004413380, Step: 377/465
Reached total steps: 465/465
Epoch 5: Last training Loss 0.1558, Last Eval loss 0.1827 Accuracy = 91.88% Best Accuracy: 93.03

ConvNetX


source

ConvNetX

 ConvNetX (nnet:ConvNet, num_classes:int,
           optimizer:torch.optim.Optimizer,
           scheduler:torch.optim.lr_scheduler)

*A LightningModule image classifier wrapping a ConvNet backbone, configured with an optimizer and a learning-rate scheduler.*
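The hparams printed in the Resume training section suggest the optimizer and scheduler are passed in as functools.partial factories and wired up in configure_optimizers. A hypothetical sketch of such a wrapper (not the library source):

import lightning.pytorch as pl
import torch
import torch.nn as nn

class SketchClassifier(pl.LightningModule):
    """Hypothetical LightningModule wrapper around a ConvNet backbone."""
    def __init__(self, nnet: nn.Module, num_classes: int, optimizer, scheduler):
        super().__init__()
        self.nnet = nnet
        self.loss = nn.CrossEntropyLoss()
        self.optimizer_factory = optimizer    # e.g. functools.partial(torch.optim.Adam, lr=1e-4)
        self.scheduler_factory = scheduler    # e.g. functools.partial(ReduceLROnPlateau, mode='min')

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.nnet(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss(self(x), y)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = self.optimizer_factory(self.parameters())
        scheduler = self.scheduler_factory(optimizer)
        return {"optimizer": optimizer,
                "lr_scheduler": {"scheduler": scheduler, "monitor": "val/loss"}}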

Usage

cfg = OmegaConf.load('../config/model/image/convnetx.yaml')
model = instantiate(cfg)
[13:32:35] INFO - ConvNetX: init
[13:32:35] INFO - Classifier: init
B, C, H, W = 64, 1, 28, 28
X = torch.rand(B, C, H, W)
X.shape
print(model(X).shape)
torch.Size([64, 10])
summary(model, input_size=(B, C, H, W), depth=5)
===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
ConvNetX                                      [64, 10]                  --
├─ConvNet: 1-1                                [64, 10]                  --
│    └─Sequential: 2-1                        [64, 10]                  --
│    │    └─ConvLayer: 3-1                    [64, 8, 14, 14]           --
│    │    │    └─Sequential: 4-1              [64, 8, 14, 14]           --
│    │    │    │    └─Conv2d: 5-1             [64, 8, 14, 14]           72
│    │    │    │    └─BatchNorm2d: 5-2        [64, 8, 14, 14]           16
│    │    │    │    └─ReLU: 5-3               [64, 8, 14, 14]           --
│    │    └─ConvLayer: 3-2                    [64, 16, 7, 7]            --
│    │    │    └─Sequential: 4-2              [64, 16, 7, 7]            --
│    │    │    │    └─Conv2d: 5-4             [64, 16, 7, 7]            1,152
│    │    │    │    └─BatchNorm2d: 5-5        [64, 16, 7, 7]            32
│    │    │    │    └─ReLU: 5-6               [64, 16, 7, 7]            --
│    │    └─ConvLayer: 3-3                    [64, 32, 4, 4]            --
│    │    │    └─Sequential: 4-3              [64, 32, 4, 4]            --
│    │    │    │    └─Conv2d: 5-7             [64, 32, 4, 4]            4,608
│    │    │    │    └─BatchNorm2d: 5-8        [64, 32, 4, 4]            64
│    │    │    │    └─ReLU: 5-9               [64, 32, 4, 4]            --
│    │    └─ConvLayer: 3-4                    [64, 64, 2, 2]            --
│    │    │    └─Sequential: 4-4              [64, 64, 2, 2]            --
│    │    │    │    └─Conv2d: 5-10            [64, 64, 2, 2]            18,432
│    │    │    │    └─BatchNorm2d: 5-11       [64, 64, 2, 2]            128
│    │    │    │    └─ReLU: 5-12              [64, 64, 2, 2]            --
│    │    └─ConvLayer: 3-5                    [64, 10, 1, 1]            --
│    │    │    └─Sequential: 4-5              [64, 10, 1, 1]            --
│    │    │    │    └─Conv2d: 5-13            [64, 10, 1, 1]            5,770
│    │    └─Flatten: 3-6                      [64, 10]                  --
===============================================================================================
Total params: 30,274
Trainable params: 30,274
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 14.34
===============================================================================================
Input size (MB): 0.20
Forward/backward pass size (MB): 3.20
Params size (MB): 0.12
Estimated Total Size (MB): 3.52
===============================================================================================

Nimrod training

# data module config
cfg = OmegaConf.load('../config/data/image/mnist.yaml')
cfg.batch_size = 512
cfg.num_workers = 0
datamodule = instantiate(cfg)
datamodule.prepare_data()
datamodule.setup()
[18:25:38] INFO - Init ImageDataModule for mnist
[18:25:54] INFO - split train into train/val [0.8, 0.2]
[18:25:54] INFO - train: 48000 val: 12000, test: 10000
N_EPOCHS = 5

trainer = Trainer(
    accelerator="auto",
    max_epochs=N_EPOCHS,
    logger=TensorBoardLogger("tb_logs", name="mnist_convnet", default_hp_metric=True),
    # logger=CSVLogger("logs", name="mnist_convnet"),
    callbacks = [LearningRateMonitor(logging_interval="step")],
    check_val_every_n_epoch=1,
    log_every_n_steps=1
    )
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

LR finder

tuner = Tuner(trainer)
lr_finder = tuner.lr_find(
    model,
    datamodule=datamodule,
    min_lr=1e-6,
    max_lr=1.0,
    num_training=100,  # number of iterations
    # attr_name="optimizer.lr",
)
fig = lr_finder.plot(suggest=True)
plt.show()
print(f"Suggested learning rate: {lr_finder.suggestion()}")
[13:35:46] INFO - mnist Dataset: init
[13:35:51] INFO - mnist Dataset: init
[13:35:54] INFO - Optimizer: Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.0001
    maximize: False
    weight_decay: 0
)
[13:35:54] INFO - Scheduler: <torch.optim.lr_scheduler.OneCycleLR object>
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/core/optimizer.py:316: The lr scheduler dict contains the key(s) ['monitor', 'strict'], but the keys will be ignored. You need to call `lr_scheduler.step()` manually in manual optimization.
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
`Trainer.fit` stopped: `max_steps=100` reached.
Learning rate set to 0.012022644346174135
Restoring states from the checkpoint path at /Users/slegroux/Projects/nimrod/nbs/.lr_find_c5598ba7-33b0-4845-a7c2-c94e6a45fa15.ckpt
Restored all states from the checkpoint at /Users/slegroux/Projects/nimrod/nbs/.lr_find_c5598ba7-33b0-4845-a7c2-c94e6a45fa15.ckpt

Suggested learning rate: 0.012022644346174135
print(trainer.max_epochs, len(datamodule.train_ds), datamodule.hparams.batch_size)
print(5*56000)
print(5*56000/2048)
print(5*56000//2048)
10 48000 512
280000
136.71875
136

1-cycle scheduling

N_EPOCHS = 5
lr_found = 0.012

# DATA
cfg = OmegaConf.load('../config/data/image/mnist.yaml')
cfg.batch_size = 512
cfg.num_workers = 0
datamodule = instantiate(cfg)
datamodule.prepare_data()
datamodule.setup()

checkpoint_callback = ModelCheckpoint(
    monitor='val/loss',  # Metric to monitor
    dirpath='checkpoints/',  # Directory to save checkpoints
    filename='epoch{epoch:02d}-val_loss{val/loss:.2f}',
    auto_insert_metric_name=False,
    save_top_k=1,  # Save only the best checkpoint
    mode='min'  # Mode can be 'min' or 'max' depending on the metric
)

# TRAINER 
trainer = Trainer(
    accelerator="auto",
    max_epochs=N_EPOCHS,
    # logger=TensorBoardLogger("tb_logs", name="mnist_convnet", default_hp_metric=True),
    logger=CSVLogger("logs", name="mnist_convnet"),
    callbacks = [LearningRateMonitor(logging_interval="step"), checkpoint_callback],
    check_val_every_n_epoch=1,
    log_every_n_steps=1
    )

print("estimated steps: ", trainer.estimated_stepping_batches, "accumulate_grad_batches: ", trainer.accumulate_grad_batches)

# MODEL
model_cfg = OmegaConf.load('../config/model/image/convnetx.yaml')

steps_per_epoch = len(datamodule.train_ds) // cfg.batch_size // trainer.accumulate_grad_batches  # accumulate_grad_batches defaults to 1 when not used
print("Steps per epoch: ", steps_per_epoch)

# model_cfg.scheduler.epochs = N_EPOCHS 
model_cfg.scheduler.total_steps = trainer.max_epochs * steps_per_epoch
model_cfg.scheduler.max_lr = lr_found  # lr_finder.suggestion()

model = instantiate(model_cfg)

print("LR: ",model.lr)
trainer.fit(model, datamodule.train_dataloader(), datamodule.val_dataloader())

########################
csv_path = f"{trainer.logger.log_dir}/metrics.csv"
metrics = pd.read_csv(csv_path)
metrics.head()

##########################
plt.figure()
plt.plot(metrics['step'], metrics['train/loss_step'], 'b.-')
plt.plot(metrics['step'], metrics['val/loss'],'r.-')
plt.figure()
plt.plot(metrics['step'], metrics['lr-Adam'], 'g.-')
plt.show()
[18:22:41] INFO - Init ImageDataModule for mnist
[18:22:41] INFO - mnist Dataset: init
[18:22:46] INFO - mnist Dataset: init
[18:22:49] INFO - split train into train/val [0.8, 0.2]
[18:22:49] INFO - train: 48000 val: 12000, test: 10000
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Loading `train_dataloader` to estimate number of stepping batches.
[18:22:49] INFO - ConvNetX: init
[18:22:49] INFO - Classifier: init
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/utilities/parsing.py:208: Attribute 'nnet' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['nnet'])`.
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:654: Checkpoint directory /Users/slegroux/Projects/nimrod/nbs/checkpoints exists and is not empty.
[18:22:49] INFO - Optimizer: Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.0001
    maximize: False
    weight_decay: 0
)
[18:22:49] INFO - Scheduler: <torch.optim.lr_scheduler.OneCycleLR object>
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/core/optimizer.py:316: The lr scheduler dict contains the key(s) ['monitor', 'strict'], but the keys will be ignored. You need to call `lr_scheduler.step()` manually in manual optimization.

  | Name         | Type               | Params | Mode 
------------------------------------------------------------
0 | loss         | CrossEntropyLoss   | 0      | train
1 | train_acc    | MulticlassAccuracy | 0      | train
2 | val_acc      | MulticlassAccuracy | 0      | train
3 | test_acc     | MulticlassAccuracy | 0      | train
4 | train_loss   | MeanMetric         | 0      | train
5 | val_loss     | MeanMetric         | 0      | train
6 | test_loss    | MeanMetric         | 0      | train
7 | val_acc_best | MaxMetric          | 0      | train
8 | nnet         | ConvNet            | 30.3 K | train
------------------------------------------------------------
30.3 K    Trainable params
0         Non-trainable params
30.3 K    Total params
0.121     Total estimated model params size (MB)
34        Modules in train mode
0         Modules in eval mode
estimated steps:  -1 accumulate_grad_batches:  1
Steps per epoch:  93
LR:  0.0001
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
[18:23:08] WARNING - Max steps reached for 1-cycle LR scheduler
[18:23:08] WARNING - Max steps reached for 1-cycle LR scheduler
[18:23:08] WARNING - Max steps reached for 1-cycle LR scheduler
[18:23:08] WARNING - Max steps reached for 1-cycle LR scheduler
[18:23:08] WARNING - Max steps reached for 1-cycle LR scheduler
`Trainer.fit` stopped: `max_epochs=5` reached.

trainer.test(model, datamodule.test_dataloader(), ckpt_path="best")
Restoring states from the checkpoint path at /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch04-val_loss0.04.ckpt
Loaded model weights from the checkpoint at /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch04-val_loss0.04.ckpt
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric               DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test/acc              0.9897000193595886     │
│         test/loss            0.029618283733725548    │
└───────────────────────────┴───────────────────────────┘
[{'test/loss': 0.029618283733725548, 'test/acc': 0.9897000193595886}]
best_checkpoint_path = checkpoint_callback.best_model_path
print(f"Best checkpoint path: {best_checkpoint_path}")
Best checkpoint path: /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch04-val_loss0.04.ckpt

Resume training

cfg = OmegaConf.load('../config/model/image/convnetx_adam.yaml')
sched = instantiate(cfg.scheduler)
model = ConvNetX.load_from_checkpoint(best_checkpoint_path, lr=0.1, scheduler=sched)

pprint(model.hparams)
[19:12:38] INFO - ConvNetX: init
[19:12:38] INFO - Classifier: init
"nnet":        ConvNet(
  (net): Sequential(
    (0): ConvLayer(
      (net): Sequential(
        (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (1): ConvLayer(
      (net): Sequential(
        (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (2): ConvLayer(
      (net): Sequential(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (3): ConvLayer(
      (net): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (4): ConvLayer(
      (net): Sequential(
        (0): Conv2d(64, 10, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      )
    )
    (5): Flatten(start_dim=1, end_dim=-1)
  )
)
"num_classes": 10
"optimizer":   functools.partial(<class 'torch.optim.adam.Adam'>, lr=0.0001)
"scheduler":   functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, mode='min', factor=0.1, patience=5)
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/utilities/parsing.py:208: Attribute 'nnet' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['nnet'])`.
# batchnorm should allow us to try higher LR
# model_cfg = OmegaConf.load('../config/model/image/convnetx_adam.yaml')
# model_cfg.optimizer.lr = 0.1

# model = instantiate(model_cfg)
# opt = instantiate(model_cfg.optimizer)
# print(opt)
# sched = instantiate(model_cfg.scheduler)
# print(sched)


N_EPOCHS = 10

trainer = Trainer(
    accelerator="auto",
    max_epochs=N_EPOCHS,
    # logger=TensorBoardLogger("tb_logs", name="mnist_convnet", default_hp_metric=True),
    logger=CSVLogger("logs", name="mnist_convnet"),
    callbacks = [LearningRateMonitor(logging_interval="step")],
    check_val_every_n_epoch=1,
    log_every_n_steps=1
    )

# use standard Adam scheduler

# retrieve last ckpt
trainer.fit(model, datamodule.train_dataloader(), datamodule.val_dataloader(), ckpt_path=best_checkpoint_path)

##############################
#| notest
csv_path = f"{trainer.logger.log_dir}/metrics.csv"
metrics = pd.read_csv(csv_path)
metrics.head()
plt.figure()
plt.plot(metrics['step'], metrics['train/loss_step'], 'b.-')
plt.plot(metrics['step'], metrics['val/loss'],'r.-')
plt.figure()
plt.plot(metrics['step'], metrics['lr-Adam'], 'g.-')
plt.show()
trainer.test(model, datamodule.test_dataloader(), ckpt_path="best")
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Restoring states from the checkpoint path at /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch04-val_loss0.04.ckpt
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/call.py:273: Be aware that when using `ckpt_path`, callbacks used to create the checkpoint need to be provided during `Trainer` instantiation. Please add the following callbacks: ["ModelCheckpoint{'monitor': 'val/loss', 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}"].
[19:13:20] INFO - Optimizer: Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.0001
    maximize: False
    weight_decay: 0
)
[19:13:20] INFO - Scheduler: <torch.optim.lr_scheduler.ReduceLROnPlateau object>
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/core/optimizer.py:316: The lr scheduler dict contains the key(s) ['monitor', 'strict'], but the keys will be ignored. You need to call `lr_scheduler.step()` manually in manual optimization.

  | Name         | Type               | Params | Mode 
------------------------------------------------------------
0 | loss         | CrossEntropyLoss   | 0      | train
1 | train_acc    | MulticlassAccuracy | 0      | train
2 | val_acc      | MulticlassAccuracy | 0      | train
3 | test_acc     | MulticlassAccuracy | 0      | train
4 | train_loss   | MeanMetric         | 0      | train
5 | val_loss     | MeanMetric         | 0      | train
6 | test_loss    | MeanMetric         | 0      | train
7 | val_acc_best | MaxMetric          | 0      | train
8 | nnet         | ConvNet            | 30.3 K | train
------------------------------------------------------------
30.3 K    Trainable params
0         Non-trainable params
30.3 K    Total params
0.121     Total estimated model params size (MB)
34        Modules in train mode
0         Modules in eval mode
Restored all states from the checkpoint at /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch04-val_loss0.04.ckpt
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
`Trainer.fit` stopped: `max_epochs=10` reached.

Restoring states from the checkpoint path at logs/mnist_convnet/version_62/checkpoints/epoch=9-step=935.ckpt
Loaded model weights from the checkpoint at logs/mnist_convnet/version_62/checkpoints/epoch=9-step=935.ckpt
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric               DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test/acc              0.9896000027656555     │
│         test/loss            0.029625719413161278    │
└───────────────────────────┴───────────────────────────┘