Convolutional Neural Networks

Conv filters

cfg = OmegaConf.load('../config/data/image/mnist.yaml')
dm = instantiate(cfg)
[14:49:25] INFO - Init ImageDataModule for mnist
[14:49:29] INFO - loading dataset mnist with args () from split train
[14:49:29] INFO - loading dataset mnist from split train
x, y = dm.train_ds[0]
plt.imshow(x.squeeze(), cmap='gray')
torch.Size([1, 28, 28])

top_kernel = torch.tensor( # torch.tensor infers datatype vs. torch.Tensor
    [[-1., -1., -1.],
     [0., 0., 0.],
     [1., 1., 1.]]
)

bottom_kernel = torch.tensor( # torch.tensor infers datatype vs. torch.Tensor
    [[1., 1., 1.],
     [0., 0., 0.],
     [-1., -1., -1.]]
)

left_kernel = torch.tensor( # torch.tensor infers datatype vs. torch.Tensor
    [[-1., 0., 1.],
     [-1., 0., 1.],
     [-1., 0., 1.]]
)
my_kernel = left_kernel

c = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    c.weight.copy_(my_kernel)  # overwrite the random init with the hand-made filter

y = c(x)
plt.imshow(y.squeeze().detach(), cmap='gray')

dc = nn.ConvTranspose2d(1, 1, kernel_size=3, padding=1, bias=False)
# with torch.no_grad():
#     dc.weight.copy_(my_kernel)

x_bar = dc(y)
plt.imshow(x_bar.squeeze().detach(), cmap='gray')
plt.title('Convolution transpose')
Conv Block

Using a convolution with a stride of 2 instead of max pooling achieves the same goal: it downsamples the image by halving its spatial dimensions. The key difference is that the convolution has learnable weights, so it can learn how to combine the values in each (possibly overlapping) window, whereas max pooling only keeps the maximum value in each window and discards the rest. For this reason, a strided convolution is often preferred when finer spatial detail should be preserved.
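
As a rough illustration of this contrast, here is a pure-Python sketch on a 4x4 grid. The helper names `max_pool2d` and `strided_conv2d` and the averaging kernel are assumptions made for this example and are not part of the library. A real layer would learn the kernel values and, like the 3x3 stride-2 convolutions used in this page, would usually work on overlapping, padded windows.

```python
from typing import List

Grid = List[List[float]]

def max_pool2d(x: Grid, window: int = 2) -> Grid:
    """Non-overlapping max pooling: keep only the largest value of each window."""
    h, w = len(x), len(x[0])
    return [
        [max(x[i + di][j + dj] for di in range(window) for dj in range(window))
         for j in range(0, w - window + 1, window)]
        for i in range(0, h - window + 1, window)
    ]

def strided_conv2d(x: Grid, kernel: Grid, stride: int = 2) -> Grid:
    """Unpadded cross-correlation evaluated every `stride` pixels."""
    k = len(kernel)
    h, w = len(x), len(x[0])
    return [
        [sum(kernel[di][dj] * x[i + di][j + dj] for di in range(k) for dj in range(k))
         for j in range(0, w - k + 1, stride)]
        for i in range(0, h - k + 1, stride)
    ]

image = [
    [1.0, 2.0, 0.0, 1.0],
    [3.0, 4.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 7.0, 8.0],
]
# stand-in for learned weights: a plain 2x2 average (hypothetical values)
kernel = [[0.25, 0.25], [0.25, 0.25]]

pooled = max_pool2d(image)                  # one value per window: its maximum
convolved = strided_conv2d(image, kernel)   # one value per window: a weighted sum
print(pooled)     # [[4.0, 1.0], [1.0, 8.0]]
print(convolved)  # [[2.5, 0.5], [0.5, 6.5]]
```

Both operations turn the 4x4 grid into a 2x2 grid. The pooled output depends on a single pixel per window, while every pixel contributes to the convolved output through the kernel weights.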



 ConvBlock (in_channels:int=3, out_channels:int=16, kernel_size:int=3,
            stride:int=2, bias:bool=True, normalization:Optional[Type[torc
            'torch.nn.modules.batchnorm.BatchNorm2d'>, activation:Optional

*Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes::

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call :meth:to, etc.

.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

ivar training: Boolean represents whether this module is in training or evaluation mode. :vartype training: bool*
Parameter      Type      Default      Details
in_channels    int       3            input channels
out_channels   int       16           output channels
kernel_size    int       3            kernel size
stride         int       2            stride
bias           bool      True         bias is False if BatchNorm
normalization  Optional  BatchNorm2d  normalization
activation     Optional  ReLU         activation


B, C, H, W = 64, 1, 28, 28
X = torch.rand(B, C, H, W)
# a stride-2 layer downsamples to (H/2, W/2)
model = ConvBlock(in_channels=C, out_channels=16, kernel_size=3, stride=2)

# get first layer of sequential (the Conv2d) and init its weights
layer_0 = model.net[0]
with torch.no_grad():
    layer_0.weight.copy_(top_kernel)  # broadcast to every output channel

print("Y: ", model(X).shape)
# flatten all dims except the batch dim
Y = torch.flatten(model(X), 1)
print(Y.shape)
summary(model, input_size=(B, C, H, W), depth=2)
[13:58:07] WARNING - setting conv bias back to False as Batchnorm is used
Y:  torch.Size([64, 16, 14, 14])
torch.Size([64, 3136])
Layer (type:depth-idx)                   Output Shape              Param #
ConvBlock                                [64, 16, 14, 14]          --
├─Sequential: 1-1                        [64, 16, 14, 14]          --
│    └─Conv2d: 2-1                       [64, 16, 14, 14]          144
│    └─BatchNorm2d: 2-2                  [64, 16, 14, 14]          32
│    └─ReLU: 2-3                         [64, 16, 14, 14]          --
Total params: 176
Trainable params: 176
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 1.81
Input size (MB): 0.20
Forward/backward pass size (MB): 3.21
Params size (MB): 0.00
Estimated Total Size (MB): 3.41
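
The parameter counts in this summary can be checked by hand. A minimal sketch, assuming the convolution has no bias (as the warning above indicates) and that BatchNorm2d holds one learnable scale and one learnable shift per channel; the helper names are hypothetical:

```python
def conv2d_params(in_channels: int, out_channels: int, kernel_size: int, bias: bool = False) -> int:
    """One kernel_size x kernel_size filter per (input, output) channel pair, plus optional biases."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)

def batchnorm2d_params(channels: int) -> int:
    """One scale (gamma) and one shift (beta) per channel."""
    return 2 * channels

conv = conv2d_params(in_channels=1, out_channels=16, kernel_size=3)
bn = batchnorm2d_params(16)
print(conv, bn, conv + bn)  # 144 32 176
```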
model = nn.Sequential(
    ConvBlock(1, 8),
    ConvBlock(8, 16),
    ConvBlock(16, 32),
    ConvBlock(32, 16)
)
print(model(X).shape)
[13:58:09] WARNING - setting conv bias back to False as Batchnorm is used
[13:58:09] WARNING - setting conv bias back to False as Batchnorm is used
[13:58:09] WARNING - setting conv bias back to False as Batchnorm is used
[13:58:09] WARNING - setting conv bias back to False as Batchnorm is used
torch.Size([64, 16, 2, 2])
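
The 2x2 spatial size after four stride-2 blocks follows from the usual convolution output-size formula, floor((H + 2*padding - kernel_size) / stride) + 1. A small sketch, assuming every block uses kernel_size=3, stride=2 and padding=1 (the padding value is an assumption here; it is consistent with the 28 to 14 reduction shown earlier), with a hypothetical helper name:

```python
def conv_out_size(size: int, kernel_size: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of a convolution along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

sizes = [28]
for _ in range(4):  # four stacked stride-2 blocks
    sizes.append(conv_out_size(sizes[-1]))
print(sizes)  # [28, 14, 7, 4, 2]
```

Odd sizes are rounded up by this combination of kernel, stride and padding, which is why 7 becomes 4 rather than 3.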


cfg = OmegaConf.load('../config/model/image/convblock.yaml')
net = instantiate(cfg.defaults)
B, C, H, W = 64, 1, 28, 28
X = torch.rand(B, C, H,W)
print("Y: ",net(X).shape)
Seed set to 42
[14:49:51] WARNING - setting conv bias back to False as Batchnorm is used
Layer (type:depth-idx)                   Param #
ConvBlock                                --
├─Sequential: 1-1                        --
│    └─Conv2d: 2-1                       144
│    └─BatchNorm2d: 2-2                  32
│    └─ReLU: 2-3                         --
Total params: 176
Trainable params: 176
Non-trainable params: 0
Y:  torch.Size([64, 16, 14, 14])

Pre-Activation Conv Block



 PreActivationConvBlock (in_channels:int=3, out_channels:int=16,
                         kernel_size:int=3, stride:int=2, bias:bool=True, 
                         'torch.nn.modules.batchnorm.BatchNorm2d'>, activa
                         ]]=<class 'torch.nn.modules.activation.ReLU'>)

Parameter      Type      Default      Details
in_channels    int       3            input channels
out_channels   int       16           output channels
kernel_size    int       3            kernel size
stride         int       2            stride
bias           bool      True
normalization  Optional  BatchNorm2d
activation     Optional  ReLU
B, C, H, W = 64, 1, 28, 28
X = torch.rand(B, C, H, W)
# a stride-2 layer downsamples to (H/2, W/2)
model = PreActivationConvBlock(in_channels=C, out_channels=16, kernel_size=3, stride=2)

# get last layer of sequential (the Conv2d) and init its weights
layer_0 = model.net[0] # first layer of sequential (BatchNorm2d in this block)
with torch.no_grad():
    model.net[-1].weight.copy_(top_kernel)

print("Y: ", model(X).shape)
# flatten all dims except the batch dim
Y = torch.flatten(model(X), 1)
print(Y.shape)
summary(model, input_size=(B, C, H, W), depth=2)
[13:58:16] WARNING - setting conv bias back to False as Batchnorm is used
Y:  torch.Size([64, 16, 14, 14])
torch.Size([64, 3136])
Layer (type:depth-idx)                   Output Shape              Param #
PreActivationConvBlock                   [64, 16, 14, 14]          --
├─Sequential: 1-1                        [64, 16, 14, 14]          --
│    └─BatchNorm2d: 2-1                  [64, 1, 28, 28]           2
│    └─ReLU: 2-2                         [64, 1, 28, 28]           --
│    └─Conv2d: 2-3                       [64, 16, 14, 14]          144
Total params: 146
Trainable params: 146
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 1.81
Input size (MB): 0.20
Forward/backward pass size (MB): 2.01
Params size (MB): 0.00
Estimated Total Size (MB): 2.21

Deconv Block



 DeconvBlock (in_channels:int=16, out_channels:int=3, kernel_size:int=3,
              bias:bool=True, normalization:Optional[Type[torch.nn.modules
              .module.Module]]=None, activation:Optional[Type[torch.nn.mod
              'torch.nn.modules.activation.ReLU'>, scale_factor:int=2,

Parameter            Type      Default  Details
in_channels          int       16       input channels
out_channels         int       3        output channels
kernel_size          int       3        kernel size
bias                 bool      True
normalization        Optional  None
activation           Optional  ReLU
scale_factor         int       2
use_transposed_conv  bool      False


B, C, H, W = 64, 3, 28, 28
X = torch.rand(B, C, H, W)
deconv = DeconvBlock(3, 8, scale_factor=2, kernel_size=3, use_transposed_conv=True)
print("Y: ",deconv(X).shape)
  (_net): Sequential(
    (0): ConvTranspose2d(3, 8, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (1): ReLU()
Y:  torch.Size([64, 8, 56, 56])
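
The 28 to 56 upsampling can be checked against the output-size formula given in the PyTorch `ConvTranspose2d` documentation, using the stride, padding and output_padding values printed above. A minimal sketch with a hypothetical helper name:

```python
def conv_transpose_out_size(size: int, kernel_size: int = 3, stride: int = 2,
                            padding: int = 1, output_padding: int = 1,
                            dilation: int = 1) -> int:
    """Spatial output size of a transposed convolution along one dimension."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

print(conv_transpose_out_size(28))  # 56
```

With kernel_size=3, stride=2 and padding=1, the extra output_padding=1 is what makes the output exactly twice the input size; without it the result would be 55.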


# one image
x, y = dm.train_ds[0]
C, H, W = x.shape
# make fake batch dimension
x = x.unsqueeze(0)
print("x:", x.shape)
plt.imshow(x.squeeze(), cmap='gray')
plt.title("original image")

my_kernel = left_kernel

c = ConvBlock(1,3, kernel_size=3, stride=1)
with torch.no_grad():
    c.net[0].weight.copy_(my_kernel) # set kernel weights for convlayer 0 (actual convolution2d)

y = c(x)
print("y: ", y.shape)

plt.imshow(y.detach().squeeze().numpy().transpose(1, 2, 0), cmap='gray')
plt.title("filtered image")

dc = DeconvBlock(3, 1, scale_factor=2, kernel_size=3)
with torch.no_grad():
    dc._net[1].weight.copy_(my_kernel) # set kernel weights for convlayer 1 (actual convolution2d)
x_bar = dc(y)
print("x_bar: ", x_bar.shape)
plt.imshow(x_bar.detach().squeeze(), cmap='gray')
plt.title("Deconv image")
Simple convolution network for image recognition



 ConvNet (n_features:List[int]=[1, 8, 16, 32, 64, 128],
          num_classes:int=10, kernel_size:int=3, bias:bool=False,

Parameter      Type    Default                  Details
n_features     List    [1, 8, 16, 32, 64, 128]  channel/feature expansion
num_classes    int     10                       num_classes
kernel_size    int     3                        kernel size
bias           bool    False                    conv2d bias
normalization  Module  BatchNorm2d              normalization (before activation)
activation     Module  ReLU                     activation function


# data
B, C, H, W = 64, 3, 64, 64
X = torch.rand(B, C, H, W)

n_features = [3, 8, 16, 32, 64, 128] #28 14 7 4 2 1
n_features = [3, 8, 16, 32, 64, 128, 64] #64, 32, 16, 8, 4, 2, 1
num_classes = 20

convnet = ConvNet(
    n_features=n_features, # channel/feature expansion
    num_classes=num_classes, # num_classes
    kernel_size=3, # kernel size
    bias=False, # conv2d bias
    normalization=nn.BatchNorm2d, # normalization (before activation)
)
out = convnet(X)
print(out.shape)
print(summary(convnet, input_size=X.shape, depth=2))
torch.Size([64, 20])
Layer (type:depth-idx)                   Output Shape              Param #
ConvNet                                  [64, 20]                  --
├─Sequential: 1-1                        [64, 20]                  --
│    └─ConvBlock: 2-1                    [64, 8, 64, 64]           232
│    └─ConvBlock: 2-2                    [64, 16, 32, 32]          1,184
│    └─ConvBlock: 2-3                    [64, 32, 16, 16]          4,672
│    └─ConvBlock: 2-4                    [64, 64, 8, 8]            18,560
│    └─ConvBlock: 2-5                    [64, 128, 4, 4]           73,984
│    └─ConvBlock: 2-6                    [64, 64, 2, 2]            73,856
│    └─ConvBlock: 2-7                    [64, 20, 1, 1]            11,560
│    └─Flatten: 2-8                      [64, 20]                  --
Total params: 184,048
Trainable params: 184,048
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 378.27
Input size (MB): 3.15
Forward/backward pass size (MB): 65.29
Params size (MB): 0.74
Estimated Total Size (MB): 69.18
# from config
cfg = OmegaConf.load('../config/model/image/convnet.yaml')
# print(cfg.defaults)
# convnet = instantiate(cfg.defaults)
convnet = instantiate(cfg.baseline)

# print(convnet(X).shape)
{'_target_': 'nimrod.models.conv.ConvNet', 'n_features': [1, 8, 16, 32, 64, 128], 'num_classes': 10, 'kernel_size': 3, 'bias': False, 'normalization': {'_target_': 'hydra.utils.get_class', 'path': 'torch.nn.BatchNorm2d'}, 'activation': {'_target_': 'hydra.utils.get_class', 'path': 'torch.nn.ReLU'}}



# data module config
cfg = OmegaConf.load('../config/data/image/fashion_mnist.yaml')

datamodule = instantiate(cfg, batch_size=BATCH_SIZE)

# one data point 
X,y = datamodule.test_ds[0]
print("X (C,H,W): ", X.shape, "y: ", y)

# a batch of data via dataloader
XX,YY = next(iter(datamodule.test_dataloader()))
print("XX (B,C,H,W): ", XX.shape, "YY: ", YY.shape)

print(len(datamodule.train_ds) // BATCH_SIZE)
[16:28:58] INFO - Init ImageDataModule for fashion_mnist
[16:29:17] INFO - split train into train/val [0.8, 0.2]
[16:29:17] INFO - train: 48000 val: 12000, test: 10000
X (C,H,W):  torch.Size([1, 32, 32]) y:  9
XX (B,C,H,W):  torch.Size([512, 1, 32, 32]) YY:  torch.Size([512])

Model & hardware

device = get_device()
cfg = OmegaConf.load('../config/model/image/convnet.yaml')
# print(cfg.defaults)
# convnet = instantiate(cfg.defaults)
convnet = instantiate(cfg.baseline)
model = convnet.to(device)

summary(model, input_size=(B, C, H, W), depth=4)
[16:29:17] INFO - Using device: mps
{'_target_': 'nimrod.models.conv.ConvNet', 'n_features': [1, 8, 16, 32, 64], 'num_classes': 10, 'kernel_size': 3, 'bias': True, 'normalization': None, 'activation': {'_target_': 'hydra.utils.get_class', 'path': 'torch.nn.ReLU'}}
Layer (type:depth-idx)                   Output Shape              Param #
ConvNet                                  [64, 40]                  --
├─Sequential: 1-1                        [64, 40]                  --
│    └─ConvLayer: 2-1                    [64, 8, 28, 28]           --
│    │    └─Sequential: 3-1              [64, 8, 28, 28]           --
│    │    │    └─Conv2d: 4-1             [64, 8, 28, 28]           80
│    │    │    └─ReLU: 4-2               [64, 8, 28, 28]           --
│    └─ConvLayer: 2-2                    [64, 16, 14, 14]          --
│    │    └─Sequential: 3-2              [64, 16, 14, 14]          --
│    │    │    └─Conv2d: 4-3             [64, 16, 14, 14]          1,168
│    │    │    └─ReLU: 4-4               [64, 16, 14, 14]          --
│    └─ConvLayer: 2-3                    [64, 32, 7, 7]            --
│    │    └─Sequential: 3-3              [64, 32, 7, 7]            --
│    │    │    └─Conv2d: 4-5             [64, 32, 7, 7]            4,640
│    │    │    └─ReLU: 4-6               [64, 32, 7, 7]            --
│    └─ConvLayer: 2-4                    [64, 64, 4, 4]            --
│    │    └─Sequential: 3-4              [64, 64, 4, 4]            --
│    │    │    └─Conv2d: 4-7             [64, 64, 4, 4]            18,496
│    │    │    └─ReLU: 4-8               [64, 64, 4, 4]            --
│    └─ConvLayer: 2-5                    [64, 10, 2, 2]            --
│    │    └─Sequential: 3-5              [64, 10, 2, 2]            --
│    │    │    └─Conv2d: 4-9             [64, 10, 2, 2]            5,770
│    └─Flatten: 2-6                      [64, 40]                  --
Total params: 30,154
Trainable params: 30,154
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 53.63
Input size (MB): 0.20
Forward/backward pass size (MB): 6.16
Params size (MB): 0.12
Estimated Total Size (MB): 6.49

LR finder

cfg = OmegaConf.load('../config/model/image/convnet.yaml')
model = instantiate(cfg.batchnorm)
print(summary(model, depth=4))

criterion = nn.CrossEntropyLoss()    
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4) #, weight_decay=1e-5)
# Initialize LR Finder
lr_finder = LRFinder(model, optimizer, criterion, device=device)
# Run LR range test
lr_finder.range_test(
    datamodule.train_dataloader(),
    start_lr=1e-5,      # Extremely small starting learning rate
    end_lr=10,          # Large ending learning rate
    num_iter=100,       # Number of iterations to test
    smooth_f=0.05,      # Smoothing factor for the loss
)
# Plot the learning rate vs loss
_, lr_found = lr_finder.plot(log_lr=True)
print('Suggested lr:', lr_found)
Layer (type:depth-idx)                   Param #
ConvNet                                  --
├─Sequential: 1-1                        --
│    └─ConvLayer: 2-1                    --
│    │    └─Sequential: 3-1              --
│    │    │    └─Conv2d: 4-1             72
│    │    │    └─BatchNorm2d: 4-2        16
│    │    │    └─ReLU: 4-3               --
│    └─ConvLayer: 2-2                    --
│    │    └─Sequential: 3-2              --
│    │    │    └─Conv2d: 4-4             1,152
│    │    │    └─BatchNorm2d: 4-5        32
│    │    │    └─ReLU: 4-6               --
│    └─ConvLayer: 2-3                    --
│    │    └─Sequential: 3-3              --
│    │    │    └─Conv2d: 4-7             4,608
│    │    │    └─BatchNorm2d: 4-8        64
│    │    │    └─ReLU: 4-9               --
│    └─ConvLayer: 2-4                    --
│    │    └─Sequential: 3-4              --
│    │    │    └─Conv2d: 4-10            18,432
│    │    │    └─BatchNorm2d: 4-11       128
│    │    │    └─ReLU: 4-12              --
│    └─ConvLayer: 2-5                    --
│    │    └─Sequential: 3-5              --
│    │    │    └─Conv2d: 4-13            73,728
│    │    │    └─BatchNorm2d: 4-14       256
│    │    │    └─ReLU: 4-15              --
│    └─ConvLayer: 2-6                    --
│    │    └─Sequential: 3-6              --
│    │    │    └─Conv2d: 4-16            11,530
│    └─Flatten: 2-7                      --
Total params: 110,018
Trainable params: 110,018
Non-trainable params: 0
Stopping early, the loss has diverged
Learning rate search finished. See the graph with {finder_name}.plot()
LR suggestion: steepest gradient
Suggested LR: 2.01E-03

Suggested lr: 0.0020092330025650463

1-cycle warm-up
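
Before the training code, it may help to see the shape of the schedule that `pct_start`, `div_factor` and `final_div_factor` describe. The function below is a simplified pure-Python sketch of a two-phase cosine one-cycle schedule, written for illustration; `one_cycle_lr` is a hypothetical helper, and PyTorch's `OneCycleLR` may differ in edge cases.

```python
import math

def one_cycle_lr(step: int, total_steps: int, max_lr: float,
                 pct_start: float = 0.3, div_factor: float = 10.0,
                 final_div_factor: float = 1e4) -> float:
    """Learning rate at `step` (0-based) for a two-phase cosine one-cycle schedule."""
    initial_lr = max_lr / div_factor          # LR at step 0
    min_lr = initial_lr / final_div_factor    # LR at the last step
    warmup_end = pct_start * total_steps - 1  # last step of the increasing phase

    def cos_anneal(start: float, end: float, pct: float) -> float:
        # moves from `start` (pct=0) to `end` (pct=1) along half a cosine
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)

total_steps = 100
lrs = [one_cycle_lr(s, total_steps, max_lr=3e-4) for s in range(total_steps)]
peak_step = lrs.index(max(lrs))
print(f"start={lrs[0]:.1e} peak={max(lrs):.1e} at step {peak_step} end={lrs[-1]:.1e}")
```

With `pct_start=0.3` the learning rate rises for the first 30% of the steps, starting at `max_lr / div_factor`, and then decays for the remaining 70% to a value far below the starting one.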

device = get_device()
# data module config
cfg_dm = OmegaConf.load('../config/data/image/fashion_mnist.yaml')
cfg_dm.batch_size = 512
datamodule = instantiate(cfg_dm)

# device = 'cpu'
cfg_mdl = OmegaConf.load('../config/model/image/convnet.yaml')
convnet = instantiate(cfg_mdl.batchnorm)
model = convnet.to(device)


lr_found = 3e-4

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
steps_per_epoch = len(datamodule.train_ds) // cfg_dm.batch_size
total_steps = steps_per_epoch* N_EPOCHS
print(f"size training set: {len(datamodule.train_ds)}, bs: {cfg_dm.batch_size}, steps/epoch: {steps_per_epoch}, total steps: {total_steps}")
# scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=steps_per_epochs, epochs=1)

scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=lr_found,  # Peak learning rate
        total_steps=total_steps,  # Total number of optimizer steps (batches), not samples
        # total_steps=len(datamodule.train_ds) * N_EPOCHS,  # Total training iterations
        pct_start=0.3,  # 30% of training increasing LR, 70% decreasing
        anneal_strategy='cos',  # Cosine annealing
        div_factor=10,  # Initial lr = max_lr / div_factor
        # final_div_factor=1e4,
        three_phase=False  # Two phase LR schedule (increase then decrease)
)


lrs = []
current_step = 0
train_loss_history = []
eval_loss_history = []
avg_train_loss_hist = []
avg_eval_loss_hist = []
max_acc = 0

for epoch in range(N_EPOCHS):
    i = 0
    model.train()
    for images, labels in datamodule.train_dataloader():
        if current_step >= total_steps:
            print(f"Reached total steps: {current_step}/{total_steps}")
            break
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()  # OneCycleLR is stepped once per batch
        current_step += 1
        # current_lr = scheduler.get_last_lr()[0]
        current_lr = optimizer.param_groups[0]['lr']
        if not (i % 100):
            print(f"Loss {loss.item():.4f}, Current LR: {current_lr:.10f}, Step: {current_step}/{total_steps}")
        i += 1

    model.eval()
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in datamodule.val_dataloader():
            # model expects input (B, C, H, W)
            images = images.to(device)
            labels = labels.to(device)
            # Pass the input through the model
            outputs = model(images)
            # eval loss
            eval_loss = criterion(outputs, labels)
            # Get the predicted labels
            _, predicted = torch.max(outputs.data, 1)

            # Update the total and correct counts
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        # accuracy over the whole validation set
        acc = 100 * correct / total
        if acc > max_acc:
            max_acc = acc

    # Print the accuracy
    print(f"Epoch {epoch + 1}: Last training Loss {loss.item():.4f}, Last Eval loss {eval_loss.item():.4f} Accuracy = {acc:.2f}% Best Accuracy: {max_acc:.2f}")
    # print(f'Current LR: {optimizer.param_groups[0]["lr"]:.5f}')

Seed set to 42
Seed set to 42
[23:31:47] INFO - Init ImageDataModule for fashion_mnist
[23:31:52] INFO - loading dataset fashion_mnist with args () from split train
[23:32:00] INFO - loading dataset fashion_mnist with args () from split test
[23:32:03] INFO - split train into train/val [0.8, 0.2]
[23:32:03] INFO - train: 48000 val: 12000, test: 10000
 ConvNetX (nnet:__main__.ConvNet, num_classes:int,
Parameter    Type      Default  Details
nnet         ConvNet            model
num_classes  int                number of classes
optimizer    Callable           optimizer
scheduler    Optional  None     scheduler


cfg = OmegaConf.load('../config/model/image/convnetx.yaml')
feats_dim = [3, 8, 16, 32, 64, 128, 64]
cfg.nnet.n_features = feats_dim
cfg.nnet.num_classes = 200

model = instantiate(cfg.nnet)
B, C, H, W = 64, 3, 64, 64
X = torch.rand(B, C, H, W)
print(model(X).shape)
torch.Size([64, 200])
summary(model, input_size=(B, C, H, W), depth=2)
Layer (type:depth-idx)                   Output Shape              Param #
ConvNet                                  [64, 200]                 --
├─Sequential: 1-1                        [64, 200]                 --
│    └─ConvBlock: 2-1                    [64, 8, 64, 64]           232
│    └─ConvBlock: 2-2                    [64, 16, 32, 32]          1,184
│    └─ConvBlock: 2-3                    [64, 32, 16, 16]          4,672
│    └─ConvBlock: 2-4                    [64, 64, 8, 8]            18,560
│    └─ConvBlock: 2-5                    [64, 128, 4, 4]           73,984
│    └─ConvBlock: 2-6                    [64, 64, 2, 2]            73,856
│    └─ConvBlock: 2-7                    [64, 200, 1, 1]           115,600
│    └─Flatten: 2-8                      [64, 200]                 --
Total params: 288,088
Trainable params: 288,088
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 384.93
Input size (MB): 3.15
Forward/backward pass size (MB): 65.48
Params size (MB): 1.15
Estimated Total Size (MB): 69.78

Nimrod training


# data module config
cfg = OmegaConf.load('../config/data/image/fashion_mnist.yaml')
cfg.batch_size = 512
cfg.num_workers = 0
datamodule = instantiate(cfg)
[20:23:50] INFO - Init ImageDataModule for fashion_mnist
[20:24:08] INFO - split train into train/val [0.8, 0.2]
[20:24:08] INFO - train: 48000 val: 12000, test: 10000
cfg = OmegaConf.load('../config/optimizer/adam_w.yaml')
optimizer = instantiate(cfg)

cfg = OmegaConf.load('../config/scheduler/step_lr.yaml')
scheduler = instantiate(cfg)

cfg = OmegaConf.load('../config/model/image/convnetx.yaml')
model = instantiate(cfg)(optimizer=optimizer, scheduler=scheduler)

# # with 1-cycle sched
# cfg.nnet.n_features = [1, 8, 16, 32, 64, 128]
# cfg.scheduler.total_steps = len(datamodule.train_ds) * N_EPOCHS
# model = instantiate(cfg)
[14:53:05] INFO - ConvNetX: init
[14:53:05] INFO - Classifier: init
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/utilities/ Attribute 'nnet' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['nnet'])`.
trainer = Trainer(
    logger=TensorBoardLogger("tb_logs", name="fashion_mnist_convnet", default_hp_metric=True),
    # logger=CSVLogger("logs", name="mnist_convnet"),
    callbacks = [LearningRateMonitor(logging_interval="step")],
)
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs

LR finder

tuner = Tuner(trainer)
lr_finder = tuner.lr_find(
    model,
    datamodule=datamodule,
    num_training=100,  # number of iterations
    # attr_name="",
)
fig = lr_finder.plot(suggest=True)
print(f"Suggested learning rate: {lr_finder.suggestion()}")
[20:59:14] INFO - Optimizer: <class 'torch.optim.adamw.AdamW'>
[20:59:14] INFO - Scheduler: <torch.optim.lr_scheduler.StepLR object>
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/ The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/ The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
`Trainer.fit` stopped: `max_steps=100` reached.
Learning rate set to 0.0019952623149688807
Restoring states from the checkpoint path at /Users/slegroux/Projects/nimrod/nbs/.lr_find_61a6646e-2298-4940-9b72-9185e67e8d21.ckpt
Restored all states from the checkpoint at /Users/slegroux/Projects/nimrod/nbs/.lr_find_61a6646e-2298-4940-9b72-9185e67e8d21.ckpt

Suggested learning rate: 0.0019952623149688807
print(trainer.max_epochs, len(datamodule.train_ds), datamodule.hparams.batch_size)
10 48000 512

1-cycle scheduling

# lr_found = lr_finder.suggestion()
lr_found = 3e-4

cfg = OmegaConf.load('../config/data/image/mnist.yaml')
cfg.batch_size = 512
cfg.num_workers = 0
datamodule = instantiate(cfg)

checkpoint_callback = ModelCheckpoint(
    monitor='val/loss',  # Metric to monitor
    dirpath='checkpoints/',  # Directory to save checkpoints
    save_top_k=1,  # Save only the best checkpoint
    mode='min'  # Mode can be 'min' or 'max' depending on the metric
)

lr_monitor = LearningRateMonitor(logging_interval="step")

trainer = Trainer(
    # logger=TensorBoardLogger("tb_logs", name="mnist_convnet", default_hp_metric=True),
    logger=CSVLogger("logs", name="fashion_mnist_convnet"),
    callbacks = [lr_monitor, checkpoint_callback],
)

print("estimated steps: ", trainer.estimated_stepping_batches, "accumulate_grad_batches: ", trainer.accumulate_grad_batches)

model_cfg = OmegaConf.load('../config/model/image/convnetx.yaml')
model_cfg.scheduler.total_steps = trainer.max_epochs * len(datamodule.train_dataloader())
model_cfg.scheduler.max_lr = lr_found#lr_finder.suggestion()

model = instantiate(model_cfg)

trainer.fit(model, datamodule.train_dataloader(), datamodule.val_dataloader())

csv_path = f"{trainer.logger.log_dir}/metrics.csv"
metrics = pd.read_csv(csv_path)

plt.plot(metrics['step'], metrics['train/loss_step'], 'b.-')
plt.plot(metrics['step'], metrics['val/loss'],'r.-')
plt.plot(metrics['step'], metrics['lr-AdamW'], 'g.-')
[23:33:22] INFO - Init ImageDataModule for mnist
[23:33:26] INFO - loading dataset mnist with args () from split train
[23:33:33] INFO - loading dataset mnist with args () from split test
[23:33:36] INFO - split train into train/val [0.8, 0.2]
[23:33:36] INFO - train: 48000 val: 12000, test: 10000
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Loading `train_dataloader` to estimate number of stepping batches.
estimated steps:  -1 accumulate_grad_batches:  1
trainer.test(model, datamodule.test_dataloader(), ckpt_path="best")
Restoring states from the checkpoint path at /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch00-val_loss0.13.ckpt
Loaded model weights from the checkpoint at /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch00-val_loss0.13.ckpt
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/ The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
┃        Test metric               DataLoader 0        ┃
│         test/acc              0.9707000255584717     │
│         test/loss             0.11260439455509186    │
[{'test/loss': 0.11260439455509186, 'test/acc': 0.9707000255584717}]
best_checkpoint_path = checkpoint_callback.best_model_path
print(f"Best checkpoint path: {best_checkpoint_path}")
Best checkpoint path: /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch00-val_loss0.13.ckpt

Resume training

cfg = OmegaConf.load('../config/scheduler/reduce_lr_on_plateau.yaml')
sched = instantiate(cfg)
# sched.total_steps = len(datamodule.train_ds) * N_EPOCHS
lr = trainer.optimizers[0].param_groups[0]['lr']
print(f"LR: {lr}")
model = ConvNetX.load_from_checkpoint(best_checkpoint_path,scheduler=sched, lr=lr)

[20:56:41] INFO - ConvNetX: init
[20:56:41] INFO - Classifier: init
LR: 7.642883799445691e-07
"nnet":        ConvNet(
  (net): Sequential(
    (0): ConvLayer(
      (net): Sequential(
        (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (1): ConvLayer(
      (net): Sequential(
        (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (2): ConvLayer(
      (net): Sequential(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (3): ConvLayer(
      (net): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (4): ConvLayer(
      (net): Sequential(
        (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
    (5): ConvLayer(
      (net): Sequential(
        (0): Conv2d(128, 10, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      )
    )
    (6): Flatten(start_dim=1, end_dim=-1)
  )
)
"num_classes": 10
"optimizer":   functools.partial(<class 'torch.optim.adamw.AdamW'>, lr=0.0001, weight_decay=1e-05)
"scheduler":   functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, mode='min', factor=0.1, patience=10)
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/utilities/ Attribute 'nnet' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['nnet'])`.
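The printed architecture ends in a `Flatten` right after a 10-channel convolution, which only yields 10 logits if the spatial size has shrunk to 1x1 by then. The sketch below traces the spatial size through the six `Conv2d` layers using the standard output-size formula for dilation 1. The 28x28 input size is an assumption (it matches the MNIST sample shown earlier), and `conv_out` is a helper name introduced here for illustration.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output length along one spatial axis of a Conv2d layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Strides of the six Conv2d layers, read off the printed model.
strides = [1, 2, 2, 2, 2, 2]

sizes = [28]  # assumption: 28x28 input images
for s in strides:
    sizes.append(conv_out(sizes[-1], stride=s))

print(sizes)  # [28, 28, 14, 7, 4, 2, 1]
```

Under that assumption the last convolution outputs a 10x1x1 tensor per image, so `Flatten` produces one value per class.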
# batchnorm should allow us to try higher LR
# model_cfg = OmegaConf.load('../config/model/image/convnetx_adam.yaml')
# = 0.1

# model = instantiate(model_cfg)
# opt = instantiate(model_cfg.optimizer)
# print(opt)
# sched = instantiate(model_cfg.scheduler)
# print(sched)


trainer = Trainer(
    # logger=TensorBoardLogger("tb_logs", name="mnist_convnet", default_hp_metric=True),
    logger=CSVLogger("logs", name="fashion_mnist_convnet"),
    callbacks=[LearningRateMonitor(logging_interval="step")],
    max_epochs=3,  # inferred from the "`max_epochs=3` reached" message in the output
)

# use standard Adam scheduler

# retrieve last ckpt
trainer.fit(model, datamodule.train_dataloader(), datamodule.val_dataloader(), ckpt_path=best_checkpoint_path)
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
Restoring states from the checkpoint path at /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch00-val_loss0.13.ckpt
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/ Be aware that when using `ckpt_path`, callbacks used to create the checkpoint need to be provided during `Trainer` instantiation. Please add the following callbacks: ["ModelCheckpoint{'monitor': 'val/loss', 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None}"].
[20:56:49] INFO - Optimizer: AdamW (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.0001
    maximize: False
    weight_decay: 1e-05
[20:56:49] INFO - Scheduler: <torch.optim.lr_scheduler.ReduceLROnPlateau object>
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/core/ The lr scheduler dict contains the key(s) ['monitor'], but the keys will be ignored. You need to call `lr_scheduler.step()` manually in manual optimization.

  | Name         | Type               | Params | Mode 
0 | loss         | CrossEntropyLoss   | 0      | train
1 | train_acc    | MulticlassAccuracy | 0      | train
2 | val_acc      | MulticlassAccuracy | 0      | train
3 | test_acc     | MulticlassAccuracy | 0      | train
4 | train_loss   | MeanMetric         | 0      | train
5 | val_loss     | MeanMetric         | 0      | train
6 | test_loss    | MeanMetric         | 0      | train
7 | val_acc_best | MaxMetric          | 0      | train
8 | nnet         | ConvNet            | 110 K  | train
110 K     Trainable params
0         Non-trainable params
110 K     Total params
0.440     Total estimated model params size (MB)
39        Modules in train mode
0         Modules in eval mode
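The 110 K parameter count and the 0.440 MB estimate in this summary can be checked by hand from the layer shapes in the printed model. The sketch below is a back-of-the-envelope count, not Lightning's own code: it assumes 3x3 kernels throughout, two learnable parameters per channel for each `BatchNorm2d` (affine weight and bias), and 4 bytes per parameter (float32). The `layers` list and `layer_params` helper are names introduced here for illustration.

```python
# (in_channels, out_channels, conv_has_bias, followed_by_batchnorm),
# read off the printed model: only the last conv has a bias and no BatchNorm2d.
layers = [
    (1, 8, False, True),
    (8, 16, False, True),
    (16, 32, False, True),
    (32, 64, False, True),
    (64, 128, False, True),
    (128, 10, True, False),
]


def layer_params(c_in: int, c_out: int, bias: bool, batchnorm: bool, kernel: int = 3) -> int:
    """Learnable parameters of one conv layer plus its optional BatchNorm2d."""
    n = c_in * c_out * kernel * kernel  # convolution weights
    if bias:
        n += c_out  # one bias per output channel
    if batchnorm:
        n += 2 * c_out  # affine weight and bias per channel
    return n


total = sum(layer_params(*layer) for layer in layers)
print(total)  # 110018
print(f"{total * 4 / 1e6:.3f} MB")  # 0.440 MB, assuming float32 parameters
```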
Restored all states from the checkpoint at /Users/slegroux/Projects/nimrod/nbs/checkpoints/epoch00-val_loss0.13.ckpt
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/ The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/ The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
[20:57:08] INFO - scheduler is an instance of Reduce plateau
[20:57:26] INFO - scheduler is an instance of Reduce plateau
`Trainer.fit` stopped: `max_epochs=3` reached.
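The scheduler used in this run is `ReduceLROnPlateau` with `mode='min'`, `factor=0.1` and `patience=10`. As a rough illustration of what those settings mean, here is a simplified pure-Python version of the reduce-on-plateau rule. It is a sketch of the idea only, not the PyTorch implementation: it ignores the improvement threshold, cooldown and minimum learning rate. `PlateauLR` is a hypothetical class name, the validation losses are made up, and the patience is shortened to 2 so the effect is visible in a few steps.

```python
class PlateauLR:
    """Simplified reduce-on-plateau rule for mode='min' (not the PyTorch class)."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, metric: float) -> float:
        # Track the best metric seen so far and count epochs without improvement.
        if metric < self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Once the metric has stalled for more than `patience` epochs, shrink the LR.
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


plateau = PlateauLR(lr=1e-4, factor=0.1, patience=2)  # short patience, for illustration
val_losses = [0.30, 0.20, 0.21, 0.22, 0.23, 0.19]  # hypothetical validation losses
history = [plateau.step(v) for v in val_losses]
print(history)  # the LR drops by 10x at the fifth step, after three stalled epochs
```

With `patience=10` the same rule needs eleven consecutive epochs without improvement before it reduces the learning rate.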
csv_path = f"{trainer.logger.log_dir}/metrics.csv"
metrics = pd.read_csv(csv_path)
plt.plot(metrics['step'], metrics['train/loss_step'], 'b.-')
plt.plot(metrics['step'], metrics['val/loss'], 'r.-')
plt.plot(metrics['step'], metrics['lr-AdamW'], 'g.-')
trainer.test(model, datamodule.test_dataloader(), ckpt_path="best")

Restoring states from the checkpoint path at logs/fashion_mnist_convnet/version_14/checkpoints/epoch=2-step=282.ckpt
Loaded model weights from the checkpoint at logs/fashion_mnist_convnet/version_14/checkpoints/epoch=2-step=282.ckpt
/Users/slegroux/miniforge3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/ The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=11` in the `DataLoader` to improve performance.
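┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test/acc          │    0.9711999893188477     │
│         test/loss         │    0.11154787242412567    │
└───────────────────────────┴───────────────────────────┘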