Model Core Utils

Core model classes and helpers.

Init

Apply Kaiming initialization to layers with (leaky) ReLU activations. For a leaky slope a, Kaiming-normal init draws weights with standard deviation sqrt(2 / ((1 + a^2) * fan_in)), which keeps activation variance roughly constant across layers.


source

weight_init

 weight_init (m:torch.nn.modules.module.Module, leaky:int=0)
        Type    Default  Details
m       Module           the module to initialize
leaky   int     0        negative slope if leaky ReLU is used
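
A minimal sketch of what an initializer like this plausibly does, assuming Kaiming-normal init (the actual weight_init may differ in details):

import torch.nn as nn

def weight_init_sketch(m: nn.Module, leaky: float = 0.0):
    # Kaiming-normal draws weights with std = sqrt(2 / ((1 + leaky^2) * fan_in)),
    # which preserves activation variance through (leaky) ReLU layers
    if isinstance(m, (nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, a=leaky, nonlinearity='leaky_relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)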
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from functools import partial

x = torch.randn(1, 1, 32, 32)
in_channels = x.shape[1]
out_channels = 3
kernel_size = 3
stride = 1
c1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride, kernel_size//2)
x1 = c1(x)
print(f"after conv: {x1.shape}")
print("x flat dim:", x1.flatten(2).shape)
l1 = nn.Linear(32*32, 64)
x2 = l1(x1.flatten(2))
print("after linear:", x2.shape)

leaky = 0.01
nnet = nn.Sequential(c1, nn.LeakyReLU(negative_slope=leaky), nn.Flatten(2), l1)
nnet.eval().cpu()
fig, ax = plt.subplots(1, 3)
ax[0].imshow(x.permute(0,2,3,1).squeeze())
ax[0].set_title("x")
y = nnet(x)
ax[1].imshow(y.detach().squeeze(0).permute(1,0).reshape(8, 8, 3))
ax[1].set_title("no init")
wi = partial(weight_init, leaky=leaky)
nnet.apply(wi)
y = nnet(x)
ax[2].imshow(y.detach().squeeze(0).permute(1,0).reshape(8, 8, 3))
ax[2].set_title("kaiming init")
after conv: torch.Size([1, 3, 32, 32])
x flat dim: torch.Size([1, 3, 1024])
after linear: torch.Size([1, 3, 64])
[figure: input x, network output without init, output after kaiming init]

Classifier Abstract Base Class


source

Classifier

 Classifier (nnet:torch.nn.modules.module.Module, num_classes:int,
             optimizer:Callable[...,torch.optim.optimizer.Optimizer],
             scheduler:Optional[Callable[...,Any]]=None)

Helper class that provides a standard way to create an ABC using inheritance.

             Type      Default  Details
nnet         Module
num_classes  int
optimizer    Callable           partial of optimizer
scheduler    Optional  None     partial of scheduler
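
A usage sketch (my_convnet is a hypothetical nn.Module backbone with num_classes outputs; Regressor and Diffuser take the same optimizer/scheduler partials, minus num_classes):

from functools import partial
import torch

clf = Classifier(
    nnet=my_convnet,  # hypothetical backbone
    num_classes=10,
    optimizer=partial(torch.optim.AdamW, lr=1e-3, weight_decay=1e-5),
    scheduler=partial(torch.optim.lr_scheduler.OneCycleLR, max_lr=1e-3),
)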

source

plot_classifier_metrics_from_csv

 plot_classifier_metrics_from_csv (metrics_csv_path:str|os.PathLike)
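
Usage is a single call on a metrics CSV (the path below is illustrative):

plot_classifier_metrics_from_csv("logs/csv/version_0/metrics.csv")  # hypothetical path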

Regressor Abstract Class


source

Regressor

 Regressor (nnet:lightning.pytorch.core.module.LightningModule,
            optimizer:Callable[...,torch.optim.optimizer.Optimizer],
            scheduler:Optional[Callable[...,Any]]=None)

Helper class that provides a standard way to create an ABC using inheritance.

           Type             Default  Details
nnet       LightningModule
optimizer  Callable                  partial of optimizer
scheduler  Optional         None     partial of scheduler

Diffuser Abstract Class


source

Diffuser

 Diffuser (nnet:lightning.pytorch.core.module.LightningModule,
           optimizer:Callable[...,torch.optim.optimizer.Optimizer],
           scheduler:Optional[Callable[...,Any]]=None)

Helper class that provides a standard way to create an ABC using inheritance.

           Type             Default  Details
nnet       LightningModule
optimizer  Callable                  partial of optimizer
scheduler  Optional         None     partial of scheduler

Sequential Model


source

SequentialModelX

 SequentialModelX (modules:List[torch.nn.modules.module.Module], *args,
                   **kwargs)

Helper class that provides a standard way to create an ABC using inheritance.
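
A minimal usage sketch, assuming SequentialModelX composes like nn.Sequential from a module list:

import torch
import torch.nn as nn

net = SequentialModelX([nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU(), nn.Linear(64, 10)])
y = net(torch.randn(1, 1, 32, 32))  # -> shape [1, 10]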

LR Finder Helper

# use the LRFinder python module (a Lightning-based version follows below)

import torch
import torch.nn as nn
from torch_lr_finder import LRFinder

def find_optimal_lr(model, train_loader, criterion=None, optimizer=None, device='cuda'):
    # if no criterion provided, default to cross-entropy loss
    if criterion is None:
        criterion = nn.CrossEntropyLoss()

    # if no optimizer provided, default to Adam with a tiny starting LR
    if optimizer is None:
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)

    # run the LR range test from a very small to a large learning rate
    lr_finder = LRFinder(model, optimizer, criterion, device=device)
    lr_finder.range_test(
        train_loader,
        start_lr=1e-7,  # very small starting learning rate
        end_lr=10,      # large ending learning rate
        num_iter=100,   # number of iterations to test
        smooth_f=0.05,  # smoothing factor for the loss
    )

    # plot loss vs. learning rate and get the suggested LR (steepest-descent point)
    _, suggested_lr = lr_finder.plot(log_lr=True, suggest_lr=True)

    # restore the model and optimizer to their initial state
    lr_finder.reset()

    print(f"Suggested Learning Rate: {suggested_lr}")
    return suggested_lr
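
For illustration, a call with placeholder names:

# my_model and my_train_loader stand in for your model and DataLoader
best_lr = find_optimal_lr(my_model, my_train_loader, device='cuda')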

source

lr_finder

 lr_finder (model:Callable[...,torch.nn.modules.module.Module],
            datamodule:nimrod.image.datasets.ImageDataModule,
            num_training:int=100, plot:bool=True)
              Type             Default  Details
model         Callable                  partial model (missing optim & sched)
datamodule    ImageDataModule           data module
num_training  int              100      number of iterations
plot          bool             True     plot the learning rate vs loss

1-cycle train helper


source

train_one_cycle

 train_one_cycle (model:Callable[...,torch.nn.modules.module.Module],
                  datamodule:nimrod.image.datasets.ImageDataModule,
                  max_lr:float=0.1, weight_decay=1e-05, n_epochs:int=5,
                  project_name:str='MNIST-Classifier', tags=['arch',
                  'dev'], test:bool=True, run_name:str=None,
                  model_summary:bool=True, logger_cb:str='wandb',
                  precision='32-true')

Train for one cycle: AdamW optimizer with wandb logging and a learning-rate monitor by default.

               Type             Default           Details
model          Callable                           partial model (missing optim & sched)
datamodule     ImageDataModule
max_lr         float            0.1
weight_decay   float            1e-05
n_epochs       int              5
project_name   str              MNIST-Classifier
tags           list             ['arch', 'dev']
test           bool             True
run_name       str              None
model_summary  bool             True
logger_cb      str              wandb
precision      str              32-true           16-mixed, 32-true
# data
from omegaconf import OmegaConf
from hydra.utils import instantiate  # assumed: configs are instantiated via hydra

cfg = OmegaConf.load('../config/data/image/fashion_mnist.yaml')
cfg.data_dir = "../data/image"
cfg.batch_size = 128
cfg.num_workers = 0
dm = instantiate(cfg)
dm.prepare_data()
dm.setup()
[21:54:44] INFO - Init ImageDataModule for fashion_mnist
[21:54:46] INFO - loading dataset fashion_mnist with args () from split train
[21:54:46] INFO - loading dataset fashion_mnist from split train
[21:54:48] INFO - Overwrite dataset info from restored data version if exists.
[21:54:48] INFO - Loading Dataset info from ../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2
[21:54:48] INFO - Found cached dataset fashion_mnist (/user/s/slegroux/Projects/nimrod/nbs/../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2)
[21:54:48] INFO - Loading Dataset info from /user/s/slegroux/Projects/nimrod/nbs/../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2
[21:54:52] INFO - loading dataset fashion_mnist with args () from split test
[21:54:52] INFO - loading dataset fashion_mnist from split test
[21:54:53] INFO - Overwrite dataset info from restored data version if exists.
[21:54:53] INFO - Loading Dataset info from ../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2
[21:54:53] INFO - Found cached dataset fashion_mnist (/user/s/slegroux/Projects/nimrod/nbs/../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2)
[21:54:53] INFO - Loading Dataset info from /user/s/slegroux/Projects/nimrod/nbs/../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2
[21:54:53] INFO - split train into train/val [0.8, 0.2]
[21:54:53] INFO - train: 48000 val: 12000, test: 10000
# model
cfg_model = OmegaConf.load('../config/model/image/convnetx.yaml')
feats_dim = [1, 8, 16, 32, 64, 128]
# feats_dim = [1, 4, 8, 16, 8]
# feats_dim = [1, 16, 32, 64, 32]
cfg_model.nnet.n_features = feats_dim
model = instantiate(cfg_model)  # partial: optimizer & scheduler are supplied at train time
do_lr_finder = False

if do_lr_finder:
    suggested_lr = lr_finder(model=model, datamodule=dm, plot=True)
else:
    suggested_lr = 1e-3

# train
N_EPOCHS = 1

project_name = "FASHION-MNIST-Classifier"
run_name = f"{model.func.__name__}-bs:{dm.batch_size}-epochs:{N_EPOCHS}"
tags = [f"feats:{feats_dim}", f"bs:{dm.batch_size}", f"epochs:{N_EPOCHS}"]

trained_model, best_ckpt = train_one_cycle(
    model,
    dm,
    n_epochs=N_EPOCHS,
    max_lr=suggested_lr,
    project_name=project_name,
    tags=tags,
    run_name=run_name
    )
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[21:54:54] INFO - ConvNetX: init
[21:54:54] INFO - Classifier: init
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/utilities/parsing.py:209: Attribute 'nnet' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['nnet'])`.
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
ConvNet                                  [128, 10]                 --
├─Sequential: 1-1                        [128, 10]                 --
│    └─ConvLayer: 2-1                    [128, 8, 32, 32]          --
│    │    └─Sequential: 3-1              [128, 8, 32, 32]          88
│    └─ConvLayer: 2-2                    [128, 16, 16, 16]         --
│    │    └─Sequential: 3-2              [128, 16, 16, 16]         1,184
│    └─ConvLayer: 2-3                    [128, 32, 8, 8]           --
│    │    └─Sequential: 3-3              [128, 32, 8, 8]           4,672
│    └─ConvLayer: 2-4                    [128, 64, 4, 4]           --
│    │    └─Sequential: 3-4              [128, 64, 4, 4]           18,560
│    └─ConvLayer: 2-5                    [128, 128, 2, 2]          --
│    │    └─Sequential: 3-5              [128, 128, 2, 2]          73,984
│    └─ConvLayer: 2-6                    [128, 10, 1, 1]           --
│    │    └─Sequential: 3-6              [128, 10, 1, 1]           11,540
│    └─Flatten: 2-7                      [128, 10]                 --
==========================================================================================
Total params: 110,028
Trainable params: 110,028
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 161.97
==========================================================================================
Input size (MB): 0.52
Forward/backward pass size (MB): 32.53
Params size (MB): 0.44
Estimated Total Size (MB): 33.49
==========================================================================================
Tracking run with wandb version 0.19.1
Run data is saved locally in /tmp/wandb/run-20250206_215454-1fc1d7re
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:654: Checkpoint directory /user/s/slegroux/Projects/nimrod/nbs/checkpoints/FASHION-MNIST-Classifier/ConvNetX-bs:128-epochs:1 exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[21:54:54] INFO - Optimizer: <class 'torch.optim.adamw.AdamW'>
[21:54:54] INFO - Scheduler: <class 'torch.optim.lr_scheduler.OneCycleLR'>

  | Name         | Type               | Params | Mode 
------------------------------------------------------------
0 | nnet         | ConvNet            | 110 K  | train
1 | loss         | CrossEntropyLoss   | 0      | train
2 | train_acc    | MulticlassAccuracy | 0      | train
3 | val_acc      | MulticlassAccuracy | 0      | train
4 | test_acc     | MulticlassAccuracy | 0      | train
5 | train_loss   | MeanMetric         | 0      | train
6 | val_loss     | MeanMetric         | 0      | train
7 | test_loss    | MeanMetric         | 0      | train
8 | val_acc_best | MaxMetric          | 0      | train
------------------------------------------------------------
110 K     Trainable params
0         Non-trainable params
110 K     Total params
0.440     Total estimated model params size (MB)
41        Modules in train mode
0         Modules in eval mode
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=23` in the `DataLoader` to improve performance.
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=23` in the `DataLoader` to improve performance.
`Trainer.fit` stopped: `max_epochs=1` reached.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=23` in the `DataLoader` to improve performance.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric               DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test/acc              0.8658000230789185     │
│         test/loss             0.5964178442955017     │
└───────────────────────────┴───────────────────────────┘
[21:55:01] INFO - Best ckpt path: /user/s/slegroux/Projects/nimrod/nbs/checkpoints/FASHION-MNIST-Classifier/ConvNetX-bs:128-epochs:1/0-0.57.ckpt


Run history:


epoch ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-AdamW ▁▁▂▂▃▃▄▅▅▅▆▆▇██████████▇▇▆▅▅▅▅▄▄▄▃▃▂▁▁▁▁
test/acc
test/loss
train/acc_epoch
train/acc_step ▁▂▄▅▅▇▇▇▆▇▇▇▇▇▇▇▇▇█▇▇█▇▇██▇▇███▇▇██▇██▇█
train/loss_epoch
train/loss_step █▇▆▆▆▄▃▄▃▃▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁
trainer/global_step ▁▁▁▁▂▃▃▃▃▃▄▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███████
val/acc
val/acc_best
val/loss

Run summary:


epoch 1
lr-AdamW 0.0
test/acc 0.8658
test/loss 0.59642
train/acc_epoch 0.79585
train/acc_step 0.89844
train/loss_epoch 0.84797
train/loss_step 0.56174
trainer/global_step 375
val/acc 0.87608
val/acc_best 0.87608
val/loss 0.5739

View run ConvNetX-bs:128-epochs:1 at: https://wandb.ai/slegroux/FASHION-MNIST-Classifier/runs/1fc1d7re
View project at: https://wandb.ai/slegroux/FASHION-MNIST-Classifier
Synced 6 W&B file(s), 0 media file(s), 7 artifact file(s) and 0 other file(s)
Find logs at: /tmp/wandb/run-20250206_215454-1fc1d7re/logs
# check the best checkpoint path and run the trained model on a random input
print(best_ckpt)
x = torch.randn(1, 1, 32, 32)
trained_model.eval()
trained_model(x)
/user/s/slegroux/Projects/nimrod/nbs/checkpoints/FASHION-MNIST-Classifier/ConvNetX-bs:128-epochs:1/0-0.58.ckpt
tensor([[0.0000, 0.0000, 0.5579, 0.1776, 0.0000, 0.3419, 0.0000, 0.0000, 2.2810,
         0.0000]], grad_fn=<ViewBackward0>)