Model Core Utils

Core model classes and helpers.

Init

Apply Kaiming initialization to layers with (leaky) ReLU activations. For a leaky slope a, Kaiming-normal init draws weights with standard deviation sqrt(2 / ((1 + a^2) * fan_in)), which keeps activation variance roughly constant across layers.


source

weight_init

 weight_init (m:torch.nn.modules.module.Module, leaky:int=0)
        Type    Default  Details
m       Module           the module to initialize
leaky   int     0        negative slope if leaky ReLU is used
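
A minimal sketch of what an initializer like this plausibly does, assuming Kaiming-normal init (the actual weight_init may differ in details):

import torch.nn as nn

def weight_init_sketch(m: nn.Module, leaky: float = 0.0):
    # Kaiming-normal draws weights with std = sqrt(2 / ((1 + leaky^2) * fan_in)),
    # which preserves activation variance through (leaky) ReLU layers
    if isinstance(m, (nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, a=leaky, nonlinearity='leaky_relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)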
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from functools import partial

x = torch.randn(1, 1, 32, 32)
in_channels = x.shape[1]
out_channels = 3
kernel_size = 3
stride = 1
c1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride, kernel_size//2)
x1 = c1(x)
print(f"after conv: {x1.shape}")
print("x flat dim:", x1.flatten(2).shape)
l1 = nn.Linear(32*32, 64)
x2 = l1(x1.flatten(2))
print("after linear:", x2.shape)

leaky = 0.01
nnet = nn.Sequential(c1, nn.LeakyReLU(negative_slope=leaky), nn.Flatten(2), l1)
nnet.eval().cpu()
fig, ax = plt.subplots(1, 3)
ax[0].imshow(x.permute(0,2,3,1).squeeze())
ax[0].set_title("x")
y = nnet(x)
ax[1].imshow(y.detach().squeeze(0).permute(1,0).reshape(8, 8, 3))
ax[1].set_title("no init")
wi = partial(weight_init, leaky=leaky)
nnet.apply(wi)
y = nnet(x)
ax[2].imshow(y.detach().squeeze(0).permute(1,0).reshape(8, 8, 3))
ax[2].set_title("kaiming init")
after conv: torch.Size([1, 3, 32, 32])
x flat dim: torch.Size([1, 3, 1024])
after linear: torch.Size([1, 3, 64])
[figure: input x, network output without init, output after kaiming init]

Classifier Abstract Base Class


source

Classifier

 Classifier (nnet:torch.nn.modules.module.Module, num_classes:int,
             optimizer:Callable[...,torch.optim.optimizer.Optimizer],
             scheduler:Optional[Callable[...,Any]]=None)

Helper class that provides a standard way to create an ABC using inheritance.

             Type      Default  Details
nnet         Module
num_classes  int
optimizer    Callable           partial of optimizer
scheduler    Optional  None     partial of scheduler
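
A usage sketch (my_convnet is a hypothetical nn.Module backbone with num_classes outputs; Regressor and Diffuser take the same optimizer/scheduler partials, minus num_classes):

from functools import partial
import torch

clf = Classifier(
    nnet=my_convnet,  # hypothetical backbone
    num_classes=10,
    optimizer=partial(torch.optim.AdamW, lr=1e-3, weight_decay=1e-5),
    scheduler=partial(torch.optim.lr_scheduler.OneCycleLR, max_lr=1e-3),
)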

source

plot_classifier_metrics_from_csv

 plot_classifier_metrics_from_csv (metrics_csv_path:str|os.PathLike)
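
Usage is a single call on a metrics CSV (the path below is illustrative):

plot_classifier_metrics_from_csv("logs/csv/version_0/metrics.csv")  # hypothetical path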

Regressor Abstract Class


source

Regressor

 Regressor (nnet:lightning.pytorch.core.module.LightningModule,
            optimizer:Callable[...,torch.optim.optimizer.Optimizer],
            scheduler:Optional[Callable[...,Any]]=None)

Helper class that provides a standard way to create an ABC using inheritance.

           Type             Default  Details
nnet       LightningModule
optimizer  Callable                  partial of optimizer
scheduler  Optional         None     partial of scheduler

Diffuser Abstract Class


source

Diffuser

 Diffuser (nnet:lightning.pytorch.core.module.LightningModule,
           optimizer:Callable[...,torch.optim.optimizer.Optimizer],
           scheduler:Optional[Callable[...,Any]]=None)

Helper class that provides a standard way to create an ABC using inheritance.

           Type             Default  Details
nnet       LightningModule
optimizer  Callable                  partial of optimizer
scheduler  Optional         None     partial of scheduler

Sequential Model


source

SequentialModelX

 SequentialModelX (modules:List[torch.nn.modules.module.Module], *args,
                   **kwargs)

Helper class that provides a standard way to create an ABC using inheritance.
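
A minimal usage sketch, assuming SequentialModelX composes like nn.Sequential from a module list:

import torch
import torch.nn as nn

net = SequentialModelX([nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU(), nn.Linear(64, 10)])
y = net(torch.randn(1, 1, 32, 32))  # -> shape [1, 10]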

LR Finder Helper

# use the LRFinder python module (a Lightning-based version follows below)

import torch
import torch.nn as nn
from torch_lr_finder import LRFinder

def find_optimal_lr(model, train_loader, criterion=None, optimizer=None, device='cuda'):
    # if no criterion provided, default to cross-entropy loss
    if criterion is None:
        criterion = nn.CrossEntropyLoss()

    # if no optimizer provided, default to Adam with a tiny starting LR
    if optimizer is None:
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-7, weight_decay=1e-2)

    # run the LR range test from a very small to a large learning rate
    lr_finder = LRFinder(model, optimizer, criterion, device=device)
    lr_finder.range_test(
        train_loader,
        start_lr=1e-7,  # very small starting learning rate
        end_lr=10,      # large ending learning rate
        num_iter=100,   # number of iterations to test
        smooth_f=0.05,  # smoothing factor for the loss
    )

    # plot loss vs. learning rate and get the suggested LR (steepest-descent point)
    _, suggested_lr = lr_finder.plot(log_lr=True, suggest_lr=True)

    # restore the model and optimizer to their initial state
    lr_finder.reset()

    print(f"Suggested Learning Rate: {suggested_lr}")
    return suggested_lr
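
For illustration, a call with placeholder names:

# my_model and my_train_loader stand in for your model and DataLoader
best_lr = find_optimal_lr(my_model, my_train_loader, device='cuda')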

source

lr_finder

 lr_finder (model:Callable[...,torch.nn.modules.module.Module],
            datamodule:nimrod.image.datasets.ImageDataModule,
            num_training:int=100, plot:bool=True)
              Type             Default  Details
model         Callable                  partial model (missing optim & sched)
datamodule    ImageDataModule           data module
num_training  int              100      number of iterations
plot          bool             True     plot the learning rate vs loss

1-cycle train helper


source

train_one_cycle

 train_one_cycle (model:Callable[...,torch.nn.modules.module.Module],
                  datamodule:nimrod.image.datasets.ImageDataModule,
                  max_lr:float=0.1, weight_decay=1e-05, n_epochs:int=5,
                  project_name:str='MNIST-Classifier', tags=['arch',
                  'dev'], test:bool=True, run_name:str=None,
                  model_summary:bool=True, logger_cb:str='wandb',
                  precision='32-true')

Train for one cycle: AdamW optimizer with wandb logging and a learning-rate monitor by default.

               Type             Default           Details
model          Callable                           partial model (missing optim & sched)
datamodule     ImageDataModule
max_lr         float            0.1
weight_decay   float            1e-05
n_epochs       int              5
project_name   str              MNIST-Classifier
tags           list             ['arch', 'dev']
test           bool             True
run_name       str              None
model_summary  bool             True
logger_cb      str              wandb
precision      str              32-true           16-mixed, 32-true
# data
from omegaconf import OmegaConf
from hydra.utils import instantiate  # assumed: configs are instantiated via hydra

cfg = OmegaConf.load('../config/data/image/fashion_mnist.yaml')
cfg.data_dir = "../data/image"
cfg.batch_size = 128
cfg.num_workers = 0
dm = instantiate(cfg)
dm.prepare_data()
dm.setup()
[21:54:44] INFO - Init ImageDataModule for fashion_mnist
[21:54:46] INFO - loading dataset fashion_mnist with args () from split train
[21:54:46] INFO - loading dataset fashion_mnist from split train
[21:54:48] INFO - Overwrite dataset info from restored data version if exists.
[21:54:48] INFO - Loading Dataset info from ../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2
[21:54:48] INFO - Found cached dataset fashion_mnist (/user/s/slegroux/Projects/nimrod/nbs/../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2)
[21:54:48] INFO - Loading Dataset info from /user/s/slegroux/Projects/nimrod/nbs/../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2
[21:54:52] INFO - loading dataset fashion_mnist with args () from split test
[21:54:52] INFO - loading dataset fashion_mnist from split test
[21:54:53] INFO - Overwrite dataset info from restored data version if exists.
[21:54:53] INFO - Loading Dataset info from ../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2
[21:54:53] INFO - Found cached dataset fashion_mnist (/user/s/slegroux/Projects/nimrod/nbs/../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2)
[21:54:53] INFO - Loading Dataset info from /user/s/slegroux/Projects/nimrod/nbs/../data/image/fashion_mnist/fashion_mnist/0.0.0/531be5e2ccc9dba0c201ad3ae567a4f3d16ecdd2
[21:54:53] INFO - split train into train/val [0.8, 0.2]
[21:54:53] INFO - train: 48000 val: 12000, test: 10000
# model
cfg_model = OmegaConf.load('../config/model/image/convnetx.yaml')
feats_dim = [1, 8, 16, 32, 64, 128]
# feats_dim = [1, 4, 8, 16, 8]
# feats_dim = [1, 16, 32, 64, 32]
cfg_model.nnet.n_features = feats_dim
model = instantiate(cfg_model)  # partial: optimizer & scheduler are supplied at train time
do_lr_finder = False

if do_lr_finder:
    suggested_lr = lr_finder(model=model, datamodule=dm, plot=True)
else:
    suggested_lr = 1e-3

# train
N_EPOCHS = 1

project_name = "FASHION-MNIST-Classifier"
run_name = f"{model.func.__name__}-bs:{dm.batch_size}-epochs:{N_EPOCHS}"
tags = [f"feats:{feats_dim}", f"bs:{dm.batch_size}", f"epochs:{N_EPOCHS}"]

trained_model, best_ckpt = train_one_cycle(
    model,
    dm,
    n_epochs=N_EPOCHS,
    max_lr=suggested_lr,
    project_name=project_name,
    tags=tags,
    run_name=run_name
    )
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[21:54:54] INFO - ConvNetX: init
[21:54:54] INFO - Classifier: init
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/utilities/parsing.py:209: Attribute 'nnet' is an instance of `nn.Module` and is already saved during checkpointing. It is recommended to ignore them using `self.save_hyperparameters(ignore=['nnet'])`.
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
ConvNet                                  [128, 10]                 --
├─Sequential: 1-1                        [128, 10]                 --
│    └─ConvLayer: 2-1                    [128, 8, 32, 32]          --
│    │    └─Sequential: 3-1              [128, 8, 32, 32]          88
│    └─ConvLayer: 2-2                    [128, 16, 16, 16]         --
│    │    └─Sequential: 3-2              [128, 16, 16, 16]         1,184
│    └─ConvLayer: 2-3                    [128, 32, 8, 8]           --
│    │    └─Sequential: 3-3              [128, 32, 8, 8]           4,672
│    └─ConvLayer: 2-4                    [128, 64, 4, 4]           --
│    │    └─Sequential: 3-4              [128, 64, 4, 4]           18,560
│    └─ConvLayer: 2-5                    [128, 128, 2, 2]          --
│    │    └─Sequential: 3-5              [128, 128, 2, 2]          73,984
│    └─ConvLayer: 2-6                    [128, 10, 1, 1]           --
│    │    └─Sequential: 3-6              [128, 10, 1, 1]           11,540
│    └─Flatten: 2-7                      [128, 10]                 --
==========================================================================================
Total params: 110,028
Trainable params: 110,028
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 161.97
==========================================================================================
Input size (MB): 0.52
Forward/backward pass size (MB): 32.53
Params size (MB): 0.44
Estimated Total Size (MB): 33.49
==========================================================================================
Tracking run with wandb version 0.19.1
Run data is saved locally in /tmp/wandb/run-20250206_215454-1fc1d7re
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:654: Checkpoint directory /user/s/slegroux/Projects/nimrod/nbs/checkpoints/FASHION-MNIST-Classifier/ConvNetX-bs:128-epochs:1 exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[21:54:54] INFO - Optimizer: <class 'torch.optim.adamw.AdamW'>
[21:54:54] INFO - Scheduler: <class 'torch.optim.lr_scheduler.OneCycleLR'>

  | Name         | Type               | Params | Mode 
------------------------------------------------------------
0 | nnet         | ConvNet            | 110 K  | train
1 | loss         | CrossEntropyLoss   | 0      | train
2 | train_acc    | MulticlassAccuracy | 0      | train
3 | val_acc      | MulticlassAccuracy | 0      | train
4 | test_acc     | MulticlassAccuracy | 0      | train
5 | train_loss   | MeanMetric         | 0      | train
6 | val_loss     | MeanMetric         | 0      | train
7 | test_loss    | MeanMetric         | 0      | train
8 | val_acc_best | MaxMetric          | 0      | train
------------------------------------------------------------
110 K     Trainable params
0         Non-trainable params
110 K     Total params
0.440     Total estimated model params size (MB)
41        Modules in train mode
0         Modules in eval mode
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=23` in the `DataLoader` to improve performance.
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=23` in the `DataLoader` to improve performance.
`Trainer.fit` stopped: `max_epochs=1` reached.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/user/s/slegroux/miniconda3/envs/nimrod/lib/python3.11/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=23` in the `DataLoader` to improve performance.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric               DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         test/acc              0.8658000230789185     │
│         test/loss             0.5964178442955017     │
└───────────────────────────┴───────────────────────────┘
[21:55:01] INFO - Best ckpt path: /user/s/slegroux/Projects/nimrod/nbs/checkpoints/FASHION-MNIST-Classifier/ConvNetX-bs:128-epochs:1/0-0.57.ckpt


Run history:


epoch ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr-AdamW ▁▁▂▂▃▃▄▅▅▅▆▆▇██████████▇▇▆▅▅▅▅▄▄▄▃▃▂▁▁▁▁
test/acc
test/loss
train/acc_epoch
train/acc_step ▁▂▄▅▅▇▇▇▆▇▇▇▇▇▇▇▇▇█▇▇█▇▇██▇▇███▇▇██▇██▇█
train/loss_epoch
train/loss_step █▇▆▆▆▄▃▄▃▃▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁
trainer/global_step ▁▁▁▁▂▃▃▃▃▃▄▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███████
val/acc
val/acc_best
val/loss

Run summary:


epoch 1
lr-AdamW 0.0
test/acc 0.8658
test/loss 0.59642
train/acc_epoch 0.79585
train/acc_step 0.89844
train/loss_epoch 0.84797
train/loss_step 0.56174
trainer/global_step 375
val/acc 0.87608
val/acc_best 0.87608
val/loss 0.5739

View run ConvNetX-bs:128-epochs:1 at: https://wandb.ai/slegroux/FASHION-MNIST-Classifier/runs/1fc1d7re
View project at: https://wandb.ai/slegroux/FASHION-MNIST-Classifier
Synced 6 W&B file(s), 0 media file(s), 7 artifact file(s) and 0 other file(s)
Find logs at: /tmp/wandb/run-20250206_215454-1fc1d7re/logs
# check the best checkpoint path and run the trained model on a random input
print(best_ckpt)
x = torch.randn(1, 1, 32, 32)
trained_model.eval()
trained_model(x)
/user/s/slegroux/Projects/nimrod/nbs/checkpoints/FASHION-MNIST-Classifier/ConvNetX-bs:128-epochs:1/0-0.58.ckpt
tensor([[0.0000, 0.0000, 0.5579, 0.1776, 0.0000, 0.3419, 0.0000, 0.0000, 2.2810,
         0.0000]], grad_fn=<ViewBackward0>)