How to use
Preparation
Select the model architecture whose performance you want to improve, together with the best training recipe for that architecture. Use an already trained model as the baseline. Then start incorporating the custom ops of the GeoND library into the model architecture.
ParaboloidOutput
Replace a Linear output layer with ParaboloidOutput. Find the largest value of the input_factor argument that does not produce any NaN values. If using an SGD optimizer (GeoNDSGD), try setting momentum = 0.0 and nesterov = False; counter-intuitively, this converges faster and performs better. If the model overfits, try reducing the value of input_factor. Also see Fine tuning below.
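The search for the largest usable input_factor can be sketched as a simple sweep from large to small candidates. This is only an illustration: `largest_stable_factor`, `fake_trial_loss` and the candidate values are hypothetical names and numbers, not part of the GeoND library, and the stand-in function replaces what would in practice be a short trial training run with ParaboloidOutput. The sketch also assumes that any non-finite loss (NaN or infinity) counts as a failure.

```python
import math
from typing import Callable, Optional, Sequence


def largest_stable_factor(
    trial_loss: Callable[[float], float],
    candidates: Sequence[float],
) -> Optional[float]:
    """Return the largest candidate whose trial run ends with a finite loss.

    `trial_loss` is a hypothetical callable: it should run a short training
    trial with the given input_factor and return the final loss.
    """
    for factor in sorted(candidates, reverse=True):
        loss = trial_loss(factor)
        if math.isfinite(loss):  # rejects NaN and +/- infinity
            return factor
    return None  # every candidate diverged


# Stand-in for a real trial run (assumption for illustration only):
# pretend that any input_factor above 0.3 makes the loss become NaN.
def fake_trial_loss(input_factor: float) -> float:
    if input_factor > 0.3:
        return float("nan")
    return 1.0 / (1.0 + input_factor)


candidates = [1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01]
best = largest_stable_factor(fake_trial_loss, candidates)
print("largest stable input_factor:", best)
```

If the model later overfits, the same candidate list can be walked further down from the value found here.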
Paraboloid
Replace a Linear layer with Paraboloid, or insert a Paraboloid layer before a Linear layer. Find values of the input_factor and output_factor arguments that do not produce any NaN values. If using an SGD optimizer (GeoNDSGD) and the inserted layer is before a Linear output layer, try setting momentum = 0.0 and nesterov = False; counter-intuitively, this converges faster and performs better. Also see Fine tuning below.
ParaConv2d
Replace a Conv2d layer with ParaConv2d. Find values of the input_factor and output_factor arguments that do not produce any NaN values. Try using fewer convolution kernels (remember to adjust the input channels of the next layer accordingly). Also see Fine tuning below.
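The bookkeeping needed when reducing the number of kernels can be sketched without any framework. The `ConvSpec` record and `shrink_kernels` helper below are hypothetical and only track channel counts. They assume a plain sequential stack in which each layer feeds directly into the next; normalization layers, skip connections and flattening before a Linear layer would need the same kind of adjustment and are not handled here.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ConvSpec:
    """Hypothetical record of a convolution layer's channel counts."""
    name: str
    in_channels: int
    out_channels: int


def shrink_kernels(specs: List[ConvSpec], index: int, new_out: int) -> List[ConvSpec]:
    """Use fewer kernels in specs[index] and keep the next layer consistent."""
    updated = list(specs)  # leave the caller's list untouched
    updated[index] = replace(specs[index], out_channels=new_out)
    if index + 1 < len(specs):
        # The next layer must accept exactly as many channels as are produced.
        updated[index + 1] = replace(specs[index + 1], in_channels=new_out)
    return updated


specs = [
    ConvSpec("conv1", 3, 64),
    ConvSpec("conv2", 64, 128),  # the layer being replaced by ParaConv2d
    ConvSpec("conv3", 128, 256),
]
slimmer = shrink_kernels(specs, index=1, new_out=96)
for spec in slimmer:
    print(spec)
```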
Training
Make sure you are using the corresponding optimizer provided by the library (GeoNDSGD, GeoNDAdam, GeoNDAdamW). Follow the same training process as the baseline (dataset, input transformations, etc.). Use a cosine annealing learning rate schedule and train for at least 200 epochs (300 recommended). This should result in either a better performing model, or a faster model with about the same performance as the baseline.
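For reference, the shape of a cosine annealing schedule over the recommended 300 epochs can be computed directly. The function name, the base learning rate of 0.1 and the minimum of 0.0 are illustrative assumptions; use the values from the baseline recipe. In PyTorch the same curve is provided by torch.optim.lr_scheduler.CosineAnnealingLR.

```python
import math


def cosine_annealing_lr(epoch: int, total_epochs: int,
                        base_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate at `epoch`, decaying from base_lr to min_lr on a half cosine."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


total_epochs = 300  # recommended number of epochs from the text
schedule = [
    cosine_annealing_lr(epoch, total_epochs, base_lr=0.1)  # 0.1 is a placeholder
    for epoch in range(total_epochs + 1)
]

for epoch in (0, 75, 150, 225, 300):
    print(f"epoch {epoch:3d}: lr = {schedule[epoch]:.5f}")
```

The rate starts at the base value, passes through half of it at the midpoint and reaches the minimum at the final epoch.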
Fine tuning
There are several ways to fine tune a ParaboloidOutput, Paraboloid or ParaConv2d layer, in order to maximize the benefit:
Try training for more epochs.
Use different values for the input_factor argument.
Use different values for the lr_factor argument.
Use different values for the wd_factor argument.
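Trying different values for the three arguments can be organized as a small grid search. The sketch below is not GeoND code: `grid_search` and `fake_evaluate` are hypothetical, the grids are placeholder values, and the stand-in scoring function replaces a full training run that would return a validation metric where higher is better.

```python
import itertools
from typing import Callable, Dict, Optional, Sequence, Tuple


def grid_search(
    evaluate: Callable[[float, float, float], float],
    input_factors: Sequence[float],
    lr_factors: Sequence[float],
    wd_factors: Sequence[float],
) -> Tuple[Optional[Dict[str, float]], float]:
    """Try every combination and return the best configuration and its score."""
    best_config: Optional[Dict[str, float]] = None
    best_score = float("-inf")
    for inp, lr, wd in itertools.product(input_factors, lr_factors, wd_factors):
        score = evaluate(inp, lr, wd)
        # A NaN score never compares greater, so diverged runs are skipped.
        if score > best_score:
            best_score = score
            best_config = {"input_factor": inp, "lr_factor": lr, "wd_factor": wd}
    return best_config, best_score


# Stand-in for "train the model and return a validation score"
# (assumption for illustration only; peaks at 0.1 / 1.0 / 0.5).
def fake_evaluate(input_factor: float, lr_factor: float, wd_factor: float) -> float:
    return -((input_factor - 0.1) ** 2
             + (lr_factor - 1.0) ** 2
             + (wd_factor - 0.5) ** 2)


best_config, best_score = grid_search(
    fake_evaluate,
    input_factors=[0.05, 0.1, 0.2],
    lr_factors=[0.5, 1.0, 2.0],
    wd_factors=[0.25, 0.5, 1.0],
)
print(best_config, best_score)
```

Since each evaluation is a full training run, a coarse grid like this one, refined around the best point, keeps the cost manageable.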
Evaluation
Compare the performance or speed of the new model with the baseline. If it is better than the baseline, congratulations on your new model. Use this model as the new baseline and repeat the process, replacing another layer or adding a new layer in a different place.
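The repeat-and-compare process described above amounts to a greedy loop: try one change, keep it only if the result beats the current baseline, then continue from the new baseline. The sketch below uses hypothetical names (`iterate_improvements`, `fake_train`) and made-up scores, and assumes a single metric where higher is better. When comparing speed instead, use a metric such as throughput, or negate the latency.

```python
from typing import Callable, List, Sequence, Tuple


def iterate_improvements(
    baseline_score: float,
    candidate_changes: Sequence[str],
    train_with_changes: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Greedily keep each change that improves on the current baseline."""
    accepted: List[str] = []
    best_score = baseline_score
    for change in candidate_changes:
        score = train_with_changes(accepted + [change])
        if score > best_score:
            accepted.append(change)  # this model becomes the new baseline
            best_score = score
    return accepted, best_score


# Stand-in for a full training run (assumption for illustration only):
# each change shifts a made-up baseline accuracy of 90.0 by a fixed amount.
FAKE_GAINS = {
    "ParaboloidOutput as output layer": 0.8,
    "ParaConv2d in first block": -0.3,
    "Paraboloid before output layer": 0.4,
}


def fake_train(changes: List[str]) -> float:
    return 90.0 + sum(FAKE_GAINS[change] for change in changes)


accepted, final_score = iterate_improvements(90.0, list(FAKE_GAINS), fake_train)
print("accepted changes:", accepted)
print("final score:", round(final_score, 2))
```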