How to use
Preparation
Select the model architecture whose performance you want to improve, and use an already trained model as the baseline. Start incorporating the custom ops of the GeoND library into the model architecture. This can mean replacing a Linear layer with a Paraboloid layer, replacing a Conv2d layer with a ParaConv2d layer, or inserting a Paraboloid layer before a Linear layer. Avoid making more than one change at a time, as it will otherwise be difficult to determine how beneficial each individual change is.
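GeoND's exact neuron equation is not reproduced here, so the following is only a rough conceptual sketch of the difference between the two layer types: a linear neuron responds along a hyperplane, while a paraboloid neuron responds along a curved quadratic surface. The center/scale parametrization below is an assumption for illustration, not the library's actual formulation.

```python
def linear_neuron(x, w, b):
    # Standard linear pre-activation: w . x + b (a hyperplane over the input).
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def paraboloid_neuron(x, center, scale, b):
    # Hypothetical quadratic pre-activation: a paraboloid over the input,
    # parametrized here (as an assumption) by a center point and a scale.
    return scale * sum((xi - ci) ** 2 for xi, ci in zip(x, center)) + b

x = [0.5, -0.25]
print(linear_neuron(x, [1.0, 2.0], 0.1))           # hyperplane response
print(paraboloid_neuron(x, [0.0, 0.0], 1.0, 0.1))  # curved response
```

The point of the sketch is only that a paraboloid neuron can carve out closed regions of the input space, which a single linear neuron cannot.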
Replacing layers
If the objective is improved performance, use the same number of neurons as the layer that was replaced. If the objective is improved speed, use fewer neurons, e.g., half the neurons of the replaced layer. Remember to also make the corresponding reduction in all other layers that receive input from the replaced layer; this is where the increase in speed comes from. Avoid replacing the output layer with a Paraboloid layer. We recommend replacing the first convolutional layer with a ParaConv2d layer.
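To see why shrinking a replaced layer must be propagated to its consumers, it helps to count parameters. The arithmetic below uses standard fully connected layer sizes (weights plus biases) and made-up layer widths; it is not specific to GeoND.

```python
def linear_params(n_in, n_out):
    # A fully connected layer has an n_in x n_out weight matrix plus n_out biases.
    return n_in * n_out + n_out

# Baseline: 512 -> 256 -> 10
baseline = linear_params(512, 256) + linear_params(256, 10)

# Halve the 256-neuron layer to 128 neurons. The following layer's input
# shrinks too, which is where most of the speedup comes from.
reduced = linear_params(512, 128) + linear_params(128, 10)

print(baseline, reduced)  # prints 133898 66954
```

Halving one hidden layer roughly halves the cost of both that layer and every layer reading from it.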
Adding layers
We recommend first trying to add a Paraboloid layer before the output layer. Additions in other places may not significantly affect performance, so leave them for last.
Stability
Since the paraboloid neuron computation involves squaring, it is vitally important that the input values do not exceed the range \([-0.01,0.01]\). Try to determine the maximum absolute value of any possible input, for example maxabs, and set input_scale=0.01/maxabs when using a Paraboloid or ParaConv2d layer. It is also important to make sure that the output of the layer does not include extreme values, so use the factor argument to keep the output in a reasonable range. Failing to keep the magnitude of these numbers in check can result in a diverging error function.
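The input_scale recommendation above is a one-line computation. The sample values here are made up; in practice you would scan your actual dataset (or its worst-case bounds) for the maximum absolute value.

```python
# Made-up sample of raw input values; scan your real data in practice.
samples = [3.2, -7.5, 0.4, 6.1]

maxabs = max(abs(v) for v in samples)  # maximum absolute input value
input_scale = 0.01 / maxabs           # maps all inputs into [-0.01, 0.01]

scaled = [v * input_scale for v in samples]
assert all(-0.01 <= v <= 0.01 for v in scaled)
print(input_scale)
```

With this scale, the largest-magnitude input lands exactly on the edge of the safe range and everything else falls inside it.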
Training
Make sure you are using the GeoNDSGD optimizer. Follow the same training process as the baseline (dataset, input transformations, learning rate scheduling, etc.). This should result in either a better-performing model, or a faster model with about the same performance as the baseline.
Fine tuning
There are several ways to fine tune a Paraboloid or ParaConv2d layer in order to maximize the benefit:

- Try training for more epochs.
- Use different values for the lr_factor argument.
- Use different values for the wd_factor argument.
- Use different values for the pfactor argument.
- As a last resort, try a different type of initialization.
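Since each change should be evaluated on its own, these arguments are best searched one at a time. A minimal sketch of such a sweep follows; train_and_evaluate is a stand-in stub for your real training run (here a toy function peaking at 1.0), and the candidate values are arbitrary.

```python
def train_and_evaluate(lr_factor):
    # Stub for a full training run; returns a validation score.
    # This toy function has its maximum at lr_factor = 1.0.
    return 0.9 - 0.05 * (lr_factor - 1.0) ** 2

# Arbitrary candidate values for one argument, tried one at a time.
candidates = [0.25, 0.5, 1.0, 2.0, 4.0]
best = max(candidates, key=train_and_evaluate)
print(best)  # prints 1.0
```

The same loop applies to wd_factor and pfactor: fix everything else, sweep one argument, keep the best value, then move to the next argument.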
Evaluation
Compare the performance or speed of the new model with the baseline. If it is better than the baseline, congratulations on your new model. Use this model as the new baseline and repeat the process, replacing another layer or adding a new layer to a different place.
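The accept-and-repeat process above can be sketched as a simple loop. All accuracy numbers and change names below are made-up placeholders, not measured results.

```python
baseline_acc = 0.91  # accuracy of the current baseline model

# Accuracy after each candidate change, each measured in isolation.
candidate_accs = {
    "ParaConv2d first conv": 0.92,
    "Paraboloid before output": 0.915,
    "halve hidden layer": 0.90,
}

best_baseline = baseline_acc
for change, acc in candidate_accs.items():
    if acc > best_baseline:
        # Keep the change: it becomes the new baseline for the next round.
        best_baseline = acc
        print(f"accepted {change}: new baseline {acc}")
    else:
        print(f"rejected {change}")
```

Only changes that beat the current baseline survive; each accepted change becomes the starting point for the next replacement or addition.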