How to use
Preparation
Select the model architecture whose performance you want to improve, and use an already trained model as the baseline. Start incorporating the custom ops of the GeoND library into the model architecture. This can mean replacing a Linear layer with a Paraboloid layer, replacing a Conv2d layer with a ParaConv2d layer, or inserting a Paraboloid layer before a Linear layer. Avoid making more than one change at a time, as it will otherwise be difficult to determine how beneficial each individual change is.
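GeoND's exact neuron equation is not reproduced here, so the following is only a rough conceptual sketch of the difference between the two layer types: a linear neuron responds along a hyperplane, while a paraboloid neuron responds along a curved quadratic surface. The center/scale parametrization below is an assumption for illustration, not the library's actual formulation.

```python
def linear_neuron(x, w, b):
    # Standard linear pre-activation: w . x + b (a hyperplane over the input).
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def paraboloid_neuron(x, center, scale, b):
    # Hypothetical quadratic pre-activation: a paraboloid over the input,
    # parametrized here (as an assumption) by a center point and a scale.
    return scale * sum((xi - ci) ** 2 for xi, ci in zip(x, center)) + b

x = [0.5, -0.25]
print(linear_neuron(x, [1.0, 2.0], 0.1))           # hyperplane response
print(paraboloid_neuron(x, [0.0, 0.0], 1.0, 0.1))  # curved response
```

The point of the sketch is only that a paraboloid neuron can carve out closed regions of the input space, which a single linear neuron cannot.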
Replacing layers
If the objective is improved performance, use the same number of neurons as the layer that was replaced. If the objective is improved speed, use fewer neurons, e.g., half the neurons of the replaced layer. Remember to also make the corresponding reduction in all other layers that receive input from the replaced layer; this is where the increase in speed comes from. Avoid replacing the output layer with a Paraboloid layer. We recommend replacing the first convolutional layer with a ParaConv2d layer.
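To see why shrinking a replaced layer must be propagated to its consumers, it helps to count parameters. The arithmetic below uses standard fully connected layer sizes (weights plus biases) and made-up layer widths; it is not specific to GeoND.

```python
def linear_params(n_in, n_out):
    # A fully connected layer has an n_in x n_out weight matrix plus n_out biases.
    return n_in * n_out + n_out

# Baseline: 512 -> 256 -> 10
baseline = linear_params(512, 256) + linear_params(256, 10)

# Halve the 256-neuron layer to 128 neurons. The following layer's input
# shrinks too, which is where most of the speedup comes from.
reduced = linear_params(512, 128) + linear_params(128, 10)

print(baseline, reduced)  # prints 133898 66954
```

Halving one hidden layer roughly halves the cost of both that layer and every layer reading from it.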
Adding layers
We recommend first trying to add a Paraboloid layer before the output layer. Additions in other places may not significantly affect performance, so leave them for last.
Stability
Since the paraboloid neuron computation involves squaring, it is vitally important that the input values do not exceed the range \([-0.01,0.01]\). Try to determine the maximum absolute value of any possible input, for example maxabs, and set input_scale=0.01/maxabs when using a Paraboloid or ParaConv2d layer. It is also important to make sure that the output of the layer does not include extreme values, so use the factor argument to keep the output in a reasonable range. Failing to keep the magnitude of these numbers in check can result in a diverging error function.
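The input_scale recommendation above is a one-line computation. The sample values here are made up; in practice you would scan your actual dataset (or its worst-case bounds) for the maximum absolute value.

```python
# Made-up sample of raw input values; scan your real data in practice.
samples = [3.2, -7.5, 0.4, 6.1]

maxabs = max(abs(v) for v in samples)  # maximum absolute input value
input_scale = 0.01 / maxabs           # maps all inputs into [-0.01, 0.01]

scaled = [v * input_scale for v in samples]
assert all(-0.01 <= v <= 0.01 for v in scaled)
print(input_scale)
```

With this scale, the largest-magnitude input lands exactly on the edge of the safe range and everything else falls inside it.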
Training
Make sure you are using the GeoNDSGD optimizer. Follow the same training process as the baseline (dataset, input transformations, learning rate scheduling, etc.). This should result in either a better-performing model, or a faster model with about the same performance as the baseline.
Fine tuning
There are several ways to fine tune a Paraboloid or ParaConv2d layer in order to maximize the benefit:

- Try training for more epochs.
- Use different values for the lr_factor argument.
- Use different values for the wd_factor argument.
- Use different values for the pfactor argument.
- As a last resort, try a different type of initialization.
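Since each change should be evaluated on its own, these arguments are best searched one at a time. A minimal sketch of such a sweep follows; train_and_evaluate is a stand-in stub for your real training run (here a toy function peaking at 1.0), and the candidate values are arbitrary.

```python
def train_and_evaluate(lr_factor):
    # Stub for a full training run; returns a validation score.
    # This toy function has its maximum at lr_factor = 1.0.
    return 0.9 - 0.05 * (lr_factor - 1.0) ** 2

# Arbitrary candidate values for one argument, tried one at a time.
candidates = [0.25, 0.5, 1.0, 2.0, 4.0]
best = max(candidates, key=train_and_evaluate)
print(best)  # prints 1.0
```

The same loop applies to wd_factor and pfactor: fix everything else, sweep one argument, keep the best value, then move to the next argument.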
Evaluation
Compare the performance or speed of the new model with the baseline. If it is better than the baseline, congratulations on your new model. Use this model as the new baseline and repeat the process, replacing another layer or adding a new layer to a different place.
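The accept-and-repeat process above can be sketched as a simple loop. All accuracy numbers and change names below are made-up placeholders, not measured results.

```python
baseline_acc = 0.91  # accuracy of the current baseline model

# Accuracy after each candidate change, each measured in isolation.
candidate_accs = {
    "ParaConv2d first conv": 0.92,
    "Paraboloid before output": 0.915,
    "halve hidden layer": 0.90,
}

best_baseline = baseline_acc
for change, acc in candidate_accs.items():
    if acc > best_baseline:
        # Keep the change: it becomes the new baseline for the next round.
        best_baseline = acc
        print(f"accepted {change}: new baseline {acc}")
    else:
        print(f"rejected {change}")
```

Only changes that beat the current baseline survive; each accepted change becomes the starting point for the next replacement or addition.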