geondpt docs
- class geondpt.Paraboloid(input_features, output_features, bias=True, device=None, dtype=None, output_factor=0.1, input_factor=0.01, lr_factor=100.0, wd_factor=1.0, init='live', h_factor=0.01, p_factor=0.0001, grad_factor=1.0, init_from_numpy=None)
Bases: Module
Passes the incoming data through a layer of paraboloid neurons.
Args:
- input_features
Size of each input sample.
- output_features
Size of each output sample.
- bias
Has no effect. It is accepted only so that Paraboloid can replace Linear layers without changing their call sites. Default: True.
- output_factor
Multiplies the output of the module. Default: 0.1.
- input_factor
Multiplies the input before it is passed through the layer. Default: 0.01.
- lr_factor
Multiplies the learning rate that the optimizer applies to the layer's parameters. Default: 100.0.
- wd_factor
Multiplies the weight decay that the optimizer applies to the layer's parameters. Default: 1.0.
- init
Selects the initialization method for the parameters. Valid options are 'spotlight', 'live', and 'linear'. Default: 'live'.
- h_factor
Affects the 'spotlight' and 'live' initializations. Multiplies the magnitude of the directrix vector. Default: 0.01.
- p_factor
Affects the 'spotlight' and 'live' initializations. Determines the offset of the focus from the data subspace. Default: 0.0001.
- grad_factor
Multiplies the outgoing delta signal, i.e. the gradient propagated back to the preceding layer. Default: 1.0.
- init_from_numpy
Initializes the parameter tensor directly from a NumPy array. Default: None.
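Read literally, the lr_factor and wd_factor descriptions mean that the step size and weight decay applied to a Paraboloid layer's parameters are the optimizer's base values multiplied by these factors. The sketch below works through that arithmetic with the Paraboloid defaults. It is a hypothetical helper, not part of geondpt, and it assumes a purely multiplicative combination, as the Args above describe.

```python
def effective_hparams(base_lr, base_weight_decay, lr_factor=100.0, wd_factor=1.0):
    """Return the (learning rate, weight decay) seen by a layer's parameters.

    Hypothetical helper: assumes the optimizer simply multiplies its base
    values by the layer's lr_factor and wd_factor, as the Args describe.
    The keyword defaults are the Paraboloid defaults.
    """
    return base_lr * lr_factor, base_weight_decay * wd_factor


# With a base learning rate of 0.001 and lr_factor=100.0, the paraboloid
# parameters are updated as if the learning rate were 0.1.
lr, wd = effective_hparams(0.001, 5e-4)
print(f"lr={lr:g}, weight_decay={wd:g}")

# A layer with lr_factor=1.0 (e.g. ParaboloidOutput's default) keeps the base rate.
print(effective_hparams(0.001, 5e-4, lr_factor=1.0))
```

The large default lr_factor is the point to keep in mind when choosing a base learning rate for a network that mixes paraboloid and ordinary layers.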
Shape:
Input: \((*, H_{in})\), where \(*\) means any number of dimensions, including none, and \(H_{in} = \text{input\_features}\).
Output: \((*, H_{out})\), where all but the last dimension have the same shape as the input and \(H_{out} = \text{output\_features}\).
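The Shape rule only rewrites the last dimension. A small hypothetical helper (not part of geondpt) that applies it to a shape tuple reproduces the torch.Size([128, 30]) result of the example that follows, and also covers the "no leading dimensions" case.

```python
def paraboloid_output_shape(input_shape, input_features, output_features):
    """Apply the documented Shape rule: (*, H_in) -> (*, H_out).

    Hypothetical helper for illustration; it only checks and rewrites shapes.
    """
    shape = tuple(input_shape)
    if not shape or shape[-1] != input_features:
        raise ValueError(
            f"last dimension must equal input_features={input_features}, got shape {shape}"
        )
    # All leading dimensions (*) pass through unchanged.
    return shape[:-1] + (output_features,)


print(paraboloid_output_shape((128, 20), 20, 30))   # (128, 30): a batch of 128 vectors
print(paraboloid_output_shape((4, 7, 20), 20, 30))  # (4, 7, 30): extra leading dimensions
print(paraboloid_output_shape((20,), 20, 30))       # (30,): no leading dimensions
```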
Example:
>>> import torch
>>> import geondpt as gd
>>> pb = gd.Paraboloid(20, 30)
>>> input = torch.randn(128, 20)
>>> output = pb(input)
>>> print(output.size())
torch.Size([128, 30])
- class geondpt.ParaConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, bias=True, padding_mode='constant', device=None, dtype=None, output_factor=1.0, input_factor=1.0, lr_factor=100.0, wd_factor=0.1, skip_input_grad=False, init='spotlight', h_factor=0.01, p_factor=0.0001, grad_factor=1.0, init_from_numpy=None)
Bases: Module
Applies a 2D convolution over an input signal composed of several input planes, using the paraboloid neuron computation.
The arguments kernel_size, stride, padding, and dilation can each be either:
- a single int, in which case the same value is used for the height and width dimensions; or
- a tuple of two ints, in which case the first int is used for the height dimension and the second int for the width dimension.
This module does not currently support grouped convolutions.
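Accepting either an int or a pair is a common PyTorch convention (torch.nn.Conv2d behaves the same way). The sketch below shows one way to normalize such an argument to a (height, width) tuple. The helper name to_pair is hypothetical, and this is not claimed to be how ParaConv2d handles its arguments internally.

```python
def to_pair(value, name="argument"):
    """Normalize an int or a tuple of two ints to a (height, width) tuple."""
    if isinstance(value, int):
        # A single int applies to both height and width.
        return (value, value)
    if isinstance(value, tuple) and len(value) == 2 and all(isinstance(v, int) for v in value):
        # A pair is taken as (height, width).
        return value
    raise TypeError(f"{name} must be an int or a tuple of two ints, got {value!r}")


print(to_pair(3, "kernel_size"))       # (3, 3)
print(to_pair((3, 5), "kernel_size"))  # (3, 5)
```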
Args:
- in_channels
Number of channels in the input image.
- out_channels
Number of channels produced by the convolution.
- kernel_size
Size of the convolving kernel.
- stride
Stride of the convolution. Default: 1.
- padding
Padding added to all four sides of the input. Default: 0.
- dilation
Spacing between kernel elements. Default: 1.
- bias
Has no effect. It is accepted only so that ParaConv2d can replace Conv2d layers without changing their call sites. Default: True.
- padding_mode
Padding mode, with the same meaning as the mode argument of torch.nn.functional.pad. Default: 'constant'.
- output_factor
Multiplies the output of the module. Default: 1.0.
- input_factor
Multiplies the input before it is passed through the layer. Default: 1.0.
- lr_factor
Multiplies the learning rate that the optimizer applies to the layer's parameters. Default: 100.0.
- wd_factor
Multiplies the weight decay that the optimizer applies to the layer's parameters. Default: 0.1.
- skip_input_grad
If True, skips computing the delta signal (the gradient with respect to the layer's input). Set this only for the very first layer of the network. Default: False.
- init
Selects the initialization method for the parameters. Valid options are 'spotlight' and 'linear'. Default: 'spotlight'.
- h_factor
Affects the 'spotlight' and 'live' initializations. Multiplies the magnitude of the directrix vector. Default: 0.01.
- p_factor
Affects the 'spotlight' and 'live' initializations. Determines the offset of the focus from the data subspace. Default: 0.0001.
- grad_factor
Multiplies the outgoing delta signal, i.e. the gradient propagated back to the preceding layer. Default: 1.0.
- init_from_numpy
Initializes the parameter tensor directly from a NumPy array. Default: None.
Shape:
Input: \((N, C_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, H_{out}, W_{out})\), where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]
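As a worked check of the formula, the hypothetical plain-Python helper below evaluates H_out and W_out. It uses integer floor division because, for an integer numerator a and a positive integer stride s, floor(a / s + 1) equals a // s + 1. For the arguments of the example that follows, with (H_in, W_in) = (50, 100), the formula gives (26, 100). Assuming ParaConv2d follows the documented formula, that example's output should therefore have shape (20, 33, 26, 100).

```python
def conv2d_output_hw(h_in, w_in, kernel_size, stride=1, padding=0, dilation=1):
    """Evaluate the documented H_out / W_out formula for a 2D convolution."""

    def pair(v):
        # int -> same value for height and width; otherwise (height, width).
        return (v, v) if isinstance(v, int) else tuple(v)

    k, s, p, d = (pair(v) for v in (kernel_size, stride, padding, dilation))
    # floor(a / s + 1) == a // s + 1 for integer a and positive integer s.
    h_out = (h_in + 2 * p[0] - d[0] * (k[0] - 1) - 1) // s[0] + 1
    w_out = (w_in + 2 * p[1] - d[1] * (k[1] - 1) - 1) // s[1] + 1
    return h_out, w_out


# Arguments from the ParaConv2d example: 50 x 100 input, kernel (3, 5),
# stride (2, 1), padding (4, 2), dilation (3, 1).
print(conv2d_output_hw(50, 100, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)))
# (26, 100)
```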
Example:
>>> import torch
>>> import geondpt as gd
>>> pb = gd.ParaConv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = pb(input)
- class geondpt.ParaboloidOutput(input_features, output_features, bias=True, device=None, dtype=None, output_factor=0.1, input_factor=0.5, lr_factor=1.0, wd_factor=1.0, init='spotlight', h_factor=0.01, p_factor=0.0001, grad_factor=1.0, init_from_numpy=None)
Bases: Module
Passes the incoming data through a layer of paraboloid neurons. Same as Paraboloid, but with defaults configured for use as the output layer. Use it with weight decay but without momentum.
Args:
- input_features
Size of each input sample.
- output_features
Size of each output sample.
- bias
Has no effect. It is accepted only so that ParaboloidOutput can replace Linear layers without changing their call sites. Default: True.
- output_factor
Multiplies the output of the module. Default: 0.1.
- input_factor
Multiplies the input before it is passed through the layer. Default: 0.5.
- lr_factor
Multiplies the learning rate that the optimizer applies to the layer's parameters. Default: 1.0.
- wd_factor
Multiplies the weight decay that the optimizer applies to the layer's parameters. Default: 1.0.
- init
Selects the initialization method for the parameters. Valid options are 'spotlight', 'live', and 'linear'. Default: 'spotlight'.
- h_factor
Affects the 'spotlight' and 'live' initializations. Multiplies the magnitude of the directrix vector. Default: 0.01.
- p_factor
Affects the 'spotlight' and 'live' initializations. Determines the offset of the focus from the data subspace. Default: 0.0001.
- grad_factor
Multiplies the outgoing delta signal, i.e. the gradient propagated back to the preceding layer. Default: 1.0.
- init_from_numpy
Initializes the parameter tensor directly from a NumPy array. Default: None.
Shape:
Input: \((*, H_{in})\), where \(*\) means any number of dimensions, including none, and \(H_{in} = \text{input\_features}\).
Output: \((*, H_{out})\), where all but the last dimension have the same shape as the input and \(H_{out} = \text{output\_features}\).
Example:
>>> import torch
>>> import geondpt as gd
>>> pb = gd.ParaboloidOutput(20, 30)
>>> input = torch.randn(128, 20)
>>> output = pb(input)
>>> print(output.size())
torch.Size([128, 30])
- class geondpt.GeoNDSGD(params, lr=0.001, momentum=0, dampening=0, weight_decay=0, nesterov=False, *, maximize=False)
Bases: Optimizer
Implements stochastic gradient descent that correctly handles weight decay for models containing paraboloid neurons. Some arguments of torch.optim.SGD, such as foreach, are not available, either because they do not appear to work correctly at the moment or because they are not implemented yet. Otherwise, it is the same as torch.optim.SGD.
Args:
- params
Iterable of parameters to optimize, or dicts defining parameter groups.
- lr
Learning rate. Default: 0.001.
- momentum
Momentum factor. Default: 0.
- dampening
Dampening for momentum. Default: 0.
- weight_decay
Weight decay. Default: 0.
- nesterov
Enables Nesterov momentum. Default: False.
- maximize
Maximize the objective with respect to the params, instead of minimizing it. Default: False.
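Since GeoNDSGD is described as otherwise the same as torch.optim.SGD, the scalar sketch below reproduces the standard torch.optim.SGD update (weight decay, momentum, dampening, Nesterov) for reference. It does not model the paraboloid-specific weight-decay handling, which is not documented here, and it omits maximize. The name sgd_step is hypothetical.

```python
def sgd_step(p, grad, buf, lr, momentum=0.0, dampening=0.0,
             weight_decay=0.0, nesterov=False):
    """One torch.optim.SGD-style update of a scalar parameter.

    Returns the new parameter value and the momentum buffer (None until the
    first step with momentum != 0). Paraboloid-specific handling is not modeled.
    """
    if weight_decay != 0:
        # Plain SGD folds an L2 penalty into the gradient.
        grad = grad + weight_decay * p
    if momentum != 0:
        if buf is None:
            buf = grad  # the first step initializes the buffer to the gradient
        else:
            buf = momentum * buf + (1 - dampening) * grad
        # Nesterov looks ahead along the buffer; otherwise step along the buffer.
        grad = grad + momentum * buf if nesterov else buf
    return p - lr * grad, buf


p, buf = 1.0, None
for _ in range(2):
    p, buf = sgd_step(p, grad=1.0, buf=buf, lr=0.1, momentum=0.9)
print(p, buf)  # roughly 0.71 and 1.9
```

With a constant gradient of 1.0, the buffer grows from 1.0 to 1.9 over two steps, so the second step (0.19) is almost twice the first (0.1). This accumulation is the effect that ParaboloidOutput's "without momentum" advice avoids.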
Example:
>>> import geondpt as gd
>>> optimizer = gd.GeoNDSGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4, nesterov=True)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
- class geondpt.GeoNDAdam(params: Iterable[Tensor] | Iterable[dict[str, Any]] | Iterable[tuple[str, Tensor]], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor, float | Tensor] = (0.9, 0.999, 0.9), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, *, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None, decoupled_weight_decay: bool = False)
Bases: Optimizer
Implements the Adam optimizer, with changes to correctly handle paraboloid parameters. Some arguments of torch.optim.Adam, such as foreach, are not available, either because they do not appear to work correctly at the moment or because they are not implemented yet. Otherwise, it is the same as torch.optim.Adam.
Args:
- params
Iterable of parameters to optimize, or dicts defining parameter groups.
- lr
Learning rate. Default: 1e-3.
- betas
Coefficients used for computing running averages of the gradient and its square. The values correspond to (beta1, beta2, paraboloid_beta2). Default: (0.9, 0.999, 0.9).
- eps
Term added to the denominator to improve numerical stability. Default: 1e-8.
- weight_decay
Weight decay. Default: 0.
- amsgrad
Whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond". Default: False.
- maximize
Maximize the objective with respect to the params, instead of minimizing it. Default: False.
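For reference, the standard torch.optim.Adam update for an ordinary scalar parameter is sketched below, using only beta1 and beta2, with the optional L2 weight decay and AMSGrad variant. How GeoNDAdam uses the third coefficient, paraboloid_beta2, for paraboloid parameters is not described here, so the sketch does not model it. The name adam_step is hypothetical.

```python
import math


def adam_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=0.0, amsgrad=False):
    """One torch.optim.Adam-style update of a scalar parameter.

    `state` is a dict holding the step count and moment estimates; it is
    updated in place. paraboloid_beta2 is intentionally not modeled.
    """
    beta1, beta2 = betas
    if weight_decay != 0:
        grad = grad + weight_decay * p  # Adam folds an L2 penalty into the gradient
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    # Exponential moving averages of the gradient and of its square.
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad * grad
    v = state["v"]
    if amsgrad:
        # AMSGrad keeps the running maximum of the second-moment estimate.
        state["v_max"] = max(state.get("v_max", 0.0), v)
        v = state["v_max"]
    # Bias correction compensates for the zero-initialized averages.
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)


state = {}
p = adam_step(1.0, 0.5, state)
print(p)  # close to 0.999: the first step moves by about lr, whatever the gradient's size
```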
- class geondpt.GeoNDAdamW(params: Iterable[Tensor] | Iterable[dict[str, Any]] | Iterable[tuple[str, Tensor]], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999, 0.9), eps: float = 1e-08, weight_decay: float = 0.01, amsgrad: bool = False, *, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None)
Bases: GeoNDAdam
Implements the AdamW optimizer, with changes to correctly handle paraboloid parameters. Some arguments of torch.optim.AdamW, such as foreach, are not available, either because they do not appear to work correctly at the moment or because they are not implemented yet. Otherwise, it is the same as torch.optim.AdamW.
Args:
- params
Iterable of parameters to optimize, or dicts defining parameter groups.
- lr
Learning rate. Default: 1e-3.
- betas
Coefficients used for computing running averages of the gradient and its square. The values correspond to (beta1, beta2, paraboloid_beta2). Default: (0.9, 0.999, 0.9).
- eps
Term added to the denominator to improve numerical stability. Default: 1e-8.
- weight_decay
Weight decay. Default: 0.01.
- amsgrad
Whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond". Default: False.
- maximize
Maximize the objective with respect to the params, instead of minimizing it. Default: False.
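Assuming GeoNDAdamW follows torch.optim.AdamW for ordinary parameters, the key difference from GeoNDAdam is that weight decay is decoupled. The parameter is shrunk by a factor of 1 - lr * weight_decay before the Adam step, instead of weight_decay * p being added to the gradient. The scalar sketch below shows this. The name adamw_step is hypothetical, and neither paraboloid_beta2 nor amsgrad is modeled.

```python
import math


def adamw_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
               weight_decay=1e-2):
    """One torch.optim.AdamW-style update of a scalar parameter (sketch).

    `state` is updated in place. Paraboloid-specific handling is not modeled.
    """
    beta1, beta2 = betas
    # Decoupled decay: shrink the parameter directly; the gradient is untouched.
    p = p * (1 - lr * weight_decay)
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)


# With a zero gradient, only the decay acts: 1.0 * (1 - 1e-3 * 1e-2).
print(adamw_step(1.0, 0.0, {}))
```

With a zero gradient, the sketch shrinks the parameter only from 1.0 to 0.99999 under the defaults. If the same decay were instead folded into the gradient, Adam's adaptive normalization would turn it into a step of about lr (to roughly 0.999).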