geondpt docs

class geondpt.Paraboloid(input_features, output_features, bias=True, device=None, dtype=None, output_factor=0.1, input_factor=0.01, lr_factor=100.0, wd_factor=1.0, init='live', h_factor=0.01, p_factor=0.0001, grad_factor=1.0, init_from_numpy=None)

Bases: Module

Passes the incoming data through a layer of paraboloid neurons.

Args:

input_features
Size of each input sample.

output_features
Size of each output sample.

bias
Accepted only so that a Paraboloid layer can be used as a drop-in replacement for a Linear layer; it has no effect. Default: True.

output_factor
Multiplies the output of the module. Default: 0.1.

input_factor
Multiplies the input before passing it through the layer. Default: 0.01.

lr_factor
Multiplies the learning rate applied to the parameters by the optimizer. Default: 100.0.

wd_factor
Multiplies the weight decay applied to the parameters by the optimizer. Default: 1.0.

init
Selects the initialization method for the parameters. Valid options are 'spotlight', 'live', 'linear'. Default: 'live'.

h_factor
Affects the 'spotlight' and 'live' initializations. Multiplies the magnitude of the directrix vector. Default: 0.01.

p_factor
Affects the 'spotlight' and 'live' initializations. Determines the offset of the focus from the data subspace. Default: 0.0001.

grad_factor
Multiplies the outgoing delta signal. Default: 1.0.

init_from_numpy
Initializes the parameter tensor directly from a NumPy array. Default: None.

Shape:

Input: \((*, H_{in})\) where \(*\) means any number of dimensions including none and \(H_{in} = \text{input_features}\).

Output: \((*, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{output_features}\).

Example:

>>> import torch
>>> import geondpt as gd
>>> pb = gd.Paraboloid(20, 30)
>>> input = torch.randn(128, 20)
>>> output = pb(input)
>>> print(output.size())
torch.Size([128, 30])
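
The shape rule above can be sketched in plain Python without importing the library. The helper name paraboloid_output_shape is hypothetical and not part of geondpt; it only mirrors the documented rule that every leading dimension is preserved and the last one changes from input_features to output_features.

```python
def paraboloid_output_shape(input_shape, input_features, output_features):
    """Return the documented output shape for an input of shape (*, H_in).

    Hypothetical helper for illustration; it is not part of geondpt.
    """
    if len(input_shape) == 0 or input_shape[-1] != input_features:
        raise ValueError(
            f"last dimension must be {input_features}, got {tuple(input_shape)}"
        )
    # All leading dimensions pass through unchanged; only the last one changes.
    return tuple(input_shape[:-1]) + (output_features,)


print(paraboloid_output_shape((128, 20), 20, 30))   # (128, 30)
print(paraboloid_output_shape((4, 7, 20), 20, 30))  # (4, 7, 30)
print(paraboloid_output_shape((20,), 20, 30))       # (30,)
```

The last call covers the "including none" case, where the input has no leading dimensions at all.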

class geondpt.ParaConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, bias=True, padding_mode='constant', device=None, dtype=None, output_factor=1.0, input_factor=1.0, lr_factor=100.0, wd_factor=0.1, skip_input_grad=False, init='spotlight', h_factor=0.01, p_factor=0.0001, grad_factor=1.0, init_from_numpy=None)

Bases: Module

Applies a 2D convolution over an input signal composed of several input planes using the paraboloid neuron computation.

The arguments kernel_size, stride, padding, dilation can either be:

a single int – in which case the same value is used for the height and width dimension.

a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

This module currently does not support grouping.

Args:

in_channels
Number of channels in the input image.

out_channels
Number of channels produced by the convolution.

kernel_size
Size of the convolving kernel.

stride
Stride of the convolution. Default: 1.

padding
Padding added to all four sides of the input. Default: 0.

dilation
Spacing between kernel elements. Default: 1.

bias
Accepted only so that a ParaConv2d layer can be used as a drop-in replacement for a Conv2d layer; it has no effect. Default: True.

padding_mode
Same as torch.nn.functional.pad from PyTorch. Default: 'constant'.

output_factor
Multiplies the output of the module. Default: 1.0.

input_factor
Multiplies the input before passing it through the layer. Default: 1.0.

lr_factor
Multiplies the learning rate applied to the parameters by the optimizer. Default: 100.0.

wd_factor
Multiplies the weight decay applied to the parameters by the optimizer. Default: 0.1.

skip_input_grad
If True, skips the computation of the delta signal (the gradient with respect to the input). Set this only for the very first layer of the network. Default: False.

init
Selects the initialization method for the parameters. Valid options are 'spotlight', 'linear'. Default: 'spotlight'.

h_factor
Affects the 'spotlight' and 'live' initializations. Multiplies the magnitude of the directrix vector. Default: 0.01.

p_factor
Affects the 'spotlight' and 'live' initializations. Determines the offset of the focus from the data subspace. Default: 0.0001.

grad_factor
Multiplies the outgoing delta signal. Default: 1.0.

init_from_numpy
Initializes the parameter tensor directly from a NumPy array. Default: None.

Shape:

Input: \((N, C_{in}, H_{in}, W_{in})\)

Output: \((N, C_{out}, H_{out}, W_{out})\), where

\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]

\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]

Example:

>>> pb = gd.ParaConv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = pb(input)
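
The output size of the example above can be checked by evaluating the documented formulas for \(H_{out}\) and \(W_{out}\) by hand. The sketch below does this in plain Python; the helper names _pair and para_conv2d_output_size are hypothetical and not part of geondpt.

```python
def _pair(value):
    """Expand a single int to (value, value); pass a 2-tuple through."""
    if isinstance(value, int):
        return (value, value)
    return tuple(value)


def para_conv2d_output_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    """Compute (H_out, W_out) from (H_in, W_in) using the documented formula.

    Hypothetical helper for illustration; it is not part of geondpt.
    """
    kernel_size, stride = _pair(kernel_size), _pair(stride)
    padding, dilation = _pair(padding), _pair(dilation)
    return tuple(
        # Integer floor division implements the floor in the formula.
        (in_size[i] + 2 * padding[i] - dilation[i] * (kernel_size[i] - 1) - 1)
        // stride[i]
        + 1
        for i in range(2)
    )


# Same settings as the ParaConv2d example: input (20, 16, 50, 100), 33 output channels.
h_out, w_out = para_conv2d_output_size(
    (50, 100), kernel_size=(3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)
)
print((20, 33, h_out, w_out))  # (20, 33, 26, 100)
```

For the height, the numerator is 50 + 8 - 6 - 1 = 51, so the result is floor(51 / 2) + 1 = 26. For the width, it is 100 + 4 - 4 - 1 = 99, so the result is 99 + 1 = 100.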

class geondpt.ParaboloidOutput(input_features, output_features, bias=True, device=None, dtype=None, output_factor=0.1, input_factor=0.5, lr_factor=1.0, wd_factor=1.0, init='spotlight', h_factor=0.01, p_factor=0.0001, grad_factor=1.0, init_from_numpy=None)

Bases: Module

Passes the incoming data through a layer of paraboloid neurons. Same as Paraboloid, but configured to be used as the output layer. Use with weight decay but without momentum.

Args:

input_features
Size of each input sample.

output_features
Size of each output sample.

bias
Accepted only so that a ParaboloidOutput layer can be used as a drop-in replacement for a Linear layer; it has no effect. Default: True.

output_factor
Multiplies the output of the module. Default: 0.1.

input_factor
Multiplies the input before passing it through the layer. Default: 0.5.

lr_factor
Multiplies the learning rate applied to the parameters by the optimizer. Default: 1.0.

wd_factor
Multiplies the weight decay applied to the parameters by the optimizer. Default: 1.0.

init
Selects the initialization method for the parameters. Valid options are 'spotlight', 'live', 'linear'. Default: 'spotlight'.

h_factor
Affects the 'spotlight' and 'live' initializations. Multiplies the magnitude of the directrix vector. Default: 0.01.

p_factor
Affects the 'spotlight' and 'live' initializations. Determines the offset of the focus from the data subspace. Default: 0.0001.

grad_factor
Multiplies the outgoing delta signal. Default: 1.0.

init_from_numpy
Initializes the parameter tensor directly from a NumPy array. Default: None.

Shape:

Input: \((*, H_{in})\) where \(*\) means any number of dimensions including none and \(H_{in} = \text{input_features}\).

Output: \((*, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{output_features}\).

Example:

>>> import torch
>>> import geondpt as gd
>>> pb = gd.ParaboloidOutput(20, 30)
>>> input = torch.randn(128, 20)
>>> output = pb(input)
>>> print(output.size())
torch.Size([128, 30])
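
The three layer classes ship with different lr_factor and wd_factor defaults. The sketch below tabulates what those defaults would imply for a given optimizer setting, under the assumption that each factor is a plain multiplier on the optimizer-level value, as the parameter descriptions suggest. The function effective_hyperparameters is hypothetical and the library's internal handling may differ.

```python
# Documented per-layer defaults, copied from the class signatures.
LAYER_DEFAULTS = {
    "Paraboloid": {"lr_factor": 100.0, "wd_factor": 1.0},
    "ParaConv2d": {"lr_factor": 100.0, "wd_factor": 0.1},
    "ParaboloidOutput": {"lr_factor": 1.0, "wd_factor": 1.0},
}


def effective_hyperparameters(base_lr, base_weight_decay, lr_factor, wd_factor):
    """Assumed rule: each factor simply scales the optimizer-level value."""
    return base_lr * lr_factor, base_weight_decay * wd_factor


# Example optimizer settings: lr=0.001, weight_decay=5e-4.
for name, factors in LAYER_DEFAULTS.items():
    lr, wd = effective_hyperparameters(0.001, 5e-4, **factors)
    print(f"{name}: lr={lr:g}, weight_decay={wd:g}")
```

Under this reading, a hidden Paraboloid layer trains with a step size 100 times larger than a ParaboloidOutput layer attached to the same optimizer.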

class geondpt.GeoNDSGD(params, lr=0.001, momentum=0, dampening=0, weight_decay=0, nesterov=False, *, maximize=False)

Bases: Optimizer

Implements stochastic gradient descent that properly handles weight decay for models that include paraboloid neurons. Some arguments of torch.optim.SGD, such as foreach, are omitted because they do not appear to work properly at the moment or are not implemented yet. Otherwise the same as torch.optim.SGD.

Args:

params
Iterable of parameters to optimize or dicts defining parameter groups.

lr
Learning rate. Default: 0.001.

momentum
Momentum factor. Default: 0.

dampening
Dampening for momentum. Default: 0.

weight_decay
Weight decay. Default: 0.

nesterov
Enables Nesterov momentum. Default: False.

maximize
Maximize the objective with respect to the params, instead of minimizing. Default: False.

Example:

>>> optimizer = gd.GeoNDSGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4, nesterov=True)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
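
Because GeoNDSGD is documented as otherwise matching torch.optim.SGD, the roles of momentum, dampening, weight_decay, nesterov and maximize can be illustrated with the standard SGD update for a single scalar parameter. This is a sketch of the torch.optim.SGD rule only; it does not reproduce the paraboloid-specific weight decay handling, which is not described here. The function sgd_step is hypothetical.

```python
def sgd_step(param, grad, buf, lr=0.001, momentum=0.0, dampening=0.0,
             weight_decay=0.0, nesterov=False, maximize=False):
    """One scalar update following the standard torch.optim.SGD rule.

    Returns the new parameter and the new momentum buffer (None if unused).
    Hypothetical helper; not geondpt's paraboloid-aware implementation.
    """
    if maximize:
        grad = -grad
    if weight_decay != 0:
        grad = grad + weight_decay * param  # L2-style weight decay
    if momentum != 0:
        if buf is None:
            buf = grad  # first step: the buffer starts at the gradient
        else:
            buf = momentum * buf + (1 - dampening) * grad
        grad = grad + momentum * buf if nesterov else buf
    return param - lr * grad, buf


# Minimize f(p) = p**2, whose gradient is 2 * p.
p, buf = 1.0, None
for step in range(3):
    p, buf = sgd_step(p, 2 * p, buf, lr=0.1, momentum=0.9,
                      weight_decay=5e-4, nesterov=True)
    print(step, p)
```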

class geondpt.GeoNDAdam(params: Iterable[Tensor] | Iterable[dict[str, Any]] | Iterable[tuple[str, Tensor]], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor, float | Tensor] = (0.9, 0.999, 0.9), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, *, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None, decoupled_weight_decay: bool = False)

Bases: Optimizer

Implements the Adam optimizer with changes to properly handle paraboloid parameters. Some arguments of torch.optim.Adam, such as foreach, are omitted because they do not appear to work properly at the moment or are not implemented yet. Otherwise the same as torch.optim.Adam.

Args:

params: Iterable of parameters to optimize or dicts defining parameter groups.
lr: Learning rate. Default: 1e-3.
betas: Coefficients used for computing running averages of the gradient and its square. The values correspond to (beta1, beta2, paraboloid_beta2). Default: (0.9, 0.999, 0.9).
eps: Term added to the denominator to improve numerical stability. Default: 1e-8.
weight_decay: Weight decay. Default: 0.
amsgrad: Whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond". Default: False.
maximize: Maximize the objective with respect to the params, instead of minimizing. Default: False.
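
The third entry of betas is specific to this optimizer. The sketch below assumes that paraboloid_beta2 takes the place of beta2 in the running average of squared gradients for paraboloid parameters; that reading is an assumption, not something stated above. It uses the standard bias-corrected Adam average to show how the two default values react to a sudden change in gradient scale. The function second_moment_trace is hypothetical.

```python
def second_moment_trace(grads, beta2):
    """Bias-corrected running average of squared gradients, as in Adam."""
    v = 0.0
    trace = []
    for t, g in enumerate(grads, start=1):
        v = beta2 * v + (1 - beta2) * g * g
        trace.append(v / (1 - beta2 ** t))  # bias correction
    return trace


# Gradient magnitude jumps from 1 to 10 halfway through.
grads = [1.0] * 20 + [10.0] * 20
slow = second_moment_trace(grads, beta2=0.999)  # default betas[1]
fast = second_moment_trace(grads, beta2=0.9)    # default betas[2], assumed role
print(f"beta2=0.999 -> {slow[-1]:.1f}, beta2=0.9 -> {fast[-1]:.1f}")
```

With the smaller coefficient the estimate moves much closer to the new squared-gradient level of 100 within 20 steps, so the per-parameter step size adapts faster.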

class geondpt.GeoNDAdamW(params: Iterable[Tensor] | Iterable[dict[str, Any]] | Iterable[tuple[str, Tensor]], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999, 0.9), eps: float = 1e-08, weight_decay: float = 0.01, amsgrad: bool = False, *, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None)

Bases: GeoNDAdam

Implements the AdamW optimizer with changes to properly handle paraboloid parameters. Some arguments of torch.optim.AdamW, such as foreach, are omitted because they do not appear to work properly at the moment or are not implemented yet. Otherwise the same as torch.optim.AdamW.

Args:

params: Iterable of parameters to optimize or dicts defining parameter groups.
lr: Learning rate. Default: 1e-3.
betas: Coefficients used for computing running averages of the gradient and its square. The values correspond to (beta1, beta2, paraboloid_beta2). Default: (0.9, 0.999, 0.9).
eps: Term added to the denominator to improve numerical stability. Default: 1e-8.
weight_decay: Weight decay. Default: 0.01.
amsgrad: Whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond". Default: False.
maximize: Maximize the objective with respect to the params, instead of minimizing. Default: False.
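
The practical difference between Adam-style and AdamW-style weight decay can be sketched with the standard update for one scalar parameter. The code follows the usual Adam and AdamW rules with a two-value betas tuple; it does not model the paraboloid-specific changes or the third beta, and the function adam_step is hypothetical.

```python
import math


def adam_step(p, g, m, v, t, lr=0.001, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One scalar Adam (decoupled=False) or AdamW (decoupled=True) update.

    Hypothetical helper; not geondpt's paraboloid-aware implementation.
    """
    beta1, beta2 = betas
    if weight_decay != 0:
        if decoupled:
            p = p * (1 - lr * weight_decay)  # AdamW: shrink the weight directly
        else:
            g = g + weight_decay * p         # Adam: fold decay into the gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)             # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)             # bias-corrected second moment
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


# With a zero loss gradient, only the weight decay term moves the parameter.
p_adam, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.01)
p_adamw, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.01, decoupled=True)
print(p_adam, p_adamw)
```

In the coupled case the decay term is normalized by the second-moment estimate, so the first step has size close to lr regardless of weight_decay. In the decoupled case the parameter shrinks by exactly the factor 1 - lr * weight_decay.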