geondpt docs

class geondpt.Paraboloid(input_features, output_features, bias=True, device=None, dtype=None, output_factor=0.1, input_factor=0.01, lr_factor=100.0, wd_factor=1.0, init='live', h_factor=0.01, p_factor=0.0001, grad_factor=1.0, init_from_numpy=None)

Bases: Module

Passes the incoming data through a layer of paraboloid neurons.

Args:

input_features

Size of each input sample.

output_features

Size of each output sample.

bias

Accepted only so that Paraboloid can serve as a drop-in replacement for Linear layers; it has no effect. Default: True.

output_factor

Multiplies the output of the module. Default: 0.1.

input_factor

Multiplies the input before passing it through the layer. Default: 0.01.

lr_factor

Multiplies the learning rate applied to the parameters by the optimizer. Default: 100.0.

wd_factor

Multiplies the weight decay applied to the parameters by the optimizer. Default: 1.0.

init

Selects the initialization method for the parameters. Valid options are 'spotlight', 'live', 'linear'. Default: 'live'.

h_factor

Affects the 'spotlight' and 'live' initializations. Multiplies the magnitude of the directrix vector. Default: 0.01.

p_factor

Affects the 'spotlight' and 'live' initializations. Determines the offset of the focus from the data subspace. Default: 0.0001.

grad_factor

Multiplies the outgoing delta signal. Default: 1.0.

init_from_numpy

Initializes the parameter tensor directly from a NumPy array. Default: None.
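The lr_factor and wd_factor arguments rescale the optimizer's settings for this layer's parameters. A minimal sketch of the arithmetic, assuming each factor combines with the optimizer's base value by plain multiplication, as the descriptions above state. The helper name effective_hyperparams is hypothetical and not part of geondpt.

```python
def effective_hyperparams(base_lr, base_wd, lr_factor=100.0, wd_factor=1.0):
    """Return the (learning rate, weight decay) a layer's parameters would see.

    Hypothetical helper for illustration only: it assumes each factor is a
    plain multiplier on the optimizer's base value.
    """
    return base_lr * lr_factor, base_wd * wd_factor


# Paraboloid defaults (lr_factor=100.0, wd_factor=1.0) combined with an
# optimizer configured as lr=0.001, weight_decay=5e-4.
lr, wd = effective_hyperparams(0.001, 5e-4)
print(lr, wd)
```

Under this assumption, the default lr_factor of 100.0 means a base learning rate of 0.001 acts on the paraboloid parameters roughly like a learning rate of 0.1.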


Shape:

  • Input: \((*, H_{in})\) where \(*\) means any number of dimensions including none and \(H_{in} = \text{input_features}\).

  • Output: \((*, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{output_features}\).


Example:

>>> import torch
>>> import geondpt as gd
>>> pb = gd.Paraboloid(20, 30)
>>> input = torch.randn(128, 20)
>>> output = pb(input)
>>> print(output.size())
torch.Size([128, 30])

class geondpt.ParaConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, bias=True, padding_mode='constant', device=None, dtype=None, output_factor=1.0, input_factor=1.0, lr_factor=100.0, wd_factor=0.1, skip_input_grad=False, init='spotlight', h_factor=0.01, p_factor=0.0001, grad_factor=1.0, init_from_numpy=None)

Bases: Module

Applies a 2D convolution over an input signal composed of several input planes using the paraboloid neuron computation.

The arguments kernel_size, stride, padding, dilation can either be:

  • a single int – in which case the same value is used for the height and width dimension.

  • a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension.

This module currently does not support grouping.

Args:
in_channels

Number of channels in the input image.

out_channels

Number of channels produced by the convolution.

kernel_size

Size of the convolving kernel.

stride

Stride of the convolution. Default: 1.

padding

Padding added to all four sides of the input. Default: 0.

dilation

Spacing between kernel elements. Default: 1.

bias

Accepted only so that ParaConv2d can serve as a drop-in replacement for standard convolution layers; it has no effect. Default: True.

padding_mode

Padding mode, with the same meaning as the mode argument of torch.nn.functional.pad. Default: 'constant'.

output_factor

Multiplies the output of the module. Default: 1.0.

input_factor

Multiplies the input before passing it through the layer. Default: 1.0.

lr_factor

Multiplies the learning rate applied to the parameters by the optimizer. Default: 100.0.

wd_factor

Multiplies the weight decay applied to the parameters by the optimizer. Default: 0.1.

skip_input_grad

If True, skips the computation of the delta signal. Set this only for the very first layer of the network. Default: False.

init

Selects the initialization method for the parameters. Valid options are 'spotlight', 'linear'. Default: 'spotlight'.

h_factor

Affects the 'spotlight' and 'live' initializations. Multiplies the magnitude of the directrix vector. Default: 0.01.

p_factor

Affects the 'spotlight' and 'live' initializations. Determines the offset of the focus from the data subspace. Default: 0.0001.

grad_factor

Multiplies the outgoing delta signal. Default: 1.0.

init_from_numpy

Initializes the parameter tensor directly from a NumPy array. Default: None.


Shape:

  • Input: \((N, C_{in}, H_{in}, W_{in})\)

  • Output: \((N, C_{out}, H_{out}, W_{out})\), where

    \[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]
    \[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]
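The output-size formulas can be checked by hand with a short helper. The sketch below is plain Python and independent of geondpt. The names conv2d_output_size and _pair are hypothetical, and the inputs mirror the hyperparameters of the ParaConv2d example in this section, assuming the layer follows the formulas exactly.

```python
def _pair(value):
    """Expand a single int to (value, value); pass tuples through."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv2d_output_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    """Compute (H_out, W_out) from (H_in, W_in) using the formulas above."""
    k, s, p, d = map(_pair, (kernel_size, stride, padding, dilation))
    # Integer floor division matches the floor in the formula for int inputs.
    return tuple(
        (in_size[i] + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1
        for i in range(2)
    )


# Same hyperparameters as the ParaConv2d example, applied to a 50 x 100 input.
size = conv2d_output_size((50, 100), (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
print(size)  # (26, 100)
```

For a batch of 20 inputs and 33 output channels, these spatial sizes would give an output of shape (20, 33, 26, 100).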

Example:

>>> import torch
>>> import geondpt as gd
>>> pb = gd.ParaConv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = pb(input)

class geondpt.ParaboloidOutput(input_features, output_features, bias=True, device=None, dtype=None, output_factor=0.1, input_factor=0.5, lr_factor=1.0, wd_factor=1.0, init='spotlight', h_factor=0.01, p_factor=0.0001, grad_factor=1.0, init_from_numpy=None)

Bases: Module

Passes the incoming data through a layer of paraboloid neurons. Same as Paraboloid, but configured to be used as the output layer. Use with weight decay but without momentum.

Args:

input_features

Size of each input sample.

output_features

Size of each output sample.

bias

Accepted only so that ParaboloidOutput can serve as a drop-in replacement for Linear layers; it has no effect. Default: True.

output_factor

Multiplies the output of the module. Default: 0.1.

input_factor

Multiplies the input before passing it through the layer. Default: 0.5.

lr_factor

Multiplies the learning rate applied to the parameters by the optimizer. Default: 1.0.

wd_factor

Multiplies the weight decay applied to the parameters by the optimizer. Default: 1.0.

init

Selects the initialization method for the parameters. Valid options are 'spotlight', 'live', 'linear'. Default: 'spotlight'.

h_factor

Affects the 'spotlight' and 'live' initializations. Multiplies the magnitude of the directrix vector. Default: 0.01.

p_factor

Affects the 'spotlight' and 'live' initializations. Determines the offset of the focus from the data subspace. Default: 0.0001.

grad_factor

Multiplies the outgoing delta signal. Default: 1.0.

init_from_numpy

Initializes the parameter tensor directly from a NumPy array. Default: None.
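The input_factor and output_factor arguments wrap the layer's computation in two scalings. A minimal sketch of that ordering, assuming both factors are plain scalar multipliers as described above. The function scaled_forward and the stand-in core callable are hypothetical, and the stand-in does not reproduce the paraboloid computation.

```python
def scaled_forward(core, x, input_factor=0.5, output_factor=0.1):
    """Apply input and output scaling around a layer's core computation.

    `core` is a stand-in callable used for illustration only.
    """
    scaled_in = [input_factor * v for v in x]             # scale before the layer
    return [output_factor * v for v in core(scaled_in)]   # scale after the layer


# Stand-in core: squares each element (NOT the paraboloid neuron).
out = scaled_forward(lambda xs: [v * v for v in xs], [2.0, 4.0])
print(out)
```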


Shape:

  • Input: \((*, H_{in})\) where \(*\) means any number of dimensions including none and \(H_{in} = \text{input_features}\).

  • Output: \((*, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{output_features}\).


Example:

>>> import torch
>>> import geondpt as gd
>>> pb = gd.ParaboloidOutput(20, 30)
>>> input = torch.randn(128, 20)
>>> output = pb(input)
>>> print(output.size())
torch.Size([128, 30])

class geondpt.GeoNDSGD(params, lr=0.001, momentum=0, dampening=0, weight_decay=0, nesterov=False, *, maximize=False)

Bases: Optimizer

Implements stochastic gradient descent that properly handles weight decay for models that include paraboloid neurons. Some arguments of torch.optim.SGD, such as foreach, are omitted because they are not yet implemented or do not currently work properly. Otherwise the same as torch.optim.SGD.

Args:
params

Iterable of parameters to optimize or dicts defining parameter groups.

lr

Learning rate. Default: 0.001.

momentum

Momentum factor. Default: 0.

dampening

Dampening for momentum. Default: 0.

weight_decay

Weight decay. Default: 0.

nesterov

Enables Nesterov momentum. Default: False.

maximize

Maximize the objective with respect to the params, instead of minimizing. Default: False.
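Because GeoNDSGD is described as otherwise the same as torch.optim.SGD, the interaction of momentum, dampening, weight_decay, and nesterov can be illustrated with the standard SGD update rule. The sketch below is a scalar, pure-Python rendering of that rule. The function sgd_step is hypothetical, and the paraboloid-specific weight decay handling of GeoNDSGD is not modeled.

```python
def sgd_step(param, grad, buf, lr=0.001, momentum=0.0, dampening=0.0,
             weight_decay=0.0, nesterov=False):
    """One scalar SGD update following the standard torch.optim.SGD rule.

    Returns the new parameter and momentum buffer; `buf` is None before the
    first step. Paraboloid-specific handling is NOT modeled here.
    """
    if weight_decay != 0:
        grad = grad + weight_decay * param        # L2 term folded into the gradient
    if momentum != 0:
        if buf is None:
            buf = grad                            # first step: buffer starts at the gradient
        else:
            buf = momentum * buf + (1 - dampening) * grad
        grad = grad + momentum * buf if nesterov else buf
    return param - lr * grad, buf


# Two steps on f(p) = p**2 (gradient 2*p), starting from p = 1.0.
p, buf = 1.0, None
for _ in range(2):
    p, buf = sgd_step(p, 2 * p, buf, lr=0.1, momentum=0.9)
print(p)
```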


Example:

>>> import geondpt as gd
>>> optimizer = gd.GeoNDSGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4, nesterov=True)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()

class geondpt.GeoNDAdam(params: Iterable[Tensor] | Iterable[dict[str, Any]] | Iterable[tuple[str, Tensor]], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor, float | Tensor] = (0.9, 0.999, 0.9), eps: float = 1e-08, weight_decay: float = 0, amsgrad: bool = False, *, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None, decoupled_weight_decay: bool = False)

Bases: Optimizer

Implements the Adam optimizer with changes to properly handle paraboloid parameters. Some arguments of torch.optim.Adam, such as foreach, are omitted because they are not yet implemented or do not currently work properly. Otherwise the same as torch.optim.Adam.

Args:
params

Iterable of parameters to optimize or dicts defining parameter groups.

lr

Learning rate. Default: 1e-3.

betas

Coefficients used for computing running averages of the gradient and its square. The values correspond to (beta1, beta2, paraboloid_beta2). Default: (0.9, 0.999, 0.9).

eps

Term added to the denominator to improve numerical stability. Default: 1e-8.

weight_decay

Weight decay. Default: 0.

amsgrad

Whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond". Default: False.

maximize

Maximize the objective with respect to the params, instead of minimizing. Default: False.
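The three-element betas tuple can be pictured with a scalar Adam step. The sketch below follows the standard Adam update and assumes that paraboloid_beta2 simply replaces beta2 in the squared-gradient average for paraboloid parameters. That reading, the function adam_step, and the is_paraboloid flag are all assumptions for illustration; the document does not describe how geondpt selects the coefficient.

```python
import math


def adam_step(param, grad, state, lr=0.001, betas=(0.9, 0.999, 0.9),
              eps=1e-8, is_paraboloid=False):
    """One scalar Adam update.

    ASSUMPTION: the third beta replaces beta2 for paraboloid parameters.
    The `is_paraboloid` flag is hypothetical.
    """
    beta1, beta2, paraboloid_beta2 = betas
    b2 = paraboloid_beta2 if is_paraboloid else beta2
    state["step"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad     # running mean of gradients
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad    # running mean of squared gradients
    m_hat = state["m"] / (1 - beta1 ** state["step"])        # bias correction
    v_hat = state["v"] / (1 - b2 ** state["step"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


state = {"step": 0, "m": 0.0, "v": 0.0}
p = adam_step(1.0, 2.0, state, is_paraboloid=True)
print(p)
```

Under this assumption, a smaller second-moment coefficient such as 0.9 makes the squared-gradient average track recent gradients more closely than the usual 0.999.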

class geondpt.GeoNDAdamW(params: Iterable[Tensor] | Iterable[dict[str, Any]] | Iterable[tuple[str, Tensor]], lr: float | Tensor = 0.001, betas: tuple[float | Tensor, float | Tensor] = (0.9, 0.999, 0.9), eps: float = 1e-08, weight_decay: float = 0.01, amsgrad: bool = False, *, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: bool | None = None)

Bases: GeoNDAdam

Implements the AdamW optimizer with changes to properly handle paraboloid parameters. Some arguments of torch.optim.AdamW, such as foreach, are omitted because they are not yet implemented or do not currently work properly. Otherwise the same as torch.optim.AdamW.

Args:
params

Iterable of parameters to optimize or dicts defining parameter groups.

lr

Learning rate. Default: 1e-3.

betas

Coefficients used for computing running averages of the gradient and its square. The values correspond to (beta1, beta2, paraboloid_beta2). Default: (0.9, 0.999, 0.9).

eps

Term added to the denominator to improve numerical stability. Default: 1e-8.

weight_decay

Weight decay. Default: 0.01.

amsgrad

Whether to use the AMSGrad variant of this algorithm from the paper "On the Convergence of Adam and Beyond". Default: False.

maximize

Maximize the objective with respect to the params, instead of minimizing. Default: False.
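AdamW differs from Adam in how weight_decay is applied: the decay shrinks the parameter directly instead of being added to the gradient. A minimal scalar sketch of the two styles, following the standard Adam and AdamW formulations. The function apply_weight_decay is hypothetical, and any paraboloid-specific handling in geondpt is not modeled.

```python
def apply_weight_decay(param, grad, lr, weight_decay, decoupled):
    """Return (param, grad) after applying weight decay in one of two styles.

    decoupled=True  : AdamW style, shrink the parameter directly.
    decoupled=False : Adam style, add an L2 term to the gradient.
    """
    if decoupled:
        return param * (1 - lr * weight_decay), grad
    return param, grad + weight_decay * param


decoupled_result = apply_weight_decay(1.0, 0.5, lr=0.001, weight_decay=0.01, decoupled=True)
l2_result = apply_weight_decay(1.0, 0.5, lr=0.001, weight_decay=0.01, decoupled=False)
print(decoupled_result, l2_result)
```

In the decoupled form the gradient reaches the moment estimates unchanged, so the decay is not rescaled by the adaptive step size.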