NeuZephyr
Simple DL Framework
nz::opt::Momentum Class Reference

Momentum optimizer for deep learning models. More...


Public Member Functions

 Momentum (Tensor::value_type learning_rate, Tensor::value_type beta)
 Constructs a Momentum optimizer with a specified learning rate and momentum factor.
 
void step (Node *input) override
 Performs a single optimization step using the Momentum algorithm.
 
- Public Member Functions inherited from nz::opt::Optimizer
 Optimizer ()=default
 Default constructor for the Optimizer class.
 
virtual ~Optimizer ()=default
 Default destructor for the Optimizer class.
 

Detailed Description

Momentum optimizer for deep learning models.

The Momentum class implements the Momentum optimization algorithm, a variant of Stochastic Gradient Descent (SGD). Momentum accelerates SGD in the relevant direction and dampens oscillations, improving convergence speed and stability. It does so by maintaining a velocity term that accumulates an exponentially weighted average of past gradients and updating the model parameters in the direction of that accumulated velocity.

This class extends the Optimizer base class and provides a concrete implementation of the step method, which updates the model's parameters (represented as Node objects) using the Momentum algorithm.

  • The optimizer maintains a velocity map, which tracks the velocity (accumulated gradients) for each model parameter (Node).
  • The velocity is updated using the formula $v_{t+1} = \beta v_t + (1 - \beta) g_t$, where $v_t$ is the velocity, $\beta$ is the momentum factor, and $g_t$ is the current gradient.
  • The updated velocity is then applied to the model parameters as $\theta_{t+1} = \theta_t - \eta v_{t+1}$, where $\eta$ is the learning rate, just as in SGD (a reference sketch follows this list).
  • The optimizer uses GPU-accelerated computations through CUDA to efficiently update parameters, making it suitable for large-scale models.
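For reference, the update described by the bullets above can be written on the host in a few lines. This is a minimal sketch of the mathematics only, assuming plain float arrays; it is not the library's GPU implementation:

#include <cstddef>
#include <vector>

// Reference (CPU) version of one Momentum step over flat parameter storage.
// The library performs the equivalent update on the GPU via CUDA.
void momentumStepReference(std::vector<float>& params, std::vector<float>& velocity,
                           const std::vector<float>& grads, float lr, float beta) {
    for (std::size_t i = 0; i < params.size(); ++i) {
        velocity[i] = beta * velocity[i] + (1.0f - beta) * grads[i]; // v_{t+1}
        params[i] -= lr * velocity[i];                               // theta_{t+1}
    }
}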
Note
  • The optimizer assumes that the model parameters are represented by Node objects, and each node must have associated gradients.
  • The velocity is stored per Node object, and if a Node does not have an existing velocity, it is initialized to a zero tensor.
  • The optimizer utilizes GPU memory for velocity storage and gradient computation, requiring CUDA support.
  • Ensure that the model parameters have been properly initialized, and gradients are computed before calling this method.

Usage Example:

Momentum optimizer(0.01, 0.9);
graph.update(&optimizer); // "graph" is a computation graph whose gradients have already been computed
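A fuller training loop might look like the following sketch. Here forward and backward are assumed names for the graph's loss and gradient passes; only Momentum's constructor and the update call above come from this documentation:

Momentum optimizer(0.01f, 0.9f);
for (int epoch = 0; epoch < numEpochs; ++epoch) {
    graph.forward();          // assumed API: compute the loss
    graph.backward();         // assumed API: compute gradients for every parameter node
    graph.update(&optimizer); // invokes Momentum::step on each parameter node
}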
See also
Optimizer for the base class that defines the interface for all optimizers.
Nodes::Node for the class representing model parameters.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Definition at line 352 of file Optimizer.cuh.

Constructor & Destructor Documentation

◆ Momentum()

explicit nz::opt::Momentum::Momentum(Tensor::value_type learning_rate, Tensor::value_type beta)

Constructs a Momentum optimizer with a specified learning rate and momentum factor.

This constructor initializes a Momentum optimizer with a given learning rate and momentum factor. The learning rate controls the step size in the gradient descent update, while the momentum factor helps accelerate the optimizer by incorporating previous gradients.

Parameters
  learning_rate  The learning rate for the optimizer, which determines the step size for parameter updates.
  beta  The momentum factor, which controls how much previous gradients influence the current update. Typically a value between 0.0 and 1.0; values closer to 1 give past gradients more weight.
Note
  • The learning rate and momentum factor should be chosen based on the specific task and model being trained.
  • The optimizer assumes that the model parameters are represented as Node objects and that these nodes will have gradients available when the step method is called.
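For illustration only, two hypothetical configurations (the values are common starting points, not recommendations from the library):

Momentum coarse(0.1f, 0.9f);   // larger steps, standard momentum
Momentum fine(0.001f, 0.99f);  // smaller steps, heavier smoothing of past gradients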
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Definition at line 21 of file Optimizer.cu.

Member Function Documentation

◆ step()

void nz::opt::Momentum::step(Node *input) override

Performs a single optimization step using the Momentum algorithm.

The step function updates the model parameters represented by the Node object using the Momentum optimization algorithm. It incorporates both the current gradients and the previous velocity term to update the model parameters. The momentum term helps accelerate the convergence of the optimizer by smoothing out updates and reducing oscillations.

This method performs the following steps:

  • Initializes the velocity vector for the Node if it is not already available. The velocity vector stores the running average of past gradients, scaled by the momentum factor.
  • Allocates memory for temporary variables on the GPU and computes the velocity update using a CUDA kernel.
  • Updates the velocity vector and the model parameters by applying the momentum update and gradient descent.
  • Frees the temporary GPU memory after the update is complete (a minimal CUDA sketch of this update follows this list).
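The actual kernel is internal to NeuZephyr; the following minimal CUDA sketch shows the per-element update the steps above describe. The kernel name, raw-pointer layout, and launch configuration are assumptions:

__global__ void momentumUpdateSketch(float* param, float* velocity, const float* grad,
                                     float lr, float beta, size_t n) {
    const size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        velocity[i] = beta * velocity[i] + (1.0f - beta) * grad[i]; // v_{t+1} = beta*v_t + (1-beta)*g_t
        param[i] -= lr * velocity[i];                               // theta_{t+1} = theta_t - lr*v_{t+1}
    }
}

// Hypothetical launch: one thread per parameter element.
// momentumUpdateSketch<<<(n + 255) / 256, 256>>>(param, velocity, grad, lr, beta, n);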
Parameters
inputA pointer to the Node object representing the model parameters. This object should have gradients stored in its output attribute, which will be used to update the parameters.
Note
  • The Node object is assumed to have a valid output tensor with its gradients already computed.
  • The velocity map stores a velocity tensor for each Node so that momentum is applied correctly per parameter (a sketch of this bookkeeping follows these notes).
  • The method leverages CUDA to perform parallel computations for efficiency during the optimization process.
  • The optimizer uses the momentum factor (beta) to control the influence of past gradients on the current update.
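For intuition, the per-node velocity bookkeeping mentioned in these notes might be organized as below. The member name and the zerosLike helper are hypothetical, not the library's actual internals:

#include <unordered_map>

std::unordered_map<Node*, Tensor> velocity; // hypothetical member: one velocity tensor per Node

void ensureVelocity(Node* input) {
    if (velocity.find(input) == velocity.end()) {
        // First step for this node: initialize a zero tensor shaped like the
        // parameter's output (zerosLike is an assumed helper, not a real API).
        velocity[input] = Tensor::zerosLike(input->output);
    }
}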
See also
Momentum for the class definition and constructor.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Implements nz::opt::Optimizer.

Definition at line 26 of file Optimizer.cu.


The documentation for this class was generated from the following files:
  Optimizer.cuh
  Optimizer.cu