NeuZephyr
Simple DL Framework
nz::opt::RMSprop Class Reference

RMSprop optimizer for deep learning models. More...


Public Member Functions

 RMSprop (Tensor::value_type learning_rate, Tensor::value_type decay_rate)
 Constructs an RMSprop optimizer with specified learning rate and decay rate.
 
void step (Node *input) override
 Performs a single optimization step using the RMSprop algorithm.
 
- Public Member Functions inherited from nz::opt::Optimizer
 Optimizer ()=default
 Default constructor for the Optimizer class.
 
virtual ~Optimizer ()=default
 Default destructor for the Optimizer class.
 

Detailed Description

RMSprop optimizer for deep learning models.

The RMSprop class implements the RMSprop (Root Mean Square Propagation) optimization algorithm. RMSprop is designed to address the diminishing learning rate issue of AdaGrad by introducing a moving average of squared gradients. This helps stabilize the learning rate, making it suitable for non-stationary or dynamically changing loss functions.

This class extends the Optimizer base class and provides a concrete implementation of the step method, which updates the model's parameters (represented as Node objects) using the RMSprop algorithm.

  • RMSprop maintains an exponentially decaying average of squared gradients for each parameter.
  • The learning rate is adjusted based on this average, which helps prevent the learning rate from decaying too quickly.
  • The update rule for RMSprop can be expressed as (a plain C++ sketch of the same arithmetic follows this list):

    \[
    v_t = \beta v_{t-1} + (1 - \beta) g_t^2
    \]

    \[
    \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{v_t + \epsilon}} g_t
    \]

    where:
    • \( v_t \) is the moving average of squared gradients.
    • \( \beta \) is the decay rate (typically between 0.9 and 0.99).
    • \( g_t \) is the current gradient.
    • \( \eta \) is the learning rate.
    • \( \epsilon \) is a small constant that ensures numerical stability.
  • RMSprop is widely used in training recurrent neural networks (RNNs) and other deep learning models where the loss function can change dynamically.
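
As a concrete illustration of the update rule above, here is a minimal CPU sketch of the per-element RMSprop update. This is not the framework's CUDA implementation; the function and variable names are illustrative only, and it assumes float parameters.

#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative CPU reference of the RMSprop update described above:
//   v_t     = beta * v_{t-1} + (1 - beta) * g_t^2
//   theta_t = theta_{t-1} - eta / sqrt(v_t + eps) * g_t
void rmsprop_update(std::vector<float>& theta,       // parameters to update
                    const std::vector<float>& grad,  // current gradients g_t
                    std::vector<float>& v,           // moving average of squared gradients
                    float eta,                       // learning rate
                    float beta,                      // decay rate
                    float eps = 1e-6f) {             // numerical-stability term
    for (std::size_t i = 0; i < theta.size(); ++i) {
        v[i] = beta * v[i] + (1.0f - beta) * grad[i] * grad[i];
        theta[i] -= eta / std::sqrt(v[i] + eps) * grad[i];
    }
}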
Note
  • The optimizer assumes that model parameters are represented by Node objects, and these nodes have gradients computed before calling the step method.
  • The v map stores the moving average of squared gradients for each parameter.
  • The epsilon term helps avoid division by zero and ensures numerical stability.

Usage Example:

RMSprop optimizer(0.001, 0.9);
graph.update(&optimizer); // Suppose "graph" is a computation graph waiting for gradient updates.
See also
Optimizer for the base class that defines the interface for all optimizers.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Definition at line 577 of file Optimizer.cuh.

Constructor & Destructor Documentation

◆ RMSprop()

explicit nz::opt::RMSprop::RMSprop(Tensor::value_type learning_rate, Tensor::value_type decay_rate)

Constructs an RMSprop optimizer with specified learning rate and decay rate.

The constructor initializes the RMSprop optimizer with the provided learning rate and decay rate. RMSprop is an adaptive learning rate optimization algorithm that maintains a moving average of squared gradients to scale the learning rate for each parameter individually.

This constructor sets the initial values for:

  • learning_rate: The step size used for parameter updates.
  • decay_rate: The factor used to update the moving average of squared gradients, which controls how much the previous gradients influence the current update.
Parameters
learning_rate: The learning rate used in the RMSprop algorithm to scale the gradient updates.
decay_rate: The decay rate used to compute the moving average of squared gradients. A higher value gives more weight to previous gradients, while a lower value emphasizes recent gradients.
Note
  • The default value of epsilon (1e-6) is used to avoid division by zero during parameter updates.
  • The decay rate should typically be a value between 0.9 and 0.99, with a default value of 0.9 commonly used in practice.
  • This constructor ensures that the optimizer is properly initialized with the necessary hyperparameters before calling the step method to perform optimization steps.
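
For illustration, two possible ways to construct the optimizer with commonly used hyperparameter values; the variable names and the float literals (assuming Tensor::value_type is a floating-point type) are examples, not framework requirements:

// Same learning rate, different decay rates: the higher decay rate keeps a
// longer memory of past squared gradients; the lower one reacts faster to
// recent gradients.
nz::opt::RMSprop long_memory(0.001f, 0.99f);
nz::opt::RMSprop short_memory(0.001f, 0.9f);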
See also
RMSprop for the full class definition, and step for the optimization step implementation.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Definition at line 62 of file Optimizer.cu.

Member Function Documentation

◆ step()

void nz::opt::RMSprop::step(Node *input) override

Performs a single optimization step using the RMSprop algorithm.

The step method updates the model parameter based on the gradient computed during the backward pass. It applies the RMSprop optimization algorithm, which uses a moving average of the squared gradients to adapt the learning rate for each parameter. This helps maintain a stable, adaptive learning rate and prevents the parameter update from becoming too large or too small during training.

The method checks if the squared gradient cache (v) for the given input node exists. If not, it initializes it to zero. Then, it applies the RMSprop update rule using the current gradient, the moving average of squared gradients, and the specified learning rate and decay rate.

This method is designed to be used with a model parameter represented as a Node object and assumes that the node has an associated output and gradient.

Parameters
input: A pointer to the Node object representing the model parameter to be updated. The node should have an output tensor and its gradient already computed.
Note
  • This method operates on the GPU using CUDA to accelerate the parameter update process.
  • It assumes that the input node has a valid gradient stored in its output object.
  • The squared gradient cache (v) is maintained for each node individually.
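
The documented way to drive the optimizer is through the computation graph, as shown in the usage example above. Purely for illustration, a direct per-node invocation would look like the sketch below, where trainable_nodes is a hypothetical container of parameter Node pointers whose gradients have already been computed by a backward pass:

nz::opt::RMSprop optimizer(0.001f, 0.9f);
for (Node* param : trainable_nodes) {
    // Each call reads the node's gradient, updates the node's squared-gradient
    // cache v, and writes the updated parameters back to the node's output.
    optimizer.step(param);
}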
See also
RMSprop for the class definition and constructor.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Implements nz::opt::Optimizer.

Definition at line 67 of file Optimizer.cu.


The documentation for this class was generated from the following files: Optimizer.cuh and Optimizer.cu.