NeuZephyr
Simple DL Framework
nz::opt::NAdam Class Reference

NAdam optimizer for deep learning models. More...

Inheritance diagram for nz::opt::NAdam:
Collaboration diagram for nz::opt::NAdam:

Public Member Functions

 NAdam (Tensor::value_type learning_rate, Tensor::value_type beta1, Tensor::value_type beta2)
 Constructs a NAdam optimizer with specified hyperparameters.
 
void step (Node *input) override
 Performs a single optimization step using the NAdam algorithm.
 
- Public Member Functions inherited from nz::opt::Optimizer
 Optimizer ()=default
 Default constructor for the Optimizer class.
 
virtual ~Optimizer ()=default
 Default destructor for the Optimizer class.
 

Detailed Description

NAdam optimizer for deep learning models.

The NAdam class implements the Nesterov-accelerated Adaptive Moment Estimation (NAdam) optimization algorithm, which combines the benefits of the Adam optimizer with Nesterov momentum. NAdam improves upon Adam by incorporating Nesterov momentum into the first moment estimation, which can lead to faster convergence in some scenarios.

This class extends the Optimizer base class and provides a concrete implementation of the step method, which updates the model's parameters (represented as Node objects) using the NAdam algorithm.

  • The optimizer maintains three tensors for each parameter (Node):
    • \( m_t \): The first moment estimate, which is the exponentially decaying average of past gradients.
    • \( m_t' \): The modified first moment estimate, incorporating Nesterov momentum.
    • \( v_t \): The second moment estimate, which is the exponentially decaying average of past squared gradients.
  • The moment estimates are updated using the following formulas:
    \[ m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \]
    \[ m_t' = \beta_1 m_t + (1 - \beta_1) g_t \]
    \[ v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \]
    where \( g_t \) is the current gradient and \( \beta_1 \), \( \beta_2 \) are the decay rates for the first and second moments.
  • The model parameters are then updated using the bias-corrected moment estimates (see the sketch after this list):
    \[ \hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{m}_t' = \frac{m_t'}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \]
    \[ \theta_t = \theta_{t-1} - \eta \frac{\hat{m}_t'}{\sqrt{\hat{v}_t} + \epsilon} \]
    where \( \eta \) is the learning rate and \( \epsilon \) is a small constant to prevent division by zero.
  • The optimizer uses GPU-accelerated computations through CUDA to efficiently update parameters, making it suitable for large-scale models.
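To make the update rule concrete, the following is a minimal, framework-agnostic sketch of the per-element NAdam update implied by the formulas above. It is not the library's actual implementation; the function name, signature, and the epsilon default are illustrative assumptions.

#include <cmath>
#include <cstddef>

// Illustrative per-element NAdam update (a sketch, not NeuZephyr's actual code).
// theta, m, m_mod and v are per-parameter arrays of length n; grad holds the
// current gradients; t is the 1-based iteration counter kept by the optimizer.
void nadam_update(float* theta, float* m, float* m_mod, float* v,
                  const float* grad, std::size_t n, int t,
                  float lr, float beta1, float beta2, float eps = 1e-8f) {
    const float bc1 = 1.0f - static_cast<float>(std::pow(beta1, t)); // 1 - beta1^t
    const float bc2 = 1.0f - static_cast<float>(std::pow(beta2, t)); // 1 - beta2^t
    for (std::size_t i = 0; i < n; ++i) {
        const float g = grad[i];
        m[i]     = beta1 * m[i] + (1.0f - beta1) * g;     // m_t
        m_mod[i] = beta1 * m[i] + (1.0f - beta1) * g;     // m_t' (Nesterov look-ahead)
        v[i]     = beta2 * v[i] + (1.0f - beta2) * g * g; // v_t
        const float m_hat_mod = m_mod[i] / bc1;           // bias-corrected m_t'
        const float v_hat     = v[i] / bc2;               // bias-corrected v_t
        theta[i] -= lr * m_hat_mod / (std::sqrt(v_hat) + eps);
    }
}

In the optimizer itself the same arithmetic is applied on the GPU by a CUDA kernel, with the moment tensors stored per Node.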
Note
  • The optimizer assumes that the model parameters are represented by Node objects, and each node must have associated gradients.
  • The first moment estimate (m), modified first moment estimate (m_modified), and second moment estimate (v) are stored per Node object. If a Node does not have existing moments, they are initialized to zero tensors.
  • The optimizer utilizes GPU memory for moment storage and gradient computation, requiring CUDA support.
  • Ensure that the model parameters have been properly initialized and that gradients have been computed before calling step().

Usage Example:

NAdam optimizer(0.001, 0.9, 0.999);
graph.update(&optimizer); // Suppose "graph" is a computation graph waiting for gradient updates.
See also
Optimizer for the base class that defines the interface for all optimizers.
Nodes::Node for the class representing model parameters.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Definition at line 844 of file Optimizer.cuh.

Constructor & Destructor Documentation

◆ NAdam()

nz::opt::NAdam::NAdam ( Tensor::value_type learning_rate,
Tensor::value_type beta1,
Tensor::value_type beta2 )
explicit

Constructs a NAdam optimizer with specified hyperparameters.

Initializes the NAdam (Nesterov-accelerated Adaptive Moment Estimation) optimizer with user-defined learning rate and momentum parameters. NAdam combines the benefits of Nesterov accelerated gradient and Adam optimization techniques, providing adaptive learning rates for each parameter while incorporating momentum.

The constructor sets up the initial state of the optimizer, including the learning rate, exponential decay rates for moment estimates, and initializes the iteration counter to zero. This prepares the optimizer for the first optimization step in the training process.

Parameters
    learning_rate    The base learning rate that controls the step size during optimization. A smaller value leads to more conservative updates, while a larger value allows for more aggressive parameter adjustments.
    beta1    The exponential decay rate for the first moment estimate (moving average of gradients). Typically set close to 1 (e.g., 0.9) to control the influence of past gradients on the current update.
    beta2    The exponential decay rate for the second moment estimate (moving average of squared gradients). Typically set close to 1 (e.g., 0.999) to adapt the learning rate for each parameter based on its historical gradient information.
Note
  • The iteration counter it is initialized to 0, which is critical for the first bias-correction step in the NAdam algorithm (see the worked example after this list).
  • Recommended default values are learning_rate = 0.001, beta1 = 0.9, beta2 = 0.999.
  • The hyperparameters significantly impact the optimization process and may require tuning based on the specific machine learning task.
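As a brief worked example of why the counter matters (using the recommended values \( \beta_1 = 0.9 \), \( \beta_2 = 0.999 \)): after the counter is incremented for the first step, \( t = 1 \), so the bias corrections become
\[ \hat{m}_1 = \frac{m_1}{1 - 0.9} = 10\, m_1, \qquad \hat{v}_1 = \frac{v_1}{1 - 0.999} = 1000\, v_1 \]
which compensates for the zero-initialized moment estimates being biased toward zero early in training.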
See also
Adam, RMSprop: optimization algorithms with similar adaptive learning rate strategies
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Definition at line 104 of file Optimizer.cu.

Member Function Documentation

◆ step()

void nz::opt::NAdam::step ( Node * input)
override virtual

Performs a single optimization step using the NAdam algorithm.

This method updates the model parameters for a given input node using the Nesterov-accelerated Adaptive Moment Estimation (NAdam) optimization algorithm. It manages the adaptive learning rates and momentum for individual parameters by maintaining and updating first and second moment estimates.

The method performs several key operations:

  1. Increments the iteration counter
  2. Initializes moment and modified moment tensors if they don't exist for the input node
  3. Prepares CUDA grid and block configurations for parallel parameter updates
  4. Invokes a CUDA kernel to apply the NAdam update rule

The initialization of moment tensors ensures that each parameter has its own adaptive learning rate and momentum, allowing for more flexible and efficient optimization across different model parameters.
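For illustration, a simplified CUDA kernel and launch configuration that would realize this update are sketched below. This is not the actual krnl::NAdam kernel; its real name, signature, and launch parameters may differ, so everything here should be read as an assumption.

// Illustrative CUDA kernel (a sketch, not the actual krnl::NAdam implementation).
// Each thread updates one parameter element with the NAdam rule described above.
__global__ void nadam_kernel(float* theta, float* m, float* m_mod, float* v,
                             const float* grad, size_t n, int t,
                             float lr, float beta1, float beta2, float eps) {
    const size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const float g = grad[i];
    m[i]     = beta1 * m[i] + (1.0f - beta1) * g;
    m_mod[i] = beta1 * m[i] + (1.0f - beta1) * g;
    v[i]     = beta2 * v[i] + (1.0f - beta2) * g * g;
    const float m_hat = m_mod[i] / (1.0f - powf(beta1, (float)t));
    const float v_hat = v[i]     / (1.0f - powf(beta2, (float)t));
    theta[i] -= lr * m_hat / (sqrtf(v_hat) + eps);
}

// A typical launch for n parameter elements (block size chosen for illustration):
//   dim3 block(256);
//   dim3 grid((n + block.x - 1) / block.x);
//   nadam_kernel<<<grid, block>>>(theta, m, m_mod, v, grad, n, t, lr, beta1, beta2, 1e-8f);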

Parameters
    input    A pointer to the Node object representing the model parameter to be updated. The node must have a valid output tensor and its gradient already computed.
Note
  • This method assumes the input node has a valid gradient stored in its output object.
  • Moment tensors are created lazily (on-demand) for each unique input node.
  • The method uses CUDA for parallel computation of parameter updates.
  • The iteration counter is crucial for bias correction in the NAdam algorithm.
See also
NAdam::NAdam() Constructor for initializing optimizer parameters
krnl::NAdam CUDA kernel implementing the NAdam update rule
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Implements nz::opt::Optimizer.

Definition at line 111 of file Optimizer.cu.

Here is the call graph for this function:

The documentation for this class was generated from the following files:
Optimizer.cuh
Optimizer.cu