NeuZephyr: Simple DL Framework

NAdam Class Reference

NAdam optimizer for deep learning models.
Public Member Functions

- NAdam (Tensor::value_type learning_rate, Tensor::value_type beta1, Tensor::value_type beta2)
  Constructs a NAdam optimizer with specified hyperparameters.
- void step (Node *input) override
  Performs a single optimization step using the NAdam algorithm.

Public Member Functions inherited from Optimizer

- Optimizer ()=default
  Default constructor for the Optimizer class.
- virtual ~Optimizer ()=default
  Default destructor for the Optimizer class.
Detailed Description

NAdam optimizer for deep learning models.

The NAdam class implements the Nesterov-accelerated Adaptive Moment Estimation (NAdam) optimization algorithm, which combines the benefits of the Adam optimizer with Nesterov momentum. NAdam improves upon Adam by incorporating Nesterov momentum into the first moment estimate, which can lead to faster convergence in some scenarios.

This class extends the Optimizer base class and provides a concrete implementation of the step method, which updates the model's parameters (represented as Node objects) using the NAdam algorithm.
Notes:
- The model's parameters must be represented as Node objects, and each node must have associated gradients.
- The first moment estimate (m), modified first moment estimate (m_modified), and second moment estimate (v) are stored per Node object. If a Node does not have existing moments, they are initialized to zero tensors.

Definition at line 844 of file Optimizer.cuh.
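Below is a minimal usage sketch. Only the NAdam constructor and step signatures documented on this page are taken from the source; the helper name apply_nadam, the include path, and the way parameter Node objects are obtained and their gradients populated are framework-specific assumptions shown as placeholders.

```cpp
#include <vector>
#include "Optimizer.cuh"  // assumed include path for the NAdam declaration

// One optimization pass over a model's trainable parameters.
// 'Node' is the framework's parameter node type; gradients are assumed
// to have been filled in by a preceding backward pass (not shown here).
void apply_nadam(nz::opt::NAdam& optimizer, std::vector<Node*>& parameters) {
    // NAdam keeps per-Node state (m, m_modified, v), so the same
    // optimizer instance should be reused across training iterations.
    for (Node* param : parameters) {
        optimizer.step(param);  // one NAdam update for this parameter
    }
}
```

With typical hyperparameters the optimizer would be constructed once, e.g. nz::opt::NAdam optimizer(1e-3f, 0.9f, 0.999f);, and apply_nadam called after each backward pass.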
NAdam::NAdam (Tensor::value_type learning_rate, Tensor::value_type beta1, Tensor::value_type beta2) [explicit]
Constructs a NAdam optimizer with specified hyperparameters.
Initializes the NAdam (Nesterov-accelerated Adaptive Moment Estimation) optimizer with user-defined learning rate and momentum parameters. NAdam combines the benefits of Nesterov accelerated gradient and Adam optimization techniques, providing adaptive learning rates for each parameter while incorporating momentum.
The constructor sets up the initial state of the optimizer, including the learning rate, exponential decay rates for moment estimates, and initializes the iteration counter to zero. This prepares the optimizer for the first optimization step in the training process.
Parameters:
- learning_rate: The base learning rate that controls the step size during optimization. A smaller value leads to more conservative updates, while a larger value allows for more aggressive parameter adjustments.
- beta1: The exponential decay rate for the first moment estimate (the moving average of gradients). Typically set close to 1 (e.g., 0.9) to control the influence of past gradients on the current update.
- beta2: The exponential decay rate for the second moment estimate (the moving average of squared gradients). Typically set close to 1 (e.g., 0.999) to adapt the learning rate of each parameter based on its historical gradient information.
Note: The iteration counter it is initialized to 0, which is critical for the first bias-correction step in the NAdam algorithm.

Definition at line 104 of file Optimizer.cu.
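For reference, the bias correction driven by the iteration counter takes the standard Adam/NAdam form shown below; this is the textbook formulation, and the exact expressions used in Optimizer.cu are not reproduced on this page.

$$
\hat{m}_t = \frac{m_t}{1 - \beta_1^{\,t}}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^{\,t}}, \qquad t = 1, 2, \dots
$$

Assuming the counter is incremented before each bias correction, the first update uses $t = 1$, where the correction factors $1/(1-\beta_1)$ and $1/(1-\beta_2)$ are at their largest; starting the counter from any other value would mis-scale the earliest updates.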
void NAdam::step (Node *input) [override, virtual]
Performs a single optimization step using the NAdam algorithm.
This method updates the model parameters for a given input node using the Nesterov-accelerated Adaptive Moment Estimation (NAdam) optimization algorithm. It manages the adaptive learning rates and momentum for individual parameters by maintaining and updating first and second moment estimates.
The method performs several key operations:
- It initializes the first moment (m), modified first moment (m_modified), and second moment (v) tensors to zero for any node that does not yet have them.
- It updates the first and second moment estimates from the node's current gradient.
- It applies bias correction based on the iteration counter and incorporates Nesterov momentum into the first moment estimate.
- It updates the node's parameters with the resulting adaptive step; the standard form of this update is sketched below.
The initialization of moment tensors ensures that each parameter has its own adaptive learning rate and momentum, allowing for more flexible and efficient optimization across different model parameters.
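For orientation, the textbook NAdam update (Dozat, 2016) for a parameter $\theta$ with gradient $g_t$ is shown below, using the constant-$\beta_1$ simplification; the exact expressions in this implementation (for example its $\epsilon$ term) may differ.

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{\beta_1\, m_t}{1-\beta_1^{\,t+1}} + \frac{(1-\beta_1)\, g_t}{1-\beta_1^{\,t}} \\
\hat{v}_t &= \frac{v_t}{1-\beta_2^{\,t}} \\
\theta_t &= \theta_{t-1} - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

The $\hat{m}_t$ term is where Nesterov momentum enters: it blends the bias-corrected momentum with the bias-corrected current gradient, effectively looking one step ahead.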
Parameters:
- input: A pointer to the Node object representing the model parameter to be updated. The node must have a valid output tensor and its gradient already computed.
Implements nz::opt::Optimizer.
Definition at line 111 of file Optimizer.cu.
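As a concrete illustration of the sequence of operations step performs, here is a self-contained, single-parameter C++ sketch of the textbook NAdam update. It mirrors the formulas above, not the framework's tensor/CUDA implementation in Optimizer.cu; all names in it are local to the example.

```cpp
#include <cmath>
#include <cstdio>

// Single-parameter NAdam sketch (textbook form). The state mirrors what
// the NAdam class stores per Node: first moment m, second moment v, and
// an iteration counter it that starts at 0.
struct ScalarNAdam {
    double lr, beta1, beta2;
    double eps = 1e-8;
    double m = 0.0, v = 0.0;  // moments start as zeros, like the per-Node tensors
    int it = 0;               // drives the bias correction

    void step(double& theta, double grad) {
        ++it;
        m = beta1 * m + (1.0 - beta1) * grad;         // first moment estimate
        v = beta2 * v + (1.0 - beta2) * grad * grad;  // second moment estimate
        // Nesterov-accelerated, bias-corrected first moment (constant-beta1 form).
        const double m_hat = beta1 * m / (1.0 - std::pow(beta1, it + 1))
                           + (1.0 - beta1) * grad / (1.0 - std::pow(beta1, it));
        const double v_hat = v / (1.0 - std::pow(beta2, it));
        theta -= lr * m_hat / (std::sqrt(v_hat) + eps);
    }
};

int main() {
    ScalarNAdam opt{0.1, 0.9, 0.999};  // lr, beta1, beta2
    double theta = 1.0;                // minimize f(theta) = theta^2
    for (int i = 0; i < 100; ++i) {
        opt.step(theta, 2.0 * theta);  // gradient of theta^2 is 2*theta
    }
    std::printf("theta after 100 steps: %f\n", theta);  // approaches 0
    return 0;
}
```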