NeuZephyr
Simple DL Framework
nz::opt::SGD Class Reference

Stochastic Gradient Descent (SGD) optimizer for deep learning models. More...


Public Member Functions

 SGD (Tensor::value_type learning_rate)
 Constructor for the SGD optimizer.
 
void step (Node *input) override
 Performs a single step of the Stochastic Gradient Descent (SGD) optimization.
 
Public Member Functions inherited from nz::opt::Optimizer
 Optimizer ()=default
 Default constructor for the Optimizer class.
 
virtual ~Optimizer ()=default
 Default destructor for the Optimizer class.
 

Detailed Description

Stochastic Gradient Descent (SGD) optimizer for deep learning models.

The SGD class implements the Stochastic Gradient Descent optimization algorithm, which is one of the most basic and widely-used methods for optimizing deep learning model parameters. The algorithm updates the model's parameters by moving in the direction of the negative gradient scaled by a learning rate.

This class extends the Optimizer base class and provides a concrete implementation of the step method, which updates the parameters of the model (represented as Node objects) using the SGD algorithm.
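For each parameter, the update described above amounts to

$$\theta \leftarrow \theta - \eta \, \nabla_\theta L(\theta)$$

where \(\eta\) is the learning rate supplied to the constructor and \(\nabla_\theta L(\theta)\) is the gradient computed during the backward pass.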

  • The primary function of this optimizer is to adjust model parameters based on the gradients and a fixed learning rate. It performs updates to minimize the loss function during training.
  • The optimizer uses parallel processing on the GPU through CUDA to accelerate the parameter update process, making it suitable for training large models with many parameters.
  • While simple, SGD is effective for many machine learning tasks and serves as a foundation for more advanced optimizers such as Adam and RMSprop.
  • This optimizer works by updating the weights in the direction that reduces the loss, with the magnitude of the update controlled by the learning rate.
Note
  • The optimizer assumes that the model parameters are represented by Node objects, and these nodes must have associated gradients for the optimizer to function correctly.
  • It is specifically designed to work with deep learning frameworks that leverage GPU acceleration for efficient computation.

Usage Example:

SGD optimizer(0.01);
graph.update(&optimizer); // Suppose "graph" is a computation graph awaiting gradient updates
See also
Optimizer for the base class that defines the interface for all optimizers.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Definition at line 250 of file Optimizer.cuh.

Constructor & Destructor Documentation

◆ SGD()

explicit nz::opt::SGD::SGD(Tensor::value_type learning_rate)

Constructor for the SGD optimizer.

This constructor initializes the SGD optimizer with a specified learning rate. The learning rate is a crucial hyperparameter that determines the step size for each parameter update during training. A smaller learning rate leads to smaller updates, while a larger learning rate results in faster convergence but may risk overshooting the optimal solution.

Parameters
learning_rate: The learning rate to be used in the optimization process. It defines the magnitude of the updates to the model parameters.
Note
  • The learning rate should be chosen carefully, as it significantly impacts the model's convergence during training. A value that is too large may cause the optimization to diverge, while a value that is too small may lead to slow convergence.
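For illustration only, constructions with different learning rates might look like the sketch below; the values are not recommendations from the framework, and a float-valued Tensor::value_type is assumed.

nz::opt::SGD cautious(0.001f);   // smaller steps: more stable updates, slower convergence
nz::opt::SGD standard(0.01f);    // a common starting point for many tasks
nz::opt::SGD aggressive(0.1f);   // larger steps: faster progress, higher risk of divergence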
See also
SGD for the optimizer class that uses this constructor.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Definition at line 10 of file Optimizer.cu.

Member Function Documentation

◆ step()

virtual void nz::opt::SGD::step(Node *input) override

Performs a single step of the Stochastic Gradient Descent (SGD) optimization.

This method updates the model parameters (represented by Node objects) using the Stochastic Gradient Descent algorithm. The parameters are updated based on the gradients computed during the backward pass, and the updates are scaled by the learning rate. The method uses CUDA to parallelize the parameter updates on the GPU, ensuring high performance for large-scale models.

The update process involves computing the negative gradient and scaling it by the learning rate to adjust the model parameters. This method is intended to be called during the training loop to update the parameters at each iteration.
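A minimal sketch of the kind of element-wise kernel such an update relies on is shown below. It is illustrative only: the kernel name, raw-pointer interface, and float element type are assumptions, not the actual NeuZephyr implementation.

__global__ void sgdStepKernel(float* params, const float* grads, float lr, size_t n) {
    const size_t i = blockIdx.x * blockDim.x + threadIdx.x; // one thread per parameter element
    if (i < n) {
        params[i] -= lr * grads[i]; // move against the gradient, scaled by the learning rate
    }
}

Because each thread updates a single element independently, the update scales well to models with many parameters.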

Parameters
input: The Node object that holds the model parameters and their gradients. This node must have a valid gradient computed during the backward pass.
Note
  • The method assumes that the input node contains a valid output tensor with computed gradients.
  • The computation is performed on the GPU using CUDA, so a CUDA-compatible environment is required.
  • Ensure that the model parameters have been properly initialized and gradients are computed before calling this method.
See also
SGD for the class that defines this method.
Nodes::Node for the class representing the model parameters.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/07

Implements nz::opt::Optimizer.

Definition at line 14 of file Optimizer.cu.


The documentation for this class was generated from the following files:
Optimizer.cuh
Optimizer.cu