NeuZephyr
Simple DL Framework
nz::nodes::calc::HardSwishNode Class Reference

Represents a Hard Swish activation function node in a computational graph.


Public Member Functions

 HardSwishNode (Node *input, Tensor::value_type alpha=1.0f, Tensor::value_type beta=0.5f)
 Constructor to initialize a HardSwishNode for applying the Hard Swish activation function.
 
void forward () override
 Forward pass for the HardSwishNode to apply the Hard Swish activation function.
 
void backward () override
 Backward pass for the HardSwishNode to compute gradients.
 
- Public Member Functions inherited from nz::nodes::Node
virtual void print (std::ostream &os) const
 Prints the type, data, and gradient of the node.
 
void dataInject (Tensor::value_type *data, bool grad=false) const
 Injects data into a relevant tensor object, optionally setting its gradient requirement.
 
template<typename Iterator >
void dataInject (Iterator begin, Iterator end, const bool grad=false) const
 Injects data from an iterator range into the output tensor of the InputNode, optionally setting its gradient requirement.
 
void dataInject (const std::initializer_list< Tensor::value_type > &data, bool grad=false) const
 Injects data from a std::initializer_list into the output tensor of the Node, optionally setting its gradient requirement.
 

Detailed Description

Represents a Hard Swish activation function node in a computational graph.

The HardSwishNode class applies the Hard Swish activation function to the input tensor. The Hard Swish function is a computationally efficient approximation of the Swish function and is defined as:

HardSwish(x) = x * max(0, min(1, alpha * x + beta))

where alpha and beta control the slope and offset of the linear part of the function.
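
For concreteness, the following minimal CPU reference expresses this formula in plain C++. It is an illustration only, not part of the NeuZephyr API; the name hardSwishRef is made up for this sketch.

#include <algorithm>

// Scalar reference of HardSwish(x) = x * max(0, min(1, alpha * x + beta)).
// Defaults mirror the HardSwishNode constructor (alpha = 1.0, beta = 0.5).
inline float hardSwishRef(float x, float alpha = 1.0f, float beta = 0.5f) {
    const float gate = std::min(1.0f, std::max(0.0f, alpha * x + beta));
    return x * gate; // e.g. hardSwishRef(-1.0f) == 0.0f, hardSwishRef(2.0f) == 2.0f
}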

Key features:

  • Forward Pass: Applies the Hard Swish activation function element-wise to the input tensor, blending the input with a clipped linear function.
  • Backward Pass: Computes the gradient of the loss with respect to the input tensor, handling linear and non-linear regions separately.
  • Shape Preservation: The output tensor has the same shape as the input tensor.
  • Gradient Management: Automatically tracks gradients if required by the input tensor.

This class is part of the nz::nodes namespace and is used in models to obtain Swish-like activation behavior at a lower computational cost than the standard Swish function.

Note
  • The alpha and beta parameters default to 1.0 and 0.5, respectively, but can be customized during construction.
  • Efficient GPU computations are performed for both forward and backward passes.

Usage Example:

// Example: Using HardSwishNode in a computational graph
InputNode input({3, 3}, true); // Create an input node with shape {3, 3}
float data[] = {-1.0f, 0.0f, 1.0f, 2.0f, -2.0f, 3.0f, -3.0f, 4.0f, -4.0f}; // Sample input values
input.output->dataInject(data); // Copy data to the input tensor
HardSwishNode hard_swish_node(&input, 1.0f, 0.5f); // Apply Hard Swish activation
hard_swish_node.forward(); // Perform the forward pass
hard_swish_node.backward(); // Propagate gradients in the backward pass
std::cout << "Output: " << *hard_swish_node.output << std::endl; // Print the result
See also
forward() for the Hard Swish computation in the forward pass.
backward() for gradient computation in the backward pass.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/05

Definition at line 2954 of file Nodes.cuh.

Constructor & Destructor Documentation

◆ HardSwishNode()

explicit nz::nodes::calc::HardSwishNode::HardSwishNode ( Node *input,
                                                         Tensor::value_type alpha = 1.0f,
                                                         Tensor::value_type beta = 0.5f )

Constructor to initialize a HardSwishNode for applying the Hard Swish activation function.

The constructor initializes a HardSwishNode, which applies the Hard Swish activation function to an input tensor. It establishes a connection to the input node, initializes the output tensor, and sets the alpha and beta parameters as well as the node type.

Parameters
input  A pointer to the input node. Its output tensor will have the Hard Swish activation applied.
alpha  The slope parameter for the Hard Swish function. Controls the steepness of the curve.
beta   The offset parameter for the Hard Swish function. Shifts the function horizontally.

The Hard Swish activation function is defined as:

HardSwish(x) = x * max(0, min(1, alpha * x + beta))

Key operations performed by the constructor:

  • Adds the input node to the inputs vector, establishing the connection in the computational graph.
  • Determines if gradient tracking is required based on the input tensor's requiresGrad property.
  • Initializes the output tensor with the same shape as the input tensor and appropriate gradient tracking.
  • Sets the alpha and beta parameters, which control the shape of the Hard Swish function.
  • Sets the node type to "HardSwish" for identification in the computational graph.
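
As a brief usage sketch, alpha and beta can be customized at construction. This follows the conventions of the example in the class description and the inherited initializer-list dataInject overload; the shapes and values below are illustrative only.

InputNode input({2, 2}, true);                  // input node with gradient tracking
input.dataInject({-2.0f, -0.5f, 0.5f, 2.0f});   // inject sample values (inherited Node::dataInject)
HardSwishNode hs(&input, 2.0f, 0.25f);          // steeper slope (alpha = 2.0), smaller offset (beta = 0.25)
hs.forward();                                   // output tensor has the same shape as the input: {2, 2}
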
Note
  • The Hard Swish function is a computationally cheap, piecewise approximation of the Swish activation, combining properties of ReLU and Swish activations.
  • The alpha and beta parameters allow for customization of the activation function's behavior.
  • Gradient tracking for the output tensor is automatically set based on the input tensor's requirements.
See also
forward() for the implementation of the forward pass using these parameters.
backward() for the gradient computation in the backward pass.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/05

Definition at line 500 of file Nodes.cu.

Member Function Documentation

◆ backward()

void nz::nodes::calc::HardSwishNode::backward ( )
override virtual

Backward pass for the HardSwishNode to compute gradients.

This method implements the backward pass of the Hard Swish activation function. It computes the gradient of the loss with respect to the input by applying the derivative of the Hard Swish function to the incoming gradient.

The derivative of the Hard Swish function is:

HardSwish'(x) = max(0, min(1, alpha * x + beta))
              + x * alpha * [(-beta / alpha) < x < ((1 - beta) / alpha)]

where alpha and beta are the parameters that control the shape of the function, and the bracketed factor is an indicator that equals 1 inside the linear region of the clipped gate and 0 outside it. This follows from applying the product rule to x * clip(alpha * x + beta, 0, 1).

Key operations:

  • Checks if the input tensor requires gradient computation.
  • If gradients are required:
    • Configures CUDA execution parameters (grid and block dimensions) for parallel processing.
    • Launches a CUDA kernel (HardSwishBackward) to compute gradients on the GPU.
    • Processes all elements of the input tensor in parallel.

CUDA kernel configuration:

  • Block size: 256 threads per block.
  • Grid size: Calculated to ensure coverage of all elements in the input tensor.
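
For illustration, a kernel implementing this derivative under the launch configuration above might look roughly like the sketch below. The name hardSwishBackwardSketch, the raw-pointer interface, and the exact accumulation scheme are assumptions; this is not the framework's actual HardSwishBackward kernel.

// Hypothetical backward kernel: dL/dx = dL/dy * (gate + x * alpha * 1{linear region}).
__global__ void hardSwishBackwardSketch(float* inGrad, const float* outGrad, const float* in,
                                        unsigned long long n, float alpha, float beta) {
    const unsigned long long i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        const float x = in[i];
        const float z = alpha * x + beta;
        const float gate = fminf(fmaxf(z, 0.0f), 1.0f);              // clip(alpha * x + beta, 0, 1)
        const float inLinear = (z > 0.0f && z < 1.0f) ? 1.0f : 0.0f; // indicator of the linear region
        const float dydx = gate + x * alpha * inLinear;              // product rule on x * gate
        inGrad[i] += outGrad[i] * dydx;                              // accumulate into the input gradient buffer
    }
}
// Launched with 256 threads per block and (n + 255) / 256 blocks to cover all n elements.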
Note
  • This method is only executed if the input tensor requires gradient computation.
  • The method assumes that the CUDA kernel HardSwishBackward is defined elsewhere and correctly implements the derivative of the Hard Swish function.
  • The gradient computation leverages GPU parallelism for efficiency, especially for large tensors.
  • The computed gradients are accumulated in the input tensor's gradient buffer.
See also
forward() for the corresponding forward pass implementation.
HardSwishNode constructor for the initialization of alpha and beta parameters.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/05

Implements nz::nodes::Node.

Definition at line 515 of file Nodes.cu.


◆ forward()

void nz::nodes::calc::HardSwishNode::forward ( )
override virtual

Forward pass for the HardSwishNode to apply the Hard Swish activation function.

This method implements the forward pass of the Hard Swish activation function. It applies the Hard Swish operation element-wise to the input tensor and stores the result in the output tensor.

The Hard Swish function is defined as:

HardSwish(x) = x * max(0, min(1, alpha * x + beta))

where alpha and beta are parameters that control the shape of the function.

Key operations:

  • Configures CUDA execution parameters (grid and block dimensions) for parallel processing.
  • Launches a CUDA kernel (HardSwish) to perform the Hard Swish computation on the GPU.
  • Processes all elements of the input tensor in parallel.

CUDA kernel configuration:

  • Block size: 256 threads per block.
  • Grid size: Calculated to ensure coverage of all elements in the output tensor.
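
A minimal sketch of such a kernel and a matching launch helper is shown below, assuming a raw float-array interface; the framework's actual HardSwish kernel and its launcher may differ.

// Hypothetical forward kernel: out[i] = in[i] * clip(alpha * in[i] + beta, 0, 1).
__global__ void hardSwishForwardSketch(float* out, const float* in,
                                       unsigned long long n, float alpha, float beta) {
    const unsigned long long i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        const float x = in[i];
        out[i] = x * fminf(fmaxf(alpha * x + beta, 0.0f), 1.0f);
    }
}

void launchHardSwishForwardSketch(float* out, const float* in,
                                  unsigned long long n, float alpha, float beta) {
    constexpr unsigned int block = 256;                                   // 256 threads per block
    const auto grid = static_cast<unsigned int>((n + block - 1) / block); // ceil(n / 256) blocks
    hardSwishForwardSketch<<<grid, block>>>(out, in, n, alpha, beta);
}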
Note
  • This method assumes that the CUDA kernel HardSwish is defined elsewhere and properly implements the Hard Swish function.
  • The output tensor is assumed to have the same shape as the input tensor.
  • This implementation leverages GPU parallelism for efficient computation, especially for large tensors.
See also
backward() for the corresponding backward pass implementation.
HardSwishNode constructor for the initialization of alpha and beta parameters.
Author
Mgepahmge (https://github.com/Mgepahmge)
Date
2024/12/05

Implements nz::nodes::Node.

Definition at line 509 of file Nodes.cu.


The documentation for this class was generated from the following files: