NeuZephyr
Simple DL Framework
Represents a matrix multiplication operation node in a computational graph.
Public Member Functions

| Member | Description |
| --- | --- |
| MatMulNode (Node *input_left, Node *input_right) | Constructor to initialize a MatMulNode for matrix multiplication. |
| void forward () override | Forward pass for the MatMulNode to perform matrix multiplication. |
| void backward () override | Backward pass for the MatMulNode to propagate gradients. |
| virtual void print (std::ostream &os) const | Prints the type, data, and gradient of the node. |
| void dataInject (Tensor::value_type *data, bool grad=false) const | Injects data into a relevant tensor object, optionally setting its gradient requirement. |
| template<typename Iterator > void dataInject (Iterator begin, Iterator end, const bool grad=false) const | Injects data from an iterator range into the output tensor of the InputNode, optionally setting its gradient requirement. |
| void dataInject (const std::initializer_list< Tensor::value_type > &data, bool grad=false) const | Injects data from a std::initializer_list into the output tensor of the Node, optionally setting its gradient requirement. |
The MatMulNode class performs matrix multiplication between two input tensors. It computes the matrix product in the forward pass and propagates gradients in the backward pass. This node is typically used to represent fully connected layers or other linear algebraic operations in a neural network or computational graph. The node now leverages Tensor Cores for efficient half-precision matrix multiplication, improving performance during forward and backward passes.
Key features:
- The forward() method computes the matrix multiplication of two input tensors and stores the result in the output tensor. The computation is accelerated using Tensor Cores with half-precision (FP16) arithmetic.
- The backward() method propagates the gradients from the output tensor to the input tensors using the chain rule of calculus.

This class is part of the nz::nodes namespace and is used for matrix operations in a computational graph.
Constructor to initialize a MatMulNode for matrix multiplication.
This constructor initializes a MatMulNode, which performs matrix multiplication between the outputs of two input nodes. It ensures that the shapes of the two input tensors are compatible for matrix multiplication: the number of columns of the left input tensor must match the number of rows of the right input tensor. If the shapes do not match, an exception is thrown. The constructor also initializes the output tensor with the appropriate shape based on the input tensors and sets the requires_grad flag based on the input tensors' gradient tracking requirements.
| Parameter | Description |
| --- | --- |
| input_left | A pointer to the first input node. Its output tensor is the left operand of the matrix multiplication. |
| input_right | A pointer to the second input node. Its output tensor is the right operand of the matrix multiplication. |
The constructor checks that the number of columns in the left input tensor (input_left->output->shape()[1]) matches the number of rows in the right input tensor (input_right->output->shape()[0]), as required for matrix multiplication. The output tensor is created with the shape (input_left->output->shape()[0], input_right->output->shape()[1]), and the requires_grad flag is set to true if either of the input tensors requires gradients.
| Exception | Condition |
| --- | --- |
| std::invalid_argument | If the shapes of the input tensors are not compatible for matrix multiplication. |
The requires_grad flag for the output tensor is set based on the gradient requirements of the input tensors.
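The shape rule and gradient-flag rule described for the constructor can be sketched in plain C++ as follows. This is only an illustration, not the library's code: `Shape2D`, `matmulOutputShape`, and `outputRequiresGrad` are hypothetical names introduced here, and the real constructor reads the shapes from the input nodes' output tensors.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a 2-D tensor shape: {rows, cols}.
using Shape2D = std::array<std::size_t, 2>;

// Returns the output shape of (left * right), or throws std::invalid_argument
// when the left column count differs from the right row count.
Shape2D matmulOutputShape(const Shape2D& left, const Shape2D& right) {
    if (left[1] != right[0]) {
        throw std::invalid_argument(
            "matmul: columns of the left input must equal rows of the right input");
    }
    return Shape2D{left[0], right[1]};
}

// The output needs gradients if either input does.
bool outputRequiresGrad(bool leftRequiresGrad, bool rightRequiresGrad) {
    return leftRequiresGrad || rightRequiresGrad;
}
```

For example, a (2, 3) left input and a (3, 4) right input give a (2, 4) output, while (2, 3) and (4, 5) are rejected.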
void MatMulNode::backward () [override, virtual]
Backward pass for the MatMulNode to propagate gradients.
The backward() method computes the gradients with respect to the two input tensors from the gradient of the output tensor. During the backward pass, the gradient of the output tensor is propagated back to both inputs, following the chain rule of calculus.
Specifically:
- For the left input tensor (A), the gradient is computed as dA = dC * B^T, where dC is the gradient of the output tensor and B^T is the transpose of the right input tensor.
- For the right input tensor (B), the gradient is computed as dB = A^T * dC, where A^T is the transpose of the left input tensor and dC is the gradient of the output tensor.

These gradients are computed on the GPU using CUDA kernels (GeneralMatrixMul), which parallelize the matrix operations.
- Gradients are computed only for inputs that require them (requiresGrad() is true).
- The GeneralMatrixMul kernel is used for efficient gradient computation on the GPU.

Implements nz::nodes::Node.
Definition at line 169 of file Nodes.cu.
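The two gradient formulas, dA = dC * B^T and dB = A^T * dC, can be checked against a small CPU reference. The sketch below is an illustration under stated assumptions, not the code in Nodes.cu: `Matrix`, `gradLeft`, and `gradRight` are hypothetical names, the data is single-precision and row-major, and no GPU kernel is involved.

```cpp
#include <cstddef>
#include <vector>

// Row-major dense matrix; a hypothetical stand-in for the library's Tensor.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// dA = dC * B^T. dC is (m x n), B is (k x n), so dA is (m x k).
Matrix gradLeft(const Matrix& dC, const Matrix& B) {
    Matrix dA(dC.rows, B.rows);
    for (std::size_t i = 0; i < dC.rows; ++i)
        for (std::size_t p = 0; p < B.rows; ++p)
            for (std::size_t j = 0; j < dC.cols; ++j)
                dA.at(i, p) += dC.at(i, j) * B.at(p, j);  // B^T[j][p] == B[p][j]
    return dA;
}

// dB = A^T * dC. A is (m x k), dC is (m x n), so dB is (k x n).
Matrix gradRight(const Matrix& A, const Matrix& dC) {
    Matrix dB(A.cols, dC.cols);
    for (std::size_t p = 0; p < A.cols; ++p)
        for (std::size_t j = 0; j < dC.cols; ++j)
            for (std::size_t i = 0; i < A.rows; ++i)
                dB.at(p, j) += A.at(i, p) * dC.at(i, j);  // A^T[p][i] == A[i][p]
    return dB;
}
```

Note that each gradient has the same shape as the input it belongs to, which is a quick sanity check when wiring up a backward pass.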
void MatMulNode::forward () [override, virtual]
Forward pass for the MatMulNode to perform matrix multiplication.
The forward() method computes the matrix multiplication between the two input tensors using CUDA and stores the result in the output tensor. The matrix multiplication is performed by the GeneralMatrixMul kernel on the GPU, which computes the product of the two matrices in parallel.
This method is called during the forward pass of the neural network. It calculates the matrix product of the left input tensor (inputs[0]) and the right input tensor (inputs[1]), and stores the result in the output tensor. The shape of the output tensor is determined by the number of rows in the left input tensor and the number of columns in the right input tensor.
- The GeneralMatrixMul kernel performs the matrix multiplication using parallel computation on the GPU.
- The operation computed is M = A * B, where A is the left input tensor and B is the right input tensor.
- The block size (TILE_SIZE) and grid size are chosen to ensure efficient GPU parallelization of the operation.

Implements nz::nodes::Node.
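To make the tiling idea concrete, here is a CPU sketch of M = A * B that walks the output one tile at a time, the way a grid of thread blocks would cover it on a GPU. It is not the GeneralMatrixMul kernel: `kTileSize`, `tilesFor`, and `tiledMatMul` are hypothetical names, the tile size of 16 is an assumed value (the library's TILE_SIZE may differ), and the arithmetic is single-precision rather than FP16.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed tile edge length; the library's TILE_SIZE may differ.
constexpr std::size_t kTileSize = 16;

// Number of tiles needed to cover `extent` elements (ceiling division).
// On a GPU this would be the grid dimension along one axis.
constexpr std::size_t tilesFor(std::size_t extent) {
    return (extent + kTileSize - 1) / kTileSize;
}

// Computes M = A * B for row-major A (m x k) and B (k x n).
// Each (ti, tj) pair plays the role of one thread block.
std::vector<float> tiledMatMul(const std::vector<float>& A,
                               const std::vector<float>& B,
                               std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> M(m * n, 0.0f);
    for (std::size_t ti = 0; ti < tilesFor(m); ++ti) {
        for (std::size_t tj = 0; tj < tilesFor(n); ++tj) {
            // Clamp the tile so edge tiles do not run past the matrix.
            const std::size_t rowEnd = std::min(m, (ti + 1) * kTileSize);
            const std::size_t colEnd = std::min(n, (tj + 1) * kTileSize);
            for (std::size_t i = ti * kTileSize; i < rowEnd; ++i) {
                for (std::size_t j = tj * kTileSize; j < colEnd; ++j) {
                    float acc = 0.0f;
                    for (std::size_t p = 0; p < k; ++p) {
                        acc += A[i * k + p] * B[p * n + j];
                    }
                    M[i * n + j] = acc;
                }
            }
        }
    }
    return M;
}
```

The ceiling division in `tilesFor` is the usual way to size a grid so that matrices whose dimensions are not a multiple of the tile size are still fully covered; the clamping in the loop bounds corresponds to the bounds check a kernel performs in its edge blocks.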