NeuZephyr
Simple DL Framework
Contains data structures and utilities for tensor operations in machine learning workflows.
Classes | |
class | Dimension |
Represents a multi-dimensional shape, typically used in deep learning for tensor dimensions. | |
class | MappedTensor |
A class for representing multidimensional arrays in CUDA zero-copy memory, providing host-accessible container-like interfaces. | |
class | Tensor |
A class for representing and manipulating multidimensional arrays (tensors) in GPU memory. | |
Functions | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | ReLU (T &input) |
Apply the Rectified Linear Unit (ReLU) activation function element-wise to an input tensor. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | Sigmoid (T &input) |
Apply the sigmoid activation function element-wise to an input tensor. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | Tanh (T &input) |
Apply the hyperbolic tangent (tanh) activation function element-wise to an input tensor. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | LeakyReLU (T &input, const float alpha=0.01f) |
Apply the Leaky Rectified Linear Unit (Leaky ReLU) activation function element-wise to an input tensor. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | Swish (T &input) |
Apply the Swish activation function element-wise to an input tensor. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | ELU (T &input, const float alpha=1.0f) |
Apply the Exponential Linear Unit (ELU) activation function element-wise to an input tensor. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | HardSigmoid (T &input, const float alpha=0.2f, const float beta=0.5f) |
Apply the Hard Sigmoid activation function element-wise to an input tensor. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | HardSwish (T &input, const float alpha=0.5f, const float beta=0.5f) |
Apply the Hard Swish activation function element-wise to an input tensor. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | Softmax (T &input) |
Compute the softmax function for a given input of type T. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | operator+ (T &lhs, const float rhs) |
Overload the addition operator to add a scalar float to a tensor of type T. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | operator+ (const float lhs, T &rhs) |
Overload the addition operator to add a tensor of type T to a scalar float. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | operator- (T &lhs, const float rhs) |
Overload the subtraction operator to subtract a scalar float from a tensor of type T. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | operator- (const float lhs, T &rhs) |
Overload the subtraction operator to subtract a tensor of type T from a scalar float. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | operator* (T &lhs, const float rhs) |
Overload the multiplication operator to multiply a tensor of type T by a scalar float. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | operator* (const float lhs, T &rhs) |
Overload the multiplication operator to multiply a scalar float by a tensor of type T. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | operator/ (T &lhs, const float rhs) |
Overload the division operator to divide a tensor of type T by a scalar float. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | operator/ (const float lhs, T &rhs) |
Overload the division operator to divide a scalar float by a tensor of type T. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, void > | tensorMatrixAdd (T &out, const T &lhs, const T &rhs) |
Performs matrix addition operation on tensors with broadcast compatibility. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, void > | tensorMatrixSub (T &out, const T &lhs, const T &rhs) |
Performs matrix subtraction operation on tensors with broadcast compatibility. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, void > | tensorElementwiseDivide (T &out, const T &lhs, const T &rhs) |
Performs element-wise division operation on tensors with broadcast compatibility. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, void > | tensorGeneralMatrixMul (T &out, const T &lhs, const T &rhs) |
Performs general matrix multiplication on tensors with broadcast compatibility. | |
template<typename T > | |
std::enable_if_t< is_valid_tensor_type< T >::value, T > | transpose (const T &in) |
Transposes a tensor with a valid tensor type. | |
std::ostream & | operator<< (std::ostream &os, const MappedTensor &tensor) |
Overload the << operator to print a MappedTensor object to an output stream. | |
std::istream & | operator>> (std::istream &is, MappedTensor &tensor) |
Overload the >> operator to read data from an input stream into a MappedTensor object. | |
std::ostream & | operator<< (std::ostream &os, const Tensor &tensor) |
Overloads the << operator to print the tensor's data to an output stream. | |
std::istream & | operator>> (std::istream &is, const Tensor &tensor) |
Overloads the >> operator to read a tensor's data from an input stream. | |
Contains data structures and utilities for tensor operations in machine learning workflows.
The `nz::data` namespace provides foundational classes and functions for managing and manipulating tensors in GPU-based computations. It is designed for use in deep learning frameworks and other numerical computing applications.
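Key components within this namespace include:
Dimension | Represents a multi-dimensional shape, typically used in deep learning for tensor dimensions. |
MappedTensor | Represents multidimensional arrays in CUDA zero-copy memory, providing host-accessible container-like interfaces. |
Tensor | Represents and manipulates multidimensional arrays (tensors) in GPU memory. |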
The namespace is intended to encapsulate all tensor-related functionality to ensure modularity and maintainability in the larger nz project.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::ELU ( T & input, const float alpha = 1.0f )
Apply the Exponential Linear Unit (ELU) activation function element-wise to an input tensor.
input | The input tensor (either `Tensor` or `MappedTensor`) to which the ELU function will be applied (device-to-device). |
alpha | The alpha value for the ELU function. It sets the value, -alpha, at which the function saturates for large negative inputs. The default value is 1.0f. |
Returns: A new tensor (either `Tensor` or `MappedTensor`) with the ELU function applied element-wise.
This function applies the ELU activation function, defined as \( f(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha (e^{x} - 1) & \text{if } x < 0 \end{cases} \), to each element of the input tensor. It first creates a new tensor `result` with the same shape and gradient requirement as the input tensor. Then, it calls the `iELU` function to perform the actual ELU operation on the data of the input tensor and store the results in `result`. Finally, `result` is returned.
Memory management: A new tensor `result` is created, and its memory is managed by the tensor's own class (`Tensor` or `MappedTensor`). The memory of the input tensor remains unchanged.
Exception handling: There is no explicit exception handling in this function. If the `iELU` function or the tensor constructors throw exceptions, they propagate to the caller.
Relationship with other components: This function depends on the `iELU` function to perform the ELU operation and on the tensor's constructor to create the new tensor.
[Exception type thrown by `iELU` or the tensor constructors] | Thrown if there are issues during the operation, such as memory allocation failures or incorrect input data. |
Note: The time complexity is O(n), where n is the number of elements in the input tensor (`input.size()`), as the ELU function is applied to each element.
Note: A suitable `alpha` value is recommended for better performance and to avoid the vanishing gradient problem.
Definition at line 241 of file TensorOperations.cuh.
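The ELU definition above can be checked on the host with a plain C++ reference. This is a minimal sketch, not the library's GPU implementation: `elu_reference` is a hypothetical helper that works on a `std::vector<float>` instead of a `Tensor` or `MappedTensor`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for ELU (not part of nz::data):
// f(x) = x for x >= 0, alpha * (exp(x) - 1) for x < 0.
std::vector<float> elu_reference(const std::vector<float>& input, float alpha = 1.0f) {
    std::vector<float> result(input.size());  // new buffer; the input is left unchanged
    for (std::size_t i = 0; i < input.size(); ++i) {
        const float x = input[i];
        result[i] = (x >= 0.0f) ? x : alpha * (std::exp(x) - 1.0f);
    }
    return result;
}
```

A reference like this is mainly useful for comparing against GPU results on small inputs, since it visits each element exactly once in the same O(n) fashion.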
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::HardSigmoid ( T & input, const float alpha = 0.2f, const float beta = 0.5f )
Apply the Hard Sigmoid activation function element-wise to an input tensor.
input | The input tensor (either `Tensor` or `MappedTensor`) to which the Hard Sigmoid function will be applied (device-to-device). |
alpha | The alpha value for the Hard Sigmoid function, controlling the slope of the linear part. The default value is 0.2f. |
beta | The beta value for the Hard Sigmoid function, controlling the bias of the linear part. The default value is 0.5f. |
Returns: A new tensor (either `Tensor` or `MappedTensor`) with the Hard Sigmoid function applied element-wise.
This function applies the Hard Sigmoid activation function, typically defined as \( f(x) = \max(0, \min(1, \alpha x + \beta)) \), to each element of the input tensor. It first creates a new tensor `result` with the same shape and gradient requirement as the input tensor. Then, it calls the `iHardSigmoid` function to perform the actual Hard Sigmoid operation on the data of the input tensor and store the results in `result`. Finally, `result` is returned.
Memory management: A new tensor `result` is created, and its memory is managed by the tensor's own class (`Tensor` or `MappedTensor`). The memory of the input tensor remains unchanged.
Exception handling: There is no explicit exception handling in this function. If the `iHardSigmoid` function or the tensor constructors throw exceptions, they propagate to the caller.
Relationship with other components: This function depends on the `iHardSigmoid` function to perform the Hard Sigmoid operation and on the tensor's constructor to create the new tensor.
[Exception type thrown by `iHardSigmoid` or the tensor constructors] | Thrown if there are issues during the operation, such as memory allocation failures or incorrect input data. |
Note: The time complexity is O(n), where n is the number of elements in the input tensor (`input.size()`), as the Hard Sigmoid function is applied to each element.
Note: The `alpha` and `beta` values can significantly affect the behavior of the Hard Sigmoid function.
Definition at line 281 of file TensorOperations.cuh.
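To see how `alpha` and `beta` shape the output, the clamp formula above can be written as a short host-side function. This is a sketch under the stated definition only; `hard_sigmoid_reference` is a hypothetical name and operates on `std::vector<float>`, not on the library's tensor types.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of nz::data):
// f(x) = max(0, min(1, alpha * x + beta)).
std::vector<float> hard_sigmoid_reference(const std::vector<float>& input,
                                          float alpha = 0.2f, float beta = 0.5f) {
    std::vector<float> result(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        const float linear = alpha * input[i] + beta;        // linear part
        result[i] = std::max(0.0f, std::min(1.0f, linear));  // clamp to [0, 1]
    }
    return result;
}
```

With the defaults, the linear part leaves the [0, 1] range for inputs below -2.5 or above 2.5, so a larger `alpha` narrows the region where the output is not saturated.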
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::HardSwish ( T & input, const float alpha = 0.5f, const float beta = 0.5f )
Apply the Hard Swish activation function element-wise to an input tensor.
input | The input tensor (either `Tensor` or `MappedTensor`) to which the Hard Swish function will be applied (device-to-device). |
alpha | The alpha value for the Hard Swish function, used to scale the input. The default value is 0.5f. |
beta | The beta value for the Hard Swish function, used as an offset. The default value is 0.5f. |
Returns: A new tensor (either `Tensor` or `MappedTensor`) with the Hard Swish function applied element-wise.
This function applies the Hard Swish activation function to each element of the input tensor. The Hard Swish function is often defined as \( f(x) = x \cdot \max(0, \min(1, \alpha x + \beta)) \). It first creates a new tensor `result` with the same shape and gradient requirement as the input tensor. Then, it calls the `iHardSwish` function to perform the actual Hard Swish operation on the data of the input tensor and store the results in `result`. Finally, `result` is returned.
Memory management: A new tensor `result` is created, and its memory is managed by the tensor's own class (`Tensor` or `MappedTensor`). The memory of the input tensor remains unchanged.
Exception handling: There is no explicit exception handling in this function. If the `iHardSwish` function or the tensor constructors throw exceptions, they propagate to the caller.
Relationship with other components: This function depends on the `iHardSwish` function to perform the Hard Swish operation and on the tensor's constructor to create the new tensor.
[Exception type thrown by `iHardSwish` or the tensor constructors] | Thrown if there are issues during the operation, such as memory allocation failures or incorrect input data. |
Note: The time complexity is O(n), where n is the number of elements in the input tensor (`input.size()`), as the Hard Swish function is applied to each element.
Note: The `alpha` and `beta` values can be adjusted to fine-tune the behavior of the Hard Swish function.
Definition at line 321 of file TensorOperations.cuh.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::LeakyReLU ( T & input, const float alpha = 0.01f )
Apply the Leaky Rectified Linear Unit (Leaky ReLU) activation function element-wise to an input tensor.
input | The input tensor (either `Tensor` or `MappedTensor`) to which the Leaky ReLU function will be applied (device-to-device). |
alpha | The slope coefficient for negative values. The default value is 0.01f. |
Returns: A new tensor (either `Tensor` or `MappedTensor`) with the Leaky ReLU function applied element-wise.
This function applies the Leaky ReLU activation function, defined as \( f(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{if } x < 0 \end{cases} \), to each element of the input tensor. It first creates a new tensor `result` with the same shape and gradient requirement as the input tensor. Then, it calls the `iLeakyReLU` function to perform the actual Leaky ReLU operation on the data of the input tensor and store the results in `result`. Finally, `result` is returned.
Memory management: A new tensor `result` is created, and its memory is managed by the tensor's own class (`Tensor` or `MappedTensor`). The memory of the input tensor remains unchanged.
Exception handling: There is no explicit exception handling in this function. If the `iLeakyReLU` function or the tensor constructors throw exceptions, they propagate to the caller.
Relationship with other components: This function depends on the `iLeakyReLU` function to perform the Leaky ReLU operation and on the tensor's constructor to create the new tensor.
[Exception type thrown by `iLeakyReLU` or the tensor constructors] | Thrown if there are issues during the operation, such as memory allocation failures or incorrect input data. |
Note: The time complexity is O(n), where n is the number of elements in the input tensor (`input.size()`), as the Leaky ReLU function is applied to each element.
Note: `alpha` should be a small positive number to avoid the vanishing gradient problem for negative inputs.
Definition at line 165 of file TensorOperations.cuh.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator* ( const float lhs, T & rhs )
Overload the multiplication operator to multiply a scalar float by a tensor of type T.
lhs | A constant float value representing the left-hand side scalar to multiply the tensor by. |
rhs | A reference to the right-hand side tensor of type T. The tensor data is used in the multiplication operation. |
This template operator overload is enabled only when `is_valid_tensor_type<T>::value` is true, a compile-time check made through `std::enable_if_t`. It constructs a new tensor `result` with the same shape and gradient requirement as `rhs`, invokes the `iScalarMul` function to multiply each element of `rhs` by the scalar `lhs`, and returns `result`.
Memory management: A new tensor `result` is created within the function, and its memory allocation depends on the constructor of type T. The memory of `result` is managed by its destructor when it goes out of scope.
Exception handling: If the `iScalarMul` function or the constructor of type T throws an exception, it is propagated to the caller.
Relationship with other components: Depends on the `iScalarMul` function to perform the actual multiplication operation, and relies on the `shape()` and `requiresGrad()` member functions of type T.
Note: The time complexity is O(n), where n is the number of elements in `rhs`, because the `iScalarMul` function needs to iterate over each element of the tensor.
Note: This operator can only be used with types T that satisfy `is_valid_tensor_type<T>::value`.
Note: Ensure that `rhs` has valid shape, gradient requirement, and size information.
Definition at line 646 of file TensorOperations.cuh.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator* ( T & lhs, const float rhs )
Overload the multiplication operator to multiply a tensor of type T by a scalar float.
lhs | A reference to the left-hand side tensor of type T. The tensor data is used as the base for the multiplication operation. |
rhs | A constant float value representing the right-hand side scalar to multiply the tensor by. |
This template operator overload is enabled only when `is_valid_tensor_type<T>::value` is true, a compile-time check made through `std::enable_if_t`. It creates a new tensor `result` with the same shape and gradient requirement as `lhs`, calls the `iScalarMul` function to multiply each element of `lhs` by the scalar `rhs`, and returns `result`.
Memory management: A new tensor `result` is created inside the function, which may allocate memory based on the constructor of type T. The memory of `result` is managed by its destructor when it goes out of scope.
Exception handling: If the `iScalarMul` function or the constructor of type T throws an exception, it is propagated to the caller.
Relationship with other components: Depends on the `iScalarMul` function to perform the actual multiplication operation, and relies on the `shape()` and `requiresGrad()` member functions of type T.
Note: The time complexity is O(n), where n is the number of elements in `lhs`, because the `iScalarMul` function needs to iterate over each element of the tensor.
Note: This operator can only be used with types T that satisfy `is_valid_tensor_type<T>::value`.
Note: Ensure that `lhs` has valid shape, gradient requirement, and size information.
Definition at line 604 of file TensorOperations.cuh.
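The `std::enable_if_t< is_valid_tensor_type< T >::value, T >` return type in these signatures removes the overload from consideration for non-tensor types at compile time. The sketch below reproduces that pattern with stand-ins: `ToyTensor` and `is_toy_tensor` are hypothetical and only play the roles of a tensor class and of `is_valid_tensor_type`; the loop replaces the library's `iScalarMul` call.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a tensor class (not nz::data::Tensor).
struct ToyTensor {
    std::vector<float> values;
};

// Hypothetical trait in the role of is_valid_tensor_type: false unless specialized.
template <typename T>
struct is_toy_tensor : std::false_type {};
template <>
struct is_toy_tensor<ToyTensor> : std::true_type {};

// tensor * scalar: only participates in overload resolution for accepted types.
template <typename T>
std::enable_if_t<is_toy_tensor<T>::value, T> operator*(T& lhs, const float rhs) {
    T result;  // new object; lhs is not modified
    result.values.resize(lhs.values.size());
    for (std::size_t i = 0; i < lhs.values.size(); ++i) {
        result.values[i] = lhs.values[i] * rhs;
    }
    return result;
}

// scalar * tensor: forwards to the overload above, since multiplication commutes.
template <typename T>
std::enable_if_t<is_toy_tensor<T>::value, T> operator*(const float lhs, T& rhs) {
    return rhs * lhs;
}
```

Because the constraint lives in the return type, an expression such as `some_string * 2.0f` simply finds no matching operator instead of instantiating a body that fails to compile.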
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator+ ( const float lhs, T & rhs )
Overload the addition operator to add a tensor of type T to a scalar float.
lhs | A constant float value representing the left-hand side scalar to be added to the tensor. |
rhs | A reference to the right-hand side tensor of type T. The tensor data is used to perform the addition operation. |
This template operator overload is enabled only when `is_valid_tensor_type<T>::value` is true, a compile-time check made through `std::enable_if_t`. It creates a new tensor `result` with the same shape and gradient requirement as `rhs`, calls the `iScalarAdd` function to add the scalar `lhs` to each element of `rhs` and store the sums in `result`, and returns `result`.
Memory management: A new tensor `result` is created inside the function, which may allocate memory according to the constructor of type T. The memory of `result` is managed by its destructor when it goes out of scope.
Exception handling: If the `iScalarAdd` function or the constructor of type T throws an exception, it is propagated to the caller.
Relationship with other components: Depends on the `iScalarAdd` function to perform the actual scalar-tensor addition, and relies on the `shape()` and `requiresGrad()` member functions of type T.
Note: The time complexity is O(n), where n is the number of elements in `rhs`, because the `iScalarAdd` function needs to iterate over each element of the tensor.
Note: This operator can only be used with types T that satisfy `is_valid_tensor_type<T>::value`.
Note: Ensure that `rhs` has valid shape, gradient requirement, and size information.
Definition at line 478 of file TensorOperations.cuh.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator+ ( T & lhs, const float rhs )
Overload the addition operator to add a scalar float to a tensor of type T.
lhs | A reference to the left-hand side tensor of type T. The tensor data is used as the base for the addition operation; the sums are written to a new tensor. |
rhs | A constant float value representing the right-hand side scalar to be added to the tensor. |
This template operator overload adds a scalar float value to a tensor. It is enabled only when `is_valid_tensor_type<T>::value` is true, a compile-time check made through `std::enable_if_t`. It creates a new tensor `result` with the same shape and gradient requirement as `lhs`, calls the `iScalarAdd` function to add the scalar `rhs` to each element of `lhs` and store the sums in `result`, and returns `result`.
Memory management: A new tensor `result` is created inside the function, which may allocate memory depending on the constructor of type T. The memory of `result` is managed by its destructor when it goes out of scope.
Exception handling: If the `iScalarAdd` function or the constructor of type T throws an exception, it is propagated to the caller.
Relationship with other components: Depends on the `iScalarAdd` function to perform the actual scalar-tensor addition, and relies on the `shape()` and `requiresGrad()` member functions of type T.
Note: The time complexity is O(n), where n is the number of elements in `lhs`, because the `iScalarAdd` function needs to iterate over each element of the tensor.
Note: This operator can only be used with types T that satisfy `is_valid_tensor_type<T>::value`.
Note: Ensure that `lhs` has valid shape, gradient requirement, and size information.
Definition at line 436 of file TensorOperations.cuh.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator- ( const float lhs, T & rhs )
Overload the subtraction operator to subtract a tensor of type T from a scalar float.
lhs | A constant float value representing the left-hand side scalar from which the tensor will be subtracted. |
rhs | A reference to the right-hand side tensor of type T. The tensor data is used in the subtraction operation. |
This template operator overload is enabled only when `is_valid_tensor_type<T>::value` is true, a compile-time check made through `std::enable_if_t`. It creates a new tensor `result` by negating the tensor `rhs`, calls the `iScalarAdd` function to add the scalar `lhs` to each element of the negated tensor, and returns `result`.
Memory management: A new tensor `result` is created inside the function, which may allocate memory according to the constructor of type T. The memory of `result` is managed by its destructor when it goes out of scope.
Exception handling: If the negation of `rhs`, the `iScalarAdd` function, or the constructor of type T throws an exception, it is propagated to the caller.
Relationship with other components: Depends on the `iScalarAdd` function to perform the addition of the scalar to the negated tensor.
Note: The time complexity is O(n), where n is the number of elements in `rhs`, because both the negation operation and the `iScalarAdd` function need to iterate over each element of the tensor.
Note: This operator can only be used with types T that satisfy `is_valid_tensor_type<T>::value`.
Note: Ensure that `rhs` has valid shape, gradient requirement, and size information.
Definition at line 562 of file TensorOperations.cuh.
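The two-step scheme described above, negate the tensor and then add the scalar, relies on the identity lhs - x = (-x) + lhs. A minimal host-side sketch of those two passes follows; `scalar_minus_tensor` is a hypothetical helper on `std::vector<float>` and does not call the library's negation or `iScalarAdd`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical host-side illustration (not part of nz::data) of lhs - rhs[i]
// computed as negation followed by scalar addition.
std::vector<float> scalar_minus_tensor(float lhs, const std::vector<float>& rhs) {
    std::vector<float> result(rhs.size());
    // Pass 1: result = -rhs
    std::transform(rhs.begin(), rhs.end(), result.begin(),
                   [](float v) { return -v; });
    // Pass 2: result += lhs, element by element
    for (float& v : result) {
        v += lhs;
    }
    return result;
}
```

Both passes touch every element once, which matches the O(n) note above.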
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator- ( T & lhs, const float rhs )
Overload the subtraction operator to subtract a scalar float from a tensor of type T.
lhs | A reference to the left-hand side tensor of type T. The tensor data is used as the base for the subtraction operation. |
rhs | A constant float value representing the right-hand side scalar to be subtracted from the tensor. |
This template operator overload is enabled only when `is_valid_tensor_type<T>::value` is true, a compile-time check made through `std::enable_if_t`. It creates a new tensor `result` with the same shape and gradient requirement as `lhs`, calls the `iScalarAdd` function with `-rhs` as the scalar to be added to each element of `lhs`, and returns `result`.
Memory management: A new tensor `result` is created inside the function, which may allocate memory based on the constructor of type T. The memory of `result` is managed by its destructor when it goes out of scope.
Exception handling: If the `iScalarAdd` function or the constructor of type T throws an exception, it is propagated to the caller.
Relationship with other components: Depends on the `iScalarAdd` function to perform the actual subtraction operation (by adding the negative of the scalar), and relies on the `shape()` and `requiresGrad()` member functions of type T.
Note: The time complexity is O(n), where n is the number of elements in `lhs`, because the `iScalarAdd` function needs to iterate over each element of the tensor.
Note: This operator can only be used with types T that satisfy `is_valid_tensor_type<T>::value`.
Note: Ensure that `lhs` has valid shape, gradient requirement, and size information.
Definition at line 520 of file TensorOperations.cuh.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator/ ( const float lhs, T & rhs )
Overload the division operator to divide a scalar float by a tensor of type T.
lhs | A constant float value representing the left-hand side scalar dividend. |
rhs | A reference to the right-hand side tensor of type T. The tensor data is used as the divisor for the division operation. |
This template operator overload is enabled only when `is_valid_tensor_type<T>::value` is true, a compile-time check made through `std::enable_if_t`. It creates a copy of the tensor `rhs` named `result`, calls the `recip` method of `result` to compute the reciprocal of each element, and then uses the `iScalarMul` function to multiply each element of `result` by the scalar `lhs`.
Memory management: A copy of `rhs` is created as `result`, and its memory allocation depends on the copy constructor of type T. The memory of `result` is managed by its destructor when it goes out of scope.
Exception handling: If the `recip` method, the `iScalarMul` function, or the copy constructor of type T throws an exception, it is propagated to the caller.
Relationship with other components: Depends on the `recip` method of type T to compute the reciprocal of each element in the tensor, and on the `iScalarMul` function to perform the multiplication operation.
Note: The time complexity is O(n), where n is the number of elements in `rhs`, because both the `recip` method and the `iScalarMul` function need to iterate over each element of the tensor.
Note: This operator can only be used with types T that satisfy `is_valid_tensor_type<T>::value`.
Note: Ensure that `rhs` has valid shape, gradient requirement, and size information.
Note: Take care when any element of `rhs` is zero, to avoid division-by-zero errors during the `recip` operation.
Definition at line 732 of file TensorOperations.cuh.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator/ ( T & lhs, const float rhs )
Overload the division operator to divide a tensor of type T by a scalar float.
lhs | A reference to the left-hand side tensor of type T. The tensor data is used as the dividend for the division operation. |
rhs | A constant float value representing the right-hand side scalar divisor. |
This template operator overload is enabled only when `is_valid_tensor_type<T>::value` is true, a compile-time check made through `std::enable_if_t`. It creates a new tensor `result` with the same shape and gradient requirement as `lhs`, calls the `iScalarDiv` function to divide each element of `lhs` by the scalar `rhs`, and returns `result`.
Memory management: A new tensor `result` is created inside the function, and its memory allocation depends on the constructor of type T. The memory of `result` is managed by its destructor when it goes out of scope.
Exception handling: If the `iScalarDiv` function or the constructor of type T throws an exception, it is propagated to the caller.
Relationship with other components: Depends on the `iScalarDiv` function to perform the actual division operation, and relies on the `shape()` and `requiresGrad()` member functions of type T.
Note: The time complexity is O(n), where n is the number of elements in `lhs`, because the `iScalarDiv` function needs to iterate over each element of the tensor.
Note: This operator can only be used with types T that satisfy `is_valid_tensor_type<T>::value`.
Note: Ensure that `lhs` has valid shape, gradient requirement, and size information.
Note: Ensure that `rhs` is not zero to avoid division-by-zero errors.
Definition at line 689 of file TensorOperations.cuh.
std::ostream & nz::data::operator<< ( std::ostream & os, const MappedTensor & tensor )
Overload the << operator to print a MappedTensor object to an output stream.
os | An output stream (host-to-host) where the MappedTensor data and gradient will be printed. |
tensor | A constant reference (host-to-host) to the MappedTensor object to be printed. |
Returns: The output stream `os` after printing the tensor data and possibly its gradient.
This function provides a convenient way to print a MappedTensor object using the << operator. It first calls the `print` method of the MappedTensor to print the tensor's data. If the tensor requires gradients, it then prints a header "Gradient: " followed by the gradient data using the `printGrad` method.
Memory management: The function does not allocate or deallocate any memory. It relies on the `print` and `printGrad` methods of the MappedTensor, which also do not perform memory allocation.
Exception handling: If the tensor requires gradients and an exception occurs during the `printGrad` call (e.g., due to an invalid state of the output stream or incorrect internal data), the exception is propagated. If the tensor does not require gradients, the `printGrad` call is skipped, and no exception related to gradient printing is thrown.
Relationship with other components: This function is related to the data presentation component of the MappedTensor. It integrates the `print` and `printGrad` methods to provide a unified way of printing the tensor and its gradient.
std::invalid_argument | Propagated from the `printGrad` method if the tensor requires gradients and there is an issue with gradient printing. |
Note: The time complexity is O(m * n), where m is the number of rows (`_shape[0]`) and n is the number of columns (`_shape[1]`) of the tensor, as it iterates over the tensor data and possibly the gradient data.
Note: Ensure that the output stream `os` is in a valid state before calling this function.
Definition at line 45 of file MappedTensor.cu.
std::ostream & nz::data::operator<< ( std::ostream & os, const Tensor & tensor )
Overloads the `<<` operator to print the tensor's data to an output stream.
This function is a friend of the `Tensor` class and provides an overloaded version of the output stream operator (`<<`) to print the contents of a tensor to the specified output stream (e.g., `std::cout` or a file stream).
The tensor's data is first copied from GPU memory to host memory for printing, and then the data is printed in a 2D matrix format. Each row of the tensor is printed on a new line, and each element in a row is separated by a space. Each row is enclosed in square brackets.
os | The output stream to which the tensor will be printed. |
tensor | The tensor whose contents will be printed. |
Returns: The output stream (`os`) after the tensor has been printed, allowing for chaining of operations.
Note: As a friend of `Tensor`, this operator accesses the tensor's internal data (`_data`) directly.
Note: The data is copied from GPU memory to host memory using `cudaMemcpy`, which may introduce performance overhead for large tensors.
std::istream & nz::data::operator>> ( std::istream & is, const Tensor & tensor )
Overloads the `>>` operator to read a tensor's data from an input stream.
This function is a friend of the `Tensor` class and provides an overloaded version of the input stream operator (`>>`) to read the contents of a tensor from the specified input stream (e.g., `std::cin` or a file stream).
The function reads the tensor's data element by element from the input stream and stores the values in a temporary buffer. Once all the data has been read, it is copied from host memory back into the tensor's GPU memory using `cudaMemcpy`.
is | The input stream from which the tensor's data will be read. |
tensor | The tensor into which the data will be read. |
Returns: The input stream (`is`) after reading the tensor's data, allowing for chaining of operations.
std::istream & nz::data::operator>> ( std::istream & is, MappedTensor & tensor )
Overload the >> operator to read data from an input stream into a MappedTensor object.
is | An input stream (host-to-host) from which the data will be read. |
tensor | A reference (host-to-host) to the MappedTensor object where the data will be stored. |
Returns: The input stream `is` after the reading operation.
This function provides a convenient way to populate a MappedTensor object with data from an input stream. It iterates through the elements of the tensor and reads values from the input stream one by one, until either all elements of the tensor have been filled or the input stream fails to provide more data.
Memory management: The function does not allocate or deallocate any memory. It assumes that the `_data` array of the MappedTensor has already been allocated with the appropriate size (`_size`).
Exception handling: If the input stream fails to provide data (e.g., due to end-of-file or an invalid input format), the loop terminates, and the function returns the input stream in its current state. No exceptions are thrown by this function itself, but the `>>` operator on the input stream may throw exceptions depending on its implementation.
Relationship with other components: This function is related to the data input component of the MappedTensor. It integrates with the standard input stream to allow easy data population.
Note: The time complexity is O(n), where n is the number of elements in the tensor (`_size`), as it iterates through each element of the tensor once.
Definition at line 81 of file MappedTensor.cu.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::ReLU | ( | T & | input | ) |
Apply the Rectified Linear Unit (ReLU) activation function element-wise to an input tensor.
input | The input tensor (either Tensor or MappedTensor) to which the ReLU function will be applied (device-to-device). |
Returns a new tensor of the same type as the input (Tensor or MappedTensor) with the ReLU function applied element-wise.
This function applies the ReLU activation function, defined as \( f(x) = \max(0, x) \), to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iRELU function to perform the actual ReLU operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.
Memory management: A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.
Exception handling: There is no explicit exception handling in this function. If the iRELU function or the tensor constructors throw exceptions, they propagate to the caller.
Relationship with other components: This function depends on the iRELU function to perform the ReLU operation and on the tensor's constructor to create a new tensor.
[Exception type thrown by iRELU or tensor constructors] | If there are issues during the operation, such as memory allocation failures or incorrect input data. |
Time complexity: O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the ReLU function to each element.
Definition at line 50 of file TensorOperations.cuh.
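As a host-side illustration of the ReLU behaviour described above, the sketch below applies \( \max(0, x) \) to every element and returns a new container while leaving the input untouched. The name reluReference and the use of std::vector<float> are assumptions made for this example; the library itself dispatches to iRELU on device data.

```cpp
#include <algorithm>
#include <vector>

// CPU reference for element-wise ReLU, f(x) = max(0, x).
// The input is read-only; a new vector of the same size is returned.
std::vector<float> reluReference(const std::vector<float>& input) {
    std::vector<float> result(input.size());
    std::transform(input.begin(), input.end(), result.begin(),
                   [](float x) { return std::max(0.0f, x); });
    return result;
}
```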
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::Sigmoid | ( | T & | input | ) |
Apply the sigmoid activation function element-wise to an input tensor.
input | The input tensor (either Tensor or MappedTensor) to which the sigmoid function will be applied (device-to-device). |
Returns a new tensor of the same type as the input (Tensor or MappedTensor) with the sigmoid function applied element-wise.
This function applies the sigmoid activation function, defined as \( f(x) = \frac{1}{1 + e^{-x}} \), to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iSigmoid function to perform the actual sigmoid operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.
Memory management: A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.
Exception handling: There is no explicit exception handling in this function. If the iSigmoid function or the tensor constructors throw exceptions, they propagate to the caller.
Relationship with other components: This function depends on the iSigmoid function to perform the sigmoid operation and on the tensor's constructor to create a new tensor.
[Exception type thrown by iSigmoid or tensor constructors] | If there are issues during the operation, such as memory allocation failures or incorrect input data. |
Time complexity: O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the sigmoid function to each element.
Definition at line 88 of file TensorOperations.cuh.
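The sigmoid formula above can be checked with a small host-side reference. This is only a sketch: sigmoidReference is a hypothetical name, and std::vector<float> replaces the tensor types.

```cpp
#include <cmath>
#include <vector>

// CPU reference for element-wise sigmoid, f(x) = 1 / (1 + exp(-x)).
std::vector<float> sigmoidReference(const std::vector<float>& input) {
    std::vector<float> result;
    result.reserve(input.size());
    for (float x : input) {
        result.push_back(1.0f / (1.0f + std::exp(-x)));
    }
    return result;
}
```

A quick sanity check is that the output at 0 is 0.5 and that the outputs at x and -x sum to 1.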
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::Softmax | ( | T & | input | ) |
Compute the softmax function for a given input of type T.
input | The input object of type T for which the softmax function will be computed. As the signature shows, the input is passed by non-const reference, so no copy of the input is made. |
This function computes the softmax function for the given input. It first creates a new object result with the same shape and gradient requirement as the input. Then, it calls the iSoftmax function to perform the actual softmax computation. The iSoftmax function takes the data pointers of the result and input, the exponential sum of the input, and the size of the input as parameters. Finally, the computed result is returned.
Memory management:
A new object result is created inside the function, which may allocate memory depending on the implementation of the constructor of type T. The memory for the result is managed by the destructor of the object when it goes out of scope.
Exception handling:
If the iSoftmax function or the constructor of type T throws an exception, it propagates to the caller.
Relationship with other components:
Depends on the iSoftmax function to perform the actual softmax computation.
Relies on the shape(), requiresGrad(), expSum(), and size() member functions of type T.
Time complexity: determined by the iSoftmax function. If the iSoftmax function has a time complexity of O(n), where n is the size of the input, then the overall time complexity of this function is also O(n).
Note: Ensure that input has valid shape, gradient requirement, exponential sum, and size information.
Definition at line 364 of file TensorOperations.cuh.
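The Softmax description mentions an exponential sum that is passed to iSoftmax together with the data pointers and the size. A host-side sketch of that division of labour might look like the following. All three function names are hypothetical, and the assumption that each output is exp(x_i) divided by the exponential sum is the standard softmax definition rather than something this page states.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host counterpart of expSum(): sum of exp(x_i) over all elements.
float expSumReference(const std::vector<float>& input) {
    float sum = 0.0f;
    for (float x : input) {
        sum += std::exp(x);
    }
    return sum;
}

// Hypothetical host counterpart of iSoftmax: out[i] = exp(in[i]) / expSum.
void softmaxKernelReference(float* out, const float* in, float expSum, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::exp(in[i]) / expSum;
    }
}

// Creates the result, then delegates to the kernel-like helper.
std::vector<float> softmaxReference(const std::vector<float>& input) {
    std::vector<float> result(input.size());
    softmaxKernelReference(result.data(), input.data(), expSumReference(input), input.size());
    return result;
}
```

This naive form can overflow for large inputs because exp grows quickly; subtracting the maximum element before exponentiating is a common remedy, though whether the library does so is not stated here.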
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::Swish | ( | T & | input | ) |
Apply the Swish activation function element-wise to an input tensor.
input | The input tensor (either Tensor or MappedTensor) to which the Swish function will be applied (device-to-device). |
Returns a new tensor of the same type as the input (Tensor or MappedTensor) with the Swish function applied element-wise.
This function applies the Swish activation function, defined as \( f(x) = x \cdot \sigma(x) \), where \( \sigma(x) = \frac{1}{1 + e^{-x}} \) is the sigmoid function, to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iSwish function to perform the actual Swish operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.
Memory management: A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.
Exception handling: There is no explicit exception handling in this function. If the iSwish function or the tensor constructors throw exceptions, they propagate to the caller.
Relationship with other components: This function depends on the iSwish function to perform the Swish operation and on the tensor's constructor to create a new tensor.
[Exception type thrown by iSwish or tensor constructors] | If there are issues during the operation, such as memory allocation failures or incorrect input data. |
Time complexity: O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the Swish function to each element.
Definition at line 202 of file TensorOperations.cuh.
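Since Swish is the input multiplied by its sigmoid, it simplifies to \( x / (1 + e^{-x}) \). The host-side sketch below uses that form; swishReference is a hypothetical name chosen for this example.

```cpp
#include <cmath>
#include <vector>

// CPU reference for element-wise Swish, f(x) = x * sigmoid(x) = x / (1 + exp(-x)).
std::vector<float> swishReference(const std::vector<float>& input) {
    std::vector<float> result;
    result.reserve(input.size());
    for (float x : input) {
        result.push_back(x / (1.0f + std::exp(-x)));
    }
    return result;
}
```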
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::Tanh | ( | T & | input | ) |
Apply the hyperbolic tangent (tanh) activation function element-wise to an input tensor.
input | The input tensor (either Tensor or MappedTensor) to which the tanh function will be applied (device-to-device). |
Returns a new tensor of the same type as the input (Tensor or MappedTensor) with the tanh function applied element-wise.
This function applies the hyperbolic tangent activation function, defined as \( f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \), to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iTanh function to perform the actual tanh operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.
Memory management: A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.
Exception handling: There is no explicit exception handling in this function. If the iTanh function or the tensor constructors throw exceptions, they propagate to the caller.
Relationship with other components: This function depends on the iTanh function to perform the tanh operation and on the tensor's constructor to create a new tensor.
[Exception type thrown by iTanh or tensor constructors] | If there are issues during the operation, such as memory allocation failures or incorrect input data. |
Time complexity: O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the tanh function to each element.
Definition at line 126 of file TensorOperations.cuh.
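The tanh definition above can be evaluated directly, although the standard library's std::tanh is usually preferable. The sketch below shows both; the function names are hypothetical and the code runs on the host with std::vector<float> in place of a tensor.

```cpp
#include <cmath>
#include <vector>

// Literal evaluation of (e^x - e^-x) / (e^x + e^-x).
// For large |x| the exponentials overflow, so this form is for illustration only.
float tanhFromDefinition(float x) {
    const float pos = std::exp(x);
    const float neg = std::exp(-x);
    return (pos - neg) / (pos + neg);
}

// CPU reference for element-wise tanh using the standard library.
std::vector<float> tanhReference(const std::vector<float>& input) {
    std::vector<float> result;
    result.reserve(input.size());
    for (float x : input) {
        result.push_back(std::tanh(x));
    }
    return result;
}
```

For moderate inputs the two agree to within floating-point rounding, which makes the literal form a handy cross-check.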
std::enable_if_t< is_valid_tensor_type< T >::value, void > nz::data::tensorElementwiseDivide | ( | T & | out, |
const T & | lhs, | ||
const T & | rhs ) |
Performs element-wise division operation on tensors with broadcast compatibility.
This template function divides each element of the tensor lhs by the corresponding element of the tensor rhs and stores the result in the tensor out. It is only enabled for types T that satisfy is_valid_tensor_type<T>::value. The shapes of the input tensors must be broadcast compatible, and their height and width dimensions must match.
T | The tensor type, which must satisfy is_valid_tensor_type<T>::value. |
out | The output tensor where the result of the element-wise division will be stored. Memory flow: host-to-function (reference), function-to-host (modified). |
lhs | The left-hand side tensor in the division operation. Memory flow: host-to-function. |
rhs | The right-hand side tensor in the division operation. Memory flow: host-to-function. |
Memory Management Strategy:
The function uses std::vector objects (offsetC, offsetA, offsetB) to store offset values. These vectors are automatically managed by their destructors.
Exception Handling Mechanism:
Throws std::invalid_argument if the shapes of lhs and rhs are not broadcast compatible or if their height and width dimensions do not match.
Relationship with Other Components:
Depends on the shape() method of the tensor type T to access shape information, including broadcast compatibility, height, width, batch size, channel count, and strides.
Calls the iElementwiseDivide function to perform the actual element-wise division.
std::invalid_argument | When the shapes of lhs and rhs are not broadcast compatible or their height and width dimensions do not match. |
Time complexity: O(m * n), where m is the product of the first two dimensions of out (out.shape()[0] * out.shape()[1]), and n is the number of elements in a single matrix (lhs.shape().H() * lhs.shape().W()).
Definition at line 928 of file TensorOperations.cuh.
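The shape check and the element-wise division described for tensorElementwiseDivide can be sketched for a single matrix as follows. The Matrix struct and divideReference are hypothetical names introduced for this example, and resizing out inside the function is a simplification; the library function presumably expects out to be allocated already.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal matrix type used only for this sketch.
struct Matrix {
    std::size_t height;
    std::size_t width;
    std::vector<float> data;  // row-major, expected to hold height * width elements
};

// Divides lhs by rhs element-wise into out. Throws std::invalid_argument when
// the heights or widths differ, mirroring the documented precondition.
void divideReference(Matrix& out, const Matrix& lhs, const Matrix& rhs) {
    if (lhs.height != rhs.height || lhs.width != rhs.width) {
        throw std::invalid_argument("height and width of lhs and rhs must match");
    }
    out.height = lhs.height;
    out.width = lhs.width;
    out.data.resize(lhs.data.size());  // simplification: the sketch sizes out itself
    for (std::size_t i = 0; i < lhs.data.size(); ++i) {
        out.data[i] = lhs.data[i] / rhs.data[i];
    }
}
```

Validating before writing means out is left untouched when the shapes are rejected.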
std::enable_if_t< is_valid_tensor_type< T >::value, void > nz::data::tensorGeneralMatrixMul | ( | T & | out, |
const T & | lhs, | ||
const T & | rhs ) |
Performs general matrix multiplication on tensors with broadcast compatibility.
This template function multiplies the tensor lhs by the tensor rhs and stores the result in the tensor out. It is only enabled for types T that satisfy is_valid_tensor_type<T>::value. The shapes of the input tensors must be broadcast compatible, and the width of lhs must be equal to the height of rhs.
T | The tensor type, which must satisfy is_valid_tensor_type<T>::value. |
out | The output tensor that will hold the result of the matrix multiplication. Memory flow: host-to-function (reference), function-to-host (modified). |
lhs | The left-hand side tensor in the matrix multiplication. Memory flow: host-to-function. |
rhs | The right-hand side tensor in the matrix multiplication. Memory flow: host-to-function. |
Memory Management Strategy:
The function uses std::vector objects (offsetC, offsetA, offsetB) to store offset values. These vectors are automatically managed by their destructors.
Exception Handling Mechanism:
Throws std::invalid_argument if the shapes of lhs and rhs are not broadcast compatible or if the width of lhs is not equal to the height of rhs.
Relationship with Other Components:
Depends on the shape() method of the tensor type T to obtain shape information, such as broadcast compatibility, height, width, batch size, channel count, and strides.
Calls the iGeneralMatrixMul function to perform the actual matrix multiplication.
std::invalid_argument | When the shapes of lhs and rhs are not broadcast compatible or the width of lhs is not equal to the height of rhs. |
Time complexity: expressed in terms of m, k, and n, where m is the height of lhs, k is the width of lhs (equal to the height of rhs), and n is the width of rhs.
Definition at line 1000 of file TensorOperations.cuh.
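The dimension rule for tensorGeneralMatrixMul (width of lhs equal to height of rhs) is easiest to see in a plain triple loop over one pair of matrices. The sketch below is a host-side reference under assumed row-major storage; matMulReference is a hypothetical name, and the loop does m * k * n multiply-adds, which is not necessarily how iGeneralMatrixMul is organised.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reference C (m x n) = A (m x k) * B (k x n), all stored row-major.
// Throws std::invalid_argument if the buffer sizes do not match the dimensions.
std::vector<float> matMulReference(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t m, std::size_t k, std::size_t n) {
    if (a.size() != m * k || b.size() != k * n) {
        throw std::invalid_argument("buffer sizes do not match m, k, n");
    }
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const float aip = a[i * k + p];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
    return c;
}
```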
std::enable_if_t< is_valid_tensor_type< T >::value, void > nz::data::tensorMatrixAdd | ( | T & | out, |
const T & | lhs, | ||
const T & | rhs ) |
Performs matrix addition operation on tensors with broadcast compatibility.
This template function adds the tensors lhs and rhs and stores the result in out. It only accepts tensor types for which is_valid_tensor_type<T>::value is true. The shapes of the input tensors must be broadcast compatible, and the height and width dimensions must match.
T | The tensor type. This type must satisfy is_valid_tensor_type<T>::value. |
out | The output tensor where the result of the addition will be stored. Memory flow: host-to-function (for reference), function-to-host (modifies the object). |
lhs | The left-hand side tensor of the addition. Memory flow: host-to-function. |
rhs | The right-hand side tensor of the addition. Memory flow: host-to-function. |
Memory Management Strategy:
The function uses std::vector objects (offsetC, offsetA, offsetB) to store offset values. These vectors are automatically managed by their destructors.
Exception Handling Mechanism:
Throws std::invalid_argument if the shapes of lhs and rhs are not broadcast compatible or if their height and width dimensions do not match.
Relationship with Other Components:
Depends on the shape() method of the tensor type T to access shape information, including broadcast compatibility, height, width, number of batches, number of channels, and strides.
Calls the iMatrixAdd function to perform the actual matrix addition operation.
std::invalid_argument | When the shapes of lhs and rhs are not broadcast compatible or their height and width dimensions do not match. |
Time complexity: O(m * n), where m is the product of the first two dimensions of out (out.shape()[0] * out.shape()[1]), and n is the number of elements in a single matrix (lhs.shape().H() * lhs.shape().W()).
Definition at line 787 of file TensorOperations.cuh.
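The role of the per-matrix offsets (offsetC, offsetA, offsetB) in tensorMatrixAdd can be illustrated with a host-side sketch. Everything here is an assumption made for illustration: the Shape4 struct, the function names, contiguous (batch, channel, height, width) storage, and the rule that a batch or channel dimension of size 1 is broadcast against a larger one. The library's actual broadcast rule is not spelled out on this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical 4-D shape (batch, channel, height, width) for this sketch.
struct Shape4 {
    std::size_t n, c, h, w;
    std::size_t matrixSize() const { return h * w; }
};

// Offset of the (batch, channel) matrix inside a contiguous buffer.
// A dimension of size 1 is broadcast by always reading index 0.
std::size_t matrixOffset(const Shape4& s, std::size_t batch, std::size_t channel) {
    const std::size_t b = (s.n == 1) ? 0 : batch;
    const std::size_t ch = (s.c == 1) ? 0 : channel;
    return (b * s.c + ch) * s.matrixSize();
}

// Adds lhs and rhs matrix by matrix, broadcasting batch and channel.
std::vector<float> addBroadcastReference(const Shape4& sa, const std::vector<float>& a,
                                         const Shape4& sb, const std::vector<float>& b) {
    const bool batchOk = sa.n == sb.n || sa.n == 1 || sb.n == 1;
    const bool channelOk = sa.c == sb.c || sa.c == 1 || sb.c == 1;
    if (!batchOk || !channelOk || sa.h != sb.h || sa.w != sb.w) {
        throw std::invalid_argument("shapes are not broadcast compatible");
    }
    const Shape4 so{std::max(sa.n, sb.n), std::max(sa.c, sb.c), sa.h, sa.w};
    std::vector<float> out(so.n * so.c * so.matrixSize());
    for (std::size_t batch = 0; batch < so.n; ++batch) {
        for (std::size_t channel = 0; channel < so.c; ++channel) {
            const std::size_t offC = matrixOffset(so, batch, channel);
            const std::size_t offA = matrixOffset(sa, batch, channel);
            const std::size_t offB = matrixOffset(sb, batch, channel);
            for (std::size_t i = 0; i < so.matrixSize(); ++i) {
                out[offC + i] = a[offA + i] + b[offB + i];
            }
        }
    }
    return out;
}
```

Computing the three offsets once per matrix keeps the inner loop a simple contiguous pass, which is presumably why the documented function stores them in vectors before launching the kernel.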
std::enable_if_t< is_valid_tensor_type< T >::value, void > nz::data::tensorMatrixSub | ( | T & | out, |
const T & | lhs, | ||
const T & | rhs ) |
Performs matrix subtraction operation on tensors with broadcast compatibility.
This template function subtracts the tensor rhs from the tensor lhs and stores the result in the tensor out. It is only enabled for types T that satisfy is_valid_tensor_type<T>::value. The shapes of the input tensors must be broadcast compatible, and their height and width dimensions must match.
T | The tensor type, which must satisfy is_valid_tensor_type<T>::value. |
out | The output tensor that will hold the result of the subtraction. Memory flow: host-to-function (reference), function-to-host (modified). |
lhs | The left-hand side tensor in the subtraction operation. Memory flow: host-to-function. |
rhs | The right-hand side tensor in the subtraction operation. Memory flow: host-to-function. |
Memory Management Strategy:
The function uses std::vector objects (offsetC, offsetA, offsetB) to store offset values. These vectors are automatically managed by their destructors.
Exception Handling Mechanism:
Throws std::invalid_argument if the shapes of lhs and rhs are not broadcast compatible or if their height and width dimensions do not match.
Relationship with Other Components:
Depends on the shape() method of the tensor type T to obtain shape information, such as broadcast compatibility, height, width, batch size, channel count, and strides.
Calls the iMatrixSub function to perform the actual matrix subtraction.
std::invalid_argument | When the shapes of lhs and rhs are not broadcast compatible or their height and width dimensions do not match. |
Time complexity: O(m * n), where m is the product of the first two dimensions of out (out.shape()[0] * out.shape()[1]), and n is the number of elements in a single matrix (lhs.shape().H() * lhs.shape().W()).
Definition at line 858 of file TensorOperations.cuh.
std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::transpose | ( | const T & | in | ) |
Transposes a tensor with a valid tensor type.
This template function transposes the input tensor in and returns a new tensor result. It is only enabled for types T that satisfy is_valid_tensor_type<T>::value.
T | The tensor type, which must satisfy is_valid_tensor_type<T>::value. |
in | The input tensor to be transposed. Memory flow: host-to-function. |
Returns a new tensor result, which is the transposed version of the input tensor in. Memory flow: function-to-host.
Memory Management Strategy:
A new tensor result is created inside the function to store the transposed data. The memory for this tensor is managed by the tensor type T itself.
The function uses a std::vector object offset to store offset values. This vector is automatically managed by its destructor.
Exception Handling Mechanism:
Exceptions, if any, originate from the tensor type T or the iTranspose function.
Relationship with Other Components:
Depends on the shape() method of the tensor type T to access shape information, including dimensions and strides.
Calls the iTranspose function to perform the actual transpose operation.
Time complexity: O(m * n), where m is the product of the first two dimensions (in.shape()[0] * in.shape()[1]), and n is the product of the last two dimensions (in.shape()[2] * in.shape()[3]).
Note: Ensure that the iTranspose function is correctly implemented and that the tensor types support the necessary shape and data access methods.
Warning: An incorrect implementation of the iTranspose function may lead to incorrect results or runtime errors.
Definition at line 1073 of file TensorOperations.cuh.
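The complexity description for transpose suggests that each of the m matrices formed by the last two dimensions is transposed independently. Under that assumption, a host-side reference could look like the sketch below. The name transposeReference, the flat std::vector<float> storage, and the row-major layout are all choices made for this example rather than details taken from the library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Transposes `matrices` consecutive row-major (rows x cols) matrices.
// Each output matrix is (cols x rows) and occupies the same offset as its source.
std::vector<float> transposeReference(const std::vector<float>& in,
                                      std::size_t matrices,
                                      std::size_t rows,
                                      std::size_t cols) {
    if (in.size() != matrices * rows * cols) {
        throw std::invalid_argument("buffer size does not match the given shape");
    }
    std::vector<float> out(in.size());
    for (std::size_t b = 0; b < matrices; ++b) {
        const std::size_t offset = b * rows * cols;  // start of the b-th matrix
        for (std::size_t r = 0; r < rows; ++r) {
            for (std::size_t c = 0; c < cols; ++c) {
                out[offset + c * rows + r] = in[offset + r * cols + c];
            }
        }
    }
    return out;
}
```

Transposing twice with the dimensions swapped should reproduce the original buffer, which gives a simple round-trip check.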