Contains data structures and utilities for tensor operations in machine learning workflows.

Classes
class	Dimension
	Represents a multi-dimensional shape, typically used in deep learning for tensor dimensions.

class	MappedTensor
	A class for representing multidimensional arrays in CUDA zero-copy memory, providing host-accessible container-like interfaces.

class	Tensor
	A class for representing and manipulating multidimensional arrays (tensors) in GPU memory.

Functions
template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	ReLU (T &input)
	Apply the Rectified Linear Unit (ReLU) activation function element-wise to an input tensor.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	Sigmoid (T &input)
	Apply the sigmoid activation function element-wise to an input tensor.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	Tanh (T &input)
	Apply the hyperbolic tangent (tanh) activation function element-wise to an input tensor.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	LeakyReLU (T &input, const float alpha=0.01f)
	Apply the Leaky Rectified Linear Unit (Leaky ReLU) activation function element-wise to an input tensor.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	Swish (T &input)
	Apply the Swish activation function element-wise to an input tensor.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	ELU (T &input, const float alpha=1.0f)
	Apply the Exponential Linear Unit (ELU) activation function element-wise to an input tensor.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	HardSigmoid (T &input, const float alpha=0.2f, const float beta=0.5f)
	Apply the Hard Sigmoid activation function element-wise to an input tensor.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	HardSwish (T &input, const float alpha=0.5f, const float beta=0.5f)
	Apply the Hard Swish activation function element-wise to an input tensor.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	Softmax (T &input)
	Compute the softmax function for a given input of type T.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	operator+ (T &lhs, const float rhs)
	Overload the addition operator to add a scalar float to a tensor of type T.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	operator+ (const float lhs, T &rhs)
	Overload the addition operator to add a tensor of type T to a scalar float.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	operator- (T &lhs, const float rhs)
	Overload the subtraction operator to subtract a scalar float from a tensor of type T.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	operator- (const float lhs, T &rhs)
	Overload the subtraction operator to subtract a tensor of type T from a scalar float.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	operator* (T &lhs, const float rhs)
	Overload the multiplication operator to multiply a tensor of type T by a scalar float.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	operator* (const float lhs, T &rhs)
	Overload the multiplication operator to multiply a scalar float by a tensor of type T.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	operator/ (T &lhs, const float rhs)
	Overload the division operator to divide a tensor of type T by a scalar float.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	operator/ (const float lhs, T &rhs)
	Overload the division operator to divide a scalar float by a tensor of type T.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, void >	tensorMatrixAdd (T &out, const T &lhs, const T &rhs)
	Performs matrix addition operation on tensors with broadcast compatibility.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, void >	tensorMatrixSub (T &out, const T &lhs, const T &rhs)
	Performs matrix subtraction operation on tensors with broadcast compatibility.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, void >	tensorElementwiseDivide (T &out, const T &lhs, const T &rhs)
	Performs element-wise division operation on tensors with broadcast compatibility.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, void >	tensorGeneralMatrixMul (T &out, const T &lhs, const T &rhs)
	Performs general matrix multiplication on tensors with broadcast compatibility.

template<typename T >
std::enable_if_t< is_valid_tensor_type< T >::value, T >	transpose (const T &in)
	Transposes a tensor with a valid tensor type.

std::ostream &	operator<< (std::ostream &os, const MappedTensor &tensor)
	Overload the << operator to print a MappedTensor object to an output stream.

std::istream &	operator>> (std::istream &is, MappedTensor &tensor)
	Overload the >> operator to read data from an input stream into a MappedTensor object.

std::ostream &	operator<< (std::ostream &os, const Tensor &tensor)
	Overloads the `<<` operator to print the tensor's data to an output stream.

std::istream &	operator>> (std::istream &is, const Tensor &tensor)
	Overloads the `>>` operator to read a tensor's data from an input stream.

Detailed Description

Contains data structures and utilities for tensor operations in machine learning workflows.

The nz::data namespace provides foundational classes and functions for managing and manipulating tensors in GPU-based computations. It is designed for use in deep learning frameworks and other numerical computing applications.

Key components within this namespace include:

Tensor: A class representing multidimensional arrays (tensors) stored in GPU memory.
Utilities: Functions and operators for performing mathematical operations, memory management, and activation functions.

The namespace is intended to encapsulate all tensor-related functionality to ensure modularity and maintainability in the larger nz project.

Note: The components in this namespace rely on CUDA for GPU-based operations. Ensure that CUDA-compatible hardware and software are properly configured.

Author: Mgepahmge(https://github.com/Mgepahmge)

Date: 2024/11/29

Function Documentation

◆ ELU()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::ELU	(	T &	input,
		const float	alpha = 1.0f )

Apply the Exponential Linear Unit (ELU) activation function element-wise to an input tensor.

Parameters

input	The input tensor (either `Tensor` or `MappedTensor`) to which the ELU function will be applied (device-to-device).
alpha	The alpha value for the ELU function. It controls the value to which the function saturates for negative inputs. The default value is 1.0f.

Returns: A new tensor (of the same type as the input: Tensor or MappedTensor) with the ELU function applied element-wise.

This function applies the ELU activation function, defined as \( f(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha (e^{x} - 1) & \text{if } x < 0 \end{cases} \), to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iELU function to perform the actual ELU operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.

Memory management:

A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.

Exception handling:

There is no explicit exception handling in this function. However, if the iELU function or the tensor constructors throw exceptions, they will propagate up.

Relationship with other components:

This function depends on the iELU function to perform the ELU operation and the tensor's constructor to create a new tensor.

Exceptions

[Exception type thrown by iELU or tensor constructors] If there are issues during the operation, such as memory allocation failures or incorrect input data.

Note

The time complexity of this function is O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the ELU function to each element.
A positive alpha value is recommended: for negative inputs the gradient is alpha * e^x, so a positive alpha keeps it nonzero and helps mitigate the vanishing gradient problem.

```cpp
// Shown with Tensor; MappedTensor can be substituted throughout
nz::data::Tensor::shape_type shape = {2, 3};
nz::data::Tensor input(shape, true);
nz::data::Tensor output = ELU(input, 0.5f);
```

Definition at line 241 of file TensorOperations.cuh.
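
To sanity-check outputs, the piecewise ELU formula above can be reproduced on the host. The following is a minimal sketch, not the library's implementation: `eluReference` is a hypothetical helper that operates on a `std::vector<float>` instead of a `Tensor` or `MappedTensor`, and it does not call iELU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference; not part of nz::data.
// Applies f(x) = x for x >= 0 and alpha * (exp(x) - 1) for x < 0.
std::vector<float> eluReference(const std::vector<float>& input, float alpha = 1.0f) {
    std::vector<float> result(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        const float x = input[i];
        result[i] = x >= 0.0f ? x : alpha * (std::exp(x) - 1.0f);
    }
    assert(result.size() == input.size());  // element count is preserved
    return result;
}
```

Comparing such a reference against values copied back from the device is one way to validate a chosen alpha; small differences in the last bits are to be expected between host and device math.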

◆ HardSigmoid()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::HardSigmoid	(	T &	input,
		const float	alpha = 0.2f,
		const float	beta = 0.5f )

Apply the Hard Sigmoid activation function element-wise to an input tensor.

Parameters

input	The input tensor (either `Tensor` or `MappedTensor`) to which the Hard Sigmoid function will be applied (device-to-device).
alpha	The alpha value for the Hard Sigmoid function, controlling the slope of the linear part. The default value is 0.2f.
beta	The beta value for the Hard Sigmoid function, controlling the bias of the linear part. The default value is 0.5f.

Returns: A new tensor (of the same type as the input: Tensor or MappedTensor) with the Hard Sigmoid function applied element-wise.

This function applies the Hard Sigmoid activation function, typically defined as \( f(x) = \max(0, \min(1, \alpha x + \beta)) \), to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iHardSigmoid function to perform the actual Hard Sigmoid operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.

Memory management:

A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.

Exception handling:

There is no explicit exception handling in this function. However, if the iHardSigmoid function or the tensor constructors throw exceptions, they will propagate up.

Relationship with other components:

This function depends on the iHardSigmoid function to perform the Hard Sigmoid operation and the tensor's constructor to create a new tensor.

Exceptions

[Exception type thrown by iHardSigmoid or tensor constructors] If there are issues during the operation, such as memory allocation failures or incorrect input data.

Note

The time complexity of this function is O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the Hard Sigmoid function to each element.
The choice of alpha and beta values can significantly affect the behavior of the Hard Sigmoid function.

```cpp
// Shown with Tensor; MappedTensor can be substituted throughout
nz::data::Tensor::shape_type shape = {2, 3};
nz::data::Tensor input(shape, true);
nz::data::Tensor output = HardSigmoid(input, 0.3f, 0.6f);
```

Definition at line 281 of file TensorOperations.cuh.
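
The effect of alpha and beta is easiest to see from where the function saturates. The sketch below assumes the typical definition quoted above and alpha > 0; `hardSigmoidReference`, `SaturationRange`, and `saturationRange` are hypothetical host-side helpers, not part of nz::data.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical scalar reference for f(x) = max(0, min(1, alpha * x + beta)).
float hardSigmoidReference(float x, float alpha = 0.2f, float beta = 0.5f) {
    return std::max(0.0f, std::min(1.0f, alpha * x + beta));
}

// Inputs at or below `lower` map to 0; inputs at or above `upper` map to 1.
struct SaturationRange {
    float lower;
    float upper;
};

// Solves alpha * x + beta = 0 and alpha * x + beta = 1 for x.
SaturationRange saturationRange(float alpha, float beta) {
    assert(alpha > 0.0f);  // assumption of this sketch, not a library requirement
    return {-beta / alpha, (1.0f - beta) / alpha};
}
```

With the defaults (alpha = 0.2, beta = 0.5) the linear region spans roughly [-2.5, 2.5]; a larger alpha narrows it, and beta shifts it.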

◆ HardSwish()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::HardSwish	(	T &	input,
		const float	alpha = 0.5f,
		const float	beta = 0.5f )

Apply the Hard Swish activation function element-wise to an input tensor.

Parameters

input	The input tensor (either `Tensor` or `MappedTensor`) to which the Hard Swish function will be applied (device-to-device).
alpha	The alpha value for the Hard Swish function, used to scale the input. The default value is 0.5f.
beta	The beta value for the Hard Swish function, used as an offset. The default value is 0.5f.

Returns: A new tensor (of the same type as the input: Tensor or MappedTensor) with the Hard Swish function applied element-wise.

This function applies the Hard Swish activation function to each element of the input tensor. The Hard Swish function is often defined as \( f(x) = x \cdot \max(0, \min(1, \alpha x + \beta)) \). It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iHardSwish function to perform the actual Hard Swish operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.

Memory management:

A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.

Exception handling:

There is no explicit exception handling in this function. However, if the iHardSwish function or the tensor constructors throw exceptions, they will propagate up.

Relationship with other components:

This function depends on the iHardSwish function to perform the Hard Swish operation and the tensor's constructor to create a new tensor.

Exceptions

[Exception type thrown by iHardSwish or tensor constructors] If there are issues during the operation, such as memory allocation failures or incorrect input data.

Note

The time complexity of this function is O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the Hard Swish function to each element.
The values of alpha and beta can be adjusted to fine-tune the behavior of the Hard Swish function.

```cpp
// Shown with Tensor; MappedTensor can be substituted throughout
nz::data::Tensor::shape_type shape = {2, 3};
nz::data::Tensor input(shape, true);
nz::data::Tensor output = HardSwish(input, 0.4f, 0.7f);
```

Definition at line 321 of file TensorOperations.cuh.
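
The formula above can be read as the input multiplied by a gate clamped to [0, 1]. The following host-side sketch makes that explicit; `hardSwishReference` is a hypothetical helper on `std::vector<float>`, it assumes the "often defined as" form quoted above, and it does not call iHardSwish.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side reference; not part of nz::data.
// Computes x * max(0, min(1, alpha * x + beta)) for every element.
std::vector<float> hardSwishReference(const std::vector<float>& input,
                                      float alpha = 0.5f, float beta = 0.5f) {
    std::vector<float> result;
    result.reserve(input.size());
    for (const float x : input) {
        const float gate = std::max(0.0f, std::min(1.0f, alpha * x + beta));
        assert(gate >= 0.0f && gate <= 1.0f);  // the gate is always clamped
        result.push_back(x * gate);
    }
    return result;
}
```

For inputs where the gate is 1 the function is the identity, and where the gate is 0 the output is zero; alpha and beta only change where those two regions begin.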

◆ LeakyReLU()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::LeakyReLU	(	T &	input,
		const float	alpha = 0.01f )

Apply the Leaky Rectified Linear Unit (Leaky ReLU) activation function element-wise to an input tensor.

Parameters

input	The input tensor (either `Tensor` or `MappedTensor`) to which the Leaky ReLU function will be applied (device-to-device).
alpha	The slope coefficient for negative values. It has a default value of 0.01f.

Returns: A new tensor (of the same type as the input: Tensor or MappedTensor) with the Leaky ReLU function applied element-wise.

This function applies the Leaky ReLU activation function, defined as \( f(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{if } x < 0 \end{cases} \), to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iLeakyReLU function to perform the actual Leaky ReLU operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.

Memory management:

A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.

Exception handling:

There is no explicit exception handling in this function. However, if the iLeakyReLU function or the tensor constructors throw exceptions, they will propagate up.

Relationship with other components:

This function depends on the iLeakyReLU function to perform the Leaky ReLU operation and the tensor's constructor to create a new tensor.

Exceptions

[Exception type thrown by iLeakyReLU or tensor constructors] If there are issues during the operation, such as memory allocation failures or incorrect input data.

Note

The time complexity of this function is O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the Leaky ReLU function to each element.
The value of alpha should be a small positive number so that negative inputs keep a nonzero gradient (equal to alpha), which avoids the vanishing gradient problem for negative inputs.

```cpp
// Shown with Tensor; MappedTensor can be substituted throughout
nz::data::Tensor::shape_type shape = {2, 3};
nz::data::Tensor input(shape, true);
nz::data::Tensor output = LeakyReLU(input, 0.02f);
```

Definition at line 165 of file TensorOperations.cuh.
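
The note about alpha can be illustrated by computing the function together with its derivative. This is a minimal host-side sketch under the definition given above; `LeakyReluResult` and `leakyReluWithGradient` are hypothetical names, and taking the derivative at x = 0 to be 1 is a convention chosen here, not something the source specifies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical container for the forward value and the local derivative.
struct LeakyReluResult {
    std::vector<float> value;
    std::vector<float> gradient;
};

// Forward: x for x >= 0, alpha * x otherwise.
// Derivative: 1 for x >= 0, alpha otherwise (never zero when alpha > 0).
LeakyReluResult leakyReluWithGradient(const std::vector<float>& input,
                                      float alpha = 0.01f) {
    LeakyReluResult r;
    for (const float x : input) {
        r.value.push_back(x >= 0.0f ? x : alpha * x);
        r.gradient.push_back(x >= 0.0f ? 1.0f : alpha);
    }
    assert(r.value.size() == r.gradient.size());
    return r;
}
```

With alpha = 0 this reduces to plain ReLU, whose derivative is zero for all negative inputs; any positive alpha keeps a gradient signal flowing through those elements.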

◆ operator*() [1/2]

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator*	(	const float	lhs,
		T &	rhs )

Overload the multiplication operator to multiply a scalar float by a tensor of type T.

Parameters

lhs	A constant float value representing the left-hand side scalar to multiply the tensor by.
rhs	A reference to the right-hand side tensor of type T. The tensor data is used in the multiplication operation.

Returns: A new tensor of type T that is the result of multiplying each element of the tensor rhs by the scalar lhs.

This template operator overload is enabled only when is_valid_tensor_type<T>::value is true; std::enable_if_t enforces this at compile time. It constructs a new tensor result with the same shape and gradient requirement as rhs, then invokes the iScalarMul function to multiply each element of rhs by the scalar lhs. Finally, the newly created tensor result is returned.

Memory management:

A new tensor result is created within the function, and its memory allocation depends on the constructor of type T. The memory of result will be managed by its destructor when it goes out of scope.

Exception handling:

There is no explicit exception handling in this function. If the iScalarMul function or the constructor of type T throws an exception, it will be propagated to the caller.

Relationship with other components:

This function relies on the iScalarMul function to perform the actual multiplication operation.
It also depends on the shape() and requiresGrad() member functions of type T.

Note

The time complexity of this function is O(n), where n is the size of the tensor rhs. This is because the iScalarMul function needs to iterate over each element of the tensor.
Ensure that the type T is a valid tensor type as determined by is_valid_tensor_type<T>::value.
Ensure that the tensor rhs has valid shape, gradient requirement, and size information.

```cpp
// Assume Tensor is a valid tensor type with shape(), requiresGrad() member functions
nz::data::Tensor tensor({2, 3}, true);
// Assume tensor is filled with some values
nz::data::Tensor result = 2.0f * tensor;
```

Definition at line 646 of file TensorOperations.cuh.
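
The `std::enable_if_t` return type in the signature is what restricts this operator to tensor types. The sketch below reproduces the pattern with stand-ins: `ToyTensor`, `is_valid_toy_tensor`, and `scaledCopyDemo` are hypothetical and only mimic the shape of the real declaration; the actual definition of is_valid_tensor_type and the iScalarMul kernel are not shown in this page.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

namespace sketch {

// Toy stand-in for a tensor class; NOT the real nz::data::Tensor.
struct ToyTensor {
    std::vector<float> values;
};

// Hypothetical trait modeled on is_valid_tensor_type: true only for ToyTensor.
template <typename T>
struct is_valid_toy_tensor : std::is_same<T, ToyTensor> {};

static_assert(is_valid_toy_tensor<ToyTensor>::value, "ToyTensor is accepted");
static_assert(!is_valid_toy_tensor<int>::value, "other types are rejected");

// When the trait is false, enable_if_t has no type and this overload is
// silently removed from overload resolution (SFINAE).
template <typename T>
std::enable_if_t<is_valid_toy_tensor<T>::value, T> operator*(const float lhs, T& rhs) {
    T result{std::vector<float>(rhs.values.size())};
    for (std::size_t i = 0; i < rhs.values.size(); ++i) {
        result.values[i] = lhs * rhs.values[i];
    }
    return result;
}

// Usage: scales a toy tensor and checks that the operand was left untouched.
inline ToyTensor scaledCopyDemo() {
    ToyTensor t{{1.0f, 2.0f, 3.0f}};
    ToyTensor r = 2.0f * t;
    assert(t.values[0] == 1.0f);  // rhs is read, not modified
    return r;
}

}  // namespace sketch
```

Because the constraint is part of the signature, an expression such as `2.0f * someInt` never considers this overload and falls back to the built-in operator.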

◆ operator*() [2/2]

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator*	(	T &	lhs,
		const float	rhs )

Overload the multiplication operator to multiply a tensor of type T by a scalar float.

Parameters

lhs	A reference to the left-hand side tensor of type T. The tensor data is used as the base for the multiplication operation.
rhs	A constant float value representing the right-hand side scalar to multiply the tensor by.

Returns: A new tensor of type T that is the result of multiplying each element of the tensor lhs by the scalar rhs.

This template operator overload is enabled only when is_valid_tensor_type<T>::value is true; std::enable_if_t enforces this at compile time. It creates a new tensor result with the same shape and gradient requirement as lhs, then calls the iScalarMul function to multiply each element of lhs by the scalar rhs. Finally, the newly created tensor result is returned.

Memory management:

A new tensor result is created inside the function, which may allocate memory based on the constructor of type T. The memory of the result will be managed by its destructor when it goes out of scope.

Exception handling:

There is no explicit exception handling in this function. If the iScalarMul function or the constructor of type T throws an exception, it will propagate to the caller.

Relationship with other components:

This function depends on the iScalarMul function to perform the actual multiplication operation.
It also depends on the shape() and requiresGrad() member functions of type T.

Note

The time complexity of this function is O(n), where n is the size of the tensor lhs. This is because the iScalarMul function needs to iterate over each element of the tensor.
Ensure that the type T is a valid tensor type as determined by is_valid_tensor_type<T>::value.
Ensure that the tensor lhs has valid shape, gradient requirement, and size information.

```cpp
// Assume Tensor is a valid tensor type with shape(), requiresGrad() member functions
nz::data::Tensor tensor({2, 3}, true);
// Assume tensor is filled with some values
nz::data::Tensor result = tensor * 2.0f;
```

Definition at line 604 of file TensorOperations.cuh.

◆ operator+() [1/2]

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator+	(	const float	lhs,
		T &	rhs )

Overload the addition operator to add a tensor of type T to a scalar float.

Parameters

lhs	A constant float value representing the left-hand side scalar to be added to the tensor.
rhs	A reference to the right-hand side tensor of type T. The tensor data is used to perform the addition operation.

Returns: A new tensor of type T that is the result of adding the scalar lhs to each element of the tensor rhs.

This function is a template operator overload that is enabled only when is_valid_tensor_type<T>::value is true; std::enable_if_t enforces this at compile time. It creates a new tensor result with the same shape and gradient requirement as rhs. Then, it calls the iScalarAdd function to add the scalar lhs to each element of the data in rhs and stores the result in result. Finally, the newly created tensor result is returned.

Memory management:

A new tensor result is created inside the function, which may allocate memory according to the constructor of type T. The memory of the result will be managed by its destructor when it goes out of scope.

Exception handling:

There is no explicit exception handling in this function. If the iScalarAdd function or the constructor of type T throws an exception, it will be propagated to the caller.

Relationship with other components:

This function depends on the iScalarAdd function to perform the actual scalar-tensor addition.
It also depends on the shape() and requiresGrad() member functions of type T.

Note

The time complexity of this function is O(n), where n is the size of the tensor rhs. This is because the iScalarAdd function needs to iterate over each element of the tensor.
Ensure that the type T is a valid tensor type as determined by is_valid_tensor_type<T>::value.
Ensure that the tensor rhs has valid shape, gradient requirement, and size information.

```cpp
// Assume Tensor is a valid tensor type with shape(), requiresGrad() member functions
nz::data::Tensor tensor({2, 3}, true);
// Assume tensor is filled with some values
nz::data::Tensor result = 2.0f + tensor;
```

Definition at line 478 of file TensorOperations.cuh.

◆ operator+() [2/2]

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator+	(	T &	lhs,
		const float	rhs )

Overload the addition operator to add a scalar float to a tensor of type T.

Parameters

lhs	A reference to the left-hand side tensor of type T. The tensor data is used as the base for the addition operation; the result is written to a new tensor.
rhs	A constant float value representing the right-hand side scalar to be added to the tensor.

Returns: A new tensor of type T that is the result of adding the scalar rhs to each element of the tensor lhs.

This function is a template operator overload that adds a scalar float value to a tensor. It is enabled only when is_valid_tensor_type<T>::value is true; std::enable_if_t enforces this at compile time. It creates a new tensor result with the same shape and gradient requirement as lhs. Then, it calls the iScalarAdd function to perform the actual addition operation, which adds the scalar rhs to each element of the data in lhs and stores the result in result. Finally, the newly created tensor result is returned.

Memory management:

A new tensor result is created inside the function, which may allocate memory depending on the implementation of the constructor of type T. The memory for the result will be managed by the destructor of the object when it goes out of scope.

Exception handling:

There is no explicit exception handling in this function. However, if the iScalarAdd function or the constructor of type T throws an exception, it will propagate up to the caller.

Relationship with other components:

This function depends on the iScalarAdd function to perform the actual scalar-tensor addition.
It also depends on the shape() and requiresGrad() member functions of type T.

Note

The time complexity of this function is O(n), where n is the size of the tensor lhs. This is because the iScalarAdd function needs to iterate over each element of the tensor.
Ensure that the type T is a valid tensor type as determined by is_valid_tensor_type<T>::value.
Ensure that the tensor lhs has valid shape, gradient requirement, and size information.

```cpp
// Assume Tensor is a valid tensor type with shape(), requiresGrad() member functions
nz::data::Tensor tensor({2, 3}, true);
// Assume tensor is filled with some values
nz::data::Tensor result = tensor + 2.0f;
```

Definition at line 436 of file TensorOperations.cuh.

◆ operator-() [1/2]

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator-	(	const float	lhs,
		T &	rhs )

Overload the subtraction operator to subtract a tensor of type T from a scalar float.

Parameters

lhs	A constant float value representing the left-hand side scalar from which the tensor will be subtracted.
rhs	A reference to the right-hand side tensor of type T. The tensor data is used in the subtraction operation.

Returns: A new tensor of type T that is the result of subtracting each element of the tensor rhs from the scalar lhs.

This template operator overload is enabled only when is_valid_tensor_type<T>::value is true; std::enable_if_t enforces this at compile time. It creates a new tensor result by negating the tensor rhs. Then, it calls the iScalarAdd function to add the scalar lhs to each element of the negated tensor result. Finally, the resulting tensor result is returned.

Memory management:

A new tensor result is created inside the function, which may allocate memory according to the constructor of type T. The memory of the result will be managed by its destructor when it goes out of scope.

Exception handling:

There is no explicit exception handling in this function. If the negation operation of rhs, the iScalarAdd function, or the constructor of type T throws an exception, it will be propagated to the caller.

Relationship with other components:

This function depends on the negation operator of type T to obtain the negated tensor.
It also depends on the iScalarAdd function to perform the addition of the scalar to the negated tensor.

Note

The time complexity of this function is O(n), where n is the size of the tensor rhs. This is because both the negation operation and the iScalarAdd function need to iterate over each element of the tensor.
Ensure that the type T is a valid tensor type as determined by is_valid_tensor_type<T>::value.
Ensure that the tensor rhs has valid shape, gradient requirement, and size information.

```cpp
// Assume Tensor is a valid tensor type
nz::data::Tensor tensor({2, 3}, true);
// Assume tensor is filled with some values
nz::data::Tensor result = 2.0f - tensor;
```

Definition at line 562 of file TensorOperations.cuh.
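
The description above computes lhs - x as (-x) + lhs. A minimal host-side sketch of that two-step scheme follows; `negated`, `scalarAddInto`, and `scalarMinusTensor` are hypothetical stand-ins on `std::vector<float>`, and the parameter list of `scalarAddInto` is an assumption that need not match the real iScalarAdd.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the tensor negation operator.
std::vector<float> negated(const std::vector<float>& x) {
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = -x[i];
    }
    return out;
}

// Hypothetical stand-in for the scalar-add step: out[i] = in[i] + scalar.
// `out` and `in` may refer to the same vector.
void scalarAddInto(std::vector<float>& out, const std::vector<float>& in, float scalar) {
    assert(out.size() == in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + scalar;
    }
}

// lhs - rhs[i], computed as (-rhs[i]) + lhs.
std::vector<float> scalarMinusTensor(float lhs, const std::vector<float>& rhs) {
    std::vector<float> result = negated(rhs);  // result = -rhs
    scalarAddInto(result, result, lhs);        // result = -rhs + lhs
    return result;
}
```

The two passes each touch every element once, which is consistent with the O(n) complexity stated in the note.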

◆ operator-() [2/2]

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator-	(	T &	lhs,
		const float	rhs )

Overload the subtraction operator to subtract a scalar float from a tensor of type T.

Parameters

lhs	A reference to the left-hand side tensor of type T. The tensor data is used as the base for the subtraction operation.
rhs	A constant float value representing the right-hand side scalar to be subtracted from the tensor.

Returns: A new tensor of type T that is the result of subtracting the scalar rhs from each element of the tensor lhs.

This template operator overload is enabled only when is_valid_tensor_type<T>::value is true; std::enable_if_t enforces this at compile time. It creates a new tensor result with the same shape and gradient requirement as lhs. To perform the subtraction, it calls the iScalarAdd function with -rhs as the scalar to be added to each element of lhs. Finally, the newly created tensor result is returned.

Memory management:

A new tensor result is created inside the function, which may allocate memory based on the constructor of type T. The memory of the result will be managed by its destructor when it goes out of scope.

Exception handling:

There is no explicit exception handling in this function. If the iScalarAdd function or the constructor of type T throws an exception, it will propagate to the caller.

Relationship with other components:

This function depends on the iScalarAdd function to perform the actual subtraction operation (by adding the negative of the scalar).
It also depends on the shape() and requiresGrad() member functions of type T.

Note

The time complexity of this function is O(n), where n is the size of the tensor lhs. This is because the iScalarAdd function needs to iterate over each element of the tensor.
Ensure that the type T is a valid tensor type as determined by is_valid_tensor_type<T>::value.
Ensure that the tensor lhs has valid shape, gradient requirement, and size information.

```cpp
// Assume Tensor is a valid tensor type with shape(), requiresGrad() member functions
nz::data::Tensor tensor({2, 3}, true);
// Assume tensor is filled with some values
nz::data::Tensor result = tensor - 2.0f;
```

Definition at line 520 of file TensorOperations.cuh.

◆ operator/() [1/2]

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator/	(	const float	lhs,
		T &	rhs )

Overload the division operator to divide a scalar float by a tensor of type T.

Parameters

lhs	A constant float value representing the left-hand side scalar dividend.
rhs	A reference to the right-hand side tensor of type T. The tensor data is used as the divisor for the division operation.

Returns: A new tensor of type T that is the result of dividing the scalar lhs by each element of the tensor rhs.

This template operator overload is enabled only when is_valid_tensor_type<T>::value is true; std::enable_if_t enforces this at compile time. It creates a copy of the tensor rhs named result. Then it calls the recip method of result to compute the reciprocal of each element in the tensor. Finally, it uses the iScalarMul function to multiply each element of the result tensor by the scalar lhs, and result is returned.

Memory management:

A copy of the tensor rhs is created as result, and its memory allocation depends on the copy constructor of type T. The memory of result will be managed by its destructor when it goes out of scope.

Exception handling:

There is no explicit exception handling in this function. If the recip method, iScalarMul function, or the copy constructor of type T throws an exception, it will be propagated to the caller.

Relationship with other components:

This function depends on the recip method of type T to compute the reciprocal of each element in the tensor.
It also depends on the iScalarMul function to perform the multiplication operation.

Note

The time complexity of this function is O(n), where n is the size of the tensor rhs. This is because both the recip method and the iScalarMul function need to iterate over each element of the tensor.
Ensure that the type T is a valid tensor type as determined by is_valid_tensor_type<T>::value.
Ensure that the tensor rhs has valid shape, gradient requirement, and size information.
Ensure that no element in the tensor rhs is zero to avoid division by zero errors during the recip operation.

```cpp
// Assume Tensor is a valid tensor type with shape(), requiresGrad() and recip() member functions
nz::data::Tensor tensor({2, 3}, true);
// Assume tensor is filled with some non-zero values
nz::data::Tensor result = 2.0f / tensor;
```
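The two-step computation described above (reciprocal, then scalar multiply) can be sketched on the host as follows. This is only an illustration under stated assumptions: `recipSketch` and `scalarOverTensorSketch` are hypothetical names, a `std::vector<float>` stands in for the tensor's element buffer, and the real `recip` and `iScalarMul` presumably run on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host stand-in for the recip step: out[i] = 1 / data[i].
std::vector<float> recipSketch(const std::vector<float>& data) {
    std::vector<float> out(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        assert(data[i] != 0.0f);  // mirrors the documented non-zero precondition
        out[i] = 1.0f / data[i];
    }
    return out;
}

// Hypothetical host stand-in for `lhs / rhs` with a scalar dividend.
std::vector<float> scalarOverTensorSketch(float lhs, const std::vector<float>& rhs) {
    std::vector<float> result = recipSketch(rhs);  // step 1: reciprocal of a copy
    for (float& v : result) {
        v *= lhs;                                  // step 2: scale by the scalar
    }
    return result;
}
```

Computing `lhs * (1 / x)` can differ from `lhs / x` in the last bit of a float, which may matter when comparing against a reference implementation.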

Definition at line 732 of file TensorOperations.cuh.

◆ operator/() [2/2]

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::operator/	(	T &	lhs,
		const float	rhs )

Overload the division operator to divide a tensor of type T by a scalar float.

Parameters

lhs	A reference to the left-hand side tensor of type T. The tensor data is used as the dividend for the division operation.
rhs	A constant float value representing the right-hand side scalar divisor.

Returns: A new tensor of type T that is the result of dividing each element of the tensor lhs by the scalar rhs.

This template operator overload is enabled only when T is a valid tensor type, as determined at compile time by is_valid_tensor_type<T>::value. It creates a new tensor result with the same shape and gradient requirement as lhs, calls the iScalarDiv function to divide each element of lhs by the scalar rhs, and returns result.

Memory management:

A new tensor result is created inside the function, and its memory allocation depends on the constructor of type T. The memory of result will be managed by its destructor when it goes out of scope.

Exception handling:

There is no explicit exception handling in this function. If the iScalarDiv function or the constructor of type T throws an exception, it will propagate to the caller.

Relationship with other components:

This function depends on the iScalarDiv function to perform the actual division operation.
It also depends on the shape() and requiresGrad() member functions of type T.

Note

The time complexity of this function is O(n), where n is the size of the tensor lhs. This is because the iScalarDiv function needs to iterate over each element of the tensor.
Ensure that the type T is a valid tensor type as determined by is_valid_tensor_type<T>::value.
Ensure that the tensor lhs has valid shape, gradient requirement, and size information.
Ensure that the scalar rhs is not zero to avoid division by zero errors.

```cpp
// Assume Tensor is a valid tensor type with shape(), requiresGrad() member functions
nz::data::Tensor tensor({2, 3}, true);
// Assume tensor is filled with some values
nz::data::Tensor result = tensor / 2.0f;
```

Definition at line 689 of file TensorOperations.cuh.

◆ operator<<() [1/2]

std::ostream & nz::data::operator<<	(	std::ostream &	os,
		const MappedTensor &	tensor )

Overload the << operator to print a MappedTensor object to an output stream.

Parameters

os	An output stream (host-to-host) where the MappedTensor data and gradient will be printed.
tensor	A constant reference (host-to-host) to the MappedTensor object to be printed.

Returns: A reference to the output stream os after printing the tensor data and possibly its gradient.

This function provides a convenient way to print a MappedTensor object using the << operator. It first calls the print method of the MappedTensor to print the tensor's data. If the tensor requires gradients, it then prints a header "Gradient: " followed by the gradient data using the printGrad method.

Memory management:

The function does not allocate or deallocate any memory. It relies on the print and printGrad methods of the MappedTensor, which also do not perform memory allocation.

Exception handling:

If the tensor requires gradients and an exception occurs during the printGrad call (e.g., due to an invalid state of the output stream or incorrect internal data), the exception will be propagated. If the tensor does not require gradients, the printGrad call is skipped, and no exception related to gradient printing will be thrown.

Relationship with other components:

This function is related to the data presentation component of the MappedTensor. It integrates the print and printGrad methods to provide a unified way of printing the tensor and its gradient.

Exceptions

std::invalid_argument Propagated from the printGrad method if the tensor requires gradients and there is an issue with gradient printing.

Note

The overall time complexity of this function is O(m * n), where m is the number of rows (_shape[0]) and n is the number of columns (_shape[1]) of the tensor. When the tensor requires gradients, the gradient data is traversed as well, which roughly doubles the work but leaves the asymptotic bound unchanged.
Ensure that the output stream os is in a valid state before calling this function.

```cpp
nz::data::MappedTensor::shape_type shape = {2, 3};
nz::data::MappedTensor tensor(shape, true);
tensor.dataInject({1, 2, 3, 4, 5, 6}, false);
tensor.dataInject({7, 8, 9, 10, 11, 12}, true);
std::cout << tensor;
```
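The control flow described above (print the data, then print the gradient only when it is required) can be sketched with a small stand-in type. `MiniTensor` and `toStringSketch` are hypothetical and exist only for this illustration; the element formatting is an assumption and is not taken from MappedTensor.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for MappedTensor; only the members needed for printing.
struct MiniTensor {
    std::vector<float> data;
    std::vector<float> grad;
    bool requiresGrad;

    void print(std::ostream& os) const {
        for (float v : data) os << v << ' ';
        os << '\n';
    }
    void printGrad(std::ostream& os) const {
        assert(grad.size() == data.size());  // gradient mirrors the data layout
        for (float v : grad) os << v << ' ';
        os << '\n';
    }
};

// Data first; the "Gradient: " header and gradient only if requiresGrad is set.
std::ostream& operator<<(std::ostream& os, const MiniTensor& t) {
    t.print(os);
    if (t.requiresGrad) {
        os << "Gradient: ";
        t.printGrad(os);
    }
    return os;
}

// Convenience wrapper so the output can be inspected as a string.
std::string toStringSketch(const MiniTensor& t) {
    std::ostringstream oss;
    oss << t;
    return oss.str();
}
```

Returning `os` is what allows chained expressions such as `std::cout << a << b;`.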

Definition at line 45 of file MappedTensor.cu.

◆ operator<<() [2/2]

std::ostream & nz::data::operator<<	(	std::ostream &	os,
		const Tensor &	tensor )

Overloads the << operator to print the tensor's data to an output stream.

This function is a friend of the Tensor class and provides an overloaded version of the output stream operator (<<) to print the contents of a tensor to the specified output stream (e.g., std::cout or a file stream).

The tensor's data is first copied from GPU memory to host memory for printing, and then the data is printed in a 2D matrix format. Each row of the tensor is printed on a new line, and each element in a row is separated by a space. Each row is enclosed in square brackets.

Parameters

os	The output stream to which the tensor will be printed.
tensor	The tensor whose contents will be printed.

Returns: The output stream (os) after the tensor has been printed, allowing for chaining of operations.

Note

This operator works by accessing the tensor's private data members (e.g., _data) directly.
The tensor's data is assumed to be in a valid state (i.e., properly allocated in GPU memory) before printing.
The function copies the tensor's data from device (GPU) memory to host (CPU) memory using cudaMemcpy, which may introduce performance overhead for large tensors.

```cpp
Tensor tensor({2, 3});
tensor.fill(1.0f);  // Fill the tensor with 1.0f
std::cout << tensor << std::endl;  // Prints the tensor to standard output in matrix format
```
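The matrix layout described above (one bracketed row per line, elements separated by spaces) could look like the following host-side sketch. It assumes the data has already been copied into a row-major host buffer, so the `cudaMemcpy` step is omitted; `formatRowsSketch` is a hypothetical helper, and the exact spacing produced by the library may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats a row-major host buffer as `rows` lines of the form "[a b c]".
std::string formatRowsSketch(const std::vector<float>& host,
                             std::size_t rows, std::size_t cols) {
    assert(host.size() == rows * cols);
    std::ostringstream oss;
    for (std::size_t r = 0; r < rows; ++r) {
        oss << '[';
        for (std::size_t c = 0; c < cols; ++c) {
            if (c != 0) oss << ' ';       // single space between elements
            oss << host[r * cols + c];
        }
        oss << "]\n";                     // each row ends its own line
    }
    return oss.str();
}
```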

Definition at line 39 of file Tensor.cu.

◆ operator>>() [1/2]

std::istream & nz::data::operator>>	(	std::istream &	is,
		const Tensor &	tensor )

Overloads the >> operator to read a tensor's data from an input stream.

This function is a friend of the Tensor class and provides an overloaded version of the input stream operator (>>) to read the contents of a tensor from the specified input stream (e.g., std::cin or a file stream).

The function reads the tensor's data element by element from the input stream and stores the values in a temporary buffer. Once all the data has been read, it is copied from the host memory back into the tensor's GPU memory using cudaMemcpy.

Parameters

is	The input stream from which the tensor's data will be read.
tensor	The tensor into which the data will be read.

Returns: The input stream (is) after reading the tensor's data, allowing for chaining of operations.

Note

This operator works by reading data from the input stream and storing it in a temporary buffer on the host.
The function assumes that the input data matches the size of the tensor. If the data is malformed or does not match, the behavior may be undefined.
After reading, the data is copied from host memory back into the tensor's GPU memory.

```cpp
Tensor tensor({2, 3});
std::cin >> tensor;  // Reads the tensor's data from standard input
```

Definition at line 76 of file Tensor.cu.

◆ operator>>() [2/2]

std::istream & nz::data::operator>>	(	std::istream &	is,
		MappedTensor &	tensor )

Overload the >> operator to read data from an input stream into a MappedTensor object.

Parameters

is	An input stream (host-to-host) from which the data will be read.
tensor	A reference (host-to-host) to the MappedTensor object where the data will be stored.

Returns: A reference to the input stream is after the reading operation.

This function provides a convenient way to populate a MappedTensor object with data from an input stream. It iterates through the elements of the tensor and reads values from the input stream one by one, until either all elements of the tensor have been filled or the input stream fails to provide more data.

Memory management:

The function does not allocate or deallocate any memory. It assumes that the _data array of the MappedTensor has already been allocated with the appropriate size (_size).

Exception handling:

If the input stream fails to provide data (e.g., due to end-of-file or an invalid input format), the loop will terminate, and the function will return the input stream in its current state. No exceptions are thrown by this function itself, but the >> operator on the input stream may throw exceptions depending on its implementation.

Relationship with other components:

This function is related to the data input component of the MappedTensor. It integrates with the standard input stream to allow easy data population.

Note

The time complexity of this function is O(n), where n is the size of the tensor (_size), as it iterates through each element of the tensor once.
Ensure that the input stream contains valid data in the correct format to avoid unexpected behavior.

```cpp
nz::data::MappedTensor::shape_type shape = {2, 3};
nz::data::MappedTensor tensor(shape, false);
std::istringstream iss("1 2 3 4 5 6");
iss >> tensor;
```
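The read loop described above stops when either the tensor is full or extraction fails. A minimal sketch of that loop, using a raw pointer and size in place of the tensor's `_data` and `_size`, is shown below. `readElementsSketch` and `readFromTextSketch` are hypothetical names, and returning the number of elements read is an addition for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads up to `size` floats from `is` into `data`; returns how many were read.
std::size_t readElementsSketch(std::istream& is, float* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::size_t count = 0;
    // Stop when the buffer is full or the stream enters a failed state.
    while (count < size && (is >> data[count])) {
        ++count;
    }
    return count;
}

// Convenience wrapper that reads from a string instead of a live stream.
std::size_t readFromTextSketch(const std::string& text, float* data, std::size_t size) {
    std::istringstream iss(text);
    return readElementsSketch(iss, data, size);
}
```

After a failed extraction the stream's failbit is set, so a caller that needs to know whether the tensor was completely filled can test the stream state once the call returns.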

Definition at line 81 of file MappedTensor.cu.

◆ ReLU()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::ReLU ( T & input )

Apply the Rectified Linear Unit (ReLU) activation function element-wise to an input tensor.

Parameters

input The input tensor (either Tensor or MappedTensor) to which the ReLU function will be applied (device-to-device).

Returns: A new tensor (of the same type as the input: Tensor or MappedTensor) with the ReLU function applied element-wise.

This function applies the ReLU activation function, defined as \( f(x) = \max(0, x) \), to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iRELU function to perform the actual ReLU operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.

Memory management:

A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.

Exception handling:

There is no explicit exception handling in this function. However, if the iRELU function or the tensor constructors throw exceptions, they will propagate up.

Relationship with other components:

This function depends on the iRELU function to perform the ReLU operation and the tensor's constructor to create a new tensor.

Exceptions

[Exception type thrown by iRELU or tensor constructors] If there are issues during the operation, such as memory allocation failures or incorrect input data.

Note

The time complexity of this function is O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the ReLU function to each element.

```cpp
// T may be Tensor or MappedTensor; MappedTensor is used here
using T = nz::data::MappedTensor;
T::shape_type shape = {2, 3};
T input(shape, true);
T output = nz::data::ReLU(input);
```
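For checking results on the CPU, the element-wise definition above can be written as a short host loop. `reluReference` is a hypothetical helper for illustration, not the library's iRELU, which presumably runs as a GPU kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Host reference for an out-of-place ReLU: out[i] = max(0, in[i]).
// The input buffer is left unchanged, matching the documented behaviour.
void reluReference(float* out, const float* in, std::size_t n) {
    assert(out != nullptr && in != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::max(0.0f, in[i]);
    }
}
```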

Definition at line 50 of file TensorOperations.cuh.

◆ Sigmoid()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::Sigmoid ( T & input )

Apply the sigmoid activation function element-wise to an input tensor.

Parameters

input The input tensor (either Tensor or MappedTensor) to which the sigmoid function will be applied (device-to-device).

Returns: A new tensor (of the same type as the input: Tensor or MappedTensor) with the sigmoid function applied element-wise.

This function applies the sigmoid activation function, defined as \( f(x) = \frac{1}{1 + e^{-x}} \), to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iSigmoid function to perform the actual sigmoid operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.

Memory management:

A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.

Exception handling:

There is no explicit exception handling in this function. However, if the iSigmoid function or the tensor constructors throw exceptions, they will propagate up.

Relationship with other components:

This function depends on the iSigmoid function to perform the sigmoid operation and the tensor's constructor to create a new tensor.

Exceptions

[Exception type thrown by iSigmoid or tensor constructors] If there are issues during the operation, such as memory allocation failures or incorrect input data.

Note

The time complexity of this function is O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the sigmoid function to each element.

```cpp
// T may be Tensor or MappedTensor; MappedTensor is used here
using T = nz::data::MappedTensor;
T::shape_type shape = {2, 3};
T input(shape, true);
T output = nz::data::Sigmoid(input);
```
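A host-side reference for the sigmoid formula above might look like the sketch below. The names are hypothetical, and the branch on the sign of x is a common formulation chosen here so that `std::exp` is only ever called with a non-positive argument; whether iSigmoid does the same is not stated in this documentation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Sigmoid evaluated so that exp() never receives a positive argument.
float sigmoidScalarSketch(float x) {
    if (x >= 0.0f) {
        return 1.0f / (1.0f + std::exp(-x));
    }
    const float e = std::exp(x);  // x < 0, so e lies in [0, 1)
    return e / (1.0f + e);        // algebraically equal to 1 / (1 + exp(-x))
}

// Out-of-place element-wise application over a host buffer.
void sigmoidReference(float* out, const float* in, std::size_t n) {
    assert(out != nullptr && in != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = sigmoidScalarSketch(in[i]);
    }
}
```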

Definition at line 88 of file TensorOperations.cuh.

◆ Softmax()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::Softmax ( T & input )

Compute the softmax function for a given input of type T.

Parameters

input The input object of type T for which the softmax function will be computed. The input is passed by non-const reference (T &), so no copy of it is made when the function is called.

Returns: An object of type T representing the result of the softmax function applied to the input.

This function computes the softmax function for the given input. It first creates a new object result with the same shape and gradient requirement as the input. Then, it calls the iSoftmax function to perform the actual softmax computation. The iSoftmax function takes the data pointers of the result and input, the exponential sum of the input, and the size of the input as parameters. Finally, the computed result is returned.

Memory management:

A new object result is created inside the function, which may allocate memory depending on the implementation of the constructor of type T. The memory for the result will be managed by the destructor of the object when it goes out of scope.

Exception handling:

There is no explicit exception handling in this function. However, if the iSoftmax function or the constructor of type T throws an exception, it will propagate up to the caller.

Relationship with other components:

This function depends on the iSoftmax function to perform the actual softmax computation.
It also depends on the shape(), requiresGrad(), expSum(), and size() member functions of type T.

Note

The time complexity of this function depends on the implementation of the iSoftmax function. If the iSoftmax function has a time complexity of O(n), where n is the size of the input, then the overall time complexity of this function is also O(n).
Ensure that the input object input has valid shape, gradient requirement, exponential sum, and size information.

```cpp
// Assume Tensor is a valid type with shape(), requiresGrad(), expSum(), and size() member functions
nz::data::Tensor input({2, 3}, true);
// Assume input is filled with some values
nz::data::Tensor result = Softmax(input);
```
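The description above says iSoftmax receives the output pointer, the input pointer, the exponential sum, and the size. A host-side sketch with that argument shape is given below. All names are hypothetical, and the sketch follows the plain formula exp(x_i) / sum_j exp(x_j) without subtracting the maximum first; whether the library applies that stabilisation is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of exp(x_i) over the buffer (stand-in for the documented expSum()).
float expSumSketch(const std::vector<float>& in) {
    float sum = 0.0f;
    for (float v : in) sum += std::exp(v);
    return sum;
}

// out[i] = exp(in[i]) / expSum, mirroring the documented argument list.
void softmaxSketch(float* out, const float* in, float expSum, std::size_t n) {
    assert(expSum > 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::exp(in[i]) / expSum;
    }
}

// Convenience wrapper that allocates the result and wires the pieces together.
std::vector<float> softmaxOf(const std::vector<float>& in) {
    if (in.empty()) return {};
    std::vector<float> out(in.size());
    softmaxSketch(out.data(), in.data(), expSumSketch(in), in.size());
    return out;
}
```

With large inputs `std::exp` can overflow in single precision, so production code usually subtracts the maximum element before exponentiating.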

Definition at line 364 of file TensorOperations.cuh.

◆ Swish()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::Swish ( T & input )

Apply the Swish activation function element-wise to an input tensor.

Parameters

input The input tensor (either Tensor or MappedTensor) to which the Swish function will be applied (device-to-device).

Returns: A new tensor (of the same type as the input: Tensor or MappedTensor) with the Swish function applied element-wise.

This function applies the Swish activation function, defined as \( f(x) = x \cdot \sigma(x) \), where \( \sigma(x) = \frac{1}{1 + e^{-x}} \) is the sigmoid function, to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iSwish function to perform the actual Swish operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.

Memory management:

A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.

Exception handling:

There is no explicit exception handling in this function. However, if the iSwish function or the tensor constructors throw exceptions, they will propagate up.

Relationship with other components:

This function depends on the iSwish function to perform the Swish operation and the tensor's constructor to create a new tensor.

Exceptions

[Exception type thrown by iSwish or tensor constructors] If there are issues during the operation, such as memory allocation failures or incorrect input data.

Note

The time complexity of this function is O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the Swish function to each element.

```cpp
// T may be Tensor or MappedTensor; MappedTensor is used here
using T = nz::data::MappedTensor;
T::shape_type shape = {2, 3};
T input(shape, true);
T output = nz::data::Swish(input);
```
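As a CPU cross-check, the Swish definition above reduces to x / (1 + e^{-x}). The following sketch uses hypothetical names and is not the library's iSwish.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Swish for one value: x * sigmoid(x) = x / (1 + exp(-x)).
float swishScalarSketch(float x) {
    return x / (1.0f + std::exp(-x));
}

// Out-of-place element-wise application over a host buffer.
void swishReference(float* out, const float* in, std::size_t n) {
    assert(out != nullptr && in != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = swishScalarSketch(in[i]);
    }
}
```

For large positive x the function approaches x, and for large negative x it approaches 0, which is a quick sanity check for any implementation.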

Definition at line 202 of file TensorOperations.cuh.

◆ Tanh()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::Tanh ( T & input )

Apply the hyperbolic tangent (tanh) activation function element-wise to an input tensor.

Parameters

input The input tensor (either Tensor or MappedTensor) to which the tanh function will be applied (device-to-device).

Returns: A new tensor (of the same type as the input: Tensor or MappedTensor) with the tanh function applied element-wise.

This function applies the hyperbolic tangent activation function, defined as \( f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \), to each element of the input tensor. It first creates a new tensor result with the same shape and gradient requirement as the input tensor. Then, it calls the iTanh function to perform the actual tanh operation on the data of the input tensor and store the results in the result tensor. Finally, the result tensor is returned.

Memory management:

A new tensor result is created, and its memory is managed by the tensor's own class (Tensor or MappedTensor). The memory of the input tensor remains unchanged.

Exception handling:

There is no explicit exception handling in this function. However, if the iTanh function or the tensor constructors throw exceptions, they will propagate up.

Relationship with other components:

This function depends on the iTanh function to perform the tanh operation and the tensor's constructor to create a new tensor.

Exceptions

[Exception type thrown by iTanh or tensor constructors] If there are issues during the operation, such as memory allocation failures or incorrect input data.

Note

The time complexity of this function is O(n), where n is the number of elements in the input tensor (input.size()), as it needs to apply the tanh function to each element.

```cpp
// T may be Tensor or MappedTensor; MappedTensor is used here
using T = nz::data::MappedTensor;
T::shape_type shape = {2, 3};
T input(shape, true);
T output = nz::data::Tanh(input);
```
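The defining formula above is convenient on paper, but evaluating it literally in single precision breaks down for large |x|: both exponentials' sum and difference overflow to infinity and the quotient becomes NaN. The sketch below contrasts the literal formula with `std::tanh`, which is the safer choice for a host-side reference. The function names are hypothetical, and nothing here describes how iTanh is implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Literal transcription of the formula; fine for moderate x only.
float naiveTanhSketch(float x) {
    const float ep = std::exp(x);
    const float en = std::exp(-x);
    return (ep - en) / (ep + en);
}

// Host reference that stays within [-1, 1] for any finite x.
float tanhScalarReference(float x) {
    return std::tanh(x);
}

// Out-of-place element-wise application over a host buffer.
void tanhReference(float* out, const float* in, std::size_t n) {
    assert(out != nullptr && in != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = tanhScalarReference(in[i]);
    }
}
```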

Definition at line 126 of file TensorOperations.cuh.

◆ tensorElementwiseDivide()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, void > nz::data::tensorElementwiseDivide	(	T &	out,
		const T &	lhs,
		const T &	rhs )

Performs element-wise division operation on tensors with broadcast compatibility.

This template function divides each element of the tensor lhs by the corresponding element of the tensor rhs and stores the result in the tensor out. It is only enabled for types T that satisfy is_valid_tensor_type<T>::value. The shapes of the input tensors must be broadcast compatible, and their height and width dimensions must match.

Template Parameters

T	The tensor type, which must satisfy `is_valid_tensor_type<T>::value`.

Parameters

out	The output tensor where the result of the element-wise division will be stored. Memory flow: host-to-function (reference), function-to-host (modified).
lhs	The left-hand side tensor in the division operation. Memory flow: host-to-function.
rhs	The right-hand side tensor in the division operation. Memory flow: host-to-function.

Returns: None

Memory Management Strategy:

The function does not allocate or free memory for the tensors. It creates local std::vector objects (offsetC, offsetA, offsetB) to store offset values. These vectors are automatically managed by their destructors.

Exception Handling Mechanism:

Throws std::invalid_argument if the shapes of lhs and rhs are not broadcast compatible or if their height and width dimensions do not match.

Relationship with Other Components:

Depends on the shape() method of the tensor type T to access shape information, including broadcast compatibility, height, width, batch size, channel count, and strides.
Relies on the iElementwiseDivide function to perform the actual element-wise division.

Exceptions

std::invalid_argument When the shapes of lhs and rhs are not broadcast compatible or their height and width dimensions do not match.

Note

The time complexity of this function is O(m * n), where m is the product of the batch and channel dimensions of the output tensor (out.shape()[0] * out.shape()[1]), and n is the number of elements in a single matrix (lhs.shape().H() * lhs.shape().W()).

```cpp
// Assume we have a valid tensor type Tensor
Tensor out;
Tensor lhs;
Tensor rhs;
try {
    tensorElementwiseDivide(out, lhs, rhs);
} catch (const std::invalid_argument& e) {
    std::cerr << e.what() << std::endl;
}
```
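To make the broadcast requirement concrete, here is a host-side sketch of a broadcasting element-wise division over {N, C, H, W} shapes. It rests on several assumptions that this documentation does not confirm: the rule used is the NumPy-style one (a batch or channel dimension is compatible when the sizes match or one of them is 1), the data is row-major, and `HostTensor`, `Shape4`, `broadcastCompatibleSketch`, and `elementwiseDivideSketch` are hypothetical names. Unlike the library function, the sketch returns its result instead of writing into an `out` argument.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Shape4 = std::array<std::size_t, 4>;  // assumed layout: {N, C, H, W}

struct HostTensor {
    Shape4 shape;
    std::vector<float> data;  // row-major
};

// Batch and channel sizes must match or one of them must be 1 (assumed rule).
bool broadcastCompatibleSketch(const Shape4& a, const Shape4& b) {
    for (std::size_t d = 0; d < 2; ++d) {
        if (a[d] != b[d] && a[d] != 1 && b[d] != 1) return false;
    }
    return true;
}

HostTensor elementwiseDivideSketch(const HostTensor& lhs, const HostTensor& rhs) {
    const Shape4& a = lhs.shape;
    const Shape4& b = rhs.shape;
    if (!broadcastCompatibleSketch(a, b) || a[2] != b[2] || a[3] != b[3]) {
        throw std::invalid_argument("shapes are not compatible for division");
    }
    const std::size_t plane = a[2] * a[3];  // elements in one H x W matrix
    assert(lhs.data.size() == a[0] * a[1] * plane);
    assert(rhs.data.size() == b[0] * b[1] * plane);

    const Shape4 o = {std::max(a[0], b[0]), std::max(a[1], b[1]), a[2], a[3]};
    HostTensor out{o, std::vector<float>(o[0] * o[1] * plane)};
    for (std::size_t n = 0; n < o[0]; ++n) {
        for (std::size_t c = 0; c < o[1]; ++c) {
            // A size-1 dimension always maps to index 0 (this is the broadcast).
            const std::size_t offA = ((a[0] == 1 ? 0 : n) * a[1] + (a[1] == 1 ? 0 : c)) * plane;
            const std::size_t offB = ((b[0] == 1 ? 0 : n) * b[1] + (b[1] == 1 ? 0 : c)) * plane;
            const std::size_t offC = (n * o[1] + c) * plane;
            for (std::size_t i = 0; i < plane; ++i) {
                out.data[offC + i] = lhs.data[offA + i] / rhs.data[offB + i];
            }
        }
    }
    return out;
}
```

The per-matrix offsets computed in the inner loops play the same role as the offsetA, offsetB, and offsetC vectors mentioned above, although the library may build them differently.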

Definition at line 928 of file TensorOperations.cuh.

◆ tensorGeneralMatrixMul()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, void > nz::data::tensorGeneralMatrixMul	(	T &	out,
		const T &	lhs,
		const T &	rhs )

Performs general matrix multiplication on tensors with broadcast compatibility.

This template function multiplies the tensor lhs by the tensor rhs and stores the result in the tensor out. It is only enabled for types T that satisfy is_valid_tensor_type<T>::value. The shapes of the input tensors must be broadcast compatible, and the width of lhs must be equal to the height of rhs.

Template Parameters

T	The tensor type, which must satisfy `is_valid_tensor_type<T>::value`.

Parameters

out	The output tensor that will hold the result of the matrix multiplication. Memory flow: host-to-function (reference), function-to-host (modified).
lhs	The left-hand side tensor in the matrix multiplication. Memory flow: host-to-function.
rhs	The right-hand side tensor in the matrix multiplication. Memory flow: host-to-function.

Returns: None

Memory Management Strategy:

The function does not allocate or free memory for the tensors themselves. It creates local std::vector objects (offsetC, offsetA, offsetB) to store offset values. These vectors are automatically managed by their destructors.

Exception Handling Mechanism:

Throws std::invalid_argument if the shapes of lhs and rhs are not broadcast compatible or if the width of lhs is not equal to the height of rhs.

Relationship with Other Components:

Depends on the shape() method of the tensor type T to obtain shape information, such as broadcast compatibility, height, width, batch size, channel count, and strides.
Relies on the iGeneralMatrixMul function to perform the actual matrix multiplication.

Exceptions

std::invalid_argument When the shapes of lhs and rhs are not broadcast compatible or the width of lhs is not equal to the height of rhs.

Note

The time complexity of this function is O(m * k * n), where m is the height of lhs, k is the width of lhs (equal to the height of rhs), and n is the width of rhs.

```cpp
// Assume we have a valid tensor type Tensor
Tensor out;
Tensor lhs;
Tensor rhs;
try {
    tensorGeneralMatrixMul(out, lhs, rhs);
} catch (const std::invalid_argument& e) {
    std::cerr << e.what() << std::endl;
}
```
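The O(m * k * n) bound quoted above corresponds to the classic triple loop over one pair of matrices. The sketch below is a host-side reference for a single (m x k) by (k x n) product in row-major layout. `matMulReference` is a hypothetical helper; batching, broadcasting, and whatever GPU strategy iGeneralMatrixMul uses are outside its scope.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// C (m x n) = A (m x k) * B (kB x n); throws if the inner dimensions differ,
// mirroring the documented width-of-lhs equals height-of-rhs requirement.
std::vector<float> matMulReference(const std::vector<float>& a, std::size_t m, std::size_t k,
                                   const std::vector<float>& b, std::size_t kB, std::size_t n) {
    if (k != kB) {
        throw std::invalid_argument("width of lhs must equal height of rhs");
    }
    assert(a.size() == m * k);
    assert(b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const float aip = a[i * k + p];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aip * b[p * n + j];  // accumulate one rank-1 term
            }
        }
    }
    return c;
}
```

The i-p-j loop order keeps the innermost accesses to `b` and `c` contiguous, which is friendlier to CPU caches than the textbook i-j-p order.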

Definition at line 1000 of file TensorOperations.cuh.

◆ tensorMatrixAdd()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, void > nz::data::tensorMatrixAdd	(	T &	out,
		const T &	lhs,
		const T &	rhs )

Performs matrix addition operation on tensors with broadcast compatibility.

This function is a template function that adds two tensors lhs and rhs and stores the result in out. It only accepts tensor types for which is_valid_tensor_type<T>::value is true. The shapes of the input tensors must be broadcast compatible, and the height and width dimensions must match.

Template Parameters

T	The tensor type. This type must satisfy `is_valid_tensor_type<T>::value`.

Parameters

out	The output tensor where the result of the addition will be stored. Memory flow: host-to-function (for reference), function-to-host (modifies the object).
lhs	The left-hand side tensor of the addition. Memory flow: host-to-function.
rhs	The right-hand side tensor of the addition. Memory flow: host-to-function.

Returns: None

Memory Management Strategy:

This function does not allocate or free any additional memory for the tensors. It only uses local std::vector objects (offsetC, offsetA, offsetB) to store offset values, and these vectors are automatically managed by their destructors.

Exception Handling Mechanism:

Throws std::invalid_argument if the shapes of lhs and rhs are not broadcast compatible or if their height and width dimensions do not match.

Relationship with Other Components:

Depends on the shape() method of the tensor type T to access shape information, including broadcast compatibility, height, width, number of batches, number of channels, and strides.
Relies on the iMatrixAdd function to perform the actual matrix addition operation.

Exceptions

std::invalid_argument When the shapes of lhs and rhs are not broadcast compatible or their height and width dimensions do not match.

Note

The time complexity of this function is O(m * n), where m is the product of the batch and channel dimensions of the output tensor (out.shape()[0] * out.shape()[1]), and n is the number of elements in a single matrix (lhs.shape().H() * lhs.shape().W()).

```cpp
// Assume we have a valid tensor type Tensor
Tensor out;
Tensor lhs;
Tensor rhs;
try {
    tensorMatrixAdd(out, lhs, rhs);
} catch (const std::invalid_argument& e) {
    std::cerr << e.what() << std::endl;
}
```

Definition at line 787 of file TensorOperations.cuh.

◆ tensorMatrixSub()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, void > nz::data::tensorMatrixSub	(	T &	out,
		const T &	lhs,
		const T &	rhs )

Performs matrix subtraction operation on tensors with broadcast compatibility.

This template function subtracts the tensor rhs from the tensor lhs and stores the result in the tensor out. It is only enabled for types T that satisfy is_valid_tensor_type<T>::value. The shapes of the input tensors must be broadcast compatible, and their height and width dimensions must match.

Template Parameters

T	The tensor type, which must meet the condition `is_valid_tensor_type<T>::value`.

Parameters

out	The output tensor that will hold the result of the subtraction. Memory flow: host-to-function (reference), function-to-host (modified).
lhs	The left-hand side tensor in the subtraction operation. Memory flow: host-to-function.
rhs	The right-hand side tensor in the subtraction operation. Memory flow: host-to-function.

Returns: None

Memory Management Strategy:

The function does not allocate or free memory for the tensors themselves. It creates local std::vector objects (offsetC, offsetA, offsetB) to store offset values. These vectors are automatically managed by their destructors.

Exception Handling Mechanism:

Throws std::invalid_argument if the shapes of lhs and rhs are not broadcast compatible or if their height and width dimensions do not match.

Relationship with Other Components:

Depends on the shape() method of the tensor type T to obtain shape information, such as broadcast compatibility, height, width, batch size, channel count, and strides.
Relies on the iMatrixSub function to perform the actual matrix subtraction.

Exceptions

std::invalid_argument When the shapes of lhs and rhs are not broadcast compatible or their height and width dimensions do not match.

Note

The time complexity of this function is O(m * n), where m is the product of the batch and channel dimensions of the output tensor (out.shape()[0] * out.shape()[1]), and n is the number of elements in a single matrix (lhs.shape().H() * lhs.shape().W()).

```cpp
// Assume we have a valid tensor type Tensor
Tensor out;
Tensor lhs;
Tensor rhs;
try {
    tensorMatrixSub(out, lhs, rhs);
} catch (const std::invalid_argument& e) {
    std::cerr << e.what() << std::endl;
}
```

Definition at line 858 of file TensorOperations.cuh.

◆ transpose()

template<typename T >

std::enable_if_t< is_valid_tensor_type< T >::value, T > nz::data::transpose ( const T & in )

Transposes a tensor with a valid tensor type.

This template function transposes the input tensor in and returns a new tensor result. It is only enabled for types T that satisfy is_valid_tensor_type<T>::value.

Template Parameters

T	The tensor type, which must satisfy `is_valid_tensor_type<T>::value`.

Parameters

in	The input tensor to be transposed. Memory flow: host-to-function.

Returns: A new tensor result which is the transposed version of the input tensor in. Memory flow: function-to-host.

Memory Management Strategy:

A new tensor result is created inside the function to store the transposed data. The memory for this tensor is managed by the tensor type T itself.
The function creates a local std::vector object offset to store offset values. This vector is automatically managed by its destructor.

Exception Handling Mechanism:

This function does not throw any exceptions explicitly. However, exceptions may be thrown by the constructor of the tensor type T or the iTranspose function.

Relationship with Other Components:

Depends on the shape() method of the tensor type T to access shape information, including dimensions and strides.
Relies on the iTranspose function to perform the actual transpose operation.

Note

The time complexity of this function is O(m * n), where m is the product of the first two dimensions of the input tensor (in.shape()[0] * in.shape()[1]), and n is the product of the last two dimensions (in.shape()[2] * in.shape()[3]).
Ensure that the iTranspose function is correctly implemented and that the tensor types support the necessary shape and data access methods.

Warning

Incorrect implementation of the iTranspose function may lead to incorrect results or runtime errors.

```cpp
// Assume we have a valid tensor type Tensor
Tensor in;
Tensor transposed = transpose(in);
```
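For a single matrix, the transpose amounts to swapping row and column indices. The sketch below is a host-side reference for one row-major rows x cols matrix. `transposeReference` is a hypothetical helper, and applying it once per batch and channel (using the offsets mentioned above) is assumed rather than documented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the cols x rows transpose of a row-major rows x cols matrix.
std::vector<float> transposeReference(const std::vector<float>& in,
                                      std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[c * rows + r] = in[r * cols + c];  // element (r, c) moves to (c, r)
        }
    }
    return out;
}
```

Transposing twice returns the original buffer, which makes a convenient round-trip test for any implementation.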

Definition at line 1073 of file TensorOperations.cuh.
