NeuZephyr
Simple DL Framework
nz::data::MappedTensor Class Reference

A class for representing multidimensional arrays in CUDA zero-copy memory, providing host-accessible container-like interfaces.

Public Member Functions

Constructors and Destructors
 MappedTensor (const shape_type &shape, bool requires_grad=false)
 Constructs a MappedTensor object.
 
 MappedTensor ()
 Constructs a default MappedTensor object.
 
 MappedTensor (const MappedTensor &other)
 Copy constructor for the MappedTensor class.
 
 MappedTensor (MappedTensor &&other) noexcept
 Move constructor for the MappedTensor class.
 
MappedTensor & operator= (const MappedTensor &other)
 Copy assignment operator for the MappedTensor class.
 
MappedTensor & operator= (MappedTensor &&other) noexcept(false)
 Move assignment operator for the MappedTensor class.
 
 ~MappedTensor () noexcept(false)
 Destructor for the MappedTensor class.
 
Getters and Setters
iterator begin () const
 Returns an iterator pointing to the first element of the MappedTensor.
 
iterator end () const
 Returns an iterator pointing to the past-the-end element of the MappedTensor.
 
bool requiresGrad () const noexcept
 Checks whether the MappedTensor requires gradient computation.
 
value_type * data () const noexcept
 Retrieves a pointer to the underlying data array of the MappedTensor.
 
value_type * grad () const
 Retrieves the gradient pointer of the MappedTensor.
 
size_type size () const noexcept
 Retrieves the total number of elements in the MappedTensor.
 
shape_type shape () const noexcept
 Retrieves the shape of the MappedTensor.
 
void setRequiresGrad (bool requires_grad)
 Sets the gradient requirement flag for the MappedTensor and manages the associated gradient memory accordingly.
 
void setShape (const shape_type &shape)
 Sets a new shape for the MappedTensor and adjusts its data and gradient memory accordingly.
 
void dataInject (float *data, size_type size, bool isGrad=false) const
 Inject data into either the tensor's main data or its gradient.
 
template<typename Iterator >
void dataInject (Iterator begin, Iterator end, const bool isGrad=false) const
 Inject data from an iterator range into either the tensor's data or its gradient.
 
void dataInject (const std::initializer_list< value_type > &data, const bool isGrad=false) const
 Inject data from a std::initializer_list into either the tensor's data or its gradient.
 
auto operator[] (size_type index) const -> value_type &
 Overload the [] operator to access an element of the MappedTensor by index.
 
Printer
std::ostream & print (std::ostream &os) const
 Print the tensor data in a matrix-like format to an output stream.
 
std::ostream & printGrad (std::ostream &os) const
 Print the gradient of the tensor in a matrix-like format to an output stream.
 
Modifiers
void clear () const
 Clear the data stored in the MappedTensor by setting all elements to zero.
 
void clearGrad () const
 Clear the gradient data of the MappedTensor if it requires gradients.
 
void reshape (const shape_type &shape)
 Reshape the MappedTensor to a new shape.
 
void randomize (size_type seed=0, bool isGrad=false) const
 Randomize the data or gradients of the MappedTensor using a given seed.
 
void fill (value_type value, bool isGrad=false) const
 Fill the data or gradients of the MappedTensor with a given value.
 
void fillMatrix (value_type value, size_type batch, size_type channels, bool isGrad=false)
 Fills a specific matrix within the MappedTensor with a given value.
 
void transpose ()
 Transpose the MappedTensor and its gradients (if required).
 
Math
MappedTensor operator+ (const MappedTensor &other) const
 Perform element-wise addition between two MappedTensors.
 
MappedTensor operator- (const MappedTensor &other) const
 Perform element-wise subtraction between two MappedTensors.
 
MappedTensor operator* (const MappedTensor &other) const
 Perform matrix multiplication between two MappedTensors.
 
MappedTensor operator- () const
 Perform element-wise negation on the MappedTensor.
 
bool operator== (const MappedTensor &other) const
 Checks if two MappedTensor objects are equal.
 
bool operator!= (const MappedTensor &other) const
 Checks if two MappedTensor objects are not equal.
 
MappedTensor operator/ (const MappedTensor &other) const
 Perform element-wise division between two MappedTensors.
 
void recip ()
 Compute the reciprocal of each element in the MappedTensor.
 
value_type sum () const
 Calculate the sum of all elements in the MappedTensor.
 
value_type sum (size_t batch, size_t channel) const
 Calculate the sum of elements in a specific batch and channel of a MappedTensor.
 
value_type expSum () const
 Calculate the sum of the exponential values of all elements in the MappedTensor.
 
value_type expSum (size_t batch, size_t channel) const
 Computes the sum of exponential values for a given batch and channel in the MappedTensor.
 
void syncGrad () const
 Synchronizes the gradient data if gradient computation is required.
 
void syncData () const
 Synchronizes the tensor data by waiting for all CUDA stream write operations on it to finish.
 
void sync () const
 Synchronizes the tensor data and its gradient.
 

Friends

DL_API std::ostream & operator<< (std::ostream &os, const MappedTensor &tensor)
 Overload the << operator to print a MappedTensor object to an output stream.
 
DL_API std::istream & operator>> (std::istream &is, MappedTensor &tensor)
 Overload the >> operator to read data from an input stream into a MappedTensor object.
 

Detailed Description

A class for representing multidimensional arrays in CUDA zero-copy memory, providing host-accessible container-like interfaces.

The MappedTensor class offers similar functionality to the Tensor class but with data stored in pinned zero-copy memory. This design enables direct host/device memory access patterns and container-style operations at the cost of reduced computational performance compared to regular GPU memory.

Type Definitions:

  • size_type: Alias for unsigned long long, supports 64-bit indexing for large tensors.
  • value_type: Alias for float, consistent with standard numerical computation types.
  • shape_type: Alias for std::vector<int>, represents tensor dimensions (e.g., {256, 256} for an image tensor).
  • iterator: Alias for value_type*, provides STL-style iterator access to tensor elements.

Key Differentiators from Tensor:

  • Zero-Copy Memory: Utilizes CUDA pinned memory accessible by both host and device without explicit transfers
  • Host-Side Interoperability: Supports STL-style iterators, range-based loops, and direct data access like std::vector
  • Container Compatibility: Works seamlessly with standard algorithms (std::copy, std::transform, etc.; see the sketch after this list)
  • Performance Tradeoff: Optimized for accessibility over speed, suitable for IO-bound operations
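As a quick sketch of this container-style interoperability (an illustrative fragment, not from the source; it assumes the standard headers shown and a successfully constructed tensor):

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>

// MappedTensor's pointer-based iterators make it a drop-in range
// for standard algorithms.
nz::data::MappedTensor tensor({2, 3});
std::iota(tensor.begin(), tensor.end(), 1.0f);        // 1, 2, 3, ...
std::transform(tensor.begin(), tensor.end(), tensor.begin(),
               [](float v) { return std::sqrt(v); }); // element-wise sqrt
float total = std::accumulate(tensor.begin(), tensor.end(), 0.0f);
```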

Recommended Use Cases:

  • Frequent host-device data exchange scenarios
  • Prototyping with direct host-side data manipulation
  • Situations requiring container semantics with GPU data

Usage Example:

using namespace nz::data;
// Create mapped tensor with 3x3 shape
MappedTensor mtensor({3, 3});
// Host-accessible modification via iterators
std::fill(mtensor.begin(), mtensor.end(), 1.0f);
// Direct host-side data processing
for(auto& val : mtensor) {
    val = std::sqrt(val);
}
// Seamless GPU computation
Tensor result = ReLU(mtensor); // Works with existing Tensor operations
Note
  • Memory Characteristics: Zero-copy memory typically offers higher allocation latency but unified access
  • Concurrency Considerations: Ensure proper synchronization between host/device accesses
  • Performance Guidance: Prefer Tensor for compute-intensive kernels, use MappedTensor for data pipelines
  • Lifecycle Management: Pinned memory requires careful resource management; prefer RAII patterns
Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2024/11/29

Definition at line 66 of file MappedTensor.cuh.

Constructor & Destructor Documentation

◆ MappedTensor() [1/4]

nz::data::MappedTensor::MappedTensor ( const shape_type & shape,
bool requires_grad = false )
explicit

Constructs a MappedTensor object.

Parameters
shape: The shape of the tensor. This is a reference (host-to-host) to the shape_type object which defines the dimensions of the tensor.
requires_grad: A boolean value indicating whether the tensor requires gradient computation.
Returns
None (constructor).

This constructor initializes the MappedTensor object with the given shape and gradient requirement. It calculates the size of the tensor based on the provided shape. Memory for the data buffer is allocated using cudaMallocHost. If the requires_grad flag is set to true, memory for the gradient buffer is also allocated using cudaMallocHost. If requires_grad is false, the gradient buffer pointer is set to nullptr. The memory allocated by cudaMallocHost should be freed by the corresponding cudaFreeHost call when it is no longer needed. There is no explicit exception handling in this constructor, but the CHECK macro is assumed to handle errors related to CUDA memory allocation. This constructor is a fundamental part of the MappedTensor class and is used to initialize new tensor objects.

Note
  • The CHECK macro is assumed to handle CUDA errors properly. Ensure that the CUDA environment is properly configured before using this constructor.
Warning
  • CUDA memory allocation may fail if there is not enough available memory on the host. Ensure that the host system has sufficient memory before creating a MappedTensor object.
```cpp
// Illustrative usage (not from the source): build a 3x3 tensor that tracks gradients
nz::data::MappedTensor tensor({3, 3}, true);
```

Definition at line 88 of file MappedTensor.cu.

◆ MappedTensor() [2/4]

nz::data::MappedTensor::MappedTensor ( )

Constructs a default MappedTensor object.

Parameters
None
Returns
None (constructor).

This default constructor initializes the MappedTensor object by delegating to the parameterized constructor with a shape of {0, 0} and a requires_grad value of false. After the delegation, it explicitly sets the _data and _grad pointers to nullptr. The memory management strategy relies on the parameterized constructor's behavior. Since the shape is {0, 0}, it's likely that no actual memory will be allocated for data and gradient in this case. There is no explicit exception handling in this constructor, but it depends on the error handling of the parameterized constructor (presumably through the CHECK macro). This constructor provides a way to create a default-initialized MappedTensor object.

Note
  • Ensure that the parameterized constructor is implemented correctly as this constructor depends on it.
  • Since the _data and _grad pointers are set to nullptr, this object may not be suitable for direct use without re-initializing.
```cpp
// Illustrative usage (not from the source): default-constructed tensor with shape {0, 0}
nz::data::MappedTensor tensor;
```

Definition at line 100 of file MappedTensor.cu.

◆ MappedTensor() [3/4]

nz::data::MappedTensor::MappedTensor ( const MappedTensor & other)

Copy constructor for the MappedTensor class.

Parameters
other: A constant reference (host-to-host) to another MappedTensor object from which the data and properties will be copied.
Returns
None (constructor).

This copy constructor initializes a new MappedTensor object by delegating to the parameterized constructor with the shape and gradient requirement of the other MappedTensor. It then copies the data from the other object to the newly created object using cudaMemcpy. If the requires_grad flag is set to true, it also copies the gradient data. The memory for the new object is already allocated by the delegated constructor. The cudaMemcpy operations transfer data between the pinned host buffers. There is no explicit exception handling in this constructor, but the CHECK macro is assumed to handle errors related to CUDA memory copy operations. This constructor is important for creating a deep copy of a MappedTensor object.

Note
  • The CHECK macro is assumed to handle CUDA errors properly. Ensure that the CUDA environment is properly configured before using this copy constructor.
  • The cudaMemcpy operations may fail if there is not enough available memory or if the memory pointers are invalid.
Warning
  • CUDA memory copy operations may be time-consuming, especially for large tensors. Be aware of the performance implications when using this copy constructor.
```cpp
nz::data::MappedTensor::shape_type shape = {2, 3};
nz::data::MappedTensor original(shape, true);
nz::data::MappedTensor copy(original);
```

Definition at line 105 of file MappedTensor.cu.


◆ MappedTensor() [4/4]

nz::data::MappedTensor::MappedTensor ( MappedTensor && other)
noexcept

Move constructor for the MappedTensor class.

Parameters
other: An rvalue reference (host-to-host) to a MappedTensor object from which resources will be moved.
Returns
None (constructor).

This move constructor transfers ownership of the resources (such as the data and gradient buffers) from the other MappedTensor object to the newly constructed object. It copies the shape, size, and requires_grad flag from the other object. Then, it takes over the pointers to the data and gradient buffers, leaving the other object in a valid but empty state. The other object's data and gradient pointers are set to nullptr, its size is set to 0, requires_grad is set to false, and the shape is set to {0, 0}. This operation is noexcept, meaning it does not throw exceptions. The memory management strategy is that the ownership of the previously allocated memory by the other object is transferred, and no new memory is allocated in this constructor. There is no need for explicit exception handling as the operation is guaranteed not to throw. This constructor is useful for efficient resource transfer during operations like returning a MappedTensor object from a function.

Note
  • After the move operation, the other object should not be used as it is left in a valid but empty state.
  • This move constructor ensures efficient resource utilization by avoiding unnecessary memory copying.
```cpp
nz::data::MappedTensor::shape_type shape = {2, 3};
nz::data::MappedTensor original(shape, true);
nz::data::MappedTensor moved(std::move(original));
```

Definition at line 114 of file MappedTensor.cu.

◆ ~MappedTensor()

nz::data::MappedTensor::~MappedTensor ( )

Destructor for the MappedTensor class.

This destructor is responsible for releasing the host memory allocated for the MappedTensor object.

  • First, it checks if the _requires_grad flag is set to true and the _grad pointer is not nullptr. If so, it uses cudaFreeHost to free the host memory associated with the gradient data. The CHECK macro is assumed to handle any CUDA errors that may occur during this operation.
  • Then, it checks if the _data pointer is not nullptr. If true, it uses cudaFreeHost to free the host memory associated with the tensor data, again relying on the CHECK macro to handle potential CUDA errors.

The memory management strategy of this destructor ensures that any host memory allocated by the MappedTensor object is properly freed when the object goes out of scope, preventing memory leaks. There is no explicit exception handling in the destructor itself, but the CHECK macro is assumed to manage CUDA-related errors.

Note
  • The CHECK macro is assumed to handle CUDA errors properly. Ensure that the CUDA environment is properly configured before the destructor is called.
  • The destructor is automatically called when the MappedTensor object goes out of scope or is explicitly deleted.
```cpp
{
    nz::data::MappedTensor tensor({2, 2}, true);
    // tensor goes out of scope here, and the destructor is called automatically
}
```

Definition at line 179 of file MappedTensor.cu.


Member Function Documentation

◆ begin()

MappedTensor::iterator nz::data::MappedTensor::begin ( ) const
nodiscard

Returns an iterator pointing to the first element of the MappedTensor.

Returns
An iterator (host-to-host) of type MappedTensor::iterator pointing to the first element of the tensor's data.

This function provides a way to access the first element of the MappedTensor in a sequential manner. It simply returns the pointer _data as an iterator, allowing users to traverse the tensor's elements using standard iterator operations.

The memory management strategy is to rely on the existing memory allocation of the MappedTensor object. The iterator points to the same memory location as _data, and no new memory is allocated or freed in this function. There is no explicit exception handling in this function, as it is a simple pointer return operation and is not expected to throw exceptions under normal circumstances. This function is often used in combination with other standard library algorithms and range-based for loops to iterate over the tensor's elements.

Note
  • Ensure that the MappedTensor object is properly initialized before calling this function, as an uninitialized object may lead to undefined behavior.
  • The returned iterator is valid as long as the MappedTensor object exists and its underlying data is not reallocated or modified in a way that invalidates the pointer.
```cpp
nz::data::MappedTensor::iterator it = tensor.begin();
```

Definition at line 188 of file MappedTensor.cu.


◆ clear()

void nz::data::MappedTensor::clear ( ) const

Clear the data stored in the MappedTensor by setting all elements to zero.

Parameters
None
Returns
None

This function uses CUDA's cudaMemset to set all elements of the _data array to zero. It is designed to quickly reset the tensor's data.

Memory management: The function does not allocate or deallocate memory. It simply modifies the existing data in the _data array. Exception handling: The CHECK macro is used to handle potential CUDA errors. If cudaMemset fails, the CHECK macro will handle the error according to its implementation, which may include logging and terminating the program. Relationship with other components: This function is related to the data management component of the MappedTensor. It provides a way to reset the tensor's data to a known state.

Exceptions
None explicitly, but the CHECK macro may handle and report CUDA errors.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensor (_size), as cudaMemset needs to set each element to zero.
  • Ensure that the CUDA environment is properly initialized before calling this function.
Warning
  • If the CUDA environment is not set up correctly, cudaMemset may fail, and the CHECK macro will handle the error, which may lead to program termination.
```cpp
tensor.dataInject({1, 2, 3, 4, 5, 6}, false);
tensor.clear();
```

Definition at line 334 of file MappedTensor.cu.


◆ clearGrad()

void nz::data::MappedTensor::clearGrad ( ) const

Clear the gradient data of the MappedTensor if it requires gradients.

Parameters
None
Returns
None

This function is used to reset the gradient data of a MappedTensor to zero. It first checks if the tensor requires gradients. If it does, it uses CUDA's cudaMemset to set all elements of the _grad array to zero. Otherwise, it throws a std::runtime_error.

Memory management: The function does not allocate or deallocate memory. It only modifies the existing _grad array. Exception handling: If the tensor does not require gradients, a std::runtime_error is thrown. If cudaMemset fails, the CHECK macro will handle the CUDA error according to its implementation. Relationship with other components: This function is related to the gradient management component of the MappedTensor. It provides a way to reset the gradient data for tensors that support gradient computation.

Exceptions
std::runtime_error: If the tensor does not require gradients.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensor (_size), as cudaMemset needs to set each element of the gradient array to zero.
  • Ensure that the CUDA environment is properly initialized before calling this function if the tensor requires gradients.
Warning
  • Attempting to clear gradients for a tensor that does not require gradients will result in a runtime error.
  • If the CUDA environment is not set up correctly and the tensor requires gradients, cudaMemset may fail, and the CHECK macro will handle the error, which may lead to program termination.
```cpp
try {
tensor.clearGrad();
} catch (const std::runtime_error& e) {
std::cerr << e.what() << std::endl;
}
```

Definition at line 338 of file MappedTensor.cu.


◆ data()

MappedTensor::value_type * nz::data::MappedTensor::data ( ) const
nodiscard noexcept

Retrieves a pointer to the underlying data array of the MappedTensor.

Returns
A pointer (host-to-host) of type value_type* that points to the first element of the MappedTensor's data array.

This function offers direct access to the raw data stored within the MappedTensor. It is useful for operations that require low-level manipulation of the data, such as interacting with external libraries that expect a raw pointer.

The memory management strategy is to return a pointer to the existing memory allocated for the MappedTensor. No new memory is allocated or freed during this function call. The caller should not attempt to deallocate the returned pointer, as the memory is managed by the MappedTensor object. There is no exception handling mechanism in this function because it is declared noexcept, meaning it will not throw any exceptions under normal circumstances. This function can be used in combination with other functions that operate on raw data pointers, facilitating seamless integration with other parts of the system.

Note
  • This function is a const member function, so it can be called on const MappedTensor objects.
  • The [[nodiscard]] attribute indicates that the return value should not be ignored, as it provides access to the core data of the MappedTensor.
  • Ensure that the MappedTensor object is valid when using the returned pointer, as an invalid object may lead to undefined behavior.
```cpp
nz::data::MappedTensor::value_type* ptr = tensor.data();
```
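For bulk extraction, the raw pointer pairs naturally with size(); a minimal sketch (illustrative, assuming a constructed and filled tensor and the headers shown):

```cpp
#include <algorithm>
#include <vector>

// Copy the tensor's contents into a std::vector via the raw pointer;
// no ownership is transferred and the tensor still manages its memory.
std::vector<float> host(tensor.size());
std::copy(tensor.data(), tensor.data() + tensor.size(), host.begin());
```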

Definition at line 202 of file MappedTensor.cu.

◆ dataInject() [1/3]

void nz::data::MappedTensor::dataInject ( const std::initializer_list< value_type > & data,
const bool isGrad = false ) const

Inject data from a std::initializer_list into either the tensor's data or its gradient.

Parameters
data: A std::initializer_list<value_type> (host-to-host) containing the data to be injected.
isGrad: A boolean value (host-to-host) indicating whether to inject the data into the gradient (true) or the tensor data (false).
Returns
void

This function transfers data from a std::initializer_list to the MappedTensor. Based on the isGrad flag, it assigns the values from the initializer list to either the tensor's main data or its gradient.

Memory management: The caller is responsible for the memory of the std::initializer_list. The function only reads values from the list and copies them into the tensor's internal memory. It ensures that at most the minimum of the list size and the tensor's size (_size) is copied to prevent out-of-bounds access. Exception handling: If isGrad is true and the tensor does not require gradients (_requires_grad is false), a std::invalid_argument exception is thrown. Relationship with other components: This function interacts with the data storage and gradient management components of the MappedTensor. It provides a convenient way to initialize the tensor's data or gradient using an initializer list.

Exceptions
std::invalid_argument: If isGrad is true and the tensor does not require gradients.
Note
  • The function has a time complexity of O(min(n, _size)), where n is the number of elements in the std::initializer_list, as it iterates over the list to copy the data.
  • Ensure that the std::initializer_list contains elements of the correct type (value_type).
```cpp
tensor.dataInject({1, 2, 3, 4, 5, 6}, false);
```

Definition at line 254 of file MappedTensor.cu.


◆ dataInject() [2/3]

void nz::data::MappedTensor::dataInject ( float * data,
size_type size,
bool isGrad = false ) const

Inject data into either the tensor's main data or its gradient.

Parameters
data: A pointer to an array of value_type (host-to-device) that contains the data to be injected.
size: A size_type value (host-to-device) representing the number of elements in the data array.
isGrad: A boolean value (host-to-device) indicating whether the data should be injected into the gradient (true) or the main tensor data (false).
Returns
void

This function is designed to inject external data into the MappedTensor. Depending on the isGrad flag, it will copy the provided data either into the tensor's main data or its gradient.

Memory management: The caller is responsible for allocating and deallocating the memory of the data array on the host side. The function uses cudaMemcpy to transfer the data from the host to the device memory of the tensor or its gradient. It only copies the minimum of size and _size elements to prevent out-of-bounds access. Exception handling: If isGrad is true and the tensor does not require gradients (_requires_grad is false), a std::invalid_argument exception is thrown. Additionally, if the cudaMemcpy operation fails, the CHECK macro is expected to handle the error, potentially throwing an exception or terminating the program. Relationship with other components: This function is closely related to the data storage and gradient management components of the MappedTensor. It provides a way to update the tensor's data or gradient values from an external source.

Exceptions
std::invalid_argument: If isGrad is true and the tensor does not require gradients.
An exception might be thrown by the CHECK macro if the cudaMemcpy operation fails.
Note
  • Ensure that the CUDA runtime environment is properly initialized before calling this function, as it uses cudaMemcpy for data transfer.
  • The CHECK macro is assumed to handle CUDA errors correctly. Any issues with the CUDA operations will be reported through this macro.
  • The function only copies the minimum of size and _size elements to avoid out-of-bounds access.
Warning
  • Incorrect usage of this function can lead to data corruption or CUDA errors. For example, if the CUDA environment is not set up correctly, the cudaMemcpy operation may fail unexpectedly.
```cpp
nz::data::MappedTensor::value_type data[] = {1, 2, 3, 4, 5, 6};
tensor.dataInject(data, 6, false);
```
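Because the copy length is clamped to min(size, _size), a shorter source simply leaves the tail of the tensor untouched; for example (illustrative, assuming the 6-element tensor above):

```cpp
// Only 3 of the 6 elements are overwritten; elements 3..5 keep
// their previous values due to the min(size, _size) clamp.
nz::data::MappedTensor::value_type partial[] = {7, 8, 9};
tensor.dataInject(partial, 3, false);
```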

Definition at line 237 of file MappedTensor.cu.


◆ dataInject() [3/3]

template<typename Iterator >
void nz::data::MappedTensor::dataInject ( Iterator begin,
Iterator end,
const bool isGrad = false ) const
inline

Inject data from an iterator range into either the tensor's data or its gradient.

Template Parameters
Iterator: The type of the iterator used to access the data source.
Parameters
begin: An iterator (host-to-host) pointing to the start of the data range to be injected.
end: An iterator (host-to-host) pointing past the end of the data range to be injected.
isGrad: A boolean value (host-to-host) indicating whether to inject the data into the gradient (true) or the tensor data (false). Defaults to false.
Returns
void

This function is designed to transfer data from an iterator range into the MappedTensor. It iterates through the given range and assigns the values to either the tensor's main data or its gradient, based on the isGrad flag.

Memory management: The caller is responsible for the memory occupied by the data source pointed to by the iterators. The function only reads values from the iterators and copies them into the tensor's internal memory. It ensures that at most the minimum of the range size and the tensor's size (_size) is copied to prevent out-of-bounds access. Exception handling: If isGrad is true and the tensor does not require gradients (_requires_grad is false), a std::invalid_argument exception is thrown. Additionally, the function assumes that the iterators are valid and well-behaved. If operations on the iterators (such as std::distance, dereferencing, or incrementing) throw exceptions, those exceptions will be propagated. Relationship with other components: This function interacts closely with the data storage and gradient management components of the MappedTensor. It provides a flexible way to update the tensor's data or gradient values using different data sources accessible via iterators.

Exceptions
std::invalid_argument: If isGrad is true and the tensor does not require gradients.
Note
  • Ensure that the iterators begin and end form a valid range. Using invalid iterators may lead to undefined behavior.
  • The function has a time complexity of O(min(n, _size)), where n is the number of elements in the iterator range, as it iterates over the range to copy the data.
  • The data from the iterators is cast to value_type before being assigned to the tensor or its gradient.
```cpp
#include <vector>
std::vector<nz::data::MappedTensor::value_type> data = {1, 2, 3, 4, 5, 6};
tensor.dataInject(data.begin(), data.end(), false);
```

Definition at line 590 of file MappedTensor.cuh.

◆ end()

MappedTensor::iterator nz::data::MappedTensor::end ( ) const
nodiscard

Returns an iterator pointing to the past-the-end element of the MappedTensor.

Returns
An iterator (host-to-host) of type MappedTensor::iterator pointing to the past-the-end element of the tensor's data.

This function is used to mark the end of the range of elements in the MappedTensor. It calculates the iterator by adding the _size of the tensor to the _data pointer. This allows for standard iteration techniques where the loop continues until the iterator reaches the end() iterator.

The memory management strategy is to rely on the existing memory allocation of the MappedTensor object. The iterator points to a memory location just past the last element of the tensor's data, and no new memory is allocated or freed in this function. There is no explicit exception handling in this function, as it is a simple pointer arithmetic operation and is not expected to throw exceptions under normal circumstances. This function is commonly used in combination with begin() to iterate over all the elements of the MappedTensor using standard library algorithms or range-based for loops.

Note
  • Ensure that the MappedTensor object is properly initialized before calling this function, as an uninitialized object may lead to undefined behavior.
  • The returned iterator is valid as long as the MappedTensor object exists and its underlying data is not reallocated or modified in a way that invalidates the pointer.
```cpp
nz::data::MappedTensor::iterator it_end = tensor.end();
```

Definition at line 193 of file MappedTensor.cu.


◆ expSum() [1/2]

MappedTensor::value_type nz::data::MappedTensor::expSum ( ) const
nodiscard

Calculate the sum of the exponential values of all elements in the MappedTensor.

Returns
The sum of the exponential values of all elements in the MappedTensor as a value of type MappedTensor::value_type.

This function computes the sum of the exponential values of all elements within the MappedTensor. It first determines the CUDA block and grid dimensions based on the size of the tensor. Then, it allocates pinned host memory using cudaMallocHost to store the intermediate results. The krnl::SummationExp CUDA kernel is launched to calculate the partial sums of the exponential values on the device. After the kernel execution, the function synchronizes the device using cudaDeviceSynchronize to ensure all operations are completed. Finally, it sums up the partial results on the host, frees the allocated pinned host memory, and returns the total sum.

Memory management:

  • Pinned host memory is allocated for dData using cudaMallocHost and freed using cudaFreeHost.

Exception handling:

  • The CHECK macro is used to handle CUDA API errors. If any CUDA API call fails, the CHECK macro will throw an exception, causing the function to terminate.

Relationship with other components:

  • This function relies on the krnl::SummationExp CUDA kernel to perform partial sums of exponential values on the device.
  • It also depends on the CHECK macro to handle CUDA API errors and cudaDeviceSynchronize for device synchronization.
Exceptions
[Exception type thrown by CHECK macro]: If there are CUDA API errors during memory allocation, kernel execution, or memory synchronization.
Note
  • The time complexity of this function is approximately O(n), where n is the number of elements in the MappedTensor (_size). The CUDA kernel parallelizes the partial sum calculation of exponential values, and the final sum on the host is a linear operation over the number of grid blocks.
  • Ensure that the CUDA device is properly initialized before calling this function.
  • Pinned host memory allocation may have limitations, so be aware of potential memory constraints.
```cpp
nz::data::MappedTensor mapped_tensor({2, 3}, true);
// Assume mapped_tensor is filled with some values
nz::data::MappedTensor::value_type exp_sum_result = mapped_tensor.expSum();
```
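Semantically, the result matches a plain host-side reduction over the exponential of each element; a reference sketch (illustrative, assuming <cmath> and <numeric> are included), useful for validating the kernel-based result:

```cpp
#include <cmath>
#include <numeric>

// Host-side reference for what expSum() computes: sum_i exp(x_i).
float reference = std::accumulate(
    mapped_tensor.begin(), mapped_tensor.end(), 0.0f,
    [](float acc, float v) { return acc + std::exp(v); });
```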

Definition at line 562 of file MappedTensor.cu.


◆ expSum() [2/2]

MappedTensor::value_type nz::data::MappedTensor::expSum ( size_t batch,
size_t channel ) const
nodiscard

Computes the sum of exponential values for a given batch and channel in the MappedTensor.

This function calculates the sum of exponential values for a specific batch and channel within the MappedTensor. It first checks if the provided batch and channel indices are valid. If so, it launches a CUDA kernel to perform the exponential summation and then aggregates the partial results on the host.

Parameters
batch: The batch index. Memory flow: host-to-function, as the value is passed from the calling code to the function.
channel: The channel index. Memory flow: host-to-function, as the value is passed from the calling code to the function.
Returns
The sum of exponential values for the specified batch and channel. Memory flow: function-to-host, as the result is returned from the function to the calling code.

Memory Management Strategy:

  • The function allocates host memory for the intermediate results using cudaMallocHost. This memory is managed by the CUDA runtime.
  • After aggregating the results, the allocated host memory is freed using cuStrm::StreamManager<value_type>::Instance().freeHost.

Exception Handling Mechanism:

  • If the provided batch or channel indices are out of bounds, the function throws an std::invalid_argument exception.

Relationship with Other Components:

  • This function depends on the krnl::SummationExp CUDA kernel to perform the exponential summation on the device.
  • It also relies on the cuStrm::StreamManager<value_type> to synchronize data and free host memory.
Exceptions
std::invalid_argument: If the provided batch or channel indices are out of bounds.
Note
  • The time complexity of this function is O(n), where n is the number of grid blocks (grid.x). This is due to the loop that aggregates the partial results.
  • Ensure that the CUDA kernel krnl::SummationExp is correctly implemented and that the cuStrm::StreamManager<value_type> functions work as expected.
  • This function involves CUDA operations, so it requires a CUDA-enabled environment.
Warning
  • Incorrect usage of CUDA functions may lead to memory leaks or other CUDA-related errors.
```cpp
MappedTensor tensor;
size_t batch = 0;
size_t channel = 1;
try {
MappedTensor::value_type result = tensor.expSum(batch, channel);
// Use the result
} catch (const std::invalid_argument& e) {
std::cerr << e.what() << std::endl;
}
```

Definition at line 577 of file MappedTensor.cu.


◆ fill()

void nz::data::MappedTensor::fill ( value_type value,
bool isGrad = false ) const

Fill the data or gradients of the MappedTensor with a given value.

Parameters
value: The value used to fill the tensor's data or gradients (host-to-device).
isGrad: A boolean flag indicating whether to fill the gradients or the data. If true, gradients are filled; otherwise, data is filled (host-to-device).
Returns
None

This function fills either the data or the gradients of the MappedTensor with the specified value. It determines the appropriate CUDA grid and block dimensions based on the size of the tensor, and then invokes the krnl::Fill kernel to perform the filling operation.

Memory management: The function does not allocate or deallocate the tensor's data or gradient memory. It only modifies the existing memory in-place. Exception handling: It is assumed that the krnl::Fill kernel handles its own errors and may throw exceptions in case of issues. If an exception occurs in the kernel, it will propagate up. Relationship with other components: This function depends on the krnl::Fill kernel to perform the actual filling operation. It provides a high-level interface for initializing the tensor's data or gradients.

Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensor (_size), as it needs to set each element to the given value.
  • Ensure that the CUDA environment is properly configured and the krnl::Fill kernel is correctly implemented before calling this function.
Warning
  • If the CUDA device is not properly initialized or the krnl::Fill kernel has implementation issues, this function may fail.
```cpp
tensor.fill(1.0f, false);
tensor.fill(0.0f, true);
```

Definition at line 397 of file MappedTensor.cu.


◆ fillMatrix()

void nz::data::MappedTensor::fillMatrix ( value_type value,
size_type batch,
size_type channels,
bool isGrad = false )

Fills a specific matrix within the MappedTensor with a given value.

This function fills a particular matrix in the MappedTensor, specified by the batch and channel indices, with the provided value. It can also fill the gradient matrix if the tensor requires gradients and the isGrad flag is set to true.

Parameters
value: The value to fill the matrix with. Memory flow: host-to-function, as it is passed from the calling code to the function.
batch: The batch index. Memory flow: host-to-function, as it is passed from the calling code to the function.
channels: The channel index. Memory flow: host-to-function, as it is passed from the calling code to the function.
isGrad: A boolean flag indicating whether to fill the gradient matrix. Memory flow: host-to-function, as it is passed from the calling code to the function.
Returns
None

Memory Management Strategy:

  • This function does not allocate or free any dynamic memory. It only modifies the existing data or gradient buffer of the MappedTensor.

Exception Handling Mechanism:

  • If the provided batch or channel indices are out of bounds, the function throws an std::invalid_argument exception.
  • If the isGrad flag is true but the tensor does not require gradients, the function throws an std::invalid_argument exception.

Relationship with Other Components:

  • This function depends on the krnl::Fill CUDA kernel to perform the actual filling operation on the device.
Exceptions
std::invalid_argument: If the provided batch or channel indices are out of bounds, or if gradient filling is attempted on a tensor that does not require gradients.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the matrix (_shape[2] * _shape[3]). This is because the CUDA kernel needs to process each element.
  • Ensure that the krnl::Fill CUDA kernel is correctly implemented.
  • This function involves CUDA operations, so it requires a CUDA-enabled environment.
Warning
  • Incorrect usage of CUDA functions may lead to memory access errors or other CUDA-related issues.
```cpp
MappedTensor tensor;
MappedTensor::value_type value = 1.0;
MappedTensor::size_type batch = 0;
MappedTensor::size_type channels = 1;
bool isGrad = false;
try {
tensor.fillMatrix(value, batch, channels, isGrad);
} catch (const std::invalid_argument& e) {
std::cerr << e.what() << std::endl;
}
```

Definition at line 407 of file MappedTensor.cu.


◆ grad()

MappedTensor::value_type * nz::data::MappedTensor::grad ( ) const
nodiscard

Retrieves the gradient pointer of the MappedTensor.

This member function is used to obtain the pointer to the gradient data of the MappedTensor. It can only be called if the tensor requires gradients.

Parameters
None
Returns
A pointer to the gradient data of type MappedTensor::value_type. Memory flow: function-to-host.

Memory Management Strategy:

  • The function does not allocate or free memory. It simply returns a pointer to the existing gradient data (_grad).

Exception Handling Mechanism:

  • Throws std::invalid_argument if the tensor does not require gradients (_requires_grad is false).

Relationship with Other Components:

  • Depends on the internal member variable _requires_grad to determine if gradient access is allowed.
  • The returned pointer _grad is managed by the MappedTensor class.
Exceptions
std::invalid_argument: When the tensor does not require gradients.
Note
  • This function has a constant time complexity of O(1) as it only performs a simple check and returns a pointer.
  • Ensure that the tensor requires gradients before calling this function to avoid exceptions.
Warning
  • Modifying the data pointed to by the returned pointer may affect the internal state of the MappedTensor.
```cpp
MappedTensor tensor;
try {
MappedTensor::value_type* gradPtr = tensor.grad();
// Do something with the gradient pointer
} catch (const std::invalid_argument& e) {
std::cerr << e.what() << std::endl;
}
```

Definition at line 206 of file MappedTensor.cu.

◆ operator!=()

bool nz::data::MappedTensor::operator!= ( const MappedTensor & other) const

Checks if two MappedTensor objects are not equal.

Parameters
other: The other MappedTensor object to compare with. Memory flow: direct access from host memory (as the underlying operator== function accesses data in host memory).
Returns
Returns true if the two MappedTensor objects are not equal, false otherwise.

This function determines the non-equality of two MappedTensor objects. It achieves this by negating the result of the operator== function. Thus, its behavior is entirely dependent on the implementation of the operator== function for MappedTensor.

Memory Management Strategy:

  • This function does not allocate or free any memory. All memory-related operations are handled by the operator== function.

Exception Handling Mechanism:

  • Any exceptions that may occur during the comparison are handled by the operator== function. This function does not have its own exception-handling logic.

Relationship with Other Components:

  • It depends solely on the operator== function of the MappedTensor class.
Note
  • The time complexity of this function is the same as that of the operator== function, which is O(n), where n is the number of elements in the MappedTensor.
  • Ensure that the operator== function of MappedTensor is correctly implemented, as this function relies on it for the comparison.
```cpp
MappedTensor tensor1; // Assume tensor1 is properly initialized
MappedTensor tensor2; // Assume tensor2 is properly initialized
bool isNotEqual = tensor1 != tensor2;
```

Definition at line 507 of file MappedTensor.cu.

◆ operator*()

MappedTensor nz::data::MappedTensor::operator* ( const MappedTensor & other) const

Perform matrix multiplication between two MappedTensors.

Parameters
other: The MappedTensor to be multiplied with the current MappedTensor (host-to-host).
Returns
A new MappedTensor containing the result of the matrix multiplication.

This function performs matrix multiplication between the current MappedTensor and another MappedTensor. It first checks if the number of columns in the current tensor is equal to the number of rows in the other tensor. If not, it throws a std::invalid_argument exception. Then, it creates a new MappedTensor with the appropriate shape for the result and sets the gradient requirement based on the current tensor. Next, it calculates the CUDA grid and block dimensions according to the shape of the result tensor and invokes the krnl::GeneralMatrixMul kernel to perform the actual matrix multiplication. Finally, it synchronizes the CUDA device and returns the resulting MappedTensor.

Memory management: A new MappedTensor is created to store the result, and its memory is managed automatically by the MappedTensor class. The input tensors' memory remains unchanged. Exception handling: If the matrix shapes do not match for multiplication, a std::invalid_argument exception is thrown. The CHECK macro is used to handle CUDA errors, and if a CUDA operation fails, it will throw an appropriate exception. Relationship with other components: This function depends on the krnl::GeneralMatrixMul kernel to perform the matrix multiplication and the CHECK macro to handle CUDA errors.

Exceptions
std::invalid_argument: If the number of columns in the current MappedTensor is not equal to the number of rows in the other MappedTensor.
[Exception type thrown by CHECK macro]: If a CUDA operation fails.
Note
  • The time complexity of this function is O(m * n * k), where m is the number of rows in the current tensor, n is the number of columns in the other tensor, and k is the number of columns in the current tensor (which is equal to the number of rows in the other tensor).
Warning
  • If the CUDA device runs out of memory during the operation, the function may fail.
  • Incorrect implementation of the krnl::GeneralMatrixMul kernel may lead to incorrect multiplication results.
```cpp
nz::data::MappedTensor::shape_type shape1 = {2, 3};
nz::data::MappedTensor::shape_type shape2 = {3, 4};
nz::data::MappedTensor tensor1(shape1, true);
nz::data::MappedTensor tensor2(shape2, false);
nz::data::MappedTensor result = tensor1 * tensor2;
```
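For reference, the shape rule and per-element semantics can be sketched on the host as a naive triple loop (illustrative only; the actual computation is performed by the krnl::GeneralMatrixMul kernel on the device):

```cpp
#include <vector>

// Host-side sketch of the {m, k} x {k, n} -> {m, n} semantics,
// matching the O(m * n * k) complexity noted above.
std::vector<float> naiveMatMul(const std::vector<float>& a,
                               const std::vector<float>& b,
                               int m, int k, int n) {
    std::vector<float> c(static_cast<size_t>(m) * n, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p)
            for (int j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}
```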

Definition at line 458 of file MappedTensor.cu.


◆ operator+()

MappedTensor nz::data::MappedTensor::operator+ ( const MappedTensor & other) const

Perform element-wise addition between two MappedTensors.

Parameters
other: The MappedTensor to be added to the current MappedTensor (host-to-host).
Returns
A new MappedTensor containing the result of the element-wise addition.

This function performs an element-wise addition between the current MappedTensor and another MappedTensor. It first checks if the shapes of the two tensors are equal; if not, it throws a std::invalid_argument exception. Then, it creates a new MappedTensor with the same shape and the appropriate gradient requirement based on the two input tensors. After that, it calculates the CUDA grid and block dimensions according to the size of the tensors and invokes the krnl::MatrixAdd kernel to perform the actual addition operation. Finally, it synchronizes the CUDA device and returns the resulting MappedTensor.

Memory management: A new MappedTensor is created to store the result, and its memory is managed automatically by the MappedTensor class. The input tensors' memory remains unchanged. Exception handling: If the shapes of the two input tensors are not equal, a std::invalid_argument exception is thrown. The CHECK macro is used to handle CUDA errors, and if a CUDA operation fails, it will throw an appropriate exception. Relationship with other components: This function depends on the krnl::MatrixAdd kernel to perform the element-wise addition and the CHECK macro to handle CUDA errors.

Exceptions
std::invalid_argument: If the shapes of the two MappedTensors are not equal.
[Exception type thrown by CHECK macro]: If a CUDA operation fails.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensors (_size), as it needs to perform an addition operation for each element.
  • Ensure that the CUDA environment is properly configured and the krnl::MatrixAdd kernel is correctly implemented before calling this function.
Warning
  • If the CUDA device runs out of memory during the operation, the function may fail.
```cpp
nz::data::MappedTensor::shape_type shape = {2, 3};
nz::data::MappedTensor tensor1(shape, true);
nz::data::MappedTensor tensor2(shape, false);
nz::data::MappedTensor result = tensor1 + tensor2;
```

Definition at line 446 of file MappedTensor.cu.


◆ operator-() [1/2]

MappedTensor nz::data::MappedTensor::operator- ( ) const

Perform element-wise negation on the MappedTensor.

Returns
A new MappedTensor containing the element-wise negation of the current MappedTensor.

This function performs an element-wise negation operation on the current MappedTensor. It first calculates the CUDA grid and block dimensions based on the size of the tensor. Then, it creates a new MappedTensor with the same shape and gradient requirement as the current one. After that, it invokes the krnl::Negation kernel to perform the negation operation on each element of the tensor. Finally, it synchronizes the CUDA device and returns the resulting MappedTensor.

Memory management: A new MappedTensor is created to store the result, and its memory is managed by the MappedTensor class. The memory of the current MappedTensor remains unchanged. Exception handling: The CHECK macro is used to handle CUDA errors. If a CUDA operation fails, it will throw an appropriate exception. Relationship with other components: This function depends on the krnl::Negation kernel for the negation operation and the CHECK macro for CUDA error handling.

Exceptions
[Exception type thrown by CHECK macro]: If a CUDA operation fails.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensor (_size), as it needs to perform a negation for each element.
  • Ensure that the CUDA environment is properly configured and the krnl::Negation kernel is correctly implemented before using this function.
Warning
  • If the CUDA device runs out of memory during the operation, the function may fail.
```cpp
nz::data::MappedTensor negatedTensor = -tensor;
```

Definition at line 467 of file MappedTensor.cu.


◆ operator-() [2/2]

MappedTensor nz::data::MappedTensor::operator- ( const MappedTensor & other) const

Perform element-wise subtraction between two MappedTensors.

Parameters
other: The MappedTensor to be subtracted from the current MappedTensor (host-to-host).
Returns
A new MappedTensor containing the result of the element-wise subtraction.

This function conducts an element-wise subtraction between the current MappedTensor and another MappedTensor. It first verifies that the shapes of the two tensors are equal; if not, it throws a std::invalid_argument exception. Then, it constructs a new MappedTensor with the same shape and an appropriate gradient requirement based on the input tensors. Subsequently, it calculates the CUDA grid and block dimensions according to the tensor size and invokes the krnl::MatrixSub kernel to carry out the subtraction operation. Finally, it synchronizes the CUDA device and returns the resulting MappedTensor.

Memory management: A new MappedTensor is created to store the result, and its memory is managed by the MappedTensor class. The memory of the input tensors remains untouched. Exception handling: Throws a std::invalid_argument if the shapes of the two MappedTensors do not match. The CHECK macro is used to handle CUDA errors; if a CUDA operation fails, it will throw an appropriate exception. Relationship with other components: Depends on the krnl::MatrixSub kernel for the subtraction operation and the CHECK macro for CUDA error handling.

Exceptions
std::invalid_argument: If the shapes of the two MappedTensors are not equal.
[Exception type thrown by CHECK macro]: If a CUDA operation fails.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensors (_size), as it needs to perform a subtraction for each element.
  • Ensure that the CUDA environment is properly configured and the krnl::MatrixSub kernel is correctly implemented before using this function.
Warning
  • If the CUDA device runs out of memory during the operation, the function may fail.
```cpp
nz::data::MappedTensor::shape_type shape = {2, 3};
nz::data::MappedTensor tensor1(shape, true);
nz::data::MappedTensor tensor2(shape, false);
nz::data::MappedTensor result = tensor1 - tensor2;
```

Definition at line 452 of file MappedTensor.cu.


◆ operator/()

MappedTensor nz::data::MappedTensor::operator/ ( const MappedTensor & other) const

Perform element-wise division between two MappedTensors.

Parameters
other: The MappedTensor to divide the current MappedTensor by (host-to-host).
Returns
A new MappedTensor containing the result of the element-wise division.

This function performs element-wise division between the current MappedTensor and another MappedTensor. It first checks if the shapes of the two tensors are equal. If not, it throws a std::invalid_argument exception. Then, it calculates the CUDA grid and block dimensions based on the size of the tensors. A new MappedTensor with the same shape and gradient requirement as the current tensor is created. The krnl::ElementwiseDivide kernel is invoked to perform the division operation on each corresponding element of the two tensors. Finally, the CUDA device is synchronized, and the resulting MappedTensor is returned.

Memory management: A new MappedTensor is created to store the result, and its memory is managed by the MappedTensor class. The memory of the input tensors remains unchanged. Exception handling: If the shapes of the two tensors are not equal, a std::invalid_argument exception is thrown. The CHECK macro is used to handle CUDA errors, and if a CUDA operation fails, an appropriate exception is thrown. Relationship with other components: This function depends on the krnl::ElementwiseDivide kernel for the division operation and the CHECK macro for CUDA error handling.

Exceptions
std::invalid_argument: If the shapes of the two MappedTensors are not equal.
[Exception type thrown by CHECK macro]: If a CUDA operation fails.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensors (_size), as it needs to perform a division for each pair of corresponding elements.
Warning
  • If the CUDA device runs out of memory during the operation, the function may fail.
  • Incorrect implementation of the krnl::ElementwiseDivide kernel may lead to incorrect division results.
  • Division by zero in the krnl::ElementwiseDivide kernel may lead to undefined behavior.
```cpp
nz::data::MappedTensor::shape_type shape = {2, 3};
nz::data::MappedTensor tensor1(shape, true);
nz::data::MappedTensor tensor2(shape, false);
nz::data::MappedTensor result = tensor1 / tensor2;
```

Definition at line 511 of file MappedTensor.cu.


◆ operator=() [1/2]

MappedTensor & nz::data::MappedTensor::operator= ( const MappedTensor & other)

Copy assignment operator for the MappedTensor class.

Parameters
other: A constant reference (host-to-host) to a MappedTensor object from which data and properties will be copied.
Returns
A reference to the modified MappedTensor object (host-to-host).

This copy assignment operator first checks for self-assignment. If the objects are distinct, it releases the previously allocated memory for the _data and, if _requires_grad is true, for the _grad using cudaFreeHost. Then it copies the shape, size, and _requires_grad flag from the other object. Next, it allocates new host memory for _data and, if _requires_grad is true, for _grad using cudaMallocHost. If _requires_grad is false, the _grad pointer is set to nullptr. Finally, it copies the data and, if applicable, the gradient from the other object using cudaMemcpy. The memory management strategy involves deallocating existing memory before re-allocating and copying new data. There is no explicit exception handling in this operator, but the CHECK macro is assumed to handle errors related to CUDA memory operations. This operator is used to assign the state of one MappedTensor object to another.

Note
  • The CHECK macro is assumed to handle CUDA errors properly. Ensure that the CUDA environment is properly configured before using this operator.
  • The CUDA memory operations may fail if there is not enough available memory or if the memory pointers are invalid.
Warning
  • CUDA memory operations, such as cudaMallocHost and cudaMemcpy, can be time-consuming, especially for large tensors. Be aware of the performance implications when using this operator.
```cpp
nz::data::MappedTensor tensor2(shape, false);
tensor2 = tensor1;
```
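As a hedged illustration of the deep-copy semantics described above (hypothetical shape and values), the destination keeps its own buffers after assignment:

```cpp
nz::data::MappedTensor src({2, 2}, false);
src.dataInject({1, 2, 3, 4}, false);
nz::data::MappedTensor dst;
dst = src;                          // deep copy: dst gets its own pinned buffers
src.fill(0.0f);                     // overwriting the source...
std::cout << dst[3] << std::endl;   // ...leaves the copy intact: prints 4
```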

Definition at line 127 of file MappedTensor.cu.

◆ operator=() [2/2]

MappedTensor & nz::data::MappedTensor::operator= ( MappedTensor && other) noexcept(false)

Move assignment operator for the MappedTensor class.

Parameters
other: An rvalue reference (host-to-host) to a MappedTensor object from which resources will be moved.
Returns
A reference to the modified MappedTensor object (host-to-host).

This move assignment operator first checks for self-assignment. If the two objects are distinct, it releases the existing host memory allocated for _data and, if _requires_grad is true and _grad is not null, for _grad using cudaFreeHost. Then, it transfers the ownership of resources from the other MappedTensor object to the current one. It moves the shape using std::move, copies the size and _requires_grad flag, and takes over the pointers to the data and gradient buffers from other. After that, it sets the other object's data and gradient pointers to nullptr, its size to 0, _requires_grad to false, and the shape to {0, 0}.

The memory management strategy involves freeing the current object's existing memory before taking over the memory from other, thus ensuring no memory leaks. Because that release goes through the CHECK macro, the operator is declared noexcept(false) and may propagate an exception if a CUDA deallocation fails. This operator is useful for efficiently reusing resources during assignment operations.

Note
  • After the move operation, the other object is left in a valid but empty state; do not rely on its previous contents, and reassign it before further use.
  • The CHECK macro is assumed to handle CUDA errors properly. Ensure that the CUDA environment is properly configured before using this operator.
```cpp
nz::data::MappedTensor tensor2(shape, false);
tensor2 = std::move(tensor1);
```
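A small sketch (hypothetical shape) showing the moved-from state described in the note above:

```cpp
nz::data::MappedTensor src({2, 3}, false);
src.randomize(7);             // populate with reproducible values
nz::data::MappedTensor dst;
dst = std::move(src);         // dst takes over src's buffers; no copy
// src is now empty: size() == 0 and data() == nullptr.
// Reassign src before using it again.
```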

Definition at line 156 of file MappedTensor.cu.

◆ operator==()

bool nz::data::MappedTensor::operator== ( const MappedTensor & other) const

Checks if two MappedTensor objects are equal.

Parameters
other: The other MappedTensor object to compare with. Memory flow: direct access from host memory.
Returns
Returns true if the two MappedTensor objects are equal, false otherwise.

This function compares two MappedTensor objects for equality. First, it checks if the _requires_grad flags of the two MappedTensors are the same. If they differ, the function immediately returns false. Then, it compares the shapes of the two MappedTensors. If the shapes are not equal, the function also returns false.

After that, it compares each element of the data arrays of the two MappedTensors one by one. If any element in the data differs, it returns false.

If the _requires_grad flag is set to true, it repeats the same process for the gradients of the MappedTensors. If any element in the gradients differs, it returns false.

Finally, if all comparisons pass, it returns true.

Memory Management Strategy:

  • This function does not allocate or free any memory. It directly accesses the _data and _grad arrays of the MappedTensor objects.

Exception Handling Mechanism:

  • This function does not have a specific exception handling mechanism. It assumes that the _data and _grad arrays are properly initialized and have the correct size.

Relationship with Other Components:

  • Depends on the _requires_grad, _shape, _size, _data, and _grad members of the MappedTensor class.
Note
  • Ensure that the _data and _grad arrays of the MappedTensor objects are properly initialized before calling this function.
  • The function has a time complexity of O(n), where n is the number of elements in the MappedTensor, due to the element-by-element comparison.
```cpp
MappedTensor tensor1; // Assume tensor1 is properly initialized
MappedTensor tensor2; // Assume tensor2 is properly initialized
bool isEqual = tensor1 == tensor2;
```
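Since the comparison starts with the _requires_grad flags, two tensors with identical data but different gradient settings compare unequal; a hedged sketch:

```cpp
nz::data::MappedTensor a({2, 2}, false);
nz::data::MappedTensor b({2, 2}, true);   // same shape, gradients enabled
a.fill(1.0f);
b.fill(1.0f);
bool eq = (a == b);   // false: the _requires_grad flags differ
```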

Definition at line 475 of file MappedTensor.cu.

◆ operator[]()

auto nz::data::MappedTensor::operator[] ( size_type index) const -> value_type&

Overload the [] operator to access an element of the MappedTensor by index.

Parameters
index: The index of the element to access (host-to-host). It should be a non-negative integer.
Returns
A reference to the value at the specified index in the tensor's data.

This function allows users to access individual elements of the MappedTensor using the [] operator. It first checks if the given index is within the valid range of the tensor's size. If the index is out of range, it throws a std::out_of_range exception. Otherwise, it returns a reference to the corresponding element in the internal data array _data.

Memory management: The function does not allocate or deallocate any memory. It only accesses the existing internal data array _data.

Exception handling: If the provided index is greater than or equal to the size of the tensor (_size), a std::out_of_range exception is thrown.

Relationship with other components: This function is related to the data access component of the MappedTensor. It provides a convenient way for users to access individual elements of the tensor.

Exceptions
std::out_of_range: If the provided index is out of the valid range (i.e., index >= _size).
Note
  • The time complexity of this function is O(1) because it directly accesses the element in the internal data array using the given index.
  • Ensure that the index is within the valid range to avoid exceptions.
```cpp
tensor.dataInject({1, 2, 3, 4, 5, 6}, false);
nz::data::MappedTensor::value_type value = tensor[2];
std::cout << value << std::endl;
```
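Because the returned reference is writable and the index is bounds-checked, both element assignment and the out-of-range exception can be demonstrated (hypothetical 2x3 tensor; value_type is assumed to be float here):

```cpp
nz::data::MappedTensor t({2, 3}, false);   // six elements
t.dataInject({1, 2, 3, 4, 5, 6}, false);
t[0] = 42.0f;                              // the reference is writable
try {
    float v = t[6];                        // 6 >= _size: throws
} catch (const std::out_of_range& e) {
    std::cerr << e.what() << std::endl;
}
```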

Definition at line 326 of file MappedTensor.cu.

◆ print()

std::ostream & nz::data::MappedTensor::print ( std::ostream & os) const

Print the tensor data in a matrix-like format to an output stream.

Parameters
os: An output stream (host-to-host) where the tensor data will be printed.
Returns
A reference to the output stream os after printing the tensor data.

This function is used to display the tensor data in a matrix-like structure. It iterates over the rows of the tensor and prints each row as a sequence of values separated by a space, enclosed in square brackets.

Memory management: The function does not allocate or deallocate any memory. It only reads the tensor's internal data (_data) for printing.

Exception handling: The function assumes that the _shape and _data members of the tensor are properly initialized. If there are issues with these members (e.g., invalid shape dimensions), the behavior may be undefined. The operations on the output stream (os) are assumed to be well-behaved, and any exceptions thrown by the stream operations will be propagated.

Relationship with other components: This function is mainly related to the data presentation component of the MappedTensor. It provides a user-friendly way to view the tensor's data.

Note
  • The function has a time complexity of O(m * n), where m is the number of rows (_shape[0]) and n is the number of columns (_shape[1]) of the tensor, as it iterates over all elements in the tensor.
  • Ensure that the output stream os is in a valid state before calling this function.
```cpp
tensor.dataInject({1, 2, 3, 4, 5, 6}, false);
tensor.print(std::cout);
```
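Given the row-by-row format described above, a 2x3 tensor should print roughly as follows (exact spacing is implementation-defined):

```cpp
nz::data::MappedTensor t({2, 3}, false);
t.dataInject({1, 2, 3, 4, 5, 6}, false);
t.print(std::cout);
// Expected output, approximately:
// [1 2 3 ]
// [4 5 6 ]
```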

Definition at line 268 of file MappedTensor.cu.

◆ printGrad()

std::ostream & nz::data::MappedTensor::printGrad ( std::ostream & os) const

Print the gradient of the tensor in a matrix-like format to an output stream.

Parameters
os: An output stream (host-to-host) where the tensor gradient will be printed.
Returns
A reference to the output stream os after printing the tensor gradient.

This function is designed to display the gradient of the MappedTensor in a matrix-like structure. It iterates over the rows of the gradient data and prints each row as a sequence of values separated by a space, enclosed in square brackets.

Memory management: The function does not allocate or deallocate any memory. It only reads the tensor's internal gradient data (_grad) for printing.

Exception handling: If the tensor does not require gradients (_requires_grad is false), a std::invalid_argument exception is thrown. The operations on the output stream (os) are assumed to be well-behaved, and any exceptions thrown by the stream operations will be propagated.

Relationship with other components: This function is related to the gradient management and data presentation components of the MappedTensor. It provides a way to view the gradient values of the tensor.

Exceptions
std::invalid_argument: If the tensor does not require gradients.
Note
  • The function has a time complexity of O(m * n), where m is the number of rows (_shape[0]) and n is the number of columns (_shape[1]) of the tensor's gradient, as it iterates over all elements in the gradient.
  • Ensure that the output stream os is in a valid state before calling this function.
```cpp
tensor.dataInject({1, 2, 3, 4, 5, 6}, true);
tensor.printGrad(std::cout);
```
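A hedged sketch pairing the success path with the std::invalid_argument case for a tensor constructed without gradients:

```cpp
nz::data::MappedTensor g({2, 3}, true);     // gradients enabled
g.dataInject({7, 8, 9, 10, 11, 12}, true);  // inject into the gradient
g.printGrad(std::cout);

nz::data::MappedTensor plain({2, 3}, false);
try {
    plain.printGrad(std::cout);             // no gradient buffer: throws
} catch (const std::invalid_argument& e) {
    std::cerr << e.what() << std::endl;
}
```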

Definition at line 296 of file MappedTensor.cu.

◆ randomize()

void nz::data::MappedTensor::randomize ( size_type seed = 0,
bool isGrad = false ) const

Randomize the data or gradients of the MappedTensor using a given seed.

Parameters
seed: The seed value used to initialize the random number generator (host-to-device). If 0, the current system time will be used as the seed.
isGrad: A boolean flag indicating whether to randomize the gradients or the data. If true, gradients are randomized; otherwise, data is randomized (host-to-device).
Returns
None

This function provides the ability to randomize either the data or the gradients of the MappedTensor. It first checks if gradient randomization is valid for the tensor. If the seed is 0, it uses the current system time as the seed. Then, it initializes a CURAND pseudo-random number generator, sets the seed, and fills the appropriate memory (data or gradients) with uniformly distributed random numbers in the range [0, 1).

Memory management: The function does not allocate or deallocate the tensor's data or gradient memory. It only modifies the existing memory in place.

Exception handling: If the tensor does not require gradients and isGrad is true, a std::invalid_argument is thrown. If any of the CURAND operations (creating the generator, setting the seed, or generating random numbers) fail, a std::runtime_error is thrown.

Relationship with other components: This function is related to the data and gradient initialization components of the MappedTensor. It offers a way to initialize the tensor's data or gradients with random values.

Exceptions
std::invalid_argument: If gradient randomization is attempted on a tensor that does not require gradients.
std::runtime_error: If CURAND fails to create the generator, set the seed, or generate random numbers.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensor (_size), as it needs to generate a random number for each element.
  • Ensure that the CUDA and CURAND libraries are properly initialized and configured before calling this function.
Warning
  • If the CURAND library is not installed or configured correctly, this function will throw an exception.
  • Reusing the same seed will result in the same sequence of random numbers being generated.
```cpp
try {
    tensor.randomize(42, false);  // deterministic seed for the data
    tensor.randomize(0, true);    // time-based seed for the gradients
} catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
}
```

Definition at line 374 of file MappedTensor.cu.

◆ recip()

void nz::data::MappedTensor::recip ( )

Compute the reciprocal of each element in the MappedTensor.

This function computes the reciprocal (1/x) of each element in the MappedTensor. It first calculates the CUDA grid and block dimensions based on the size of the tensor. Then, it allocates host memory for a temporary buffer to store the reciprocal values. The krnl::Recip kernel is invoked to compute the reciprocals of the elements in the tensor and store the results in the temporary buffer. After the kernel execution, the CUDA device is synchronized. Finally, the original pinned host memory of the tensor is freed, and the pointer is updated to point to the temporary buffer.

Memory management: Host memory is allocated for the temporary buffer using cudaMallocHost. The original pinned host memory of the tensor is freed using cudaFreeHost. The ownership of the data is transferred to the _data member of the MappedTensor.

Exception handling: The CHECK macro is used to handle CUDA errors. If any CUDA operation fails, an appropriate exception will be thrown.

Relationship with other components: This function depends on the krnl::Recip kernel for computing the reciprocals and the CHECK macro for CUDA error handling.

Exceptions
[Exception type thrown by CHECK macro]: If a CUDA operation fails.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensor (_size), as it needs to compute the reciprocal for each element.
  • Ensure that the CUDA environment is properly configured and the krnl::Recip kernel is correctly implemented before using this function.
Warning
  • If the CUDA device runs out of memory during the operation, the function may fail.
  • Division by zero in the krnl::Recip kernel may lead to undefined behavior.
```cpp
tensor.recip();
```
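A short sketch (hypothetical values, chosen non-zero to sidestep the division-by-zero warning above):

```cpp
nz::data::MappedTensor t({1, 4}, false);
t.dataInject({1, 2, 4, 8}, false);
t.recip();                 // t now holds {1, 0.5, 0.25, 0.125}
t.print(std::cout);
```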

Definition at line 517 of file MappedTensor.cu.

◆ requiresGrad()

bool nz::data::MappedTensor::requiresGrad ( ) const
nodiscard noexcept

Checks whether the MappedTensor requires gradient computation.

Returns
A boolean value (host-to-host) indicating whether the MappedTensor requires gradient computation. true means it requires gradient computation, and false means it does not.

This function provides a simple way to query the gradient requirement status of the MappedTensor. It is a read-only operation that accesses the internal state of the object.

The memory management strategy is straightforward. No new memory is allocated or freed during this function call. It simply reads the internal state of the MappedTensor object. There is no exception handling mechanism in this function because it is declared noexcept, which means it will not throw any exceptions under normal circumstances. This function can be used in various parts of the codebase to determine whether certain gradient-related operations should be performed on the MappedTensor.

Note
  • This function is a const member function, so it can be called on const MappedTensor objects.
  • The [[nodiscard]] attribute indicates that the return value should not be ignored, as it conveys important information about the gradient requirement of the tensor.
```cpp
bool grad_required = tensor.requiresGrad();
```

Definition at line 198 of file MappedTensor.cu.

◆ reshape()

void nz::data::MappedTensor::reshape ( const shape_type & shape)

Reshape the MappedTensor to a new shape.

Parameters
shape: The new shape of the tensor (host-to-device).
Returns
None

This function reshapes the MappedTensor to the specified new shape. It allocates a new memory block for the data and, if the tensor requires gradients, for the gradients as well. It then copies the existing data and gradients (up to the minimum of the old and new sizes) to the new memory blocks and frees the old memory. Finally, it updates the tensor's shape and size information.

Memory management: The function allocates new host memory for the data and gradients using cudaMallocHost and frees the old host memory using cudaFreeHost.

Exception handling: The CHECK macro is used to handle potential CUDA errors during memory allocation, memory setting, and memory copying operations. If any of these operations fail, the CHECK macro will handle the error according to its implementation, which may include logging and terminating the program.

Relationship with other components: This function is related to the data and gradient management components of the MappedTensor. It provides a way to change the shape of the tensor and adjust its internal memory accordingly.

Exceptions
None explicitly, but the CHECK macro may handle and report CUDA errors.
Note
  • The time complexity of this function is O(n), where n is the minimum of the old and new sizes of the tensor, as the data and gradients are copied element by element.
  • Ensure that the CUDA environment is properly initialized before calling this function.
  • The new shape should be compatible with the intended use of the tensor.
Warning
  • If the CUDA environment is not set up correctly, memory allocation, setting, or copying operations may fail, and the CHECK macro will handle the error, which may lead to program termination.
  • Changing the shape of a tensor may affect the interpretation of its data in subsequent operations.
```cpp
nz::data::MappedTensor tensor(oldShape, false);
tensor.reshape(newShape);
```
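A concrete sketch of the copy-up-to-the-minimum behavior described above (hypothetical shapes):

```cpp
nz::data::MappedTensor t({2, 3}, false);
t.dataInject({1, 2, 3, 4, 5, 6}, false);
t.reshape({3, 2});   // same element count: all six values survive
t.reshape({1, 4});   // smaller: only the first four values are kept
```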

Definition at line 347 of file MappedTensor.cu.

◆ setRequiresGrad()

void nz::data::MappedTensor::setRequiresGrad ( bool requires_grad)

Sets the gradient requirement flag for the MappedTensor and manages the associated gradient memory accordingly.

Parameters
requires_grad: A boolean value (host-to-host) indicating whether the tensor should require gradient computation.
Returns
void

This function is responsible for updating the _requires_grad flag of the MappedTensor. If the flag is being changed from true to false, it frees the memory allocated for the gradient (_grad). Conversely, if the flag is being changed from false to true, it allocates memory for the gradient using cudaMallocHost.

The memory management strategy is as follows: When requires_grad is set to false and the tensor previously required gradients, the gradient memory is freed using cudaFreeHost. When requires_grad is set to true and the tensor did not previously require gradients, new memory is allocated using cudaMallocHost. The size of the allocated memory is the total number of elements (_size) in the tensor multiplied by the size of each element.

The exception handling mechanism relies on the CHECK macro. If the cudaFreeHost or cudaMallocHost operations fail, the CHECK macro is expected to handle the error appropriately, potentially throwing an exception or terminating the program.

This function is closely related to the gradient computation mechanism of the MappedTensor. It ensures that the memory for gradients is allocated and freed as needed, which is crucial for efficient gradient calculation and memory management in a CUDA-enabled environment.

Exceptions
An exception might be thrown by the CHECK macro if the cudaFreeHost or cudaMallocHost operations fail.
Note
  • Ensure that the CUDA runtime environment is properly initialized before calling this function, as it uses CUDA memory management functions.
  • The CHECK macro is assumed to handle CUDA errors correctly. Any issues with the CUDA operations will be reported through this macro.
Warning
  • Incorrect usage of this function can lead to memory leaks or segmentation faults. For example, if the CUDA environment is not set up correctly, the memory allocation or deallocation operations may fail unexpectedly.
```cpp
tensor.setRequiresGrad(true);
```
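A sketch of the allocate/free cycle this setter manages (hypothetical shape; writing to the gradient with isGrad=true is only legal while gradients are enabled):

```cpp
nz::data::MappedTensor t({2, 2}, false);
t.setRequiresGrad(true);    // allocates the pinned gradient buffer
t.fill(0.5f, true);         // gradient writes are now legal
t.setRequiresGrad(false);   // frees the gradient buffer again
```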

Definition at line 221 of file MappedTensor.cu.

◆ setShape()

void nz::data::MappedTensor::setShape ( const shape_type & shape)

Sets a new shape for the MappedTensor and adjusts its data and gradient memory accordingly.

Parameters
shape: A reference to a shape_type object (host-to-host) that represents the new shape of the MappedTensor.
Returns
void

This function is used to change the shape of the MappedTensor. It allocates new memory for the tensor's data based on the new shape, initializes the new memory to zero, copies the data from the old memory to the new memory, and then frees the old memory. If the tensor requires gradient computation (_requires_grad is true), it performs similar operations for the gradient memory.

The memory management strategy involves multiple steps. First, new memory is allocated using cudaMallocHost for the data and gradient (if required). Then, the new memory is initialized to zero using cudaMemset. Next, the data is copied from the old memory to the new memory using cudaMemcpy with the cudaMemcpyDeviceToDevice flag. Finally, the old memory is freed using cudaFreeHost.

The exception handling mechanism relies on the CHECK macro. If any of the CUDA operations (cudaMallocHost, cudaMemset, cudaMemcpy, cudaFreeHost) fail, the CHECK macro is expected to handle the error, potentially throwing an exception or terminating the program.

This function is closely related to the data storage and gradient management components of the MappedTensor. It ensures that the tensor's data and gradient are properly adjusted when the shape changes, which is crucial for maintaining data consistency during tensor operations.

Exceptions
An exception might be thrown by the CHECK macro if any of the CUDA operations fail.
Note
  • Ensure that the CUDA runtime environment is properly initialized before calling this function, as it uses CUDA memory management and data transfer functions.
  • The CHECK macro is assumed to handle CUDA errors correctly. Any issues with the CUDA operations will be reported through this macro.
  • The function assumes that the shape_type object has at least two elements, as it accesses shape[0] and shape[1] to calculate the new size.
Warning
  • Incorrect usage of this function can lead to memory leaks or segmentation faults. For example, if the CUDA environment is not set up correctly, the memory allocation, initialization, or deallocation operations may fail unexpectedly.
  • If the new shape has a smaller size than the old shape, data beyond the new size will be discarded. If the new shape has a larger size, the additional memory will be initialized to zero.
```cpp
nz::data::MappedTensor tensor(oldShape, false);
tensor.setShape(newShape);
```

Definition at line 233 of file MappedTensor.cu.

◆ shape()

MappedTensor::shape_type nz::data::MappedTensor::shape ( ) const
nodiscard noexcept

Retrieves the shape of the MappedTensor.

Returns
A value of type shape_type (host-to-host) representing the shape of the MappedTensor. The shape is a container that holds the size of each dimension of the tensor.

This function allows users to obtain the dimensional information of the MappedTensor. The shape provides crucial details about how the elements are organized in the tensor, which is essential for many tensor operations.

The memory management strategy involves returning a copy of the internal shape representation of the MappedTensor. The caller is responsible for managing the memory of the returned shape_type object, but the original shape data within the MappedTensor remains intact. No new memory is allocated for the tensor's internal shape data during this call. There is no exception handling mechanism in this function because it is declared noexcept, meaning it will not throw any exceptions under normal circumstances. This function can be used in combination with other functions that require knowledge of the tensor's shape, such as reshaping operations or accessing specific elements based on multi-dimensional indices.

Note
  • This function is a const member function, so it can be called on const MappedTensor objects.
  • The [[nodiscard]] attribute indicates that the return value should not be ignored, as the shape information is vital for working with the tensor.
```cpp
nz::data::MappedTensor tensor(inputShape, false);
nz::data::MappedTensor::shape_type tensorShape = tensor.shape();
```

Definition at line 217 of file MappedTensor.cu.

◆ size()

MappedTensor::size_type nz::data::MappedTensor::size ( ) const
nodiscard noexcept

Retrieves the total number of elements in the MappedTensor.

Returns
A value of type size_type (host-to-host) representing the total number of elements in the MappedTensor.

This function is used to obtain the size of the MappedTensor, which is the product of the dimensions of its shape. It provides a quick way to know the quantity of elements stored in the tensor.

The memory management strategy is straightforward. No new memory is allocated or freed during this function call. It simply returns the pre-calculated size of the tensor. There is no exception handling mechanism in this function because it is declared noexcept, meaning it will not throw any exceptions under normal circumstances. This function can be used in various scenarios, such as loop iterations over all elements of the tensor or to allocate appropriate memory when transferring data to another structure.

Note
  • This function is a const member function, so it can be called on const MappedTensor objects.
  • The [[nodiscard]] attribute indicates that the return value should not be ignored, as it gives important information about the size of the tensor.
```cpp
nz::data::MappedTensor::size_type tensorSize = tensor.size();
```

Definition at line 213 of file MappedTensor.cu.

◆ sum() [1/2]

MappedTensor::value_type nz::data::MappedTensor::sum ( ) const
nodiscard

Calculate the sum of all elements in the MappedTensor.

Returns
The sum of all elements in the MappedTensor as a value of type MappedTensor::value_type.

This function computes the sum of all elements within the MappedTensor. It utilizes CUDA parallel processing to perform the summation efficiently. First, it determines the block and grid dimensions for the CUDA kernel. Then, it allocates pinned host memory for storing the intermediate results using cudaMallocHost. The krnl::Summation CUDA kernel is launched to calculate partial sums on the device. After the kernel execution, the function synchronizes the device using cudaDeviceSynchronize to ensure that all operations are completed. Finally, it sums up the partial results on the host, frees the allocated pinned host memory, and returns the total sum.

Memory management:

  • Pinned host memory is allocated for dData using cudaMallocHost and freed using cudaFreeHost.

Exception handling:

  • The CHECK macro is used to handle CUDA API errors. If any CUDA API call fails, the CHECK macro will throw an exception, causing the function to terminate.

Relationship with other components:

  • This function relies on the krnl::Summation CUDA kernel to perform partial sums on the device.
  • It also depends on the CHECK macro to handle CUDA API errors and cudaDeviceSynchronize for device synchronization.
Exceptions
[Exception type thrown by CHECK macro]: If there are CUDA API errors during memory allocation, kernel execution, or memory synchronization.
Note
  • The time complexity of this function is approximately O(n), where n is the number of elements in the MappedTensor (_size). The CUDA kernel parallelizes the partial sum calculation, and the final sum on the host is a linear operation over the number of grid blocks.
  • Ensure that the CUDA device is properly initialized before calling this function.
  • Pinned host memory allocation may have limitations, so be aware of potential memory constraints.
```cpp
nz::data::MappedTensor mapped_tensor({2, 3}, true);
// Assume mapped_tensor is filled with some values
nz::data::MappedTensor::value_type sum_result = mapped_tensor.sum();
```
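A minimal check (hypothetical values; value_type is assumed to be float here):

```cpp
nz::data::MappedTensor t({2, 3}, false);
t.dataInject({1, 2, 3, 4, 5, 6}, false);
float total = t.sum();   // 21
```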

Definition at line 527 of file MappedTensor.cu.

◆ sum() [2/2]

MappedTensor::value_type nz::data::MappedTensor::sum ( size_t batch,
size_t channel ) const
nodiscard

Calculate the sum of elements in a specific batch and channel of a MappedTensor.

Parameters
batch: The index of the batch. Memory flow: host-to-host (used for index calculation on the host side).
channel: The index of the channel. Memory flow: host-to-host (used for index calculation on the host side).
Returns
The sum of elements in the specified batch and channel of the MappedTensor.

This function first validates the provided batch and channel indices. If they are out of the valid range of the MappedTensor's shape, it throws a std::invalid_argument exception. Then, it calculates the size of the region to be summed. It allocates pinned host memory for intermediate results using cudaMallocHost. After that, it determines the offset in the MappedTensor's data based on the batch and channel indices. The krnl::Summation kernel is launched to perform the partial summation. Finally, it sums up all the intermediate results on the host, frees the allocated pinned host memory, and returns the final sum.

Memory Management Strategy:

  • Pinned host memory for dData is allocated using cudaMallocHost and freed using cuStrm::StreamManager<value_type>::Instance().freeHost.

Exception Handling Mechanism:

  • Throws a std::invalid_argument exception if the provided batch or channel indices are out of the valid range of the MappedTensor's shape.
  • CUDA memory allocation operations may return error codes indicating failures. It is assumed that the calling code or the CUDA runtime will handle these errors appropriately.

Relationship with Other Components:

  • Depends on the _shape member of the MappedTensor class to get the shape information and strides.
  • Uses the krnl::Summation kernel to perform the partial summation.
  • Relies on cuStrm::StreamManager<value_type>::Instance() for freeing the pinned host memory.
Exceptions
std::invalid_argument: If the provided batch or channel indices are out of the valid range of the MappedTensor's shape.
Note
  • Ensure that the provided batch and channel indices are within the valid range of the MappedTensor's shape to avoid exceptions.
  • Be aware of potential CUDA errors during memory allocation operations and handle them appropriately in the calling code.
```cpp
MappedTensor mappedTensor; // Assume mappedTensor is properly initialized
size_t batch = 0;
size_t channel = 1;
MappedTensor::value_type sumResult = mappedTensor.sum(batch, channel);
```
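A hedged sketch that sums every batch/channel slice in a loop; it assumes a layout whose first two dimensions are batch and channel, which may differ from your shape convention:

```cpp
// Assumed layout: shape = {batches, channels, rows, cols}.
nz::data::MappedTensor t({2, 3, 4, 4}, false);
t.randomize(123);
for (size_t b = 0; b < 2; ++b)
    for (size_t c = 0; c < 3; ++c)
        std::cout << "sum(" << b << ", " << c << ") = "
                  << t.sum(b, c) << std::endl;
```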

Definition at line 542 of file MappedTensor.cu.

◆ sync()

void nz::data::MappedTensor::sync ( ) const

Synchronizes the tensor data and its gradient.

This function first calls the syncData method of the cuStrm::streamManagerFP32 object, passing the _data member of the MappedTensor class. This is to ensure that all CUDA stream write operations on the tensor data are completed by blocking the host. Then it calls the syncGrad method to synchronize the gradient data if gradient computation is required.

Parameters
None
Returns
None

There is no explicit memory allocation or deallocation in this function. Memory management for the _data and _grad data is assumed to be handled elsewhere. The function does not have an explicit exception-handling mechanism. It relies on the cuStrm::streamManagerFP32.syncData method and the syncGrad method to manage any errors during the synchronization process.

Note
  • The time complexity of this function depends on the time it takes for the CUDA stream write operations on _data and _grad (if applicable) to complete. In the worst-case scenario, if there are long-running write operations, it could take a significant amount of time.
```cpp
// Assume MappedTensor is defined and an instance is created
MappedTensor mappedTensor;
mappedTensor.sync();
```
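The intended pattern, sketched under the assumption that a device-side operation has just written to the zero-copy buffers:

```cpp
nz::data::MappedTensor t({64, 64}, false);
t.randomize(1);        // device-side writes into the mapped buffer
t.transpose();         // more device-side writes
t.sync();              // block until all stream writes have landed
float first = t[0];    // host read is now safe
```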

Definition at line 607 of file MappedTensor.cu.

◆ syncData()

void nz::data::MappedTensor::syncData ( ) const

Synchronizes the tensor data by waiting for all CUDA stream write operations on it to finish.

This function invokes the syncData method of the cuStrm::streamManagerFP32 object, passing the _data member of the MappedTensor class. It blocks the host until all CUDA stream write operations on the _data are completed.

Parameters
None
Returns
None

Memory management for the _data is assumed to be handled elsewhere. There is no memory allocation or deallocation within this function. This function does not have an explicit exception-handling mechanism. It depends on the cuStrm::streamManagerFP32.syncData method to handle any errors during the synchronization process.

Note
  • The time complexity of this function depends on the time taken for the CUDA stream write operations on _data to complete. In the worst-case scenario, it could take a long time if there are long-running write operations.
```cpp
// Assume MappedTensor is defined and an instance is created
MappedTensor mappedTensor;
mappedTensor.syncData();
```

Definition at line 603 of file MappedTensor.cu.

◆ syncGrad()

void nz::data::MappedTensor::syncGrad ( ) const

Synchronizes the gradient data if gradient computation is required.

This function checks the _requires_grad flag. If the flag is set to true, it calls the syncData method of the cuStrm::streamManagerFP32 object, passing the _grad data. The syncData method blocks the host until all CUDA stream write operations on the input data are completed.

Returns
None

There is no explicit memory allocation or deallocation in this function. Memory management for the _grad data is assumed to be handled elsewhere. The function does not have an explicit exception-handling mechanism. It relies on the cuStrm::streamManagerFP32.syncData method to manage any errors during the synchronization process.

Note
  • The time complexity of this function depends on the time it takes for the CUDA stream write operations on _grad to complete. In the worst-case scenario, if there are long-running write operations, it could take a significant amount of time.
```cpp
// Assume MappedTensor is defined and an instance is created
MappedTensor mappedTensor;
mappedTensor.syncGrad();
```

Definition at line 597 of file MappedTensor.cu.

◆ transpose()

void nz::data::MappedTensor::transpose ( )

Transpose the MappedTensor and its gradients (if required).

Parameters
None
Returns
None

This function transposes the data of the MappedTensor. If the tensor requires gradients, it also transposes the gradients. It first calculates the CUDA grid and block dimensions based on the tensor's shape and a predefined tile size. Then, it allocates host memory for a temporary buffer, invokes the krnl::Transpose kernel to perform the transpose operation, and synchronizes the device. After that, it frees the original data memory and assigns the temporary buffer as the new data. If gradients are required, the same process is repeated for the gradients. Finally, it swaps the shape dimensions of the tensor.

Memory management: The function allocates host memory for temporary buffers using cudaMallocHost and frees the original data and gradient memory using cudaFreeHost.

Exception handling: The CHECK macro is used to handle CUDA errors. If a CUDA operation fails, the CHECK macro will throw an appropriate exception.

Relationship with other components: This function depends on the krnl::Transpose kernel to perform the actual transpose operation. It also interacts with the CUDA memory management functions.

Exceptions
[Exception type thrown by CHECK macro]: Thrown when a CUDA operation fails.
Note
  • The time complexity of this function is O(n), where n is the number of elements in the tensor (_size), as it needs to process each element during the transpose operation.
  • Ensure that the CUDA environment is properly configured and the krnl::Transpose kernel is correctly implemented before calling this function.
  • The TILE_SIZE must be properly defined for the CUDA kernel to work correctly.
Warning
  • If the CUDA device runs out of memory during the memory allocation, the function will fail.
  • Incorrect implementation of the krnl::Transpose kernel may lead to incorrect transpose results.
```cpp
tensor.transpose();
```
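A sketch of the shape swap on a 2x3 tensor (hypothetical values; the printed layout is approximate):

```cpp
nz::data::MappedTensor t({2, 3}, false);
t.dataInject({1, 2, 3, 4, 5, 6}, false);
t.transpose();         // shape becomes {3, 2}
t.print(std::cout);
// Approximate output:
// [1 4 ]
// [2 5 ]
// [3 6 ]
```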

Definition at line 421 of file MappedTensor.cu.

Friends And Related Symbol Documentation

◆ operator<<

DL_API std::ostream & operator<< ( std::ostream & os,
const MappedTensor & tensor )
friend

Overload the << operator to print a MappedTensor object to an output stream.

Parameters
os: An output stream (host-to-host) where the MappedTensor data and gradient will be printed.
tensor: A constant reference (host-to-host) to the MappedTensor object to be printed.
Returns
A reference to the output stream os after printing the tensor data and possibly its gradient.

This function provides a convenient way to print a MappedTensor object using the << operator. It first calls the print method of the MappedTensor to print the tensor's data. If the tensor requires gradients, it then prints a header "Gradient: " followed by the gradient data using the printGrad method.

Memory management: The function does not allocate or deallocate any memory. It relies on the print and printGrad methods of the MappedTensor, which also do not perform memory allocation.

Exception handling: If the tensor requires gradients and an exception occurs during the printGrad call (e.g., due to an invalid state of the output stream or incorrect internal data), the exception will be propagated. If the tensor does not require gradients, the printGrad call is skipped, and no exception related to gradient printing will be thrown.

Relationship with other components: This function is related to the data presentation component of the MappedTensor. It integrates the print and printGrad methods to provide a unified way of printing the tensor and its gradient.

Exceptions
std::invalid_argument: Propagated from the printGrad method if the tensor requires gradients and there is an issue with gradient printing.
Note
  • The overall time complexity of this function is O(m * n), where m is the number of rows (_shape[0]) and n is the number of columns (_shape[1]) of the tensor; the work is roughly doubled when the tensor requires gradients, as the gradient data is iterated over as well.
  • Ensure that the output stream os is in a valid state before calling this function.
```cpp
tensor.dataInject({1, 2, 3, 4, 5, 6}, false);
tensor.dataInject({7, 8, 9, 10, 11, 12}, true);
std::cout << tensor;
```
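A self-contained variant of the example above, constructing the tensor with gradients enabled so that both blocks are printed:

```cpp
nz::data::MappedTensor t({2, 3}, true);
t.dataInject({1, 2, 3, 4, 5, 6}, false);    // data
t.dataInject({7, 8, 9, 10, 11, 12}, true);  // gradient
std::cout << t;   // prints the data, then "Gradient: " and the gradient
```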

Definition at line 45 of file MappedTensor.cu.

◆ operator>>

DL_API std::istream & operator>> ( std::istream & is,
MappedTensor & tensor )
friend

Overload the >> operator to read data from an input stream into a MappedTensor object.

Parameters
is: An input stream (host-to-host) from which the data will be read.
tensor: A reference (host-to-host) to the MappedTensor object where the data will be stored.
Returns
A reference to the input stream is after the reading operation.

This function provides a convenient way to populate a MappedTensor object with data from an input stream. It iterates through the elements of the tensor and reads values from the input stream one by one, until either all elements of the tensor have been filled or the input stream fails to provide more data.

Memory management: The function does not allocate or deallocate any memory. It assumes that the _data array of the MappedTensor has already been allocated with the appropriate size (_size).

Exception handling: If the input stream fails to provide data (e.g., due to end-of-file or an invalid input format), the loop will terminate, and the function will return the input stream in its current state. No exceptions are thrown by this function itself, but the >> operator on the input stream may throw exceptions depending on its implementation.

Relationship with other components: This function is related to the data input component of the MappedTensor. It integrates with the standard input stream to allow easy data population.

Note
  • The time complexity of this function is O(n), where n is the size of the tensor (_size), as it iterates through each element of the tensor once.
  • Ensure that the input stream contains valid data in the correct format to avoid unexpected behavior.
```cpp
std::istringstream iss("1 2 3 4 5 6");
iss >> tensor;
```
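A self-contained variant (requires <sstream>); the tensor must already have the right size, since the operator only fills pre-allocated storage:

```cpp
nz::data::MappedTensor t({2, 3}, false);   // six elements, pre-allocated
std::istringstream iss("1 2 3 4 5 6");
iss >> t;                                  // reads until full or the stream ends
t.print(std::cout);
```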

Definition at line 81 of file MappedTensor.cu.


The documentation for this class was generated from the following files: