NeuZephyr
Simple DL Framework
nz::cuStrm::EventPool Class Reference

Internal event management system for CUDA stream synchronization (Part of StreamManager)

Public Member Functions

 EventPool (const size_t maxEvent)
 Construct an EventPool object with a specified maximum number of events.
 
 ~EventPool ()
 Destruct the EventPool object, releasing all managed CUDA events.
 
cudaEvent_t recordData (cudaStream_t stream, void *data)
 Record an event in a CUDA stream associated with a given data pointer.
 
std::unordered_set< cudaEvent_t > getEvents (void *data)
 Retrieve all CUDA events associated with a given data pointer.
 
void syncData (void *data)
 Synchronize the program execution with the completion of all events associated with a given data pointer.
 

Detailed Description

Internal event management system for CUDA stream synchronization (Part of StreamManager)

This class implements a thread-safe CUDA event pool with automatic recycling and data-aware synchronization capabilities. It serves as the foundational event infrastructure for nz::cuStrm::StreamManager's operation scheduling system.

Warning
  • This class must not be directly instantiated or invoked
  • Event lifecycle management should exclusively be handled through StreamManager interfaces
  • Direct usage may lead to:
    • Undefined synchronization behavior
    • Event pool corruption
    • CUDA resource leaks

Core Functionality:

  • Pool Management:
    • Pre-allocates cudaEventDisableTiming events during initialization
    • Dynamically expands when concurrent demands exceed initial capacity
    • Implements triple-state tracking (free/busy/released) with atomic transfers
  • Data-Event Binding:
    • Maintains bidirectional mappings between CUDA events and user data pointers
    • Enables data-centric synchronization through syncData()
  • Automatic Recycling:
    • Utilizes CUDA stream callbacks for event release detection
    • Implements lock-protected resource transitions between states

Critical Methods (Internal Use Only):

  • recordData():
    • Binds event recording to specific data pointer
    • Triggers automatic recycling via stream callback
  • getEvents():
    • Retrieves all events associated with data pointer
    • Used for dependency graph construction
  • syncData():
    • Blocks until all events linked to data complete
    • Implements condition variable-based waiting

The pool maintains three distinct event states to ensure safe CUDA event reuse:

  • Free Pool: Contains immediately available cudaEvent_t instances ready for allocation. Events are drawn from this pool when servicing new recording requests through acquire().
  • Busy Pool: Tracks actively used events currently associated with in-flight CUDA operations. Events remain here until their host stream completes execution, at which point the stream callback moves them to the released state.
  • Released Pool: Holds events that have completed execution but haven't been recycled. These events are transferred back to the free pool during subsequent acquire() calls via the internal transfer() method, ensuring safe temporal separation between event usage cycles and preventing premature reuse.
Note
  1. All public methods employ std::lock_guard for thread safety
  2. Event destruction occurs only during pool destruction
  3. Exceeding maxEvent capacity triggers silent pool expansion
  4. Callback parameters use heap-allocated memory for cross-stream safety
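The triple-state tracking described above can be sketched in plain C++. The following is a simplified, hypothetical model: int handles stand in for cudaEvent_t, and the acquire/release/transfer names follow this documentation, but the bodies are assumptions rather than the actual implementation in EventPool.cuh.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_set>

// Simplified model of the free/busy/released tracking described above.
// int handles stand in for cudaEvent_t; this is a sketch, not the real code.
class TriStatePool {
public:
    explicit TriStatePool(std::size_t maxEvent) {
        for (std::size_t i = 0; i < maxEvent; ++i) free_.insert(next_++);
    }

    // Hand out a handle, recycling released ones first; expands silently
    // when demand exceeds the initial capacity (see note 3 above).
    int acquire() {
        std::lock_guard<std::mutex> lk(mtx_);
        transfer();
        if (free_.empty()) free_.insert(next_++);
        const int h = *free_.begin();
        free_.erase(h);
        busy_.insert(h);
        return h;
    }

    // Invoked by the completion path (in the real pool, a stream callback).
    void release(int h) {
        std::lock_guard<std::mutex> lk(mtx_);
        busy_.erase(h);
        released_.insert(h);
    }

private:
    // Move completed handles back to the free pool; caller holds mtx_.
    void transfer() {
        free_.insert(released_.begin(), released_.end());
        released_.clear();
    }

    std::unordered_set<int> free_, busy_, released_;
    std::mutex mtx_;
    int next_ = 0;
};
```

Deferring recycling to the next acquire() (rather than re-inserting into the free pool directly from the callback) is what provides the temporal separation between usage cycles mentioned above.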

Definition at line 165 of file EventPool.cuh.

Constructor & Destructor Documentation

◆ EventPool()

nz::cuStrm::EventPool::EventPool ( const size_t maxEvent)
inline explicit

Construct an EventPool object with a specified maximum number of events.

This constructor initializes an EventPool object with a given maximum number of CUDA events. It creates maxEvent CUDA events with the cudaEventDisableTiming flag and inserts them into the free set, marking them as initially available for use.

Parameters
maxEvent  The maximum number of CUDA events that the EventPool can manage. Memory location: host.
Returns
None

Memory management: The constructor allocates CUDA events using cudaEventCreateWithFlags; deallocating them is the responsibility of the EventPool destructor.

Exception handling: This constructor has no explicit exception-handling mechanism and relies on the CUDA runtime to report errors during event creation. If cudaEventCreateWithFlags fails, the program's behavior may be undefined.

Relationship with other components: This constructor is part of the EventPool class, which a larger CUDA-related application typically uses to manage the lifecycle of CUDA events.

Note
  • The time complexity of this constructor is O(n), where n is the value of maxEvent, as it iterates maxEvent times to create and insert events.
  • Ensure that the CUDA environment is properly initialized before creating an EventPool object.
```cpp
size_t maxEvents = 10;
EventPool pool(maxEvents);
```

Definition at line 191 of file EventPool.cuh.

◆ ~EventPool()

nz::cuStrm::EventPool::~EventPool ( )
inline

Destruct the EventPool object, releasing all managed CUDA events.

This destructor iterates through the sets of free, busy, and released CUDA events and destroys each event using cudaEventDestroy. This ensures that all resources allocated for these events are properly released.

Parameters
None
Returns
None

Memory management: The destructor deallocates the CUDA events created during the lifetime of the EventPool object, destroying all events in the free, busy, and released sets.

Exception handling: This destructor has no explicit exception-handling mechanism and relies on the CUDA runtime to report errors during event destruction. If cudaEventDestroy fails, the program's behavior may be undefined.

Relationship with other components: This destructor is part of the EventPool class and is crucial for proper resource management in a CUDA-related application that uses the EventPool to manage CUDA events.

Note
  • The time complexity of this destructor is O(n), where n is the total number of events in the free, busy, and released sets combined.
  • Ensure that all CUDA operations associated with the events have completed before the EventPool object is destroyed.
```cpp
// Assume EventPool is defined and an instance is created
EventPool pool(10);
// Some operations with the pool
// ...
// The pool will be destroyed automatically when it goes out of scope
```

Definition at line 226 of file EventPool.cuh.

Member Function Documentation

◆ getEvents()

std::unordered_set< cudaEvent_t > nz::cuStrm::EventPool::getEvents ( void * data)
inline

Retrieve all CUDA events associated with a given data pointer.

This function searches for the provided data pointer in the internal mapping and returns a set of all CUDA events associated with it. If no events are found for the given data pointer, an empty set is returned.

Parameters
data  A pointer to the data for which the associated events are to be retrieved. Memory location: host or device, depending on the context.
Returns
An unordered set of CUDA event handles associated with the given data pointer. If no events are associated, an empty set is returned.

Memory management: The function does not allocate or deallocate any memory directly; it only reads the internal mapping data structure of the EventPool class.

Exception handling: This function has no explicit exception-handling mechanism and relies on the standard library's std::unordered_map operations. If an error occurs during the map lookup, the program's behavior may be undefined.

Relationship with other components: This function is part of the EventPool class and interacts with the internal eventMap data structure to retrieve the associated events.

Note
  • The average time complexity of this function is O(1) because it uses an std::unordered_map for lookup; in the worst case the complexity is O(n), where n is the number of elements in the eventMap.
  • Ensure that the data pointer is valid and has been previously used in the recordData function to associate events with it.
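A minimal sketch of the lookup this function performs, assuming eventMap is an std::unordered_map keyed by data pointer (int handles stand in for cudaEvent_t; this is an illustration, not the actual code):

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>
#include <unordered_set>

// Hypothetical, simplified version of getEvents(): a locked hash lookup
// that returns a copy of the event set, or an empty set if none exists.
std::unordered_map<void*, std::unordered_set<int>> eventMap;
std::mutex mtx;

std::unordered_set<int> getEvents(void* data) {
    std::lock_guard<std::mutex> lk(mtx);
    const auto it = eventMap.find(data);
    return it == eventMap.end() ? std::unordered_set<int>{} : it->second;
}
```

Returning a copy (rather than a reference) keeps the caller safe if the callback path erases entries from eventMap after the lock is released.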

Definition at line 282 of file EventPool.cuh.

◆ recordData()

cudaEvent_t nz::cuStrm::EventPool::recordData ( cudaStream_t stream, void * data )
inline

Record an event in a CUDA stream associated with a given data pointer.

This function records a CUDA event in the specified CUDA stream and associates it with the provided data pointer. It first acquires an available event from the event pool, then updates the mappings between data pointers and events in both directions. Finally, it records the event in the stream and returns the event handle.

Parameters
stream  The CUDA stream in which the event will be recorded. Memory location: host.
data  A pointer to the data associated with the event. Memory location: host or device, depending on the context.
Returns
A handle to the recorded CUDA event.

Memory management: The function does not allocate or deallocate any memory directly; it draws from the existing event pool and updates the mapping data structures. Event memory management is the responsibility of the pool's constructor and destructor.

Exception handling: This function has no explicit exception-handling mechanism and relies on the underlying operations (such as acquire and cudaEventRecord) to report errors. If event acquisition or recording fails, the program's behavior may be undefined.

Relationship with other components: This function is part of the EventPool class. It interacts with the pool's internal state (event sets and mapping data structures) and the CUDA runtime to record events.

Note
  • The average time complexity of this function is O(1), since the mappings are hash-based (std::unordered_map, as used by getEvents()); the worst case is O(n), where n is the number of elements in the mapping.
  • Ensure that the CUDA stream is properly initialized before calling this function.
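The bookkeeping described above can be sketched as follows. This is an assumption-laden model: int handles replace cudaEvent_t, the acquire() call is simulated with a counter, and the actual cudaEventRecord call is reduced to a comment.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>
#include <unordered_set>

std::unordered_map<void*, std::unordered_set<int>> eventMap; // data -> events
std::unordered_map<int, void*> dataMap;                      // event -> data
std::mutex mtx;
int nextHandle = 0; // stands in for acquire() from the event pool

int recordData(void* data) {
    std::lock_guard<std::mutex> lk(mtx);
    const int ev = nextHandle++;   // acquire an available event
    eventMap[data].insert(ev);     // forward mapping: data -> event
    dataMap[ev] = data;            // reverse mapping: event -> data
    // the real code would now call cudaEventRecord(ev, stream) and
    // register a stream callback so the event is recycled on completion
    return ev;
}
```

The reverse mapping is what lets the completion callback find and erase the right eventMap entry, which in turn lets syncData() observe completion.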

Definition at line 256 of file EventPool.cuh.

◆ syncData()

void nz::cuStrm::EventPool::syncData ( void * data)
inline

Synchronize the program execution with the completion of all events associated with a given data pointer.

This function waits until all CUDA events associated with the provided data pointer have completed. It uses a condition variable (cv) to block the current thread until the eventMap no longer contains any events for the given data pointer, indicating that all associated events have finished.

Parameters
data  A pointer to the data for which the associated events need to be synchronized. Memory location: host or device, depending on the context.
Returns
None.

Memory management: The function does not allocate or deallocate any memory directly; it only reads the internal eventMap data structure of the EventPool class.

Exception handling: This function has no explicit exception-handling mechanism and relies on the standard library's mutex and condition-variable operations. If an error occurs during locking, unlocking, or waiting, the program's behavior may be undefined.

Relationship with other components: This function is part of the EventPool class. It interacts with the internal eventMap data structure and the condition variable (cv) to wait for event completion.

Note
  • The running time of this function is not fixed; it blocks until all events associated with the data pointer have completed.
  • Ensure that the data pointer is valid and has been previously used in the recordData function to associate events with it.
  • The function assumes that the internal state of the eventMap is updated correctly when events are completed to signal the condition variable.
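The condition-variable wait can be sketched like this. It is a simplified model in which the completion path is simulated on a second thread; in the real pool, completion is driven by CUDA stream callbacks that erase the data's events and notify cv.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <unordered_set>

std::unordered_map<void*, std::unordered_set<int>> eventMap;
std::mutex mtx;
std::condition_variable cv;

// Block until no events remain registered for this data pointer.
void syncData(void* data) {
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk, [data] { return eventMap.find(data) == eventMap.end(); });
}

// Simulated completion path (the real pool does this from stream callbacks):
// erase the data's events under the lock, then wake any waiters.
void completeAll(void* data) {
    {
        std::lock_guard<std::mutex> lk(mtx);
        eventMap.erase(data);
    }
    cv.notify_all();
}
```

Using a predicate with cv.wait() makes the wait robust against spurious wakeups: the thread only returns once the eventMap genuinely holds no events for the pointer.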

Definition at line 309 of file EventPool.cuh.


The documentation for this class was generated from the following file:

  • EventPool.cuh