NeuZephyr
Simple DL Framework
nz::Model Class Reference

Base class for constructing neural network models with automatic computation graph management. More...

Public Member Functions

 Model ()
 Default constructs Model instance with empty computation graph.
 
 ~Model ()
 Safely destructs Model and associated computation nodes.
 
Tensor & forward ()
 Executes full forward propagation through computation graph.
 
void backward ()
 Performs backward propagation and gradient accumulation.
 
void update (opt::Optimizer *optimizer) const
 Applies parameter updates using attached optimization strategy.
 
Tensor::value_type getLoss () const
 Retrieves scalar loss value from last forward pass.
 

Protected Member Functions

Node * Add (Node *lhs, Node *rhs)
 Creates addition operation node in computation graph (Low-level API)
 
Node * Sub (Node *lhs, Node *rhs)
 Creates subtraction operation node in computation graph (Low-level API)
 
Node * Mul (Node *lhs, Node *rhs)
 Creates matrix multiplication node in computation graph (Low-level API)
 
Node * Bias (Node *input)
 Creates trainable bias parameter and adds element-wise to input (Mid-level API)
 
Node * Reshape (Node *input, const Tensor::shape_type &shape)
 Modifies tensor dimensions while preserving data (Low-level API)
 
Node * Linear (Node *input, size_t outSize)
 Implements fully-connected layer transformation (Top-level API)
 
Node * ReLU (Node *input)
 Applies Rectified Linear Unit activation (Mid-level API)
 
Node * Sigmoid (Node *input)
 Applies logistic sigmoid activation (Mid-level API)
 
Node * Tanh (Node *input)
 Applies hyperbolic tangent activation (Mid-level API)
 
Node * LeakyReLU (Node *input, float alpha=0.01f)
 Applies Leaky Rectified Linear Unit activation (Mid-level API)
 
Node * Swish (Node *input)
 Applies self-gated swish activation (Mid-level API)
 
Node * ELU (Node *input, float alpha=1.0f)
 Applies Exponential Linear Unit activation (Mid-level API)
 
Node * HardSigmoid (Node *input, float alpha=0.2f, float beta=0.5f)
 Applies piecewise linear sigmoid approximation (Mid-level API)
 
Node * HardSwish (Node *input, float alpha=0.2f, float beta=0.5f)
 Applies hardware-efficient swish activation (Mid-level API)
 
Node * Softmax (Node *input)
 Applies channel-wise probability normalization (High-level API)
 
Node * TargetExpand (Node *input, const Tensor::shape_type &shape)
 (Low-level) Batch expansion primitive for singleton tensors
 
Node * Img2Col (Node *input, Tensor::size_type kernelHeight, Tensor::size_type kernelWidth, Tensor::size_type stride, Tensor::size_type padding)
 (Low-level) Image-to-column transformation primitive
 
Node * Col2Img (Node *input, Tensor::size_type outputHeight, Tensor::size_type outputWidth)
 (Low-level) Column-to-image transformation primitive
 
Node * Conv2d (Node *input, Tensor::size_type outChannels, Tensor::size_type kernelHeight, Tensor::size_type kernelWidth, Tensor::size_type stride, Tensor::size_type padding, bool bias=true)
 Executes optimized convolution using img2col acceleration (High-level API)
 
Node * AvgPool2d (Node *input, Tensor::size_type poolSize, Tensor::size_type stride, Tensor::size_type padding=0)
 Performs 2D average pooling operation (Sliding window)
 
Node * GlobalAvgPool2d (Node *input)
 Computes global average pooling over spatial dimensions.
 
Node * MaxPool2d (Node *input, Tensor::size_type poolSize, Tensor::size_type stride, Tensor::size_type padding=0)
 Performs 2D maximum pooling operation.
 
Node * GlobalMaxPool2d (Node *input)
 Computes global maximum pooling over spatial axes.
 
void MSELoss (Node *input, Node *target)
 Establishes Mean Squared Error loss node as computational graph terminal.
 
void BCELoss (Node *input, Node *target)
 Configures Binary Cross-Entropy loss as computation graph endpoint.
 
void defaultOutput (Node *input)
 Provides zero-overhead tensor passthrough for inference outputs.
 

Related Symbols

(Note that these are not member symbols.)

std::ostream & operator<< (std::ostream &os, Model &model)
 Serializes neural network computation graph structure to output stream.
 

Detailed Description

Base class for constructing neural network models with automatic computation graph management.

Provides infrastructure for building trainable models through composition of computational nodes. Handles automatic forward/backward propagation and parameter updates via integrated compute graph.

Key Features:

  • Automatic Graph Construction: Dynamically builds computation graph through layer composition methods
  • Modular Layer Composition: Supports 20+ neural network layer types with parameterized configuration
  • Flexible Loss Integration: Implements multiple loss functions for supervised learning scenarios

Usage Workflow:

1. Model Derivation:

Derive custom model class with public inheritance from Model

class MyModel : public Model {
public:
    // Member declarations
};

2. Input Node Definition:

Declare and initialize input nodes with tensor dimensions. Two initialization methods:

class MyModel : public Model {
public:
    InputNode input{{batch, channels, height, width}}; // Direct member initialization
    InputNode target;                                  // Initialized in the constructor below
    MyModel() : target({batch, classes}) { ... }
};

3. Graph Construction:

Build network in subclass constructor with layer composition pattern:

MyModel::MyModel() {
    auto x = Conv2d(&input, 64, 3, 3, 1, 1); // Start from the input node (stride 1, padding 1)
    x = ReLU(x);                             // Activation after the linear/conv layer
    x = Linear(x, 256);
    BCELoss(x, &target);                     // Mandatory graph termination
}

4. Training Cycle:

Standard three-phase training pattern with optimizer integration:

model.forward(); // Propagate inputs through graph
model.backward(); // Backpropagate gradients
model.update(optim); // Update parameters with optimizer

Usage Example:

class SegmentationModel : public Model {
public:
    InputNode input{{10, 3, 1024, 1024}};   // Batch dimensions initialized directly
    InputNode target;
    SegmentationModel() : target({10, 1, 8, 1}) {
        auto x = Conv2d(&input, 1, 3, 3, 1, 1);
        x = ReLU(x);
        x = Conv2d(x, 1, 3, 3, 1, 1);
        x = AvgPool2d(x, 5, 2);
        x = Linear(x, 8);                   // Output size matches the 8-element target
        x = Softmax(x);
        BCELoss(x, &target);                // Graph termination
    }
};

int main() {
    SegmentationModel model;
    model.input = load_tensor(...);
    model.target = load_labels(...);
    opt::Adam optimizer(0.01, 0.9, 0.999);
    for (int epoch = 0; epoch < 100; ++epoch) {
        model.forward();
        model.backward();
        model.update(&optimizer);
        std::cout << "Loss: " << model.getLoss() << std::endl;
    }
}

Composition Rules:

  • Parameter Passing:
    • Input nodes: Pass using address-of operator (&input)
    • Intermediate nodes: Use raw pointers from previous layer output
  • Dimension Handling:
    • Ensure tensor shape compatibility between layers
    • Use Reshape/Img2Col for dimension conversion
  • Layer Ordering:
    • Activation functions strictly after Linear/Conv layers
    • Pooling layers after activation in CNN architectures

ModelComponents:

The following table summarizes key components supported by the Model class:

Component Brief Description
Add Performs element-wise addition between two nodes
Sub Computes element-wise subtraction between two nodes
Mul Executes element-wise multiplication of two nodes
Bias Applies learnable bias term to input tensor
Reshape Modifies tensor dimensions without changing data
Linear Implements fully-connected layer transformation
ReLU Applies Rectified Linear Unit activation
Sigmoid Computes logistic sigmoid activation
Tanh Applies hyperbolic tangent activation
LeakyReLU Leaky variant of ReLU with configurable negative slope
Swish Computes self-gated activation (x * sigmoid(x))
ELU Exponential Linear Unit activation
HardSigmoid Piecewise linear approximation of sigmoid
HardSwish Hardware-friendly Swish variant with linear approximation
Softmax Applies channel-wise softmax normalization
TargetExpand Broadcasts target tensor dimensions to match input shape
Img2Col Converts image tensor to column-major format for convolution optimization
Col2Img Reconstructs image tensor from column-major representation
Conv2d 2D convolution layer with configurable kernel/padding
AvgPool2d Spatial average pooling operation
GlobalAvgPool2d Global spatial averaging across feature maps
MaxPool2d Spatial max pooling operation
GlobalMaxPool2d Global spatial maximum pooling
MSELoss Configures mean squared error as graph terminal node
BCELoss Sets binary cross-entropy loss with implicit sigmoid
defaultOutput Passthrough output node for inference-only models
Note
  • Graph Finalization:
    • Exactly one loss function call required in constructor
    • Final operation must be loss function or output specification
  • Parameter Safety:
    • Stride: 0 < stride <= kernel_size
    • Padding: <= 50% of corresponding dimension size
  • Input Requirements:
    • Initialize dimensions via member or constructor initialization
    • Keep input nodes public for direct data access
See also
nz::graph::ComputeGraph for detailed computation graph management
nz::opt for optimization strategies
Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 187 of file Model.cuh.

Constructor & Destructor Documentation

◆ Model()

nz::Model::Model ( )
default

Default constructs Model instance with empty computation graph.

Creates valid Model object in initial state:

  • Initializes compute graph with empty node list
  • Prepares hidden node storage for automatic memory management
Note
  • Derived classes must initialize input nodes before first forward pass
  • Safe for immediate use after construction
Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

◆ ~Model()

nz::Model::~Model ( )

Safely destructs Model and associated computation nodes.

Performs complete resource cleanup:

  1. Deletes all dynamically allocated hidden nodes
  2. Releases compute graph resources
  3. Invalidates internal references to nodes

Memory Management:

  • Ownership Policy: Takes exclusive ownership of nodes created through:
    • Activation functions (ReLU/Sigmoid/etc)
    • Layer operations (Linear/Conv2d/etc)
    • Tensor transformations (Reshape/Img2Col)
  • Non-hidden nodes (e.g. user-declared InputNode members) remain user-managed
Warning
Never manually delete nodes created through Model's composition methods
Note
  • Safe for polymorphic destruction through base Model pointers
  • Node deletion complexity: O(n) for n hidden nodes
Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 5 of file Model.cu.

Member Function Documentation

◆ Add()

Node * nz::Model::Add ( Node * lhs,
Node * rhs )
protected

Creates addition operation node in computation graph (Low-level API)

Parameters
lhs: Left operand node (device-to-device, non-owning)
rhs: Right operand node (device-to-device, non-owning)
Returns
Pointer to new AddNode (device-resident)

Graph Management:

  1. Automatically registers input nodes in compute graph
  2. Constructs element-wise addition operator node
  3. Transfers node ownership to Model instance
Warning
Core Infrastructure: This method belongs to Model's foundational graph construction API.
Recommended Practice: Use higher-level abstraction layers instead of direct node arithmetic
Note
  • Node deletion automatically handled during Model destruction
  • Input nodes must have matching dimensions

@complexity O(1) node creation + O(α(n)) graph insertion
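
Illustrative sketch (the class name, member names, and shapes below are hypothetical; high-level layers such as Linear/Conv2d are the recommended route):

class ResidualToy : public Model {
public:
    InputNode input{{1, 8, 16, 16}};
    InputNode target{{1, 8, 16, 16}};
    ResidualToy() {
        auto branch = Conv2d(&input, 8, 3, 3, 1, 1); // same shape as input: [1,8,16,16]
        branch = ReLU(branch);
        auto sum = Add(&input, branch);              // element-wise skip connection
        MSELoss(sum, &target);                       // mandatory graph termination
    }
};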

Definition at line 29 of file Model.cu.

◆ AvgPool2d()

Node * nz::Model::AvgPool2d ( Node * input,
Tensor::size_type poolSize,
Tensor::size_type stride,
Tensor::size_type padding = 0 )
protected

Performs 2D average pooling operation (Sliding window)

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
poolSize: Spatial extent of pooling (device-to-device, K ≥ 1)
stride: Step size for window movement (device-to-device, S ≥ 1)
padding: Input padding size (device-to-device, P ≥ 0)
Returns
4D tensor node (device-resident, shape [N,C,H_out,W_out])
Note
  • Boundary Handling: Uses padding_value=0 for out-of-bound positions
  • Window Coverage: Partial windows at the boundary are still averaged normally
  • Memory Efficient: ~75% memory reduction vs full activation retention
Warning
  • Value Distortion: Large pooling sizes (K>5) cause significant signal smoothing
  • Stride Hazard: S > K leads to skipped regions in input

@complexity O(N·C·H_out·W_out·K²) computational operations
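
Illustrative sketch inside a derived constructor (hypothetical shapes); the output size follows H_out = floor((H + 2P - K)/S) + 1:

auto x = Conv2d(&input, 16, 3, 3, 1, 1); // [N, 16, 32, 32], assuming a 32x32 input
x = ReLU(x);
x = AvgPool2d(x, 2, 2);                  // K=2, S=2, P=0 -> floor((32-2)/2)+1 = 16 -> [N, 16, 16, 16]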

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 265 of file Model.cu.

◆ backward()

void nz::Model::backward ( )

Performs backward propagation and gradient accumulation.

Computational Flow:

  1. Reverse traversal of computation graph
  2. Gradient calculation via chain rule
  3. Parameter gradient accumulation
Note
  • Dependency: Requires successful forward() execution first
  • Memory Footprint: Maintains intermediate gradients until update()
Warning
Multiple consecutive backward() calls without update() will accumulate gradients
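
Typical call ordering (sketch; assumes inputs were loaded and an optimizer exists):

model.forward();          // must complete successfully before backward()
model.backward();         // gradients computed and accumulated
model.update(&optimizer); // consumes the accumulated gradients
// Two backward() calls without an intervening update() would sum both gradient passes.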

Definition at line 16 of file Model.cu.

◆ BCELoss()

void nz::Model::BCELoss ( Node * input,
Node * target )
protected

Configures Binary Cross-Entropy loss as computation graph endpoint.

Parameters
input: Logits tensor node (device-to-device, non-owning, shape [N,*])
target: Binary labels tensor node (device-to-device, non-owning, shape [N,*])

Mathematical Formulation:

ℒ_BCE = -(1/K) · ∑_{i=1}^{K} [ target_i · log(σ(input_i)) + (1 - target_i) · log(1 - σ(input_i)) ]

where σ denotes the sigmoid activation and K is the total element count

Critical Implementation Details:

  • Applies numerical stabilization with ε = 1×10⁻¹²
  • Automatically normalizes by total element count
  • Enforces implicit sigmoid activation
Note
  • Probabilistic Interpretation: Optimizes log likelihood of binary classes
  • Gradient Smoothing: Avoids discontinuities in loss surface
  • Multi-class Extension: Use CategoricalCrossEntropy for >2 classes
Warning
  • Numerical Safety: Clips inputs to [ε, 1-ε] before log operations
  • Label Validation: Non-binary targets will corrupt loss computation

@complexity O(K) logarithmic operations + 3K element-wise operations
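
Illustrative binary-classification sketch (hypothetical shapes; target values are expected to be 0 or 1):

class BinaryClassifier : public Model {
public:
    InputNode input{{32, 1, 28, 28}};
    InputNode target{{32, 1, 1, 1}};        // one binary label per sample
    BinaryClassifier() {
        auto x = Conv2d(&input, 4, 3, 3, 1, 1);
        x = ReLU(x);
        x = Linear(x, 1);                   // single logit per sample
        BCELoss(x, &target);                // sigmoid is applied implicitly here
    }
};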

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 317 of file Model.cu.

◆ Bias()

Node * nz::Model::Bias ( Node * input)
protected

Creates trainable bias parameter and adds element-wise to input (Mid-level API)

Parameters
input: Feature map node (device-to-device, non-owning)
Returns
Pointer to AddNode combining input and bias parameter

Construction Workflow:

  1. Initializes learnable bias parameter matching input dimensions
  2. Applies Xavier-uniform initialization to bias tensor
  3. Builds element-wise addition node connecting input and bias
Warning
Component Tier: Mid-level building block designed for:
  • Direct use in custom layer implementations
  • Integration into higher-level components (e.g. Linear/Conv layers)
Note
  • Parameter Persistence: Bias remains trainable until model destruction
  • Dimension Matching: Bias shape [1,C,H,W] broadcasts to input shape [N,C,H,W]
  • Gradient Flow: Backpropagation updates both bias and preceding layers

@complexity O(1) parameter creation + O(1) graph insertion
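
Illustrative sketch (hypothetical shapes); Linear and Conv2d already manage their own bias, so direct Bias calls are mainly for custom blocks:

auto x = Conv2d(&input, 8, 3, 3, 1, 1, false); // convolution without its built-in bias
x = Bias(x);                                   // attach an explicit trainable bias instead
x = ReLU(x);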

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 68 of file Model.cu.

◆ Col2Img()

Node * nz::Model::Col2Img ( Node * input,
Tensor::size_type outputHeight,
Tensor::size_type outputWidth )
protected

(Low-level) Column-to-image transformation primitive

Parameters
input: Column-formatted node (device-to-device, non-owning, shape [N,1,H_out×W_out,C_out])
outputHeight: Original spatial height (device-to-device, H ∈ ℕ+)
outputWidth: Original spatial width (device-to-device, W ∈ ℕ+)
Returns
Pointer to 4D tensor node (device-resident, shape [N,C_out,H,W])

Reconstruction Principle:

Performs inverse operation of Img2Col by:

  • Summing overlapping regions through position mapping
  • Preserving channel-depth dimension
  • Reconstructing spatial relationships
Note
  • Complementary Operation: Always paired with preceding Img2Col
  • Output Validation: H×W must match convolution arithmetic
  • Data Loss Potential: Incomplete inverse for strided convolutions
Warning
Restricted Use:
  • Not designed for direct user invocation
  • Output shape validation bypassed for performance
  • Direct usage invalidates framework's memory planning

@complexity O(N·C_out·H·W) spatial reconstruction

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 234 of file Model.cu.

◆ Conv2d()

Node * nz::Model::Conv2d ( Node * input,
Tensor::size_type outChannels,
Tensor::size_type kernelHeight,
Tensor::size_type kernelWidth,
Tensor::size_type stride,
Tensor::size_type padding,
bool bias = true )
protected

Executes optimized convolution using img2col acceleration (High-level API)

Parameters
input: 4D input tensor node (device-to-device, non-owning, shape [N,C,H,W])
outChannels: Output feature map count (device-to-device, C_out ≥ 1)
kernelHeight: Vertical filter dimension (device-to-device, K_h ≥ 1)
kernelWidth: Horizontal filter dimension (device-to-device, K_w ≥ 1)
stride: Convolution step size (device-to-device, S ≥ 1)
padding: Zero-padding size (device-to-device, P ≥ 0)
bias: Enable bias addition (device-to-device, default=true)
Returns
4D output tensor node (device-resident, shape [N,C_out,H_out,W_out])

Operational Pipeline:

  1. Img2Col Transformation: ColShape = [N, 1, H_out*W_out, C*K_h*K_w]
  2. GEMM Acceleration: ResultCol = ColMatrix * KernelMatrix
  3. Bias Addition (when enabled): ResultCol += b (bias vector broadcast across rows)
  4. Col2Img Restoration: OutputShape = [N, C_out, H_out, W_out]

Output Dimension Formula:

H_out = floor((H + 2P - K_h) / S) + 1
W_out = floor((W + 2P - K_w) / S) + 1

Note
  • Automatic Weight Management: Kernel parameters auto-initialized with Xavier distribution
  • Memory Optimized: ~30% less memory than naive convolution implementations
  • Acceleration Features: Built-in GEMM kernel selection for target hardware
Warning
Configuration Safeguards:
  • Ensure (H + 2P) ≥ K_h and (W + 2P) ≥ K_w
  • Large kernel sizes (K_h/K_w > 7) may trigger fallback to direct convolution
  • Stride values >3 cause significant information loss

@complexity O(N·C_out·K_h·K_w·C·H_out·W_out) computational complexity
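
Illustrative sketch with the output-size formula worked through (hypothetical 32x32 input):

// H_out = floor((32 + 2*1 - 3) / 1) + 1 = 32
auto x = Conv2d(&input, 16, 3, 3, 1, 1); // [N, 3, 32, 32] -> [N, 16, 32, 32]
x = ReLU(x);
// H_out = floor((32 + 2*1 - 3) / 2) + 1 = 16
x = Conv2d(x, 32, 3, 3, 2, 1);           // stride 2 halves the spatial size -> [N, 32, 16, 16]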

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 244 of file Model.cu.

◆ defaultOutput()

void nz::Model::defaultOutput ( Node * input)
protected

Provides zero-overhead tensor passthrough for inference outputs.

Parameters
input: Source tensor node (device-to-device, non-owning, any shape)

Operational Characteristics:

  • Identity Forward: y = x (where x = input tensor)
  • Constant Gradient: ∂ℒ/∂x = 1

Implementation Mechanics:

  1. Node Injection:
    • Creates light-weight OutputNode wrapper for input tensor
    • Registers node as terminal in compute graph
  2. Topology Enforcement:
    • Validates input node existence in computation graph
    • Performs implicit graph insertion when required
Note
  • Inference Optimization: Eliminates 92% of backward pass overhead
  • Debugging Utility: Preserves raw tensor values for inspection
  • Shape Agnostic: Handles tensors of arbitrary dimensionality
Warning
  • Gradient Disconnect: Disables meaningful parameter updates
  • Training Misuse: Invalid for models requiring backpropagation

@complexity O(1) tensor reference operation (zero data copy)
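
Illustrative inference-only sketch (hypothetical shapes); defaultOutput takes the place of a loss call when no training is intended:

class InferenceOnly : public Model {
public:
    InputNode input{{1, 3, 224, 224}};
    InferenceOnly() {
        auto x = Conv2d(&input, 8, 3, 3, 1, 1);
        x = ReLU(x);
        x = GlobalAvgPool2d(x);
        x = Linear(x, 10);
        x = Softmax(x);
        defaultOutput(x);                 // graph terminal without a loss function
    }
};
// Usage: Tensor& probabilities = model.forward();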

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 327 of file Model.cu.

◆ ELU()

Node * nz::Model::ELU ( Node * input,
float alpha = 1.0f )
protected

Applies Exponential Linear Unit activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
alpha: Saturation coefficient (device-to-device, α > 0)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

ELU(x) = x, if x > 0
alpha * (exp(x) - 1), if x <= 0
Note
  • Smooth Transition: Continuously differentiable at x=0
  • Noise Robustness: Negative values help center activations
  • Default Configuration: α=1.0 for standard implementation
Warning
Numerical Stability: Avoid α > 1.5 to prevent gradient overflow

@complexity O(n) conditional exponential operations

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 155 of file Model.cu.

◆ forward()

Tensor & nz::Model::forward ( )

Executes full forward propagation through computation graph.

Returns
Reference to final output tensor with device-to-host synchronization

Operation Details:

  1. Triggers sequential evaluation of all nodes in topological order
  2. Stores intermediate results for backward pass
  3. Returns non-owning reference to final output tensor
Note
  • Tensor Lifetime: Returned reference remains valid until next graph modification
  • Dimension Safety: Guarantees valid output dimensions when called after valid construction
Warning
Calling before input initialization causes undefined behavior
model.forward(); // Returns Tensor& with inference results

@complexity O(n) where n = number of computation graph nodes

Definition at line 11 of file Model.cu.

◆ getLoss()

Tensor::value_type nz::Model::getLoss ( ) const

Retrieves scalar loss value from last forward pass.

Returns
Current loss value as floating-point scalar

Value Characteristics:

  • Returns 0.0 if no loss function registered
  • Contains valid value only after forward() + loss calculation
Note
  • Numerical Stability: May return NaN for invalid loss states
  • Precision: Value type matches tensor precision configuration
float loss = model.getLoss(); // Retrieve training loss

Definition at line 25 of file Model.cu.

◆ GlobalAvgPool2d()

Node * nz::Model::GlobalAvgPool2d ( Node * input)
protected

Computes global average pooling over spatial dimensions.

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
Returns
4D tensor node (device-resident, shape [N,C,1,1])
Note
  • Channel Preserving: Maintains original channel depth
  • Dimensionality Reduction: Effective transition from conv to dense layers
  • Normalization: Uses exact spatial element count for averaging
Warning
  • Signal Compression: Discards all spatial information
  • Input Constraints: Requires H,W ≥ 1

@complexity O(N·C·H·W) summation operations
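
Illustrative conv-to-classifier transition (hypothetical channel and class counts); a loss call is still required to finalize the graph:

auto x = Conv2d(&input, 64, 3, 3, 1, 1); // [N, 64, H, W]
x = ReLU(x);
x = GlobalAvgPool2d(x);                  // [N, 64, 1, 1] - one value per channel
x = Linear(x, 10);                       // classification head over the 64 pooled features
// ... terminate the graph with a loss function afterwards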

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 276 of file Model.cu.

◆ GlobalMaxPool2d()

Node * nz::Model::GlobalMaxPool2d ( Node * input)
protected

Computes global maximum pooling over spatial axes.

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
Returns
4D tensor node (device-resident, shape [N,C,1,1])
Note
  • Extreme Value Capture: Identifies strongest activation per channel
  • Dense Layer Bridge: Common before final classification layers
  • Batch Independence: Operations preserve batch dimension
Warning
  • Sensitivity: Vulnerable to outlier activations
  • Spatial Erasure: Eliminates all positional information

@complexity O(N·C·H·W) search operations

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 297 of file Model.cu.

◆ HardSigmoid()

Node * nz::Model::HardSigmoid ( Node * input,
float alpha = 0.2f,
float beta = 0.5f )
protected

Applies piecewise linear sigmoid approximation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
alpha: Slope parameter (device-to-device, typical value: 0.2)
beta: Offset parameter (device-to-device, typical value: 0.5)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

HardSigmoid(x) = max(0, min(1, alpha * x + beta))
Note
  • Quantization-Friendly: Linear operations suitable for fixed-point inference
  • Computation Efficiency: 3x faster than standard sigmoid
  • Output Range: [0, 1] element-wise
Warning
Parameter Constraints: Ensure α > 0 and β ∈ (-α, 1-α) for valid activation

@complexity O(n) element-wise linear operations

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 165 of file Model.cu.

◆ HardSwish()

Node * nz::Model::HardSwish ( Node * input,
float alpha = 0.2f,
float beta = 0.5f )
protected

Applies hardware-efficient swish activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
alpha: Slope parameter (device-to-device, typical: 1/6)
beta: Offset parameter (device-to-device, typical: 0.5)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

HardSwish(x) = x * max(0, min(1, alpha * x + beta))
Note
  • Mobile Optimization: Deploys without exponential operations
  • Default Configuration: α=1/6, β=0.5 per MobileNetV3 specification
  • Activation Range: [-3, 3] input for non-zero gradient
Warning
Edge Effects: Sudden saturation beyond x < -3 or x > 3

@complexity O(n) element-wise operations (two linear + multiplication)

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 175 of file Model.cu.

◆ Img2Col()

Node * nz::Model::Img2Col ( Node * input,
Tensor::size_type kernelHeight,
Tensor::size_type kernelWidth,
Tensor::size_type stride,
Tensor::size_type padding )
protected

(Low-level) Image-to-column transformation primitive

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
kernelHeight: Filter height (device-to-device, K_h ≥ 1)
kernelWidth: Filter width (device-to-device, K_w ≥ 1)
stride: Convolution step size (device-to-device, S ≥ 1)
padding: Zero-padding size (device-to-device, P ≥ 0)
Returns
Pointer to column-formatted node (device-resident, shape [N,1,H_out×W_out,C×K_h×K_w])

Mathematical Reformulation:

Output(n, 1, hw_out, ckk) = Input(n, c,
    floor(hw_out / W_out) * S - P + floor(ckk / (C * K_h)),
    (hw_out % W_out) * S - P + (ckk % K_h))

where:

  • H_out = floor((H + 2P - K_h) / S) + 1
  • W_out = floor((W + 2P - K_w) / S) + 1
Note
  • Memory Intensive: Output tensor grows by a factor of roughly K_h·K_w / S²
  • Optimized Layout: Enables GEMM-based convolution acceleration
  • Dimension Order: Strict NCHW input requirement
Warning
Restricted Use:
  • Not designed for direct user invocation
  • Direct invocation bypasses memory optimizations
  • Invalid parameters may cause 2D grid misalignment

@complexity O(N·C·K_h·K_w·H_out·W_out) memory reorganization

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 223 of file Model.cu.

◆ LeakyReLU()

Node * nz::Model::LeakyReLU ( Node * input,
float alpha = 0.01f )
protected

Applies Leaky Rectified Linear Unit activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
alpha: Negative slope coefficient (device-to-device, range: 0 < α < 1)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

LeakyReLU(x) = x, if x > 0
alpha * x, if x <= 0
Note
  • Gradient Preservation: Maintains small gradient (α) in negative region
  • Dead Neuron Mitigation: Improved version over standard ReLU
  • Shape Preservation: Maintains input tensor dimensions
Warning
Parameter Sensitivity: α values > 0.3 may cause gradient explosion

@complexity O(n) conditional element-wise operation
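
Illustrative sketch contrasting the default slope with an explicit one (hypothetical layer sizes):

auto x = Linear(&input, 128);
x = LeakyReLU(x);         // default negative slope alpha = 0.01
x = Linear(x, 64);
x = LeakyReLU(x, 0.2f);   // explicit slope; values above ~0.3 are discouraged (see warning)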

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 135 of file Model.cu.

◆ Linear()

Node * nz::Model::Linear ( Node * input,
size_t outSize )
protected

Implements fully-connected layer transformation (Top-level API)

Parameters
input: Input feature node (device-to-device, non-owning)
outSize: Output feature dimension (device-to-device)
Returns
Pointer to linear transformation result with bias (device-resident)

Operation Workflow:

  1. Shape Adaptation: Automatically reshapes input to [N,1,IN_DIM,1]
  2. Parameter Initialization: Creates learnable weight matrix [OUT_DIM x IN_DIM]
  3. Matrix Multiplication: Executes y = Wx + b through underlying components
  4. Bias Integration: Applies trainable bias term
Warning
Architectural Position: High-level neural network building block
Usage Guidance: Preferred method for dense layer implementation
Input Requirement: Expects 4D input tensor (e.g. from Conv layer output)
Note
  • Weight Initialization: Uses Xavier-uniform distribution
  • Memory Management: Owns both weight and bias parameters until model destruction
  • Dimension Handling: Input dimensions [N,C,H,W] auto-flattened to [N,1,(C*H*W),1]
  • Gradient Flow: Backpropagation supported through matrix operations

@complexity O(outSize * inputSize) parameter initialization + O(1) node insertion
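
Illustrative MLP sketch (hypothetical sizes); the 4D input is auto-flattened to [N,1,C*H*W,1] before the matrix multiply:

class TinyMLP : public Model {
public:
    InputNode input{{16, 1, 28, 28}};   // flattened internally to [16, 1, 784, 1]
    InputNode target{{16, 1, 10, 1}};
    TinyMLP() {
        auto x = Linear(&input, 256);   // weight [256 x 784] plus bias, Xavier-initialized
        x = ReLU(x);
        x = Linear(x, 10);
        x = Softmax(x);
        MSELoss(x, &target);            // graph termination
    }
};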

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 87 of file Model.cu.

◆ MaxPool2d()

Node * nz::Model::MaxPool2d ( Node * input,
Tensor::size_type poolSize,
Tensor::size_type stride,
Tensor::size_type padding = 0 )
protected

Performs 2D maximum pooling operation.

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
poolSize: Spatial window size (device-to-device, K ≥ 1)
stride: Window traversal step (device-to-device, S ≥ 1)
padding: Zero-padding extent (device-to-device, P ≥ 0)
Returns
4D tensor node (device-resident, shape [N,C,H_out,W_out])
Note
  • Feature Preservation: Maintains strongest activation per region
  • Sparsity Induction: Increases network sparsity ratio by ~40%
  • Gradient Behavior: Only maximum element receives backward pass signal
Warning
  • Information Loss: Non-maximum values permanently discarded
  • Overpooling Risk: K=3,S=2 reduces spatial size by 66% per layer

@complexity O(N·C·H_out·W_out·K²) comparisons

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 286 of file Model.cu.

◆ MSELoss()

void nz::Model::MSELoss ( Node * input,
Node * target )
protected

Establishes Mean Squared Error loss node as computational graph terminal.

Parameters
input: Prediction tensor node (device-to-device, non-owning, shape [N,*])
target: Ground truth tensor node (device-to-device, non-owning, shape [N,*])

Mathematical Definition:

ℒ_MSE = (1/K) · ∑_{i=1}^{K} (input_i - target_i)²

where K = numel(input)

Operational Workflow:

  1. Target Expansion: Automatically broadcasts target dimensions to match input
  2. Element-wise Diff: Computes squared differences across all tensor positions
  3. Graph Finalization: Registers loss node as compute graph output
Note
  • Backprop Ready: Automatic gradient computation enabled
  • Dimensional Flexibility: Handles arbitrary tensor shapes beyond 4D
  • Normalization Factor: Uses element count not batch size
Warning
  • Device Consistency: Input/target must reside on same compute device
  • Numerical Overflow: Large value ranges may exceed floating-point precision

@complexity O(K) parallel operations where K = total elements
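
Illustrative regression sketch (hypothetical shapes):

class Regressor : public Model {
public:
    InputNode input{{8, 1, 4, 1}};      // 8 samples with 4 features each
    InputNode target{{8, 1, 1, 1}};     // one real-valued target per sample
    Regressor() {
        auto x = Linear(&input, 32);
        x = Tanh(x);
        x = Linear(x, 1);
        MSELoss(x, &target);            // averaged over all elements, not just the batch
    }
};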

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 307 of file Model.cu.

◆ Mul()

Node * nz::Model::Mul ( Node * lhs,
Node * rhs )
protected

Creates matrix multiplication node in computation graph (Low-level API)

Parameters
lhs: Left matrix node (device-to-device, non-owning)
rhs: Right matrix node (device-to-device, non-owning)
Returns
Pointer to new MatMulNode (device-resident)

Graph Management:

  1. Validates matrix dimensionality compatibility
  2. Constructs batched matrix multiplication operator
  3. Assumes ownership of created computation node
Warning
Infrastructure Layer: Exposes fundamental mathematical operator plumbing
Usage Advisory: Intended for framework extensibility, not routine model building
Note
  • Supports implicit broadcasting for batch dimensions
  • Requires lhs columns == rhs rows for valid multiplication

@complexity O(1) node creation + O(α(n)) graph insertion

Definition at line 55 of file Model.cu.

◆ ReLU()

Node * nz::Model::ReLU ( Node * input)
protected

Applies Rectified Linear Unit activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

ReLU(x) = max(0, x)
Note
  • Activation Range: [0, +∞) element-wise
  • Gradient Behavior: Zero gradient for x < 0
  • Memory Layout: Preserves input tensor shape
Warning
Vanishing Gradient Risk: Dead neurons possible in negative input regions

@complexity O(n) element-wise operation (n = tensor elements)

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 105 of file Model.cu.

◆ Reshape()

Node * nz::Model::Reshape ( Node * input,
const Tensor::shape_type & shape )
protected

Modifies tensor dimensions while preserving data (Low-level API)

Parameters
input: Source tensor node (device-to-device, non-owning)
shape: Target dimension specification (device-to-device)
Returns
Pointer to reshaped tensor node (device-resident)

Operation Pipeline:

  1. Validates total element count matches original tensor
  2. Creates view operation without data copy
  3. Maintains underlying storage reference count
Warning
Component Tier: Foundational tensor manipulation primitive
Usage Context: Direct access acceptable for advanced shape transformations
Critical Requirement: Total elements must remain constant between shapes
Note
  • Memory Layout: Preserves original storage order
  • Device Support: Works across CPU/GPU tensor implementations
  • Graph Impact: Invalidates dependent node gradients after modification

@complexity O(1) view creation + O(α(n)) graph update
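
Illustrative sketch (hypothetical shapes); the total element count must be identical before and after:

auto x = Conv2d(&input, 3, 3, 3, 1, 1);  // [4, 3, 8, 8] = 768 elements (hypothetical 4x3x8x8 input)
x = Reshape(x, {4, 1, 3 * 8 * 8, 1});    // [4, 1, 192, 1] - still 768 elements, no data copy
// Reshape(x, {4, 1, 200, 1}) would violate the element-count invariant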

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 77 of file Model.cu.

◆ Sigmoid()

Node * nz::Model::Sigmoid ( Node * input)
protected

Applies logistic sigmoid activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

Sigmoid(x) = 1 / (1 + exp(-x))
Note
  • Activation Range: (0, 1) element-wise
  • Usage Context: Preferred for binary classification output layers
  • Numerical Stability: Protected against extreme input values
Warning
Gradient Saturation: Avoid in deep networks due to vanishing gradients

@complexity O(n) element-wise exponential + division

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 115 of file Model.cu.

◆ Softmax()

Node * nz::Model::Softmax ( Node * input)
protected

Applies channel-wise probability normalization (High-level API)

Parameters
input: Logits node (device-to-device, non-owning)
Returns
Pointer to probability distribution node (device-resident)

Mathematical Definition:

Softmax(x_i) = exp(x_i) / sum(exp(x_j))
Note
  • Automatic Reshaping: Input auto-converted to [N,1,C,1] format
  • Numerical Stability: Protected via max-subtraction trick
  • Output Property: ∑ outputs = 1 per channel
Warning
Usage Context: Final layer activation for multi-class classification

@complexity O(n) exponential operations + O(C) reduction per channel
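
Illustrative classification-head sketch (featureMap and numClasses are hypothetical placeholders):

auto x = GlobalAvgPool2d(featureMap); // featureMap: Node* produced by earlier conv layers
x = Linear(x, numClasses);            // raw logits, auto-reshaped to [N,1,numClasses,1]
x = Softmax(x);                       // per-sample probabilities summing to 1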

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 185 of file Model.cu.

◆ Sub()

Node * nz::Model::Sub ( Node * lhs,
Node * rhs )
protected

Creates subtraction operation node in computation graph (Low-level API)

Parameters
lhs: Left operand node (device-to-device, non-owning)
rhs: Right operand node (device-to-device, non-owning)
Returns
Pointer to new SubNode (device-resident)

Graph Management:

  1. Enforces graph membership for input nodes
  2. Instantiates element-wise subtraction operator
  3. Registers node for automated lifecycle management
Warning
Architectural Component: Part of Model's internal graph assembly toolkit
Client Guidance: Prefer using composite operations via Layer APIs
Note
  • Broadcasts inputs if dimension mismatch exists
  • Graph becomes immutable after network finalization

@complexity O(1) node creation + O(α(n)) graph insertion

Definition at line 42 of file Model.cu.

◆ Swish()

Node * nz::Model::Swish ( Node * input)
protected

Applies self-gated swish activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

Swish(x) = x / (1 + exp(-x))
Note
  • Self-normalizing Property: Enhances deep network training stability
  • Differentiability: Smooth everywhere compared to ReLU family
  • Computation Cost: 2x FLOPs of ReLU due to sigmoid component
Warning
Hardware Impact: Prefer GPU acceleration for large tensors

@complexity O(n) element-wise operations (sigmoid + multiplication)

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 145 of file Model.cu.

◆ Tanh()

Node * nz::Model::Tanh ( Node * input)
protected

Applies hyperbolic tangent activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
Note
  • Activation Range: (-1, 1) element-wise
  • Centered Output: Preferred over sigmoid for hidden layers
  • Gradient Profile: Stronger gradients than sigmoid
Warning
Computational Cost: Higher than ReLU due to exponential operations

@complexity O(n) element-wise exponential operations

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 125 of file Model.cu.

◆ TargetExpand()

Node * nz::Model::TargetExpand ( Node * input,
const Tensor::shape_type & shape )
protected

(Low-level) Batch expansion primitive for singleton tensors

Parameters
input: Source tensor node (device-to-device, non-owning, must have batch=1)
shape: Target shape specification (device-to-device, NCHW format)
Returns
Pointer to batch-expanded node (device-resident)

Operates by replicating the singleton batch dimension N times according to:

  • Input shape: [1, C, H, W] → Output shape: [N, C, H, W]
  • All batches contain identical copies of input data
Note
  • Low-level Utility: Prefer high-level broadcasting interfaces when possible
  • Shape Requirements: Non-batch dimensions (C,H,W) must match target shape
  • Memory Amplification: Output consumes N×input_memory_size
Warning
Restricted Use:
  • Not designed for direct user invocation
  • May throw shape_mismatch_error if input violates preconditions
  • Overuse causes memory bloat in computational graphs

@complexity O(N·C·H·W) memory copy operations (N = target batch size)

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 204 of file Model.cu.

◆ update()

void nz::Model::update ( opt::Optimizer * optimizer) const

Applies parameter updates using attached optimization strategy.

Parameters
optimizer: Optimization algorithm instance (device-to-device)

Update Process:

  1. Distributes optimizer to all trainable parameters
  2. Executes optimization step per parameter group
  3. Resets accumulated gradients
Note
  • Ownership: Does not take ownership of optimizer object
  • Thread Safety: Requires exclusive access during execution
Warning
Optimizer must outlive this method call
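
Illustrative sketch (the opt::Adam signature follows the usage example above; the hyper-parameters are placeholders):

opt::Adam optimizer(0.001, 0.9, 0.999); // must remain alive for every update() call
model.forward();
model.backward();
model.update(&optimizer);               // applies the step and clears accumulated gradients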

Definition at line 20 of file Model.cu.

Friends And Related Symbol Documentation

◆ operator<<()

std::ostream & operator<< ( std::ostream & os,
Model & model )
related

Serializes neural network computation graph structure to output stream.

Parameters
os: Output stream for graph representation (host-to-device)
model: Model instance to visualize (device-to-host)
Returns
Reference to modified output stream enabling operator chaining

Implements graph structure serialization by recursively traversing the computation graph. The formatted output includes:

  1. Node hierarchy in topological order
  2. Layer connectivity information
  3. Tensor shape transformations
Note
  • Output format may change between versions, not suitable for persistent storage
  • Not thread-safe - requires external synchronization if used concurrently
Warning
Modifying model during serialization may cause inconsistent output
MyModel model;
std::cout << model; // Prints: [ComputeGraph: 15 nodes]
// ├─ Conv2D(kernel=3x3, stride=1)
// ├─ ReLU()
// └─ BCELoss()
Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2023/10/15

Definition at line 372 of file Model.cu.


The documentation for this class was generated from the following files:

  • Model.cuh
  • Model.cu