NeuZephyr
Simple DL Framework
nz::Model Class Reference

Base class for constructing neural network models with automatic computation graph management. More...

Public Member Functions

 Model ()
 Default constructs Model instance with empty computation graph.
 
 ~Model ()
 Safely destructs Model and associated computation nodes.
 
Tensor & forward ()
 Executes full forward propagation through computation graph.
 
void backward ()
 Performs backward propagation and gradient accumulation.
 
void update (opt::Optimizer *optimizer) const
 Applies parameter updates using attached optimization strategy.
 
Tensor::value_type getLoss () const
 Retrieves scalar loss value from last forward pass.
 

Protected Member Functions

Node * Add (Node *lhs, Node *rhs)
 Creates addition operation node in computation graph (Low-level API)
 
Node * Sub (Node *lhs, Node *rhs)
 Creates subtraction operation node in computation graph (Low-level API)
 
Node * Mul (Node *lhs, Node *rhs)
 Creates matrix multiplication node in computation graph (Low-level API)
 
Node * Bias (Node *input)
 Creates trainable bias parameter and adds element-wise to input (Mid-level API)
 
Node * Reshape (Node *input, const Tensor::shape_type &shape)
 Modifies tensor dimensions while preserving data (Low-level API)
 
Node * Linear (Node *input, size_t outSize)
 Implements fully-connected layer transformation (Top-level API)
 
Node * ReLU (Node *input)
 Applies Rectified Linear Unit activation (Mid-level API)
 
Node * Sigmoid (Node *input)
 Applies logistic sigmoid activation (Mid-level API)
 
Node * Tanh (Node *input)
 Applies hyperbolic tangent activation (Mid-level API)
 
Node * LeakyReLU (Node *input, float alpha=0.01f)
 Applies Leaky Rectified Linear Unit activation (Mid-level API)
 
Node * Swish (Node *input)
 Applies self-gated swish activation (Mid-level API)
 
Node * ELU (Node *input, float alpha=1.0f)
 Applies Exponential Linear Unit activation (Mid-level API)
 
Node * HardSigmoid (Node *input, float alpha=0.2f, float beta=0.5f)
 Applies piecewise linear sigmoid approximation (Mid-level API)
 
Node * HardSwish (Node *input, float alpha=0.2f, float beta=0.5f)
 Applies hardware-efficient swish activation (Mid-level API)
 
Node * Softmax (Node *input)
 Applies channel-wise probability normalization (High-level API)
 
Node * TargetExpand (Node *input, const Tensor::shape_type &shape)
 (Low-level) Batch expansion primitive for singleton tensors
 
Node * Img2Col (Node *input, Tensor::size_type kernelHeight, Tensor::size_type kernelWidth, Tensor::size_type stride, Tensor::size_type padding)
 (Low-level) Image-to-column transformation primitive
 
Node * Col2Img (Node *input, Tensor::size_type outputHeight, Tensor::size_type outputWidth)
 (Low-level) Column-to-image transformation primitive
 
Node * Conv2d (Node *input, Tensor::size_type outChannels, Tensor::size_type kernelHeight, Tensor::size_type kernelWidth, Tensor::size_type stride, Tensor::size_type padding, bool bias=true)
 Executes optimized convolution using img2col acceleration (High-level API)
 
Node * AvgPool2d (Node *input, Tensor::size_type poolSize, Tensor::size_type stride, Tensor::size_type padding=0)
 Performs 2D average pooling operation (Sliding window)
 
Node * GlobalAvgPool2d (Node *input)
 Computes global average pooling over spatial dimensions.
 
Node * MaxPool2d (Node *input, Tensor::size_type poolSize, Tensor::size_type stride, Tensor::size_type padding=0)
 Performs 2D maximum pooling operation.
 
Node * GlobalMaxPool2d (Node *input)
 Computes global maximum pooling over spatial axes.
 
void MSELoss (Node *input, Node *target)
 Establishes Mean Squared Error loss node as computational graph terminal.
 
void BCELoss (Node *input, Node *target)
 Configures Binary Cross-Entropy loss as computation graph endpoint.
 
void defaultOutput (Node *input)
 Provides zero-overhead tensor passthrough for inference outputs.
 

Related Symbols

(Note that these are not member symbols.)

std::ostream & operator<< (std::ostream &os, Model &model)
 Serializes neural network computation graph structure to output stream.
 

Detailed Description

Base class for constructing neural network models with automatic computation graph management.

Provides infrastructure for building trainable models through composition of computational nodes. Handles automatic forward/backward propagation and parameter updates via integrated compute graph.

Key Features:

  • Automatic Graph Construction: Dynamically builds computation graph through layer composition methods
  • Modular Layer Composition: Supports 20+ neural network layer types with parameterized configuration
  • Flexible Loss Integration: Implements multiple loss functions for supervised learning scenarios

Usage Workflow:

1. Model Derivation:

Derive custom model class with public inheritance from Model

class MyModel : public Model {
public:
    // Member declarations
};

2. Input Node Definition:

Declare and initialize input nodes with tensor dimensions. Two initialization methods:

class MyModel : public Model {
public:
    InputNode input{{batch, channels, height, width}}; // Direct member initialization
    InputNode target;                                  // Initialized in the constructor below
    MyModel() : target({batch, classes}) { ... }
};

3. Graph Construction:

Build network in subclass constructor with layer composition pattern:

MyModel::MyModel() {
    auto x = Conv2d(&input, 64, 3, 3, 1, 1); // Start from the input node (stride 1, padding 1)
    x = ReLU(x);                             // Activation after the linear/conv layer
    x = Linear(x, 256);
    BCELoss(x, &target);                     // Mandatory graph termination
}

4. Training Cycle:

Standard three-phase training pattern with optimizer integration:

model.forward(); // Propagate inputs through graph
model.backward(); // Backpropagate gradients
model.update(optim); // Update parameters with optimizer

Usage Example:

class SegmentationModel : public Model {
public:
    InputNode input{{10, 3, 1024, 1024}};   // Batch dimensions initialized directly
    InputNode target;
    SegmentationModel() : target({10, 1, 8, 1}) {
        auto x = Conv2d(&input, 1, 3, 3, 1, 1);
        x = ReLU(x);
        x = Conv2d(x, 1, 3, 3, 1, 1);
        x = AvgPool2d(x, 5, 2);
        x = Linear(x, 8);                   // Output size matches the 8-element target
        x = Softmax(x);
        BCELoss(x, &target);                // Graph termination
    }
};

int main() {
    SegmentationModel model;
    model.input = load_tensor(...);
    model.target = load_labels(...);
    opt::Adam optimizer(0.01, 0.9, 0.999);
    for (int epoch = 0; epoch < 100; ++epoch) {
        model.forward();
        model.backward();
        model.update(&optimizer);
        std::cout << "Loss: " << model.getLoss() << std::endl;
    }
}

Composition Rules:

  • Parameter Passing:
    • Input nodes: Pass using address-of operator (&input)
    • Intermediate nodes: Use raw pointers from previous layer output
  • Dimension Handling:
    • Ensure tensor shape compatibility between layers
    • Use Reshape/Img2Col for dimension conversion
  • Layer Ordering:
    • Activation functions strictly after Linear/Conv layers
    • Pooling layers after activation in CNN architectures

ModelComponents:

The following table summarizes key components supported by the Model class:

Component Brief Description
Add Performs element-wise addition between two nodes
Sub Computes element-wise subtraction between two nodes
Mul Executes element-wise multiplication of two nodes
Bias Applies learnable bias term to input tensor
Reshape Modifies tensor dimensions without changing data
Linear Implements fully-connected layer transformation
ReLU Applies Rectified Linear Unit activation
Sigmoid Computes logistic sigmoid activation
Tanh Applies hyperbolic tangent activation
LeakyReLU Leaky variant of ReLU with configurable negative slope
Swish Computes self-gated activation (x * sigmoid(x))
ELU Exponential Linear Unit activation
HardSigmoid Piecewise linear approximation of sigmoid
HardSwish Hardware-friendly Swish variant with linear approximation
Softmax Applies channel-wise softmax normalization
TargetExpand Broadcasts target tensor dimensions to match input shape
Img2Col Converts image tensor to column-major format for convolution optimization
Col2Img Reconstructs image tensor from column-major representation
Conv2d 2D convolution layer with configurable kernel/padding
AvgPool2d Spatial average pooling operation
GlobalAvgPool2d Global spatial averaging across feature maps
MaxPool2d Spatial max pooling operation
GlobalMaxPool2d Global spatial maximum pooling
MSELoss Configures mean squared error as graph terminal node
BCELoss Sets binary cross-entropy loss with implicit sigmoid
defaultOutput Passthrough output node for inference-only models
Note
  • Graph Finalization:
    • Exactly one loss function call required in constructor
    • Final operation must be loss function or output specification
  • Parameter Safety:
    • Stride: 0 < stride <= kernel_size
    • Padding: <= 50% of corresponding dimension size
  • Input Requirements:
    • Initialize dimensions via member or constructor initialization
    • Keep input nodes public for direct data access
See also
nz::graph::ComputeGraph for detailed computation graph management
nz::opt for optimization strategies
Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 187 of file Model.cuh.

Constructor & Destructor Documentation

◆ Model()

nz::Model::Model ( )
default

Default constructs Model instance with empty computation graph.

Creates valid Model object in initial state:

  • Initializes compute graph with empty node list
  • Prepares hidden node storage for automatic memory management
Note
  • Derived classes must initialize input nodes before first forward pass
  • Safe for immediate use after construction
Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

◆ ~Model()

nz::Model::~Model ( )

Safely destructs Model and associated computation nodes.

Performs complete resource cleanup:

  1. Deletes all dynamically allocated hidden nodes
  2. Releases compute graph resources
  3. Invalidates internal references to nodes

Memory Management:

  • Ownership Policy: Takes exclusive ownership of nodes created through:
    • Activation functions (ReLU/Sigmoid/etc)
    • Layer operations (Linear/Conv2d/etc)
    • Tensor transformations (Reshape/Img2Col)
  • Non-hidden nodes (e.g. user-declared InputNode members) remain user-managed
Warning
Never manually delete nodes created through Model's composition methods
Note
  • Safe for polymorphic destruction through base Model pointers
  • Node deletion complexity: O(n) for n hidden nodes
Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 5 of file Model.cu.

Member Function Documentation

◆ Add()

Node * nz::Model::Add ( Node * lhs,
Node * rhs )
protected

Creates addition operation node in computation graph (Low-level API)

Parameters
lhs: Left operand node (device-to-device, non-owning)
rhs: Right operand node (device-to-device, non-owning)
Returns
Pointer to new AddNode (device-resident)

Graph Management:

  1. Automatically registers input nodes in compute graph
  2. Constructs element-wise addition operator node
  3. Transfers node ownership to Model instance
Warning
Core Infrastructure: This method belongs to Model's foundational graph construction API.
Recommended Practice: Use higher-level abstraction layers instead of direct node arithmetic
Note
  • Node deletion automatically handled during Model destruction
  • Input nodes must have matching dimensions

@complexity O(1) node creation + O(α(n)) graph insertion
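
Illustrative sketch (the class name, member names, and shapes below are hypothetical; high-level layers such as Linear/Conv2d are the recommended route):

class ResidualToy : public Model {
public:
    InputNode input{{1, 8, 16, 16}};
    InputNode target{{1, 8, 16, 16}};
    ResidualToy() {
        auto branch = Conv2d(&input, 8, 3, 3, 1, 1); // same shape as input: [1,8,16,16]
        branch = ReLU(branch);
        auto sum = Add(&input, branch);              // element-wise skip connection
        MSELoss(sum, &target);                       // mandatory graph termination
    }
};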

Definition at line 29 of file Model.cu.

◆ AvgPool2d()

Node * nz::Model::AvgPool2d ( Node * input,
Tensor::size_type poolSize,
Tensor::size_type stride,
Tensor::size_type padding = 0 )
protected

Performs 2D average pooling operation (Sliding window)

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
poolSize: Spatial extent of pooling (device-to-device, K ≥ 1)
stride: Step size for window movement (device-to-device, S ≥ 1)
padding: Input padding size (device-to-device, P ≥ 0)
Returns
4D tensor node (device-resident, shape [N,C,H_out,W_out])
Note
  • Boundary Handling: Uses padding_value=0 for out-of-bound positions
  • Window Coverage: Partial windows at the boundary are still averaged normally
  • Memory Efficient: ~75% memory reduction vs full activation retention
Warning
  • Value Distortion: Large pooling sizes (K>5) cause significant signal smoothing
  • Stride Hazard: S > K leads to skipped regions in input

@complexity O(N·C·H_out·W_out·K²) computational operations
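
Illustrative sketch inside a derived constructor (hypothetical shapes); the output size follows H_out = floor((H + 2P - K)/S) + 1:

auto x = Conv2d(&input, 16, 3, 3, 1, 1); // [N, 16, 32, 32], assuming a 32x32 input
x = ReLU(x);
x = AvgPool2d(x, 2, 2);                  // K=2, S=2, P=0 -> floor((32-2)/2)+1 = 16 -> [N, 16, 16, 16]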

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 265 of file Model.cu.

◆ backward()

void nz::Model::backward ( )

Performs backward propagation and gradient accumulation.

Computational Flow:

  1. Reverse traversal of computation graph
  2. Gradient calculation via chain rule
  3. Parameter gradient accumulation
Note
  • Dependency: Requires successful forward() execution first
  • Memory Footprint: Maintains intermediate gradients until update()
Warning
Multiple consecutive backward() calls without update() will accumulate gradients
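
Typical call ordering (sketch; assumes inputs were loaded and an optimizer exists):

model.forward();          // must complete successfully before backward()
model.backward();         // gradients computed and accumulated
model.update(&optimizer); // consumes the accumulated gradients
// Two backward() calls without an intervening update() would sum both gradient passes.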

Definition at line 16 of file Model.cu.

◆ BCELoss()

void nz::Model::BCELoss ( Node * input,
Node * target )
protected

Configures Binary Cross-Entropy loss as computation graph endpoint.

Parameters
input: Logits tensor node (device-to-device, non-owning, shape [N,*])
target: Binary labels tensor node (device-to-device, non-owning, shape [N,*])

Mathematical Formulation:

ℒ_BCE = -(1/K) · ∑_{i=1}^{K} [ target_i · log(σ(input_i)) + (1 - target_i) · log(1 - σ(input_i)) ]

where σ denotes the sigmoid activation and K is the total element count

Critical Implementation Details:

  • Applies numerical stabilization with ε = 1×10⁻¹²
  • Automatically normalizes by total element count
  • Enforces implicit sigmoid activation
Note
  • Probabilistic Interpretation: Optimizes log likelihood of binary classes
  • Gradient Smoothing: Avoids discontinuities in loss surface
  • Multi-class Extension: Use CategoricalCrossEntropy for >2 classes
Warning
  • Numerical Safety: Clips inputs to [ε, 1-ε] before log operations
  • Label Validation: Non-binary targets will corrupt loss computation

@complexity O(K) logarithmic operations + 3K element-wise operations
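
Illustrative binary-classification sketch (hypothetical shapes; target values are expected to be 0 or 1):

class BinaryClassifier : public Model {
public:
    InputNode input{{32, 1, 28, 28}};
    InputNode target{{32, 1, 1, 1}};        // one binary label per sample
    BinaryClassifier() {
        auto x = Conv2d(&input, 4, 3, 3, 1, 1);
        x = ReLU(x);
        x = Linear(x, 1);                   // single logit per sample
        BCELoss(x, &target);                // sigmoid is applied implicitly here
    }
};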

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 317 of file Model.cu.

◆ Bias()

Node * nz::Model::Bias ( Node * input)
protected

Creates trainable bias parameter and adds element-wise to input (Mid-level API)

Parameters
input: Feature map node (device-to-device, non-owning)
Returns
Pointer to AddNode combining input and bias parameter

Construction Workflow:

  1. Initializes learnable bias parameter matching input dimensions
  2. Applies Xavier-uniform initialization to bias tensor
  3. Builds element-wise addition node connecting input and bias
Warning
Component Tier: Mid-level building block designed for:
  • Direct use in custom layer implementations
  • Integration into higher-level components (e.g. Linear/Conv layers)
Note
  • Parameter Persistence: Bias remains trainable until model destruction
  • Dimension Matching: Bias shape [1,C,H,W] broadcasts to input shape [N,C,H,W]
  • Gradient Flow: Backpropagation updates both bias and preceding layers

@complexity O(1) parameter creation + O(1) graph insertion
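
Illustrative sketch (hypothetical shapes); Linear and Conv2d already manage their own bias, so direct Bias calls are mainly for custom blocks:

auto x = Conv2d(&input, 8, 3, 3, 1, 1, false); // convolution without its built-in bias
x = Bias(x);                                   // attach an explicit trainable bias instead
x = ReLU(x);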

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 68 of file Model.cu.

◆ Col2Img()

Node * nz::Model::Col2Img ( Node * input,
Tensor::size_type outputHeight,
Tensor::size_type outputWidth )
protected

(Low-level) Column-to-image transformation primitive

Parameters
input: Column-formatted node (device-to-device, non-owning, shape [N,1,H_out×W_out,C_out])
outputHeight: Original spatial height (device-to-device, H ∈ ℕ+)
outputWidth: Original spatial width (device-to-device, W ∈ ℕ+)
Returns
Pointer to 4D tensor node (device-resident, shape [N,C_out,H,W])

Reconstruction Principle:

Performs inverse operation of Img2Col by:

  • Summing overlapping regions through position mapping
  • Preserving channel-depth dimension
  • Reconstructing spatial relationships
Note
  • Complementary Operation: Always paired with preceding Img2Col
  • Output Validation: H×W must match convolution arithmetic
  • Data Loss Potential: Incomplete inverse for strided convolutions
Warning
Restricted Use:
  • Not designed for direct user invocation
  • Output shape validation bypassed for performance
  • Direct usage invalidates framework's memory planning

@complexity O(N·C_out·H·W) spatial reconstruction

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 234 of file Model.cu.

◆ Conv2d()

Node * nz::Model::Conv2d ( Node * input,
Tensor::size_type outChannels,
Tensor::size_type kernelHeight,
Tensor::size_type kernelWidth,
Tensor::size_type stride,
Tensor::size_type padding,
bool bias = true )
protected

Executes optimized convolution using img2col acceleration (High-level API)

Parameters
input: 4D input tensor node (device-to-device, non-owning, shape [N,C,H,W])
outChannels: Output feature map count (device-to-device, C_out ≥ 1)
kernelHeight: Vertical filter dimension (device-to-device, K_h ≥ 1)
kernelWidth: Horizontal filter dimension (device-to-device, K_w ≥ 1)
stride: Convolution step size (device-to-device, S ≥ 1)
padding: Zero-padding size (device-to-device, P ≥ 0)
bias: Enable bias addition (device-to-device, default=true)
Returns
4D output tensor node (device-resident, shape [N,C_out,H_out,W_out])

Operational Pipeline:

  1. Img2Col Transformation: ColShape = [N, 1, H_out*W_out, C*K_h*K_w]
  2. GEMM Acceleration: ResultCol = ColMatrix * KernelMatrix
  3. Bias Addition (when enabled): ResultCol += b (bias vector broadcast across rows)
  4. Col2Img Restoration: OutputShape = [N, C_out, H_out, W_out]

Output Dimension Formula:

H_out = floor((H + 2P - K_h) / S) + 1
W_out = floor((W + 2P - K_w) / S) + 1

Note
  • Automatic Weight Management: Kernel parameters auto-initialized with Xavier distribution
  • Memory Optimized: ~30% less memory than naive convolution implementations
  • Acceleration Features: Built-in GEMM kernel selection for target hardware
Warning
Configuration Safeguards:
  • Ensure (H + 2P) ≥ K_h and (W + 2P) ≥ K_w
  • Large kernel sizes (K_h/K_w > 7) may trigger fallback to direct convolution
  • Stride values >3 cause significant information loss

@complexity O(N·C_out·K_h·K_w·C·H_out·W_out) computational complexity
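
Illustrative sketch with the output-size formula worked through (hypothetical 32x32 input):

// H_out = floor((32 + 2*1 - 3) / 1) + 1 = 32
auto x = Conv2d(&input, 16, 3, 3, 1, 1); // [N, 3, 32, 32] -> [N, 16, 32, 32]
x = ReLU(x);
// H_out = floor((32 + 2*1 - 3) / 2) + 1 = 16
x = Conv2d(x, 32, 3, 3, 2, 1);           // stride 2 halves the spatial size -> [N, 32, 16, 16]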

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 244 of file Model.cu.

◆ defaultOutput()

void nz::Model::defaultOutput ( Node * input)
protected

Provides zero-overhead tensor passthrough for inference outputs.

Parameters
input: Source tensor node (device-to-device, non-owning, any shape)

Operational Characteristics:

  • Identity Forward: y = x (where x = input tensor)
  • Constant Gradient: ∂ℒ/∂x = 1

Implementation Mechanics:

  1. Node Injection:
    • Creates light-weight OutputNode wrapper for input tensor
    • Registers node as terminal in compute graph
  2. Topology Enforcement:
    • Validates input node existence in computation graph
    • Performs implicit graph insertion when required
Note
  • Inference Optimization: Eliminates 92% of backward pass overhead
  • Debugging Utility: Preserves raw tensor values for inspection
  • Shape Agnostic: Handles tensors of arbitrary dimensionality
Warning
  • Gradient Disconnect: Disables meaningful parameter updates
  • Training Misuse: Invalid for models requiring backpropagation

@complexity O(1) tensor reference operation (zero data copy)
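
Illustrative inference-only sketch (hypothetical shapes); defaultOutput takes the place of a loss call when no training is intended:

class InferenceOnly : public Model {
public:
    InputNode input{{1, 3, 224, 224}};
    InferenceOnly() {
        auto x = Conv2d(&input, 8, 3, 3, 1, 1);
        x = ReLU(x);
        x = GlobalAvgPool2d(x);
        x = Linear(x, 10);
        x = Softmax(x);
        defaultOutput(x);                 // graph terminal without a loss function
    }
};
// Usage: Tensor& probabilities = model.forward();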

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 327 of file Model.cu.

◆ ELU()

Node * nz::Model::ELU ( Node * input,
float alpha = 1.0f )
protected

Applies Exponential Linear Unit activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
alpha: Saturation coefficient (device-to-device, α > 0)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

ELU(x) = x, if x > 0
alpha * (exp(x) - 1), if x <= 0
Note
  • Smooth Transition: Continuously differentiable at x=0
  • Noise Robustness: Negative values help center activations
  • Default Configuration: α=1.0 for standard implementation
Warning
Numerical Stability: Avoid α > 1.5 to prevent gradient overflow

@complexity O(n) conditional exponential operations

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 155 of file Model.cu.

◆ forward()

Tensor & nz::Model::forward ( )

Executes full forward propagation through computation graph.

Returns
Reference to final output tensor with device-to-host synchronization

Operation Details:

  1. Triggers sequential evaluation of all nodes in topological order
  2. Stores intermediate results for backward pass
  3. Returns non-owning reference to final output tensor
Note
  • Tensor Lifetime: Returned reference remains valid until next graph modification
  • Dimension Safety: Guarantees valid output dimensions when called after valid construction
Warning
Calling before input initialization causes undefined behavior
model.forward(); // Returns Tensor& with inference results

@complexity O(n) where n = number of computation graph nodes

Definition at line 11 of file Model.cu.

◆ getLoss()

Tensor::value_type nz::Model::getLoss ( ) const

Retrieves scalar loss value from last forward pass.

Returns
Current loss value as floating-point scalar

Value Characteristics:

  • Returns 0.0 if no loss function registered
  • Contains valid value only after forward() + loss calculation
Note
  • Numerical Stability: May return NaN for invalid loss states
  • Precision: Value type matches tensor precision configuration
float loss = model.getLoss(); // Retrieve training loss

Definition at line 25 of file Model.cu.

◆ GlobalAvgPool2d()

Node * nz::Model::GlobalAvgPool2d ( Node * input)
protected

Computes global average pooling over spatial dimensions.

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
Returns
4D tensor node (device-resident, shape [N,C,1,1])
Note
  • Channel Preserving: Maintains original channel depth
  • Dimensionality Reduction: Effective transition from conv to dense layers
  • Normalization: Uses exact spatial element count for averaging
Warning
  • Signal Compression: Discards all spatial information
  • Input Constraints: Requires H,W ≥ 1

@complexity O(N·C·H·W) summation operations
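
Illustrative conv-to-classifier transition (hypothetical channel and class counts); a loss call is still required to finalize the graph:

auto x = Conv2d(&input, 64, 3, 3, 1, 1); // [N, 64, H, W]
x = ReLU(x);
x = GlobalAvgPool2d(x);                  // [N, 64, 1, 1] - one value per channel
x = Linear(x, 10);                       // classification head over the 64 pooled features
// ... terminate the graph with a loss function afterwards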

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 276 of file Model.cu.

◆ GlobalMaxPool2d()

Node * nz::Model::GlobalMaxPool2d ( Node * input)
protected

Computes global maximum pooling over spatial axes.

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
Returns
4D tensor node (device-resident, shape [N,C,1,1])
Note
  • Extreme Value Capture: Identifies strongest activation per channel
  • Dense Layer Bridge: Common before final classification layers
  • Batch Independence: Operations preserve batch dimension
Warning
  • Sensitivity: Vulnerable to outlier activations
  • Spatial Erasure: Eliminates all positional information

@complexity O(N·C·H·W) search operations

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 297 of file Model.cu.

◆ HardSigmoid()

Node * nz::Model::HardSigmoid ( Node * input,
float alpha = 0.2f,
float beta = 0.5f )
protected

Applies piecewise linear sigmoid approximation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
alpha: Slope parameter (device-to-device, typical value: 0.2)
beta: Offset parameter (device-to-device, typical value: 0.5)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

HardSigmoid(x) = max(0, min(1, alpha * x + beta))
Note
  • Quantization-Friendly: Linear operations suitable for fixed-point inference
  • Computation Efficiency: 3x faster than standard sigmoid
  • Output Range: [0, 1] element-wise
Warning
Parameter Constraints: Ensure α > 0 and β ∈ (-α, 1-α) for valid activation

@complexity O(n) element-wise linear operations

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 165 of file Model.cu.

◆ HardSwish()

Node * nz::Model::HardSwish ( Node * input,
float alpha = 0.2f,
float beta = 0.5f )
protected

Applies hardware-efficient swish activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
alpha: Slope parameter (device-to-device, typical: 1/6)
beta: Offset parameter (device-to-device, typical: 0.5)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

HardSwish(x) = x * max(0, min(1, alpha * x + beta))
Note
  • Mobile Optimization: Deploys without exponential operations
  • Default Configuration: α=1/6, β=0.5 per MobileNetV3 specification
  • Activation Range: [-3, 3] input for non-zero gradient
Warning
Edge Effects: Sudden saturation beyond x < -3 or x > 3

@complexity O(n) element-wise operations (two linear + multiplication)

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 175 of file Model.cu.

◆ Img2Col()

Node * nz::Model::Img2Col ( Node * input,
Tensor::size_type kernelHeight,
Tensor::size_type kernelWidth,
Tensor::size_type stride,
Tensor::size_type padding )
protected

(Low-level) Image-to-column transformation primitive

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
kernelHeight: Filter height (device-to-device, K_h ≥ 1)
kernelWidth: Filter width (device-to-device, K_w ≥ 1)
stride: Convolution step size (device-to-device, S ≥ 1)
padding: Zero-padding size (device-to-device, P ≥ 0)
Returns
Pointer to column-formatted node (device-resident, shape [N,1,H_out×W_out,C×K_h×K_w])

Mathematical Reformulation:

Output(n, 1, hw_out, ckk) = Input(n, c,
    floor(hw_out / W_out) * S - P + floor(ckk / (C * K_h)),
    (hw_out % W_out) * S - P + (ckk % K_h))

where:

  • H_out = floor((H + 2P - K_h) / S) + 1
  • W_out = floor((W + 2P - K_w) / S) + 1
Note
  • Memory Intensive: Output tensor grows by a factor of roughly K_h·K_w / S²
  • Optimized Layout: Enables GEMM-based convolution acceleration
  • Dimension Order: Strict NCHW input requirement
Warning
Restricted Use:
  • Not designed for direct user invocation
  • Direct invocation bypasses memory optimizations
  • Invalid parameters may cause 2D grid misalignment

@complexity O(N·C·K_h·K_w·H_out·W_out) memory reorganization

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 223 of file Model.cu.

◆ LeakyReLU()

Node * nz::Model::LeakyReLU ( Node * input,
float alpha = 0.01f )
protected

Applies Leaky Rectified Linear Unit activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
alpha: Negative slope coefficient (device-to-device, range: 0 < α < 1)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

LeakyReLU(x) = x, if x > 0
alpha * x, if x <= 0
Note
  • Gradient Preservation: Maintains small gradient (α) in negative region
  • Dead Neuron Mitigation: Improved version over standard ReLU
  • Shape Preservation: Maintains input tensor dimensions
Warning
Parameter Sensitivity: α values > 0.3 may cause gradient explosion

@complexity O(n) conditional element-wise operation
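
Illustrative sketch contrasting the default slope with an explicit one (hypothetical layer sizes):

auto x = Linear(&input, 128);
x = LeakyReLU(x);         // default negative slope alpha = 0.01
x = Linear(x, 64);
x = LeakyReLU(x, 0.2f);   // explicit slope; values above ~0.3 are discouraged (see warning)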

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 135 of file Model.cu.

◆ Linear()

Node * nz::Model::Linear ( Node * input,
size_t outSize )
protected

Implements fully-connected layer transformation (Top-level API)

Parameters
input: Input feature node (device-to-device, non-owning)
outSize: Output feature dimension (device-to-device)
Returns
Pointer to linear transformation result with bias (device-resident)

Operation Workflow:

  1. Shape Adaptation: Automatically reshapes input to [N,1,IN_DIM,1]
  2. Parameter Initialization: Creates learnable weight matrix [OUT_DIM x IN_DIM]
  3. Matrix Multiplication: Executes y = Wx + b through underlying components
  4. Bias Integration: Applies trainable bias term
Warning
Architectural Position: High-level neural network building block
Usage Guidance: Preferred method for dense layer implementation
Input Requirement: Expects 4D input tensor (e.g. from Conv layer output)
Note
  • Weight Initialization: Uses Xavier-uniform distribution
  • Memory Management: Owns both weight and bias parameters until model destruction
  • Dimension Handling: Input dimensions [N,C,H,W] auto-flattened to [N,1,(C*H*W),1]
  • Gradient Flow: Backpropagation supported through matrix operations

@complexity O(outSize * inputSize) parameter initialization + O(1) node insertion
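
Illustrative MLP sketch (hypothetical sizes); the 4D input is auto-flattened to [N,1,C*H*W,1] before the matrix multiply:

class TinyMLP : public Model {
public:
    InputNode input{{16, 1, 28, 28}};   // flattened internally to [16, 1, 784, 1]
    InputNode target{{16, 1, 10, 1}};
    TinyMLP() {
        auto x = Linear(&input, 256);   // weight [256 x 784] plus bias, Xavier-initialized
        x = ReLU(x);
        x = Linear(x, 10);
        x = Softmax(x);
        MSELoss(x, &target);            // graph termination
    }
};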

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 87 of file Model.cu.

◆ MaxPool2d()

Node * nz::Model::MaxPool2d ( Node * input,
Tensor::size_type poolSize,
Tensor::size_type stride,
Tensor::size_type padding = 0 )
protected

Performs 2D maximum pooling operation.

Parameters
input: 4D tensor node (device-to-device, non-owning, shape [N,C,H,W])
poolSize: Spatial window size (device-to-device, K ≥ 1)
stride: Window traversal step (device-to-device, S ≥ 1)
padding: Zero-padding extent (device-to-device, P ≥ 0)
Returns
4D tensor node (device-resident, shape [N,C,H_out,W_out])
Note
  • Feature Preservation: Maintains strongest activation per region
  • Sparsity Induction: Increases network sparsity ratio by ~40%
  • Gradient Behavior: Only maximum element receives backward pass signal
Warning
  • Information Loss: Non-maximum values permanently discarded
  • Overpooling Risk: K=3,S=2 reduces spatial size by 66% per layer

@complexity O(N·C·H_out·W_out·K²) comparisons

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 286 of file Model.cu.

◆ MSELoss()

void nz::Model::MSELoss ( Node * input,
Node * target )
protected

Establishes Mean Squared Error loss node as computational graph terminal.

Parameters
input: Prediction tensor node (device-to-device, non-owning, shape [N,*])
target: Ground truth tensor node (device-to-device, non-owning, shape [N,*])

Mathematical Definition:

ℒ_MSE = (1/K) · ∑_{i=1}^{K} (input_i - target_i)²

where K = numel(input)

Operational Workflow:

  1. Target Expansion: Automatically broadcasts target dimensions to match input
  2. Element-wise Diff: Computes squared differences across all tensor positions
  3. Graph Finalization: Registers loss node as compute graph output
Note
  • Backprop Ready: Automatic gradient computation enabled
  • Dimensional Flexibility: Handles arbitrary tensor shapes beyond 4D
  • Normalization Factor: Uses element count not batch size
Warning
  • Device Consistency: Input/target must reside on same compute device
  • Numerical Overflow: Large value ranges may exceed floating-point precision

@complexity O(K) parallel operations where K = total elements
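
Illustrative regression sketch (hypothetical shapes):

class Regressor : public Model {
public:
    InputNode input{{8, 1, 4, 1}};      // 8 samples with 4 features each
    InputNode target{{8, 1, 1, 1}};     // one real-valued target per sample
    Regressor() {
        auto x = Linear(&input, 32);
        x = Tanh(x);
        x = Linear(x, 1);
        MSELoss(x, &target);            // averaged over all elements, not just the batch
    }
};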

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 307 of file Model.cu.

◆ Mul()

Node * nz::Model::Mul ( Node * lhs,
Node * rhs )
protected

Creates matrix multiplication node in computation graph (Low-level API)

Parameters
lhs: Left matrix node (device-to-device, non-owning)
rhs: Right matrix node (device-to-device, non-owning)
Returns
Pointer to new MatMulNode (device-resident)

Graph Management:

  1. Validates matrix dimensionality compatibility
  2. Constructs batched matrix multiplication operator
  3. Assumes ownership of created computation node
Warning
Infrastructure Layer: Exposes fundamental mathematical operator plumbing
Usage Advisory: Intended for framework extensibility, not routine model building
Note
  • Supports implicit broadcasting for batch dimensions
  • Requires lhs columns == rhs rows for valid multiplication

@complexity O(1) node creation + O(α(n)) graph insertion

Definition at line 55 of file Model.cu.

◆ ReLU()

Node * nz::Model::ReLU ( Node * input)
protected

Applies Rectified Linear Unit activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

ReLU(x) = max(0, x)
Note
  • Activation Range: [0, +∞) element-wise
  • Gradient Behavior: Zero gradient for x < 0
  • Memory Layout: Preserves input tensor shape
Warning
Vanishing Gradient Risk: Dead neurons possible in negative input regions

@complexity O(n) element-wise operation (n = tensor elements)

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 105 of file Model.cu.

◆ Reshape()

Node * nz::Model::Reshape ( Node * input,
const Tensor::shape_type & shape )
protected

Modifies tensor dimensions while preserving data (Low-level API)

Parameters
input: Source tensor node (device-to-device, non-owning)
shape: Target dimension specification (device-to-device)
Returns
Pointer to reshaped tensor node (device-resident)

Operation Pipeline:

  1. Validates total element count matches original tensor
  2. Creates view operation without data copy
  3. Maintains underlying storage reference count
Warning
Component Tier: Foundational tensor manipulation primitive
Usage Context: Direct access acceptable for advanced shape transformations
Critical Requirement: Total elements must remain constant between shapes
Note
  • Memory Layout: Preserves original storage order
  • Device Support: Works across CPU/GPU tensor implementations
  • Graph Impact: Invalidates dependent node gradients after modification

@complexity O(1) view creation + O(α(n)) graph update
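
Illustrative sketch (hypothetical shapes); the total element count must be identical before and after:

auto x = Conv2d(&input, 3, 3, 3, 1, 1);  // [4, 3, 8, 8] = 768 elements (hypothetical 4x3x8x8 input)
x = Reshape(x, {4, 1, 3 * 8 * 8, 1});    // [4, 1, 192, 1] - still 768 elements, no data copy
// Reshape(x, {4, 1, 200, 1}) would violate the element-count invariant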

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 77 of file Model.cu.

◆ Sigmoid()

Node * nz::Model::Sigmoid ( Node * input)
protected

Applies logistic sigmoid activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

Sigmoid(x) = 1 / (1 + exp(-x))
Note
  • Activation Range: (0, 1) element-wise
  • Usage Context: Preferred for binary classification output layers
  • Numerical Stability: Protected against extreme input values
Warning
Gradient Saturation: Avoid in deep networks due to vanishing gradients

@complexity O(n) element-wise exponential + division

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 115 of file Model.cu.

◆ Softmax()

Node * nz::Model::Softmax ( Node * input)
protected

Applies channel-wise probability normalization (High-level API)

Parameters
input: Logits node (device-to-device, non-owning)
Returns
Pointer to probability distribution node (device-resident)

Mathematical Definition:

Softmax(x_i) = exp(x_i) / sum(exp(x_j))
Note
  • Automatic Reshaping: Input auto-converted to [N,1,C,1] format
  • Numerical Stability: Protected via max-subtraction trick
  • Output Property: ∑ outputs = 1 per channel
Warning
Usage Context: Final layer activation for multi-class classification

@complexity O(n) exponential operations + O(C) reduction per channel
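
Illustrative classification-head sketch (featureMap and numClasses are hypothetical placeholders):

auto x = GlobalAvgPool2d(featureMap); // featureMap: Node* produced by earlier conv layers
x = Linear(x, numClasses);            // raw logits, auto-reshaped to [N,1,numClasses,1]
x = Softmax(x);                       // per-sample probabilities summing to 1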

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 185 of file Model.cu.

◆ Sub()

Node * nz::Model::Sub ( Node * lhs,
Node * rhs )
protected

Creates subtraction operation node in computation graph (Low-level API)

Parameters
lhs: Left operand node (device-to-device, non-owning)
rhs: Right operand node (device-to-device, non-owning)
Returns
Pointer to new SubNode (device-resident)

Graph Management:

  1. Enforces graph membership for input nodes
  2. Instantiates element-wise subtraction operator
  3. Registers node for automated lifecycle management
Warning
Architectural Component: Part of Model's internal graph assembly toolkit
Client Guidance: Prefer using composite operations via Layer APIs
Note
  • Broadcasts inputs if dimension mismatch exists
  • Graph becomes immutable after network finalization

@complexity O(1) node creation + O(α(n)) graph insertion

Definition at line 42 of file Model.cu.

◆ Swish()

Node * nz::Model::Swish ( Node * input)
protected

Applies self-gated swish activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

Swish(x) = x / (1 + exp(-x))
Note
  • Self-normalizing Property: Enhances deep network training stability
  • Differentiability: Smooth everywhere compared to ReLU family
  • Computation Cost: 2x FLOPs of ReLU due to sigmoid component
Warning
Hardware Impact: Prefer GPU acceleration for large tensors

@complexity O(n) element-wise operations (sigmoid + multiplication)

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 145 of file Model.cu.

◆ Tanh()

Node * nz::Model::Tanh ( Node * input)
protected

Applies hyperbolic tangent activation (Mid-level API)

Parameters
input: Feature node (device-to-device, non-owning)
Returns
Pointer to activated output node (device-resident)

Mathematical Definition:

Tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
Note
  • Activation Range: (-1, 1) element-wise
  • Centered Output: Preferred over sigmoid for hidden layers
  • Gradient Profile: Stronger gradients than sigmoid
Warning
Computational Cost: Higher than ReLU due to exponential operations

@complexity O(n) element-wise exponential operations

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 125 of file Model.cu.

◆ TargetExpand()

Node * nz::Model::TargetExpand ( Node * input,
const Tensor::shape_type & shape )
protected

(Low-level) Batch expansion primitive for singleton tensors

Parameters
input: Source tensor node (device-to-device, non-owning, must have batch=1)
shape: Target shape specification (device-to-device, NCHW format)
Returns
Pointer to batch-expanded node (device-resident)

Operates by replicating the singleton batch dimension N times according to:

  • Input shape: [1, C, H, W] → Output shape: [N, C, H, W]
  • All batches contain identical copies of input data
Note
  • Low-level Utility: Prefer high-level broadcasting interfaces when possible
  • Shape Requirements: Non-batch dimensions (C,H,W) must match target shape
  • Memory Amplification: Output consumes N×input_memory_size
Warning
Restricted Use:
  • Not designed for direct user invocation
  • May throw shape_mismatch_error if input violates preconditions
  • Overuse causes memory bloat in computational graphs

@complexity O(N·C·H·W) memory copy operations (N = target batch size)

Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2025/6/24

Definition at line 204 of file Model.cu.

◆ update()

void nz::Model::update ( opt::Optimizer * optimizer) const

Applies parameter updates using attached optimization strategy.

Parameters
optimizer: Optimization algorithm instance (device-to-device)

Update Process:

  1. Distributes optimizer to all trainable parameters
  2. Executes optimization step per parameter group
  3. Resets accumulated gradients
Note
  • Ownership: Does not take ownership of optimizer object
  • Thread Safety: Requires exclusive access during execution
Warning
Optimizer must outlive this method call
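
Illustrative sketch (the opt::Adam signature follows the usage example above; the hyper-parameters are placeholders):

opt::Adam optimizer(0.001, 0.9, 0.999); // must remain alive for every update() call
model.forward();
model.backward();
model.update(&optimizer);               // applies the step and clears accumulated gradients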

Definition at line 20 of file Model.cu.

Friends And Related Symbol Documentation

◆ operator<<()

std::ostream & operator<< ( std::ostream & os,
Model & model )
related

Serializes neural network computation graph structure to output stream.

Parameters
os: Output stream for graph representation (host-to-device)
model: Model instance to visualize (device-to-host)
Returns
Reference to modified output stream enabling operator chaining

Implements graph structure serialization by recursively traversing the computation graph. The formatted output includes:

  1. Node hierarchy in topological order
  2. Layer connectivity information
  3. Tensor shape transformations
Note
  • Output format may change between versions, not suitable for persistent storage
  • Not thread-safe - requires external synchronization if used concurrently
Warning
Modifying model during serialization may cause inconsistent output
MyModel model;
std::cout << model; // Prints: [ComputeGraph: 15 nodes]
// ├─ Conv2D(kernel=3x3, stride=1)
// ├─ ReLU()
// └─ BCELoss()
Author
Mgepahmge(https://github.com/Mgepahmge)
Date
2023/10/15

Definition at line 372 of file Model.cu.


The documentation for this class was generated from the following files:

  • Model.cuh
  • Model.cu