dart_tensor_preprocessing 0.5.1
High-performance tensor preprocessing library for Flutter/Dart. NumPy-like transforms pipeline for ONNX Runtime inference.
# Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## 0.5.1 - 2026-01-13

### Added
- **BufferPool** - Memory pooling API for buffer reuse (`buffer_pool.dart`; usage sketch below):
  - Singleton `BufferPool.instance` for global buffer reuse
  - Power-of-2 size bucketing for efficient allocation
  - Per-dtype buffer pools (Float32, Float64, Int32, Uint8, etc.)
  - `acquire(minSize, dtype)` and `release(buffer)` methods
  - `acquireFloat32()`, `acquireFloat64()`, etc. convenience extensions
  - Max buffers per bucket limit (8) to prevent unbounded memory growth
  - `pooledCount` and `pooledBytes` for monitoring
- **TypedData Views** - Zero-copy tensor view utilities (`typed_data_views.dart`):
  - `TypedDataViews.float32SublistView()` - Zero-copy Float32List slicing
  - `TypedDataViews.float64SublistView()` - Zero-copy Float64List slicing
  - `TypedDataViews.viewAs()` - Create typed view from ByteBuffer at offset
  - `TensorViewExtension` on TensorBuffer:
    - `sliceFirst(start, end)` - Zero-copy slice along first dimension
    - `isViewable` - Check if tensor can be used as a view
    - `toChannelsLast()` - NCHW to NHWC without copying
    - `toChannelsFirst()` - NHWC to NCHW without copying
    - `flatten()` - 1D view of contiguous tensor
    - `unbind(dim)` - Split tensor into views along dimension
    - `select(dim, index)` - Select single index with reduced rank
    - `narrow(dim, start, length)` - Narrow dimension without copying
- **Utility Libraries** (`lib/src/utils/`):
  - `dtype_dispatcher.dart` - `DTypeDispatcher` for dtype-specialized dispatch
  - `tensor_indexing.dart` - `TensorIndexer` for index calculations (`index2D`, `index3D`, `index4D`, `linearToCoords`, `coordsToLinear`, `computeStrides`)
- **TensorBuffer/TensorStorage Factory Methods**:
  - `TensorBuffer.fromFloat64List()` - Create tensor from Float64List
  - `TensorStorage.fromFloat64List()` - Create storage from Float64List
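A minimal usage sketch of the new pooling API: it assumes `acquireFloat32()` returns a `Float32List` of at least the requested size, and uses the package's conventional import path; the surrounding helper is hypothetical.

```dart
import 'dart:typed_data';

import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

void processFrames(List<Float32List> frames) {
  for (final frame in frames) {
    // Acquire a pooled scratch buffer of at least `frame.length` elements;
    // the pool rounds the request up to the next power-of-2 bucket and keeps
    // separate buckets per dtype.
    final Float32List scratch =
        BufferPool.instance.acquireFloat32(frame.length);
    try {
      scratch.setRange(0, frame.length, frame);
      // ... transform `scratch` in place ...
    } finally {
      // Hand the buffer back so the next iteration reuses it instead of
      // allocating; each bucket keeps at most 8 pooled buffers.
      BufferPool.instance.release(scratch);
    }
  }
  // Pool occupancy can be inspected for monitoring.
  print('${BufferPool.instance.pooledCount} buffers, '
      '${BufferPool.instance.pooledBytes} bytes pooled');
}
```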
### Changed
- **SoftmaxOp Optimization**: Now preserves input dtype (Float32/Float64) instead of always using Float64. Added dtype-specialized implementations for better performance.
- **Double-copy elimination**: Operations now use the `cloneForModification()` pattern (`input.isContiguous ? input.clone() : input.contiguous()`) to avoid unnecessary copies (sketch below):
  - `ReLUOp`, `LeakyReLUOp`, `SigmoidOp`, `TanhOp`, `SoftmaxOp`
  - `AbsOp`, `NegOp`, `SqrtOp`, `ExpOp`, `LogOp` (`UnaryMathOp`)
  - `NormalizeOp`, `ScaleOp`
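To make the clone-before-modify idea concrete, here is a sketch of the pattern. `isContiguous`, `clone()`, and `contiguous()` are the library calls named in this changelog; the mixin and op class below are renamed stand-ins, not the library's actual `RequiresContiguous` code.

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

// Sketch of the clone-before-modify pattern: exactly one copy of the input
// is made, whichever call produces it, and that copy is mutated in place.
mixin ClonesBeforeModify {
  TensorBuffer cloneForModification(TensorBuffer input) {
    // A contiguous input is cloned; a strided input is materialized into a
    // fresh contiguous buffer. Either way the result is safe to overwrite.
    return input.isContiguous ? input.clone() : input.contiguous();
  }
}

class ScaleLikeOp with ClonesBeforeModify {
  TensorBuffer apply(TensorBuffer input) {
    final out = cloneForModification(input);
    // ... overwrite `out`'s elements in place and return it ...
    return out;
  }
}
```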
### Internal
- Added `cloneForModification()` helper to the `RequiresContiguous` mixin in `transform_op.dart`
- Integrated `DTypeDispatcher` into activation ops (`ReLUOp`, `LeakyReLUOp`, `SigmoidOp`, `TanhOp`) for dtype-specialized loops
- Integrated `DTypeDispatcher` into `ScaleOp` for consistent dtype handling
- Replaced stride computation with `TensorIndexer.computeStrides()` in `SoftmaxOp` (removed 3x code duplication)
## 0.5.0 - 2026-01-10

### Added
- **BatchNormOp** - Batch normalization for CNN inference (`batch_norm_op.dart`):
  - Full PyTorch-compatible `torch.nn.BatchNorm2d` implementation
  - Pre-computed scale/shift coefficients for efficient inference: `y = x * scale + shift` (folding sketch below)
  - Supports 3D `[C,H,W]` and 4D `[N,C,H,W]` tensors
  - `BatchNormOp.fromStateDict()` factory for loading PyTorch weights
  - Dtype-specialized loops for Float32/Float64
  - In-place support via `applyInPlace()`
- **LayerNormOp** - Layer normalization for Transformer inference (`layer_norm_op.dart`):
  - Full PyTorch-compatible `torch.nn.LayerNorm` implementation
  - Normalizes over the last N dimensions (e.g., `[768]` for BERT)
  - Welford's algorithm for numerically stable mean/variance computation (sketch after the table below)
  - `LayerNormOp.bert()` and `LayerNormOp.bertLarge()` factory presets
  - `LayerNormOp.fromStateDict()` factory for loading PyTorch weights
  - Dtype-specialized loops for Float32/Float64
  - In-place support via `applyInPlace()`
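The pre-computed coefficients follow the standard inference-time folding of batch-norm parameters. The sketch below shows the arithmetic only; the field names (`gamma`, `beta`, `runningMean`, `runningVar`, `eps`) are illustrative and are not the library's actual API.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Folds BatchNorm2d parameters into per-channel scale/shift so that
/// inference reduces to `y = x * scale + shift` for each channel.
({Float32List scale, Float32List shift}) foldBatchNorm({
  required Float32List gamma,
  required Float32List beta,
  required Float32List runningMean,
  required Float32List runningVar,
  double eps = 1e-5,
}) {
  final channels = gamma.length;
  final scale = Float32List(channels);
  final shift = Float32List(channels);
  for (var c = 0; c < channels; c++) {
    // Standard fold: scale = gamma / sqrt(var + eps); shift = beta - mean * scale.
    scale[c] = gamma[c] / math.sqrt(runningVar[c] + eps);
    shift[c] = beta[c] - runningMean[c] * scale[c];
  }
  return (scale: scale, shift: shift);
}
```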
### PyTorch Compatibility
| Operation | PyTorch Equivalent |
|---|---|
| `BatchNormOp` | `torch.nn.BatchNorm2d` (inference) |
| `LayerNormOp` | `torch.nn.LayerNorm` |
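For reference, Welford's running mean/variance update used for layer normalization can be sketched as below. This is the textbook single-pass algorithm, written independently for illustration, not the library's internal code.

```dart
/// Computes the mean and population variance of `values` in one pass using
/// Welford's update, which avoids the catastrophic cancellation of the
/// naive sum-of-squares formula.
({double mean, double variance}) welford(List<double> values) {
  var mean = 0.0;
  var m2 = 0.0; // Sum of squared deviations from the running mean.
  var count = 0;
  for (final x in values) {
    count++;
    final delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean);
  }
  return (mean: mean, variance: count == 0 ? 0.0 : m2 / count);
}
```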
## 0.4.1 - 2026-01-09

### Performance Optimizations
- **Dtype-specialized loops**: Hot paths in transform operations now use dtype-specific code paths with direct `Float32List`/`Float64List` access, avoiding per-element switch overhead (see the sketch after this list):
  - `NormalizeOp._normalize3D()`, `NormalizeOp._normalize4D()`
  - `ScaleOp._scale()`
  - `ClipOp._clip()`
  - `GaussianBlurOp._applySeparableBlur()`
  - `ResizeOp._resizeNearest()`, `_resizeBilinear()`, `_resizeBicubic()`
  - `CenterCropOp._crop3D()`, `_crop4D()`
  - `concat()` with optimized axis=0 bulk copy
- **Clone-Before-Modify optimization**: `ClipOp.apply()` now avoids a double copy by checking `isContiguous` before deciding whether to `clone()` or `contiguous()`
- **Isolate threshold**: `TensorPipeline.runAsync()` now accepts an optional `isolateThreshold` parameter (default: 100,000 elements). Small tensors skip isolate overhead and run synchronously
- **Buffer reuse**: `GaussianBlurOp` now pre-allocates and reuses a temp buffer across channels, reducing allocations
- **Concat linear copy**: `concat()` now uses pre-computed strides for linear index calculation instead of recursive index computation. Axis=0 concatenation of contiguous tensors uses a bulk `setRange()` copy
- **Loop unrolling**: `ResizeOp._resizeBicubic()` unrolls the 4x4 kernel with pre-computed weights and indices
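To illustrate the per-element-switch overhead these loops avoid, here is a sketch of the dispatch-once pattern using a hypothetical standalone helper; the library applies the same idea inside its private op methods.

```dart
import 'dart:typed_data';

/// Scales every element of `data` by `factor`, dispatching on the runtime
/// dtype once and then running a monomorphic typed-list loop, instead of
/// switching on dtype inside the loop body for every element.
void scaleInPlace(TypedData data, double factor) {
  if (data is Float32List) {
    for (var i = 0; i < data.length; i++) {
      data[i] *= factor;
    }
  } else if (data is Float64List) {
    for (var i = 0; i < data.length; i++) {
      data[i] *= factor;
    }
  } else {
    throw ArgumentError('Unsupported dtype: ${data.runtimeType}');
  }
}
```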
## 0.4.0 - 2026-01-09

### Added
- **Arithmetic Operations** (`arithmetic_op.dart`):
  - `AddOp` - Element-wise addition (scalar or tensor)
  - `SubOp` - Element-wise subtraction (scalar or tensor)
  - `MulOp` - Element-wise multiplication (scalar or tensor)
  - `DivOp` - Element-wise division (scalar or tensor)
  - `PowOp` - Element-wise power operation
- **Math Operations** (`math_op.dart`):
  - `AbsOp` - Element-wise absolute value
  - `NegOp` - Element-wise negation
  - `SqrtOp` - Element-wise square root
  - `ExpOp` - Element-wise exponential (e^x)
  - `LogOp` - Element-wise natural logarithm
- **Activation Functions** (`activation_op.dart`):
  - `ReLUOp` - Rectified Linear Unit
  - `LeakyReLUOp` - Leaky ReLU with configurable negative slope
  - `SigmoidOp` - Sigmoid activation
  - `TanhOp` - Hyperbolic tangent activation
  - `SoftmaxOp` - Softmax along specified axis
- **TensorBuffer Factory Methods**:
  - `TensorBuffer.full()` - Create tensor filled with specified value
  - `TensorBuffer.random()` - Create tensor with uniform random values [0, 1)
  - `TensorBuffer.randn()` - Create tensor with standard normal distribution
  - `TensorBuffer.eye()` - Create identity matrix (supports rectangular)
  - `TensorBuffer.linspace()` - Create tensor with evenly spaced values
  - `TensorBuffer.arange()` - Create tensor with sequence values
- **Utility Libraries** (`lib/src/utils/`):
  - `index_utils.dart` - Index manipulation utilities (`reflectIndex`, `replicateIndex`, `circularIndex`) (illustrated below)
  - `validation_utils.dart` - Common tensor validation patterns
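For intuition on the index utilities, the helpers below show how the three out-of-range index policies for padding are conventionally defined. This is an independent illustration of the concept (each version assumes the overshoot is smaller than the axis length), not the code in `index_utils.dart`.

```dart
/// Mirror around the edges without repeating the border element:
/// for length 4, -1 -> 1, -2 -> 2, 4 -> 2, 5 -> 1.
int reflectIndex(int i, int length) {
  if (i < 0) return -i;
  if (i >= length) return 2 * (length - 1) - i;
  return i;
}

/// Clamp to the nearest valid index: -2 -> 0, length + 3 -> length - 1.
int replicateIndex(int i, int length) {
  if (i < 0) return 0;
  if (i >= length) return length - 1;
  return i;
}

/// Wrap around: -1 -> length - 1, length -> 0.
int circularIndex(int i, int length) {
  return ((i % length) + length) % length;
}
```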
### Changed
- **Exception Consistency**: `TensorStorage._checkBounds()` now throws `IndexOutOfBoundsException` instead of `RangeError` for consistent exception handling across the library
### Internal
- Extracted duplicate `_reflectIndex` code from `pad_op.dart` and `augmentation_op.dart` into a shared utility
- Added `TensorValidation` extension with `requireRank3Or4()`, `requireExactRank()`, `requireMinRank()` methods
## 0.3.1 - 2026-01-08

### Added
- Performance benchmark suite (`benchmark/` directory):
  - `tensor_creation_benchmark.dart` - Tensor creation performance
  - `tensor_ops_benchmark.dart` - Zero-copy and copy operations
  - `pipeline_benchmark.dart` - Pipeline sync/async comparison
  - `memory_benchmark.dart` - Memory usage measurement
  - `run_all.dart` - Unified benchmark runner
  - `utils/benchmark_utils.dart` - Benchmark utilities
### Fixed
- Removed unused variables in benchmark files
- Fixed lint issues in benchmark files
## 0.3.0 - 2026-01-08

### Added
- `ClipOp` - Element-wise value clamping with factory presets (unit, symmetric, uint8)
- `PadOp` - Padding with multiple modes (constant, reflect, replicate, circular)
- `SliceOp` - Python-like tensor slicing with support for negative indices and steps
- `RandomCropOp` - Random cropping for data augmentation with deterministic seed support
- `GaussianBlurOp` - Gaussian blur using separable convolution with factory presets (kernel sketch after this list)
- `concat()` - Utility function for tensor concatenation along specified axis
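As background for the separable blur: a 2D Gaussian filter factors into one horizontal and one vertical pass with the same 1D kernel, which is what makes the separable implementation cheap. The helper below builds such a kernel and is an independent illustration, not the `GaussianBlurOp` internals.

```dart
import 'dart:math' as math;

/// Builds a normalized 1D Gaussian kernel of odd length `size`.
/// Applying it once along rows and once along columns is equivalent to a
/// full 2D Gaussian convolution, so a separable blur needs about 2k
/// multiplies per pixel instead of k*k.
List<double> gaussianKernel1D(int size, double sigma) {
  assert(size.isOdd && size > 0);
  final half = size ~/ 2;
  final kernel = List<double>.filled(size, 0.0);
  var sum = 0.0;
  for (var i = 0; i < size; i++) {
    final x = i - half;
    kernel[i] = math.exp(-(x * x) / (2 * sigma * sigma));
    sum += kernel[i];
  }
  // Normalize so the weights sum to 1 and the blur preserves brightness.
  for (var i = 0; i < size; i++) {
    kernel[i] /= sum;
  }
  return kernel;
}
```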
### Fixed
- `concat()` axis-based copy logic now correctly handles multi-axis concatenation
### Changed
- **BREAKING**: Unified exception handling across the library
  - All exceptions now extend the `TensorException` sealed class
  - `ArgumentError` → `ShapeMismatchException`, `InvalidParameterException`
  - `RangeError` → `IndexOutOfBoundsException`
## 0.2.0 - 2026-01-04

### Added
- `IndexOutOfBoundsException` - Thrown when an index or axis is out of valid range
- `DTypeMismatchException` - Thrown when tensor data types do not match
### Changed
- **BREAKING**: Unified exception handling across the library
  - All exceptions now extend the `TensorException` sealed class
  - `ArgumentError` → `ShapeMismatchException`, `InvalidParameterException`
  - `RangeError` → `IndexOutOfBoundsException`
  - `StateError` → `NonContiguousException`, `DTypeMismatchException`
- Shape validation now happens before buffer creation in `zeros()` and `ones()`
### Migration Guide
If you were catching standard Dart exceptions, update your code:
| Before | After |
|---|---|
| `on RangeError` | `on IndexOutOfBoundsException` |
| `on ArgumentError` | `on ShapeMismatchException` or `on InvalidParameterException` |
| `on StateError` | `on NonContiguousException` or `on DTypeMismatchException` |
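For example, a bounds-check handler changes as in the sketch below; `readValueAt` stands in for any library call that can fail on a bad index or axis, and the import path is the conventional one for this package.

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

/// Returns NaN instead of propagating out-of-range index errors.
double safeRead(double Function() readValueAt) {
  try {
    return readValueAt();
  } on IndexOutOfBoundsException {
    // Before 0.2.0 this clause would have been `on RangeError`.
    return double.nan;
  }
}
```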
## 0.1.4 - 2026-01-04

### Added
- Reduction operations for `TensorBuffer`:
  - `sum()` - Returns the sum of all elements
  - `mean()` - Returns the arithmetic mean of all elements
  - `min()` - Returns the minimum value
  - `max()` - Returns the maximum value
- Axis-wise reduction operations (usage sketch below):
  - `sumAxis(int axis, {bool keepDims})` - Sum along a specific axis
  - `meanAxis(int axis, {bool keepDims})` - Mean along a specific axis
  - `minAxis(int axis, {bool keepDims})` - Min along a specific axis
  - `maxAxis(int axis, {bool keepDims})` - Max along a specific axis
- Support for negative axis indexing in axis-wise operations
- Comprehensive test coverage for all reduction operations (49 tests)
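A usage sketch of the reductions, assuming a `[2, 3]` tensor obtained elsewhere; the method names and the `keepDims` flag come from this changelog, while the return types, the `shape` getter, and `keepDims` defaulting to false are assumptions.

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

void summarize(TensorBuffer t) {
  // Whole-tensor reductions collapse everything to a single value.
  print('sum=${t.sum()} mean=${t.mean()} min=${t.min()} max=${t.max()}');

  // Axis-wise reductions: for a [2, 3] tensor, sumAxis(0) yields shape [3];
  // keepDims: true keeps the reduced axis as size 1 ([1, 3]).
  final colSums = t.sumAxis(0, keepDims: true);

  // Negative axes count from the end, so -1 is the last axis ([2] here).
  final rowMeans = t.meanAxis(-1);

  print('${colSums.shape} ${rowMeans.shape}');
}
```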
## 0.1.3 - 2026-01-03

## 0.1.1 - 2025-12-27

### Added
- Comprehensive dartdoc comments for all public API elements
- Library-level documentation with usage examples
## 0.1.0 - 2025-12-27

### Added
- **Core tensor operations**
  - `TensorBuffer` with shape, strides, and view/storage separation
  - `TensorStorage` as an immutable typed data wrapper
  - `DType` enum with ONNX-compatible data types
- **Transform operations**
  - `ResizeOp` with nearest, bilinear, bicubic interpolation
  - `ResizeShortestOp` for aspect-ratio preserving resize
  - `CenterCropOp` for center cropping
  - `NormalizeOp` with ImageNet, CIFAR-10, symmetric presets
  - `ScaleOp` for value scaling
  - `PermuteOp` for axis reordering
  - `ToTensorOp` for HWC uint8 to CHW float32 conversion
  - `ToImageOp` for CHW float32 to HWC uint8 conversion
  - `UnsqueezeOp`, `SqueezeOp`, `ReshapeOp`, `FlattenOp` for shape manipulation
  - `TypeCastOp` for dtype conversion
- **Pipeline system** (sketch at the end of this changelog)
  - `TensorPipeline` for chaining operations
  - `PipelinePresets` with ImageNet, ResNet, YOLO, CLIP, ViT, MobileNet presets
  - Async execution via `Isolate.run`
- **Zero-copy operations**
  - `transpose()` via stride manipulation
  - `squeeze()`, `unsqueeze()` as shape-only changes
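To close, a hedged sketch of how a pipeline from this release might be wired together. Only the class names and `runAsync()` appear in this changelog; every constructor argument, the `NormalizeOp.imagenet()` preset name, and the argument to `runAsync()` are guesses for illustration, so check the API docs for the real signatures.

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

// Hypothetical ImageNet-style preprocessing flow for ONNX Runtime input.
Future<TensorBuffer> prepareForOnnx(TensorBuffer hwcUint8Image) async {
  final pipeline = TensorPipeline([
    ResizeOp(256, 256),        // resize; interpolation args assumed
    CenterCropOp(224, 224),    // center crop to the model's input size
    ToTensorOp(),              // HWC uint8 -> CHW float32
    NormalizeOp.imagenet(),    // ImageNet mean/std preset; name assumed
    UnsqueezeOp(0),            // add batch dim: CHW -> NCHW
  ]);
  // Run asynchronously; 0.4.1 adds an isolateThreshold so small tensors
  // skip the isolate overhead and run synchronously.
  return pipeline.runAsync(hwcUint8Image);
}
```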