# dart_tensor_preprocessing

A tensor preprocessing library for Flutter/Dart: a NumPy-like transform pipeline for ONNX Runtime, TFLite, and other AI inference engines.
## Features
- PyTorch Compatible: Matches PyTorch/torchvision tensor operations
- Non-blocking: Isolate-based async execution prevents UI jank
- Type-safe: ONNX-compatible tensor types (Float32, Int64, Uint8, etc.)
- Zero-copy: View/stride manipulation for reshape/transpose operations
- Declarative: Chain operations into reusable pipelines
- SIMD Accelerated: Float32/Float64 vectorized operations for 2-4x speedup
- Memory Efficient: Buffer pooling, uninitialized allocation, fused operations
## Installation

```yaml
dependencies:
  dart_tensor_preprocessing: ^0.8.2
```
## Quick Start

```dart
import 'dart:typed_data';

import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

// Create a tensor from image data (HWC format, Uint8)
final imageData = Uint8List.fromList([/* RGBA pixel data */]);
final tensor = TensorBuffer.fromUint8List(imageData, [height, width, channels]);

// Use a preset pipeline for ImageNet models
final pipeline = PipelinePresets.imagenetClassification();
final result = await pipeline.runAsync(tensor);
// result.shape: [1, 3, 224, 224] (NCHW, Float32, normalized)
```
## Pipeline Presets

| Preset | Output Shape | Use Case |
|---|---|---|
| `imagenetClassification()` | [1, 3, 224, 224] | ResNet, VGG, etc. |
| `objectDetection()` | [1, 3, 640, 640] | YOLO, SSD |
| `faceRecognition()` | [1, 3, 112, 112] | ArcFace, FaceNet |
| `clip()` | [1, 3, 224, 224] | CLIP models |
| `mobileNet()` | [1, 3, 224, 224] | MobileNet family |
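Every preset returns an ordinary `TensorPipeline`, so switching model families is a one-line change. A minimal sketch using the preset names listed above:

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

// Pick the preset that matches the target model family.
Future<TensorBuffer> preprocessForYolo(TensorBuffer hwcImage) {
  final pipeline = PipelinePresets.objectDetection();
  return pipeline.runAsync(hwcImage); // -> [1, 3, 640, 640] Float32
}
```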
## Custom Pipeline

```dart
final pipeline = TensorPipeline([
  ResizeOp(height: 224, width: 224),
  ToTensorOp(normalize: true), // HWC -> CHW, scale to [0, 1]
  NormalizeOp.imagenet(),      // ImageNet mean/std
  UnsqueezeOp.batch(),         // Add batch dimension
]);

// Sync execution
final syncResult = pipeline.run(input);

// Async execution (runs in an isolate)
final asyncResult = await pipeline.runAsync(input);

// Async with a custom isolate threshold (default: 100,000 elements).
// Tensors below the threshold skip isolate overhead and run synchronously.
final smallResult = await pipeline.runAsync(input, isolateThreshold: 50000);
```
## Available Operations

### Resize & Crop

- `ResizeOp` - Resize to fixed dimensions (nearest, bilinear, bicubic, area, lanczos) with ONNX-compatible coordinate transform modes
- `ResizeShortestOp` - Resize preserving aspect ratio
- `CenterCropOp` - Center crop to fixed dimensions
- `ClipOp` - Element-wise value clamping (presets: unit, symmetric, uint8)
- `PadOp` - Padding with multiple modes (constant, reflect, replicate, circular)
- `SliceOp` - Python-like tensor slicing with negative index support
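These combine into the classic torchvision-style evaluation transform: resize the shortest side, then center crop. A sketch; the `size:` parameter name on `ResizeShortestOp` is an assumption, so check the API reference for the exact signature:

```dart
// Resize shortest side to 256, then center crop to 224x224 (ImageNet eval).
final evalPipeline = TensorPipeline([
  ResizeShortestOp(size: 256), // parameter name assumed
  CenterCropOp(height: 224, width: 224),
  ToTensorOp(normalize: true),
  NormalizeOp.imagenet(),
  UnsqueezeOp.batch(),
]);
```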
### Normalization

- `NormalizeOp` - Channel-wise normalization (presets: ImageNet, CIFAR-10, symmetric)
- `ScaleOp` - Scale values (e.g., 0-255 to 0-1)
- `BatchNormOp` - Batch normalization for CNN inference (PyTorch compatible)
- `LayerNormOp` - Layer normalization for Transformer inference (presets: BERT, BERT-Large)
- `GroupNormOp` - Group normalization for modern CNNs (PyTorch compatible)
- `InstanceNormOp` - Instance normalization for style transfer and GANs (PyTorch compatible)
- `RMSNormOp` - Root Mean Square normalization for LLMs (LLaMA, Gemma)
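For models outside the presets, a custom `NormalizeOp` can be built directly. A sketch; the `mean:`/`std:` named parameters are assumptions inferred from the presets:

```dart
// Map [0, 1] inputs to [-1, 1], as many mobile models expect.
// The mean:/std: parameter names are assumed; see the API docs.
final normalize = NormalizeOp(
  mean: [0.5, 0.5, 0.5],
  std: [0.5, 0.5, 0.5],
);
```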
### Layout

- `PermuteOp` - Axis reordering (e.g., HWC to CHW)
- `ToTensorOp` - HWC uint8 to CHW float32 with optional scaling
- `ToImageOp` - CHW float32 to HWC uint8
### Data Augmentation

- `RandomCropOp` - Random cropping with deterministic seed support
- `GaussianBlurOp` - Gaussian blur using separable convolution
- `RandomHorizontalFlipOp` / `RandomVerticalFlipOp` - Probabilistic flip augmentation
- `HorizontalFlipOp` / `VerticalFlipOp` - Deterministic flip operations
- `RandomErasingOp` - Random erasing (cutout) augmentation
- `ColorJitterOp` - Random brightness, contrast, saturation, and hue jitter
- `AdjustBrightnessOp` / `AdjustContrastOp` / `AdjustSaturationOp` / `AdjustHueOp` - Individual color adjustments
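A sketch of a reproducible augmentation chain. The `seed:` and `probability:` parameter names are illustrative, based on the deterministic-seed support noted above:

```dart
final augment = TensorPipeline([
  RandomCropOp(height: 224, width: 224, seed: 42),    // seed: name assumed
  RandomHorizontalFlipOp(probability: 0.5, seed: 42), // parameters assumed
  ColorJitterOp(brightness: 0.2, contrast: 0.2, saturation: 0.2, hue: 0.05),
]);
```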
### Fused Operations

- `ResizeNormalizeFusedOp` - Combines resize and normalize in a single pass (eliminates the intermediate tensor)
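The fused op stands in for a separate `ResizeOp` + `NormalizeOp` pair. A sketch; the constructor parameters below mirror those two ops and are assumptions:

```dart
// One pass over the data: no intermediate resized tensor is allocated.
// Parameter names mirror ResizeOp/NormalizeOp and are assumed.
final fused = ResizeNormalizeFusedOp(
  height: 224,
  width: 224,
  mean: [0.485, 0.456, 0.406], // ImageNet mean
  std: [0.229, 0.224, 0.225],  // ImageNet std
);
final out = TensorPipeline([fused]).run(input);
```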
### Activation Functions

- `ReLUOp` - Rectified Linear Unit (SIMD accelerated)
- `LeakyReLUOp` - Leaky ReLU with configurable slope (SIMD accelerated)
- `GELUOp` - Gaussian Error Linear Unit (Transformers: BERT, GPT, ViT)
- `SiLUOp` / `SwishOp` - Sigmoid Linear Unit (EfficientNet, YOLOv5)
- `HardsigmoidOp` - Hardware-efficient sigmoid (MobileNetV3)
- `HardswishOp` - Hardware-efficient swish (MobileNetV3)
- `MishOp` - Self-regularizing activation (YOLOv4+)
- `ELUOp` - Exponential Linear Unit
- `SELUOp` - Scaled Exponential Linear Unit
- `GLUOp` - Gated Linear Unit
- `SigmoidOp` - Sigmoid activation
- `TanhOp` - Hyperbolic tangent activation
- `SoftmaxOp` - Softmax along a specified axis
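Activations are typically applied as post-processing on raw model outputs, e.g. turning classifier logits into probabilities. Wrapping the op in a single-step pipeline uses only the documented `run()` API; the `axis:` parameter name is an assumption:

```dart
// logits: shape [1, numClasses] from an ONNX/TFLite classifier head.
final probs = TensorPipeline([SoftmaxOp(axis: 1)]).run(logits);
```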
### Math Operations

- `AbsOp` - Absolute value (SIMD accelerated)
- `NegOp` - Negation (SIMD accelerated)
- `SqrtOp` - Square root (SIMD accelerated)
- `ExpOp` - Exponential (e^x)
- `LogOp` - Natural logarithm
- `PowOp` - Power operation
- `FloorOp` / `CeilOp` / `RoundOp` - Element-wise rounding operations
- `SinOp` / `CosOp` / `TanOp` - Trigonometric functions
- `AsinOp` / `AcosOp` / `AtanOp` / `Atan2Op` - Inverse trigonometric functions
### Arithmetic Operations

- `AddOp` / `SubOp` - Element-wise addition/subtraction (SIMD accelerated)
- `MulOp` / `DivOp` - Element-wise multiplication/division (SIMD accelerated)
### Normalization (continued)

- `LpNormalizeOp` - Lp normalization (L1, L2, Linf) along a dimension
### Tensor Manipulation

- `tensorWhere()` / `WhereOp` - Element-wise conditional selection
- `MaskedFillOp` - Fill tensor positions where mask is true
- `GatherOp` - Gather elements along a dimension by index
- `TileOp` - Tile/repeat tensor contents
- `RepeatOp` - Repeat tensor (PyTorch `.repeat()` semantics)
- `RollOp` - Circular shift along dimensions
- `PositionalEncodingOp` - Transformer positional encoding
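A sketch of a common transformer pre-step, masking padded positions before softmax; the `mask:`, `value:`, and `axis:` parameter names are assumptions:

```dart
// Fill padded positions with a large negative value, then softmax.
final attention = TensorPipeline([
  MaskedFillOp(mask: paddingMask, value: -1e9), // parameter names assumed
  SoftmaxOp(axis: -1),                          // axis: name assumed
]).run(attentionScores);
```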
### Utility

- `concat()` - Concatenates tensors along a specified axis
- `stack()` - Stacks tensors along a new dimension
- `split()` / `chunk()` - Split a tensor into parts along a dimension
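For example, individually preprocessed images can be batched with `stack()`. A sketch; the positional axis arguments shown are assumptions:

```dart
// Three [C, H, W] tensors -> one [3, C, H, W] batch.
final batch = stack([img0, img1, img2], 0);
// Concatenate two embeddings along the feature axis.
final features = concat([featA, featB], 1);
```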
### Shape

- `UnsqueezeOp` - Add a dimension
- `SqueezeOp` - Remove size-1 dimensions
- `ReshapeOp` - Reshape tensor (supports -1 for inference)
- `FlattenOp` - Flatten dimensions
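`ReshapeOp`'s -1 works like PyTorch's: one dimension may be inferred from the element count. A sketch:

```dart
// Flatten a [1, 512, 7, 7] feature map for a linear head;
// -1 is inferred as 512 * 7 * 7 = 25088, giving shape [1, 25088].
final flat = TensorPipeline([ReshapeOp([1, -1])]).run(features);
```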
### Type

- `TypeCastOp` - Convert between data types
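A typical use is producing the Int64 inputs many ONNX text models expect. A sketch; the constructor argument shown is an assumption:

```dart
// Cast token IDs to Int64 (ONNX tensor type 7) for a BERT-style model.
final ids = TensorPipeline([TypeCastOp(DType.int64)]).run(tokenTensor);
```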
## Core Classes

### TensorBuffer

A tensor with shape and stride metadata over physical storage.

```dart
// Create tensors
final zeros = TensorBuffer.zeros([3, 224, 224]);
final ones = TensorBuffer.ones([3, 224, 224], dtype: DType.float32);
final fromData = TensorBuffer.fromFloat32List(data, [3, 224, 224]);

// Access elements
final value = tensor[[0, 100, 100]];

// Zero-copy operations
final transposed = tensor.transpose([2, 0, 1]); // Changes strides only
final squeezed = tensor.squeeze();

// Copy operations
final contiguous = tensor.contiguous(); // Force contiguous memory
final cloned = tensor.clone();
```
### DType

ONNX-compatible data types with an `onnxId` for runtime integration.

```dart
DType.float32 // ONNX ID: 1
DType.int64   // ONNX ID: 7
DType.uint8   // ONNX ID: 2
```
### BufferPool

Memory pooling for buffer reuse, reducing GC pressure in hot paths.

```dart
final pool = BufferPool.instance;

// Acquire a buffer (reused from the pool if available)
final buffer = pool.acquireFloat32(1000);
// ... use buffer ...

// Release back to the pool for reuse
pool.release(buffer);

// Monitor pool usage
print('Pooled: ${pool.pooledCount} buffers, ${pool.pooledBytes} bytes');
```
## Zero-Copy View Operations

`TensorBuffer` extension methods for zero-copy tensor manipulation:

```dart
// Slice along the first dimension (batch slicing)
final batch = tensor.sliceFirst(2, 5); // Views elements 2..4

// Split a tensor into views
final items = tensor.unbind(0); // List of views along dim 0

// Select a single index (reduces rank)
final first = tensor.select(0, 0); // First item, shape reduced

// Narrow a dimension
final narrowed = tensor.narrow(0, 1, 3); // 3 elements starting at 1

// Format conversion without copying
final nhwc = nchwTensor.toChannelsLast();  // NCHW -> NHWC view
final nchw = nhwcTensor.toChannelsFirst(); // NHWC -> NCHW view

// Flatten to a 1D view
final flat = tensor.flatten();
```
## In-Place Operations

Many operations support in-place modification to avoid allocation overhead:

```dart
// In-place operations (modify the tensor directly)
ReLUOp().applyInPlace(tensor);
NormalizeOp.imagenet().applyInPlace(tensor);
ClipOp(min: 0, max: 1).applyInPlace(tensor);
BatchNormOp(...).applyInPlace(tensor);

// Query operation capabilities
final op = ReLUOp();
print(op.capabilities.supportsInPlace);    // true
print(op.capabilities.requiresContiguous); // true
print(op.capabilities.preservesShape);     // true
```

Operations supporting in-place: `ReLUOp`, `LeakyReLUOp`, `SigmoidOp`, `TanhOp`, `AbsOp`, `NegOp`, `SqrtOp`, `ExpOp`, `LogOp`, `PowOp`, `AddOp`, `SubOp`, `MulOp`, `DivOp`, `ClipOp`, `NormalizeOp`, `ScaleOp`, `BatchNormOp`, `LayerNormOp`, `GroupNormOp`, `InstanceNormOp`, `RMSNormOp`, `SELUOp`, `LpNormalizeOp`, `MaskedFillOp`, `RandomErasingOp`.
## Memory Formats

| Format | Layout | Strides (for [1, 3, 224, 224]) |
|---|---|---|
| `contiguous` | NCHW | [150528, 50176, 224, 1] |
| `channelsLast` | NHWC | [150528, 1, 672, 3] |
## PyTorch Compatibility

This library is designed to produce results identical to PyTorch/torchvision operations:

| Operation | PyTorch Equivalent |
|---|---|
| `TensorBuffer.zeros()` | `torch.zeros()` |
| `TensorBuffer.ones()` | `torch.ones()` |
| `tensor.transpose()` | `tensor.permute()` |
| `tensor.reshape()` | `tensor.reshape()` |
| `tensor.squeeze()` | `tensor.squeeze()` |
| `tensor.unsqueeze()` | `tensor.unsqueeze()` |
| `tensor.sum()` / `sumAxis()` | `tensor.sum()` |
| `tensor.sumAxes([...])` | `tensor.sum(dim=[...])` |
| `tensor.mean()` / `meanAxis()` | `tensor.mean()` |
| `tensor.meanAxes([...])` | `tensor.mean(dim=[...])` |
| `tensor.min()` / `max()` | `tensor.min()` / `max()` |
| `tensor.minAxes([...])` | `tensor.amin(dim=[...])` |
| `tensor.maxAxes([...])` | `tensor.amax(dim=[...])` |
| `NormalizeOp.imagenet()` | `transforms.Normalize(mean, std)` |
| `ResizeOp(mode: bilinear)` | `F.interpolate(mode='bilinear')` |
| `ResizeOp(mode: area)` | `F.interpolate(mode='area')` |
| `ResizeOp(mode: lanczos)` | Lanczos3 interpolation |
| `ResizeOp(coordinateMode: halfPixel)` | ONNX Resize `half_pixel` |
| `ResizeOp(coordinateMode: asymmetric)` | ONNX Resize `asymmetric` (TF default) |
| `ResizeOp(coordinateMode: pytorchHalfPixel)` | ONNX Resize `pytorch_half_pixel` |
| `ToTensorOp()` | `transforms.ToTensor()` |
| `ClipOp(min, max)` | `torch.clamp(min, max)` |
| `PadOp(mode: reflect)` | `F.pad(mode='reflect')` |
| `SliceOp([(start, end, step)])` | `tensor[start:end:step]` |
| `concat(tensors, axis)` | `torch.cat(tensors, dim)` |
| `stack(tensors, dim)` | `torch.stack(tensors, dim)` |
| `RandomCropOp` | `transforms.RandomCrop()` |
| `GaussianBlurOp` | `transforms.GaussianBlur()` |
| `AddOp` / `SubOp` | `torch.add()` / `torch.sub()` |
| `MulOp` / `DivOp` | `torch.mul()` / `torch.div()` |
| `PowOp` | `torch.pow()` |
| `AbsOp` / `NegOp` | `torch.abs()` / `torch.neg()` |
| `SqrtOp` / `ExpOp` / `LogOp` | `torch.sqrt()` / `exp()` / `log()` |
| `FloorOp` / `CeilOp` / `RoundOp` | `torch.floor()` / `ceil()` / `round()` |
| `SinOp` / `CosOp` / `TanOp` | `torch.sin()` / `cos()` / `tan()` |
| `AsinOp` / `AcosOp` / `AtanOp` | `torch.asin()` / `acos()` / `atan()` |
| `Atan2Op` | `torch.atan2()` |
| `ReLUOp` / `LeakyReLUOp` | `F.relu()` / `F.leaky_relu()` |
| `GELUOp` | `F.gelu()` |
| `SiLUOp` / `SwishOp` | `F.silu()` |
| `HardsigmoidOp` | `F.hardsigmoid()` |
| `HardswishOp` | `F.hardswish()` |
| `MishOp` | `F.mish()` |
| `ELUOp` | `F.elu()` |
| `SigmoidOp` / `TanhOp` | `torch.sigmoid()` / `torch.tanh()` |
| `SoftmaxOp` | `F.softmax()` |
| `SELUOp` | `F.selu()` |
| `GLUOp` | `F.glu()` |
| `BatchNormOp` | `torch.nn.BatchNorm2d` (inference) |
| `LayerNormOp` | `torch.nn.LayerNorm` |
| `GroupNormOp` | `torch.nn.GroupNorm` |
| `InstanceNormOp` | `torch.nn.InstanceNorm2d` |
| `RMSNormOp` | `torch.nn.RMSNorm` (PyTorch 2.4+) |
| `TensorBuffer.full()` | `torch.full()` |
| `TensorBuffer.random()` | `torch.rand()` |
| `TensorBuffer.randn()` | `torch.randn()` |
| `TensorBuffer.eye()` | `torch.eye()` |
| `TensorBuffer.linspace()` | `torch.linspace()` |
| `TensorBuffer.arange()` | `torch.arange()` |
| `tensor.select(dim, index)` | `tensor.select(dim, index)` |
| `tensor.narrow(dim, start, len)` | `tensor.narrow(dim, start, len)` |
| `tensor.unbind(dim)` | `tensor.unbind(dim)` |
| `tensor.flatten()` | `tensor.flatten()` |
| `LpNormalizeOp` | `F.normalize()` |
| `tensorWhere()` / `WhereOp` | `torch.where()` |
| `MaskedFillOp` | `Tensor.masked_fill_()` |
| `GatherOp` | `torch.gather()` |
| `split()` / `chunk()` | `torch.split()` / `torch.chunk()` |
| `TileOp` | `Tensor.repeat()` / ONNX Tile |
| `RepeatOp` | `Tensor.repeat()` |
| `RollOp` | `torch.roll()` |
| `RandomHorizontalFlipOp` | `transforms.RandomHorizontalFlip()` |
| `RandomVerticalFlipOp` | `transforms.RandomVerticalFlip()` |
| `RandomErasingOp` | `transforms.RandomErasing()` |
| `ColorJitterOp` | `transforms.ColorJitter()` |
| `PositionalEncodingOp` | Transformer positional encoding |
| `ResizeNormalizeFusedOp` | `F.interpolate()` + `transforms.Normalize()` (fused) |
## Performance Benchmarks

Run benchmarks with `dart run benchmark/run_all.dart`.
### SIMD Acceleration

Operations with Float32x4/Float64x2 SIMD vectorization:

| Operation | SIMD Throughput | Speedup |
|---|---|---|
| `ClipOp` | ~6.2 GE/s (Float32) | ~4x |
| `AbsOp` | ~6.2 GE/s (Float32) | ~4x |
| `SqrtOp` | ~6.2 GE/s (Float32) | ~4x |
| `NormalizeOp` | ~6.2 GE/s (Float32) | ~4x |
| `ReLUOp` / `LeakyReLUOp` | ~6.2 GE/s (Float32) | ~4x |
| `ScaleOp` | ~6.2 GE/s (Float32) | ~4x |
| `AddOp` / `SubOp` / `MulOp` / `DivOp` | ~6.2 GE/s (Float32) | ~4x |

GE/s = giga-elements per second. Float64 SIMD reaches ~53% of Float32 throughput, since `Float64x2` processes two lanes per instruction versus four for `Float32x4`.
### Operation Complexity

| Operation | Time Complexity | Space Complexity |
|---|---|---|
| `ResizeOp` (bilinear) | O(C × H × W) | O(C × H × W) |
| `ResizeOp` (bicubic) | O(C × H × W × 16) | O(C × H × W) |
| `ResizeOp` (lanczos) | O(C × H × W × 36) | O(C × H × W) |
| `NormalizeOp` | O(n) | O(n), or O(1) in-place |
| `BatchNormOp` | O(n) | O(n), or O(1) in-place |
| `LayerNormOp` | O(n) | O(n), or O(1) in-place |
| `GaussianBlurOp` | O(C × H × W × k) | O(C × H × W) |
| `ResizeNormalizeFusedOp` | O(C × H × W) | O(C × H × W) |
### Zero-Copy Operations (O(1))

| Operation | Time | Ops/sec |
|---|---|---|
| `transpose()` | ~1µs | 700K+ |
| `reshape()` | ~1µs | 1.6M+ |
| `squeeze()` | <1µs | 3.2M+ |
| `unsqueeze()` | ~1µs | 780K+ |
### Pipeline Performance

| Pipeline | Input Shape | Time |
|---|---|---|
| Simple (Normalize + Unsqueeze) | [3, 224, 224] | ~3.4ms |
| ImageNet Classification | [3, 224, 224] | ~3.0ms |
| Object Detection | [3, 640, 640] | ~25ms |
### Sync vs Async

| Execution | 224×224 | 640×640 |
|---|---|---|
| `run()` (sync) | ~3.5ms | ~29ms |
| `runAsync()` (isolate) | ~11ms | ~93ms |
| Isolate overhead | ~7ms | ~64ms |

Note: use `runAsync()` for large tensors or when UI responsiveness is critical; for small tensors, the isolate round-trip costs more than the preprocessing itself.
## Requirements

- Dart SDK ^3.0.0

## License

MIT