# dart_tensor_preprocessing

A tensor preprocessing library for Flutter/Dart: a NumPy-style transform pipeline for preparing inputs for ONNX Runtime, TFLite, and other AI inference engines.
## Features
- PyTorch Compatible: Matches PyTorch/torchvision tensor operations
- Non-blocking: Isolate-based async execution prevents UI jank
- Type-safe: ONNX-compatible tensor types (Float32, Int64, Uint8, etc.)
- Zero-copy: View/stride manipulation for reshape/transpose operations
- Declarative: Chain operations into reusable pipelines
## Installation

```yaml
dependencies:
  dart_tensor_preprocessing: ^0.5.0
```
## Quick Start

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

// Create a tensor from image data (HWC format, Uint8)
final imageData = Uint8List.fromList([/* RGBA pixel data */]);
final tensor = TensorBuffer.fromUint8List(imageData, [height, width, channels]);

// Use a preset pipeline for ImageNet models
final pipeline = PipelinePresets.imagenetClassification();
final result = await pipeline.runAsync(tensor);
// result.shape: [1, 3, 224, 224] (NCHW, Float32, normalized)
```
## Pipeline Presets

| Preset | Output Shape | Use Case |
|---|---|---|
| `imagenetClassification()` | `[1, 3, 224, 224]` | ResNet, VGG, etc. |
| `objectDetection()` | `[1, 3, 640, 640]` | YOLO, SSD |
| `faceRecognition()` | `[1, 3, 112, 112]` | ArcFace, FaceNet |
| `clip()` | `[1, 3, 224, 224]` | CLIP models |
| `mobileNet()` | `[1, 3, 224, 224]` | MobileNet family |
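Every preset is used the same way as in the Quick Start. A minimal sketch for an object-detection model, assuming the presets take no extra constructor arguments; `frameBytes`, `height`, and `width` are placeholders:

```dart
// Preprocess a camera frame for a YOLO-style detector.
final pipeline = PipelinePresets.objectDetection();
final frame = TensorBuffer.fromUint8List(frameBytes, [height, width, 3]);
final modelInput = await pipeline.runAsync(frame); // shape: [1, 3, 640, 640]
```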
## Custom Pipeline

```dart
final pipeline = TensorPipeline([
  ResizeOp(height: 224, width: 224),
  ToTensorOp(normalize: true), // HWC -> CHW, scale to [0,1]
  NormalizeOp.imagenet(),      // ImageNet mean/std
  UnsqueezeOp.batch(),         // Add batch dimension
]);

// Sync execution
final result = pipeline.run(input);

// Async execution (runs in an isolate)
final asyncResult = await pipeline.runAsync(input);

// Async with a custom isolate threshold (default: 100,000 elements).
// Tensors below the threshold skip the isolate overhead and run synchronously.
final smallResult = await pipeline.runAsync(input, isolateThreshold: 50000);
```
## Available Operations
### Resize & Crop

- `ResizeOp` - Resize to fixed dimensions (nearest, bilinear, bicubic)
- `ResizeShortestOp` - Resize preserving aspect ratio
- `CenterCropOp` - Center crop to fixed dimensions
- `ClipOp` - Element-wise value clamping (presets: unit, symmetric, uint8)
- `PadOp` - Padding with multiple modes (constant, reflect, replicate, circular)
- `SliceOp` - Python-like tensor slicing with negative index support
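A common pairing is resize-shortest-side followed by a center crop, the standard ImageNet evaluation transform. A minimal sketch, assuming the constructor parameters shown for `ResizeShortestOp` and `CenterCropOp` (check the API reference for the exact signatures); `hwcImage` is a placeholder HWC uint8 tensor:

```dart
final evalTransform = TensorPipeline([
  ResizeShortestOp(size: 256),           // assumed parameter name
  CenterCropOp(height: 224, width: 224), // assumed parameter names
]);
final cropped = evalTransform.run(hwcImage);
```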
### Normalization

- `NormalizeOp` - Channel-wise normalization (presets: ImageNet, CIFAR-10, symmetric)
- `ScaleOp` - Scale values (e.g., 0-255 to 0-1)
- `BatchNormOp` - Batch normalization for CNN inference (PyTorch compatible)
- `LayerNormOp` - Layer normalization for Transformer inference (presets: BERT, BERT-Large)
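The usual image path is to scale pixels into [0, 1] and then standardize per channel. A minimal sketch using the operations already shown above; `hwcImage` is a placeholder HWC uint8 tensor:

```dart
final normalize = TensorPipeline([
  ToTensorOp(normalize: true), // HWC uint8 -> CHW float32 in [0, 1]
  NormalizeOp.imagenet(),      // per-channel (x - mean) / std with ImageNet stats
]);
final standardized = normalize.run(hwcImage);
```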
### Layout

- `PermuteOp` - Axis reordering (e.g., HWC to CHW)
- `ToTensorOp` - HWC uint8 to CHW float32 with optional scaling
- `ToImageOp` - CHW float32 to HWC uint8
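When only the axis order needs to change (no dtype conversion), `PermuteOp` is enough. A minimal sketch, assuming it takes an axis-order list like `tensor.transpose()` does:

```dart
final toChw = TensorPipeline([
  PermuteOp([2, 0, 1]), // HWC (H, W, C) -> CHW (C, H, W)
]);
final chw = toChw.run(hwcImage);
```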
### Data Augmentation

- `RandomCropOp` - Random cropping with deterministic seed support
- `GaussianBlurOp` - Gaussian blur using separable convolution
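A fixed seed makes augmentation reproducible across runs. A minimal sketch; the constructor parameter names (`height`/`width`/`seed`, `kernelSize`/`sigma`) are assumptions:

```dart
final augment = TensorPipeline([
  RandomCropOp(height: 200, width: 200, seed: 42), // deterministic with a seed
  GaussianBlurOp(kernelSize: 3, sigma: 1.0),       // separable convolution
]);
final augmented = augment.run(hwcImage);
```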
### Utility

- `concat()` - Concatenates tensors along a specified axis
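For example, two single-image tensors can be stacked into a batch of two along axis 0. A minimal sketch, assuming the positional `concat(tensors, axis)` signature listed in the PyTorch compatibility table below:

```dart
final a = TensorBuffer.zeros([1, 3, 224, 224]);
final b = TensorBuffer.ones([1, 3, 224, 224]);
final batch = concat([a, b], 0); // shape: [2, 3, 224, 224]
```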
### Shape

- `UnsqueezeOp` - Add a dimension
- `SqueezeOp` - Remove size-1 dimensions
- `ReshapeOp` - Reshape tensor (supports -1 for an inferred dimension)
- `FlattenOp` - Flatten dimensions
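A minimal sketch that adds a batch dimension and then flattens everything after it, letting `-1` infer the element count; the `ReshapeOp` constructor argument is an assumption:

```dart
final chwTensor = TensorBuffer.zeros([3, 224, 224]);
final flattenToRow = TensorPipeline([
  UnsqueezeOp.batch(), // [3, 224, 224] -> [1, 3, 224, 224]
  ReshapeOp([1, -1]),  // -> [1, 150528], second dimension inferred
]);
final row = flattenToRow.run(chwTensor);
```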
### Type

- `TypeCastOp` - Convert between data types
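A minimal sketch casting to int64, as required by models that take integer index inputs; the `TypeCastOp` constructor argument is an assumption:

```dart
final floats = TensorBuffer.ones([1, 4], dtype: DType.float32);
final toInt64 = TensorPipeline([
  TypeCastOp(DType.int64),
]);
final indices = toInt64.run(floats);
```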
## Core Classes

### TensorBuffer

A tensor with shape and stride metadata over physical storage.
```dart
// Create tensors
final zeros = TensorBuffer.zeros([3, 224, 224]);
final ones = TensorBuffer.ones([3, 224, 224], dtype: DType.float32);
final fromData = TensorBuffer.fromFloat32List(data, [3, 224, 224]);

// Access elements
final value = tensor[[0, 100, 100]];

// Zero-copy operations
final transposed = tensor.transpose([2, 0, 1]); // Changes strides only
final squeezed = tensor.squeeze();

// Copy operations
final contiguous = tensor.contiguous(); // Force contiguous memory
final cloned = tensor.clone();
```
### DType

ONNX-compatible data types with an `onnxId` for runtime integration.

```dart
DType.float32 // ONNX ID: 1
DType.int64   // ONNX ID: 7
DType.uint8   // ONNX ID: 2
```
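The `onnxId` lines up with ONNX's element-type enum, so it can be forwarded when binding a buffer to a runtime. A minimal sketch; the `dtype` getter on `TensorBuffer` is an assumption:

```dart
final elementType = tensor.dtype.onnxId; // e.g. 1 for DType.float32
```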
## Memory Formats

| Format | Layout | Strides (for shape [1, 3, 224, 224]) |
|---|---|---|
| `contiguous` | NCHW | [150528, 50176, 224, 1] |
| `channelsLast` | NHWC | [150528, 1, 672, 3] |
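These stride values follow directly from the layouts: in a contiguous (row-major) tensor the stride of an axis is the product of all later dimension sizes, and channels-last keeps the channel stride at 1. A plain-Dart sketch of that arithmetic, independent of the library API:

```dart
// Row-major strides: stride[i] = product of shape[i+1 .. end].
List<int> contiguousStrides(List<int> shape) {
  final strides = List<int>.filled(shape.length, 1);
  for (var i = shape.length - 2; i >= 0; i--) {
    strides[i] = strides[i + 1] * shape[i + 1];
  }
  return strides;
}

void main() {
  print(contiguousStrides([1, 3, 224, 224])); // [150528, 50176, 224, 1] (NCHW)
  // channelsLast stores the same data as NHWC; computing row-major strides for
  // [1, 224, 224, 3] and reading them back in N, C, H, W order gives
  // [150528, 1, 672, 3], matching the table above.
  print(contiguousStrides([1, 224, 224, 3])); // [150528, 672, 3, 1]
}
```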
## PyTorch Compatibility

This library is designed to produce identical results to PyTorch/torchvision operations:
| Operation | PyTorch Equivalent |
|---|---|
| `TensorBuffer.zeros()` | `torch.zeros()` |
| `TensorBuffer.ones()` | `torch.ones()` |
| `tensor.transpose()` | `tensor.permute()` |
| `tensor.reshape()` | `tensor.reshape()` |
| `tensor.squeeze()` | `tensor.squeeze()` |
| `tensor.unsqueeze()` | `tensor.unsqueeze()` |
| `tensor.sum()` / `sumAxis()` | `tensor.sum()` |
| `tensor.mean()` / `meanAxis()` | `tensor.mean()` |
| `tensor.min()` / `max()` | `tensor.min()` / `max()` |
| `NormalizeOp.imagenet()` | `transforms.Normalize(mean, std)` |
| `ResizeOp(mode: bilinear)` | `F.interpolate(mode='bilinear')` |
| `ToTensorOp()` | `transforms.ToTensor()` |
| `ClipOp(min, max)` | `torch.clamp(min, max)` |
| `PadOp(mode: reflect)` | `F.pad(mode='reflect')` |
| `SliceOp([(start, end, step)])` | `tensor[start:end:step]` |
| `concat(tensors, axis)` | `torch.cat(tensors, dim)` |
| `RandomCropOp` | `transforms.RandomCrop()` |
| `GaussianBlurOp` | `transforms.GaussianBlur()` |
| `AddOp` / `SubOp` | `torch.add()` / `torch.sub()` |
| `MulOp` / `DivOp` | `torch.mul()` / `torch.div()` |
| `PowOp` | `torch.pow()` |
| `AbsOp` / `NegOp` | `torch.abs()` / `torch.neg()` |
| `SqrtOp` / `ExpOp` / `LogOp` | `torch.sqrt()` / `exp()` / `log()` |
| `ReLUOp` / `LeakyReLUOp` | `F.relu()` / `F.leaky_relu()` |
| `SigmoidOp` / `TanhOp` | `torch.sigmoid()` / `torch.tanh()` |
| `SoftmaxOp` | `F.softmax()` |
| `BatchNormOp` | `torch.nn.BatchNorm2d` (inference) |
| `LayerNormOp` | `torch.nn.LayerNorm` |
| `TensorBuffer.full()` | `torch.full()` |
| `TensorBuffer.random()` | `torch.rand()` |
| `TensorBuffer.randn()` | `torch.randn()` |
| `TensorBuffer.eye()` | `torch.eye()` |
| `TensorBuffer.linspace()` | `torch.linspace()` |
| `TensorBuffer.arange()` | `torch.arange()` |
## Performance Benchmarks

Run the benchmarks with `dart run benchmark/run_all.dart`.
### Zero-Copy Operations (O(1))

| Operation | Time | Ops/sec |
|---|---|---|
| `transpose()` | ~1µs | 700K+ |
| `reshape()` | ~1µs | 1.6M+ |
| `squeeze()` | <1µs | 3.2M+ |
| `unsqueeze()` | ~1µs | 780K+ |
### Pipeline Performance

| Pipeline | Input Shape | Time |
|---|---|---|
| Simple (Normalize + Unsqueeze) | [3, 224, 224] | ~3.4ms |
| ImageNet Classification | [3, 224, 224] | ~3.0ms |
| Object Detection | [3, 640, 640] | ~25ms |
### Sync vs Async

| Execution | 224x224 | 640x640 |
|---|---|---|
| `run()` (sync) | ~3.5ms | ~29ms |
| `runAsync()` (isolate) | ~11ms | ~93ms |
| Isolate overhead | ~7ms | ~64ms |

Note: Use `runAsync()` for large tensors or when UI responsiveness is critical.
## Requirements
- Dart SDK ^3.0.0
## License
MIT