# dart_tensor_preprocessing

A tensor preprocessing library for Flutter/Dart: a NumPy-style transform pipeline for preparing inputs for ONNX Runtime, TFLite, and other AI inference engines.
## Features
- PyTorch Compatible: Matches PyTorch/torchvision tensor operations
- Non-blocking: Isolate-based async execution prevents UI jank
- Type-safe: ONNX-compatible tensor types (Float32, Int64, Uint8, etc.)
- Zero-copy: View/stride manipulation for reshape/transpose operations
- Declarative: Chain operations into reusable pipelines
## Installation

```yaml
dependencies:
  dart_tensor_preprocessing: ^0.5.0
```
## Quick Start

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

// Create a tensor from image data (HWC format, Uint8)
final imageData = Uint8List.fromList([/* RGBA pixel data */]);
final tensor = TensorBuffer.fromUint8List(imageData, [height, width, channels]);

// Use a preset pipeline for ImageNet models
final pipeline = PipelinePresets.imagenetClassification();
final result = await pipeline.runAsync(tensor);
// result.shape: [1, 3, 224, 224] (NCHW, Float32, normalized)
```
## Pipeline Presets

| Preset | Output Shape | Use Case |
|---|---|---|
| `imagenetClassification()` | `[1, 3, 224, 224]` | ResNet, VGG, etc. |
| `objectDetection()` | `[1, 3, 640, 640]` | YOLO, SSD |
| `faceRecognition()` | `[1, 3, 112, 112]` | ArcFace, FaceNet |
| `clip()` | `[1, 3, 224, 224]` | CLIP models |
| `mobileNet()` | `[1, 3, 224, 224]` | MobileNet family |
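Every preset is used the same way as in the Quick Start. A minimal sketch for an object-detection model, assuming the presets take no extra constructor arguments; `frameBytes`, `height`, and `width` are placeholders:

```dart
// Preprocess a camera frame for a YOLO-style detector.
final pipeline = PipelinePresets.objectDetection();
final frame = TensorBuffer.fromUint8List(frameBytes, [height, width, 3]);
final modelInput = await pipeline.runAsync(frame); // shape: [1, 3, 640, 640]
```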
## Custom Pipeline

```dart
final pipeline = TensorPipeline([
  ResizeOp(height: 224, width: 224),
  ToTensorOp(normalize: true), // HWC -> CHW, scale to [0,1]
  NormalizeOp.imagenet(),      // ImageNet mean/std
  UnsqueezeOp.batch(),         // Add batch dimension
]);

// Sync execution
final result = pipeline.run(input);

// Async execution (runs in an isolate)
final asyncResult = await pipeline.runAsync(input);

// Async with a custom isolate threshold (default: 100,000 elements).
// Tensors below the threshold skip the isolate overhead and run synchronously.
final smallResult = await pipeline.runAsync(input, isolateThreshold: 50000);
```
## Available Operations
### Resize & Crop

- `ResizeOp` - Resize to fixed dimensions (nearest, bilinear, bicubic)
- `ResizeShortestOp` - Resize preserving aspect ratio
- `CenterCropOp` - Center crop to fixed dimensions
- `ClipOp` - Element-wise value clamping (presets: unit, symmetric, uint8)
- `PadOp` - Padding with multiple modes (constant, reflect, replicate, circular)
- `SliceOp` - Python-like tensor slicing with negative index support
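A common pairing is resize-shortest-side followed by a center crop, the standard ImageNet evaluation transform. A minimal sketch, assuming the constructor parameters shown for `ResizeShortestOp` and `CenterCropOp` (check the API reference for the exact signatures); `hwcImage` is a placeholder HWC uint8 tensor:

```dart
final evalTransform = TensorPipeline([
  ResizeShortestOp(size: 256),           // assumed parameter name
  CenterCropOp(height: 224, width: 224), // assumed parameter names
]);
final cropped = evalTransform.run(hwcImage);
```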
### Normalization

- `NormalizeOp` - Channel-wise normalization (presets: ImageNet, CIFAR-10, symmetric)
- `ScaleOp` - Scale values (e.g., 0-255 to 0-1)
- `BatchNormOp` - Batch normalization for CNN inference (PyTorch compatible)
- `LayerNormOp` - Layer normalization for Transformer inference (presets: BERT, BERT-Large)
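The usual image path is to scale pixels into [0, 1] and then standardize per channel. A minimal sketch using the operations already shown above; `hwcImage` is a placeholder HWC uint8 tensor:

```dart
final normalize = TensorPipeline([
  ToTensorOp(normalize: true), // HWC uint8 -> CHW float32 in [0, 1]
  NormalizeOp.imagenet(),      // per-channel (x - mean) / std with ImageNet stats
]);
final standardized = normalize.run(hwcImage);
```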
### Layout

- `PermuteOp` - Axis reordering (e.g., HWC to CHW)
- `ToTensorOp` - HWC uint8 to CHW float32 with optional scaling
- `ToImageOp` - CHW float32 to HWC uint8
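When only the axis order needs to change (no dtype conversion), `PermuteOp` is enough. A minimal sketch, assuming it takes an axis-order list like `tensor.transpose()` does:

```dart
final toChw = TensorPipeline([
  PermuteOp([2, 0, 1]), // HWC (H, W, C) -> CHW (C, H, W)
]);
final chw = toChw.run(hwcImage);
```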
### Data Augmentation

- `RandomCropOp` - Random cropping with deterministic seed support
- `GaussianBlurOp` - Gaussian blur using separable convolution
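A fixed seed makes augmentation reproducible across runs. A minimal sketch; the constructor parameter names (`height`/`width`/`seed`, `kernelSize`/`sigma`) are assumptions:

```dart
final augment = TensorPipeline([
  RandomCropOp(height: 200, width: 200, seed: 42), // deterministic with a seed
  GaussianBlurOp(kernelSize: 3, sigma: 1.0),       // separable convolution
]);
final augmented = augment.run(hwcImage);
```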
### Utility

- `concat()` - Concatenates tensors along a specified axis
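For example, two single-image tensors can be stacked into a batch of two along axis 0. A minimal sketch, assuming the positional `concat(tensors, axis)` signature listed in the PyTorch compatibility table below:

```dart
final a = TensorBuffer.zeros([1, 3, 224, 224]);
final b = TensorBuffer.ones([1, 3, 224, 224]);
final batch = concat([a, b], 0); // shape: [2, 3, 224, 224]
```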
### Shape

- `UnsqueezeOp` - Add a dimension
- `SqueezeOp` - Remove size-1 dimensions
- `ReshapeOp` - Reshape tensor (supports -1 for an inferred dimension)
- `FlattenOp` - Flatten dimensions
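A minimal sketch that adds a batch dimension and then flattens everything after it, letting `-1` infer the element count; the `ReshapeOp` constructor argument is an assumption:

```dart
final chwTensor = TensorBuffer.zeros([3, 224, 224]);
final flattenToRow = TensorPipeline([
  UnsqueezeOp.batch(), // [3, 224, 224] -> [1, 3, 224, 224]
  ReshapeOp([1, -1]),  // -> [1, 150528], second dimension inferred
]);
final row = flattenToRow.run(chwTensor);
```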
### Type

- `TypeCastOp` - Convert between data types
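A minimal sketch casting to int64, as required by models that take integer index inputs; the `TypeCastOp` constructor argument is an assumption:

```dart
final floats = TensorBuffer.ones([1, 4], dtype: DType.float32);
final toInt64 = TensorPipeline([
  TypeCastOp(DType.int64),
]);
final indices = toInt64.run(floats);
```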
## Core Classes

### TensorBuffer

A tensor with shape and stride metadata over physical storage.
```dart
// Create tensors
final zeros = TensorBuffer.zeros([3, 224, 224]);
final ones = TensorBuffer.ones([3, 224, 224], dtype: DType.float32);
final fromData = TensorBuffer.fromFloat32List(data, [3, 224, 224]);

// Access elements
final value = tensor[[0, 100, 100]];

// Zero-copy operations
final transposed = tensor.transpose([2, 0, 1]); // Changes strides only
final squeezed = tensor.squeeze();

// Copy operations
final contiguous = tensor.contiguous(); // Force contiguous memory
final cloned = tensor.clone();
```
### DType

ONNX-compatible data types with an `onnxId` for runtime integration.

```dart
DType.float32 // ONNX ID: 1
DType.int64   // ONNX ID: 7
DType.uint8   // ONNX ID: 2
```
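The `onnxId` lines up with ONNX's element-type enum, so it can be forwarded when binding a buffer to a runtime. A minimal sketch; the `dtype` getter on `TensorBuffer` is an assumption:

```dart
final elementType = tensor.dtype.onnxId; // e.g. 1 for DType.float32
```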
## Memory Formats

| Format | Layout | Strides (for shape [1, 3, 224, 224]) |
|---|---|---|
| `contiguous` | NCHW | [150528, 50176, 224, 1] |
| `channelsLast` | NHWC | [150528, 1, 672, 3] |
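These stride values follow directly from the layouts: in a contiguous (row-major) tensor the stride of an axis is the product of all later dimension sizes, and channels-last keeps the channel stride at 1. A plain-Dart sketch of that arithmetic, independent of the library API:

```dart
// Row-major strides: stride[i] = product of shape[i+1 .. end].
List<int> contiguousStrides(List<int> shape) {
  final strides = List<int>.filled(shape.length, 1);
  for (var i = shape.length - 2; i >= 0; i--) {
    strides[i] = strides[i + 1] * shape[i + 1];
  }
  return strides;
}

void main() {
  print(contiguousStrides([1, 3, 224, 224])); // [150528, 50176, 224, 1] (NCHW)
  // channelsLast stores the same data as NHWC; computing row-major strides for
  // [1, 224, 224, 3] and reading them back in N, C, H, W order gives
  // [150528, 1, 672, 3], matching the table above.
  print(contiguousStrides([1, 224, 224, 3])); // [150528, 672, 3, 1]
}
```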
## PyTorch Compatibility

This library is designed to produce identical results to PyTorch/torchvision operations:
| Operation | PyTorch Equivalent |
|---|---|
| `TensorBuffer.zeros()` | `torch.zeros()` |
| `TensorBuffer.ones()` | `torch.ones()` |
| `tensor.transpose()` | `tensor.permute()` |
| `tensor.reshape()` | `tensor.reshape()` |
| `tensor.squeeze()` | `tensor.squeeze()` |
| `tensor.unsqueeze()` | `tensor.unsqueeze()` |
| `tensor.sum()` / `sumAxis()` | `tensor.sum()` |
| `tensor.mean()` / `meanAxis()` | `tensor.mean()` |
| `tensor.min()` / `max()` | `tensor.min()` / `max()` |
| `NormalizeOp.imagenet()` | `transforms.Normalize(mean, std)` |
| `ResizeOp(mode: bilinear)` | `F.interpolate(mode='bilinear')` |
| `ToTensorOp()` | `transforms.ToTensor()` |
| `ClipOp(min, max)` | `torch.clamp(min, max)` |
| `PadOp(mode: reflect)` | `F.pad(mode='reflect')` |
| `SliceOp([(start, end, step)])` | `tensor[start:end:step]` |
| `concat(tensors, axis)` | `torch.cat(tensors, dim)` |
| `RandomCropOp` | `transforms.RandomCrop()` |
| `GaussianBlurOp` | `transforms.GaussianBlur()` |
| `AddOp` / `SubOp` | `torch.add()` / `torch.sub()` |
| `MulOp` / `DivOp` | `torch.mul()` / `torch.div()` |
| `PowOp` | `torch.pow()` |
| `AbsOp` / `NegOp` | `torch.abs()` / `torch.neg()` |
| `SqrtOp` / `ExpOp` / `LogOp` | `torch.sqrt()` / `exp()` / `log()` |
| `ReLUOp` / `LeakyReLUOp` | `F.relu()` / `F.leaky_relu()` |
| `SigmoidOp` / `TanhOp` | `torch.sigmoid()` / `torch.tanh()` |
| `SoftmaxOp` | `F.softmax()` |
| `BatchNormOp` | `torch.nn.BatchNorm2d` (inference) |
| `LayerNormOp` | `torch.nn.LayerNorm` |
| `TensorBuffer.full()` | `torch.full()` |
| `TensorBuffer.random()` | `torch.rand()` |
| `TensorBuffer.randn()` | `torch.randn()` |
| `TensorBuffer.eye()` | `torch.eye()` |
| `TensorBuffer.linspace()` | `torch.linspace()` |
| `TensorBuffer.arange()` | `torch.arange()` |
## Performance Benchmarks

Run the benchmarks with `dart run benchmark/run_all.dart`.
### Zero-Copy Operations (O(1))

| Operation | Time | Ops/sec |
|---|---|---|
| `transpose()` | ~1µs | 700K+ |
| `reshape()` | ~1µs | 1.6M+ |
| `squeeze()` | <1µs | 3.2M+ |
| `unsqueeze()` | ~1µs | 780K+ |
### Pipeline Performance

| Pipeline | Input Shape | Time |
|---|---|---|
| Simple (Normalize + Unsqueeze) | [3, 224, 224] | ~3.4ms |
| ImageNet Classification | [3, 224, 224] | ~3.0ms |
| Object Detection | [3, 640, 640] | ~25ms |
### Sync vs Async

| Execution | 224x224 | 640x640 |
|---|---|---|
| `run()` (sync) | ~3.5ms | ~29ms |
| `runAsync()` (isolate) | ~11ms | ~93ms |
| Isolate overhead | ~7ms | ~64ms |

Note: Use `runAsync()` for large tensors or when UI responsiveness is critical.
## Requirements
- Dart SDK ^3.0.0
## License
MIT