# dart_tensor_preprocessing

A tensor preprocessing library for Flutter/Dart: a NumPy-like transform pipeline for ONNX Runtime, TFLite, and other AI inference engines.
## Features
- PyTorch Compatible: Matches PyTorch/torchvision tensor operations
- Non-blocking: Isolate-based async execution prevents UI jank
- Type-safe: ONNX-compatible tensor types (Float32, Int64, Uint8, etc.)
- Zero-copy: View/stride manipulation for reshape/transpose operations
- Declarative: Chain operations into reusable pipelines
- SIMD Accelerated: Float32/Float64 vectorized operations for 2-4x speedup
- Memory Efficient: Buffer pooling, uninitialized allocation, fused operations
## Installation

```yaml
dependencies:
  dart_tensor_preprocessing: ^0.8.2
```
## Quick Start

```dart
import 'dart:typed_data';

import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

// Create a tensor from image data (HWC format, Uint8)
final imageData = Uint8List.fromList([/* RGBA pixel data */]);
final tensor = TensorBuffer.fromUint8List(imageData, [height, width, channels]);

// Use a preset pipeline for ImageNet models
final pipeline = PipelinePresets.imagenetClassification();
final result = await pipeline.runAsync(tensor);
// result.shape: [1, 3, 224, 224] (NCHW, Float32, normalized)
```
## Pipeline Presets

| Preset | Output Shape | Use Case |
|---|---|---|
| `imagenetClassification()` | [1, 3, 224, 224] | ResNet, VGG, etc. |
| `objectDetection()` | [1, 3, 640, 640] | YOLO, SSD |
| `faceRecognition()` | [1, 3, 112, 112] | ArcFace, FaceNet |
| `clip()` | [1, 3, 224, 224] | CLIP models |
| `mobileNet()` | [1, 3, 224, 224] | MobileNet family |
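Every preset returns an ordinary `TensorPipeline`, so switching model families is a one-line change. A minimal sketch using the preset names listed above:

```dart
import 'package:dart_tensor_preprocessing/dart_tensor_preprocessing.dart';

// Pick the preset that matches the target model family.
Future<TensorBuffer> preprocessForYolo(TensorBuffer hwcImage) {
  final pipeline = PipelinePresets.objectDetection();
  return pipeline.runAsync(hwcImage); // -> [1, 3, 640, 640] Float32
}
```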
## Custom Pipeline

```dart
final pipeline = TensorPipeline([
  ResizeOp(height: 224, width: 224),
  ToTensorOp(normalize: true), // HWC -> CHW, scale to [0, 1]
  NormalizeOp.imagenet(),      // ImageNet mean/std
  UnsqueezeOp.batch(),         // Add batch dimension
]);

// Sync execution
final syncResult = pipeline.run(input);

// Async execution (runs in an isolate)
final asyncResult = await pipeline.runAsync(input);

// Async with a custom isolate threshold (default: 100,000 elements).
// Tensors below the threshold skip isolate overhead and run synchronously.
final smallResult = await pipeline.runAsync(input, isolateThreshold: 50000);
```
## Available Operations

### Resize & Crop

- `ResizeOp` - Resize to fixed dimensions (nearest, bilinear, bicubic, area, lanczos) with ONNX-compatible coordinate transform modes
- `ResizeShortestOp` - Resize preserving aspect ratio
- `CenterCropOp` - Center crop to fixed dimensions
- `ClipOp` - Element-wise value clamping (presets: unit, symmetric, uint8)
- `PadOp` - Padding with multiple modes (constant, reflect, replicate, circular)
- `SliceOp` - Python-like tensor slicing with negative index support
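These combine into the classic torchvision-style evaluation transform: resize the shortest side, then center crop. A sketch; the `size:` parameter name on `ResizeShortestOp` is an assumption, so check the API reference for the exact signature:

```dart
// Resize shortest side to 256, then center crop to 224x224 (ImageNet eval).
final evalPipeline = TensorPipeline([
  ResizeShortestOp(size: 256), // parameter name assumed
  CenterCropOp(height: 224, width: 224),
  ToTensorOp(normalize: true),
  NormalizeOp.imagenet(),
  UnsqueezeOp.batch(),
]);
```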
### Normalization

- `NormalizeOp` - Channel-wise normalization (presets: ImageNet, CIFAR-10, symmetric)
- `ScaleOp` - Scale values (e.g., 0-255 to 0-1)
- `BatchNormOp` - Batch normalization for CNN inference (PyTorch compatible)
- `LayerNormOp` - Layer normalization for Transformer inference (presets: BERT, BERT-Large)
- `GroupNormOp` - Group normalization for modern CNNs (PyTorch compatible)
- `InstanceNormOp` - Instance normalization for style transfer and GANs (PyTorch compatible)
- `RMSNormOp` - Root Mean Square normalization for LLMs (LLaMA, Gemma)
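For models outside the presets, a custom `NormalizeOp` can be built directly. A sketch; the `mean:`/`std:` named parameters are assumptions inferred from the presets:

```dart
// Map [0, 1] inputs to [-1, 1], as many mobile models expect.
// The mean:/std: parameter names are assumed; see the API docs.
final normalize = NormalizeOp(
  mean: [0.5, 0.5, 0.5],
  std: [0.5, 0.5, 0.5],
);
```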
### Layout

- `PermuteOp` - Axis reordering (e.g., HWC to CHW)
- `ToTensorOp` - HWC uint8 to CHW float32 with optional scaling
- `ToImageOp` - CHW float32 to HWC uint8
### Data Augmentation

- `RandomCropOp` - Random cropping with deterministic seed support
- `GaussianBlurOp` - Gaussian blur using separable convolution
- `RandomHorizontalFlipOp` / `RandomVerticalFlipOp` - Probabilistic flip augmentation
- `HorizontalFlipOp` / `VerticalFlipOp` - Deterministic flip operations
- `RandomErasingOp` - Random erasing (cutout) augmentation
- `ColorJitterOp` - Random brightness, contrast, saturation, and hue jitter
- `AdjustBrightnessOp` / `AdjustContrastOp` / `AdjustSaturationOp` / `AdjustHueOp` - Individual color adjustments
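A sketch of a reproducible augmentation chain. The `seed:` and `probability:` parameter names are illustrative, based on the deterministic-seed support noted above:

```dart
final augment = TensorPipeline([
  RandomCropOp(height: 224, width: 224, seed: 42),    // seed: name assumed
  RandomHorizontalFlipOp(probability: 0.5, seed: 42), // parameters assumed
  ColorJitterOp(brightness: 0.2, contrast: 0.2, saturation: 0.2, hue: 0.05),
]);
```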
### Fused Operations

- `ResizeNormalizeFusedOp` - Combines resize and normalize in a single pass (eliminates the intermediate tensor)
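The fused op stands in for a separate `ResizeOp` + `NormalizeOp` pair. A sketch; the constructor parameters below mirror those two ops and are assumptions:

```dart
// One pass over the data: no intermediate resized tensor is allocated.
// Parameter names mirror ResizeOp/NormalizeOp and are assumed.
final fused = ResizeNormalizeFusedOp(
  height: 224,
  width: 224,
  mean: [0.485, 0.456, 0.406], // ImageNet mean
  std: [0.229, 0.224, 0.225],  // ImageNet std
);
final out = TensorPipeline([fused]).run(input);
```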
### Activation Functions

- `ReLUOp` - Rectified Linear Unit (SIMD accelerated)
- `LeakyReLUOp` - Leaky ReLU with configurable slope (SIMD accelerated)
- `GELUOp` - Gaussian Error Linear Unit (Transformers: BERT, GPT, ViT)
- `SiLUOp` / `SwishOp` - Sigmoid Linear Unit (EfficientNet, YOLOv5)
- `HardsigmoidOp` - Hardware-efficient sigmoid (MobileNetV3)
- `HardswishOp` - Hardware-efficient swish (MobileNetV3)
- `MishOp` - Self-regularizing activation (YOLOv4+)
- `ELUOp` - Exponential Linear Unit
- `SELUOp` - Scaled Exponential Linear Unit
- `GLUOp` - Gated Linear Unit
- `SigmoidOp` - Sigmoid activation
- `TanhOp` - Hyperbolic tangent activation
- `SoftmaxOp` - Softmax along a specified axis
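Activations are typically applied as post-processing on raw model outputs, e.g. turning classifier logits into probabilities. Wrapping the op in a single-step pipeline uses only the documented `run()` API; the `axis:` parameter name is an assumption:

```dart
// logits: shape [1, numClasses] from an ONNX/TFLite classifier head.
final probs = TensorPipeline([SoftmaxOp(axis: 1)]).run(logits);
```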
### Math Operations

- `AbsOp` - Absolute value (SIMD accelerated)
- `NegOp` - Negation (SIMD accelerated)
- `SqrtOp` - Square root (SIMD accelerated)
- `ExpOp` - Exponential (e^x)
- `LogOp` - Natural logarithm
- `PowOp` - Power operation
- `FloorOp` / `CeilOp` / `RoundOp` - Element-wise rounding operations
- `SinOp` / `CosOp` / `TanOp` - Trigonometric functions
- `AsinOp` / `AcosOp` / `AtanOp` / `Atan2Op` - Inverse trigonometric functions
### Arithmetic Operations

- `AddOp` / `SubOp` - Element-wise addition/subtraction (SIMD accelerated)
- `MulOp` / `DivOp` - Element-wise multiplication/division (SIMD accelerated)
### Normalization (continued)

- `LpNormalizeOp` - Lp normalization (L1, L2, Linf) along a dimension
### Tensor Manipulation

- `tensorWhere()` / `WhereOp` - Element-wise conditional selection
- `MaskedFillOp` - Fill tensor positions where mask is true
- `GatherOp` - Gather elements along a dimension by index
- `TileOp` - Tile/repeat tensor contents
- `RepeatOp` - Repeat tensor (PyTorch `.repeat()` semantics)
- `RollOp` - Circular shift along dimensions
- `PositionalEncodingOp` - Transformer positional encoding
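A sketch of a common transformer pre-step, masking padded positions before softmax; the `mask:`, `value:`, and `axis:` parameter names are assumptions:

```dart
// Fill padded positions with a large negative value, then softmax.
final attention = TensorPipeline([
  MaskedFillOp(mask: paddingMask, value: -1e9), // parameter names assumed
  SoftmaxOp(axis: -1),                          // axis: name assumed
]).run(attentionScores);
```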
### Utility

- `concat()` - Concatenates tensors along a specified axis
- `stack()` - Stacks tensors along a new dimension
- `split()` / `chunk()` - Split a tensor into parts along a dimension
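For example, individually preprocessed images can be batched with `stack()`. A sketch; the positional axis arguments shown are assumptions:

```dart
// Three [C, H, W] tensors -> one [3, C, H, W] batch.
final batch = stack([img0, img1, img2], 0);
// Concatenate two embeddings along the feature axis.
final features = concat([featA, featB], 1);
```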
### Shape

- `UnsqueezeOp` - Add a dimension
- `SqueezeOp` - Remove size-1 dimensions
- `ReshapeOp` - Reshape tensor (supports -1 for inference)
- `FlattenOp` - Flatten dimensions
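`ReshapeOp`'s -1 works like PyTorch's: one dimension may be inferred from the element count. A sketch:

```dart
// Flatten a [1, 512, 7, 7] feature map for a linear head;
// -1 is inferred as 512 * 7 * 7 = 25088, giving shape [1, 25088].
final flat = TensorPipeline([ReshapeOp([1, -1])]).run(features);
```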
### Type

- `TypeCastOp` - Convert between data types
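A typical use is producing the Int64 inputs many ONNX text models expect. A sketch; the constructor argument shown is an assumption:

```dart
// Cast token IDs to Int64 (ONNX tensor type 7) for a BERT-style model.
final ids = TensorPipeline([TypeCastOp(DType.int64)]).run(tokenTensor);
```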
## Core Classes

### TensorBuffer

A tensor with shape and stride metadata over physical storage.

```dart
// Create tensors
final zeros = TensorBuffer.zeros([3, 224, 224]);
final ones = TensorBuffer.ones([3, 224, 224], dtype: DType.float32);
final fromData = TensorBuffer.fromFloat32List(data, [3, 224, 224]);

// Access elements
final value = tensor[[0, 100, 100]];

// Zero-copy operations
final transposed = tensor.transpose([2, 0, 1]); // Changes strides only
final squeezed = tensor.squeeze();

// Copy operations
final contiguous = tensor.contiguous(); // Force contiguous memory
final cloned = tensor.clone();
```
### DType

ONNX-compatible data types with an `onnxId` for runtime integration.

```dart
DType.float32 // ONNX ID: 1
DType.int64   // ONNX ID: 7
DType.uint8   // ONNX ID: 2
```
### BufferPool

Memory pooling for buffer reuse, reducing GC pressure in hot paths.

```dart
final pool = BufferPool.instance;

// Acquire a buffer (reused from the pool if available)
final buffer = pool.acquireFloat32(1000);
// ... use buffer ...

// Release back to the pool for reuse
pool.release(buffer);

// Monitor pool usage
print('Pooled: ${pool.pooledCount} buffers, ${pool.pooledBytes} bytes');
```
## Zero-Copy View Operations

`TensorBuffer` extension methods for zero-copy tensor manipulation:

```dart
// Slice along the first dimension (batch slicing)
final batch = tensor.sliceFirst(2, 5); // Views elements 2..4

// Split a tensor into views
final items = tensor.unbind(0); // List of views along dim 0

// Select a single index (reduces rank)
final first = tensor.select(0, 0); // First item, shape reduced

// Narrow a dimension
final narrowed = tensor.narrow(0, 1, 3); // 3 elements starting at 1

// Format conversion without copying
final nhwc = nchwTensor.toChannelsLast();  // NCHW -> NHWC view
final nchw = nhwcTensor.toChannelsFirst(); // NHWC -> NCHW view

// Flatten to a 1D view
final flat = tensor.flatten();
```
## In-Place Operations

Many operations support in-place modification to avoid allocation overhead:

```dart
// In-place operations (modify the tensor directly)
ReLUOp().applyInPlace(tensor);
NormalizeOp.imagenet().applyInPlace(tensor);
ClipOp(min: 0, max: 1).applyInPlace(tensor);
BatchNormOp(...).applyInPlace(tensor);

// Query operation capabilities
final op = ReLUOp();
print(op.capabilities.supportsInPlace);    // true
print(op.capabilities.requiresContiguous); // true
print(op.capabilities.preservesShape);     // true
```

Operations supporting in-place: `ReLUOp`, `LeakyReLUOp`, `SigmoidOp`, `TanhOp`, `AbsOp`, `NegOp`, `SqrtOp`, `ExpOp`, `LogOp`, `PowOp`, `AddOp`, `SubOp`, `MulOp`, `DivOp`, `ClipOp`, `NormalizeOp`, `ScaleOp`, `BatchNormOp`, `LayerNormOp`, `GroupNormOp`, `InstanceNormOp`, `RMSNormOp`, `SELUOp`, `LpNormalizeOp`, `MaskedFillOp`, `RandomErasingOp`.
## Memory Formats

| Format | Layout | Strides (for [1, 3, 224, 224]) |
|---|---|---|
| `contiguous` | NCHW | [150528, 50176, 224, 1] |
| `channelsLast` | NHWC | [150528, 1, 672, 3] |
## PyTorch Compatibility

This library is designed to produce results identical to PyTorch/torchvision operations:

| Operation | PyTorch Equivalent |
|---|---|
| `TensorBuffer.zeros()` | `torch.zeros()` |
| `TensorBuffer.ones()` | `torch.ones()` |
| `tensor.transpose()` | `tensor.permute()` |
| `tensor.reshape()` | `tensor.reshape()` |
| `tensor.squeeze()` | `tensor.squeeze()` |
| `tensor.unsqueeze()` | `tensor.unsqueeze()` |
| `tensor.sum()` / `sumAxis()` | `tensor.sum()` |
| `tensor.sumAxes([...])` | `tensor.sum(dim=[...])` |
| `tensor.mean()` / `meanAxis()` | `tensor.mean()` |
| `tensor.meanAxes([...])` | `tensor.mean(dim=[...])` |
| `tensor.min()` / `max()` | `tensor.min()` / `max()` |
| `tensor.minAxes([...])` | `tensor.amin(dim=[...])` |
| `tensor.maxAxes([...])` | `tensor.amax(dim=[...])` |
| `NormalizeOp.imagenet()` | `transforms.Normalize(mean, std)` |
| `ResizeOp(mode: bilinear)` | `F.interpolate(mode='bilinear')` |
| `ResizeOp(mode: area)` | `F.interpolate(mode='area')` |
| `ResizeOp(mode: lanczos)` | Lanczos3 interpolation |
| `ResizeOp(coordinateMode: halfPixel)` | ONNX Resize `half_pixel` |
| `ResizeOp(coordinateMode: asymmetric)` | ONNX Resize `asymmetric` (TF default) |
| `ResizeOp(coordinateMode: pytorchHalfPixel)` | ONNX Resize `pytorch_half_pixel` |
| `ToTensorOp()` | `transforms.ToTensor()` |
| `ClipOp(min, max)` | `torch.clamp(min, max)` |
| `PadOp(mode: reflect)` | `F.pad(mode='reflect')` |
| `SliceOp([(start, end, step)])` | `tensor[start:end:step]` |
| `concat(tensors, axis)` | `torch.cat(tensors, dim)` |
| `stack(tensors, dim)` | `torch.stack(tensors, dim)` |
| `RandomCropOp` | `transforms.RandomCrop()` |
| `GaussianBlurOp` | `transforms.GaussianBlur()` |
| `AddOp` / `SubOp` | `torch.add()` / `torch.sub()` |
| `MulOp` / `DivOp` | `torch.mul()` / `torch.div()` |
| `PowOp` | `torch.pow()` |
| `AbsOp` / `NegOp` | `torch.abs()` / `torch.neg()` |
| `SqrtOp` / `ExpOp` / `LogOp` | `torch.sqrt()` / `exp()` / `log()` |
| `FloorOp` / `CeilOp` / `RoundOp` | `torch.floor()` / `ceil()` / `round()` |
| `SinOp` / `CosOp` / `TanOp` | `torch.sin()` / `cos()` / `tan()` |
| `AsinOp` / `AcosOp` / `AtanOp` | `torch.asin()` / `acos()` / `atan()` |
| `Atan2Op` | `torch.atan2()` |
| `ReLUOp` / `LeakyReLUOp` | `F.relu()` / `F.leaky_relu()` |
| `GELUOp` | `F.gelu()` |
| `SiLUOp` / `SwishOp` | `F.silu()` |
| `HardsigmoidOp` | `F.hardsigmoid()` |
| `HardswishOp` | `F.hardswish()` |
| `MishOp` | `F.mish()` |
| `ELUOp` | `F.elu()` |
| `SigmoidOp` / `TanhOp` | `torch.sigmoid()` / `torch.tanh()` |
| `SoftmaxOp` | `F.softmax()` |
| `SELUOp` | `F.selu()` |
| `GLUOp` | `F.glu()` |
| `BatchNormOp` | `torch.nn.BatchNorm2d` (inference) |
| `LayerNormOp` | `torch.nn.LayerNorm` |
| `GroupNormOp` | `torch.nn.GroupNorm` |
| `InstanceNormOp` | `torch.nn.InstanceNorm2d` |
| `RMSNormOp` | `torch.nn.RMSNorm` (PyTorch 2.4+) |
| `TensorBuffer.full()` | `torch.full()` |
| `TensorBuffer.random()` | `torch.rand()` |
| `TensorBuffer.randn()` | `torch.randn()` |
| `TensorBuffer.eye()` | `torch.eye()` |
| `TensorBuffer.linspace()` | `torch.linspace()` |
| `TensorBuffer.arange()` | `torch.arange()` |
| `tensor.select(dim, index)` | `tensor.select(dim, index)` |
| `tensor.narrow(dim, start, len)` | `tensor.narrow(dim, start, len)` |
| `tensor.unbind(dim)` | `tensor.unbind(dim)` |
| `tensor.flatten()` | `tensor.flatten()` |
| `LpNormalizeOp` | `F.normalize()` |
| `tensorWhere()` / `WhereOp` | `torch.where()` |
| `MaskedFillOp` | `Tensor.masked_fill_()` |
| `GatherOp` | `torch.gather()` |
| `split()` / `chunk()` | `torch.split()` / `torch.chunk()` |
| `TileOp` | `Tensor.repeat()` / ONNX Tile |
| `RepeatOp` | `Tensor.repeat()` |
| `RollOp` | `torch.roll()` |
| `RandomHorizontalFlipOp` | `transforms.RandomHorizontalFlip()` |
| `RandomVerticalFlipOp` | `transforms.RandomVerticalFlip()` |
| `RandomErasingOp` | `transforms.RandomErasing()` |
| `ColorJitterOp` | `transforms.ColorJitter()` |
| `PositionalEncodingOp` | Transformer positional encoding |
| `ResizeNormalizeFusedOp` | `F.interpolate()` + `transforms.Normalize()` (fused) |
## Performance Benchmarks

Run benchmarks with `dart run benchmark/run_all.dart`.
### SIMD Acceleration

Operations with Float32x4/Float64x2 SIMD vectorization:

| Operation | SIMD Throughput | Speedup |
|---|---|---|
| `ClipOp` | ~6.2 GE/s (Float32) | ~4x |
| `AbsOp` | ~6.2 GE/s (Float32) | ~4x |
| `SqrtOp` | ~6.2 GE/s (Float32) | ~4x |
| `NormalizeOp` | ~6.2 GE/s (Float32) | ~4x |
| `ReLUOp` / `LeakyReLUOp` | ~6.2 GE/s (Float32) | ~4x |
| `ScaleOp` | ~6.2 GE/s (Float32) | ~4x |
| `AddOp` / `SubOp` / `MulOp` / `DivOp` | ~6.2 GE/s (Float32) | ~4x |

GE/s = giga-elements per second. Float64 SIMD reaches ~53% of Float32 throughput, since `Float64x2` processes two lanes per instruction versus four for `Float32x4`.
### Operation Complexity

| Operation | Time Complexity | Space Complexity |
|---|---|---|
| `ResizeOp` (bilinear) | O(C × H × W) | O(C × H × W) |
| `ResizeOp` (bicubic) | O(C × H × W × 16) | O(C × H × W) |
| `ResizeOp` (lanczos) | O(C × H × W × 36) | O(C × H × W) |
| `NormalizeOp` | O(n) | O(n), or O(1) in-place |
| `BatchNormOp` | O(n) | O(n), or O(1) in-place |
| `LayerNormOp` | O(n) | O(n), or O(1) in-place |
| `GaussianBlurOp` | O(C × H × W × k) | O(C × H × W) |
| `ResizeNormalizeFusedOp` | O(C × H × W) | O(C × H × W) |
### Zero-Copy Operations (O(1))

| Operation | Time | Ops/sec |
|---|---|---|
| `transpose()` | ~1µs | 700K+ |
| `reshape()` | ~1µs | 1.6M+ |
| `squeeze()` | <1µs | 3.2M+ |
| `unsqueeze()` | ~1µs | 780K+ |
### Pipeline Performance

| Pipeline | Input Shape | Time |
|---|---|---|
| Simple (Normalize + Unsqueeze) | [3, 224, 224] | ~3.4ms |
| ImageNet Classification | [3, 224, 224] | ~3.0ms |
| Object Detection | [3, 640, 640] | ~25ms |
### Sync vs Async

| Execution | 224×224 | 640×640 |
|---|---|---|
| `run()` (sync) | ~3.5ms | ~29ms |
| `runAsync()` (isolate) | ~11ms | ~93ms |
| Isolate overhead | ~7ms | ~64ms |

Note: use `runAsync()` for large tensors or when UI responsiveness is critical; for small tensors, the isolate round-trip costs more than the preprocessing itself.
## Requirements

- Dart SDK ^3.0.0

## License

MIT