High-performance ONNX Runtime plugin with WebGPU support for Flutter. Run ML models on Android, iOS, and Linux with optimized execution providers.
flutter_ort_plugin #
Flutter plugin for ONNX Runtime inference via Dart FFI. Load .onnx models and run them natively on Android, iOS, and Linux.
ONNX Runtime version: 1.24.1
Platform Support #
| Platform | Minimum Version | Execution Providers | Status |
|---|---|---|---|
| Android | API 24 (Android 7.0) | WebGPU, NNAPI, XNNPACK, CPU | ✅ Full support |
| iOS | iOS 15.1 | CoreML, CPU | ✅ Full support |
| Linux | Any | CPU only | ✅ CPU support |
Installation #
```yaml
dependencies:
  flutter_ort_plugin:
    git:
      url: https://github.com/adrinator/flutter_ort_plugin.git
```
Platform setup #
iOS #
- Minimum version: iOS 15.1
- Dependency: CocoaPods `onnxruntime-c` (linked automatically by the plugin)
Recommended steps:
```sh
cd ios
pod install
```
If you run into linking/symbol issues when using `DynamicLibrary.process()` on iOS, ensure the plugin is properly registered in your app (Flutter plugin registrant) and do a clean build.
Android #
- Minimum SDK: API 24 (`minSdk 24`)
- Runtime: `onnxruntime-android` 1.24.1 (pulled via Gradle by the plugin)
- NDK: required to build the plugin FFI target (see the NDK version configured in the plugin)
ONNX Runtime Build Strategies
The plugin supports two different ONNX Runtime builds for Android:
| Strategy | Description | Size | When to Use |
|---|---|---|---|
| Standard | Basic CPU execution only | Smaller | Simple models, CPU-only inference |
| Providers | Full provider support (WebGPU, NNAPI, XNNPACK) | Larger | Performance-critical apps with GPU/NPU |
Building with Custom Strategy
By default, the plugin uses the standard build. To use the providers build with WebGPU/NNAPI/XNNPACK support:
```sh
# Build with providers (includes WebGPU, NNAPI, XNNPACK)
flutter build apk --android-project-arg=ORT_STRATEGY=providers

# Or for App Bundle
flutter build appbundle --android-project-arg=ORT_STRATEGY=providers

# For debug builds
flutter build apk --debug --android-project-arg=ORT_STRATEGY=providers
```
Provider Requirements
- WebGPU: Requires Android device with GPU support and Vulkan drivers
- NNAPI: Requires Android API 27+ for best compatibility
- XNNPACK: Works on all ARM devices (NEON SIMD)
The providers build is larger but enables hardware acceleration. Use the standard build for smaller app size or if you only need CPU inference.
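Because availability differs per device and per build strategy, a runtime check before pinning a provider list can save surprises. Below is a minimal sketch using the provider-query and session APIs described later in this README; the fallback order is only an example:

```dart
// Pick providers based on what this device/build actually exposes.
// CPU stays last as the always-available fallback.
final providers = OrtProviders(OnnxRuntime.instance);

final selected = <OrtProvider>[
  if (providers.isProviderAvailable(OrtProvider.webGpu)) OrtProvider.webGpu,
  if (providers.isProviderAvailable(OrtProvider.nnapi)) OrtProvider.nnapi,
  if (providers.isProviderAvailable(OrtProvider.xnnpack)) OrtProvider.xnnpack,
  OrtProvider.cpu,
];

final session = OrtSessionWrapper.createWithProviders(
  'model.onnx',
  providers: selected,
);
```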
Performance Considerations
CPU vs Providers: Many models actually perform better with CPU inference than with hardware providers, especially:
- Small to medium-sized models (<50MB)
- Models with many small operations
- Models not optimized for mobile GPUs/NPUs
- First inference runs (provider warm-up overhead)
Recommendation: Always test both strategies with your specific model:
```sh
# Test standard CPU build
flutter build apk --debug
# Run benchmarks with your model

# Test providers build
flutter build apk --debug --android-project-arg=ORT_STRATEGY=providers
# Run benchmarks with your model
# Compare inference latency and accuracy
```
The providers build shines with large models (>100MB) and operations well-suited for parallel GPU execution, but don't assume it's always faster.
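For the benchmark step itself, wrapping `runFloat` in a stopwatch is usually enough to compare the two builds. A rough sketch (the warm-up, run count, and output element count are placeholder choices for your model):

```dart
// Rough latency check: one warm-up run, then the average over a few timed runs.
double averageLatencyMs(OrtSessionWrapper session, OrtValueWrapper input,
    {int runs = 10, int outputElements = 1000}) {
  session.runFloat({session.inputNames.first: input}, [outputElements]); // warm-up

  final sw = Stopwatch()..start();
  for (var i = 0; i < runs; i++) {
    session.runFloat({session.inputNames.first: input}, [outputElements]);
  }
  sw.stop();
  return sw.elapsedMilliseconds / runs;
}
```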
Quick Start #
```dart
import 'dart:typed_data';
import 'package:flutter_ort_plugin/flutter_ort_plugin.dart';

// 1. Initialize runtime (once)
final runtime = OnnxRuntime.instance;
runtime.initialize();
runtime.createEnvironment();

// 2. Load model (auto-selects best provider for the platform)
final session = OrtSessionWrapper.create('path/to/model.onnx');

// 3. Create input tensor
final input = OrtValueWrapper.fromFloat(
  runtime,
  [1, 3, 224, 224], // shape
  Float32List(1 * 3 * 224 * 224), // data
);

// 4. Run inference -> pure Dart output
final results = session.runFloat(
  {session.inputNames.first: input},
  [1000], // output element count
);
final predictions = results.first; // Float32List

// 5. Cleanup
input.release();
session.dispose();
runtime.dispose();
```
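`OrtSessionWrapper.create` takes a filesystem path, so a model shipped as a Flutter asset typically has to be written to disk first. One common pattern, assuming the `path_provider` package and an `assets/model.onnx` entry in your pubspec (names are illustrative):

```dart
import 'dart:io';

import 'package:flutter/services.dart' show rootBundle;
import 'package:path_provider/path_provider.dart';

// Copy a bundled asset to a real file so its path can be passed to the plugin.
Future<String> materializeModel() async {
  final data = await rootBundle.load('assets/model.onnx');
  final dir = await getApplicationSupportDirectory();
  final file = File('${dir.path}/model.onnx');
  await file.writeAsBytes(
    data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes),
    flush: true,
  );
  return file.path;
}
```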
Avoid UI freezes (Isolate) #
FFI calls are synchronous. Heavy model loading or inference can block the Flutter UI thread.
Use OrtIsolateSession to run everything in a background isolate:
```dart
final runtime = OnnxRuntime.instance;
runtime.initialize();
runtime.createEnvironment();

final session = await OrtIsolateSession.create(
  OrtIsolateSessionConfig(modelPath: 'path/to/model.onnx'),
);

final input = OrtIsolateInput(
  shape: [1, 1, 28, 28],
  data: Float32List(28 * 28),
);

final outputs = await session.runFloat(
  {session.inputNames.first: input},
  [10],
);

await session.dispose();
```
Execution Providers #
The plugin auto-detects the best provider per platform:
| Platform | Default providers | Supported | Notes |
|---|---|---|---|
| iOS | CoreML, CPU | ✅ Fully | CoreML via dedicated config |
| Android | WebGPU, NNAPI, XNNPACK, CPU | ✅ Fully | WebGPU via Dawn, NNAPI flags, XNNPACK threads |
| Linux | CPU | ✅ CPU only | CPU execution provider |
Note: Android providers (WebGPU, NNAPI, XNNPACK) require building with `--android-project-arg=ORT_STRATEGY=providers`. See the Android setup section for details.
Provider Implementation Status #
| Provider | Status | Platform | Notes |
|---|---|---|---|
| CPU | ✅ Ready | All | Always available, built-in |
| CoreML | ✅ Ready | iOS | Apple Neural Engine/GPU acceleration |
| WebGPU | ✅ Ready | Android | GPU acceleration via Dawn/WebGPU support |
| NNAPI | ✅ Ready | Android | NPU/GPU with FP16/NCHW/CPU-disabled flags |
| XNNPACK | ✅ Ready | Android | CPU SIMD optimization with thread config |
| QNN | ⚠️ Generic only | Android | Qualcomm NPU via generic API |
Automatic (default) #
```dart
// Providers are selected automatically
final session = OrtSessionWrapper.create('model.onnx');
```
Manual #
```dart
final session = OrtSessionWrapper.createWithProviders(
  'model.onnx',
  providers: [OrtProvider.coreML, OrtProvider.cpu],
  providerOptions: {
    OrtProvider.coreML: {'MLComputeUnits': 'ALL'},
  },
);
```
XNNPACK (Android optimized CPU) #
XNNPACK is the recommended provider for Android devices without a dedicated NPU:
```dart
import 'package:flutter_ort_plugin/flutter_ort_plugin.dart';

final session = OrtSessionWrapper.createWithProviders(
  'model.onnx',
  providers: [OrtProvider.xnnpack, OrtProvider.cpu],
  providerOptions: {
    OrtProvider.xnnpack: XnnpackOptions(
      numThreads: 4, // Use 4 threads (default: all cores)
    ).toMap(),
  },
);
```
NNAPI (Android NPU/GPU) #
NNAPI supports hardware acceleration but may have compatibility issues with some models:
```dart
final session = OrtSessionWrapper.createWithProviders(
  'model.onnx',
  providers: [OrtProvider.nnapi, OrtProvider.cpu],
  providerOptions: {
    OrtProvider.nnapi: {
      'use_fp16': 'true', // Use FP16 for faster inference
      'use_nchw': 'false', // Keep NHWC format
    },
  },
);
```
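Given those compatibility caveats, it can be worth guarding NNAPI session creation and falling back to a plain CPU session. A defensive sketch, assuming session creation throws when the provider cannot be set up (exact error behavior may differ):

```dart
OrtSessionWrapper loadWithNnapiFallback(String modelPath) {
  try {
    return OrtSessionWrapper.createWithProviders(
      modelPath,
      providers: [OrtProvider.nnapi, OrtProvider.cpu],
    );
  } catch (_) {
    // NNAPI setup failed for this model/device: use the default (CPU) path.
    return OrtSessionWrapper.create(modelPath);
  }
}
```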
WebGPU (Android GPU acceleration) #
WebGPU provides hardware-accelerated inference on Android devices with GPU support:
```dart
final session = OrtSessionWrapper.createWithProviders(
  'model.onnx',
  providers: [OrtProvider.webGpu, OrtProvider.cpu],
  providerOptions: {
    // WebGPU options can be added here if needed
    OrtProvider.webGpu: {},
  },
);
```
Querying available providers #
```dart
final providers = OrtProviders(OnnxRuntime.instance);

providers.getAvailableProviders();
// ['WebGpuExecutionProvider', 'NnapiExecutionProvider', 'CPUExecutionProvider']

providers.isProviderAvailable(OrtProvider.webGpu); // true
```
Performance Tuning #
Fine-tune session options for optimal performance on your target device:
```dart
import 'package:flutter_ort_plugin/flutter_ort_plugin.dart';

final session = OrtSessionWrapper.create(
  'model.onnx',
  sessionConfig: SessionConfig(
    intraOpThreads: 4, // Threads within ops (0 = ORT default)
    interOpThreads: 1, // Threads across ops (0 = ORT default)
    graphOptimizationLevel: GraphOptLevel.all, // Max graph optimizations
    executionMode: ExecutionMode.sequential, // Better on mobile
  ),
);
```
Android Big.LITTLE Optimization #
For Android devices with heterogeneous cores, limit intra-op threads to avoid contention:
```dart
final session = OrtSessionWrapper.create(
  'model.onnx',
  sessionConfig: SessionConfig.androidOptimized, // Pre-configured for Android
);
```
Available Options #
| Option | Values | Description |
|---|---|---|
| `intraOpThreads` | `0` (auto) or integer | Parallelism within a single operation |
| `interOpThreads` | `0` (auto) or integer | Parallelism across independent nodes |
| `graphOptimizationLevel` | `disabled` / `basic` / `extended` / `all` | Graph transformation aggressiveness |
| `executionMode` | `sequential` / `parallel` | Node execution order |
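For models with independent branches, `parallel` execution mode paired with more inter-op threads can sometimes help. A hedged example built from the options above (the thread counts are illustrative, and on many mobile models `sequential` still wins):

```dart
final session = OrtSessionWrapper.create(
  'model.onnx',
  sessionConfig: SessionConfig(
    intraOpThreads: 2,                         // threads inside each op
    interOpThreads: 2,                         // run two independent nodes at once
    graphOptimizationLevel: GraphOptLevel.all,
    executionMode: ExecutionMode.parallel,     // schedule independent branches concurrently
  ),
);
```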
API Overview #
High-Level (no FFI pointers) #
| Class | Purpose |
|---|---|
| `OrtSessionWrapper` | Load model, run inference, manage lifecycle |
| `OrtValueWrapper` | Create/read tensors with Dart types |
| `OrtProviders` | Query and configure execution providers |
| `OrtIsolateSession` | Run inference off the UI thread (background isolate) |
OrtSessionWrapper
```dart
// Auto providers
OrtSessionWrapper.create(modelPath);
OrtSessionWrapper.create(modelPath, providerOptions: { ... });

// Manual providers
OrtSessionWrapper.createWithProviders(modelPath, providers: [...]);

// Inference
session.run(inputs)                    // -> List<OrtValueWrapper>
session.runFloat(inputs, outputSizes)  // -> List<Float32List>

// Metadata
session.inputNames   // List<String>
session.outputNames  // List<String>

session.dispose();
```
OrtValueWrapper
```dart
// Create
OrtValueWrapper.fromFloat(runtime, shape, float32Data);
OrtValueWrapper.fromInt64(runtime, shape, int64Data);

// Read
value.toFloatList(elementCount); // -> Float32List

value.release();
```
Low-Level (FFI pointers) #
For advanced use cases, OnnxRuntime and OrtTensor expose the full C API with raw pointers. The generated bindings are also exported for direct access.
```dart
final rt = OnnxRuntime.instance;
final options = rt.createSessionOptions();
final session = rt.createSession('model.onnx', options);

final tensor = OrtTensor(rt);
final input = tensor.createFloat([1, 3], data);

final outputs = rt.run(
  session,
  inputNames: ['input'],
  inputValues: [input],
  outputNames: ['output'],
);

final result = tensor.getDataFloat(outputs.first, 10);

// Manual cleanup required
tensor.release(input);
for (final o in outputs) { tensor.release(o); }
rt.releaseSession(session);
rt.releaseSessionOptions(options);
```
Example #
The example/ app demonstrates real-world computer vision inference with YOLO models and includes comprehensive performance tuning:
- YOLO Setup: Model selection, provider configuration, and performance tuning UI
- Camera Detection: Real-time YOLO inference on camera feed with FPS/inference stats
- Image Detection: Static image inference with bounding box overlay
- Video Detection: Frame-by-frame inference on video with detection overlay
- Performance Tuning: Configure threading, graph optimization, and execution mode
- Execution Providers: Test different providers (WebGPU, NNAPI, XNNPACK, CoreML)
Features demonstrated:
- Dynamic model loading (.onnx/.ort formats)
- Platform-aware provider selection (WebGPU/NNAPI/XNNPACK on Android, CoreML on iOS)
- Session configuration for Android Big.LITTLE optimization
- Provider-specific options (NNAPI flags, XNNPACK threads, CoreML compute units)
- Background isolate inference to prevent UI freezes
```sh
cd example
flutter run
```
Regenerating Bindings #
```sh
dart run ffigen --config ffigen.yaml
```
Recent Changes #
v1.0.3+ #
- WebGPU Support: Added WebGPU execution provider for Android GPU acceleration
- Session Configuration: New `SessionConfig` class for fine-tuning performance
  - Intra-op/inter-op thread control
  - Graph optimization levels (disabled → all)
  - Execution modes (sequential/parallel)
  - Android Big.LITTLE optimization preset
- Performance Tuning UI: Example app now includes comprehensive tuning controls
- Video Detection: Fixed playback stuttering with self-scheduling inference loop
- Provider Summary: Fixed provider options display to respect manual selection
Provider Priority Updates #
- Android now prioritizes GPU providers: WebGPU → NNAPI → XNNPACK → CPU
- iOS: CoreML → CPU
- Linux: CPU only
License #
MIT. See LICENSE.