runanywhere_onnx 0.16.0
RunAnywhere ONNX Backend #
ONNX Runtime backend for the RunAnywhere Flutter SDK. Provides on-device Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD) capabilities.
Features #
| Feature | Description |
|---|---|
| Speech-to-Text (STT) | Transcribe audio using Whisper models |
| Text-to-Speech (TTS) | Neural voice synthesis with Piper models |
| Voice Activity Detection | Real-time speech detection with Silero VAD |
| Streaming Support | Real-time transcription and synthesis |
| Privacy-First | All processing happens locally on device |
| Multi-Language | Support for 100+ languages (Whisper) |
Installation #
Add both the core SDK and this backend to your pubspec.yaml:
dependencies:
runanywhere: ^0.15.11
runanywhere_onnx: ^0.16.0
Then run:
flutter pub get
Note: This package requires the core `runanywhere` package; it won't work standalone.
Platform Support #
| Platform | Minimum Version | Requirements |
|---|---|---|
| iOS | 14.0+ | Microphone permission |
| Android | API 24+ | RECORD_AUDIO permission |
Platform Setup #
iOS #
Update ios/Podfile:
platform :ios, '14.0'
target 'Runner' do
use_frameworks! :linkage => :static # Required!
flutter_install_all_ios_pods File.dirname(File.realpath(__FILE__))
end
Add to ios/Runner/Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>Microphone access is needed for speech recognition</string>
Android #
Add to android/app/src/main/AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
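The manifest entry only declares the permission; on Android 6.0 (API 23) and later it must also be granted at runtime, and the same applies to the iOS microphone prompt. A minimal sketch using the third-party `permission_handler` package (not part of this SDK) to request microphone access before recording:

```dart
import 'package:permission_handler/permission_handler.dart';

/// Requests microphone access; returns true if the user granted it.
/// Call this before starting any recording or voice session.
Future<bool> ensureMicPermission() async {
  final status = await Permission.microphone.request();
  return status.isGranted;
}
```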
Quick Start #
1. Initialize & Register #
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';
void main() async {
WidgetsFlutterBinding.ensureInitialized();
// Initialize SDK
await RunAnywhere.initialize();
// Register ONNX backend
await Onnx.register();
runApp(MyApp());
}
2. Add Models #
// STT Model (Whisper)
Onnx.addModel(
id: 'whisper-tiny-en',
name: 'Whisper Tiny English',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
modality: ModelCategory.speechRecognition,
memoryRequirement: 75000000, // ~75MB
);
// TTS Model (Piper)
Onnx.addModel(
id: 'piper-amy-medium',
name: 'Piper Amy (English)',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-amy-medium.tar.gz',
modality: ModelCategory.speechSynthesis,
memoryRequirement: 50000000, // ~50MB
);
3. Speech-to-Text #
// Download and load STT model
await for (final p in RunAnywhere.downloadModel('whisper-tiny-en')) {
if (p.state.isCompleted) break;
}
await RunAnywhere.loadSTTModel('whisper-tiny-en');
// Transcribe audio (PCM16 @ 16kHz mono)
final text = await RunAnywhere.transcribe(audioData);
print('Transcription: $text');
// With detailed result
final result = await RunAnywhere.transcribeWithResult(audioData);
print('Text: ${result.text}');
print('Confidence: ${result.confidence}');
print('Language: ${result.language}');
4. Text-to-Speech #
// Download and load TTS model
await for (final p in RunAnywhere.downloadModel('piper-amy-medium')) {
if (p.state.isCompleted) break;
}
await RunAnywhere.loadTTSVoice('piper-amy-medium');
// Synthesize speech
final result = await RunAnywhere.synthesize(
'Hello! Welcome to RunAnywhere.',
rate: 1.0, // Speech rate (1.0 = normal)
pitch: 1.0, // Speech pitch (1.0 = normal)
);
print('Duration: ${result.durationSeconds}s');
print('Sample rate: ${result.sampleRate} Hz');
print('Samples: ${result.samples.length}');
// Play with audioplayers package
// await audioPlayer.play(BytesSource(wavBytes));
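The `synthesize()` result holds raw Float32 samples, but most players expect a WAV container. A minimal sketch of a helper (hypothetical, not part of the SDK) that wraps mono float samples in a 16-bit PCM WAV header, producing the `wavBytes` used in the playback comment above:

```dart
import 'dart:typed_data';

/// Wraps mono float samples (-1.0..1.0) in a 16-bit PCM WAV container.
/// `samples` and `sampleRate` would come from the synthesize() result.
Uint8List floatToWav(List<double> samples, int sampleRate) {
  final dataLength = samples.length * 2; // 16-bit = 2 bytes per sample
  final buffer = BytesBuilder();
  void writeString(String s) => buffer.add(s.codeUnits);
  void writeUint32(int v) => buffer
      .add((ByteData(4)..setUint32(0, v, Endian.little)).buffer.asUint8List());
  void writeUint16(int v) => buffer
      .add((ByteData(2)..setUint16(0, v, Endian.little)).buffer.asUint8List());

  writeString('RIFF');
  writeUint32(36 + dataLength); // total RIFF chunk size minus 8 bytes
  writeString('WAVE');
  writeString('fmt ');
  writeUint32(16);              // PCM fmt chunk size
  writeUint16(1);               // audio format: PCM
  writeUint16(1);               // channels: mono
  writeUint32(sampleRate);
  writeUint32(sampleRate * 2);  // byte rate = rate * channels * 2
  writeUint16(2);               // block align
  writeUint16(16);              // bits per sample
  writeString('data');
  writeUint32(dataLength);
  for (final s in samples) {
    final v = (s.clamp(-1.0, 1.0) * 32767).round();
    buffer.add(
        (ByteData(2)..setInt16(0, v, Endian.little)).buffer.asUint8List());
  }
  return buffer.toBytes();
}
```

Usage: `final wavBytes = floatToWav(result.samples, result.sampleRate);`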
API Reference #
Onnx Class #
register()
Register the ONNX backend with the SDK.
static Future<void> register({int priority = 100})
Parameters:
- `priority` – Backend priority (higher = preferred). Default: 100.
addModel()
Add an ONNX model to the registry.
static void addModel({
required String id,
required String name,
required String url,
required ModelCategory modality,
int memoryRequirement = 0,
})
Parameters:
- `id` – Unique model identifier
- `name` – Human-readable model name
- `url` – Download URL (supports .tar.gz, .tar.bz2, .zip)
- `modality` – Model category (`speechRecognition`, `speechSynthesis`)
- `memoryRequirement` – Estimated memory usage in bytes
Supported Models #
Speech-to-Text (Whisper) #
| Model | Size | Memory | Languages | Speed |
|---|---|---|---|---|
| whisper-tiny.en | ~40MB | ~75MB | English only | Fastest |
| whisper-tiny | ~75MB | ~150MB | Multilingual | Fast |
| whisper-base.en | ~75MB | ~150MB | English only | Fast |
| whisper-base | ~150MB | ~300MB | Multilingual | Medium |
| whisper-small.en | ~250MB | ~500MB | English only | Slower |
Recommendation: Use `whisper-tiny.en` for English-only apps; use `whisper-tiny` for multilingual support.
Text-to-Speech (Piper) #
| Voice | Language | Size | Quality |
|---|---|---|---|
| amy-medium | English (US) | ~50MB | Medium |
| amy-low | English (US) | ~25MB | Lower |
| lessac-medium | English (US) | ~50MB | Medium |
| Various | 30+ languages | Varies | Medium |
Recommendation: Use `amy-medium` for good-quality English TTS.
Voice Agent Integration #
For full voice assistant functionality, combine STT + LLM + TTS:
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';
import 'package:runanywhere_llamacpp/runanywhere_llamacpp.dart';
// Initialize all backends
await RunAnywhere.initialize();
await Onnx.register();
await LlamaCpp.register();
// Load all models
await RunAnywhere.loadSTTModel('whisper-tiny-en');
await RunAnywhere.loadModel('smollm2-360m');
await RunAnywhere.loadTTSVoice('piper-amy-medium');
// Check voice agent readiness
print('Voice agent ready: ${RunAnywhere.isVoiceAgentReady}');
// Start voice session
if (RunAnywhere.isVoiceAgentReady) {
final session = await RunAnywhere.startVoiceSession();
session.events.listen((event) {
if (event is VoiceSessionTranscribed) {
print('User: ${event.text}');
} else if (event is VoiceSessionResponded) {
print('AI: ${event.text}');
}
});
}
Audio Format Requirements #
STT Input #
| Property | Requirement |
|---|---|
| Format | PCM16 (signed 16-bit) |
| Sample Rate | 16000 Hz |
| Channels | Mono (1 channel) |
| Encoding | Little-endian |
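Many recorder plugins deliver float samples, which must be converted to this PCM16 little-endian layout before calling `transcribe()`. A minimal sketch of a hypothetical helper, assuming mono input already resampled to 16 kHz:

```dart
import 'dart:typed_data';

/// Converts mono float samples (-1.0..1.0) to signed 16-bit
/// little-endian PCM bytes, the layout expected for STT input.
Uint8List toPcm16(List<double> samples) {
  final out = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    final v = (samples[i].clamp(-1.0, 1.0) * 32767).round();
    out.setInt16(i * 2, v, Endian.little);
  }
  return out.buffer.asUint8List();
}
```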
TTS Output #
| Property | Value |
|---|---|
| Format | Float32 PCM |
| Sample Rate | 22050 Hz (Piper default) |
| Channels | Mono (1 channel) |
Troubleshooting #
STT Returns Empty Text #
Possible Causes:
- Audio too short (< 0.5 seconds)
- Audio too quiet (no speech detected)
- Wrong audio format (not PCM16 @ 16kHz)
Solutions:
- Ensure audio is at least 1 second
- Check microphone input levels
- Verify audio format matches requirements
TTS Sounds Robotic #
Solutions:
- Use `*-medium` quality models instead of `*-low`
- Adjust rate/pitch parameters
- Try different voice models
Model Loading Fails #
Solutions:
- Verify model is fully downloaded
- Check model format compatibility
- Ensure sufficient memory available
Permission Denied #
iOS:
- Add `NSMicrophoneUsageDescription` to Info.plist
- Request permission before recording
Android:
- Add the `RECORD_AUDIO` permission to AndroidManifest.xml
- Use the `permission_handler` package to request it at runtime
Memory Management #
// Unload STT model to free memory
await RunAnywhere.unloadSTTModel();
// Unload TTS voice
await RunAnywhere.unloadTTSVoice();
// Check current loaded models
print('STT loaded: ${RunAnywhere.isSTTModelLoaded}');
print('TTS loaded: ${RunAnywhere.isTTSVoiceLoaded}');
Related Packages #
- runanywhere — Core SDK (required)
- runanywhere_llamacpp — LLM backend
- runanywhere_onnx — STT/TTS/VAD backend (this package)
License #
This software is licensed under the RunAnywhere License, which is based on Apache 2.0 with additional terms for commercial use. See LICENSE for details.
For commercial licensing inquiries, contact: san@runanywhere.ai