

Mobile RAG Engine #

A Flutter package for fully local RAG (Retrieval-Augmented Generation) on mobile devices.

Why I Built This #

Implementing AI-powered search on mobile typically requires a server. Embedding generation, vector storage, similarity search—all handled server-side, with the app just making API calls.

But this approach has problems:

  • No internet, no functionality
  • User data gets sent to servers
  • Ongoing server costs

So I found a way to do everything on-device.

Technical Challenges #

I first tried pure Dart. Loading ONNX models, tokenizing, generating embeddings—it was too slow. Vector search became noticeably laggy with just 1,000 documents.

So I brought in Rust.

Rust + Flutter Architecture #

📖 Architecture Guide - Detailed system architecture documentation

Flutter (Dart)
    │
    ├── EmbeddingService (ONNX Runtime)
    │       └── text → 384-dim vector
    │
    └── flutter_rust_bridge (FFI)
            │
            ▼
        Rust
            ├── Tokenizer (HuggingFace tokenizers)
            ├── SQLite (vector storage)
            └── HNSW Index (O(log n) search)

Rust's tokenizers crate is 10x+ faster than Dart for tokenization. Vector search improved from O(n) to O(log n) using the instant-distance HNSW implementation.
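
For context, the brute-force O(n) scan that a pure-Dart engine would run looks like the sketch below; every query touches every stored vector, which is why search lagged at around 1,000 documents. The function names here are illustrative, not part of the package API.

import 'dart:math';

// Brute-force nearest-neighbor search: score every stored vector against
// the query. HNSW replaces this full scan with a graph walk.
double cosineSimilarity(List<double> a, List<double> b) {
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}

List<int> bruteForceTopK(List<double> query, List<List<double>> docs, int k) {
  final scores = [for (final d in docs) cosineSimilarity(query, d)];
  final indices = List.generate(docs.length, (i) => i);
  indices.sort((i, j) => scores[j].compareTo(scores[i]));
  return indices.take(k).toList();
}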

How It Differs #

vs. Server-based RAG #

  • Works completely offline
  • Data never leaves the device
  • Zero network latency

vs. Pure Dart Implementation #

  • Native Rust performance
  • HNSW enables fast search even with large document sets
  • Memory-efficient vector storage

vs. Existing Flutter Vector DBs #

  • Direct ONNX model loading (no external APIs needed)
  • Swappable models for Korean/multilingual support
  • Integrated pipeline from embedding to search

Performance #

Tested on iOS Simulator (Apple Silicon Mac):

| Operation | Time |
|---|---|
| Tokenization (short text) | 0.8 ms |
| Embedding generation (short text) | 4 ms |
| Embedding generation (long text) | 36 ms |
| HNSW search (100 docs) | 1 ms |

Since HNSW search time grows roughly logarithmically with corpus size, a 1 ms search over 100 documents leaves comfortable headroom for real-time search at 10,000+ documents.
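
These numbers are easy to re-check on your own hardware with a Stopwatch; a minimal sketch, assuming the engine is already initialized and dbPath points at the database from the Usage section:

final sw = Stopwatch()..start();
final embedding = await EmbeddingService.embed('Flutter is a UI framework.');
sw.stop();
print('Embedding: ${sw.elapsedMicroseconds / 1000} ms');

sw..reset()..start();
await searchSimilar(dbPath: dbPath, queryEmbedding: embedding, topK: 5);
sw.stop();
print('HNSW search: ${sw.elapsedMicroseconds / 1000} ms');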

Problems Solved During Development #

1. iOS Cross-Compilation #

Initially, the onig regex library blocked iOS builds with a missing ___chkstk_darwin symbol error. Switching to the pure-Rust fancy-regex crate fixed it.

2. HNSW Index Timing #

Rebuilding the HNSW index on every document insert adds up to O(n²) work across n inserts. Changed to a single rebuild after bulk inserts.
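
In practice that means batching inserts and triggering one rebuild at the end, as in this sketch (addDocument and rebuildHnswIndex are the package calls shown under Usage; texts is your list of document strings):

// One rebuild for the whole batch, instead of n rebuilds
// (which would add up to O(n²) work).
for (final text in texts) {
  final embedding = await EmbeddingService.embed(text);
  await addDocument(dbPath: dbPath, content: text, embedding: embedding);
}
await rebuildHnswIndex(dbPath: dbPath); // once, after all inserts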

3. Duplicate Document Handling #

Identical documents caused duplicates in search results. Added SHA256 content hashing to skip already-stored documents.
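
The hashing lives in the Rust layer, but the idea is plain content addressing; an equivalent Dart check (using the crypto package, purely for illustration) would be:

import 'dart:convert';
import 'package:crypto/crypto.dart';

// Identical content always produces the same digest, so a lookup on
// stored hashes can skip re-inserting a document.
String contentHash(String text) =>
    sha256.convert(utf8.encode(text)).toString();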

4. ONNX Runtime Thread Safety #

Tried parallel batch embedding, but onnxruntime's OrtSession isn't thread-safe. Switched to sequential processing, which is still fast enough for real-world use since individual embeddings are quick.
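
Sketched below: the parallel version that had to be dropped, and the sequential loop that replaced it (texts is assumed to be your list of documents):

// Unsafe: overlapping embed calls can reach the shared native OrtSession
// concurrently, and OrtSession is not thread-safe.
// final embeddings = await Future.wait(texts.map(EmbeddingService.embed));

// Safe: embed one text at a time; at ~4 ms per short text this still
// processes a few hundred documents per second.
final embeddings = [
  for (final text in texts) await EmbeddingService.embed(text),
];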

Quick Start #

1. Install the Package #

dependencies:
  mobile_rag_engine: ^0.3.4

2. Install Rust (Required for Build) #

This package uses Rust for high-performance tokenization and vector search.

macOS / Linux:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Windows:

winget install Rustlang.Rustup

Restart your terminal after installation. Required toolchains are auto-installed by cargokit.

3. Download Model Files #

Download the required model files to your project's assets/ folder:

# Create assets folder if it doesn't exist
[ ! -d "assets" ] && mkdir assets
cd assets

# Download BGE-m3 model (INT8 quantized, multilingual)
curl -L -o model.onnx "https://huggingface.co/Teradata/bge-m3/resolve/main/onnx/model_int8.onnx"
# Download the matching tokenizer
curl -L -o tokenizer.json "https://huggingface.co/BAAI/bge-m3/resolve/main/tokenizer.json"

Alternative models:

| Model | Size | Best For |
|---|---|---|
| Teradata/bge-m3 (INT8) | ~200 MB | Multilingual (Korean, English, etc.) |
| all-MiniLM-L6-v2 | ~25 MB | English only, faster |

4. Update pubspec.yaml #

flutter:
  assets:
    - assets/

5. Model Deployment Strategies (Production) #

When releasing your app to production, consider these strategies for including ML models:

| Strategy | App Size | Offline Ready | Best For |
|---|---|---|---|
| Bundle in Assets | +200 MB | ✅ Immediate | Prototypes, offline-first apps |
| Download on First Launch | ~10 MB | After download | Production apps |
| On-Demand Resources | ~10 MB | After download | iOS/Android optimized delivery |

Recommended for production — Download on first launch:

import 'dart:io';
import 'package:path_provider/path_provider.dart';

final appDir = await getApplicationDocumentsDirectory();
final modelFile = File('${appDir.path}/model.onnx');
if (!modelFile.existsSync()) {
  // Show download progress UI, then fetch the model.
  // downloadModel and modelUrl are placeholders for your own helper and URL.
  await downloadModel(modelUrl, modelFile);
}
await EmbeddingService.initFromFile(modelFile.path);

Usage #

iOS Setup #

On iOS (and macOS), the Rust library is statically linked, so initialization must go through ExternalLibrary.process(). Add these imports:

import 'dart:io' show Platform;
import 'package:flutter_rust_bridge/flutter_rust_bridge_for_generated.dart';

Initialization #

import 'dart:io' show Platform;
import 'package:flutter/services.dart' show rootBundle;
import 'package:flutter_rust_bridge/flutter_rust_bridge_for_generated.dart';
import 'package:mobile_rag_engine/mobile_rag_engine.dart';

Future<void> initializeRag() async {
  // Initialize Rust library (platform-specific)
  if (Platform.isIOS || Platform.isMacOS) {
    // iOS/macOS: static library loaded via DynamicLibrary.process()
    await RustLib.init(
      externalLibrary: ExternalLibrary.process(iKnowHowToUseIt: true),
    );
  } else {
    // Android/Linux/Windows: dynamic library
    await RustLib.init();
  }

  // Load tokenizer
  await initTokenizer(tokenizerPath: 'path/to/tokenizer.json');

  // Load ONNX model
  final modelBytes = await rootBundle.load('assets/model.onnx');
  await EmbeddingService.init(modelBytes.buffer.asUint8List());

  // Initialize DB
  await initDb(dbPath: 'path/to/rag.db');
}

Note: On iOS, if you encounter symbol not found errors, ensure your Podfile has been updated with pod install after adding the dependency.
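
For completeness, a minimal sketch of wiring this into app startup (MyApp stands in for your root widget):

import 'package:flutter/material.dart';

Future<void> main() async {
  WidgetsFlutterBinding.ensureInitialized(); // required before asset loading
  await initializeRag();                     // defined above
  runApp(const MyApp());                     // MyApp: your app's root widget
}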

Adding Documents #

final text = "Flutter is a cross-platform UI framework.";
final embedding = await EmbeddingService.embed(text);

final result = await addDocument(
  dbPath: dbPath,
  content: text,
  embedding: embedding,
);

if (result.isDuplicate) {
  print("Document already exists");
}

// Rebuild index after bulk inserts
await rebuildHnswIndex(dbPath: dbPath);

Searching #

final query = "cross-platform development";
final queryEmbedding = await EmbeddingService.embed(query);

final results = await searchSimilar(
  dbPath: dbPath,
  queryEmbedding: queryEmbedding,
  topK: 5,
);

for (final doc in results) {
  print(doc);
}

Custom Models #

To use a different model, export any Sentence Transformer to ONNX:

pip install optimum[exporters]
optimum-cli export onnx --model sentence-transformers/YOUR_MODEL ./output

Then use output/model.onnx and output/tokenizer.json in your assets.

Releases #

  • v0.3.0 - Rust Semantic Chunking - Migrated to Unicode-based semantic chunking
  • v0.2.0 - LLM-Optimized Chunking - Added chunking and context assembly

Future Plans #

  • INT8 quantization to reduce model size
  • Korean-specific models (KoSimCSE, KR-SBERT)
  • Chunking strategies for long documents ✅ (v0.2.0, v0.3.0)
  • Hybrid search (keyword + semantic)

License #

MIT

Contributing #

Bug reports, feature requests, and PRs are all welcome.
