AI Edge RAG (Retrieval Augmented Generation)
A Flutter plugin for on-device AI inference with Retrieval Augmented Generation (RAG) capabilities powered by MediaPipe GenAI. Enable your LLMs to access and use relevant information from your own documents while keeping everything on-device.
Features
- 📚 RAG Support - Enhance LLM responses with context from your own documents
- 🔍 Semantic Search - Find relevant information using vector similarity
- 💾 Vector Store - Store embeddings in memory or SQLite for persistence
- 🧠 Local Embeddings - Generate embeddings on-device (Gemma, Gecko models)
- ☁️ Gemini Embeddings - Alternative cloud-based embeddings via Gemini API
- 📄 Text Chunking - Automatically split large documents into manageable pieces
- 🚀 On-device inference - All processing happens locally (except Gemini embeddings)
- 🔒 Privacy-first - Your documents and queries stay on the device
- 🌊 Streaming responses - Real-time text generation with partial results
Installation
```sh
flutter pub add ai_edge_rag
```
Or add it manually to your pubspec.yaml:
```yaml
dependencies:
  ai_edge_rag:
```
Getting Started
1. Basic RAG Setup
```dart
import 'package:ai_edge_rag/ai_edge_rag.dart';

// Get the AI Edge RAG instance
final aiEdgeRag = AiEdgeRag.instance;

// Step 1: Initialize the language model
await aiEdgeRag.initialize(
  modelPath: '/path/to/your/model.task',
  maxTokens: 512,
  temperature: 0.7,
);

// Step 2: Create an embedding model for RAG
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: '/path/to/tokenizer.model',
  embeddingModelPath: '/path/to/embedding.bin',
  modelType: EmbeddingModelType.gemma, // Optional, defaults to gemma
  vectorStore: VectorStore.sqlite, // Optional, defaults to inMemory
);

// Step 3: Set the system instruction for RAG behavior
await aiEdgeRag.setSystemInstruction(
  SystemInstruction(
    instruction: 'Use the provided context to answer questions accurately. '
        'If the answer is not in the context, say so explicitly.',
  ),
);

// Step 4: Add your documents to the vector store
await aiEdgeRag.memorizeChunkedText(
  '''Flutter is Google's UI toolkit for building beautiful, natively compiled
applications for mobile, web, and desktop from a single codebase.
Dart is the programming language used by Flutter. It's optimized for
building user interfaces with features like hot reload.''',
  chunkSize: 512,
  chunkOverlap: 50,
);

// Step 5: Ask questions and get context-aware responses
final stream = aiEdgeRag.generateResponseAsync(
  'What programming language does Flutter use?',
  topK: 3, // Number of relevant chunks to retrieve
  minSimilarityScore: 0.3, // Minimum relevance threshold
);

await for (final event in stream) {
  print('Response: ${event.partialResult}');
  if (event.done) {
    print('Generation completed!');
  }
}

// Clean up when done
await aiEdgeRag.close();
```
2. Model Requirements
This plugin requires:
- Language Model: a MediaPipe Task format model (`.task` file) for text generation
- Embedding Model: either
  - local embedding model files (a tokenizer plus an embedding model), or
  - a Gemini API key for cloud-based embeddings
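All model paths passed to the plugin must point at real files on the device. As a minimal sketch, here is one way to produce such a path, assuming the file ships as a Flutter asset and that the `path_provider` package is available (the asset name is a placeholder; large models are usually downloaded instead, see Model Preparation below):

```dart
import 'dart:io';

import 'package:flutter/services.dart' show rootBundle;
import 'package:path_provider/path_provider.dart';

/// Copies a bundled asset to the app documents directory and returns its
/// absolute path, suitable for `modelPath`, `tokenizerModelPath`, etc.
Future<String> assetToFilePath(String assetName) async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/$assetName');
  if (!await file.exists()) {
    final data = await rootBundle.load('assets/$assetName');
    await file.writeAsBytes(
      data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes),
      flush: true,
    );
  }
  return file.path;
}
```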
Usage
Using Local Embeddings (Recommended for Privacy)
```dart
// Create a local embedding model
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: '/path/to/tokenizer.model',
  embeddingModelPath: '/path/to/embedding.bin',
  modelType: EmbeddingModelType.gemma, // Optional: gemma (default) or gecko
  vectorStore: VectorStore.sqlite, // Optional: sqlite or inMemory (default)
  preferredBackend: PreferredBackend.gpu, // Optional: gpu or cpu (default), Android only
);
```
Using Gemini API Embeddings
```dart
// Create a Gemini-based embedder
await aiEdgeRag.createGeminiEmbedder(
  geminiEmbeddingModel: 'models/text-embedding-004',
  geminiApiKey: 'your-api-key-here',
  vectorStore: VectorStore.sqlite, // Optional: sqlite or inMemory (default)
);
```
Adding Documents to RAG
Option 1: Add Pre-chunked Text
```dart
// Add a single chunk
await aiEdgeRag.memorizeChunk(
  'Flutter is an open-source UI framework by Google.',
);

// Add multiple chunks
await aiEdgeRag.memorizeChunks([
  'Flutter is an open-source UI framework by Google.',
  'Dart is the programming language used by Flutter.',
  'Flutter supports cross-platform development.',
]);
```
Option 2: Auto-chunk Large Documents
```dart
import 'dart:io';

// Read a large document
final document = await File('documentation.txt').readAsString();

// Automatically chunk and store
await aiEdgeRag.memorizeChunkedText(
  document,
  chunkSize: 512, // Characters per chunk
  chunkOverlap: 50, // Overlap for context continuity
);
```
Querying with RAG
```dart
// Basic query with default settings
final stream = aiEdgeRag.generateResponseAsync(
  'How does Flutter handle state management?',
);

await for (final event in stream) {
  // Display the response as it's generated
  print(event.partialResult);
  if (event.done) {
    break;
  }
}
```
Advanced RAG Configuration
```dart
// Customize retrieval parameters
final stream = aiEdgeRag.generateResponseAsync(
  'What is Flutter?',
  topK: 5, // Retrieve the top 5 most relevant chunks
  minSimilarityScore: 0.3, // Only use chunks with similarity > 0.3
);

await for (final event in stream) {
  print(event.partialResult);
}
```
Vector Store Options
```dart
// In-memory vector store (default; fast but not persistent)
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: tokenizerPath,
  embeddingModelPath: embeddingPath,
  vectorStore: VectorStore.inMemory,
);

// SQLite vector store (persistent across app restarts)
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: tokenizerPath,
  embeddingModelPath: embeddingPath,
  vectorStore: VectorStore.sqlite,
);
```
Platform Setup
iOS
❌ Not yet supported - RAG features are currently Android-only. iOS support is planned for a future release.
Android
- Minimum SDK: Android API level 24 (Android 7.0) or later
  - This is a requirement of the MediaPipe GenAI SDK
  - Flutter's default `minSdkVersion` is 21, so you must raise it
- Add to your `android/app/build.gradle`:

```gradle
android {
    defaultConfig {
        minSdkVersion 24 // Required by MediaPipe GenAI
    }
}
```

- Recommended devices:
  - Optimal performance on Pixel 7 or newer
  - Other high-end Android devices with comparable specs
- For large models and documents, you may need to increase the heap size in `android/app/src/main/AndroidManifest.xml`:

```xml
<application android:largeHeap="true" ...>
```
Model Preparation
Language Model
This plugin uses the MediaPipe Task format (`.task` files) for the language model. See the `ai_edge` package documentation for details on obtaining `.task` models.
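Task files are typically large, so they are usually downloaded on first launch rather than bundled with the app. A rough sketch using only `dart:io` (the URL and destination path are placeholders you supply):

```dart
import 'dart:io';

/// Downloads a model file if it is not already present and returns its path,
/// which can then be passed to initialize() as modelPath.
Future<String> downloadModel(Uri url, String destinationPath) async {
  final file = File(destinationPath);
  if (await file.exists()) return file.path; // already downloaded

  final client = HttpClient();
  try {
    final request = await client.getUrl(url);
    final response = await request.close();
    // Stream the body straight to disk to avoid holding it in memory.
    await response.pipe(file.openWrite());
  } finally {
    client.close();
  }
  return file.path;
}
```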
Embedding Models
Local Embedding Models
You need two files:
- Tokenizer Model: converts text to tokens (e.g., `tokenizer.model`)
- Embedding Model: generates vector embeddings (e.g., `embedding.bin`)
Supported model types:
- Gemma: Text embedding models from Google's Gemma family
- Gecko: Text embedding models optimized for retrieval tasks
Gemini API Embeddings
Alternatively, use Google's Gemini API for embeddings:
- No local model files needed
- Requires internet connection
- Requires a Gemini API key
- Recommended model: `models/text-embedding-004`
API Reference
Main Classes
AiEdgeRag
The main entry point for RAG capabilities.
Key Methods:
- `initialize()` - Set up the language model and session
- `createEmbeddingModel()` - Create a local embedding model
- `createGeminiEmbedder()` - Create a Gemini API embedder
- `memorizeChunk()` - Store a single text chunk
- `memorizeChunks()` - Store multiple text chunks
- `memorizeChunkedText()` - Auto-chunk and store large text
- `setSystemInstruction()` - Configure RAG behavior
- `generateResponseAsync()` - Generate context-aware responses
- `close()` - Clean up resources
EmbeddingModelConfig
Configuration for local embedding models:
- `tokenizerModelPath` - Path to the tokenizer model file (required)
- `embeddingModelPath` - Path to the embedding model file (required)
- `modelType` - Type of embedding model: `gemma` (default) or `gecko` (optional)
- `vectorStore` - Storage type: `inMemory` (default) or `sqlite` (optional)
- `preferredBackend` - Hardware backend: `cpu` (default) or `gpu` (optional, Android only)
GeminiEmbedderConfig
Configuration for Gemini API embeddings:
- `geminiEmbeddingModel` - Gemini model name (required, e.g., `models/text-embedding-004`)
- `geminiApiKey` - Your Gemini API key (required)
- `vectorStore` - Storage type: `inMemory` (default) or `sqlite` (optional)
SystemInstruction
RAG-specific system instruction:
- `instruction` - Text guiding how the model uses retrieved context
VectorStore
Storage options for embeddings:
- `inMemory` - Fast, not persistent (default)
- `sqlite` - Persistent across app restarts
EmbeddingModelType
Supported embedding models:
- `gemma` - Gemma embedding models
- `gecko` - Gecko embedding models
GenerationEvent
Event emitted during streaming generation:
- `partialResult` - The accumulated text generated so far (see the usage sketch below)
- `done` - Whether generation is complete
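Because `partialResult` is cumulative, a UI can simply replace its displayed text on every event rather than concatenating. A sketch, assuming a `StatefulWidget` with a hypothetical `_answer` field and a `question` string:

```dart
await for (final event in aiEdgeRag.generateResponseAsync(question)) {
  // partialResult already contains everything generated so far,
  // so replace the text instead of appending to it.
  setState(() => _answer = event.partialResult);
  if (event.done) break;
}
```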
Example App
Check out the `examples/ai_chat_rag` directory for a complete RAG chat application demonstrating:
- Document loading and chunking
- Semantic search and retrieval
- Context-aware response generation
- Real-time streaming responses
- Vector store management
- Error handling
Run the example:
```sh
cd examples/ai_chat_rag
flutter run
```
Use Cases
Knowledge Base Q&A
Build a chatbot that answers questions based on your documentation, manuals, or knowledge base.
Document Analysis
Let users query information from uploaded documents (PDFs, text files, etc.).
Code Documentation Assistant
Create an assistant that helps developers by referencing your codebase documentation.
Personal Note Assistant
Build a smart notes app where users can ask questions about their notes.
Best Practices
Chunking Strategy
- Chunk size: 256-512 characters works well for most use cases
- Overlap: 50-100 characters helps maintain context between chunks
- Use `memorizeChunkedText()` for automatic chunking (see the sketch below)
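For intuition, here is a rough illustration of what character-based chunking with overlap looks like. This is illustrative only; the plugin's internal splitter may differ:

```dart
import 'dart:math' as math;

/// Splits [text] into chunks of at most [chunkSize] characters, where
/// consecutive chunks share [chunkOverlap] characters of context.
List<String> chunkText(String text, {int chunkSize = 512, int chunkOverlap = 50}) {
  assert(chunkOverlap < chunkSize);
  final chunks = <String>[];
  var start = 0;
  while (start < text.length) {
    final end = math.min(start + chunkSize, text.length);
    chunks.add(text.substring(start, end));
    if (end == text.length) break;
    start = end - chunkOverlap; // step back so neighboring chunks share context
  }
  return chunks;
}
```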
System Instructions
Provide clear instructions on how to use retrieved context:
```dart
SystemInstruction(
  instruction: '''You are a helpful assistant. Use the provided context to answer questions.
If the answer is not in the context, say "I don't have that information."
Always cite the context when answering.''',
)
```
Retrieval Parameters
- `topK`: Start with 3-5; increase if responses lack context
- `minSimilarityScore`: Start with 0.2-0.3; adjust based on quality
Vector Store Choice
- Use `VectorStore.inMemory` for temporary data or prototyping
- Use `VectorStore.sqlite` for persistent knowledge bases
Troubleshooting
Common Issues
Embeddings fail to create:
- Ensure the embedding model files exist at the specified paths (see the check below)
- Check file permissions
- Verify model format matches the selected model type
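A defensive existence check before initialization can surface path problems early. A sketch, where `tokenizerPath` and `embeddingPath` are placeholders for your own paths:

```dart
import 'dart:io';

// Fail fast with a clear message instead of an opaque native error.
for (final path in [tokenizerPath, embeddingPath]) {
  if (!File(path).existsSync()) {
    throw StateError('Missing model file: $path');
  }
}
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: tokenizerPath,
  embeddingModelPath: embeddingPath,
);
```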
Responses don't use context:
- Check that documents were successfully added with `memorizeChunk`, `memorizeChunks`, or `memorizeChunkedText`
- Increase `topK` to retrieve more context
- Lower the `minSimilarityScore` threshold
- Improve the system instruction to emphasize using the context
Out of memory errors:
- Use `VectorStore.sqlite` instead of `inMemory`
- Reduce `chunkSize` when processing documents
- Process large documents in batches (see the sketch below)
- Enable `largeHeap` on Android
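One way to bound peak memory is to feed a long document to the store in sections rather than all at once. A sketch; the section size is illustrative, and note that chunks do not overlap across section boundaries:

```dart
import 'dart:math' as math;

/// Feeds [document] to the vector store one section at a time.
Future<void> memorizeInSections(AiEdgeRag rag, String document) async {
  const sectionSize = 20000; // characters per memorizeChunkedText call
  for (var start = 0; start < document.length; start += sectionSize) {
    final end = math.min(start + sectionSize, document.length);
    await rag.memorizeChunkedText(
      document.substring(start, end),
      chunkSize: 256,
      chunkOverlap: 50,
    );
  }
}
```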
Slow inference:
- Enable GPU acceleration with `PreferredBackend.gpu`
- Use a smaller language model
- Reduce `maxTokens` for shorter outputs
- Consider Gemini API embeddings for faster embedding generation
Limitations
- Currently Android-only (iOS support planned)
- Embedding models must be in MediaPipe format
- SQLite vector store uses basic similarity search (no advanced indexing)
License
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Acknowledgments
This plugin is built on top of:
- MediaPipe GenAI by Google for LLM inference
- MediaPipe RAG SDK for RAG capabilities
Links
- Pub.dev Package
- GitHub Repository
- Issue Tracker
- MediaPipe Documentation
- Related Packages:
- ai_edge - Basic on-device LLM inference
- ai_edge_fc - Function calling support