AI Edge RAG (Retrieval Augmented Generation) #
A Flutter plugin for on-device AI inference with Retrieval Augmented Generation (RAG) capabilities powered by MediaPipe GenAI. Enable your LLMs to access and use relevant information from your own documents while keeping everything on-device.
Features #
- 📚 RAG Support - Enhance LLM responses with context from your own documents
- 🔍 Semantic Search - Find relevant information using vector similarity
- 💾 Vector Store - Store embeddings in memory or SQLite for persistence
- 🧠 Local Embeddings - Generate embeddings on-device (Gemma, Gecko models)
- ☁️ Gemini Embeddings - Alternative cloud-based embeddings via Gemini API
- 📄 Text Chunking - Automatically split large documents into manageable pieces
- 🚀 On-device inference - All processing happens locally (except Gemini embeddings)
- 🔒 Privacy-first - Your documents and queries stay on the device
- 🌊 Streaming responses - Real-time text generation with partial results
Installation #
```sh
flutter pub add ai_edge_rag
```
Or add it manually to your pubspec.yaml:
```yaml
dependencies:
  ai_edge_rag: ^0.0.1
```
Getting Started #
1. Basic RAG Setup #
```dart
import 'package:ai_edge_rag/ai_edge_rag.dart';

// Get the AI Edge RAG instance
final aiEdgeRag = AiEdgeRag.instance;

// Step 1: Initialize the language model
await aiEdgeRag.initialize(
  modelPath: '/path/to/your/model.task',
  maxTokens: 512,
  temperature: 0.7,
);

// Step 2: Create an embedding model for RAG
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: '/path/to/tokenizer.model',
  embeddingModelPath: '/path/to/embedding.bin',
  modelType: EmbeddingModelType.gemma, // Optional, defaults to gemma
  vectorStore: VectorStore.sqlite, // Optional, defaults to inMemory
);

// Step 3: Set the system instruction for RAG behavior
await aiEdgeRag.setSystemInstruction(
  SystemInstruction(
    instruction: 'Use the provided context to answer questions accurately. '
        'If the answer is not in the context, say so explicitly.',
  ),
);

// Step 4: Add your documents to the vector store
await aiEdgeRag.memorizeChunkedText(
  '''Flutter is Google's UI toolkit for building beautiful, natively compiled
applications for mobile, web, and desktop from a single codebase.
Dart is the programming language used by Flutter. It's optimized for
building user interfaces with features like hot reload.''',
  chunkSize: 512,
  chunkOverlap: 50,
);

// Step 5: Ask questions and get context-aware responses
final stream = aiEdgeRag.generateResponseAsync(
  'What programming language does Flutter use?',
  topK: 3, // Number of relevant chunks to retrieve
  minSimilarityScore: 0.3, // Minimum relevance threshold
);

await for (final event in stream) {
  print('Response: ${event.partialResult}');
  if (event.done) {
    print('Generation completed!');
  }
}

// Clean up when done
await aiEdgeRag.close();
```
2. Model Requirements #
This plugin requires:
- Language Model: A MediaPipe Task format model (`.task` file) for text generation
- Embedding Model: Either:
  - Local embedding model files (tokenizer + embedding model)
  - A Gemini API key for cloud-based embeddings
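The `modelPath`, `tokenizerModelPath`, and `embeddingModelPath` arguments expect files on the device's filesystem. As a minimal sketch, assuming the file is bundled as a Flutter asset and that `path_provider` is available (for large models you would more likely download the file at runtime instead), you could copy an asset out to a usable path like this:

```dart
import 'dart:io';

import 'package:flutter/services.dart' show rootBundle;
import 'package:path_provider/path_provider.dart';

/// Copies a bundled asset to the app documents directory (once) and
/// returns its filesystem path. Hypothetical helper; adjust the asset
/// key to match your pubspec.yaml.
Future<String> copyAssetToFile(String assetName) async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/$assetName');
  if (!await file.exists()) {
    final data = await rootBundle.load('assets/$assetName');
    await file.writeAsBytes(
      data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes),
      flush: true,
    );
  }
  return file.path;
}
```

You could then pass `await copyAssetToFile('model.task')` as `modelPath`.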
Usage #
Using Local Embeddings (Recommended for Privacy) #
```dart
// Create a local embedding model
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: '/path/to/tokenizer.model',
  embeddingModelPath: '/path/to/embedding.bin',
  modelType: EmbeddingModelType.gemma, // Optional: gemma (default) or gecko
  vectorStore: VectorStore.sqlite, // Optional: sqlite or inMemory (default)
  preferredBackend: PreferredBackend.gpu, // Optional: gpu or cpu (default), Android only
);
```
Using Gemini API Embeddings #
```dart
// Create a Gemini-based embedder
await aiEdgeRag.createGeminiEmbedder(
  geminiEmbeddingModel: 'models/text-embedding-004',
  geminiApiKey: 'your-api-key-here',
  vectorStore: VectorStore.sqlite, // Optional: sqlite or inMemory (default)
);
```
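Avoid hardcoding the API key in source. One minimal approach (a suggestion, not something the plugin requires) is to inject it at build time with Flutter's standard `--dart-define` mechanism:

```dart
// Pass the key at build time instead of committing it to source:
//   flutter run --dart-define=GEMINI_API_KEY=your-key
const geminiApiKey = String.fromEnvironment('GEMINI_API_KEY');

await aiEdgeRag.createGeminiEmbedder(
  geminiEmbeddingModel: 'models/text-embedding-004',
  geminiApiKey: geminiApiKey,
);
```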
Adding Documents to RAG #
Option 1: Add Pre-chunked Text
```dart
// Add a single chunk
await aiEdgeRag.memorizeChunk(
  'Flutter is an open-source UI framework by Google.',
);

// Add multiple chunks
await aiEdgeRag.memorizeChunks([
  'Flutter is an open-source UI framework by Google.',
  'Dart is the programming language used by Flutter.',
  'Flutter supports cross-platform development.',
]);
```
Option 2: Auto-chunk Large Documents
```dart
// Read a large document
final document = await File('documentation.txt').readAsString();

// Automatically chunk and store
await aiEdgeRag.memorizeChunkedText(
  document,
  chunkSize: 512, // Characters per chunk
  chunkOverlap: 50, // Overlap for context continuity
);
```
Querying with RAG #
```dart
// Basic query with default settings
final stream = aiEdgeRag.generateResponseAsync(
  'How does Flutter handle state management?',
);

await for (final event in stream) {
  // Display the response as it's generated
  print(event.partialResult);
  if (event.done) {
    break;
  }
}
```
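In a Flutter UI you would typically accumulate the stream into widget state rather than printing. A minimal sketch, assuming a `State` class with a `String _response = '';` field and `aiEdgeRag` in scope:

```dart
// Inside a State class that declares `String _response = '';`.
Future<void> _ask(String question) async {
  await for (final event in aiEdgeRag.generateResponseAsync(question)) {
    setState(() {
      // partialResult is the accumulated text so far, so each event
      // replaces the previous value rather than appending to it.
      _response = event.partialResult;
    });
  }
}
```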
Advanced RAG Configuration #
```dart
// Customize retrieval parameters
final stream = aiEdgeRag.generateResponseAsync(
  'What is Flutter?',
  topK: 5, // Retrieve the top 5 most relevant chunks
  minSimilarityScore: 0.3, // Only use chunks with similarity > 0.3
);

await for (final event in stream) {
  print(event.partialResult);
}
```
Vector Store Options #
```dart
// In-memory vector store (default, fast but not persistent)
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: tokenizerPath,
  embeddingModelPath: embeddingPath,
  vectorStore: VectorStore.inMemory,
);

// SQLite vector store (persistent across app restarts)
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: tokenizerPath,
  embeddingModelPath: embeddingPath,
  vectorStore: VectorStore.sqlite,
);
```
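With `VectorStore.sqlite`, memorized chunks survive app restarts, so you may want to avoid re-indexing the same documents on every launch. The API above does not appear to expose a way to inspect the store's contents, so one simple workaround (a sketch, assuming `path_provider` and once-per-install indexing) is a marker file:

```dart
import 'dart:io';

import 'package:path_provider/path_provider.dart';

/// Runs [memorize] only the first time it is called for this install,
/// using a marker file as the "already indexed" flag.
Future<void> memorizeOnce(Future<void> Function() memorize) async {
  final dir = await getApplicationDocumentsDirectory();
  final marker = File('${dir.path}/.rag_indexed');
  if (await marker.exists()) return;
  await memorize();
  await marker.create();
}

// Usage:
// await memorizeOnce(() => aiEdgeRag.memorizeChunkedText(document));
```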
Platform Setup #
iOS #
❌ Not yet supported - RAG features are currently Android-only. iOS support is planned for a future release.
Android #
- Minimum SDK: Android API level 24 (Android 7.0) or later
  - This is a requirement of the MediaPipe GenAI SDK
  - Flutter's default `minSdkVersion` is 21, so you must update it
- Add to your `android/app/build.gradle`:

```gradle
android {
    defaultConfig {
        minSdkVersion 24 // Required by MediaPipe GenAI
    }
}
```

- Recommended Devices:
  - Optimal performance on Pixel 7 or newer
  - Other high-end Android devices with comparable specs
- For large models and documents, you may need to increase the heap size in `android/app/src/main/AndroidManifest.xml`:

```xml
<application android:largeHeap="true" ...>
```
Model Preparation #
Language Model #
This plugin uses MediaPipe Task format (.task files) for the language model. See the ai_edge package documentation for details on obtaining .task models.
Embedding Models #
Local Embedding Models
You need two files:
- Tokenizer Model: Converts text to tokens (e.g., `tokenizer.model`)
- Embedding Model: Generates vector embeddings (e.g., `embedding.bin`)
Supported model types:
- Gemma: Text embedding models from Google's Gemma family
- Gecko: Text embedding models optimized for retrieval tasks
Gemini API Embeddings
Alternatively, use Google's Gemini API for embeddings:
- No local model files needed
- Requires internet connection
- Requires a Gemini API key
- Recommended model: `models/text-embedding-004`
API Reference #
Main Classes #
AiEdgeRag
The main entry point for RAG capabilities.
Key Methods:
- `initialize()` - Set up the language model and session
- `createEmbeddingModel()` - Create a local embedding model
- `createGeminiEmbedder()` - Create a Gemini API embedder
- `memorizeChunk()` - Store a single text chunk
- `memorizeChunks()` - Store multiple text chunks
- `memorizeChunkedText()` - Auto-chunk and store large text
- `setSystemInstruction()` - Configure RAG behavior
- `generateResponseAsync()` - Generate context-aware responses
- `close()` - Clean up resources
EmbeddingModelConfig
Configuration for local embedding models:
- `tokenizerModelPath` - Path to the tokenizer model file (required)
- `embeddingModelPath` - Path to the embedding model file (required)
- `modelType` - Type of embedding model: `gemma` (default) or `gecko` (optional)
- `vectorStore` - Storage type: `inMemory` (default) or `sqlite` (optional)
- `preferredBackend` - Hardware backend: `cpu` (default) or `gpu` (optional, Android only)
GeminiEmbedderConfig
Configuration for Gemini API embeddings:
- `geminiEmbeddingModel` - Gemini model name (required, e.g. `'models/text-embedding-004'`)
- `geminiApiKey` - Your Gemini API key (required)
- `vectorStore` - Storage type: `inMemory` (default) or `sqlite` (optional)
SystemInstruction
RAG-specific system instruction:
- `instruction` - Text guiding how the model uses retrieved context
VectorStore
Storage options for embeddings:
- `inMemory` - Fast, not persistent (default)
- `sqlite` - Persistent across app restarts
EmbeddingModelType
Supported embedding models:
- `gemma` - Gemma embedding models
- `gecko` - Gecko embedding models
GenerationEvent
Event emitted during streaming generation:
- `partialResult` - The accumulated text generated so far
- `done` - Whether generation is complete
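Because `partialResult` is cumulative, awaiting the complete response is just a matter of keeping the last event. A small sketch (a hypothetical convenience helper, not part of the plugin API):

```dart
/// Collects a streamed generation into a single string.
Future<String> generateFull(String prompt) async {
  var result = '';
  await for (final event in AiEdgeRag.instance.generateResponseAsync(prompt)) {
    result = event.partialResult; // cumulative, so keep only the latest
  }
  return result;
}
```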
Example App #
Check out the `examples/ai_chat_rag` directory for a complete RAG chat application demonstrating:
- Document loading and chunking
- Semantic search and retrieval
- Context-aware response generation
- Real-time streaming responses
- Vector store management
- Error handling
Run the example:
```sh
cd examples/ai_chat_rag
flutter run
```
Use Cases #
Knowledge Base Q&A #
Build a chatbot that answers questions based on your documentation, manuals, or knowledge base.
Document Analysis #
Let users query information from uploaded documents (PDFs, text files, etc.).
Code Documentation Assistant #
Create an assistant that helps developers by referencing your codebase documentation.
Personal Note Assistant #
Build a smart notes app where users can ask questions about their notes.
Best Practices #
Chunking Strategy #
- Chunk size: 256-512 characters works well for most use cases
- Overlap: 50-100 characters helps maintain context between chunks
- Use `memorizeChunkedText()` for automatic chunking; a plain-Dart sketch of the overlap idea follows this list
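To make the overlap semantics concrete, here is an illustrative character-based chunker in plain Dart. This is only a sketch of the idea; `memorizeChunkedText()` does the chunking internally and its exact algorithm may differ:

```dart
import 'dart:math' as math;

/// Splits [text] into chunks of up to [chunkSize] characters, where each
/// chunk begins with the last [chunkOverlap] characters of the previous one.
List<String> chunkText(
  String text, {
  int chunkSize = 512,
  int chunkOverlap = 50,
}) {
  assert(chunkOverlap < chunkSize, 'overlap must be smaller than chunk size');
  final chunks = <String>[];
  var start = 0;
  while (start < text.length) {
    final end = math.min(start + chunkSize, text.length);
    chunks.add(text.substring(start, end));
    if (end == text.length) break;
    start = end - chunkOverlap; // step back to keep context continuity
  }
  return chunks;
}
```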
System Instructions #
Provide clear instructions on how to use retrieved context:
```dart
SystemInstruction(
  instruction: '''You are a helpful assistant. Use the provided context to answer questions.
If the answer is not in the context, say "I don't have that information."
Always cite the context when answering.''',
)
```
Retrieval Parameters #
- topK: Start with 3-5, increase if responses lack context
- minSimilarityScore: Start with 0.2-0.3, adjust based on quality
Vector Store Choice #
- Use `VectorStore.inMemory` for temporary data or prototyping
- Use `VectorStore.sqlite` for persistent knowledge bases
Troubleshooting #
Common Issues #
Embedding model fails to create:
- Ensure embedding model files exist at specified paths
- Check file permissions
- Verify model format matches the selected model type
Responses don't use context:
- Check that documents were successfully added with `memorizeChunk()`, `memorizeChunks()`, or `memorizeChunkedText()`
- Increase `topK` to retrieve more context
- Lower the `minSimilarityScore` threshold
- Improve the system instruction to emphasize using the context
Out of memory errors:
- Use `VectorStore.sqlite` instead of `inMemory`
- Reduce `chunkSize` when processing documents
- Process large documents in batches (see the sketch below)
- Enable `largeHeap` on Android
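For the batching point, one minimal sketch is to stream a text file and memorize it piece by piece instead of reading it fully into memory. `memorizeLargeFile` is a hypothetical helper, and note that chunk overlap is lost at batch boundaries:

```dart
import 'dart:convert';
import 'dart:io';

/// Feeds a large text file to the vector store in roughly
/// [batchSize]-character batches instead of loading it all at once.
Future<void> memorizeLargeFile(String path, {int batchSize = 8192}) async {
  final buffer = StringBuffer();
  final lines = File(path)
      .openRead()
      .transform(utf8.decoder)
      .transform(const LineSplitter());
  await for (final line in lines) {
    buffer.writeln(line);
    if (buffer.length >= batchSize) {
      await AiEdgeRag.instance.memorizeChunkedText(buffer.toString());
      buffer.clear();
    }
  }
  if (buffer.isNotEmpty) {
    await AiEdgeRag.instance.memorizeChunkedText(buffer.toString());
  }
}
```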
Slow inference:
- Enable GPU acceleration with `PreferredBackend.gpu`
- Use smaller language models
- Reduce `maxTokens` for shorter outputs
- Consider Gemini API embeddings for faster embedding generation
Limitations #
- Currently Android-only (iOS support planned)
- Embedding models must be in MediaPipe format
- SQLite vector store uses basic similarity search (no advanced indexing)
License #
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Acknowledgments #
This plugin is built on top of:
- MediaPipe GenAI by Google for LLM inference
- MediaPipe RAG SDK for RAG capabilities
Links #
- Pub.dev Package
- GitHub Repository
- Issue Tracker
- MediaPipe Documentation
- Related Packages:
  - ai_edge - Basic on-device LLM inference
  - ai_edge_fc - Function calling support