AI Edge RAG (Retrieval Augmented Generation) #
A Flutter plugin for on-device AI inference with Retrieval Augmented Generation (RAG) capabilities powered by MediaPipe GenAI. Enable your LLMs to access and use relevant information from your own documents while keeping everything on-device.
Features #
- 📚 RAG Support - Enhance LLM responses with context from your own documents
- 🔍 Semantic Search - Find relevant information using vector similarity
- 💾 Vector Store - Store embeddings in memory or SQLite for persistence
- 🧠 Local Embeddings - Generate embeddings on-device (Gemma, Gecko models)
- ☁️ Gemini Embeddings - Alternative cloud-based embeddings via Gemini API
- 📄 Text Chunking - Automatically split large documents into manageable pieces
- 🚀 On-device inference - All processing happens locally (except Gemini embeddings)
- 🔒 Privacy-first - Your documents and queries stay on the device
- 🌊 Streaming responses - Real-time text generation with partial results
Installation #
```sh
flutter pub add ai_edge_rag
```
Or add it manually to your pubspec.yaml:
```yaml
dependencies:
  ai_edge_rag: ^0.0.1
```
Getting Started #
1. Basic RAG Setup #
```dart
import 'package:ai_edge_rag/ai_edge_rag.dart';

// Get the AI Edge RAG instance
final aiEdgeRag = AiEdgeRag.instance;

// Step 1: Initialize the language model
await aiEdgeRag.initialize(
  modelPath: '/path/to/your/model.task',
  maxTokens: 512,
  temperature: 0.7,
);

// Step 2: Create an embedding model for RAG
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: '/path/to/tokenizer.model',
  embeddingModelPath: '/path/to/embedding.bin',
  modelType: EmbeddingModelType.gemma, // Optional, defaults to gemma
  vectorStore: VectorStore.sqlite, // Optional, defaults to inMemory
);

// Step 3: Set the system instruction for RAG behavior
await aiEdgeRag.setSystemInstruction(
  SystemInstruction(
    instruction: 'Use the provided context to answer questions accurately. '
        'If the answer is not in the context, say so explicitly.',
  ),
);

// Step 4: Add your documents to the vector store
await aiEdgeRag.memorizeChunkedText(
  '''Flutter is Google's UI toolkit for building beautiful, natively compiled
applications for mobile, web, and desktop from a single codebase.
Dart is the programming language used by Flutter. It's optimized for
building user interfaces with features like hot reload.''',
  chunkSize: 512,
  chunkOverlap: 50,
);

// Step 5: Ask questions and get context-aware responses
final stream = aiEdgeRag.generateResponseAsync(
  'What programming language does Flutter use?',
  topK: 3, // Number of relevant chunks to retrieve
  minSimilarityScore: 0.3, // Minimum relevance threshold
);

await for (final event in stream) {
  print('Response: ${event.partialResult}');
  if (event.done) {
    print('Generation completed!');
  }
}

// Clean up when done
await aiEdgeRag.close();
```
2. Model Requirements #
This plugin requires:
- Language Model: A MediaPipe Task format model (`.task` file) for text generation
- Embedding Model: Either:
  - Local embedding model files (tokenizer + embedding model)
  - A Gemini API key for cloud-based embeddings
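The `modelPath`, `tokenizerModelPath`, and `embeddingModelPath` arguments expect files on the device's filesystem. As a minimal sketch, assuming the file is bundled as a Flutter asset and that `path_provider` is available (for large models you would more likely download the file at runtime instead), you could copy an asset out to a usable path like this:

```dart
import 'dart:io';

import 'package:flutter/services.dart' show rootBundle;
import 'package:path_provider/path_provider.dart';

/// Copies a bundled asset to the app documents directory (once) and
/// returns its filesystem path. Hypothetical helper; adjust the asset
/// key to match your pubspec.yaml.
Future<String> copyAssetToFile(String assetName) async {
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/$assetName');
  if (!await file.exists()) {
    final data = await rootBundle.load('assets/$assetName');
    await file.writeAsBytes(
      data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes),
      flush: true,
    );
  }
  return file.path;
}
```

You could then pass `await copyAssetToFile('model.task')` as `modelPath`.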
Usage #
Using Local Embeddings (Recommended for Privacy) #
```dart
// Create a local embedding model
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: '/path/to/tokenizer.model',
  embeddingModelPath: '/path/to/embedding.bin',
  modelType: EmbeddingModelType.gemma, // Optional: gemma (default) or gecko
  vectorStore: VectorStore.sqlite, // Optional: sqlite or inMemory (default)
  preferredBackend: PreferredBackend.gpu, // Optional: gpu or cpu (default), Android only
);
```
Using Gemini API Embeddings #
```dart
// Create a Gemini-based embedder
await aiEdgeRag.createGeminiEmbedder(
  geminiEmbeddingModel: 'models/text-embedding-004',
  geminiApiKey: 'your-api-key-here',
  vectorStore: VectorStore.sqlite, // Optional: sqlite or inMemory (default)
);
```
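Avoid hardcoding the API key in source. One minimal approach (a suggestion, not something the plugin requires) is to inject it at build time with Flutter's standard `--dart-define` mechanism:

```dart
// Pass the key at build time instead of committing it to source:
//   flutter run --dart-define=GEMINI_API_KEY=your-key
const geminiApiKey = String.fromEnvironment('GEMINI_API_KEY');

await aiEdgeRag.createGeminiEmbedder(
  geminiEmbeddingModel: 'models/text-embedding-004',
  geminiApiKey: geminiApiKey,
);
```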
Adding Documents to RAG #
Option 1: Add Pre-chunked Text
```dart
// Add a single chunk
await aiEdgeRag.memorizeChunk(
  'Flutter is an open-source UI framework by Google.',
);

// Add multiple chunks
await aiEdgeRag.memorizeChunks([
  'Flutter is an open-source UI framework by Google.',
  'Dart is the programming language used by Flutter.',
  'Flutter supports cross-platform development.',
]);
```
Option 2: Auto-chunk Large Documents
```dart
// Read a large document
final document = await File('documentation.txt').readAsString();

// Automatically chunk and store
await aiEdgeRag.memorizeChunkedText(
  document,
  chunkSize: 512, // Characters per chunk
  chunkOverlap: 50, // Overlap for context continuity
);
```
Querying with RAG #
```dart
// Basic query with default settings
final stream = aiEdgeRag.generateResponseAsync(
  'How does Flutter handle state management?',
);

await for (final event in stream) {
  // Display the response as it's generated
  print(event.partialResult);
  if (event.done) {
    break;
  }
}
```
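In a Flutter UI you would typically accumulate the stream into widget state rather than printing. A minimal sketch, assuming a `State` class with a `String _response = '';` field and `aiEdgeRag` in scope:

```dart
// Inside a State class that declares `String _response = '';`.
Future<void> _ask(String question) async {
  await for (final event in aiEdgeRag.generateResponseAsync(question)) {
    setState(() {
      // partialResult is the accumulated text so far, so each event
      // replaces the previous value rather than appending to it.
      _response = event.partialResult;
    });
  }
}
```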
Advanced RAG Configuration #
```dart
// Customize retrieval parameters
final stream = aiEdgeRag.generateResponseAsync(
  'What is Flutter?',
  topK: 5, // Retrieve the top 5 most relevant chunks
  minSimilarityScore: 0.3, // Only use chunks with similarity > 0.3
);

await for (final event in stream) {
  print(event.partialResult);
}
```
Vector Store Options #
```dart
// In-memory vector store (default, fast but not persistent)
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: tokenizerPath,
  embeddingModelPath: embeddingPath,
  vectorStore: VectorStore.inMemory,
);

// SQLite vector store (persistent across app restarts)
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: tokenizerPath,
  embeddingModelPath: embeddingPath,
  vectorStore: VectorStore.sqlite,
);
```
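With `VectorStore.sqlite`, memorized chunks survive app restarts, so you may want to avoid re-indexing the same documents on every launch. The API above does not appear to expose a way to inspect the store's contents, so one simple workaround (a sketch, assuming `path_provider` and once-per-install indexing) is a marker file:

```dart
import 'dart:io';

import 'package:path_provider/path_provider.dart';

/// Runs [memorize] only the first time it is called for this install,
/// using a marker file as the "already indexed" flag.
Future<void> memorizeOnce(Future<void> Function() memorize) async {
  final dir = await getApplicationDocumentsDirectory();
  final marker = File('${dir.path}/.rag_indexed');
  if (await marker.exists()) return;
  await memorize();
  await marker.create();
}

// Usage:
// await memorizeOnce(() => aiEdgeRag.memorizeChunkedText(document));
```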
Platform Setup #
iOS #
❌ Not yet supported - RAG features are currently Android-only. iOS support is planned for a future release.
Android #
- Minimum SDK: Android API level 24 (Android 7.0) or later
  - This is a requirement of the MediaPipe GenAI SDK
  - Flutter's default `minSdkVersion` is 21, so you must update it
- Add to your `android/app/build.gradle`:

```gradle
android {
    defaultConfig {
        minSdkVersion 24 // Required by MediaPipe GenAI
    }
}
```

- Recommended Devices:
  - Optimal performance on Pixel 7 or newer
  - Other high-end Android devices with comparable specs
- For large models and documents, you may need to increase the heap size in `android/app/src/main/AndroidManifest.xml`:

```xml
<application android:largeHeap="true" ...>
```
Model Preparation #
Language Model #
This plugin uses MediaPipe Task format (.task files) for the language model. See the ai_edge package documentation for details on obtaining .task models.
Embedding Models #
Local Embedding Models
You need two files:
- Tokenizer Model: Converts text to tokens (e.g., `tokenizer.model`)
- Embedding Model: Generates vector embeddings (e.g., `embedding.bin`)
Supported model types:
- Gemma: Text embedding models from Google's Gemma family
- Gecko: Text embedding models optimized for retrieval tasks
Gemini API Embeddings
Alternatively, use Google's Gemini API for embeddings:
- No local model files needed
- Requires internet connection
- Requires a Gemini API key
- Recommended model: `models/text-embedding-004`
API Reference #
Main Classes #
AiEdgeRag
The main entry point for RAG capabilities.
Key Methods:
- `initialize()` - Set up the language model and session
- `createEmbeddingModel()` - Create a local embedding model
- `createGeminiEmbedder()` - Create a Gemini API embedder
- `memorizeChunk()` - Store a single text chunk
- `memorizeChunks()` - Store multiple text chunks
- `memorizeChunkedText()` - Auto-chunk and store large text
- `setSystemInstruction()` - Configure RAG behavior
- `generateResponseAsync()` - Generate context-aware responses
- `close()` - Clean up resources
EmbeddingModelConfig
Configuration for local embedding models:
- `tokenizerModelPath` - Path to the tokenizer model file (required)
- `embeddingModelPath` - Path to the embedding model file (required)
- `modelType` - Type of embedding model: `gemma` (default) or `gecko` (optional)
- `vectorStore` - Storage type: `inMemory` (default) or `sqlite` (optional)
- `preferredBackend` - Hardware backend: `cpu` (default) or `gpu` (optional, Android only)
GeminiEmbedderConfig
Configuration for Gemini API embeddings:
- `geminiEmbeddingModel` - Gemini model name (required, e.g. `'models/text-embedding-004'`)
- `geminiApiKey` - Your Gemini API key (required)
- `vectorStore` - Storage type: `inMemory` (default) or `sqlite` (optional)
SystemInstruction
RAG-specific system instruction:
- `instruction` - Text guiding how the model uses retrieved context
VectorStore
Storage options for embeddings:
- `inMemory` - Fast, not persistent (default)
- `sqlite` - Persistent across app restarts
EmbeddingModelType
Supported embedding models:
- `gemma` - Gemma embedding models
- `gecko` - Gecko embedding models
GenerationEvent
Event emitted during streaming generation:
- `partialResult` - The accumulated text generated so far
- `done` - Whether generation is complete
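Because `partialResult` is cumulative, awaiting the complete response is just a matter of keeping the last event. A small sketch (a hypothetical convenience helper, not part of the plugin API):

```dart
/// Collects a streamed generation into a single string.
Future<String> generateFull(String prompt) async {
  var result = '';
  await for (final event in AiEdgeRag.instance.generateResponseAsync(prompt)) {
    result = event.partialResult; // cumulative, so keep only the latest
  }
  return result;
}
```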
Example App #
Check out the `examples/ai_chat_rag` directory for a complete RAG chat application demonstrating:
- Document loading and chunking
- Semantic search and retrieval
- Context-aware response generation
- Real-time streaming responses
- Vector store management
- Error handling
Run the example:
```sh
cd examples/ai_chat_rag
flutter run
```
Use Cases #
Knowledge Base Q&A #
Build a chatbot that answers questions based on your documentation, manuals, or knowledge base.
Document Analysis #
Let users query information from uploaded documents (PDFs, text files, etc.).
Code Documentation Assistant #
Create an assistant that helps developers by referencing your codebase documentation.
Personal Note Assistant #
Build a smart notes app where users can ask questions about their notes.
Best Practices #
Chunking Strategy #
- Chunk size: 256-512 characters works well for most use cases
- Overlap: 50-100 characters helps maintain context between chunks
- Use `memorizeChunkedText()` for automatic chunking; a plain-Dart sketch of the overlap idea follows this list
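To make the overlap semantics concrete, here is an illustrative character-based chunker in plain Dart. This is only a sketch of the idea; `memorizeChunkedText()` does the chunking internally and its exact algorithm may differ:

```dart
import 'dart:math' as math;

/// Splits [text] into chunks of up to [chunkSize] characters, where each
/// chunk begins with the last [chunkOverlap] characters of the previous one.
List<String> chunkText(
  String text, {
  int chunkSize = 512,
  int chunkOverlap = 50,
}) {
  assert(chunkOverlap < chunkSize, 'overlap must be smaller than chunk size');
  final chunks = <String>[];
  var start = 0;
  while (start < text.length) {
    final end = math.min(start + chunkSize, text.length);
    chunks.add(text.substring(start, end));
    if (end == text.length) break;
    start = end - chunkOverlap; // step back to keep context continuity
  }
  return chunks;
}
```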
System Instructions #
Provide clear instructions on how to use retrieved context:
```dart
SystemInstruction(
  instruction: '''You are a helpful assistant. Use the provided context to answer questions.
If the answer is not in the context, say "I don't have that information."
Always cite the context when answering.''',
)
```
Retrieval Parameters #
- topK: Start with 3-5, increase if responses lack context
- minSimilarityScore: Start with 0.2-0.3, adjust based on quality
Vector Store Choice #
- Use `VectorStore.inMemory` for temporary data or prototyping
- Use `VectorStore.sqlite` for persistent knowledge bases
Troubleshooting #
Common Issues #
Embedding model fails to create:
- Ensure embedding model files exist at specified paths
- Check file permissions
- Verify model format matches the selected model type
Responses don't use context:
- Check that documents were successfully added with `memorizeChunk()`, `memorizeChunks()`, or `memorizeChunkedText()`
- Increase `topK` to retrieve more context
- Lower the `minSimilarityScore` threshold
- Improve the system instruction to emphasize using the context
Out of memory errors:
- Use `VectorStore.sqlite` instead of `inMemory`
- Reduce `chunkSize` when processing documents
- Process large documents in batches (see the sketch below)
- Enable `largeHeap` on Android
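For the batching point, one minimal sketch is to stream a text file and memorize it piece by piece instead of reading it fully into memory. `memorizeLargeFile` is a hypothetical helper, and note that chunk overlap is lost at batch boundaries:

```dart
import 'dart:convert';
import 'dart:io';

/// Feeds a large text file to the vector store in roughly
/// [batchSize]-character batches instead of loading it all at once.
Future<void> memorizeLargeFile(String path, {int batchSize = 8192}) async {
  final buffer = StringBuffer();
  final lines = File(path)
      .openRead()
      .transform(utf8.decoder)
      .transform(const LineSplitter());
  await for (final line in lines) {
    buffer.writeln(line);
    if (buffer.length >= batchSize) {
      await AiEdgeRag.instance.memorizeChunkedText(buffer.toString());
      buffer.clear();
    }
  }
  if (buffer.isNotEmpty) {
    await AiEdgeRag.instance.memorizeChunkedText(buffer.toString());
  }
}
```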
Slow inference:
- Enable GPU acceleration with `PreferredBackend.gpu`
- Use smaller language models
- Reduce `maxTokens` for shorter outputs
- Consider Gemini API embeddings for faster embedding generation
Limitations #
- Currently Android-only (iOS support planned)
- Embedding models must be in MediaPipe format
- SQLite vector store uses basic similarity search (no advanced indexing)
License #
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Acknowledgments #
This plugin is built on top of:
- MediaPipe GenAI by Google for LLM inference
- MediaPipe RAG SDK for RAG capabilities
Links #
- Pub.dev Package
- GitHub Repository
- Issue Tracker
- MediaPipe Documentation
- Related Packages:
  - ai_edge - Basic on-device LLM inference
  - ai_edge_fc - Function calling support