ai_edge_rag 0.0.1

Platform: Android

Flutter plugin for on-device AI inference with MediaPipe RAG.

AI Edge RAG (Retrieval Augmented Generation) #

A Flutter plugin for on-device AI inference with Retrieval Augmented Generation (RAG) capabilities powered by MediaPipe GenAI. Enable your LLMs to access and use relevant information from your own documents while keeping everything on-device.

Features #

  • 📚 RAG Support - Enhance LLM responses with context from your own documents
  • 🔍 Semantic Search - Find relevant information using vector similarity
  • 💾 Vector Store - Store embeddings in memory or SQLite for persistence
  • 🧠 Local Embeddings - Generate embeddings on-device (Gemma, Gecko models)
  • ☁️ Gemini Embeddings - Alternative cloud-based embeddings via Gemini API
  • 📄 Text Chunking - Automatically split large documents into manageable pieces
  • 🚀 On-device inference - All processing happens locally (except Gemini embeddings)
  • 🔒 Privacy-first - Your documents and queries stay on the device when you use local embeddings
  • 🌊 Streaming responses - Real-time text generation with partial results

Installation #

flutter pub add ai_edge_rag

Or add it manually to your pubspec.yaml:

dependencies:
  ai_edge_rag: ^0.0.1

Getting Started #

1. Basic RAG Setup #

import 'package:ai_edge_rag/ai_edge_rag.dart';

// Get the AI Edge RAG instance
final aiEdgeRag = AiEdgeRag.instance;

// Step 1: Initialize the language model
await aiEdgeRag.initialize(
  modelPath: '/path/to/your/model.task',
  maxTokens: 512,
  temperature: 0.7,
);

// Step 2: Create an embedding model for RAG
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: '/path/to/tokenizer.model',
  embeddingModelPath: '/path/to/embedding.bin',
  modelType: EmbeddingModelType.gemma, // Optional, defaults to gemma
  vectorStore: VectorStore.sqlite, // Optional, defaults to inMemory
);

// Step 3: Set system instruction for RAG behavior
await aiEdgeRag.setSystemInstruction(
  SystemInstruction(
    instruction: 'Use the provided context to answer questions accurately. '
        'If the answer is not in the context, say so explicitly.',
  ),
);

// Step 4: Add your documents to the vector store
await aiEdgeRag.memorizeChunkedText(
  '''Flutter is Google's UI toolkit for building beautiful, natively compiled
  applications for mobile, web, and desktop from a single codebase.

  Dart is the programming language used by Flutter. It's optimized for
  building user interfaces with features like hot reload.''',
  chunkSize: 512,
  chunkOverlap: 50,
);

// Step 5: Ask questions and get context-aware responses
final stream = aiEdgeRag.generateResponseAsync(
  'What programming language does Flutter use?',
  topK: 3, // Number of relevant chunks to retrieve
  minSimilarityScore: 0.3, // Minimum relevance threshold
);

await for (final event in stream) {
  print('Response: ${event.partialResult}');

  if (event.done) {
    print('Generation completed!');
  }
}

// Clean up when done
await aiEdgeRag.close();

2. Model Requirements #

This plugin requires:

  1. Language Model: A MediaPipe Task format model (.task file) for text generation
  2. Embedding Model: Either:
    • Local embedding model files (tokenizer + embedding model)
    • Gemini API key for cloud-based embeddings

Usage #

Using Local Embedding Models #

// Create a local embedding model
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: '/path/to/tokenizer.model',
  embeddingModelPath: '/path/to/embedding.bin',
  modelType: EmbeddingModelType.gemma, // Optional: gemma (default) or gecko
  vectorStore: VectorStore.sqlite, // Optional: sqlite or inMemory (default)
  preferredBackend: PreferredBackend.gpu, // Optional: gpu or cpu (default), Android only
);

Using Gemini API Embeddings #

// Create a Gemini-based embedder
await aiEdgeRag.createGeminiEmbedder(
  geminiEmbeddingModel: 'models/text-embedding-004',
  geminiApiKey: 'your-api-key-here',
  vectorStore: VectorStore.sqlite, // Optional: sqlite or inMemory (default)
);
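
Hard-coding the API key as in the snippet above is fine for a quick test, but it puts the key into source control. A minimal sketch of one alternative, assuming the key is passed at build time with Dart's `--dart-define` flag. The variable name `GEMINI_API_KEY` and the helper `requireGeminiApiKey` are illustrative and not part of the plugin:

```dart
// Read at compile time. Pass the value with, for example:
//   flutter run --dart-define=GEMINI_API_KEY=your-api-key-here
// GEMINI_API_KEY is an illustrative name, not something the plugin defines.
const String geminiApiKey = String.fromEnvironment('GEMINI_API_KEY');

/// Returns [key], or throws if it is empty, so that a missing define fails
/// fast instead of surfacing later as a failed embedding request.
String requireGeminiApiKey([String key = geminiApiKey]) {
  if (key.isEmpty) {
    throw StateError('GEMINI_API_KEY was not provided via --dart-define.');
  }
  return key;
}

void main() {
  try {
    final key = requireGeminiApiKey();
    print('Key loaded (${key.length} characters).');
  } on StateError catch (e) {
    print('Configuration error: ${e.message}');
  }
}
```

The result of `requireGeminiApiKey()` could then be passed as `geminiApiKey` to `createGeminiEmbedder()`. A value supplied this way is still embedded in the compiled app, so this keeps the key out of the repository rather than making it secret.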

Adding Documents to RAG #

Option 1: Add Pre-chunked Text

// Add a single chunk
await aiEdgeRag.memorizeChunk(
  'Flutter is an open-source UI framework by Google.',
);

// Add multiple chunks
await aiEdgeRag.memorizeChunks([
  'Flutter is an open-source UI framework by Google.',
  'Dart is the programming language used by Flutter.',
  'Flutter supports cross-platform development.',
]);

Option 2: Auto-chunk Large Documents

// Read a large document (File comes from dart:io, so add: import 'dart:io';)
final document = await File('documentation.txt').readAsString();

// Automatically chunk and store
await aiEdgeRag.memorizeChunkedText(
  document,
  chunkSize: 512, // Characters per chunk
  chunkOverlap: 50, // Overlap for context continuity
);
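
To make the effect of `chunkSize` and `chunkOverlap` concrete, here is a minimal sketch of a sliding-window splitter in plain Dart. It is an illustration only: the function `chunkText` is hypothetical, and the plugin's own chunker may choose boundaries differently (for example at word or sentence breaks).

```dart
/// Splits [text] into chunks of at most [chunkSize] characters, where each
/// chunk starts (chunkSize - chunkOverlap) characters after the previous one.
/// "Characters" here are UTF-16 code units, as with String.substring.
/// Illustration only; not the plugin's implementation.
List<String> chunkText(
  String text, {
  int chunkSize = 512,
  int chunkOverlap = 50,
}) {
  if (chunkSize <= 0) {
    throw ArgumentError.value(chunkSize, 'chunkSize', 'must be positive');
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw ArgumentError.value(
        chunkOverlap, 'chunkOverlap', 'must be in [0, chunkSize)');
  }
  final chunks = <String>[];
  final stride = chunkSize - chunkOverlap;
  for (var start = 0; start < text.length; start += stride) {
    final end =
        start + chunkSize < text.length ? start + chunkSize : text.length;
    chunks.add(text.substring(start, end));
    // Stop once the end of the text is covered, to avoid a trailing
    // fragment that only repeats the overlap.
    if (end == text.length) break;
  }
  return chunks;
}

void main() {
  final chunks = chunkText(
    'abcdefghijklmnopqrstuvwxyz',
    chunkSize: 10,
    chunkOverlap: 3,
  );
  // Prints: abcdefghij, hijklmnopq, opqrstuvwx, vwxyz (one per line).
  chunks.forEach(print);
}
```

Each chunk repeats the last three characters of the previous one, which is the continuity that the overlap is meant to provide.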

Querying with RAG #

// Basic query with default settings
final stream = aiEdgeRag.generateResponseAsync(
  'How does Flutter handle state management?',
);

await for (final event in stream) {
  // Display the response as it's generated
  print(event.partialResult);

  if (event.done) {
    break;
  }
}

Advanced RAG Configuration #

// Customize retrieval parameters
final stream = aiEdgeRag.generateResponseAsync(
  'What is Flutter?',
  topK: 5, // Retrieve top 5 most relevant chunks
  minSimilarityScore: 0.3, // Only use chunks with similarity > 0.3
);

await for (final event in stream) {
  print(event.partialResult);
}
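
The interaction between `topK` and `minSimilarityScore` can be sketched with a toy in-memory search. This assumes cosine similarity as the metric and an inclusive threshold, neither of which this README states explicitly. `StoredChunk`, `cosineSimilarity`, and `retrieve` are hypothetical names, not plugin APIs.

```dart
import 'dart:math' as math;

/// A stored chunk paired with its embedding vector (hypothetical type).
class StoredChunk {
  const StoredChunk(this.text, this.embedding);
  final String text;
  final List<double> embedding;
}

/// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
double cosineSimilarity(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have equal length.');
  }
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) return 0;
  return dot / (math.sqrt(normA) * math.sqrt(normB));
}

/// Returns up to [topK] chunk texts whose similarity to [query] is at least
/// [minSimilarityScore], ordered from most to least similar.
List<String> retrieve(
  List<double> query,
  List<StoredChunk> store, {
  int topK = 3,
  double minSimilarityScore = 0.0,
}) {
  final scored = <MapEntry<String, double>>[];
  for (final chunk in store) {
    final score = cosineSimilarity(query, chunk.embedding);
    if (score >= minSimilarityScore) scored.add(MapEntry(chunk.text, score));
  }
  scored.sort((a, b) => b.value.compareTo(a.value));
  return scored.take(topK).map((e) => e.key).toList();
}

void main() {
  const store = [
    StoredChunk('Dart', [1.0, 0.0]),
    StoredChunk('Flutter', [0.8, 0.6]),
    StoredChunk('Cooking', [0.0, 1.0]),
  ];
  // The threshold filters first, then topK caps the number of results.
  print(retrieve([1.0, 0.0], store, topK: 2, minSimilarityScore: 0.3));
  print(retrieve([1.0, 0.0], store, topK: 2, minSimilarityScore: 0.9));
}
```

With the lower threshold both related chunks are returned; with the higher one only the closest match survives, which is why an overly strict `minSimilarityScore` can leave the model without context.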

Vector Store Options #

// In-memory vector store (default, fast but not persistent)
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: tokenizerPath,
  embeddingModelPath: embeddingPath,
  vectorStore: VectorStore.inMemory,
);

// SQLite vector store (persistent across app restarts)
await aiEdgeRag.createEmbeddingModel(
  tokenizerModelPath: tokenizerPath,
  embeddingModelPath: embeddingPath,
  vectorStore: VectorStore.sqlite,
);

Platform Setup #

iOS #

Not yet supported - RAG features are currently Android-only. iOS support is planned for a future release.

Android #

  • Minimum SDK: Android API level 24 (Android 7.0) or later

    • This is a requirement from MediaPipe GenAI SDK
    • Flutter's default minSdkVersion is 21, so you must update it
  • Add to your android/app/build.gradle:

    android {
      defaultConfig {
          minSdkVersion 24  // Required by MediaPipe GenAI
      }
    }
    
  • Recommended Devices:

    • Optimal performance on Pixel 7 or newer
    • Other high-end Android devices with comparable specs
  • For large models and documents, you may need to increase heap size in android/app/src/main/AndroidManifest.xml:

    <application
      android:largeHeap="true"
      ...>
    

Model Preparation #

Language Model #

This plugin uses MediaPipe Task format (.task files) for the language model. See the ai_edge package documentation for details on obtaining .task models.

Embedding Models #

Local Embedding Models

You need two files:

  1. Tokenizer Model: Converts text to tokens (e.g., tokenizer.model)
  2. Embedding Model: Generates vector embeddings (e.g., embedding.bin)

Supported model types:

  • Gemma: Text embedding models from Google's Gemma family
  • Gecko: Text embedding models optimized for retrieval tasks

Gemini API Embeddings

Alternatively, use Google's Gemini API for embeddings:

  • No local model files needed
  • Requires internet connection
  • Requires a Gemini API key
  • Recommended model: models/text-embedding-004

API Reference #

Main Classes #

AiEdgeRag

The main entry point for RAG capabilities.

Key Methods:

  • initialize() - Set up language model and session
  • createEmbeddingModel() - Create local embedding model
  • createGeminiEmbedder() - Create Gemini API embedder
  • memorizeChunk() - Store a single text chunk
  • memorizeChunks() - Store multiple text chunks
  • memorizeChunkedText() - Auto-chunk and store large text
  • setSystemInstruction() - Configure RAG behavior
  • generateResponseAsync() - Generate context-aware responses
  • close() - Clean up resources

EmbeddingModelConfig

Configuration for local embedding models:

  • tokenizerModelPath - Path to tokenizer model file (required)
  • embeddingModelPath - Path to embedding model file (required)
  • modelType - Type of embedding model: gemma (default) or gecko (optional)
  • vectorStore - Storage type: inMemory (default) or sqlite (optional)
  • preferredBackend - Hardware backend: cpu (default) or gpu (optional, Android only)

GeminiEmbedderConfig

Configuration for Gemini API embeddings:

  • geminiEmbeddingModel - Gemini model name (required, e.g., 'models/text-embedding-004')
  • geminiApiKey - Your Gemini API key (required)
  • vectorStore - Storage type: inMemory (default) or sqlite (optional)

SystemInstruction

RAG-specific system instruction:

  • instruction - Text guiding how the model uses retrieved context

VectorStore

Storage options for embeddings:

  • inMemory - Fast, not persistent (default)
  • sqlite - Persistent across app restarts

EmbeddingModelType

Supported embedding models:

  • gemma - Gemma embedding models
  • gecko - Gecko embedding models

GenerationEvent

Event emitted during streaming generation:

  • partialResult - The accumulated text generated so far
  • done - Whether generation is complete
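
Because `partialResult` holds the accumulated text rather than only the newest piece, a UI that appends text needs to work out what is new in each event. A minimal sketch of that, using a stand-in class (`FakeGenerationEvent`, hypothetical) with the two fields listed above so that it runs without the plugin:

```dart
/// Stand-in for the plugin's GenerationEvent, with the two documented fields.
class FakeGenerationEvent {
  const FakeGenerationEvent(this.partialResult, {this.done = false});
  final String partialResult;
  final bool done;
}

/// Converts accumulated partial results into the newly added text per event.
Stream<String> newTextOnly(Stream<FakeGenerationEvent> events) async* {
  var previous = '';
  await for (final event in events) {
    final current = event.partialResult;
    // If the text does not extend the previous one, emit it whole.
    yield current.startsWith(previous)
        ? current.substring(previous.length)
        : current;
    previous = current;
    if (event.done) break;
  }
}

Future<void> main() async {
  final events = Stream.fromIterable(const [
    FakeGenerationEvent('Flutter'),
    FakeGenerationEvent('Flutter uses'),
    FakeGenerationEvent('Flutter uses Dart.', done: true),
  ]);
  final buffer = StringBuffer();
  await for (final piece in newTextOnly(events)) {
    buffer.write(piece);
  }
  print(buffer); // Flutter uses Dart.
}
```

If the UI simply replaces the displayed text on every event, no such conversion is needed and `partialResult` can be shown directly.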

Example App #

Check out the examples/ai_chat_rag directory for a complete RAG chat application demonstrating:

  • Document loading and chunking
  • Semantic search and retrieval
  • Context-aware response generation
  • Real-time streaming responses
  • Vector store management
  • Error handling

Run the example:

cd examples/ai_chat_rag
flutter run

Use Cases #

Knowledge Base Q&A #

Build a chatbot that answers questions based on your documentation, manuals, or knowledge base.

Document Analysis #

Let users query information from uploaded documents (PDFs, text files, etc.).

Code Documentation Assistant #

Create an assistant that helps developers by referencing your codebase documentation.

Personal Note Assistant #

Build a smart notes app where users can ask questions about their notes.

Best Practices #

Chunking Strategy #

  • Chunk size: 256-512 characters works well for most use cases
  • Overlap: 50-100 characters helps maintain context between chunks
  • Use memorizeChunkedText() for automatic chunking
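
When choosing these values it helps to estimate how many chunks (and therefore embeddings) a document will produce. The sketch below assumes a plain sliding window in which each chunk starts `chunkSize - chunkOverlap` characters after the previous one; the plugin's actual chunker may differ, and `estimateChunkCount` is a hypothetical helper.

```dart
/// Estimates how many chunks a text of [textLength] characters produces with
/// a sliding window of [chunkSize] and an overlap of [chunkOverlap].
int estimateChunkCount(
  int textLength, {
  required int chunkSize,
  required int chunkOverlap,
}) {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw ArgumentError('Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize.');
  }
  if (textLength <= 0) return 0;
  if (textLength <= chunkSize) return 1;
  final stride = chunkSize - chunkOverlap;
  // One full first chunk, then one more chunk per stride of the remaining
  // text, rounded up.
  return 1 + ((textLength - chunkSize + stride - 1) ~/ stride);
}

void main() {
  // A 10,000-character document under two of the settings suggested above.
  print(estimateChunkCount(10000, chunkSize: 512, chunkOverlap: 50)); // 22
  print(estimateChunkCount(10000, chunkSize: 256, chunkOverlap: 50)); // 49
}
```

Halving the chunk size roughly doubles the number of embeddings to compute and store, which is the trade-off against finer-grained retrieval.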

System Instructions #

Provide clear instructions on how to use retrieved context:

SystemInstruction(
  instruction: '''You are a helpful assistant. Use the provided context to answer questions.
  If the answer is not in the context, say "I don't have that information."
  Always cite the context when answering.'''
)

Retrieval Parameters #

  • topK: Start with 3-5, increase if responses lack context
  • minSimilarityScore: Start with 0.2-0.3, adjust based on quality

Vector Store Choice #

  • Use VectorStore.inMemory for temporary data or prototyping
  • Use VectorStore.sqlite for persistent knowledge bases

Troubleshooting #

Common Issues #

Embeddings fail to create:

  • Ensure embedding model files exist at specified paths
  • Check file permissions
  • Verify model format matches the selected model type
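
The first two checks can be automated before calling `createEmbeddingModel()`. A minimal sketch using only `dart:io`; the helper `checkModelFile` is hypothetical, and it cannot verify the model format itself:

```dart
import 'dart:io';

/// Returns human-readable problems with the file at [path]; empty means the
/// file exists, is non-empty, and can be opened for reading.
List<String> checkModelFile(String label, String path) {
  final file = File(path);
  if (!file.existsSync()) return ['$label not found at $path'];
  try {
    if (file.lengthSync() == 0) return ['$label at $path is empty'];
    // Opening for read surfaces permission problems early.
    file.openSync(mode: FileMode.read).closeSync();
  } on FileSystemException catch (e) {
    return ['$label at $path is not readable: ${e.message}'];
  }
  return const [];
}

void main() {
  // Placeholder paths, as used elsewhere in this README.
  final problems = [
    ...checkModelFile('Tokenizer model', '/path/to/tokenizer.model'),
    ...checkModelFile('Embedding model', '/path/to/embedding.bin'),
  ];
  if (problems.isEmpty) {
    print('Model files look usable.');
  } else {
    problems.forEach(print);
  }
}
```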

Responses don't use context:

  • Check that documents were successfully added with memorizeChunk(), memorizeChunks(), or memorizeChunkedText()
  • Increase topK to retrieve more context
  • Lower minSimilarityScore threshold
  • Improve system instructions to emphasize using context

Out of memory errors:

  • Use VectorStore.sqlite instead of inMemory
  • Reduce chunkSize when processing documents
  • Process large documents in batches
  • Enable largeHeap on Android
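
Processing in batches can be sketched as follows. The helpers `inBatches` and `storeInBatches` are hypothetical, and the batch size of 32 is an arbitrary starting point rather than a plugin recommendation. The storing step is passed in as a callback so that the example runs without the plugin:

```dart
/// Splits [items] into consecutive batches of at most [batchSize] elements.
Iterable<List<T>> inBatches<T>(List<T> items, int batchSize) sync* {
  if (batchSize <= 0) {
    throw ArgumentError.value(batchSize, 'batchSize', 'must be positive');
  }
  for (var start = 0; start < items.length; start += batchSize) {
    final end =
        start + batchSize < items.length ? start + batchSize : items.length;
    yield items.sublist(start, end);
  }
}

/// Feeds [chunks] to [store] one batch at a time, waiting for each call to
/// finish so that only one batch is being processed at any moment.
Future<void> storeInBatches(
  List<String> chunks,
  Future<void> Function(List<String> batch) store, {
  int batchSize = 32,
}) async {
  for (final batch in inBatches(chunks, batchSize)) {
    await store(batch);
  }
}

Future<void> main() async {
  final chunks = List.generate(70, (i) => 'chunk $i');
  final sizes = <int>[];
  // In an app, `store` could forward to aiEdgeRag.memorizeChunks; a stub
  // that records the batch sizes is used here.
  await storeInBatches(chunks, (batch) async {
    sizes.add(batch.length);
  }, batchSize: 32);
  print(sizes); // [32, 32, 6]
}
```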

Slow inference:

  • Enable GPU acceleration with PreferredBackend.gpu
  • Use smaller language models
  • Reduce maxTokens for shorter outputs
  • Consider using Gemini API embeddings to offload embedding generation from the device (requires an internet connection, and speed depends on the network)

Limitations #

  • Currently Android-only (iOS support planned)
  • Embedding models must be in MediaPipe format
  • SQLite vector store uses basic similarity search (no advanced indexing)

License #

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Acknowledgments #

This plugin is built on top of:

Publisher: kyoheig3.jp (verified publisher)

Dependencies: ai_edge, flutter, plugin_platform_interface