genkit_flutter_gemma
Genkit Dart plugin for flutter_gemma — local on-device AI inference via Google Gemma and other supported models.
Features
- Wraps `flutter_gemma` as a Genkit model provider
- Supports text generation (blocking and streaming)
- Embeddings via `FlutterGemmaEmbedder`
- Multimodal input (images, audio); supports `data:` URIs, `file://` paths, and `http(s)://` URLs
- Function calling / tool use with `toolChoice` control (`auto`, `required`, `none`)
- Parallel tool calls: multiple function calls in a single model response
- Thinking mode (Gemma 4, DeepSeek)
- Generation latency tracking via `latencyMs` in responses
- Configurable via `@Schema()`-annotated options
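The multimodal bullet above accepts images and audio as `data:` URIs. When media is already in memory (for example, bytes from a camera or file picker), `Uri.dataFromBytes` from `dart:core` can produce such a URI. This is a minimal sketch: the `toDataUri` helper is hypothetical, the bytes are placeholders, and how the resulting URI is attached to a Genkit message depends on the Genkit Dart media API, which is not shown here.

```dart
import 'dart:convert';

/// Hypothetical helper: encodes raw media bytes as a base64 `data:` URI.
String toDataUri(List<int> bytes, String mimeType) {
  return Uri.dataFromBytes(bytes, mimeType: mimeType).toString();
}

void main() {
  // Placeholder bytes standing in for the contents of a real PNG file.
  final bytes = utf8.encode('hello');
  final uri = toDataUri(bytes, 'image/png');
  print(uri);
}
```

For media on disk or on a server, the `file://` and `http(s)://` forms listed above avoid holding a base64 copy of the bytes in memory.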
Supported Model Architectures
| Architecture | ModelType | Notes |
|---|---|---|
| Gemma 3 / Gemma 4 IT | `ModelType.gemmaIt` | Default; multimodal (image, audio); thinking mode for Gemma 4 |
| DeepSeek | `ModelType.deepSeek` | Thinking mode |
| Qwen / Qwen3 | `ModelType.qwen` / `ModelType.qwen3` | Qwen3 supports thinking mode |
| Llama | `ModelType.llama` | |
| Phi | `ModelType.phi` | Phi-4 |
| FunctionGemma | `ModelType.functionGemma` | Specialized function calling |
Quick Start
```dart
import 'package:flutter/widgets.dart';
import 'package:flutter_gemma/flutter_gemma.dart';
import 'package:genkit/genkit.dart';
import 'package:genkit_flutter_gemma/genkit_flutter_gemma.dart';

Future<void> main() async {
  // Platform plugins need the Flutter binding before they are called from main().
  WidgetsFlutterBinding.ensureInitialized();

  // Initialize flutter_gemma and install the model (host app responsibility).
  await FlutterGemma.initialize();
  await FlutterGemma.installModel(modelType: ModelType.gemmaIt)
      .fromAsset('assets/gemma-3-1b-it-int4.task')
      .install();

  // Create a Genkit instance with the plugin registered.
  final ai = Genkit(plugins: [
    GenkitFlutterGemmaPlugin(
      models: [
        FlutterGemmaModelConfig(
          name: 'gemma-3-nano',
          modelType: ModelType.gemmaIt,
        ),
      ],
      embedders: [
        FlutterGemmaEmbedderConfig(name: 'embedding-gemma-300m'),
      ],
    ),
  ]);

  // Generate text with the registered model.
  final response = await ai.generate(
    model: flutterGemma.model('gemma-3-nano'),
    prompt: 'Hello!',
  );
  print(response.text);
}
```
Configuration
Pass `FlutterGemmaModelOptions` as `config` to customize inference:

```dart
// `ai` is the Genkit instance created in Quick Start.
final response = await ai.generate(
  model: flutterGemma.model('gemma-3-nano'),
  prompt: 'Hello!',
  config: FlutterGemmaModelOptions(
    maxTokens: 2048,
    temperature: 0.5,
    topK: 40,
    supportImage: true,
  ),
);
```
| Option | Type | Default | Description |
|---|---|---|---|
| `maxTokens` | `int?` | `1024` | Maximum number of tokens to generate |
| `temperature` | `double?` | `0.8` | Sampling temperature |
| `topK` | `int?` | `1` | Top-K sampling |
| `topP` | `double?` | `null` | Top-P (nucleus) sampling |
| `supportImage` | `bool?` | `false` | Enable multimodal image input |
| `supportAudio` | `bool?` | `false` | Enable audio input (Gemma 3n) |
| `isThinking` | `bool?` | `false` | Enable thinking mode (Gemma 4, DeepSeek) |
| `randomSeed` | `int?` | `1` | Random seed for deterministic output |
| `toolChoice` | `String?` | `'auto'` | Tool calling mode: `'auto'`, `'required'`, or `'none'` |
| `systemInstruction` | `String?` | `null` | System-level instruction (overrides system-role messages) |
| `maxFunctionBufferLength` | `int?` | `null` | Maximum token buffer for streaming tool-call arguments (increase for large payloads) |
| `enableSpeculativeDecoding` | `bool?` | `null` | MTP speculative decoding for Gemma 4 E2B/E4B (`null` = model default; `true`/`false` = force on/off) |
Streaming
```dart
import 'dart:io'; // for stdout

final stream = ai.generateStream(
  model: flutterGemma.model('gemma-3-nano'),
  prompt: 'Write a story.',
);
// Print each chunk as soon as it arrives.
await for (final chunk in stream) {
  stdout.write(chunk.text);
}
```
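The loop above prints chunks as they arrive but discards them. If the app also needs the complete reply afterwards, a `StringBuffer` can collect the chunks in the same pass. The sketch below is plain Dart: `Chunk` is a hypothetical stand-in that models only the `text` field used above, and a fake `Stream.fromIterable` replaces `ai.generateStream`.

```dart
import 'dart:io';

/// Hypothetical stand-in for a streamed response chunk; only `text` is modeled.
class Chunk {
  final String text;
  const Chunk(this.text);
}

/// Echoes each chunk as it arrives and returns the full concatenated text.
Future<String> collectStream(Stream<Chunk> stream) async {
  final buffer = StringBuffer();
  await for (final chunk in stream) {
    stdout.write(chunk.text); // incremental display
    buffer.write(chunk.text); // keep the full text for later use
  }
  stdout.writeln();
  return buffer.toString();
}

Future<void> main() async {
  // Fake stream standing in for ai.generateStream(...).
  final fake = Stream.fromIterable(
      const [Chunk('Once '), Chunk('upon '), Chunk('a time.')]);
  final full = await collectStream(fake);
  print('Total length: ${full.length}');
}
```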
Tool Use
```dart
// `weatherTool` is a tool defined elsewhere and registered with this Genkit instance.
final response = await ai.generate(
  model: flutterGemma.model('gemma-3-nano'),
  prompt: 'What is the weather in Paris?',
  tools: [weatherTool],
);
```
Embeddings
```dart
// Install embedding model + tokenizer (host app responsibility)
await FlutterGemma.installEmbedder()
    .modelFromNetwork('https://huggingface.co/.../embeddinggemma-300M.tflite')
    .tokenizerFromNetwork('https://huggingface.co/.../sentencepiece.model')
    .install();

// Generate embeddings
final embeddings = await ai.embed(
  embedder: flutterGemma.embedder('embedding-gemma-300m'),
  documents: [
    DocumentData(content: [TextPart(text: 'Flutter is a UI toolkit.')]),
    DocumentData(content: [TextPart(text: 'Dart is a programming language.')]),
  ],
);
for (final embedding in embeddings) {
  print('Vector (${embedding.embedding.length} dims): '
      '${embedding.embedding.take(5)}...');
}
```
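A common next step with these vectors is ranking documents by cosine similarity to a query embedding. The sketch below is library-independent plain Dart (`dart:math` only) and assumes the vectors can be read as `List<double>`; the plugin's actual element type is not stated here, so convert if needed (for example with `.map((v) => v.toDouble()).toList()`). The `cosineSimilarity` helper is hypothetical, not part of the plugin.

```dart
import 'dart:math';

/// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero magnitude.
double cosineSimilarity(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) return 0.0;
  return dot / (sqrt(normA) * sqrt(normB));
}

void main() {
  print(cosineSimilarity([1.0, 0.0], [1.0, 0.0])); // 1.0 (same direction)
  print(cosineSimilarity([1.0, 0.0], [0.0, 1.0])); // 0.0 (orthogonal)
}
```

Applied to the example above, something like `cosineSimilarity(embeddings[0].embedding, embeddings[1].embedding)` would score the two documents against each other, assuming `embeddings` is indexable as a list.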
Known Limitations
- Model installation: The plugin does NOT manage model installation. The host app must install models via `FlutterGemma.installModel()` and embedders via `FlutterGemma.installEmbedder()` before using the plugin.
- System role: System messages are passed natively via `createChat(systemInstruction:)` (requires flutter_gemma ^0.13.0). Only text content is supported in system messages.
- Thinking mode: Requires the `.litertlm` model format. Supported on Android, iOS, and Desktop; not supported on Web.