liquid_ai

Run powerful on-device AI models in your Flutter apps with the LEAP SDK. Supports text generation, streaming chat, structured JSON output, function calling, and vision models - all running locally on iOS and Android.

Features

  • On-Device Inference - Run AI models locally without internet connectivity
  • Streaming Responses - Real-time token-by-token text generation
  • Structured Output - Constrain model output to JSON schemas with automatic validation
  • Function Calling - Define tools the model can invoke with typed parameters
  • Vision Models - Analyze images with multimodal vision-language models
  • Model Catalog - Browse and filter 20+ optimized models for different tasks
  • Progress Tracking - Monitor download and loading progress with detailed events
  • Resource Management - Efficient memory handling with explicit lifecycle control

Platform Support

Platform  Supported  Notes
iOS       Yes        iOS 17.0+, SPM (default) or CocoaPods
Android   Yes        API 31+ (Android 12)
macOS     No         Not yet supported
Web       No         Native inference only

Quick Start

Installation

Add liquid_ai to your pubspec.yaml:

dependencies:
  liquid_ai: ^1.2.0

iOS Setup

Swift Package Manager (default, recommended):

SPM is enabled by default in Flutter 3.24+. No additional setup required.

CocoaPods (alternative):

If you need to use CocoaPods, add the LEAP SDK git source to your ios/Podfile:

target 'Runner' do
  # Add LEAP SDK from git (required for v0.9.x)
  pod 'Leap-SDK', :git => 'https://github.com/Liquid4All/leap-ios.git', :tag => 'v0.9.2'
  pod 'Leap-Model-Downloader', :git => 'https://github.com/Liquid4All/leap-ios.git', :tag => 'v0.9.2'

  # ... rest of your Podfile
end

Then disable SPM in your pubspec.yaml:

flutter:
  config:
    enable-swift-package-manager: false

Basic Usage

import 'package:liquid_ai/liquid_ai.dart';

// Initialize the SDK
final liquidAi = LiquidAi();

// Find a model from the catalog
final model = ModelCatalog.findBySlug('LFM2.5-1.2B-Instruct')!;
const quantization = ModelQuantization.q4KM;

// Load the model (downloads if needed)
ModelRunner? runner;
await for (final event in liquidAi.loadModel(model.slug, quantization.slug)) {
  if (event is LoadCompleteEvent) {
    runner = event.runner;
  }
}

// Create a conversation and generate text
final conversation = await runner!.createConversation(
  systemPrompt: 'You are a helpful assistant.',
);
final response = await conversation.generateText('Hello!');
print(response);

// Clean up
await conversation.dispose();
await runner!.dispose();

Model Loading

Models are downloaded automatically on first use and cached locally. Track progress with load events:

// Use the catalog and enums for type safety
final model = ModelCatalog.findBySlug('LFM2.5-1.2B-Instruct')!;
const quantization = ModelQuantization.q4KM;

// Or use the model's default quantization
final defaultQuant = model.defaultQuantization;

await for (final event in liquidAi.loadModel(model.slug, quantization.slug)) {
  switch (event) {
    case LoadStartedEvent():
      print('Starting download...');
    case LoadProgressEvent(:final progress):
      print('${(progress.progress * 100).toStringAsFixed(1)}%');
      if (progress.speed != null) {
        print('Speed: ${progress.speed! ~/ 1024} KB/s');
      }
    case LoadCompleteEvent(:final runner):
      print('Ready!');
      // Use runner to create conversations
    case LoadErrorEvent(:final error):
      print('Failed: $error');
    case LoadCancelledEvent():
      print('Cancelled');
  }
}
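
If you prefer a single awaitable call rather than handling the event stream yourself (ModelManager, covered later, also provides this), you can wrap loadModel in a small helper. This is an illustrative pattern, not an SDK API; it assumes only the event types shown above:

// Illustrative helper, not part of liquid_ai: awaits the load stream and
// returns the runner, throwing on error or cancellation.
Future<ModelRunner> loadModelAndWait(
  LiquidAi liquidAi,
  String modelSlug,
  String quantizationSlug,
) async {
  await for (final event in liquidAi.loadModel(modelSlug, quantizationSlug)) {
    switch (event) {
      case LoadCompleteEvent(runner: final runner):
        return runner;
      case LoadErrorEvent(:final error):
        throw Exception('Model load failed: $error');
      case LoadCancelledEvent():
        throw Exception('Model load was cancelled');
      default:
        break; // ignore started/progress events
    }
  }
  throw StateError('Load stream ended without completing');
}

// Usage:
// final runner = await loadModelAndWait(liquidAi, model.slug, quantization.slug);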

Load Options

Configure the inference engine when loading models:

await for (final event in liquidAi.loadModel(
  model.slug,
  quantization.slug,
  options: LoadOptions(
    contextSize: 4096,   // Maximum context window
    batchSize: 512,      // Batch size for prompt processing
    threads: 4,          // Number of CPU threads
    gpuLayers: 32,       // Layers to offload to GPU (if available)
  ),
)) {
  // Handle events...
}

Load from Local File

Load a model directly from a file path (useful for custom or bundled models):

await for (final event in liquidAi.loadModelFromPath(
  '/path/to/model.gguf',
  options: LoadOptions(contextSize: 2048),
)) {
  if (event is LoadCompleteEvent) {
    runner = event.runner;
  }
}

Model Status

// Check if already downloaded
final downloaded = await liquidAi.isModelDownloaded(model.slug, quantization.slug);

// Get detailed status
final status = await liquidAi.getModelStatus(model.slug, quantization.slug);

// Delete to free storage
await liquidAi.deleteModel(model.slug, quantization.slug);

Download from URL (Hugging Face Support)

Download models directly from any URL, including Hugging Face:

await for (final event in liquidAi.downloadModelFromUrl(
  url: 'https://huggingface.co/user/model/resolve/main/model.gguf?download=true',
  modelId: 'my-custom-model',
  quantization: 'Q4_K_M', // Optional, defaults to 'custom'
)) {
  switch (event) {
    case DownloadProgressEvent(:final progress):
      print('${(progress.progress * 100).toStringAsFixed(1)}%');
    case DownloadCompleteEvent():
      print('Download complete!');
    case DownloadErrorEvent(:final error):
      print('Error: $error');
    default:
      break;
  }
}

// Then load the downloaded model
await for (final event in liquidAi.loadModel('my-custom-model', 'Q4_K_M')) {
  // Handle load events...
}

Cache Management

List and manage cached models:

// Check if a specific model is cached (useful for URL-downloaded models)
final isCached = await liquidAi.isModelCached('my-custom-model');
if (!isCached) {
  // Download the model...
}

// List all cached models
final cachedModels = await liquidAi.getCachedModels();
for (final manifest in cachedModels) {
  print('${manifest.modelSlug} (${manifest.quantizationSlug})');
  print('  Path: ${manifest.localModelPath}');
}

// Delete all cached models to free storage
await liquidAi.deleteAllModels();

Model Manifest

When a model is loaded, you can access extended metadata through the ModelManifest:

await for (final event in liquidAi.loadModel(model.slug, quantization.slug)) {
  if (event is LoadCompleteEvent) {
    final manifest = event.manifest;
    if (manifest != null) {
      print('Model: ${manifest.modelSlug}');
      print('Quantization: ${manifest.quantizationSlug}');
      print('Path: ${manifest.localModelPath}');

      // Access via runner as well
      print('Runner manifest: ${event.runner.manifest}');
    }
  }
}

Text Generation

Simple Generation

final response = await conversation.generateText('What is the capital of France?');
print(response); // "The capital of France is Paris."

Streaming Generation

Stream tokens as they're generated for real-time display:

final message = ChatMessage.user('Tell me a story.');

await for (final event in conversation.generateResponse(message)) {
  switch (event) {
    case GenerationChunkEvent(:final chunk):
      stdout.write(chunk); // Print token immediately
    case GenerationCompleteEvent(:final stats):
      print('\n${stats?.tokensPerSecond?.toStringAsFixed(1)} tokens/sec');
    case GenerationErrorEvent(:final error):
      print('Error: $error');
    default:
      break;
  }
}
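
For UI display you typically accumulate chunks as they arrive and push the partial text to your widget state. A minimal sketch (the buffer and the update callback are app code, not SDK API):

final buffer = StringBuffer();

await for (final event in conversation.generateResponse(message)) {
  if (event is GenerationChunkEvent) {
    buffer.write(event.chunk);
    // e.g. setState(() => displayedText = buffer.toString()) in a widget
  }
}

final fullResponse = buffer.toString();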

Generation Options

Fine-tune generation with sampling parameters:

final options = GenerationOptions(
  temperature: 0.7,    // Creativity (0.0-2.0)
  topP: 0.9,           // Nucleus sampling
  topK: 40,            // Top-K sampling
  maxTokens: 256,      // Maximum output length
);

final response = await conversation.generateText(
  'Write a haiku.',
  options: options,
);

Structured Output

Generate JSON that conforms to a schema with automatic validation:

// Define the expected output structure
final recipeSchema = JsonSchema.object('A cooking recipe')
    .addString('name', 'The recipe name')
    .addArray('ingredients', 'List of ingredients',
        items: StringProperty(description: 'An ingredient'))
    .addInt('prepTime', 'Preparation time in minutes', minimum: 1)
    .addInt('cookTime', 'Cooking time in minutes', minimum: 0)
    .addObject('nutrition', 'Nutritional information',
        configureNested: (b) => b
            .addInt('calories', 'Calories per serving')
            .addNumber('protein', 'Protein in grams'))
    .build();

// Generate structured output
final message = ChatMessage.user('Give me a recipe for chocolate chip cookies.');

await for (final event in conversation.generateStructured(
  message,
  schema: recipeSchema,
  fromJson: Recipe.fromJson,
)) {
  switch (event) {
    case StructuredProgressEvent(:final tokenCount):
      print('Generating... ($tokenCount tokens)');
    case StructuredCompleteEvent<Recipe>(:final result):
      print('Recipe: ${result.name}');
      print('Ingredients: ${result.ingredients.join(", ")}');
      print('Calories: ${result.nutrition.calories}');
    case StructuredErrorEvent(:final error, :final rawResponse):
      print('Failed: $error');
  }
}
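
The snippet above assumes an app-side Recipe class with a fromJson factory that matches recipeSchema; it is not provided by the SDK. A minimal hand-written sketch:

// Hypothetical app-side models matching recipeSchema (not part of liquid_ai)
class Recipe {
  final String name;
  final List<String> ingredients;
  final int prepTime;
  final int cookTime;
  final Nutrition nutrition;

  Recipe({
    required this.name,
    required this.ingredients,
    required this.prepTime,
    required this.cookTime,
    required this.nutrition,
  });

  factory Recipe.fromJson(Map<String, dynamic> json) => Recipe(
        name: json['name'] as String,
        ingredients: (json['ingredients'] as List).cast<String>(),
        prepTime: json['prepTime'] as int,
        cookTime: json['cookTime'] as int,
        nutrition: Nutrition.fromJson(json['nutrition'] as Map<String, dynamic>),
      );
}

class Nutrition {
  final int calories;
  final double protein;

  Nutrition({required this.calories, required this.protein});

  factory Nutrition.fromJson(Map<String, dynamic> json) => Nutrition(
        calories: json['calories'] as int,
        protein: (json['protein'] as num).toDouble(),
      );
}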

Schema Types

The schema builder supports these property types:

Method     JSON Type  Options
addString  string     enumValues, minLength, maxLength
addInt     integer    minimum, maximum
addNumber  number     minimum, maximum
addBool    boolean    -
addArray   array      items, minItems, maxItems
addObject  object     configureNested
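
A sketch combining several of these options, following the builder calls shown earlier (treat the exact named-parameter shapes as approximate):

final contactSchema = JsonSchema.object('A contact card')
    .addString('name', 'Full name', minLength: 1, maxLength: 100)
    .addString('role', 'Job role',
        enumValues: ['engineer', 'designer', 'manager'])
    .addBool('active', 'Whether the contact is currently active')
    .addArray('emails', 'Email addresses',
        items: StringProperty(description: 'An email address'),
        minItems: 1,
        maxItems: 5)
    .build();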

Function Calling

Define tools the model can invoke to extend its capabilities:

// Define a function with typed parameters
final searchFunction = LeapFunction.withSchema(
  name: 'search_web',
  description: 'Search the web for current information',
  schema: JsonSchema.object('Search parameters')
      .addString('query', 'The search query')
      .addInt('limit', 'Maximum results', required: false, minimum: 1, maximum: 10)
      .build(),
);

// Register with the conversation
await conversation.registerFunction(searchFunction);

// Handle function calls during generation
await for (final event in conversation.generateResponse(message)) {
  switch (event) {
    case GenerationFunctionCallEvent(:final functionCalls):
      for (final call in functionCalls) {
        print('Calling ${call.name} with ${call.arguments}');

        // Execute your function
        final result = await executeSearch(call.arguments);

        // Return the result to continue generation
        await conversation.provideFunctionResult(
          LeapFunctionResult(callId: call.id, result: result),
        );
      }
    case GenerationChunkEvent(:final chunk):
      stdout.write(chunk);
    default:
      break;
  }
}
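
executeSearch in the example above is application code; the SDK only surfaces the call and forwards your result. A hypothetical stub, assuming call.arguments is a JSON-like map and that a string result is acceptable:

import 'dart:convert';

// Hypothetical app-side tool implementation (not part of liquid_ai)
Future<String> executeSearch(Map<String, dynamic> arguments) async {
  final query = arguments['query'] as String;
  final limit = (arguments['limit'] as int?) ?? 5;

  // Call your own search backend here, then return something the model
  // can read (a JSON string works well) when generation continues.
  final results = ['result 1', 'result 2', 'result 3'].take(limit).toList();
  return jsonEncode({'query': query, 'results': results});
}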

Vision Models

Analyze images with multimodal vision-language models:

// Load a vision model from the catalog
final visionModel = ModelCatalog.findBySlug('LFM2.5-VL-1.6B')!;

await for (final event in liquidAi.loadModel(
  visionModel.slug,
  visionModel.defaultQuantization.slug, // Q8_0 for vision models
)) {
  if (event is LoadCompleteEvent) {
    runner = event.runner;
  }
}

// Create a conversation and send an image
final conversation = await runner.createConversation();

// Load image as JPEG bytes
final imageBytes = await File('photo.jpg').readAsBytes();

final message = ChatMessage(
  role: ChatMessageRole.user,
  content: [
    ImageContent(data: imageBytes),
    TextContent(text: 'Describe what you see in this image.'),
  ],
);

await for (final event in conversation.generateResponse(message)) {
  if (event is GenerationChunkEvent) {
    stdout.write(event.chunk);
  }
}

Model Catalog

Browse available models programmatically:

// All available (non-deprecated) models
final models = ModelCatalog.available;

// Filter by capability
final visionModels = ModelCatalog.visionModels;
final reasoningModels = ModelCatalog.byTask(ModelTask.reasoning);
final japaneseModels = ModelCatalog.byLanguage('ja');

// Find a specific model
final model = ModelCatalog.findBySlug('LFM2.5-1.2B-Instruct');
if (model != null) {
  print('${model.name} - ${model.parameters} parameters');
  print('Context: ${model.contextLength} tokens');

  // Access available quantizations
  for (final quant in model.quantizations) {
    print('  ${quant.quantization.name}: ${quant.slug}');
  }

  // Get the recommended default quantization
  print('Default: ${model.defaultQuantization.slug}');
}

Available Models

Model                 Parameters  Task       Modalities
LFM2.5-1.2B-Instruct  1.2B        General    Text
LFM2.5-1.2B-Thinking  1.2B        Reasoning  Text
LFM2.5-VL-1.6B        1.6B        General    Text, Image
LFM2-2.6B             2.6B        General    Text
LFM2-2.6B-Exp         2.6B        Reasoning  Text
LFM2-VL-3B            3B          General    Text, Image
LFM2-350M             350M        General    Text
LFM2-700M             700M        General    Text

See ModelCatalog.all for the complete list including specialized models for extraction, translation, and summarization.

Quantization Options

Models are available in multiple quantization levels via the ModelQuantization enum:

Enum                    Slug    Size      Quality    Use Case
ModelQuantization.q4_0  Q4_0    Smallest  Good       Mobile devices, fast inference
ModelQuantization.q4KM  Q4_K_M  Small     Better     Balanced quality and size
ModelQuantization.q5KM  Q5_K_M  Medium    High       Quality-focused applications
ModelQuantization.q8_0  Q8_0    Large     Highest    Maximum quality
ModelQuantization.f16   F16     Largest   Reference  Vision models only
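
To choose a quantization at runtime, you can look it up on the catalog entry using the fields shown in the Model Catalog section; a small sketch that prefers Q4_K_M and falls back to the model's default:

final model = ModelCatalog.findBySlug('LFM2.5-1.2B-Instruct')!;

// Prefer Q4_K_M if this model ships it, otherwise use the default
final preferred = model.quantizations
    .where((q) => q.quantization == ModelQuantization.q4KM)
    .firstOrNull;
final quantSlug = preferred?.slug ?? model.defaultQuantization.slug;

// Then load as usual:
// liquidAi.loadModel(model.slug, quantSlug)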

Error Handling

Handle errors gracefully with typed exceptions:

try {
  final response = await conversation.generateText('...');
} on LiquidAiException catch (e) {
  print('SDK error: ${e.message}');
} on StateError catch (e) {
  print('Invalid state: ${e.message}'); // e.g., disposed conversation
}

Common error scenarios:

  • Model not found - Invalid model slug or quantization
  • Download failed - Network issues during model download
  • Out of memory - Model too large for device
  • Context exceeded - Conversation history too long
  • Generation cancelled - User or timeout cancellation
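
For transient download failures, a simple retry around the load stream is often enough. A sketch using only the load events shown earlier:

ModelRunner? runner;

for (var attempt = 1; attempt <= 3 && runner == null; attempt++) {
  await for (final event in liquidAi.loadModel(model.slug, quantization.slug)) {
    switch (event) {
      case LoadCompleteEvent(runner: final r):
        runner = r;
      case LoadErrorEvent(:final error):
        // Typically a network failure; loop around and try again
        print('Attempt $attempt failed: $error');
      default:
        break;
    }
  }
}

if (runner == null) {
  // Surface a persistent failure to the user
}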

Conversation Management

System Prompts

Set context for the conversation:

final conversation = await runner.createConversation(
  systemPrompt: 'You are a helpful coding assistant. Respond concisely.',
);

Conversation History

Access and restore conversation state:

// Get current history
final history = await conversation.getHistory();

// Export conversation
final json = await conversation.export();

// Create from existing history
final restored = await runner.createConversationFromHistory(history);

Clear History

Reset the conversation while keeping it active:

// Clear history but keep the system prompt
await conversation.clearHistory();

// Clear everything including system prompt
await conversation.clearHistory(keepSystemPrompt: false);

Fork Conversations

Create independent copies for exploring different conversation branches:

// Create a checkpoint before trying something
final checkpoint = await conversation.fork();

// Try something in the original conversation
await conversation.generateText('Tell me about quantum physics');

// Use the checkpoint to explore a different path
await checkpoint.generateText('Tell me about biology');

// Both conversations now have different histories
// Don't forget to dispose the forked conversation when done
await checkpoint.dispose();

Token Counting

Monitor context usage (iOS only):

final tokens = await conversation.getTokenCount();
if (tokens > 4000) {
  print('Warning: Approaching context limit');
}

Resource Management

Basic Cleanup

Always dispose of resources when done:

// Dispose in reverse order of creation
await conversation.dispose();
await runner.dispose();

// Or use try/finally (declare outside the try so it's visible in finally)
final conversation = await runner.createConversation();
try {
  // Use conversation...
} finally {
  await conversation.dispose();
}

ModelManager for Single-Model Apps

For apps that load only one model at a time, use ModelManager to automatically manage model lifecycle:

final manager = ModelManager.instance;

// Load a model (automatically unloads any previous model)
final runner = await manager.loadModelAsync('LFM2.5-1.2B-Instruct', 'Q4_K_M');

// Check what's loaded
print('Loaded: ${manager.currentModelSlug}');
print('Has model: ${manager.hasLoadedModel}');

// Load a different model (previous one is automatically unloaded first)
final newRunner = await manager.loadModelAsync('LFM2-2.6B', 'Q4_K_M');

// Explicitly unload when done
await manager.unloadCurrentModel();

Hot-Reload Recovery

During Flutter hot-reload, Dart state is reset but native state persists. Use syncWithNative() to recover the loaded model state:

// In your app initialization
Future<void> initializeApp() async {
  final manager = ModelManager.instance;

  // Sync Dart state with native state
  final wasModelLoaded = await manager.syncWithNative();

  if (wasModelLoaded) {
    print('Recovered loaded model: ${manager.currentModelSlug}');
    // The runner is available at manager.currentRunner
  }
}

This is especially important for state management solutions like Provider:

class AppState extends ChangeNotifier {
  final _modelManager = ModelManager.instance;

  Future<void> initialize() async {
    // Recover model state after hot-reload
    await _modelManager.syncWithNative();

    if (_modelManager.hasLoadedModel) {
      // Update UI state to reflect loaded model
      notifyListeners();
    }
  }
}

Loading Models from Local Paths

Load models from a local file path (useful for bundled or custom models):

// Using ModelManager
final runner = await ModelManager.instance.loadModelFromPathAsync(
  '/path/to/model.gguf',
  options: LoadOptions(contextSize: 2048),
);

// Check if loaded from path
if (ModelManager.instance.isCurrentModelPathLoaded) {
  print('Model path: ${ModelManager.instance.currentPath}');
}

API Reference

For complete API documentation, see the API Reference.

Key classes:

  • LiquidAi - Entry point for loading, downloading, and deleting models
  • ModelRunner - Handle to a loaded model; creates conversations
  • ModelCatalog / ModelQuantization - Browse available models and quantization levels
  • ModelManager - Singleton lifecycle helper for single-model apps
  • ChatMessage - A message with a role and text or image content
  • JsonSchema - Schema builder for structured output and function parameters
  • LeapFunction - Tool definition for function calling
  • LoadOptions / GenerationOptions - Engine and sampling configuration
  • ModelManifest - Metadata for downloaded models

Examples

For a comprehensive example covering all features, see example/example.dart.

The example/ directory also contains a full Flutter demo app demonstrating:

  • Model selection and downloading
  • Chat interface with streaming
  • Structured output demos
  • Function calling examples
  • Settings and configuration

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting a pull request.

License

MIT License - see the LICENSE file for details.
