llm_ollama 0.1.9
llm_ollama: ^0.1.9 copied to clipboard
Ollama backend implementation for LLM interactions. Provides streaming chat, embeddings, tool calling, vision support, and model management.
llm_ollama #
Ollama backend implementation for LLM interactions in Dart.
Available on pub.dev.
Features #
- Streaming chat responses
- Tool/function calling
- Vision (image) support
- Embeddings
- Thinking mode support
- Model management (list, pull, show, version)
Streaming Reliability Guarantees #
- NDJSON chunk-boundary-safe parsing: stream frames are buffered across transport chunks and parsed only when newline-terminated
- Bounded malformed-line retries: parser tolerates up to 3 consecutive malformed non-empty lines
- Explicit failure on persistent corruption: parser throws
LLMApiExceptionafter retry budget exhaustion (no silent infinite dropping)
Installation #
dependencies:
llm_ollama: ^0.1.5
Prerequisites #
You need Ollama running locally or on a server. Install from ollama.com.
# Pull a model
ollama pull qwen3:0.6b
Usage #
Basic Chat #
import 'package:llm_ollama/llm_ollama.dart';
final repo = OllamaChatRepository(baseUrl: 'http://localhost:11434');
final stream = repo.streamChat('qwen3:0.6b', messages: [
LLMMessage(role: LLMRole.user, content: 'Hello!'),
]);
await for (final chunk in stream) {
print(chunk.message?.content ?? '');
}
With Thinking Mode #
final stream = repo.streamChat('qwen3:0.6b',
messages: messages,
think: true, // Enable thinking mode
);
await for (final chunk in stream) {
if (chunk.message?.thinking != null) {
print('Thinking: ${chunk.message!.thinking}');
}
print(chunk.message?.content ?? '');
}
Tool Calling #
final stream = repo.streamChat('qwen3:0.6b',
messages: messages,
tools: [MyTool()],
);
Vision #
import 'dart:convert';
import 'dart:io';
final imageBytes = await File('image.png').readAsBytes();
final base64Image = base64Encode(imageBytes);
final stream = repo.streamChat('llama3.2-vision:11b', messages: [
LLMMessage(
role: LLMRole.user,
content: 'What is in this image?',
images: [base64Image],
),
]);
Embeddings #
final embeddings = await repo.embed(
model: 'nomic-embed-text',
messages: ['Hello world', 'Goodbye world'],
);
Non-Streaming Response #
Get a complete response without streaming:
final response = await repo.chatResponse('qwen3:0.6b', messages: [
LLMMessage(role: LLMRole.user, content: 'Hello!'),
]);
print(response.content);
print('Tokens: ${response.evalCount}');
Using StreamChatOptions #
Encapsulate all options in a single object:
import 'package:llm_core/llm_core.dart';
final options = StreamChatOptions(
think: true,
tools: [MyTool()],
toolAttempts: 5,
timeout: Duration(minutes: 5),
retryConfig: RetryConfig(maxAttempts: 3),
);
final stream = repo.streamChat('qwen3:0.6b', messages: messages, options: options);
Python-Style Parity Options #
llm_ollama defaults to automatic tool execution for backward compatibility.
If you want ollama-python style manual tool loops, disable auto-execution and
consume toolCalls directly from streamed chunks.
final stream = repo.streamChat(
'qwen3:0.6b',
messages: messages,
tools: [MyTool()],
options: const StreamChatOptions(
autoExecuteTools: false, // Manual tool loop
),
);
await for (final chunk in stream) {
final toolCalls = chunk.message?.toolCalls ?? const [];
if (toolCalls.isNotEmpty) {
// Execute tools yourself, append tool messages, then call streamChat again.
}
}
You can also pass Ollama chat fields supported by ollama-python:
final stream = repo.streamChat(
'qwen3:0.6b',
messages: messages,
options: const StreamChatOptions(
backendOptions: {
'format': 'json',
'options': {'temperature': 0},
'keep_alive': '5m', // or keepAlive
},
),
);
Model Management #
final ollamaRepo = OllamaRepository(baseUrl: 'http://localhost:11434');
// List models
final models = await ollamaRepo.models();
// Show model info
final info = await ollamaRepo.showModel('qwen3:0.6b');
// Pull a model
await for (final progress in ollamaRepo.pullModel('qwen3:0.6b')) {
print('${progress.status}: ${progress.progress * 100}%');
}
// Get version
final version = await ollamaRepo.version();
Advanced Configuration #
Builder Pattern #
Use the builder for complex configurations:
import 'package:llm_core/llm_core.dart';
final repo = OllamaChatRepository.builder()
.baseUrl('http://localhost:11434')
.maxToolAttempts(10)
.retryConfig(RetryConfig(
maxAttempts: 5,
initialDelay: Duration(seconds: 1),
maxDelay: Duration(seconds: 30),
))
.timeoutConfig(TimeoutConfig(
connectionTimeout: Duration(seconds: 10),
readTimeout: Duration(minutes: 3),
totalTimeout: Duration(minutes: 10),
))
.build();
Retry Configuration #
Configure automatic retries for failed requests:
import 'package:llm_core/llm_core.dart';
final repo = OllamaChatRepository(
baseUrl: 'http://localhost:11434',
retryConfig: RetryConfig(
maxAttempts: 3,
initialDelay: Duration(seconds: 1),
maxDelay: Duration(seconds: 30),
retryableStatusCodes: [429, 500, 502, 503, 504],
),
);
Timeout Configuration #
Configure timeouts for different scenarios:
import 'package:llm_core/llm_core.dart';
final repo = OllamaChatRepository(
baseUrl: 'http://localhost:11434',
timeoutConfig: TimeoutConfig(
connectionTimeout: Duration(seconds: 10),
readTimeout: Duration(minutes: 2),
totalTimeout: Duration(minutes: 10),
readTimeoutForLargePayloads: Duration(minutes: 5), // For large images
),
);