text_sight 0.1.1
Live, on-device text recognition for Flutter — Apple Vision on iOS, ML Kit on Android. Like
mobile_scanner, but for text instead of barcodes.
- Why text_sight?
- A quick taste
- Platform support
- Install
- The recognition model
- Performance
- Going deeper
## Why text_sight?
Most cross-platform OCR plugins run Google ML Kit on both platforms. That quietly pulls
GoogleMLKit into your iOS build — and with it the arm64 and Swift Package Manager warnings
that have been nagging Flutter iOS builds for a while.
text_sight takes the other road. On iOS it uses Apple Vision, a system framework, so your app
links zero third-party ML libraries there — no GoogleMLKit, no warnings. Android keeps ML Kit,
declared only in its own Gradle file. Nothing recognition-related ever reaches your pubspec.yaml,
so the two platforms can't bleed into each other. Clean, native text scanning on both. That's the
whole idea.
## A quick taste
Point the camera at some text:
```dart
final controller = TextSightController();

TextSightView(
  controller: controller,
  onResult: (capture) => capture.lines.forEach((line) => print(line.text)),
  overlayBuilder: (context, capture, constraints) => /* paint line.boundingBox */,
);

await controller.requestCameraPermission(); // prompts via the OS; no permission package needed
await controller.start();
```
Or read a single still — no camera, no permission:
```dart
final capture = await TextSight.recognizeImage(bytes); // or .recognizePath('/photo.jpg')
```
Either way, boxes come back normalized [0, 1] from the top-left, identical on both platforms, so
your overlay never has to know which engine drew them.
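To paint a box you first have to scale it to the size of the widget showing the preview. Here is a minimal sketch of that step in plain Dart, assuming a box described by left, top, width, and height in [0, 1]. The `toPixels` helper is hypothetical and not part of text_sight; in a Flutter app you would more likely work with `Rect` and `Size` directly.

```dart
/// Hypothetical helper (not part of text_sight): scales a normalized box,
/// measured from the top-left with every value in [0, 1], to pixel
/// coordinates for a view of the given size.
({double left, double top, double width, double height}) toPixels({
  required double left,
  required double top,
  required double width,
  required double height,
  required double viewWidth,
  required double viewHeight,
}) {
  return (
    // Horizontal values scale with the view width...
    left: left * viewWidth,
    width: width * viewWidth,
    // ...and vertical values with the view height.
    top: top * viewHeight,
    height: height * viewHeight,
  );
}
```

For a 400 × 800 view, a box at left 0.1, top 0.4, width 0.8, height 0.2 lands at left 40, top 320, width 320, height 160 in pixels.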
Want a scan-box? Hand the controller a region of interest:
`TextSightController(options: TextSightOptions(roi: Rect.fromLTWH(0.1, 0.4, 0.8, 0.2)))`. You can
also change the region, the recognition level, or the torch while the session runs. The region of
interest applies to the live preview and the one-shot alike.
One Android thing worth knowing up front: the model downloads on first use, so give it a head start when the user opens your scanner — otherwise that first scan comes back empty.
The example/ app is where to look next — a live overlay, torch, region-of-interest,
permission handling, and the one-shot screen, all wired up and ready to crib from.
[Screenshots: Android (ML Kit) and iOS (Apple Vision)]
## Platform support
| Platform | Minimum | Engine |
|---|---|---|
| iOS | 13.0 | Apple Vision — RecognizeTextRequest (18+) / VNRecognizeTextRequest (13–17) |
| Android | API 24 | ML Kit Text Recognition v2 (Latin) |
A few things worth knowing before you start. iOS 13.0 and later is supported: recognition uses
Apple Vision's modern Swift RecognizeTextRequest on iOS 18+ and falls back to the legacy
VNRecognizeTextRequest on iOS 13–17 (the same engine, chosen automatically). Android recognizes
Latin script only for now. Live scanning needs a real device, since the iOS Simulator has no
camera; one-shot recognition runs anywhere.
⚠️ iOS 13–16: the live preview and recognition don't follow device rotation. These versions predate
AVCaptureDevice.RotationCoordinator, so live capture isn't rotated to match how the device is held (iOS 17+ is unaffected, and one-shot recognition is fine on every version — it reads the image's own orientation). It's a deliberate, low-maintenance trade-off for a device population we don't expect in practice; if it affects you, please open an issue and a proper rotation fallback will follow.
## Install
```shell
flutter pub add text_sight
```
On iOS, add a camera-usage string to ios/Runner/Info.plist — this is required: iOS terminates
the app the moment the camera is requested without it.
```xml
<key>NSCameraUsageDescription</key>
<string>Used to recognize text from the camera.</string>
```
Then let text_sight drive the runtime prompt: call `controller.requestCameraPermission()` (or
`controller.checkCameraPermission()` to gate a priming screen) before `controller.start()`. It goes
straight to the platform APIs (AVFoundation on iOS, the Android permission flow on Android), so no
permission package is required. Already using permission_handler or similar? That still works.
Android's manifest already has what it needs.
## The recognition model
On iOS there's nothing to see here — recognition is Apple Vision, a system framework that's always on hand. No download, no waiting.
Android is the interesting one. The ML Kit model ships unbundled by default: it's a tiny ~260 KB and gets pulled from Google Play Services the first time you actually use it. We don't grab it at install time on purpose — most apps don't need OCR the second they launch, so there's no point making everyone pay for it up front. The one catch: a scan you kick off before the model has landed comes back empty.
So give it a nudge when the user wanders into your scanner:
```dart
final state = await TextSightModel.ensureReady();
if (state is ModelUnavailable) {
  // No Play Services, or the download didn't make it. Tell the user, maybe offer a retry.
}
```
Call it as often as you like — it returns right away once the model's around (which is always, on iOS). Want a progress bar in front of the user while it downloads? Listen to the readiness stream and switch over it. It's a sealed type, so the compiler makes sure you've handled every case:
```dart
TextSightModel.readiness.listen((state) {
  final label = switch (state) {
    ModelReady() => 'Ready to scan',
    ModelDownloading(:final progress) => 'Downloading… ${((progress ?? 0) * 100).round()}%',
    ModelUnavailable(:final reason) => 'Model unavailable ($reason)',
  };
  // ...show `label`, or feed `progress` straight into a progress indicator
});
```
The example/ live scanner does exactly this — ensureReady() to gate, the stream for a
real download bar.
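If `ensureReady()` reports `ModelUnavailable` and you want to offer the retry mentioned above, one option is a small bounded retry loop. This is only a sketch: `retryUntilReady`, its parameters, and the default attempt count and delay are hypothetical, not part of text_sight, and you should tune them to your app.

```dart
/// Hypothetical helper (not part of text_sight): runs [attempt] up to
/// [maxAttempts] times, pausing [delay] between tries, and reports whether
/// any attempt succeeded.
Future<bool> retryUntilReady(
  Future<bool> Function() attempt, {
  int maxAttempts = 3,
  Duration delay = const Duration(seconds: 2),
}) async {
  for (var i = 1; i <= maxAttempts; i++) {
    if (await attempt()) return true;
    // Skip the pause after the final failed attempt.
    if (i < maxAttempts) await Future<void>.delayed(delay);
  }
  return false;
}
```

With text_sight, the attempt could be `() async => (await TextSightModel.ensureReady()) is! ModelUnavailable`, which treats any state other than `ModelUnavailable` as success.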
### Or just bundle it
Don't fancy any of that? Ship the model inside your APK — instant, offline, Play Services out of the
picture. One line in your app's android/gradle.properties:
```properties
com.lahaluhem.text_sight.useBundled=true
```
Now ensureReady() returns immediately and ModelUnavailable never shows up. You're trading size
for it, mind:
| Mode | App size | First use | Offline | Needs Play Services |
|---|---|---|---|---|
| Unbundled (default) | ~260 KB | downloads on demand | after first download | yes |
| Bundled | ~4 MB/script/arch | instant | yes | no |
## Performance
Recognition results cross from native to Dart as a small per-frame map over an EventChannel.
Decoding it on the UI isolate costs microseconds — even a dense ~127-line frame is ~55 µs, well
under 1% of a 60 fps budget. The native engine's inference, not the transport, sets the pace.

These figures measure the pure-Dart codec only, not native inference or end-to-end latency, which
dominate. Leaner transports (list, Pigeon, packed-binary) win big in percentage terms but save only
a few microseconds in absolute terms, so the self-describing Map stays. Full methodology and
numbers: benchmark/.
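The "well under 1%" figure follows from simple arithmetic: one frame at 60 fps lasts about 16,667 µs, so 55 µs is roughly a third of a percent of it. A small sketch of that calculation, where `frameBudgetPercent` is an illustrative name and not part of the package or its benchmarks:

```dart
/// Illustrative helper: the share of a single frame's time budget that a
/// task of [taskMicros] microseconds consumes at [fps], as a percentage.
double frameBudgetPercent({required double taskMicros, required double fps}) {
  // One second is 1e6 µs, so a frame at 60 fps lasts about 16,667 µs.
  final frameMicros = 1e6 / fps;
  return taskMicros / frameMicros * 100;
}
```

Calling `frameBudgetPercent(taskMicros: 55, fps: 60)` gives about 0.33.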
## Going deeper
How it all fits together — coordinate handling, the per-line confidence contract, how region-of-interest differs across platforms, and what's next — lives in APPENDIX.md.


