romanize 0.0.3
A lightweight Dart package for converting Arabic, Chinese, Cyrillic, Hebrew, Japanese and Korean text to Romanized form with auto-detection support.
A powerful Dart package for seamlessly converting multilingual text into its Romanized form.
Features #
- Multi-language support: Korean, Japanese, Chinese, Cyrillic, Hebrew and Arabic
- Auto-detection: Automatically detects the languages present in the input text
- Flexible & extensible: Easily create your own custom romanizer for any language or writing system
- Lightweight: Minimal dependencies, fast performance
Installation #
Add romanize to your pubspec.yaml dependencies:
dart pub add romanize
dart pub get
Usage #
Import the package:
import 'package:romanize/romanize.dart';
Romanize Text #
The romanize method automatically detects and romanizes each word separately, making it perfect for multi-language text:
// Multi-language text - each word is detected and romanized independently
final text = '你好 Hello 안녕';
final romanized = TextRomanizer.romanize(text);
print(romanized); // ni hao Hello annyeong
// Single language text also works
final koreanText = '안녕하세요';
final koreanRomanized = TextRomanizer.romanize(koreanText);
print(koreanRomanized); // annyeonghaseyo
Note that romanize will fail to detect multiple languages if they are not separated by spaces.
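One plausible reason for this limitation is that detection runs once per whitespace-separated word. The sketch below illustrates the idea with a simplified, hypothetical detector based on Unicode ranges. The functions detectScript and detectPerWord are made up for this illustration and are not part of the package, whose actual detection logic may differ.

```dart
/// Hypothetical helper: returns a script name for the first recognized
/// character in [word], or null if none is recognized (e.g. Latin text).
String? detectScript(String word) {
  for (final rune in word.runes) {
    if (rune >= 0xAC00 && rune <= 0xD7A3) return 'korean'; // Hangul syllables
    if (rune >= 0x3040 && rune <= 0x30FF) return 'japanese'; // Hiragana, Katakana
    if (rune >= 0x4E00 && rune <= 0x9FFF) return 'chinese'; // CJK ideographs
    if (rune >= 0x0400 && rune <= 0x04FF) return 'cyrillic';
    if (rune >= 0x0590 && rune <= 0x05FF) return 'hebrew';
    if (rune >= 0x0600 && rune <= 0x06FF) return 'arabic';
  }
  return null;
}

/// Hypothetical helper: splits [text] on whitespace and labels each word.
Map<String, String?> detectPerWord(String text) {
  return {
    for (final word in text.trim().split(RegExp(r'\s+')))
      if (word.isNotEmpty) word: detectScript(word),
  };
}

void main() {
  // Words separated by spaces are labeled independently.
  print(detectPerWord('你好 Hello 안녕'));
  // {你好: chinese, Hello: null, 안녕: korean}

  // Without spaces there is only one word, so only one label is produced.
  print(detectPerWord('你好안녕'));
  // {你好안녕: chinese}
}
```

In this sketch the unspaced input is treated as a single Chinese word, so the Korean part would never reach a Korean romanizer. Inserting spaces between scripts before calling romanize avoids that.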
Detect Language #
Detect the first language present in the text:
final romanizer = TextRomanizer.detectLanguage('안녕하세요');
print(romanizer.language); // korean
Or detect all languages present in the text:
final romanizers = TextRomanizer.detectLanguages('안녕 Hello 你好 Привет мир');
print(romanizers.map((r) => r.language)); // (korean, chinese, cyrillic)
Specify Language #
When using TextRomanizer.romanize, the language detection may not always be accurate. In such cases, you can specify the language directly:
final japaneseText = 'こんにちは';
final japaneseRomanizer = TextRomanizer.forLanguage('japanese');
print(japaneseRomanizer.romanize(japaneseText)); // konnichiwa
Or you can instantiate the romanizer directly:
final chineseText = '你好';
final chineseRomanizer = ChineseRomanizer(toneAnnotation: ToneAnnotation.mark);
print(chineseRomanizer.romanize(chineseText)); // nǐ hǎo
Some romanizers accept additional options. For example, ChineseRomanizer takes a toneAnnotation option that controls how tones are written; ToneAnnotation.mark, used above, produces tone marks.
Load resources #
Pre-initialize the resources:
await TextRomanizer.ensureInitialized();
This initializes all the necessary resources, such as the Japanese and Chinese dictionaries. The operation is expensive, so run it on another isolate where possible. On the web platform, prefer server-side initialization.
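A minimal sketch of the isolate approach on native platforms, using Isolate.run from dart:isolate (available since Dart 2.19). The helper name romanizeInBackground is hypothetical. The sketch assumes the loaded resources live in per-isolate state and that the package's resource loading works inside a worker isolate, neither of which is confirmed here. Because Dart isolates do not share memory, the romanization itself is also performed in the worker.

```dart
import 'dart:isolate';

import 'package:romanize/romanize.dart';

/// Hypothetical helper: initializes the resources and romanizes [text]
/// inside a short-lived worker isolate, keeping the main isolate responsive.
Future<String> romanizeInBackground(String text) {
  return Isolate.run(() async {
    // Anything loaded here is only visible inside this isolate, so the
    // romanization has to happen here as well.
    await TextRomanizer.ensureInitialized();
    return TextRomanizer.romanize(text);
  });
}

Future<void> main() async {
  print(await romanizeInBackground('안녕하세요'));
}
```

Each call in this sketch pays the initialization cost again. For repeated work, a long-lived isolate that receives requests over ports would avoid reloading the dictionaries. dart:isolate is not available when compiling to JavaScript, which fits the advice to initialize server-side on the web.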
Supported Languages #
- Korean (한국어)
- Japanese (日本語) - Using kuromoji for Kanji conversion and kana_kit for Kana and Katakana conversion
- Chinese (中文) - Using pinyin for Pinyin conversion (Simplified and Traditional)
- Cyrillic (Кириллица) - Custom transliteration for Russian, Ukrainian, Serbian, and more
- Arabic (العربية) - Custom transliteration based on ISO 233 and DIN 31635
- Hebrew (עברית) - Custom transliteration based on ISO 259-2
API Reference #
TextRomanizer #
Main class for romanizing text.
Static Methods
- ensureInitialized() - Ensures that all resources are loaded and initialized.
- romanize(String input) - Processes each word separately, auto-detecting and romanizing each word. Perfect for multi-language text.
- detectLanguage(String input) - Detects the first matching language and returns the corresponding Romanizer. Returns EmptyRomanizer if no match is found.
- detectLanguages(String input) - Detects all matching languages and returns a Set<Romanizer>. Returns an empty set if no matches are found.
- forLanguage(String language) - Returns a Romanizer for the specified language. Throws UnimplementedError if not found.
- forLanguageOrNull(String? language) - Returns a Romanizer? for the specified language, or null if not found.
- supportedLanguages - Returns a list of all supported language names.
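As an illustration of how these methods can be combined, the sketch below prefers an explicit language hint and falls back to per-word auto-detection when the hint is missing or unknown. The helper romanizeWithHint is hypothetical; it uses only forLanguageOrNull, romanize and ensureInitialized as described above.

```dart
import 'package:romanize/romanize.dart';

/// Hypothetical helper: romanizes [input] with the romanizer for
/// [languageHint] when one exists, otherwise auto-detects per word.
String romanizeWithHint(String input, String? languageHint) {
  final romanizer = TextRomanizer.forLanguageOrNull(languageHint);
  return romanizer?.romanize(input) ?? TextRomanizer.romanize(input);
}

Future<void> main() async {
  await TextRomanizer.ensureInitialized();

  // Known hint: the Japanese romanizer is used directly.
  print(romanizeWithHint('こんにちは', 'japanese'));

  // Unknown hint: forLanguageOrNull returns null, so auto-detection is used.
  print(romanizeWithHint('안녕하세요', 'klingon'));
}
```

Using forLanguageOrNull here avoids the UnimplementedError that forLanguage throws for unsupported names.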
Romanizer #
Interface for language-specific romanizers.
- language - The language name (e.g., 'korean', 'japanese', 'arabic')
- isValid(String input) - Checks if the input is valid for this romanizer
- romanize(String input) - Converts the input to Romanized form
Example #
See the example directory for a complete example.
Contributing #
Contributions are welcome! Please feel free to submit a Pull Request.
Creating a Custom Romanizer #
To create a custom romanizer for a new language or writing system, you can extend the Romanizer class and implement the romanize and isValid methods.
class EmojiRomanizer extends Romanizer {
  const EmojiRomanizer() : super(language: 'emoji');

  static const Map<String, String> _transliterationMap = {
    '👋': 'wave',
    '🌍': 'earth',
    '🚀': 'rocket',
    '🎉': 'party',
  };

  // True when the input contains a UTF-16 surrogate pair, i.e. a character
  // outside the Basic Multilingual Plane, which covers most emoji.
  @override
  bool isValid(String input) {
    return RegExp(r'[\uD800-\uDBFF][\uDC00-\uDFFF]').hasMatch(input);
  }

  @override
  String romanize(String input) {
    final buffer = StringBuffer();
    // Iterate over runes so that each emoji is handled as one character.
    for (final char in input.runes) {
      final charString = String.fromCharCode(char);
      if (isValid(charString)) {
        if (_transliterationMap.containsKey(charString)) {
          buffer.write(':${_transliterationMap[charString]}:');
        } else {
          // Unknown emoji: keep it, wrapped in colons.
          buffer.write(':$charString:');
        }
      } else {
        buffer.write(charString);
      }
    }
    return buffer.toString();
  }
}
Then you can use your custom romanizer like this:
final emojiText = '👋 🌍 🚀 🎉 😀';
final emojiOutput = EmojiRomanizer().romanize(emojiText);
print('Emoji Romanization: \n$emojiOutput'); // :wave: :earth: :rocket: :party: :😀:
Benchmarking #
Add your custom romanizer to the benchmark suite in benchmark/romanize_benchmark.dart, then run the benchmarks with the following command:
dart run benchmark_harness:bench --flavor aot --target=benchmark/romanize_benchmark.dart
The results will be logged to the console.
KoreanRomanize(RunTime): 149.55134011433663 us.
JapaneseRomanize(RunTime): 3528.963286713287 us.
ChineseRomanize(RunTime): 6650.877133105802 us.
CyrillicRomanize(RunTime): 332.25094868833526 us.
ArabicRomanize(RunTime): 222.99420225220203 us.
HebrewRomanize(RunTime): 548.14425 us.
MultiLanguageRomanize(RunTime): 2852.415 us.
LanguageDetection(RunTime): 10.955939698271358 us.
DirectRomanizer(RunTime): 10337.76 us.
LongTextRomanize(RunTime): 24233.97619047619 us.
StressTestRomanize(RunTime): 18138.834782608697 us.
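The result lines above follow the name(RunTime) format printed by BenchmarkBase from benchmark_harness. A benchmark for an additional romanizer could therefore look roughly like the sketch below. The class RomanizerBenchmark and the name CustomRomanize are hypothetical, and the actual layout of benchmark/romanize_benchmark.dart may differ. The Korean romanizer stands in for a custom one.

```dart
import 'package:benchmark_harness/benchmark_harness.dart';
import 'package:romanize/romanize.dart';

/// Hypothetical benchmark: measures one romanizer on a fixed sample string.
class RomanizerBenchmark extends BenchmarkBase {
  RomanizerBenchmark(String name, this.romanizer, this.sample) : super(name);

  final Romanizer romanizer;
  final String sample;

  // The code under measurement; the harness calls this repeatedly.
  @override
  void run() {
    romanizer.romanize(sample);
  }
}

Future<void> main() async {
  // Load dictionaries up front so their cost is not part of the measurement.
  await TextRomanizer.ensureInitialized();

  // Replace the romanizer and sample with your own custom romanizer and text.
  RomanizerBenchmark(
    'CustomRomanize',
    TextRomanizer.forLanguage('korean'),
    '안녕하세요',
  ).report();
}
```

Calling report() prints a single line in the same format as the results listed above.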