word_count 1.0.4 copy "word_count: ^1.0.4" to clipboard
word_count: ^1.0.4 copied to clipboard

Fastest multi-language word counting library (8.0 million words/s) for Dart that supports 85+ languages.

word_count #

Fastest multi-language word counting library (8.0 million words/s) for Dart that supports 85+ languages including CJK (Chinese, Japanese, Korean), European, South Asian, African, and Middle Eastern languages.

Features #

  • Multi-language support: Count words in 85+ languages with proper handling for different writing systems
  • CJK language support: Proper tokenization for Chinese, Japanese, and Korean text
  • Configurable punctuation: Choose to treat punctuation as word separators or remove them entirely
  • Custom punctuation: Add your own punctuation characters or disable defaults
  • Multiple output formats: Get word count, word array, or both
  • Comprehensive testing: 118+ tests covering all supported languages and edge cases

Performance #

Method Words/s GB/s Relative Speed
New Implementation (Bitmap) 8.0M 0.044 1.0x
Old Implementation (RegExp) 0.5M 0.003 0.06x
Regex 9.7M 0.053 1.21x
Split 20.9M 0.115 2.61x
StringStats 2.2M 0.012 0.27x

Installation #

Add this to your package's pubspec.yaml file:

dependencies:
  word_count: ^1.0.0

Then run:

dart pub get

Usage #

Basic Usage #

import 'package:word_count/word_count.dart';

// Count words (returns int)
int count = wordsCount('Hello World'); // 2
int chineseCount = wordsCount('你好,世界'); // 4
int mixedCount = wordsCount('Hello, 你好。'); // 3

// Get word array (returns List<String>)
List<String> words = wordsSplit('Hello World'); // ['Hello', 'World']

// Get both count and words (returns WordCountResult)
WordCountResult result = wordsDetect('Hello World');
print('Count: ${result.count}'); // Count: 2
print('Words: ${result.words}'); // Words: [Hello, World]

Configuration Options #

// Treat punctuation as word separators instead of removing them
const config1 = WordCountConfig(punctuationAsBreaker: true);
wordsCount("don't", config1); // 2 words: ['don', 't']

// Add custom punctuation
const config2 = WordCountConfig(punctuation: ['-']);
wordsCount('multi-word', config2); // 1 word: ['multiword']

// Use only custom punctuation (disable defaults)
const config3 = WordCountConfig(
  punctuation: ['-'], 
  disableDefaultPunctuation: true
);
wordsCount("multi-word text!", config3); // 2 words: ['multiword', 'text!']

Supported Languages #

The library supports proper word counting for:

  • East Asian: Chinese (Simplified/Traditional), Japanese (Hiragana, Katakana, Kanji), Korean
  • European: English, French, German, Spanish, Italian, Russian, Polish, Dutch, and 40+ more
  • South Asian: Hindi, Bengali, Urdu, Telugu, Tamil, Gujarati, Punjabi, and more
  • Southeast Asian: Thai, Vietnamese, Indonesian, Malay, Filipino
  • Middle Eastern: Arabic, Hebrew, Persian, Turkish, Kurdish
  • African: Swahili, Amharic, Yoruba, Hausa, Igbo, and more
  • Other: Armenian, Georgian, Estonian, Welsh, Irish, and many more

API Reference #

Functions #

  • wordsCount(String? text, [WordCountConfig config])int

    • Returns the word count for the given text
  • wordsSplit(String? text, [WordCountConfig config])List<String>

    • Returns an array of words from the given text
  • wordsDetect(String? text, [WordCountConfig config])WordCountResult

    • Returns both word count and word array in a structured result

Classes #

  • WordCountConfig - Configuration options for word counting behavior
  • WordCountResult - Result object containing both count and words array

License #

This project is licensed under the MIT License - see the LICENSE file for details.

0
likes
160
points
70
downloads

Publisher

verified publisherwaltertay.com

Weekly Downloads

Fastest multi-language word counting library (8.0 million words/s) for Dart that supports 85+ languages.

Repository (GitHub)
View/report issues

Documentation

API reference

License

MIT (license)

More

Packages that depend on word_count