dart_wordpiece 1.1.0
Pure Dart BERT-compatible WordPiece tokenizer. Outputs input_ids, attention_mask, and token_type_ids. No Flutter or native dependencies.
1.1.0
1.0.0
Initial release.
Added
- WordPieceTokenizer: BERT-compatible WordPiece tokenizer in pure Dart.
  - encode(text): single-sequence encoding with padding and truncation.
  - encodePair(textA, textB): sentence-pair encoding with token_type_ids.
  - encodeAll(texts): batch encoding.
  - tokenize(text): returns raw token strings without padding.
- TokenizerOutput: typed result with inputIds, attentionMask, tokenTypeIds, and Int64List getters for ONNX tensor creation.
- TokenizerConfig: configurable maxLength, stopwords, normalizeText, and SpecialTokens.
- SpecialTokens: default SpecialTokens.bert() and custom constructor.
- TextNormalizer: standalone lowercase / punctuation / stopword normalizer.
- VocabLoader: vocabulary loading from dart:io File, String, or Map.
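
To make the three outputs concrete, here is a minimal, self-contained sketch of single-sequence encoding in the BERT style: greedy longest-match-first WordPiece splitting, followed by [CLS]/[SEP] wrapping, truncation, and padding. It does not use the dart_wordpiece API and is not the package's implementation. The names toyVocab, wordPiece, and encodeSketch, the toy vocabulary and its ids, and the whitespace-only pre-tokenization are all assumptions made for illustration.

```dart
// Illustrative sketch only: NOT the dart_wordpiece implementation.
// toyVocab, wordPiece and encodeSketch are hypothetical names.

/// Tiny made-up vocabulary; real BERT vocabularies are far larger.
const Map<String, int> toyVocab = {
  '[PAD]': 0,
  '[UNK]': 1,
  '[CLS]': 2,
  '[SEP]': 3,
  'token': 4,
  '##izer': 5,
  'hello': 6,
};

/// Greedy longest-match-first WordPiece split of a single word.
/// Continuation pieces carry the '##' prefix; an unmatchable word
/// collapses to a single '[UNK]'.
List<String> wordPiece(String word, Map<String, int> vocab) {
  final pieces = <String>[];
  var start = 0;
  while (start < word.length) {
    var end = word.length;
    String? match;
    // Shrink the candidate from the right until it is in the vocabulary.
    while (end > start) {
      final sub = (start > 0 ? '##' : '') + word.substring(start, end);
      if (vocab.containsKey(sub)) {
        match = sub;
        break;
      }
      end--;
    }
    if (match == null) return ['[UNK]'];
    pieces.add(match);
    start = end;
  }
  return pieces;
}

/// Builds input_ids, attention_mask and token_type_ids for one sequence,
/// truncated and padded to exactly [maxLength] (assumed to be >= 2).
Map<String, List<int>> encodeSketch(String text, {int maxLength = 8}) {
  // Assumption: lowercase + whitespace split stands in for normalization.
  final words =
      text.toLowerCase().split(RegExp(r'\s+')).where((w) => w.isNotEmpty);
  final tokens = <String>[for (final w in words) ...wordPiece(w, toyVocab)];

  // Reserve two slots for [CLS] and [SEP], then truncate.
  final kept = tokens.take(maxLength - 2).toList();
  final ids = [
    toyVocab['[CLS]']!,
    for (final t in kept) toyVocab[t]!,
    toyVocab['[SEP]']!,
  ];
  final padCount = maxLength - ids.length;

  return {
    'input_ids': [...ids, ...List.filled(padCount, toyVocab['[PAD]']!)],
    // 1 for real tokens, 0 for padding.
    'attention_mask': [
      ...List.filled(ids.length, 1),
      ...List.filled(padCount, 0),
    ],
    // A single sequence uses segment 0 throughout.
    'token_type_ids': List.filled(maxLength, 0),
  };
}

void main() {
  print(wordPiece('tokenizer', toyVocab));
  print(encodeSketch('Hello tokenizer'));
}
```

In a sentence-pair encoding, the second segment's tokens would typically receive token_type_ids of 1 instead of 0. For ONNX input, the resulting integer lists could be wrapped with Int64List.fromList from dart:typed_data.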