text_analysis 0.15.0
Text analyzer that tokenizes text, computes readability scores for a document, and evaluates the similarity of terms.
0.15.0 #
BREAKING CHANGES
Breaking Changes:
- Added field `Map<String, String> get abbreviations` to `TextAnalyzer` class.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
New:
- Implemented field `English.abbreviations` in `English` class.
0.14.0 #
BREAKING CHANGES
Breaking Changes:
- Removed library `package_exports`.
- The `Porter2Stemmer` class from the `porter_2_stemmer` package is exported by the `text_indexer` library.
- The `Porter2StemmerExtension` String extension is exported by the `extensions` library.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.13.0 #
BREAKING CHANGES
Breaking Changes:
- Added field `TextAnalyzer.stemmer` to `TextAnalyzer` class.
- Added field `TextAnalyzer.stopWords` to `TextAnalyzer` class.
- Added field `TextAnalyzer.lemmatizer` to `TextAnalyzer` class.
- Added field `TextAnalyzer.termExceptions` to `TextAnalyzer` class.
- Removed static field `TextTokenizer.defaultTokenFilter`.
- Changed `TextTokenizer.tokenize` method to apply `analyzer.stemmer`, `analyzer.stopWords`, `analyzer.lemmatizer` and `analyzer.termExceptions` to all tokens/terms.
New:
- Implemented field `English.stemmer` in `English` class.
- Implemented field `English.stopWords` in `English` class.
- Implemented field `English.lemmatizer` in `English` class.
- Implemented field `English.termExceptions` in `English` class.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.13.0-1 #
BREAKING CHANGES
Breaking Changes:
- Added field `TextAnalyzer.stemmer` to `TextAnalyzer` class.
- Added field `TextAnalyzer.stopWords` to `TextAnalyzer` class.
- Added field `TextAnalyzer.lemmatizer` to `TextAnalyzer` class.
- Added field `TextAnalyzer.termExceptions` to `TextAnalyzer` class.
- Removed static field `TextTokenizer.defaultTokenFilter`.
- Changed `TextTokenizer.tokenize` method to apply `analyzer.stemmer`, `analyzer.stopWords`, `analyzer.lemmatizer` and `analyzer.termExceptions` to all tokens/terms.
New:
- Implemented field `English.stemmer` in `English` class.
- Implemented field `English.stopWords` in `English` class.
- Implemented field `English.lemmatizer` in `English` class.
- Implemented field `English.termExceptions` in `English` class.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.1+1 #
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.1 #
New:
- Added extension on String `editDistanceMap`.
- Added method `TermSimilarity.editSimilarityMap`.
- Added method `TermSimilarity.editDistanceMap`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.0 #
BREAKING CHANGES
Breaking Changes:
- String extensions `extension TermSimilarityExtensions on String` removed from `text_analysis` library. Import the `extensions` library instead.
- Type definitions removed from `text_analysis` library. Import the `type_definitions` library instead.
- Package export `porter_2_stemmer` removed from `text_analysis` library. Import the `package_exports` library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New:
- Added `int term.editDistance(String other)` extension on String.
- Added `double term.editDistanceSimilarity(String other)` extension on String.
- Added class `TermSimilarity` that exposes static methods for comparing terms.
Bug fixes:
- Fixed an issue where the tokenizer did not increment term positions.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
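The edit-distance additions in this release can be pictured with a small, self-contained sketch. It uses the classic Levenshtein distance and normalizes it by the longer term's length. Both choices are assumptions made for illustration: the changelog does not say which edit-distance variant or which normalization the package uses, and the function names `levenshtein` and `editSimilaritySketch` are hypothetical, not part of the package API.

```dart
import 'dart:math' as math;

/// Hypothetical helper: classic Levenshtein distance between [a] and [b],
/// counted in UTF-16 code units. Not the package's implementation.
int levenshtein(String a, String b) {
  // previous[j] is the distance between the first i-1 units of a
  // and the first j units of b.
  var previous = List<int>.generate(b.length + 1, (j) => j);
  for (var i = 1; i <= a.length; i++) {
    final current = List<int>.filled(b.length + 1, 0);
    current[0] = i;
    for (var j = 1; j <= b.length; j++) {
      final cost = a.codeUnitAt(i - 1) == b.codeUnitAt(j - 1) ? 0 : 1;
      current[j] = math.min(
        math.min(current[j - 1] + 1, previous[j] + 1), // insert, delete
        previous[j - 1] + cost, // substitute or match
      );
    }
    previous = current;
  }
  return previous[b.length];
}

/// Hypothetical normalization (an assumption): 1.0 for identical terms,
/// 0.0 when every position differs.
double editSimilaritySketch(String a, String b) {
  final longest = math.max(a.length, b.length);
  if (longest == 0) return 1.0;
  return 1 - levenshtein(a, b) / longest;
}

void main() {
  print(levenshtein('kitten', 'sitting')); // 3
  print(editSimilaritySketch('token', 'tokens'));
}
```

Keeping only two rows of the distance table keeps memory proportional to the length of the second term, which is usually enough when comparing short terms.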
0.12.0-2 #
BREAKING CHANGES
Breaking Changes:
- String extensions `extension TermSimilarityExtensions on String` removed from `text_analysis` library. Import the `extensions` library instead.
- Type definitions removed from `text_analysis` library. Import the `type_definitions` library instead.
- Package export `porter_2_stemmer` removed from `text_analysis` library. Import the `package_exports` library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New:
- Added `int term.editDistance(String other)` extension on String.
- Added `double term.editDistanceSimilarity(String other)` extension on String.
- Added class `TermSimilarity` that exposes static methods for comparing terms.
Bug fixes:
- Fixed an issue where the tokenizer did not increment term positions.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.12.0-1 #
BREAKING CHANGES
Breaking Changes:
- String extensions `extension TermSimilarityExtensions on String` removed from `text_analysis` library. Import the `extensions` library instead.
- Type definitions removed from `text_analysis` library. Import the `type_definitions` library instead.
- Package export `porter_2_stemmer` removed from `text_analysis` library. Import the `package_exports` library instead.
- Changed definition/computation of [lengthDistance].
- Changed definition/computation of [lengthSimilarity].
- Changed definition/computation of [termSimilarity].
New:
- Added `int term.editDistance(String other)` extension on String.
- Added `double term.editDistanceSimilarity(String other)` extension on String.
- Added class `TermSimilarity` that exposes static methods for comparing terms.
Bug fixes:
- Fixed an issue where the tokenizer did not increment term positions.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.11.2 #
New:
- Added extension on String `List<Term> matches(Iterable<Term> terms, {int k = 2, int limit = 10})`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.11.1 #
New:
- Added extension on String `double termSimilarity(Term other, [int k = 2])`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.11.0 #
BREAKING CHANGES
This version introduces numerous breaking changes, including the renaming of the library's primary interfaces.
Breaking Changes:
- Renamed `TextAnalyzer` interface to `TextTokenizer`.
- Renamed `TextAnalyzerConfiguration` interface to `TextAnalyzer`.
- Added `SentenceSplitter get sentenceSplitter` to `TextAnalyzer` interface.
- Added `ParagraphSplitter get paragraphSplitter` to `TextAnalyzer` interface.
- Added `SyllableCounter get syllableCounter` to `TextAnalyzer` interface.
- Added `List<String> paragraphs(SourceText source)` to `ITextTokenizer` interface.
- Moved class `TextTokenizer` to a private implementation class `_TextTokenizerImpl` and renamed `ITextTokenizer` interface to `TextTokenizer`.
New:
- Added mixin class `TextTokenizerMixin`.
- Added object model `TextDocument`.
- Added typedef `SyllableCounter`.
- Added unnamed factory constructor to `TextTokenizer` that initializes a `_TextTokenizerImpl`.
- Added `SentenceSplitter get sentenceSplitter` to `English` class.
- Added `ParagraphSplitter get paragraphSplitter` to `English` class.
- Added `SyllableCounter get syllableCounter` to `English` class.
- Added `TextDocument` interface.
- Added `TextDocumentMixin` mixin class.
- Added `TextDocument` unnamed factory with private implementation class.
- Added `TextDocument.analyze` factory constructor.
- Added `TextDocument.analyzeJson` factory constructor.
- Added extension on String `double lengthDistance(Term other)`.
- Added extension on String `double lengthSimilarity(Term other)`.
- Added extension on String `Map<Term, double> lengthSimilarityMap(Iterable<Term> terms)`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
- Re-organized code repository.
0.10.0 #
BREAKING CHANGES
This version introduces numerous breaking changes, including the renaming of the library's primary interfaces.
Breaking Changes:
- Renamed `TextAnalyzer` interface to `TextTokenizer`.
- Renamed `TextAnalyzerConfiguration` interface to `TextAnalyzer`.
- Added `SentenceSplitter get sentenceSplitter` to `TextAnalyzer` interface.
- Added `ParagraphSplitter get paragraphSplitter` to `TextAnalyzer` interface.
- Added `SyllableCounter get syllableCounter` to `TextAnalyzer` interface.
- Added `List<String> paragraphs(SourceText source)` to `ITextTokenizer` interface.
- Moved class `TextTokenizer` to a private implementation class `_TextTokenizerImpl` and renamed `ITextTokenizer` interface to `TextTokenizer`.
New:
- Added mixin class `TextTokenizerMixin`.
- Added object model `TextDocument`.
- Added typedef `SyllableCounter`.
- Added unnamed factory constructor to `TextTokenizer` that initializes a `_TextTokenizerImpl`.
- Added `SentenceSplitter get sentenceSplitter` to `English` class.
- Added `ParagraphSplitter get paragraphSplitter` to `English` class.
- Added `SyllableCounter get syllableCounter` to `English` class.
- Added `TextDocument` interface.
- Added `TextDocumentMixin` mixin class.
- Added `TextDocument` unnamed factory with private implementation class.
- Added `TextDocument.analyze` factory constructor.
- Added `TextDocument.analyzeJson` factory constructor.
- Added extension on String `double lengthDistance(Term other)`.
- Added extension on String `double lengthSimilarity(Term other)`.
- Added extension on String `Map<Term, double> lengthSimilarityMap(Iterable<Term> terms)`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
- Re-organized code repository.
0.9.1 #
New:
- Added extension on String `double jaccardSimilarity(Term other, [int k = 2])`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
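For readers unfamiliar with the measure behind `jaccardSimilarity`, the following sketch computes a Jaccard index over the character k-gram sets of two terms: the size of the intersection divided by the size of the union. It is only an illustration. The helper names `kGramsOf` and `jaccardSketch` are hypothetical, the k-grams are unpadded, and returning 0.0 when neither term yields a k-gram is an assumption; the package may handle these details differently.

```dart
/// Hypothetical helper: unpadded character k-grams of [term].
Set<String> kGramsOf(String term, int k) => {
      for (var i = 0; i + k <= term.length; i++) term.substring(i, i + k),
    };

/// Hypothetical sketch of a Jaccard index over k-gram sets:
/// |A ∩ B| / |A ∪ B|. Not the package's implementation.
double jaccardSketch(String a, String b, [int k = 2]) {
  final gramsA = kGramsOf(a, k);
  final gramsB = kGramsOf(b, k);
  final unionSize = gramsA.union(gramsB).length;
  // Assumption: two terms with no k-grams at all are treated as dissimilar.
  if (unionSize == 0) return 0.0;
  return gramsA.intersection(gramsB).length / unionSize;
}

void main() {
  // 'token'  -> {to, ok, ke, en}
  // 'tokens' -> {to, ok, ke, en, ns}
  print(jaccardSketch('token', 'tokens')); // 0.8
}
```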
0.9.0 #
BREAKING CHANGES
Breaking Changes:
- Removed class `TextSource`.
- Removed class `Sentence`.
- Removed class `TermPair`.
- Removed `TextAnalyzer.sentenceSplitter` from `TextAnalyzer` interface.
- Changed `TextTokenizer.tokenize` return value to `List<Token>`.
- Changed `TextTokenizer.tokenizeJson` return value to `List<Token>`.
0.8.1 #
PRE-RELEASE, BUG FIX
Bug Fixes:
- Fixed `TextTokenizerBase.tokenizeJson` not tokenizing documents when the `Iterable<Zone> zones` parameter is empty.
Non-breaking Changes:
- Changed the required non-nullable parameter `Iterable<Zone> zones` of `TextTokenizerBase.tokenizeJson` to the optional nullable `[Iterable<Zone>? zones]`.
0.8.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking Changes:
- Added type definitions for `kGram` and `Trigram`.
- New extension method `Set<kGram> Term.kGrams([int k = 2])`.
- New extension method `Set<kGram> Iterable<Token>.kGrams([int k = 2])`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
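As a rough picture of what the `kGrams` extension methods in this release produce, the sketch below collects the overlapping character k-grams of a term into a set. It is an assumption-laden illustration rather than the package's code: `kGramsOf` is a hypothetical name, no padding characters are added, and a term shorter than `k` is returned as its own single k-gram.

```dart
/// Hypothetical helper (not the package's API): the set of character
/// k-grams of [term], measured in UTF-16 code units, without padding.
Set<String> kGramsOf(String term, [int k = 2]) {
  if (k < 1) {
    throw ArgumentError.value(k, 'k', 'must be at least 1');
  }
  // Assumption: a non-empty term no longer than k is its own only k-gram.
  if (term.length <= k) {
    return term.isEmpty ? <String>{} : {term};
  }
  return {
    for (var i = 0; i + k <= term.length; i++) term.substring(i, i + k),
  };
}

void main() {
  print(kGramsOf('token')); // {to, ok, ke, en}
  print(kGramsOf('token', 3)); // {tok, oke, ken}
}
```

Returning a set rather than a list discards repeated k-grams, which matches the `Set<kGram>` return type listed above.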
0.7.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking Changes:
- Renamed `FieldName` type alias to `Zone`.
- Renamed parameter `FieldName? field` to `Zone? zone` wherever it is used.
New:
- Type alias `IdFt`.
- Type alias `Ft`.
- Type alias `ZoneWeightMap`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.6.5+1 #
PRE-RELEASE
Minor bug fixes, updated dependencies, tests, examples and documentation.
0.6.5 #
PRE-RELEASE
New:
- Added custom implementation of `TermPair.toString()`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.6.4 #
PRE-RELEASE
New:
- Added `==` operator and `hashCode` getter to `TermPair`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.6.3 #
PRE-RELEASE
New:
- Added object model `TermPair`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.6.2 #
PRE-RELEASE
New:
- Added extension getter `List<String> get allTerms` on `Iterable<Token>`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.6.1 #
PRE-RELEASE
- Added type aliases to improve code readability.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.6.0+1 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
- Codebase formatted.
0.6.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking Changes:
- Changed parameters for `JsonTokenizer` type definition.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.5.0 #
PRE-RELEASE
New:
- Added `JsonTokenizer` type definition.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.4.1 #
PRE-RELEASE
New:
- Added optional, nullable `FieldName? field` parameter to `Tokenizer` definition.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.4.0+1 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.4.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking Changes:
- Added `Token.field` property to token, breaks default generative constructor.
- Added `FieldName? field` optional parameter to `TextTokenizer.tokenize` method.
- Removed deprecated property `Token.index`, use `Token.termPosition` instead.
- Removed deprecated property `Token.position`, use `Token.termPosition` instead.
- Removed deprecated extension method `Iterable<Token>.maxIndex`, use `Iterable<Token>.Iterable`
- Removed extension method `Iterable<Token>.minIndex`, use `Iterable<Token>.Iterable`
New:
- Added new method `ITextAnalyser.tokenizeJson`.
- Added new examples.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.3.1 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.3.0+1 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.3.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking Changes:
- `TextAnalyzer.characterFilter` changed to non-nullable. Use `(phrase) => phrase` if no `characterFilter` is required.
- `TextAnalyzer.termFilter` changed to non-nullable. Use `(phrase) => [phrase]` if no `termFilter` is required.
New:
- Added `porter_2_stemmer` package export so it does not need to be imported separately.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
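The pass-through filters recommended in the breaking changes for this release can be sketched as below. The typedefs `SketchCharacterFilter` and `SketchTermFilter` are hypothetical stand-ins introduced only so that the example compiles on its own. The real filter signatures in the package are not given in this changelog and may differ.

```dart
// Hypothetical stand-ins for the package's filter types (assumptions).
typedef SketchCharacterFilter = String Function(String phrase);
typedef SketchTermFilter = List<String> Function(String phrase);

// Pass-through filters, following the expressions quoted in the changelog:
// the character filter returns the phrase unchanged, and the term filter
// wraps the phrase in a single-element list.
final SketchCharacterFilter noCharacterFilter = (phrase) => phrase;
final SketchTermFilter noTermFilter = (phrase) => [phrase];

void main() {
  print(noCharacterFilter('Hello, World!')); // Hello, World!
  print(noTermFilter('Hello, World!')); // [Hello, World!]
}
```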
0.2.0+1 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.2.0 #
PRE-RELEASE
New:
- Added abstract class `TextTokenizerBase`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.1.0+1 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.1.0 #
PRE-RELEASE, BREAKING CHANGES
Breaking Changes:
- Added `Token.termPosition` property to token, breaks default generative constructor.
Deprecated:
- Property `Token.index`, use `Token.termPosition` instead.
- Property `Token.position`, use `Token.termPosition` instead.
- Extension method `Iterable<Token>.maxIndex`.
1.0.0+1 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
1.0.0 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.12+1 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.12 #
PRE-RELEASE
New:
- Added `==` operator to `Token`, `Sentence` and `TextSource`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.11+build.1.e8af2efb #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.11 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.11-beta.1 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.10 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.9-beta.1 #
Breaking Changes:
- Changed definition of `Token.position`.
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.8 #
PRE-RELEASE, BREAKING CHANGES
Breaking Changes:
- Removed `relevance` extension method from `TokenCollectionExtension`.
0.0.7 #
PRE-RELEASE
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.6 #
PRE-RELEASE, BREAKING CHANGES
New:
- Added `TokenCollectionExtension` on `Iterable<Token>`.
0.0.5 #
PRE-RELEASE, BREAKING CHANGES
Breaking Changes:
- Added `position` property to `Token` class.
0.0.4 #
PRE-RELEASE, BREAKING CHANGES
New:
- Added `Tokenizer` type definition.
0.0.3 #
PRE-RELEASE, BREAKING CHANGES
Breaking Changes:
- Stemmer removed from English configuration.
- Stemmer incorporated into default `tokenFilter` for `TextTokenizer`.
0.0.2 #
PRE-RELEASE, BREAKING CHANGES
Updated
- Dependencies.
- Tests.
- Examples.
- Documentation.
0.0.1-beta.1 #
PRE-RELEASE
Initial version.