Package com.yahoo.language.simple
Class SimpleTokenizer
java.lang.Object
com.yahoo.language.simple.SimpleTokenizer
- All Implemented Interfaces:
Tokenizer
A tokenizer that splits on whitespace, normalizes and transforms tokens using the given implementations, and stems using the kstem algorithm.
This class is not thread-safe.
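Because an instance must not be shared between threads, one common way to use such a class is to keep one instance per thread. The following is a minimal sketch of that pattern only. `StatefulTokenizer` is a hypothetical stand-in with a shared scratch buffer, written for illustration; it is not the SimpleTokenizer implementation, and the source does not say which internal state makes SimpleTokenizer unsafe.

```java
import java.util.ArrayList;
import java.util.List;

public class PerThreadTokenizerSketch {

    /** Hypothetical stand-in for a tokenizer that keeps mutable state between calls. */
    static final class StatefulTokenizer {
        // Scratch buffer reused across calls: concurrent use would corrupt it.
        private final StringBuilder buffer = new StringBuilder();

        List<String> tokenize(String input) {
            List<String> tokens = new ArrayList<>();
            buffer.setLength(0);
            for (int i = 0; i < input.length(); i++) {
                char c = input.charAt(i);
                if (Character.isWhitespace(c)) {
                    flush(tokens);
                } else {
                    buffer.append(c);
                }
            }
            flush(tokens);
            return tokens;
        }

        // Emits the buffered characters as one token, if any, and clears the buffer.
        private void flush(List<String> tokens) {
            if (buffer.length() > 0) {
                tokens.add(buffer.toString());
                buffer.setLength(0);
            }
        }
    }

    // One instance per thread, created lazily the first time a thread asks for it.
    private static final ThreadLocal<StatefulTokenizer> TOKENIZER =
            ThreadLocal.withInitial(StatefulTokenizer::new);

    public static List<String> tokenize(String input) {
        return TOKENIZER.get().tokenize(input);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> System.out.println(tokenize("one two")));
        Thread b = new Thread(() -> System.out.println(tokenize("three four")));
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Creating a new instance per request is an equally valid alternative when construction is cheap; the thread-local variant simply avoids repeated construction.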
- Author:
- Mathias Mølster Lidal, bratseth
Constructor Summary
Constructors
- SimpleTokenizer(Normalizer normalizer)
- SimpleTokenizer(Normalizer normalizer, Transformer transformer)
- SimpleTokenizer(Normalizer normalizer, Transformer transformer, SpecialTokenRegistry specialTokenRegistry)
Method Summary
Methods
- tokenize(String input, LinguisticsParameters parameters): Tokenizes the input, applying this tokenizer's transform to each token string.
- Tokenizes the input and applies the given transform to each token string.
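The two descriptions above differ only in where the per-token transform comes from: the instance itself, or the caller. A minimal sketch of that relationship follows, under stated assumptions: `TransformOverloadSketch`, its field `ownTransform`, and the use of `Function<String, String>` are hypothetical and chosen for illustration, the `LinguisticsParameters` argument is left out, and normalization and stemming are not modelled. It is not the SimpleTokenizer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class TransformOverloadSketch {

    // Hypothetical stand-in for the transform configured on the instance.
    private final Function<String, String> ownTransform;

    public TransformOverloadSketch(Function<String, String> ownTransform) {
        this.ownTransform = ownTransform;
    }

    /** Tokenizes the input, applying this instance's transform to each token string. */
    public List<String> tokenize(String input) {
        return tokenize(input, ownTransform);
    }

    /** Tokenizes the input, applying the given transform to each token string. */
    public List<String> tokenize(String input, Function<String, String> transform) {
        List<String> tokens = new ArrayList<>();
        for (String raw : input.split("\\s+")) {
            // Leading whitespace produces one empty string from split; skip it.
            if (!raw.isEmpty()) {
                tokens.add(transform.apply(raw));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        TransformOverloadSketch sketch =
                new TransformOverloadSketch(s -> s.toLowerCase(Locale.ROOT));
        System.out.println(sketch.tokenize("Hello World"));
        System.out.println(sketch.tokenize("Hello World", s -> s.toUpperCase(Locale.ROOT)));
    }
}
```

Having the first overload delegate to the second keeps the splitting logic in one place, so the two variants cannot drift apart.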
Constructor Details
SimpleTokenizer
public SimpleTokenizer()
SimpleTokenizer
public SimpleTokenizer(Normalizer normalizer)
SimpleTokenizer
public SimpleTokenizer(Normalizer normalizer, Transformer transformer)
SimpleTokenizer
public SimpleTokenizer(Normalizer normalizer, Transformer transformer, SpecialTokenRegistry specialTokenRegistry)
Method Details