Package com.yahoo.language.process
Interface Tokenizer
- All Known Implementing Classes:
SimpleTokenizer
public interface Tokenizer
Language-sensitive tokenization of a text string.
- Author:
- Mathias Mølster Lidal
Method Summary
- tokenize(String input, Language language, StemMode stemMode, boolean removeAccents): Deprecated. Use tokenize(String, LinguisticsParameters) instead.
- tokenize(String input, LinguisticsParameters parameters): Tokenizes the given input string.
Method Details
tokenize
@Deprecated default Iterable<Token> tokenize(String input, Language language, StemMode stemMode, boolean removeAccents)
Deprecated. Use tokenize(String, LinguisticsParameters) instead.
Tokenizes the given input string.
- Parameters:
input - the string to tokenize. May be arbitrarily large.
language - the language of the input string.
stemMode - the stem mode applied on the returned tokens.
removeAccents - whether to normalize accents and similar.
- Returns:
- the tokens of the input String
- Throws:
ProcessingException - if the underlying library throws an Exception.
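The Throws clause above suggests that implementations translate failures of the underlying library into a single exception type. A minimal sketch of that pattern follows. Everything in it is a stand-in for illustration: the nested ProcessingException class (modeled here as unchecked, which is an assumption), the hypothetical librarySegment helper, and the simplified List<String> return type used in place of Iterable<Token>.

```java
import java.util.List;

public class TokenizerErrorHandlingSketch {

    // Stand-in for the documented ProcessingException; treating it as unchecked is an assumption.
    static class ProcessingException extends RuntimeException {
        ProcessingException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical underlying library call that rejects blank input.
    static List<String> librarySegment(String input) {
        if (input.isBlank()) {
            throw new IllegalArgumentException("nothing to segment");
        }
        return List.of(input.trim().split("\\s+"));
    }

    // Implementation side: wrap any library failure in ProcessingException.
    static List<String> tokenize(String input) {
        try {
            return librarySegment(input);
        } catch (RuntimeException e) {
            throw new ProcessingException("Tokenization failed", e);
        }
    }

    // Caller side: handle the one documented exception type and fall back to no tokens.
    static List<String> tokenizeOrEmpty(String input) {
        try {
            return tokenize(input);
        } catch (ProcessingException e) {
            return List.of();
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenizeOrEmpty("hello world")); // prints [hello, world]
        System.out.println(tokenizeOrEmpty("   "));         // prints []
    }
}
```

Whether falling back to an empty token list is appropriate depends on the caller; rethrowing or logging may be the better choice in an indexing pipeline.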
tokenize
Tokenizes the given input string. This default implementation calls the deprecated overload tokenize(input, language, stemMode, removeAccents).
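The delegation described above can be sketched with simplified stand-in types. None of the types below are the real com.yahoo.language classes: Language, StemMode, Token, LinguisticsParameters (including its accessor names) and WhitespaceTokenizer are hypothetical, reduced to what the example needs. It shows that an implementation overriding only the legacy overload still works when called through the parameter-object overload.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class TokenizerDelegationSketch {

    // Simplified stand-ins for illustration; NOT the real com.yahoo.language types.
    enum Language { ENGLISH, FRENCH }
    enum StemMode { NONE, ALL }
    record Token(String text) {}
    record LinguisticsParameters(Language language, StemMode stemMode, boolean removeAccents) {}

    interface Tokenizer {
        // Legacy overload. Abstract here to keep the sketch small; the documented one is a default method.
        @Deprecated
        Iterable<Token> tokenize(String input, Language language, StemMode stemMode, boolean removeAccents);

        // Newer overload: unpacks the parameter object and calls the legacy overload.
        default Iterable<Token> tokenize(String input, LinguisticsParameters parameters) {
            return tokenize(input, parameters.language(), parameters.stemMode(), parameters.removeAccents());
        }
    }

    // Hypothetical implementation that overrides only the legacy overload.
    // It ignores language and stemMode; a real tokenizer would not.
    static class WhitespaceTokenizer implements Tokenizer {
        @Override
        public Iterable<Token> tokenize(String input, Language language, StemMode stemMode, boolean removeAccents) {
            List<Token> tokens = new ArrayList<>();
            for (String word : input.trim().split("\\s+")) {
                if (word.isEmpty()) continue; // blank input yields no tokens
                if (removeAccents) {
                    // Decompose accented characters, then drop the combining marks.
                    word = Normalizer.normalize(word, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
                }
                tokens.add(new Token(word));
            }
            return tokens;
        }
    }

    // Collects token texts so the result is easy to print or compare.
    static List<String> texts(Iterable<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token token : tokens) out.add(token.text());
        return out;
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        LinguisticsParameters parameters = new LinguisticsParameters(Language.FRENCH, StemMode.NONE, true);
        // Goes through the default method, which forwards to the overridden legacy overload.
        System.out.println(texts(tokenizer.tokenize("caf\u00e9 au lait", parameters))); // prints [cafe, au, lait]
    }
}
```

Bundling the options into one parameter object means new options can be added later without another overload, which is presumably why the four-argument form is deprecated.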