Package com.yahoo.language.process
Interface Chunker
- All Known Implementing Classes:
Chunker.FailingChunker, FixedLengthChunker, SentenceChunker
public interface Chunker
A chunker splits a text string into multiple smaller strings (chunks).
This is typically used for large pieces of text that should be split into many chunks for
vector embedding.
- Author:
- bratseth
-
Nested Class Summary
Nested Classes
Modifier and Type, Interface, Description:
- static final record
- static class
- static class
-
Field Summary
Fields -
Method Summary
Modifier and Type, Method, Description:
- asMap(): Returns this chunker instance as a map with the default chunker name
- asMap (overload taking a name): Returns this chunker instance as a map with the given name
- chunk(String text, Chunker.Context context): Splits a text into multiple chunks.
-
Field Details
-
defaultChunkerId
ID of the chunker used when none is explicitly given
- See Also:
-
throwsOnUse
An instance of this interface which throws IllegalStateException on any attempt to use it
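The idea behind a throwing placeholder such as throwsOnUse can be sketched as follows. This is a minimal, self-contained illustration and not the Vespa implementation: PlaceholderChunker, THROWS_ON_USE, the simplified chunk(String) signature and the exception message are all hypothetical names chosen for the example.

```java
import java.util.List;

/** Hypothetical stand-in for a chunker interface; this is not the Vespa type. */
interface PlaceholderChunker {

    /** Simplified signature: the real method also takes a context argument. */
    List<String> chunk(String text);

    /** Placeholder instance: any attempt to chunk with it fails immediately. */
    PlaceholderChunker THROWS_ON_USE = text -> {
        // The message text is illustrative only
        throw new IllegalStateException("No chunker has been configured");
    };
}

final class PlaceholderDemo {
    public static void main(String[] args) {
        try {
            PlaceholderChunker.THROWS_ON_USE.chunk("some text");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

A placeholder like this lets a component hold a non-null chunker reference while still failing loudly if chunking is attempted without a real chunker being configured.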
-
-
Method Details
-
asMap
Returns this chunker instance as a map with the default chunker name
-
asMap
Returns this chunker instance as a map with the given name -
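The two asMap variants can be pictured with the sketch below. It assumes the map is keyed by chunker name and holds this instance as its single entry; the actual return type and the value of defaultChunkerId are not shown on this page, so NamedChunker, DEFAULT_ID and its value "default" are hypothetical stand-ins.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a chunker interface; this is not the Vespa type. */
interface NamedChunker {

    /** Assumed default name; the real value of defaultChunkerId may differ. */
    String DEFAULT_ID = "default";

    /** Simplified signature: the real method also takes a context argument. */
    List<String> chunk(String text);

    /** Wraps this instance in a single-entry map under the default name. */
    default Map<String, NamedChunker> asMap() {
        return asMap(DEFAULT_ID);
    }

    /** Wraps this instance in a single-entry map under the given name. */
    default Map<String, NamedChunker> asMap(String name) {
        return Map.of(name, this);
    }
}

final class AsMapDemo {
    public static void main(String[] args) {
        // A trivial chunker that returns the whole text as one chunk
        NamedChunker whole = text -> List.of(text);
        System.out.println(whole.asMap().keySet());
        System.out.println(whole.asMap("paragraphs").keySet());
    }
}
```

Such a map form is convenient when an API accepts several named chunkers but only one instance is available.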
chunk
Splits a text into multiple chunks. The chunks should preferably contain all the content of the original text, and may overlap.
- Parameters:
text - the text to split into chunks
context - the context which may influence a chunker's behavior
- Returns:
- the resulting chunks
- Throws:
IllegalArgumentException - if the language is not supported by this chunker
-
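To make the chunk contract concrete (all content covered, overlap allowed), here is a minimal sketch of a fixed-length splitter with overlapping windows. It is not the FixedLengthChunker implementation and does not use the Vespa types: SimpleChunker and OverlappingChunker are hypothetical names, chunks are plain strings, and the context parameter is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the chunker contract; this is not the Vespa interface. */
interface SimpleChunker {
    List<String> chunk(String text);
}

/** Splits text into windows of at most 'length' chars, each sharing 'overlap' chars with the previous one. */
final class OverlappingChunker implements SimpleChunker {

    private final int length;
    private final int overlap;

    OverlappingChunker(int length, int overlap) {
        if (length <= 0 || overlap < 0 || overlap >= length)
            throw new IllegalArgumentException("Require length > 0 and 0 <= overlap < length");
        this.length = length;
        this.overlap = overlap;
    }

    @Override
    public List<String> chunk(String text) {
        List<String> chunks = new ArrayList<>();
        int step = length - overlap; // always >= 1, so the loop makes progress
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + length, text.length());
            // Note: substring works on UTF-16 code units and may split a surrogate pair
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // the last window reached the end of the text
        }
        return chunks;
    }
}

final class OverlappingChunkerDemo {
    public static void main(String[] args) {
        SimpleChunker chunker = new OverlappingChunker(4, 1);
        System.out.println(chunker.chunk("abcdefghij")); // prints [abcd, defg, ghij]
    }
}
```

Because every window starts at most length minus overlap characters after the previous one, no character of the input is dropped, and consecutive chunks share context at their boundaries.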