Interface Chunker

All Known Implementing Classes:
Chunker.FailingChunker, FixedLengthChunker, SentenceChunker

public interface Chunker
A chunker splits a text string into multiple smaller strings (chunks). This is typically used for large pieces of text that should be split into many chunks for vector embedding.
Author:
bratseth
  • Field Details

    • defaultChunkerId

      static final String defaultChunkerId
      ID of chunker when none is explicitly given
    • throwsOnUse

      static final Chunker throwsOnUse
      An instance of this interface which throws IllegalStateException if an attempt is made to use it
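The placeholder pattern behind throwsOnUse can be illustrated with a minimal sketch. It assumes a hypothetical, simplified SimpleChunker interface (no Context parameter, plain strings as chunks) rather than the actual Chunker types, and the exception message is invented for illustration.

```java
import java.util.List;

public class ThrowsOnUseSketch {

    /** Hypothetical, simplified stand-in for Chunker: no Context, chunks are plain strings. */
    interface SimpleChunker {
        List<String> chunk(String text);

        /** Placeholder instance: any attempt to chunk with it fails immediately. */
        SimpleChunker throwsOnUse = text -> {
            // Message text is an assumption made for this sketch
            throw new IllegalStateException("No chunker has been configured");
        };
    }

    /** Returns true if calling chunk on the placeholder throws IllegalStateException. */
    static boolean placeholderThrows() {
        try {
            SimpleChunker.throwsOnUse.chunk("some text");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(placeholderThrows());
    }
}
```

A placeholder like this lets a component hold a non-null chunker reference by default, so a missing configuration shows up as a clear exception at the point of use instead of a NullPointerException.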
  • Method Details

    • asMap

      default Map<String,Chunker> asMap()
      Returns this chunker instance as a map with the default chunker name
    • asMap

      default Map<String,Chunker> asMap(String name)
      Returns this chunker instance as a map with the given name
    • chunk

      List<Chunker.Chunk> chunk(String text, Chunker.Context context)
      Splits a text into multiple chunks. Together, the chunks should preferably contain all the content of the original text, and they may overlap.
      Parameters:
      text - the text to split into chunks
      context - the context which may influence a chunker's behavior
      Returns:
      the resulting chunks
      Throws:
      IllegalArgumentException - if the language is not supported by this chunker
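
To make the contract of chunk and asMap concrete, here is a minimal sketch of a fixed-length chunker. It is not the library's FixedLengthChunker: SimpleChunker is a hypothetical stand-in that drops the Context parameter and returns plain strings instead of Chunker.Chunk, and the value "default" for defaultChunkerId is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ChunkerSketch {

    /** Hypothetical, simplified stand-in for Chunker: no Context, chunks are plain strings. */
    interface SimpleChunker {
        String defaultChunkerId = "default"; // assumed value, for illustration only

        List<String> chunk(String text);

        /** Wraps this chunker in a single-entry map under the default name. */
        default Map<String, SimpleChunker> asMap() { return asMap(defaultChunkerId); }

        /** Wraps this chunker in a single-entry map under the given name. */
        default Map<String, SimpleChunker> asMap(String name) { return Map.of(name, this); }
    }

    /**
     * Returns a chunker producing consecutive, non-overlapping chunks of at most
     * maxLength chars. It counts Java chars, so it may split a surrogate pair.
     */
    static SimpleChunker fixedLength(int maxLength) {
        if (maxLength < 1) throw new IllegalArgumentException("maxLength must be positive");
        return text -> {
            List<String> chunks = new ArrayList<>();
            for (int start = 0; start < text.length(); start += maxLength)
                chunks.add(text.substring(start, Math.min(text.length(), start + maxLength)));
            return chunks;
        };
    }

    public static void main(String[] args) {
        SimpleChunker chunker = fixedLength(4);
        System.out.println(chunker.chunk("abcdefghij"));   // [abcd, efgh, ij]
        System.out.println(chunker.asMap().keySet());       // [default]
        System.out.println(chunker.asMap("fixed").keySet()); // [fixed]
    }
}
```

Concatenating the chunks from this sketch reproduces the input, which matches the stated preference that chunks contain all the content of the original text. A sentence-based or overlapping strategy would satisfy the same contract with different chunk boundaries.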