Base_splitter

This module provides BaseSplitter, the abstract base class for document splitters in the embedding.splitters package.


BaseSplitter

Bases: ABC, Generic[DocType]

Abstract base class for document splitters.

This class defines a common interface for document splitters that transform various document types into text nodes for further processing. The type parameter DocType lets each subclass declare which document type its split method accepts, so one interface covers different document formats while static type checkers can still flag mismatched inputs.

Implementations should handle the specific logic required to parse and split different document types into meaningful text chunks.
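As an illustration of how a subclass might bind DocType and implement split, here is a minimal sketch. It assumes nothing about the real TextNode beyond a text payload and optional metadata, so TextNode is replaced by a small hypothetical dataclass and the interface is mirrored locally; MarkdownDocument and MarkdownSplitter are hypothetical names, not part of this module.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Generic, TypeVar

DocType = TypeVar("DocType")


@dataclass
class TextNode:
    """Hypothetical stand-in for the TextNode type returned by split()."""
    text: str
    metadata: dict = field(default_factory=dict)


class BaseSplitter(ABC, Generic[DocType]):
    """Local mirror of the documented interface so the sketch runs on its own."""

    @abstractmethod
    def split(self, document: DocType) -> TextNode: ...


@dataclass
class MarkdownDocument:
    """Hypothetical document type; not part of the documented module."""
    title: str
    sections: list[str]


class MarkdownSplitter(BaseSplitter[MarkdownDocument]):
    """Hypothetical splitter: binds DocType to MarkdownDocument."""

    def split(self, document: MarkdownDocument) -> TextNode:
        # Strip whitespace around each section and drop sections that end up empty.
        parts = [s.strip() for s in document.sections if s.strip()]
        return TextNode(
            text=f"# {document.title}\n\n" + "\n\n".join(parts),
            metadata={"title": document.title, "sections": len(parts)},
        )


doc = MarkdownDocument(title="Intro", sections=["  First part. ", "", "Second part."])
node = MarkdownSplitter().split(doc)
print(node.text)
print(node.metadata)
```

Parameterizing the base class as BaseSplitter[MarkdownDocument] is what lets a type checker reject a call such as MarkdownSplitter().split("raw text"); at runtime Python does not enforce the parameter.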

Source code in src/embedding/splitters/base_splitter.py
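from abc import ABC, abstractmethod
from typing import Generic, TypeVar

# The module's own imports are not shown in this listing; the two lines
# below are assumptions added so the excerpt reads on its own.
from llama_index.core.schema import TextNode  # assumed origin of TextNode
DocType = TypeVar("DocType")  # assumed unbounded; the real definition is not shown


class BaseSplitter(ABC, Generic[DocType]):
    """Abstract base class for document splitters.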

    This class defines a common interface for document splitters that transform
    various document types into text nodes for further processing. It leverages
    generic typing to support different document formats while maintaining type safety.

    Implementations should handle the specific logic required to parse and split
    different document types into meaningful text chunks.
    """

    @abstractmethod
    def split(self, document: DocType) -> TextNode:
        """Split a document into a text node.

        This method processes a single document and converts it into a TextNode
        representation suitable for embedding or other processing. Implementing
        classes should define the specific logic for parsing different document types.

        Args:
            document: The document to split or process

        Returns:
            TextNode: The processed text node generated from the document
        """
        pass

split(document) abstractmethod

Split a document into a text node.

This method processes a single document and converts it into a TextNode representation suitable for embedding or other processing. Implementing classes should define the specific logic for parsing different document types.

Parameters:
  • document (DocType) – The document to split or process.

Returns:
  • TextNode – The processed text node generated from the document.
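Because split is declared with @abstractmethod, Python refuses to instantiate a subclass that does not override it, and because split handles one document per call, processing a collection is a loop on the caller's side. The sketch below shows both behaviors against a locally mirrored interface with a hypothetical TextNode stand-in; IncompleteSplitter, PlainTextSplitter, and split_all are illustrative names, not part of this module.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Generic, TypeVar

DocType = TypeVar("DocType")


@dataclass
class TextNode:
    """Hypothetical stand-in for the real TextNode type."""
    text: str


class BaseSplitter(ABC, Generic[DocType]):
    """Local mirror of the documented interface."""

    @abstractmethod
    def split(self, document: DocType) -> TextNode: ...


class IncompleteSplitter(BaseSplitter[str]):
    """Does not override split(), so the class stays abstract."""


class PlainTextSplitter(BaseSplitter[str]):
    """Hypothetical splitter for raw strings: collapses runs of whitespace."""

    def split(self, document: str) -> TextNode:
        return TextNode(text=" ".join(document.split()))


def split_all(splitter: BaseSplitter[DocType], documents: Iterable[DocType]) -> list[TextNode]:
    """Hypothetical helper: call split() once per document."""
    return [splitter.split(doc) for doc in documents]


# Instantiating a subclass that leaves split() abstract raises TypeError.
try:
    IncompleteSplitter()
    instantiation_failed = False
except TypeError:
    instantiation_failed = True
print("IncompleteSplitter rejected:", instantiation_failed)

nodes = split_all(PlainTextSplitter(), ["a  b", "\tc\n d "])
print([n.text for n in nodes])  # ['a b', 'c d']
```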
