Base_embedder

This module defines BaseEmbedder, the abstract base class for embedders in embedding.embedders.

BaseEmbedder

Bases: ABC

Abstract base class for text node embedding operations.

This class provides core functionality for embedding text nodes, with derived classes implementing specific embedding strategies.
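
Because BaseEmbedder derives from ABC and declares abstract methods, it cannot be instantiated directly; a derived class has to implement both embed and embed_flush first. The sketch below illustrates that contract with a trimmed stand-in for the class. The `Any` type hints, the `NoOpEmbedder` subclass, and the `can_instantiate` helper are assumptions made for illustration and are not part of the module.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class BaseEmbedder(ABC):
    """Trimmed stand-in that mirrors the documented interface."""

    def __init__(self, configuration: Any, embedding_model: Any, vector_store: Any):
        super().__init__()
        self.configuration = configuration
        self.embedding_model = embedding_model
        self.vector_store = vector_store

    @abstractmethod
    def embed(self, nodes: List[Any]) -> None:
        ...

    @abstractmethod
    def embed_flush(self) -> None:
        ...


class NoOpEmbedder(BaseEmbedder):
    """Hypothetical subclass: satisfies the contract without doing any work."""

    def embed(self, nodes: List[Any]) -> None:
        pass

    def embed_flush(self) -> None:
        pass


def can_instantiate(cls) -> bool:
    """Return True if cls can be constructed, False if Python rejects it as abstract."""
    try:
        cls(configuration=None, embedding_model=None, vector_store=None)
        return True
    except TypeError:
        return False


base_ok = can_instantiate(BaseEmbedder)   # abstract methods are still unimplemented
noop_ok = can_instantiate(NoOpEmbedder)   # both abstract methods are implemented
```

A subclass that overrides only one of the two methods would be rejected in the same way as the base class.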

Source code in src/embedding/embedders/base_embedder.py
class BaseEmbedder(ABC):
    """Abstract base class for text node embedding operations.

    This class provides core functionality for embedding text nodes,
    with derived classes implementing specific embedding strategies.
    """

    def __init__(
        self,
        configuration: EmbeddingConfiguration,
        embedding_model: BaseEmbedding,
        vector_store: VectorStore,
    ):
        """Initialize embedder with configuration, model and storage.

        Args:
            configuration: Configuration parameters for the embedding process
            embedding_model: Model to generate text embeddings
            vector_store: Storage system for persisting embedding vectors
        """
        super().__init__()
        self.configuration = configuration
        self.embedding_model = embedding_model
        self.vector_store = vector_store

    @abstractmethod
    def embed(self, nodes: List[TextNode]) -> None:
        """Generate embeddings for text nodes using batch processing.

        This method should implement a strategy for processing the provided nodes,
        potentially splitting them into batches for efficient embedding generation.

        Args:
            nodes: Collection of text nodes to embed

        Note:
            Implementation should modify nodes in-place by adding embeddings
        """
        pass

    @abstractmethod
    def embed_flush(self) -> None:
        """Process and generate embeddings for any remaining nodes.

        This method should handle any nodes that remain in the buffer,
        ensuring all nodes receive embeddings.

        Note:
            Should be called at the end of processing to ensure no nodes remain
            unembedded in the buffer
        """
        pass

__init__(configuration, embedding_model, vector_store)

Initialize embedder with configuration, model and storage.

Parameters:
  • configuration (EmbeddingConfiguration) –

    Configuration parameters for the embedding process

  • embedding_model (BaseEmbedding) –

    Model to generate text embeddings

  • vector_store (VectorStore) –

    Storage system for persisting embedding vectors

Source code in src/embedding/embedders/base_embedder.py
def __init__(
    self,
    configuration: EmbeddingConfiguration,
    embedding_model: BaseEmbedding,
    vector_store: VectorStore,
):
    """Initialize embedder with configuration, model and storage.

    Args:
        configuration: Configuration parameters for the embedding process
        embedding_model: Model to generate text embeddings
        vector_store: Storage system for persisting embedding vectors
    """
    super().__init__()
    self.configuration = configuration
    self.embedding_model = embedding_model
    self.vector_store = vector_store

embed(nodes) abstractmethod

Generate embeddings for text nodes using batch processing.

This method should implement a strategy for processing the provided nodes, potentially splitting them into batches for efficient embedding generation.

Parameters:
  • nodes (List[TextNode]) –

    Collection of text nodes to embed

Note

Implementation should modify nodes in-place by adding embeddings
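
One way a derived class might satisfy this contract is to slice the nodes into fixed-size batches, call the model once per batch, and write each vector back onto its node. The following is a minimal sketch under stated assumptions: `Node`, `ToyModel`, its `embed_texts` method, and the `batch_size` parameter are hypothetical stand-ins, not the real TextNode, BaseEmbedding, or EmbeddingConfiguration types.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for TextNode."""
    text: str
    embedding: Optional[List[float]] = None


class ToyModel:
    """Hypothetical model: maps each text to a one-element vector holding its length."""

    def __init__(self) -> None:
        self.calls = 0  # number of batch requests made so far

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        return [[float(len(text))] for text in texts]


def embed_in_batches(nodes: List[Node], model: ToyModel, batch_size: int) -> None:
    """Attach an embedding to every node, calling the model once per batch."""
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        vectors = model.embed_texts([node.text for node in batch])
        for node, vector in zip(batch, vectors):
            node.embedding = vector  # modify the node in place, as the note requires


nodes = [Node("a"), Node("bb"), Node("ccc"), Node("dddd"), Node("eeeee")]
model = ToyModel()
embed_in_batches(nodes, model, batch_size=2)
```

With five nodes and a batch size of two, the model is called three times, and the last call carries a single node.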

Source code in src/embedding/embedders/base_embedder.py
@abstractmethod
def embed(self, nodes: List[TextNode]) -> None:
    """Generate embeddings for text nodes using batch processing.

    This method should implement a strategy for processing the provided nodes,
    potentially splitting them into batches for efficient embedding generation.

    Args:
        nodes: Collection of text nodes to embed

    Note:
        Implementation should modify nodes in-place by adding embeddings
    """
    pass

embed_flush() abstractmethod

Process and generate embeddings for any remaining nodes.

This method should handle any nodes that remain in the buffer, ensuring all nodes receive embeddings.

Note

Should be called at the end of processing to ensure no nodes remain unembedded in the buffer
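
The need for a flush step is easiest to see with an embedder that buffers nodes across calls and only embeds full batches. The sketch below is an illustration under assumptions, not the module's implementation: `BufferingEmbedder`, its `buffer` attribute, and the `Node` class are hypothetical, and the class is kept standalone for brevity where a real implementation would derive from BaseEmbedder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for TextNode."""
    text: str
    embedding: Optional[List[float]] = None


class BufferingEmbedder:
    """Hypothetical embedder that holds nodes until a full batch is available."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: List[Node] = []

    def _embed_batch(self, batch: List[Node]) -> None:
        # Toy embedding: a one-element vector holding the text length.
        for node in batch:
            node.embedding = [float(len(node.text))]

    def embed(self, nodes: List[Node]) -> None:
        """Buffer incoming nodes and embed only complete batches."""
        self.buffer.extend(nodes)
        while len(self.buffer) >= self.batch_size:
            batch = self.buffer[:self.batch_size]
            self.buffer = self.buffer[self.batch_size:]
            self._embed_batch(batch)

    def embed_flush(self) -> None:
        """Embed whatever is left in the buffer, even if it is a partial batch."""
        if self.buffer:
            self._embed_batch(self.buffer)
            self.buffer = []


nodes = [Node(text) for text in ["a", "bb", "ccc", "dddd", "eeeee"]]
embedder = BufferingEmbedder(batch_size=2)
embedder.embed(nodes[:3])  # embeds one full batch, leaves one node buffered
embedder.embed(nodes[3:])  # embeds another full batch, leaves one node buffered
pending_before_flush = [n.text for n in nodes if n.embedding is None]
embedder.embed_flush()     # embeds the leftover partial batch
pending_after_flush = [n.text for n in nodes if n.embedding is None]
```

Without the final embed_flush call, the last node would stay in the buffer and never receive an embedding.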

Source code in src/embedding/embedders/base_embedder.py
@abstractmethod
def embed_flush(self) -> None:
    """Process and generate embeddings for any remaining nodes.

    This method should handle any nodes that remain in the buffer,
    ensuring all nodes receive embeddings.

    Note:
        Should be called at the end of processing to ensure no nodes remain
        unembedded in the buffer
    """
    pass