Embedding_model_configuration

This module defines the embedding model configuration for embedding.bootstrap.configuration: the EmbeddingModelConfiguration class, its configuration registry, and the enumeration of supported embedding model providers.

EmbeddingModelConfiguration

Bases: BaseConfigurationWithSecrets

Configuration class for embedding models.

This class defines the necessary parameters and settings for configuring an embedding model, including provider information, model name, and tokenization settings.

Source code in src/embedding/bootstrap/configuration/embedding_model_configuration.py
class EmbeddingModelConfiguration(BaseConfigurationWithSecrets):
    """Configuration class for embedding models.

    This class defines the necessary parameters and settings for configuring
    an embedding model, including provider information, model name, and
    tokenization settings.
    """

    provider: EmbeddingModelProviderName = Field(
        ..., description="The provider of the embedding model."
    )
    name: str = Field(..., description="The name of the embedding model.")
    tokenizer_name: str = Field(
        ...,
        description="The name of the tokenizer used by the embedding model.",
    )
    batch_size: int = Field(64, description="The batch size for embedding.")

    splitter: Any = Field(
        None, description="The splitter configuration for the embedding model."
    )

    @field_validator("splitter")
    @classmethod
    def _validate_splitter(cls, value: Any, info: ValidationInfo) -> Any:
        """Validate the splitter configuration.

        This method ensures that the provided splitter configuration is valid
        according to the SplitterConfigurationRegistry.

        Args:
            value: The splitter configuration value to validate.
            info: Validation context information.

        Returns:
            The validated splitter configuration.
        """
        return super()._validate(
            value,
            info=info,
            registry=SplitterConfigurationRegistry,
        )
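
The field declarations above can be tried out in isolation. The following is a minimal sketch, assuming Pydantic v2 is installed: the stand-in class `EmbeddingModelConfigurationSketch` is hypothetical, subclasses plain `BaseModel` instead of `BaseConfigurationWithSecrets`, and omits the splitter validator, because neither the base class nor `SplitterConfigurationRegistry` is shown on this page. The model and tokenizer names are placeholders, not real model identifiers.

```python
from enum import Enum
from typing import Any

from pydantic import BaseModel, Field, ValidationError


class EmbeddingModelProviderName(str, Enum):
    """Copy of the provider enum documented on this page."""

    HUGGING_FACE = "hugging_face"
    OPENAI = "openai"
    VOYAGE = "voyage"


class EmbeddingModelConfigurationSketch(BaseModel):
    """Hypothetical stand-in mirroring only the fields of the real class."""

    provider: EmbeddingModelProviderName = Field(
        ..., description="The provider of the embedding model."
    )
    name: str = Field(..., description="The name of the embedding model.")
    tokenizer_name: str = Field(
        ..., description="The name of the tokenizer used by the embedding model."
    )
    batch_size: int = Field(64, description="The batch size for embedding.")
    # The real class validates this field against a registry; here it is
    # accepted as-is.
    splitter: Any = Field(None, description="The splitter configuration.")


# Placeholder values: the plain string "openai" is coerced to the enum member.
config = EmbeddingModelConfigurationSketch.model_validate(
    {
        "provider": "openai",
        "name": "example-embedding-model",
        "tokenizer_name": "example-tokenizer",
    }
)

# Required fields (declared with `...`) must be present.
try:
    EmbeddingModelConfigurationSketch.model_validate({"provider": "openai"})
    missing_fields_rejected = False
except ValidationError:
    missing_fields_rejected = True
```

In this sketch, `batch_size` falls back to its default of 64 and `splitter` to `None` when they are omitted, while leaving out `name` or `tokenizer_name` raises a `ValidationError`.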

EmbeddingModelConfigurationRegistry

Bases: ConfigurationRegistry

Registry for embedding model configurations.

This registry maps embedding model provider names to their respective configuration classes.

Source code in src/embedding/bootstrap/configuration/embedding_model_configuration.py
class EmbeddingModelConfigurationRegistry(ConfigurationRegistry):
    """Registry for embedding model configurations.

    This registry maps embedding model provider names to their
    respective configuration classes.
    """

    _key_class: Type = EmbeddingModelProviderName
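
The base `ConfigurationRegistry` is not shown on this page, so its actual API is unknown here. As an illustration of the idea of mapping provider names to configuration classes, the following sketch uses a hypothetical `SketchRegistry` with assumed `register` and `get` methods; only the `_key_class` attribute comes from the source above.

```python
from enum import Enum
from typing import Callable, Dict, Type


class ProviderName(str, Enum):
    """Reduced copy of the provider enum, for illustration only."""

    OPENAI = "openai"
    VOYAGE = "voyage"


class SketchRegistry:
    """Hypothetical registry; `register` and `get` are assumed names."""

    _key_class: Type = ProviderName
    _objects: Dict[Enum, type] = {}

    @classmethod
    def register(cls, key) -> Callable[[type], type]:
        # Normalize the key through the enum; unknown names raise ValueError.
        enum_key = cls._key_class(key)

        def decorator(config_class: type) -> type:
            cls._objects[enum_key] = config_class
            return config_class

        return decorator

    @classmethod
    def get(cls, key) -> type:
        # Accepts either an enum member or its string value.
        return cls._objects[cls._key_class(key)]


@SketchRegistry.register(ProviderName.OPENAI)
class OpenAISketchConfiguration:
    """Placeholder configuration class for one provider."""


resolved = SketchRegistry.get("openai")
```

Typing the keys with `_key_class` means a lookup can reject a misspelled provider name before any configuration class is resolved.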

EmbeddingModelProviderName

Bases: str, Enum

Enumeration of supported embedding model providers.

This enum lists all the providers that can be used for embedding models.

Source code in src/embedding/bootstrap/configuration/embedding_model_configuration.py
class EmbeddingModelProviderName(str, Enum):
    """Enumeration of supported embedding model providers.

    This enum lists all the providers that can be used for embedding models.
    """

    HUGGING_FACE = "hugging_face"
    OPENAI = "openai"
    VOYAGE = "voyage"
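
Because the enum also derives from `str`, its members can be looked up from, and compared with, plain strings such as those read from a configuration file. A short standard-library sketch of that behavior, using a copy of the enum above:

```python
from enum import Enum


class EmbeddingModelProviderName(str, Enum):
    """Copy of the enum documented above."""

    HUGGING_FACE = "hugging_face"
    OPENAI = "openai"
    VOYAGE = "voyage"


# Look a member up by its string value.
provider = EmbeddingModelProviderName("voyage")

# The str mixin makes members compare equal to their raw values.
is_plain_string_equal = provider == "voyage"

# Iterating the enum yields members in definition order.
supported_values = [member.value for member in EmbeddingModelProviderName]

# A value outside the enum is rejected with ValueError.
try:
    EmbeddingModelProviderName("unknown_provider")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```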