Configuration

This module contains the configuration for embedding.embedding_models.openai.

Configuration

OpenAIEmbeddingModelConfiguration

Bases: EmbeddingModelConfiguration

Configuration for OpenAI embedding models.

This class defines the configuration parameters needed to use OpenAI embedding models, including API credentials, model parameters, and request size limitations.

Source code in src/embedding/embedding_models/openai/configuration.py
class OpenAIEmbeddingModelConfiguration(EmbeddingModelConfiguration):
    """
    Configuration for OpenAI embedding models.

    This class defines the configuration parameters needed to use OpenAI
    embedding models, including API credentials, model parameters, and
    request size limitations.
    """

    class Secrets(BaseSecrets):
        """
        Secrets configuration for OpenAI embedding models.

        Contains sensitive credentials required for API authentication with
        appropriate environment variable mappings.
        """

        model_config = ConfigDict(
            env_file_encoding="utf-8",
            env_prefix="RAGKB__EMBEDDING_MODELS__OPEN_AI__",
            env_nested_delimiter="__",
            extra="ignore",
        )

        api_key: SecretStr = Field(..., description="API key for the model")

    provider: Literal[EmbeddingModelProviderName.OPENAI] = Field(
        ..., description="The provider of the embedding model."
    )
    max_request_size_in_tokens: int = Field(
        8191,
        description="Maximum size of the request in tokens.",
    )
    secrets: Secrets = Field(
        None, description="The secrets for the embedding model."
    )

    def model_post_init(self, __context: dict):
        """
        Post-initialization processing for the model configuration.

        Calculates the appropriate batch size based on the maximum request size
        and the configured text splitter's chunk size if a splitter is defined.

        Args:
            __context: Context information provided by Pydantic during initialization
        """
        super().model_post_init(__context)
        if self.splitter:
            self.batch_size = (
                self.max_request_size_in_tokens
                // self.splitter.chunk_size_in_tokens
            )
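
For illustration, a minimal construction sketch. The provider and secrets fields match the source above; passing the secrets inline (rather than loading them from the environment) and the exact behaviour of BaseSecrets are assumptions, and fields inherited from EmbeddingModelConfiguration (such as the splitter) are omitted.

from pydantic import SecretStr

# Hypothetical usage sketch; "sk-..." is a placeholder API key.
config = OpenAIEmbeddingModelConfiguration(
    provider=EmbeddingModelProviderName.OPENAI,
    max_request_size_in_tokens=8191,
    secrets=OpenAIEmbeddingModelConfiguration.Secrets(
        api_key=SecretStr("sk-...")
    ),
)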

Secrets

Bases: BaseSecrets

Secrets configuration for OpenAI embedding models.

Contains sensitive credentials required for API authentication with appropriate environment variable mappings.

Source code in src/embedding/embedding_models/openai/configuration.py
class Secrets(BaseSecrets):
    """
    Secrets configuration for OpenAI embedding models.

    Contains sensitive credentials required for API authentication with
    appropriate environment variable mappings.
    """

    model_config = ConfigDict(
        env_file_encoding="utf-8",
        env_prefix="RAGKB__EMBEDDING_MODELS__OPEN_AI__",
        env_nested_delimiter="__",
        extra="ignore",
    )

    api_key: SecretStr = Field(..., description="API key for the model")
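
Given the env_prefix above, the API key is typically supplied through an environment variable rather than in code. A sketch, assuming BaseSecrets builds on pydantic-settings and reads the environment when Secrets is instantiated without arguments:

import os

# The variable name follows the env_prefix declared in model_config.
os.environ["RAGKB__EMBEDDING_MODELS__OPEN_AI__API_KEY"] = "sk-..."
secrets = OpenAIEmbeddingModelConfiguration.Secrets()
print(secrets.api_key)  # prints a masked SecretStr, not the raw key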

model_post_init(__context)

Post-initialization processing for the model configuration.

Calculates the appropriate batch size based on the maximum request size and the configured text splitter's chunk size if a splitter is defined.

Parameters:
  • __context (dict) –

    Context information provided by Pydantic during initialization

Source code in src/embedding/embedding_models/openai/configuration.py
def model_post_init(self, __context: dict):
    """
    Post-initialization processing for the model configuration.

    Calculates the appropriate batch size based on the maximum request size
    and the configured text splitter's chunk size if a splitter is defined.

    Args:
        __context: Context information provided by Pydantic during initialization
    """
    super().model_post_init(__context)
    if self.splitter:
        self.batch_size = (
            self.max_request_size_in_tokens
            // self.splitter.chunk_size_in_tokens
        )
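
As a worked example of the batch-size arithmetic (the chunk size of 512 is an assumed illustrative value, not a documented default):

max_request_size_in_tokens = 8191
chunk_size_in_tokens = 512  # would come from the splitter configuration
batch_size = max_request_size_in_tokens // chunk_size_in_tokens  # 8191 // 512 == 15

So with these values, each embedding request would batch at most 15 chunks.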