Embed

This module contains functionality related to the embed script.

Embed

This script is the entry point for the embedding process. It initializes the embedding orchestrator and starts the embedding workflow. To run the script, execute the following command from the root directory of the project:

python src/embed.py

async run(logger=LoggerConfiguration.get_logger(__name__))

Execute the embedding process.

Parameters:
  • logger (Logger, default: get_logger(__name__)) – Logger instance for logging messages
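Because run is a coroutine, it has to be driven by an event loop. The listing for src/embed.py does not show how the script does this, so the following is only a minimal sketch of one common approach, using asyncio.run. The run function here is a stand-in, not the project's implementation: it returns a status string purely so the result can be checked, and the main helper and the "embed.custom" logger name are hypothetical.

```python
import asyncio
import logging


async def run(logger: logging.Logger = logging.getLogger(__name__)) -> str:
    """Stand-in for the real run(); it only logs and returns a status string."""
    logger.info("Starting embedding process.")
    await asyncio.sleep(0)  # placeholder for the awaited embedding work
    logger.info("Embedding process finished.")
    return "finished"


def main() -> str:
    # Hypothetical entry point: configure logging, then drive the coroutine
    # with a caller-supplied logger instead of the default one.
    logging.basicConfig(level=logging.INFO)
    custom_logger = logging.getLogger("embed.custom")
    return asyncio.run(run(logger=custom_logger))


if __name__ == "__main__":
    main()
```

Passing a logger explicitly, as main does here, is mainly useful in tests or when the embedding step is called from a larger program that already has its own logging setup.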

Source code in src/embed.py
async def run(
    logger: logging.Logger = LoggerConfiguration.get_logger(__name__),
):
    """
    Execute the embedding process.

    Args:
        logger: Logger instance for logging messages
    """
    initializer = EmbeddingInitializer()
    configuration = initializer.get_configuration()

    vector_store = configuration.embedding.vector_store
    validator = VectorStoreValidatorRegistry.get(vector_store.name).create(
        vector_store
    )
    try:
        validator.validate()
    except CollectionExistsException as e:
        logger.info(
            f"Collection '{e.collection_name}' already exists. "
            "Skipping embedding process."
        )
        return

    logger.info("Starting embedding process.")
    orchestrator = EmbeddingOrchestratorRegistry.get(
        configuration.embedding.orchestrator_name
    ).create(configuration)

    await orchestrator.embed()
    logger.info("Embedding process finished.")
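
The control flow above (validate first, skip if the collection already exists, otherwise embed) can be tried in isolation with in-memory stand-ins. The sketch below is an illustration under stated assumptions, not the project's code: FakeVectorStore, FakeValidator, FakeOrchestrator, VALIDATOR_FACTORIES, and run_sketch are all hypothetical names, the registry is reduced to a plain dict lookup, and CollectionExistsException is redefined locally with only the collection_name attribute that the listing uses.

```python
import asyncio
import logging


class CollectionExistsException(Exception):
    """Local stand-in carrying the name of the existing collection."""

    def __init__(self, collection_name: str):
        super().__init__(f"Collection '{collection_name}' already exists")
        self.collection_name = collection_name


class FakeVectorStore:
    """Hypothetical in-memory vector store configuration."""

    def __init__(self, name: str, collection_name: str):
        self.name = name
        self.collection_name = collection_name
        self.existing = set()  # names of collections created so far


class FakeValidator:
    def __init__(self, store: FakeVectorStore):
        self.store = store

    def validate(self) -> None:
        # Mirror the listing: an existing collection is reported via exception.
        if self.store.collection_name in self.store.existing:
            raise CollectionExistsException(self.store.collection_name)


class FakeOrchestrator:
    def __init__(self, store: FakeVectorStore):
        self.store = store

    async def embed(self) -> None:
        # Pretend to embed by recording that the collection now exists.
        self.store.existing.add(self.store.collection_name)


# Simplified registry: maps a vector store name to a validator factory.
VALIDATOR_FACTORIES = {"fake": FakeValidator}


async def run_sketch(
    store: FakeVectorStore,
    logger: logging.Logger = logging.getLogger(__name__),
) -> bool:
    """Return True if embedding ran, False if it was skipped."""
    validator = VALIDATOR_FACTORIES[store.name](store)
    try:
        validator.validate()
    except CollectionExistsException as e:
        logger.info(
            "Collection '%s' already exists. Skipping embedding process.",
            e.collection_name,
        )
        return False

    logger.info("Starting embedding process.")
    await FakeOrchestrator(store).embed()
    logger.info("Embedding process finished.")
    return True
```

With these stand-ins, calling run_sketch twice on the same store embeds on the first call and skips on the second, which is the behavior the early return in the listing is designed to give: re-running the script against an already populated collection is a no-op rather than an error.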