Embed

This module contains functionality related to the embed script.

Embed

This script processes datasource documents and embeds them into a vector storage. In summary, it reads, cleans, splits, and embeds the documents. To run the script, execute the following command from the root directory of the project:

python src/embed.py

main(injector)

Execute embedding workflow with validation.

Parameters:
  • injector (Injector) –

    Dependency injection container

Note

Exits with code 100 if collection already exists
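Because the script signals "collection already exists" through exit code 100, a calling process can tell a skipped run apart from a real failure. The following is a minimal sketch of such a caller, not part of the project: `describe_exit` and `run_embed` are hypothetical helper names, and only the meaning of codes 0 and 100 is taken from this page. Treating every other code as a failure is an assumption.

```python
import subprocess

# Exit code documented in the note above for "collection already exists".
COLLECTION_EXISTS_EXIT_CODE = 100


def describe_exit(returncode: int) -> str:
    """Map the embed script's exit code to a short status label."""
    if returncode == 0:
        return "embedded"
    if returncode == COLLECTION_EXISTS_EXIT_CODE:
        return "skipped"
    # Assumption: any other code is treated as a failure.
    return "failed"


def run_embed(command: list[str]) -> str:
    """Run the given command and interpret its exit code."""
    # check=False: a non-zero exit code is expected here and must not raise.
    completed = subprocess.run(command, check=False)
    return describe_exit(completed.returncode)
```

From the project root, a call such as `run_embed([sys.executable, "src/embed.py"])` (with `sys` imported) would then return `"skipped"` when the collection is already present.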

Source code in src/embed.py
def main(injector: Injector):
    """Execute embedding workflow with validation.

    Args:
        injector: Dependency injection container

    Note:
        Exits with code 100 if collection already exists
    """
    try:
        vector_store_validator = injector.get(VectorStoreValidator)
        vector_store_validator.validate()
    except CollectionExistsException as e:
        logging.info(f"{e.message}. Skipping embedding.")
        exit(100)

    asyncio.run(run_embedding(injector))
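The validate-then-skip pattern used by `main` can be tried in isolation. The sketch below is illustrative only: `CollectionExistsError`, `FakeValidator`, `guard`, and `exit_code_of` are hypothetical stand-ins, not the project's classes. It calls `sys.exit` instead of the built-in `exit`, which is added by the `site` module and intended mainly for interactive use. Both raise `SystemExit`, so the exit code can be observed without ending the process.

```python
import sys


class CollectionExistsError(Exception):
    """Hypothetical stand-in for the project's CollectionExistsException."""

    def __init__(self, message: str):
        super().__init__(message)
        self.message = message


class FakeValidator:
    """Hypothetical validator that fails when the collection is present."""

    def __init__(self, collection_exists: bool):
        self.collection_exists = collection_exists

    def validate(self) -> None:
        if self.collection_exists:
            raise CollectionExistsError("Collection already exists")


def guard(validator) -> None:
    """Exit with code 100 if validation reports an existing collection."""
    try:
        validator.validate()
    except CollectionExistsError as e:
        print(f"{e.message}. Skipping embedding.")
        sys.exit(100)


def exit_code_of(validator):
    """Return the exit code guard() requests, or None if it returns normally."""
    try:
        guard(validator)
    except SystemExit as stop:
        return stop.code
    return None
```

With these stand-ins, `exit_code_of(FakeValidator(True))` yields the skip code, while a validator that passes lets execution continue to the embedding step.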

run_embedding(injector) async

Process and embed documents from datasources.

Parameters:
  • injector (Injector) –

    Dependency injection container

Note

Executes extraction, embedding and storage operations.
Exits with code 0 on success.
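The order of operations (an awaited extraction followed by synchronous embedding and storage) can be sketched with a stand-in object. `FakeOrchestrator` and `run_pipeline` are hypothetical names used only for illustration. The real `DatasourceOrchestrator` is resolved from the injector and its internals are not shown on this page.

```python
import asyncio


class FakeOrchestrator:
    """Hypothetical stand-in that records the order of the calls it receives."""

    def __init__(self):
        self.calls = []

    async def extract(self):
        # Simulates asynchronous I/O such as fetching documents.
        await asyncio.sleep(0)
        self.calls.append("extract")

    def embed(self):
        self.calls.append("embed")

    def save_to_vector_storage(self):
        self.calls.append("save_to_vector_storage")


async def run_pipeline(orchestrator):
    """Await extraction, then run the two synchronous steps in sequence."""
    await orchestrator.extract()
    orchestrator.embed()
    orchestrator.save_to_vector_storage()
    return orchestrator.calls


calls = asyncio.run(run_pipeline(FakeOrchestrator()))
print(calls)
```

Only the extraction step is awaited, so embedding cannot start before extraction has finished.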

Source code in src/embed.py
async def run_embedding(injector: Injector):
    """Process and embed documents from datasources.

    Args:
        injector: Dependency injection container

    Note:
        Executes extraction, embedding and storage operations
        Exits with code 0 on success
    """
    datasource_orchestrator = injector.get(DatasourceOrchestrator)

    logging.info("Starting embedding...")

    await datasource_orchestrator.extract()
    datasource_orchestrator.embed()
    datasource_orchestrator.save_to_vector_storage()

    logging.info("Embedding finished...")
    exit(0)