Evaluate

This module contains the functionality of the evaluation script.

This script evaluates the RAG system against Langfuse datasets. To add new items to the datasets, use the Langfuse UI. The Qdrant vector store must be running and must contain a populated collection of embeddings. To run the script, execute the following command from the project's root directory:

python src/evaluate.py
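The script assumes Qdrant is already running with a populated collection, so a preflight check can make it fail fast when that is not the case. The sketch below is not part of src/evaluate.py. It queries Qdrant's REST endpoint `GET /collections/{collection_name}` (Qdrant's default HTTP port is 6333) and treats a collection as ready when its `points_count` is greater than zero. The function names `http_get` and `collection_ready` and that readiness criterion are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Tuple

# A fetch function takes a URL and returns (status_code, response_body).
Fetch = Callable[[str], Tuple[int, bytes]]


def http_get(url: str) -> Tuple[int, bytes]:
    """GET a URL; HTTP error statuses are returned instead of raised."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        return error.code, error.read()


def collection_ready(base_url: str, collection: str, fetch: Fetch = http_get) -> bool:
    """Return True if the Qdrant collection exists and holds at least one point.

    Hypothetical helper: the "points_count > 0" readiness rule is an assumption.
    """
    try:
        status, body = fetch(f"{base_url.rstrip('/')}/collections/{collection}")
    except OSError:
        # URLError (connection refused, DNS failure) and timeouts are OSError subclasses.
        return False
    if status != 200:
        # 404 means the collection does not exist.
        return False
    info = json.loads(body).get("result", {})
    return (info.get("points_count") or 0) > 0
```

A call such as `collection_ready("http://localhost:6333", "my-collection")` (the collection name is hypothetical) could run before `python src/evaluate.py`. The `fetch` parameter exists so the check can be exercised without a live Qdrant instance.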

main(injector)

Execute RAG system evaluation workflow.

Parameters:
  • injector (Injector) – Dependency injection container.

Note

Evaluates both the feedback and manual datasets. Results are recorded in Langfuse.

Source code in src/evaluate.py
def main(
    injector: Injector,
):
    """Execute RAG system evaluation workflow.

    Args:
        injector: Dependency injection container

    Note:
        Evaluates both the feedback and manual datasets.
        Results are recorded in Langfuse.
    """
    # Resolve the configuration and the evaluator from the DI container.
    configuration = injector.get(Configuration)
    langfuse_evaluator = injector.get(LangfuseEvaluator)

    logging.info(f"Evaluating {langfuse_evaluator.run_name}...")

    # Evaluate the feedback dataset first, then the manual dataset.
    langfuse_evaluator.evaluate(
        dataset_name=configuration.pipeline.augmentation.langfuse.datasets.feedback_dataset.name
    )
    langfuse_evaluator.evaluate(
        dataset_name=configuration.pipeline.augmentation.langfuse.datasets.manual_dataset.name
    )

    logging.info(
        f"Evaluation complete for {configuration.metadata.build_name}."
    )
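Because `main` only calls `injector.get(...)` and then `evaluate(dataset_name=...)` twice, its control flow can be checked without Langfuse or Qdrant by handing it a container that returns stand-ins. The sketch below is self-contained: `Configuration`, `LangfuseEvaluator`, and `StubInjector` are hypothetical stubs, the dataset and build names are made up, and `main` is a local copy of the function body above. In a real test one would import `main` from src/evaluate.py instead.

```python
import logging
from types import SimpleNamespace


class Configuration:
    """Hypothetical stand-in used only as a lookup key for the container."""


class LangfuseEvaluator:
    """Hypothetical stand-in that records which datasets it was asked to evaluate."""

    def __init__(self, run_name: str) -> None:
        self.run_name = run_name
        self.evaluated: list[str] = []

    def evaluate(self, dataset_name: str) -> None:
        self.evaluated.append(dataset_name)


class StubInjector:
    """Minimal container exposing get(cls), the only method main() calls."""

    def __init__(self, bindings: dict) -> None:
        self._bindings = bindings

    def get(self, cls):
        return self._bindings[cls]


def main(injector) -> None:
    # Local copy of the body of main() shown above, so this sketch runs on its own.
    configuration = injector.get(Configuration)
    langfuse_evaluator = injector.get(LangfuseEvaluator)
    logging.info(f"Evaluating {langfuse_evaluator.run_name}...")
    datasets = configuration.pipeline.augmentation.langfuse.datasets
    langfuse_evaluator.evaluate(dataset_name=datasets.feedback_dataset.name)
    langfuse_evaluator.evaluate(dataset_name=datasets.manual_dataset.name)
    logging.info(f"Evaluation complete for {configuration.metadata.build_name}.")


# Nested namespaces mimic the configuration attribute path that main() reads.
config = SimpleNamespace(
    pipeline=SimpleNamespace(
        augmentation=SimpleNamespace(
            langfuse=SimpleNamespace(
                datasets=SimpleNamespace(
                    feedback_dataset=SimpleNamespace(name="feedback-dataset"),
                    manual_dataset=SimpleNamespace(name="manual-dataset"),
                )
            )
        )
    ),
    metadata=SimpleNamespace(build_name="local-build"),
)
evaluator = LangfuseEvaluator(run_name="local-run")
main(StubInjector({Configuration: config, LangfuseEvaluator: evaluator}))
print(evaluator.evaluated)  # ['feedback-dataset', 'manual-dataset']
```

The recorded list confirms that the feedback dataset is evaluated before the manual dataset, matching the order in `main`.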