Evaluate

This module contains functionality related to the evaluate script.

This script evaluates the RAG system using Langfuse datasets. To add a new item to the manual dataset, use the Langfuse UI. The vector storage must be running with a ready collection of embeddings. To run the script, execute the following command from the root directory of the project:

python src/evaluate.py
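
Under the hood, src/evaluate.py presumably ends with a standard entry-point guard that calls run() (an assumption; the excerpt further down shows only the function itself):

if __name__ == "__main__":
    run()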

run(logger=LoggerConfiguration.get_logger(__name__))

Execute RAG system evaluation workflow.

Parameters:
  • logger (Logger, default: LoggerConfiguration.get_logger(__name__)) –

    Logger instance for logging messages
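
run can also be called from other code with a preconfigured logger. A minimal sketch, assuming the repository root is on the import path so the module is importable as src.evaluate:

import logging

from src.evaluate import run

# Send the evaluation workflow's log messages through a custom logger
# instead of the module default.
logging.basicConfig(level=logging.INFO)
run(logger=logging.getLogger("rag-evaluation"))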

Source code in src/evaluate.py
def run(
    logger: logging.Logger = LoggerConfiguration.get_logger(__name__),
) -> None:
    """
    Execute RAG system evaluation workflow.

    Args:
        logger: Logger instance for logging messages
    """
    # Load the evaluation configuration and build the Langfuse evaluator from it.
    initializer = EvaluationInitializer()
    configuration = initializer.get_configuration()
    langfuse_evaluator = LangfuseEvaluatorFactory.create(configuration)

    logger.info(f"Evaluating {langfuse_evaluator.run_name}...")

    # Run the evaluation against both Langfuse datasets: feedback and manual.
    langfuse_evaluator.evaluate(
        dataset_name=configuration.augmentation.langfuse.datasets.feedback_dataset.name
    )
    langfuse_evaluator.evaluate(
        dataset_name=configuration.augmentation.langfuse.datasets.manual_dataset.name
    )

    logger.info(f"Evaluation complete for {configuration.metadata.build_name}.")