Local Development Setup
This guide outlines the steps required to set up the RAG system on your local machine for development purposes.
Requirements:
- Python 3.12
- Docker
Configuration & Secrets
The local configuration is located in configuration.local.json. This file configures a toy PDF dataset as the document datasource and defines local settings for the embedding, augmentation, and evaluation stages. To customize the setup, refer to the configuration QuickStart Setup.
Secrets Configuration
Create a secrets file at configurations/secrets.local.env. Below is a template:
# LLMs
RAG__LLMS__OPENAI__API_KEY=...
# Langfuse
RAG__LANGFUSE__DATABASE__USER=user
RAG__LANGFUSE__DATABASE__PASSWORD=password
RAG__LANGFUSE__SECRET_KEY=...
RAG__LANGFUSE__PUBLIC_KEY=...
- RAG__LLMS__OPENAI__API_KEY: Required for connecting to the OpenAI LLM.
- Langfuse Keys: RAG__LANGFUSE__SECRET_KEY and RAG__LANGFUSE__PUBLIC_KEY are generated during initialization and will need to be updated later.
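Before starting the services, it can help to confirm which secrets are still unset. The following is a minimal sketch, not part of the project: it assumes the secrets file uses plain KEY=VALUE lines as in the template above, and the helper names `parse_env` and `unset_keys` are hypothetical.

```python
from pathlib import Path

# Key names come from the template above; the checking logic is an illustrative assumption.
REQUIRED_KEYS = [
    "RAG__LLMS__OPENAI__API_KEY",
    "RAG__LANGFUSE__DATABASE__USER",
    "RAG__LANGFUSE__DATABASE__PASSWORD",
    "RAG__LANGFUSE__SECRET_KEY",
    "RAG__LANGFUSE__PUBLIC_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unset_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still the '...' placeholder."""
    return [key for key in REQUIRED_KEYS if values.get(key, "") in ("", "...")]


if __name__ == "__main__":
    path = Path("configurations/secrets.local.env")
    if path.exists():
        print("Keys still to fill in:", unset_keys(parse_env(path.read_text())))
    else:
        print(f"{path} not found")
```

Right after creating the file from the template, the two Langfuse keys are expected to show up as unset, since they are only generated once Langfuse is running.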
Initialization
Python Environment
- Install uv on your OS following this installation guide.
- In the root of the project, create a virtual environment and activate it:
uv venv
source .venv/bin/activate
- Install the required dependencies:
uv sync --all-extras
Services Initialization
To initialize the Langfuse and vector store services, run the initialization script:
build/workstation/init.sh --env local
NOTE: Depending on your OS and setup, you might need to give execute permission to the initialization script, e.g. chmod u+x build/workstation/init.sh
Once initialized, access the Langfuse web server on your localhost (port defined in configuration.local.json under pipeline.augmentation.langfuse.port). Use the Langfuse UI to:
- Create a user.
- Set up a project for the application.
- Generate secret and public keys for the project.
Add the generated keys to the configurations/secrets.local.env file as follows:
RAG__LANGFUSE__SECRET_KEY=<generated_secret_key>
RAG__LANGFUSE__PUBLIC_KEY=<generated_public_key>
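To find the address of the Langfuse UI, the port lookup mentioned above can be sketched as follows. This is only an illustration under several assumptions: that configuration.local.json is plain JSON whose nesting mirrors the dotted path pipeline.augmentation.langfuse.port, that the file sits in the configurations/ directory, and that the server is reachable over plain http. The helpers `lookup` and `langfuse_url` are hypothetical.

```python
import json
from functools import reduce
from pathlib import Path


def lookup(config: dict, dotted_path: str):
    """Walk a nested dict along a dotted path such as 'a.b.c'."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), config)


def langfuse_url(config: dict) -> str:
    """Build the local Langfuse URL from the configured port (http scheme is assumed)."""
    port = lookup(config, "pipeline.augmentation.langfuse.port")
    return f"http://localhost:{port}"


if __name__ == "__main__":
    # Assumed location; adjust to wherever configuration.local.json lives in your checkout.
    path = Path("configurations/configuration.local.json")
    if path.exists():
        print(langfuse_url(json.loads(path.read_text())))
    else:
        print(f"{path} not found")
```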
Development
Running RAG
On the first run, execute the scripts in the order listed below so that resources such as vector store collections are initialized properly.
Embedding Stage
Run the embedding stage script:
python src/embed.py --env local
Note: The embedding process may take significant time, depending on the size of your datasource.
Augmentation Stage
Run the augmentation stage script:
python src/chat.py --env local
This initializes the RAG system's query engine and the Chainlit application, leveraging the embeddings generated in the previous step.
Evaluation Stage
Run the evaluation stage script:
python src/evaluation.py --env local
Important: For evaluation to proceed, Langfuse datasets must be populated either manually or via Chainlit's human feedback feature. For additional details, refer to the Evaluation Docs.
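The run order described in this section can be captured in a small wrapper. The sketch below is hypothetical and not part of the repository: it only reuses the script paths and the --env flag shown above, and it assumes each script exits with a non-zero status on failure.

```python
import subprocess
import sys

# Script paths as listed in this guide, in the recommended first-run order.
STAGE_SCRIPTS = {
    "embedding": "src/embed.py",
    "augmentation": "src/chat.py",
    "evaluation": "src/evaluation.py",
}


def build_command(stage: str, env: str = "local") -> list[str]:
    """Build the command line for one stage using the current Python interpreter."""
    return [sys.executable, STAGE_SCRIPTS[stage], "--env", env]


def run_stages(stages, env: str = "local", runner=subprocess.run) -> None:
    """Run the given stages one after another, stopping at the first failure."""
    for stage in stages:
        command = build_command(stage, env)
        print("Running:", " ".join(command))
        runner(command, check=True)  # check=True raises CalledProcessError on failure


# Example (from the project root, with the virtual environment active):
# run_stages(["embedding"])
```

Since the augmentation stage starts the Chainlit application, it presumably keeps running until stopped, so in practice it may be more convenient to run it in its own terminal rather than chaining it with the evaluation stage.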
Git Setup
The .pre-commit-config.yaml file configures code formatters to enforce consistency before committing changes. After cloning the repository and installing dependencies, enable pre-commit hooks:
pre-commit install