
RAG Application Project Plan

This plan outlines the steps to design, build, and deploy a Retrieval-Augmented Generation (RAG) application as per the project requirements, with a focus on achieving a grade of 5. The approach prioritizes early deployment and continuous integration, following Test-Driven Development (TDD) principles.

1. Foundational Setup

  • Repository: Create a new GitHub repository.
  • Virtual Environment: Set up a local Python virtual environment (venv).
  • Initial Files:
    • Create requirements.txt with initial dependencies (Flask, pytest).
    • Create a .gitignore file for Python.
    • Create a README.md with initial setup instructions.
    • Create placeholder files: deployed.md and design-and-evaluation.md.
  • Testing Framework: Establish a tests/ directory and configure pytest (the resulting layout is sketched below).
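
A possible initial layout, assuming the file names above; app.py and tests/test_health.py are illustrative names for files that arrive in step 2:

```
msse-ai-engineering/
├── .gitignore
├── README.md
├── requirements.txt          # Flask, pytest
├── deployed.md
├── design-and-evaluation.md
├── app.py                    # added in step 2
└── tests/
    └── test_health.py        # added in step 2
```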

2. "Hello World" Deployment

  • Minimal App: Develop a minimal Flask application (app.py) with a /health endpoint that returns a JSON status object (see the sketch after this list).
  • Unit Test: Write a test for the /health endpoint to ensure it returns a 200 OK status and the correct JSON payload.
  • Local Validation: Run the app and tests locally to confirm everything works.
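
A minimal sketch of the endpoint and its test; the /health path and JSON status come from the plan, while the exact payload shape and file names are illustrative:

```python
# app.py -- minimal Flask app with a health-check endpoint
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # The payload shape is illustrative; any JSON status object satisfies the plan.
    return jsonify({"status": "ok"}), 200

if __name__ == "__main__":
    app.run()
```

```python
# tests/test_health.py -- asserts the 200 status and the JSON payload
from app import app

def test_health_returns_200_and_json_payload():
    client = app.test_client()
    response = client.get("/health")
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}
```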

3. CI/CD and Initial Deployment

  • Render Setup: Create a new Web Service on Render and link it to the GitHub repository.
  • Environment Configuration: Configure necessary environment variables on Render (e.g., PYTHON_VERSION).
  • GitHub Actions: Create a CI/CD workflow (.github/workflows/main.yml), sketched after this list, that:
    • Triggers on push/PR to the main branch.
    • Installs dependencies from requirements.txt.
    • Runs the pytest test suite.
    • On success, triggers a deployment to Render.
  • Deployment Validation: Push a change and verify that the workflow runs successfully and the application is deployed.
  • Documentation: Update deployed.md with the live URL of the deployed application.
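
A sketch of such a workflow under two assumptions: deployment is triggered through a Render deploy hook (a plain URL that starts a deploy when requested), and that hook URL is stored as a repository secret (RENDER_DEPLOY_HOOK_URL is a hypothetical secret name):

```yaml
# .github/workflows/main.yml -- test on push/PR to main, deploy on success
name: CI/CD
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip                     # pip cache (see optimizations below)
      - run: pip install -r requirements.txt
      - run: pytest

  deploy:
    needs: test
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # Requesting a Render deploy hook URL triggers a new deploy.
      - run: curl -fsS "${{ secrets.RENDER_DEPLOY_HOOK_URL }}"
```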

CI/CD Optimizations

  • Add pip cache to CI to speed up dependency installation.
  • Run pre-commit in PRs only against changed files (pre-commit run --from-ref ... --to-ref ...); see the snippet below.
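
A sketch of the changed-files step as an addition to the test job above; it assumes the PR's base branch is available locally (e.g. checkout with fetch-depth: 0):

```yaml
# Extra step in the test job: run hooks only against files changed in the PR.
- if: github.event_name == 'pull_request'
  run: |
    pip install pre-commit
    pre-commit run --from-ref "origin/${{ github.base_ref }}" --to-ref HEAD
```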

4. Data Ingestion and Processing

  • Corpus Assembly: Collect or generate 5-20 policy documents (PDF, TXT, MD) and place them in a synthetic_policies/ directory.
  • Parsing Logic: Implement and test functions to parse different document formats.
  • Chunking Strategy: Implement and test a document chunking strategy, e.g., recursive character splitting with overlap (a sketch follows this list).
  • Reproducibility: Set fixed seeds for any processes involving randomness (e.g., chunking, sampling) to ensure deterministic outcomes.
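
A minimal sketch of the overlap idea using fixed-size character windows; a full recursive splitter would additionally prefer paragraph and sentence boundaries, and the default sizes here are illustrative:

```python
# chunking.py -- fixed-size character chunking with overlap.
# Deterministic by construction, so it needs no seed; seed any
# sampling steps elsewhere in the pipeline instead.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```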

5. Embedding and Vector Storage ✅ PHASE 2B COMPLETED

  • Vector DB Setup: Integrate a vector database (ChromaDB) into the project.
  • Embedding Model: Select and integrate a free embedding model (sentence-transformers/all-MiniLM-L6-v2).
  • Ingestion Pipeline: Create an enhanced ingestion pipeline (sketched after this list) that:
    • Loads documents from the corpus.
    • Chunks the documents with metadata.
    • Embeds the chunks using sentence-transformers.
    • Stores the embeddings in ChromaDB vector database.
    • Provides detailed processing statistics.
  • Testing: Write a comprehensive suite of 60+ tests verifying each step of the ingestion pipeline.
  • Search API: Implement a POST /search endpoint for semantic search with:
    • JSON request/response format
    • Configurable parameters (top_k, threshold)
    • Comprehensive input validation
    • Detailed error handling
  • End-to-End Testing: Complete pipeline testing from ingestion through search.
  • Documentation: Full API documentation with examples and performance metrics.
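
A condensed sketch of the ingestion and search path, assuming the model named above; the collection name, ID scheme, and metadata fields are illustrative:

```python
# ingest.py -- embed chunks with sentence-transformers, store/query via ChromaDB
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(name="policies")

def ingest(chunks: list[str], source: str) -> None:
    # Embed every chunk and store it with enough metadata to cite later.
    embeddings = model.encode(chunks).tolist()
    collection.add(
        ids=[f"{source}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
        metadatas=[{"source": source, "chunk": i} for i in range(len(chunks))],
    )

def search(query: str, top_k: int = 5) -> dict:
    # Returns documents, metadatas, and distances for the top_k nearest chunks.
    query_embedding = model.encode([query]).tolist()
    return collection.query(query_embeddings=query_embedding, n_results=top_k)
```

The POST /search handler would wrap search(), validating top_k and threshold from the request JSON before querying.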

6. RAG Core Implementation ✅ PHASE 3 COMPLETED

  • Retrieval Logic: Implement a function to retrieve the top-k relevant document chunks from the vector store based on a user query.
  • Prompt Engineering: Design a prompt template that injects the retrieved context into the query for the LLM.
  • LLM Integration: Connect to a free-tier LLM (e.g., via OpenRouter or Groq) to generate answers; see the sketch after this list.
  • Basic Guardrails: Implement and test basic guardrails for context validation and response length limits.
  • Enhanced Guardrails (Issue #24): ✅ COMPLETED - Comprehensive guardrails and response quality system:
    • Content Safety Filtering: PII detection, bias mitigation, inappropriate content filtering
    • Response Quality Scoring: Multi-dimensional quality assessment (relevance, completeness, coherence, source fidelity)
    • Source Attribution: Automated citation generation and validation
    • Error Handling: Circuit breaker patterns and graceful degradation
    • Configuration System: Flexible thresholds and feature toggles
    • Testing: 13 comprehensive tests with 100% pass rate
    • Integration: Enhanced RAG pipeline with backward compatibility
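
A sketch of the retrieval-augmented prompt and the LLM call, using OpenRouter's OpenAI-compatible chat endpoint; the model name, prompt wording, and environment-variable name are all illustrative:

```python
# rag.py -- inject retrieved chunks into a prompt template, then query the LLM
import os

import requests

PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context is insufficient, say so.

Context:
{context}

Question: {question}
Answer:"""

def generate_answer(question: str, chunks: list[str]) -> str:
    prompt = PROMPT_TEMPLATE.format(context="\n---\n".join(chunks), question=question)
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "meta-llama/llama-3.1-8b-instruct:free",  # illustrative free-tier model
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```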

7. Web Application Completion

  • Chat Interface: Implement a simple web chat interface for the / endpoint.
  • API Endpoint: Create the /chat API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets (see the sketch after this list).
  • UI/UX: Ensure the web interface is clean, user-friendly, and handles loading/error states gracefully.
  • Testing: Write end-to-end tests for the chat functionality.
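
A sketch of the /chat handler, building on the app.py, search(), and generate_answer() sketches above; the request and response field names are illustrative:

```python
# Added to app.py -- /chat glues retrieval and generation together.
from flask import request

@app.route("/chat", methods=["POST"])
def chat():
    body = request.get_json(silent=True) or {}
    question = body.get("question", "").strip()
    if not question:
        return jsonify({"error": "question is required"}), 400
    results = search(question, top_k=5)
    chunks = results["documents"][0]
    sources = [m["source"] for m in results["metadatas"][0]]
    answer = generate_answer(question, chunks)
    return jsonify({"answer": answer, "citations": sources, "snippets": chunks})
```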

8. Evaluation

  • Evaluation Set: Create an evaluation set of 15-30 questions and corresponding "gold" answers covering various policy topics.
  • Metric Implementation: Develop scripts (see the sketch after this list) to calculate:
    • Answer Quality: Groundedness and Citation Accuracy.
    • System Metrics: Latency (p50/p95).
  • Execution: Run the evaluation and record the results.
  • Documentation: Summarize the evaluation results in design-and-evaluation.md.
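
A sketch of the latency side of the evaluation; ask is a hypothetical helper that sends one question to the deployed /chat endpoint and returns the answer:

```python
# eval_latency.py -- p50/p95 latency over the evaluation questions
import statistics
import time

def measure_latency(questions: list[str], ask) -> dict:
    latencies = []
    for question in questions:
        start = time.perf_counter()
        ask(question)  # hypothetical helper hitting the /chat endpoint
        latencies.append(time.perf_counter() - start)
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {"p50": statistics.median(latencies), "p95": cuts[94]}
```

Groundedness and citation accuracy would be scored separately by checking each answer and its citations against the retrieved chunks and the gold answers.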

9. Final Documentation and Submission

  • Design Document: Complete design-and-evaluation.md, justifying all major design choices (embedding model, chunking strategy, vector store, LLM, etc.).
  • README: Finalize the README.md with comprehensive setup, run, and testing instructions.
  • Demonstration Video: Record a 5-10 minute screen-share video demonstrating the deployed application, walking through the code architecture, explaining the evaluation results, and showing a successful CI/CD run.
  • Submission: Share the GitHub repository with the grader and submit the repository and video links.