
RAG Application Project Plan

This plan outlines the steps to design, build, and deploy a Retrieval-Augmented Generation (RAG) application as per the project requirements, with a focus on achieving a grade of 5. The approach prioritizes early deployment and continuous integration, following Test-Driven Development (TDD) principles.

1. Foundational Setup

  • Repository: Create a new GitHub repository.
  • Virtual Environment: Set up a local Python virtual environment (venv).
  • Initial Files:
    • Create requirements.txt with initial dependencies (Flask, pytest).
    • Create a .gitignore file for Python.
    • Create a README.md with initial setup instructions.
    • Create placeholder files: deployed.md and design-and-evaluation.md.
  • Testing Framework: Establish a tests/ directory and configure pytest (the resulting layout is sketched below).
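
A possible initial layout, assuming the file names above; app.py and tests/test_health.py are illustrative names for files that arrive in step 2:

```
msse-ai-engineering/
├── .gitignore
├── README.md
├── requirements.txt          # Flask, pytest
├── deployed.md
├── design-and-evaluation.md
├── app.py                    # added in step 2
└── tests/
    └── test_health.py        # added in step 2
```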

2. "Hello World" Deployment

  • Minimal App: Develop a minimal Flask application (app.py) with a /health endpoint that returns a JSON status object (see the sketch after this list).
  • Unit Test: Write a test for the /health endpoint to ensure it returns a 200 OK status and the correct JSON payload.
  • Local Validation: Run the app and tests locally to confirm everything works.
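
A minimal sketch of the endpoint and its test; the /health path and JSON status come from the plan, while the exact payload shape and file names are illustrative:

```python
# app.py -- minimal Flask app with a health-check endpoint
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # The payload shape is illustrative; any JSON status object satisfies the plan.
    return jsonify({"status": "ok"}), 200

if __name__ == "__main__":
    app.run()
```

```python
# tests/test_health.py -- asserts the 200 status and the JSON payload
from app import app

def test_health_returns_200_and_json_payload():
    client = app.test_client()
    response = client.get("/health")
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}
```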

3. CI/CD and Initial Deployment

  • Render Setup: Create a new Web Service on Render and link it to the GitHub repository.
  • Environment Configuration: Configure necessary environment variables on Render (e.g., PYTHON_VERSION).
  • GitHub Actions: Create a CI/CD workflow (.github/workflows/main.yml), sketched after this list, that:
    • Triggers on push/PR to the main branch.
    • Installs dependencies from requirements.txt.
    • Runs the pytest test suite.
    • On success, triggers a deployment to Render.
  • Deployment Validation: Push a change and verify that the workflow runs successfully and the application is deployed.
  • Documentation: Update deployed.md with the live URL of the deployed application.
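
A sketch of such a workflow under two assumptions: deployment is triggered through a Render deploy hook (a plain URL that starts a deploy when requested), and that hook URL is stored as a repository secret (RENDER_DEPLOY_HOOK_URL is a hypothetical secret name):

```yaml
# .github/workflows/main.yml -- test on push/PR to main, deploy on success
name: CI/CD
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip                     # pip cache (see optimizations below)
      - run: pip install -r requirements.txt
      - run: pytest

  deploy:
    needs: test
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # Requesting a Render deploy hook URL triggers a new deploy.
      - run: curl -fsS "${{ secrets.RENDER_DEPLOY_HOOK_URL }}"
```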

CI/CD Optimizations

  • Add pip cache to CI to speed up dependency installation.
  • Run pre-commit in PRs only against changed files (pre-commit run --from-ref ... --to-ref ...); see the snippet below.
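
A sketch of the changed-files step as an addition to the test job above; it assumes the PR's base branch is available locally (e.g. checkout with fetch-depth: 0):

```yaml
# Extra step in the test job: run hooks only against files changed in the PR.
- if: github.event_name == 'pull_request'
  run: |
    pip install pre-commit
    pre-commit run --from-ref "origin/${{ github.base_ref }}" --to-ref HEAD
```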

4. Data Ingestion and Processing

  • Corpus Assembly: Collect or generate 5-20 policy documents (PDF, TXT, MD) and place them in a synthetic_policies/ directory.
  • Parsing Logic: Implement and test functions to parse different document formats.
  • Chunking Strategy: Implement and test a document chunking strategy, e.g., recursive character splitting with overlap (a sketch follows this list).
  • Reproducibility: Set fixed seeds for any processes involving randomness (e.g., chunking, sampling) to ensure deterministic outcomes.
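
A minimal sketch of the overlap idea using fixed-size character windows; a full recursive splitter would additionally prefer paragraph and sentence boundaries, and the default sizes here are illustrative:

```python
# chunking.py -- fixed-size character chunking with overlap.
# Deterministic by construction, so it needs no seed; seed any
# sampling steps elsewhere in the pipeline instead.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```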

5. Embedding and Vector Storage ✅ PHASE 2B COMPLETED

  • Vector DB Setup: Integrate a vector database (ChromaDB) into the project.
  • Embedding Model: Select and integrate a free embedding model (sentence-transformers/all-MiniLM-L6-v2).
  • Ingestion Pipeline: Create an enhanced ingestion pipeline (sketched after this list) that:
    • Loads documents from the corpus.
    • Chunks the documents with metadata.
    • Embeds the chunks using sentence-transformers.
    • Stores the embeddings in ChromaDB vector database.
    • Provides detailed processing statistics.
  • Testing: Write a comprehensive suite of 60+ tests verifying each step of the ingestion pipeline.
  • Search API: Implement a POST /search endpoint for semantic search with:
    • JSON request/response format
    • Configurable parameters (top_k, threshold)
    • Comprehensive input validation
    • Detailed error handling
  • End-to-End Testing: Complete pipeline testing from ingestion through search.
  • Documentation: Full API documentation with examples and performance metrics.
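
A condensed sketch of the ingestion and search path, assuming the model named above; the collection name, ID scheme, and metadata fields are illustrative:

```python
# ingest.py -- embed chunks with sentence-transformers, store/query via ChromaDB
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(name="policies")

def ingest(chunks: list[str], source: str) -> None:
    # Embed every chunk and store it with enough metadata to cite later.
    embeddings = model.encode(chunks).tolist()
    collection.add(
        ids=[f"{source}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
        metadatas=[{"source": source, "chunk": i} for i in range(len(chunks))],
    )

def search(query: str, top_k: int = 5) -> dict:
    # Returns documents, metadatas, and distances for the top_k nearest chunks.
    query_embedding = model.encode([query]).tolist()
    return collection.query(query_embeddings=query_embedding, n_results=top_k)
```

The POST /search handler would wrap search(), validating top_k and threshold from the request JSON before querying.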

6. RAG Core Implementation ✅ PHASE 3 COMPLETED

  • Retrieval Logic: Implement a function to retrieve the top-k relevant document chunks from the vector store based on a user query.
  • Prompt Engineering: Design a prompt template that injects the retrieved context into the query for the LLM.
  • LLM Integration: Connect to a free-tier LLM (e.g., via OpenRouter or Groq) to generate answers; see the sketch after this list.
  • Basic Guardrails: Implement and test basic guardrails for context validation and response length limits.
  • Enhanced Guardrails (Issue #24): ✅ COMPLETED - Comprehensive guardrails and response quality system:
    • Content Safety Filtering: PII detection, bias mitigation, inappropriate content filtering
    • Response Quality Scoring: Multi-dimensional quality assessment (relevance, completeness, coherence, source fidelity)
    • Source Attribution: Automated citation generation and validation
    • Error Handling: Circuit breaker patterns and graceful degradation
    • Configuration System: Flexible thresholds and feature toggles
    • Testing: 13 comprehensive tests with 100% pass rate
    • Integration: Enhanced RAG pipeline with backward compatibility
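
A sketch of the retrieval-augmented prompt and the LLM call, using OpenRouter's OpenAI-compatible chat endpoint; the model name, prompt wording, and environment-variable name are all illustrative:

```python
# rag.py -- inject retrieved chunks into a prompt template, then query the LLM
import os

import requests

PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context is insufficient, say so.

Context:
{context}

Question: {question}
Answer:"""

def generate_answer(question: str, chunks: list[str]) -> str:
    prompt = PROMPT_TEMPLATE.format(context="\n---\n".join(chunks), question=question)
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "meta-llama/llama-3.1-8b-instruct:free",  # illustrative free-tier model
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```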

7. Web Application Completion

  • Chat Interface: Implement a simple web chat interface for the / endpoint.
  • API Endpoint: Create the /chat API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets (see the sketch after this list).
  • UI/UX: Ensure the web interface is clean, user-friendly, and handles loading/error states gracefully.
  • Testing: Write end-to-end tests for the chat functionality.
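
A sketch of the /chat handler, building on the app.py, search(), and generate_answer() sketches above; the request and response field names are illustrative:

```python
# Added to app.py -- /chat glues retrieval and generation together.
from flask import request

@app.route("/chat", methods=["POST"])
def chat():
    body = request.get_json(silent=True) or {}
    question = body.get("question", "").strip()
    if not question:
        return jsonify({"error": "question is required"}), 400
    results = search(question, top_k=5)
    chunks = results["documents"][0]
    sources = [m["source"] for m in results["metadatas"][0]]
    answer = generate_answer(question, chunks)
    return jsonify({"answer": answer, "citations": sources, "snippets": chunks})
```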

8. Evaluation

  • Evaluation Set: Create an evaluation set of 15-30 questions and corresponding "gold" answers covering various policy topics.
  • Metric Implementation: Develop scripts (see the sketch after this list) to calculate:
    • Answer Quality: Groundedness and Citation Accuracy.
    • System Metrics: Latency (p50/p95).
  • Execution: Run the evaluation and record the results.
  • Documentation: Summarize the evaluation results in design-and-evaluation.md.
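
A sketch of the latency side of the evaluation; ask is a hypothetical helper that sends one question to the deployed /chat endpoint and returns the answer:

```python
# eval_latency.py -- p50/p95 latency over the evaluation questions
import statistics
import time

def measure_latency(questions: list[str], ask) -> dict:
    latencies = []
    for question in questions:
        start = time.perf_counter()
        ask(question)  # hypothetical helper hitting the /chat endpoint
        latencies.append(time.perf_counter() - start)
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {"p50": statistics.median(latencies), "p95": cuts[94]}
```

Groundedness and citation accuracy would be scored separately by checking each answer and its citations against the retrieved chunks and the gold answers.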

9. Final Documentation and Submission

  • Design Document: Complete design-and-evaluation.md, justifying all major design choices (embedding model, chunking strategy, vector store, LLM, etc.).
  • README: Finalize the README.md with comprehensive setup, run, and testing instructions.
  • Demonstration Video: Record a 5-10 minute screen-share video demonstrating the deployed application, walking through the code architecture, explaining the evaluation results, and showing a successful CI/CD run.
  • Submission: Share the GitHub repository with the grader and submit the repository and video links.