# RAG Application Project Plan

This plan outlines the steps to design, build, and deploy a Retrieval-Augmented Generation (RAG) application per the project requirements, with a focus on achieving a grade of 5. The approach prioritizes early deployment and continuous integration, following Test-Driven Development (TDD) principles.

## 1. Foundational Setup

- [x] **Repository:** Create a new GitHub repository.
- [x] **Virtual Environment:** Set up a local Python virtual environment (`venv`).
- [x] **Initial Files:**
  - Create `requirements.txt` with initial dependencies (`Flask`, `pytest`).
  - Create a `.gitignore` file for Python.
  - Create a `README.md` with initial setup instructions.
  - Create placeholder files: `deployed.md` and `design-and-evaluation.md`.
- [x] **Testing Framework:** Establish a `tests/` directory and configure `pytest`.

## 2. "Hello World" Deployment

- [x] **Minimal App:** Develop a minimal Flask application (`app.py`) with a `/health` endpoint that returns a JSON status object (a sketch of the endpoint and its test appears just after Section 5).
- [x] **Unit Test:** Write a test for the `/health` endpoint to ensure it returns a `200 OK` status and the correct JSON payload.
- [x] **Local Validation:** Run the app and tests locally to confirm everything works.

## 3. CI/CD and Initial Deployment

- [x] **Render Setup:** Create a new Web Service on Render and link it to the GitHub repository.
- [x] **Environment Configuration:** Configure the necessary environment variables on Render (e.g., `PYTHON_VERSION`).
- [x] **GitHub Actions:** Create a CI/CD workflow (`.github/workflows/main.yml`) that:
  - Triggers on push/PR to the `main` branch.
  - Installs dependencies from `requirements.txt`.
  - Runs the `pytest` test suite.
  - On success, triggers a deployment to Render.
- [x] **Deployment Validation:** Push a change and verify that the workflow runs successfully and the application is deployed.
- [ ] **Documentation:** Update `deployed.md` with the live URL of the deployed application.

### CI/CD optimizations added

- [x] Add a pip cache to CI to speed up dependency installation.
- [x] Optimize pre-commit in PRs to run only changed-file hooks (use `pre-commit run --from-ref ... --to-ref ...`).

## 4. Data Ingestion and Processing

- [x] **Corpus Assembly:** Collect or generate 5-20 policy documents (PDF, TXT, MD) and place them in a `synthetic_policies/` directory.
- [x] **Parsing Logic:** Implement and test functions to parse the different document formats.
- [x] **Chunking Strategy:** Implement and test a document chunking strategy (e.g., recursive character splitting with overlap; see the simplified chunking sketch after Section 5).
- [x] **Reproducibility:** Set fixed seeds for any processes involving randomness (e.g., chunking, sampling) to ensure deterministic outcomes.

## 5. Embedding and Vector Storage ✅ **PHASE 2B COMPLETED**

- [x] **Vector DB Setup:** Integrate a vector database (ChromaDB) into the project.
- [x] **Embedding Model:** Select and integrate a free embedding model (`sentence-transformers/all-MiniLM-L6-v2`).
- [x] **Ingestion Pipeline:** Create an enhanced ingestion pipeline that:
  - Loads documents from the corpus.
  - Chunks the documents with metadata.
  - Embeds the chunks using sentence-transformers.
  - Stores the embeddings in the ChromaDB vector database.
  - Provides detailed processing statistics.
- [x] **Testing:** Write comprehensive tests (60+ tests) verifying each step of the ingestion pipeline.
- [x] **Search API:** Implement a POST `/search` endpoint for semantic search (sketched below) with:
  - JSON request/response format.
  - Configurable parameters (`top_k`, `threshold`).
  - Comprehensive input validation.
  - Detailed error handling.
- [x] **End-to-End Testing:** Complete pipeline testing from ingestion through search.
- [x] **Documentation:** Full API documentation with examples and performance metrics.
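For reference, a minimal sketch of the Section 2 health endpoint and its unit test. It assumes the Flask app object lives in `app.py`; the payload shape (`{"status": "ok"}`) is illustrative, not necessarily the repository's literal code.

```python
# app.py - minimal Flask app with a /health endpoint (illustrative sketch)
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # Return a simple JSON status object with an explicit 200 OK.
    return jsonify({"status": "ok"}), 200
```

```python
# tests/test_health.py - pytest unit test for the /health endpoint
from app import app

def test_health_returns_ok():
    client = app.test_client()  # Flask's built-in test client; no server needed
    response = client.get("/health")
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}
```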
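The Section 4 chunking item can be pictured with a simplified splitter. This sketch uses plain fixed-size character windows with overlap rather than the full recursive strategy, and the default sizes are assumptions rather than the project's tuned values. Note that it is deterministic by construction, which is one way to satisfy the reproducibility requirement.

```python
# chunking.py - simplified character chunking with overlap (sketch; the plan's
# recursive variant would first try paragraph, then sentence, then character
# boundaries before falling back to a hard split like this one)
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters, each
    sharing `overlap` characters with its predecessor."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be greater than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```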
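And a sketch of the Section 5 pipeline: embedding chunks with sentence-transformers, storing them in ChromaDB, and running the semantic search that a POST `/search` handler would wrap with input validation. The module name, collection name, ID scheme, and metadata fields are illustrative assumptions.

```python
# vector_store.py - embed chunks and store/query them in ChromaDB (sketch)
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")  # on-disk store
collection = client.get_or_create_collection(name="policies")

def ingest(chunks: list[str], source: str) -> None:
    # Embed every chunk and store it alongside minimal provenance metadata.
    embeddings = model.encode(chunks).tolist()
    collection.add(
        ids=[f"{source}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
        metadatas=[{"source": source, "chunk": i} for i in range(len(chunks))],
    )

def search(query: str, top_k: int = 5) -> list[dict]:
    # Embed the query and return the top_k nearest chunks with their distances.
    query_embedding = model.encode([query]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=top_k)
    return [
        {"text": doc, "metadata": meta, "distance": dist}
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        )
    ]
```

The `threshold` parameter listed above would simply filter the returned hits by distance before the endpoint responds.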
## 6. RAG Core Implementation ✅ **PHASE 3 COMPLETED**

- [x] **Retrieval Logic:** Implement a function to retrieve the top-k relevant document chunks from the vector store based on a user query.
- [x] **Prompt Engineering:** Design a prompt template that injects the retrieved context into the query for the LLM.
- [x] **LLM Integration:** Connect to a free-tier LLM (e.g., via OpenRouter or Groq) to generate answers (a sketch of the retrieve-prompt-generate flow appears at the end of this plan).
- [x] **Basic Guardrails:** Implement and test basic guardrails for context validation and response length limits.
- [x] **Enhanced Guardrails (Issue #24):** ✅ **COMPLETED** - A comprehensive guardrails and response-quality system:
  - [x] **Content Safety Filtering:** PII detection, bias mitigation, and inappropriate-content filtering.
  - [x] **Response Quality Scoring:** Multi-dimensional quality assessment (relevance, completeness, coherence, source fidelity).
  - [x] **Source Attribution:** Automated citation generation and validation.
  - [x] **Error Handling:** Circuit-breaker patterns and graceful degradation.
  - [x] **Configuration System:** Flexible thresholds and feature toggles.
  - [x] **Testing:** 13 comprehensive tests with a 100% pass rate.
  - [x] **Integration:** Enhanced RAG pipeline with backward compatibility.

## 7. Web Application Completion

- [ ] **Chat Interface:** Implement a simple web chat interface served at the `/` endpoint.
- [x] **API Endpoint:** Create the `/chat` API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets.
- [ ] **UI/UX:** Ensure the web interface is clean, user-friendly, and handles loading/error states gracefully.
- [x] **Testing:** Write end-to-end tests for the chat functionality.

## 8. Evaluation

- [ ] **Evaluation Set:** Create an evaluation set of 15-30 questions and corresponding "gold" answers covering various policy topics.
- [ ] **Metric Implementation:** Develop scripts (see the latency sketch at the end of this plan) to calculate:
  - **Answer Quality:** Groundedness and citation accuracy.
  - **System Metrics:** Latency (p50/p95).
- [ ] **Execution:** Run the evaluation and record the results.
- [ ] **Documentation:** Summarize the evaluation results in `design-and-evaluation.md`.

## 9. Final Documentation and Submission

- [ ] **Design Document:** Complete `design-and-evaluation.md`, justifying all major design choices (embedding model, chunking strategy, vector store, LLM, etc.).
- [ ] **README:** Finalize the `README.md` with comprehensive setup, run, and testing instructions.
- [ ] **Demonstration Video:** Record a 5-10 minute screen-share video demonstrating the deployed application, walking through the code architecture, explaining the evaluation results, and showing a successful CI/CD run.
- [ ] **Submission:** Share the GitHub repository with the grader and submit the repository and video links.
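As referenced in Section 6, a sketch of the retrieve-prompt-generate flow. It builds on the `search()` helper sketched after Section 5 and calls OpenRouter's OpenAI-compatible chat completions endpoint; the module name, prompt wording, and model ID are illustrative assumptions, and the API key is read from an environment variable.

```python
# rag_core.py - retrieve context, build a prompt, call a free-tier LLM (sketch)
import os

import requests

from vector_store import search  # the helper sketched after Section 5

PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
Cite the source tag of each fact you use. If the answer is not in the
context, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def answer(question: str, top_k: int = 5) -> str:
    # Retrieval: fetch the top-k chunks and label each with its source.
    hits = search(question, top_k=top_k)
    context = "\n\n".join(f"[{h['metadata']['source']}] {h['text']}" for h in hits)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    # Generation: OpenRouter exposes an OpenAI-compatible chat completions API.
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "meta-llama/llama-3.1-8b-instruct:free",  # illustrative model ID
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```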
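And, as referenced in Section 8, a sketch of the system-metrics script: it measures end-to-end latency of the deployed `/chat` endpoint and reports p50/p95. The base URL, question-file path, and request payload shape are placeholder assumptions; groundedness and citation accuracy need their own scripts (e.g., matching cited sources against the retrieved chunks), which are not sketched here.

```python
# eval_latency.py - measure /chat latency and report p50/p95 (sketch)
import json
import statistics
import time

import requests

BASE_URL = "https://your-app.onrender.com"  # placeholder; real URL in deployed.md

def measure(questions: list[str]) -> dict[str, float]:
    latencies = []
    for q in questions:
        start = time.perf_counter()
        requests.post(f"{BASE_URL}/chat", json={"question": q}, timeout=120)
        latencies.append(time.perf_counter() - start)
    return {
        "p50_seconds": statistics.median(latencies),
        # quantiles(n=20) returns the 5th..95th percentile cut points;
        # index 18 is the 95th percentile.
        "p95_seconds": statistics.quantiles(latencies, n=20)[18],
    }

if __name__ == "__main__":
    with open("eval/questions.json") as f:  # assumed evaluation-set location
        questions = json.load(f)
    print(json.dumps(measure(questions), indent=2))
```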