| # RAG Application Project Plan | |
| This plan outlines the steps to design, build, and deploy a Retrieval-Augmented Generation (RAG) application as per the project requirements, with a focus on achieving a grade of 5. The approach prioritizes early deployment and continuous integration, following Test-Driven Development (TDD) principles. | |
| ## 1. Foundational Setup | |
| - [x] **Repository:** Create a new GitHub repository. | |
| - [x] **Virtual Environment:** Set up a local Python virtual environment (`venv`). | |
| - [x] **Initial Files:** | |
| - Create `requirements.txt` with initial dependencies (`Flask`, `pytest`). | |
| - Create a `.gitignore` file for Python. | |
| - Create a `README.md` with initial setup instructions. | |
| - Create placeholder files: `deployed.md` and `design-and-evaluation.md`. | |
| - [x] **Testing Framework:** Establish a `tests/` directory and configure `pytest`. | |
| ## 2. "Hello World" Deployment | |
| - [x] **Minimal App:** Develop a minimal Flask application (`app.py`) with a `/health` endpoint that returns a JSON status object. | |
| - [x] **Unit Test:** Write a test for the `/health` endpoint to ensure it returns a `200 OK` status and the correct JSON payload. | |
| - [x] **Local Validation:** Run the app and tests locally to confirm everything works. | |
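The minimal app and its unit test could look roughly like the sketch below. The payload `{"status": "ok"}` is an assumption, since the plan only asks for a JSON status object, and both pieces are shown in one file for brevity even though the test would normally live under `tests/`.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/health")
def health():
    # The exact payload is an assumption; the plan only requires a JSON status object.
    return jsonify({"status": "ok"}), 200


def test_health_returns_ok():
    """pytest-style check: 200 status code and the expected JSON body."""
    client = app.test_client()
    response = client.get("/health")
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}
```

Running `pytest` against a file containing this test exercises the endpoint through Flask's built-in test client, so no server process is needed.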
| ## 3. CI/CD and Initial Deployment | |
| - [x] **Render Setup:** Create a new Web Service on Render and link it to the GitHub repository. | |
| - [x] **Environment Configuration:** Configure necessary environment variables on Render (e.g., `PYTHON_VERSION`). | |
| - [x] **GitHub Actions:** Create a CI/CD workflow (`.github/workflows/main.yml`) that: | |
| - Triggers on push/PR to the `main` branch. | |
| - Installs dependencies from `requirements.txt`. | |
| - Runs the `pytest` test suite. | |
| - On success, triggers a deployment to Render. | |
| - [x] **Deployment Validation:** Push a change and verify that the workflow runs successfully and the application is deployed. | |
| - [ ] **Documentation:** Update `deployed.md` with the live URL of the deployed application. | |
| ### CI/CD optimizations added | |
| - [x] Add pip cache to CI to speed up dependency installation. | |
| - [x] Optimize pre-commit in PRs so hooks run only on changed files (use `pre-commit run --from-ref ... --to-ref ...`). | |
| ## 4. Data Ingestion and Processing | |
| - [x] **Corpus Assembly:** Collect or generate 5-20 policy documents (PDF, TXT, MD) and place them in a `synthetic_policies/` directory. | |
| - [x] **Parsing Logic:** Implement and test functions to parse different document formats. | |
| - [x] **Chunking Strategy:** Implement and test a document chunking strategy (e.g., recursive character splitting with overlap). | |
| - [x] **Reproducibility:** Set fixed seeds for any processes involving randomness (e.g., chunking, sampling) to ensure deterministic outcomes. | |
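As an illustration of chunking with overlap, here is a minimal sketch. It uses a plain fixed-size character window rather than the recursive splitter named above, and the function name `chunk_text` and its default sizes are assumptions, not the project's actual values. The function involves no randomness, so the same input always yields the same chunks.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last `overlap` characters of the previous one, which helps keep sentences that straddle a boundary retrievable from at least one chunk.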
| ## 5. Embedding and Vector Storage ✅ **PHASE 2B COMPLETED** | |
| - [x] **Vector DB Setup:** Integrate a vector database (ChromaDB) into the project. | |
| - [x] **Embedding Model:** Select and integrate a free embedding model (`paraphrase-MiniLM-L3-v2` chosen for memory efficiency). | |
| - [x] **Ingestion Pipeline:** Create an enhanced ingestion pipeline that: | |
| - Loads documents from the corpus. | |
| - Chunks the documents with metadata. | |
| - Embeds the chunks using sentence-transformers. | |
| - Stores the embeddings in the ChromaDB vector database. | |
| - Provides detailed processing statistics. | |
| - [x] **Testing:** Write comprehensive tests (60+ tests) verifying each step of the ingestion pipeline. | |
| - [x] **Search API:** Implement a POST `/search` endpoint for semantic search with: | |
| - JSON request/response format | |
| - Configurable parameters (`top_k`, `threshold`) | |
| - Comprehensive input validation | |
| - Detailed error handling | |
| - [x] **End-to-End Testing:** Complete pipeline testing from ingestion through search. | |
| - [x] **Documentation:** Full API documentation with examples and performance metrics. | |
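The input validation for `/search` might be structured like the sketch below. The field name `query`, the defaults, and the allowed ranges are all assumptions for illustration; only `top_k` and `threshold` are named in the plan.

```python
def validate_search_request(payload: dict) -> tuple[dict, list[str]]:
    """Return (clean_params, errors); clean_params is empty when errors exist."""
    errors = []

    query = payload.get("query")  # field name is an assumption
    if not isinstance(query, str) or not query.strip():
        errors.append("'query' must be a non-empty string")

    top_k = payload.get("top_k", 5)  # default and range are assumptions
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(top_k, bool) or not isinstance(top_k, int) or not 1 <= top_k <= 20:
        errors.append("'top_k' must be an integer between 1 and 20")

    threshold = payload.get("threshold", 0.3)  # default is an assumption
    if (
        isinstance(threshold, bool)
        or not isinstance(threshold, (int, float))
        or not 0.0 <= threshold <= 1.0
    ):
        errors.append("'threshold' must be a number between 0 and 1")

    if errors:
        return {}, errors
    return {"query": query.strip(), "top_k": top_k, "threshold": float(threshold)}, []
```

An endpoint handler could call this on the parsed JSON body and respond with a 400 status and the error list whenever the second element is non-empty.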
| ## 6. RAG Core Implementation ✅ **PHASE 3 COMPLETED** | |
| - [x] **Retrieval Logic:** Implement a function to retrieve the top-k relevant document chunks from the vector store based on a user query. | |
| - [x] **Prompt Engineering:** Design a prompt template that injects the retrieved context into the query for the LLM. | |
| - [x] **LLM Integration:** Connect to a free-tier LLM (e.g., via OpenRouter or Groq) to generate answers. | |
| - [x] **Basic Guardrails:** Implement and test basic guardrails for context validation and response length limits. | |
| - [x] **Enhanced Guardrails (Issue #24):** ✅ **COMPLETED** - Comprehensive guardrails and response quality system: | |
| - [x] **Content Safety Filtering:** PII detection, bias mitigation, inappropriate content filtering | |
| - [x] **Response Quality Scoring:** Multi-dimensional quality assessment (relevance, completeness, coherence, source fidelity) | |
| - [x] **Source Attribution:** Automated citation generation and validation | |
| - [x] **Error Handling:** Circuit breaker patterns and graceful degradation | |
| - [x] **Configuration System:** Flexible thresholds and feature toggles | |
| - [x] **Testing:** 13 comprehensive tests with 100% pass rate | |
| - [x] **Integration:** Enhanced RAG pipeline with backward compatibility | |
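To make the retrieval and prompt-engineering steps concrete, here is a small self-contained sketch. It ranks in-memory chunks by cosine similarity as a stand-in for the ChromaDB query, and the template wording, the chunk fields (`text`, `source`, `embedding`), and the function names are assumptions rather than the project's actual code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec: list[float], chunks: list[dict], k: int = 3) -> list[dict]:
    """Return the k chunks whose embeddings are most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical template: instructs the model to stay within the retrieved context.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "Cite sources by their bracketed number. "
    "If the context does not contain the answer, say so.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\nAnswer:"
)


def build_prompt(question: str, retrieved: list[dict]) -> str:
    """Inject the retrieved chunks, numbered for citation, into the template."""
    context = "\n\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(retrieved, start=1)
    )
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Numbering the chunks in the prompt gives the model stable labels to cite, which a later source-attribution step can map back to documents.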
| ## 7. Web Application Completion | |
| - [x] **Chat Interface:** ✅ **COMPLETED** - Implement a simple web chat interface for the `/` endpoint. | |
| - [x] **Modern Chat UI:** Interactive chat interface with real-time messaging | |
| - [x] **Message History:** Conversation display with user and assistant messages | |
| - [x] **Source Citations:** Visual display of source documents and confidence scores | |
| - [x] **Responsive Design:** Mobile-friendly interface with modern styling | |
| - [x] **Error Handling:** Graceful error display and loading states | |
| - [x] **System Health:** Status indicators and health monitoring | |
| - [x] **API Endpoint:** Create the `/chat` API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets. | |
| - [x] **UI/UX:** ✅ **COMPLETED** - Ensure the web interface is clean, user-friendly, and handles loading/error states gracefully. | |
| - [x] **Testing:** Write end-to-end tests for the chat functionality. | |
| ## 7.5. Memory Management & Production Optimization ✅ **COMPLETED** | |
| - [x] **Memory Architecture Redesign:** ✅ **COMPLETED** - Comprehensive memory optimization for cloud deployment: | |
| - [x] **App Factory Pattern:** Migrated from monolithic to factory pattern with lazy loading | |
| - **Impact:** 87% reduction in startup memory (400MB → 50MB) | |
| - **Benefit:** Services initialize only when needed, improving resource efficiency | |
| - [x] **Embedding Model Optimization:** Changed from `all-MiniLM-L6-v2` to `paraphrase-MiniLM-L3-v2` | |
| - **Memory Savings:** 75-85% reduction (550-1000MB → 132MB) | |
| - **Quality Impact:** <5% reduction in similarity scoring (acceptable trade-off) | |
| - **Deployment Viability:** Enables deployment on Render free tier (512MB limit) | |
| - [x] **Gunicorn Production Configuration:** Optimized for memory-constrained environments | |
| - **Configuration:** Single worker, 2 threads, max_requests=50 | |
| - **Memory Control:** Automatic worker restarts bound the growth of any memory leak | |
| - **Performance:** Balanced for I/O-bound LLM operations | |
| - [x] **Memory Management Utilities:** ✅ **COMPLETED** - Comprehensive memory monitoring and optimization: | |
| - [x] **MemoryManager Class:** Context manager for memory tracking and cleanup | |
| - [x] **Real-time Monitoring:** Memory usage tracking with automatic garbage collection | |
| - [x] **Memory Statistics:** Detailed memory reporting for production monitoring | |
| - [x] **Error Recovery:** Memory-aware error handling with graceful degradation | |
| - [x] **Health Integration:** Memory metrics exposed via `/health` endpoint | |
| - [x] **Database Pre-building Strategy:** ✅ **COMPLETED** - Eliminate deployment memory spikes: | |
| - [x] **Local Database Building:** `build_embeddings.py` script for development | |
| - [x] **Repository Commitment:** Pre-built vector database (25MB) committed to git | |
| - [x] **Deployment Optimization:** Zero embedding generation on production startup | |
| - [x] **Memory Impact:** Avoid 150MB+ memory spikes during embedding generation | |
| - [x] **Production Deployment Optimization:** ✅ **COMPLETED** - Full production readiness: | |
| - [x] **Memory Profiling:** Comprehensive memory usage analysis and optimization | |
| - [x] **Performance Testing:** Load testing with memory constraints validation | |
| - [x] **Error Handling:** Production-grade error recovery for memory pressure | |
| - [x] **Monitoring Integration:** Real-time memory tracking and alerting | |
| - [x] **Documentation:** Complete memory management documentation across all files | |
| - [x] **Testing & Validation:** ✅ **COMPLETED** - Memory-aware testing infrastructure: | |
| - [x] **Memory Constraint Testing:** All 138 tests pass with memory optimizations | |
| - [x] **Performance Regression Testing:** Response time validation maintained | |
| - [x] **Memory Leak Detection:** Long-running tests validate memory stability | |
| - [x] **Production Simulation:** Testing in memory-constrained environments | |
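A context manager for memory tracking and cleanup, as described for the `MemoryManager` class above, might be shaped like the following sketch. Only the class name comes from the plan; the attributes and the use of the standard-library `tracemalloc` module are assumptions. Note that `tracemalloc` sees only allocations made through Python's allocator, so memory held by native libraries such as model weights would need a process-level measurement instead.

```python
import gc
import tracemalloc


class MemoryManager:
    """Track Python heap growth across a block and run garbage collection on exit."""

    def __init__(self, label: str = "operation"):
        self.label = label
        self.start_bytes = 0
        self.end_bytes = 0
        self._started_tracing = False

    def __enter__(self):
        # Start tracing only if nobody else has, so nested use stays safe.
        if not tracemalloc.is_tracing():
            tracemalloc.start()
            self._started_tracing = True
        self.start_bytes, _peak = tracemalloc.get_traced_memory()
        return self

    def __exit__(self, exc_type, exc, tb):
        gc.collect()  # release unreachable objects before taking the final reading
        self.end_bytes, _peak = tracemalloc.get_traced_memory()
        if self._started_tracing:
            tracemalloc.stop()
        return False  # never swallow exceptions raised inside the block

    @property
    def delta_mb(self) -> float:
        """Net change in traced memory over the block, in megabytes."""
        return (self.end_bytes - self.start_bytes) / (1024 * 1024)
```

Usage would be along the lines of `with MemoryManager("ingest") as mm: ...`, after which `mm.delta_mb` can be logged or reported through the `/health` endpoint.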
| ## 8. Evaluation | |
| - [ ] **Evaluation Set:** Create an evaluation set of 15-30 questions and corresponding "gold" answers covering various policy topics. | |
| - [ ] **Metric Implementation:** Develop scripts to calculate: | |
| - **Answer Quality:** Groundedness and Citation Accuracy. | |
| - **System Metrics:** Latency (p50/p95). | |
| - [ ] **Execution:** Run the evaluation and record the results. | |
| - [ ] **Documentation:** Summarize the evaluation results in `design-and-evaluation.md`. | |
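For the latency metric, a minimal sketch of how p50/p95 could be computed is shown below. It uses the nearest-rank percentile definition, which is one of several common conventions, and the helper names and the `answer_fn` callable are hypothetical.

```python
import math
import time


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies(answer_fn, questions: list[str]) -> list[float]:
    """Time answer_fn on each question and return the latencies in milliseconds."""
    latencies_ms = []
    for question in questions:
        start = time.perf_counter()
        answer_fn(question)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms


samples = [float(ms) for ms in range(1, 101)]  # synthetic latencies: 1..100 ms
print(percentile(samples, 50), percentile(samples, 95))
# 50.0 95.0
```

In an evaluation run, `answer_fn` would wrap a call to the deployed `/chat` endpoint, and the resulting p50 and p95 values would be recorded alongside the answer-quality metrics.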
| ## 9. Final Documentation and Submission | |
| - [x] **Design Document:** ✅ **COMPLETED** - Complete `design-and-evaluation.md` with comprehensive technical analysis: | |
| - [x] **Memory Architecture Design:** Detailed analysis of memory-constrained architecture decisions | |
| - [x] **Performance Evaluation:** Comprehensive memory usage, response time, and quality metrics | |
| - [x] **Model Selection Analysis:** Embedding model comparison with memory vs quality trade-offs | |
| - [x] **Production Deployment Evaluation:** Platform compatibility and scalability analysis | |
| - [x] **Design Trade-offs Documentation:** Lessons learned and future considerations | |
| - [x] **README:** ✅ **COMPLETED** - Comprehensive documentation with memory management focus: | |
| - [x] **Memory Management Section:** Detailed memory optimization architecture and utilities | |
| - [x] **Production Configuration:** Gunicorn, database pre-building, and deployment strategies | |
| - [x] **Performance Metrics:** Memory usage breakdown and production performance data | |
| - [x] **Setup Instructions:** Memory-aware development and deployment guidelines | |
| - [x] **Deployment Documentation:** ✅ **COMPLETED** - Updated `deployed.md` with production details: | |
| - [x] **Memory-Optimized Configuration:** Production memory profile and optimization results | |
| - [x] **Performance Metrics:** Real-time memory monitoring and capacity analysis | |
| - [x] **Production Features:** Memory management system and error handling documentation | |
| - [x] **Deployment Pipeline:** CI/CD integration with memory validation | |
| - [x] **Contributing Guidelines:** ✅ **COMPLETED** - Updated `CONTRIBUTING.md` with memory-conscious development: | |
| - [x] **Memory Development Principles:** Guidelines for memory-efficient code patterns | |
| - [x] **Memory Testing Procedures:** Development workflow for memory constraint validation | |
| - [x] **Code Review Guidelines:** Memory-focused review checklist and best practices | |
| - [x] **Production Testing:** Memory leak detection and performance validation procedures | |
| - [ ] **Demonstration Video:** Record a 5-10 minute screen-share video demonstrating the deployed application, walking through the code architecture, explaining the evaluation results, and showing a successful CI/CD run. | |
| - [ ] **Submission:** Share the GitHub repository with the grader and submit the repository and video links. | |