Spaces:
Sleeping
Sleeping
File size: 11,802 Bytes
2d9ce15 e3e3a84 2d9ce15 c4b28eb 2d9ce15 df316c5 2d9ce15 c4b28eb 2d9ce15 2d593b8 5665dd3 2d9ce15 5665dd3 2d9ce15 5665dd3 0a7f9b4 5665dd3 2d9ce15 5665dd3 2d9ce15 623bc2c 2d9ce15 74e758d 9452a54 74e758d 9452a54 2d9ce15 32e4125 0a7f9b4 32e4125 2d9ce15 32e4125 2d9ce15 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 |
# RAG Application Project Plan
This plan outlines the steps to design, build, and deploy a Retrieval-Augmented Generation (RAG) application as per the project requirements, with a focus on achieving a grade of 5. The approach prioritizes early deployment and continuous integration, following Test-Driven Development (TDD) principles.
## 1. Foundational Setup
- [x] **Repository:** Create a new GitHub repository.
- [x] **Virtual Environment:** Set up a local Python virtual environment (`venv`).
- [x] **Initial Files:**
- Create `requirements.txt` with initial dependencies (`Flask`, `pytest`).
- Create a `.gitignore` file for Python.
- Create a `README.md` with initial setup instructions.
- Create placeholder files: `deployed.md` and `design-and-evaluation.md`.
- [x] **Testing Framework:** Establish a `tests/` directory and configure `pytest`.
## 2. "Hello World" Deployment
- [x] **Minimal App:** Develop a minimal Flask application (`app.py`) with a `/health` endpoint that returns a JSON status object.
- [x] **Unit Test:** Write a test for the `/health` endpoint to ensure it returns a `200 OK` status and the correct JSON payload.
- [x] **Local Validation:** Run the app and tests locally to confirm everything works.
## 3. CI/CD and Initial Deployment
- [x] **Render Setup:** Create a new Web Service on Render and link it to the GitHub repository.
- [x] **Environment Configuration:** Configure necessary environment variables on Render (e.g., `PYTHON_VERSION`).
- [x] **GitHub Actions:** Create a CI/CD workflow (`.github/workflows/main.yml`) that:
- Triggers on push/PR to the `main` branch.
- Installs dependencies from `requirements.txt`.
- Runs the `pytest` test suite.
- On success, triggers a deployment to Render.
- [x] **Deployment Validation:** Push a change and verify that the workflow runs successfully and the application is deployed.
- [ ] **Documentation:** Update `deployed.md` with the live URL of the deployed application.
### CI/CD optimizations added
- [x] Add pip cache to CI to speed up dependency installation.
- [x] Optimize pre-commit in PRs to run only changed-file hooks (use `pre-commit run --from-ref ... --to-ref ...`).
## 4. Data Ingestion and Processing
- [x] **Corpus Assembly:** Collect or generate 5-20 policy documents (PDF, TXT, MD) and place them in a `synthetic_policies/` directory.
- [x] **Parsing Logic:** Implement and test functions to parse different document formats.
- [x] **Chunking Strategy:** Implement and test a document chunking strategy (e.g., recursive character splitting with overlap).
- [x] **Reproducibility:** Set fixed seeds for any processes involving randomness (e.g., chunking, sampling) to ensure deterministic outcomes.
## 5. Embedding and Vector Storage β
**PHASE 2B COMPLETED**
- [x] **Vector DB Setup:** Integrate a vector database (ChromaDB) into the project.
- [x] **Embedding Model:** Select and integrate a free embedding model (`paraphrase-MiniLM-L3-v2` chosen for memory efficiency).
- [x] **Ingestion Pipeline:** Create enhanced ingestion pipeline that:
- Loads documents from the corpus.
- Chunks the documents with metadata.
- Embeds the chunks using sentence-transformers.
- Stores the embeddings in ChromaDB vector database.
- Provides detailed processing statistics.
- [x] **Testing:** Write comprehensive tests (60+ tests) verifying each step of the ingestion pipeline.
- [x] **Search API:** Implement POST `/search` endpoint for semantic search with:
- JSON request/response format
- Configurable parameters (top_k, threshold)
- Comprehensive input validation
- Detailed error handling
- [x] **End-to-End Testing:** Complete pipeline testing from ingestion through search.
- [x] **Documentation:** Full API documentation with examples and performance metrics.
## 6. RAG Core Implementation β
**PHASE 3 COMPLETED**
- [x] **Retrieval Logic:** Implement a function to retrieve the top-k relevant document chunks from the vector store based on a user query.
- [x] **Prompt Engineering:** Design a prompt template that injects the retrieved context into the query for the LLM.
- [x] **LLM Integration:** Connect to a free-tier LLM (e.g., via OpenRouter or Groq) to generate answers.
- [x] **Basic Guardrails:** Implement and test basic guardrails for context validation and response length limits.
- [x] **Enhanced Guardrails (Issue #24):** β
**COMPLETED** - Comprehensive guardrails and response quality system:
- [x] **Content Safety Filtering:** PII detection, bias mitigation, inappropriate content filtering
- [x] **Response Quality Scoring:** Multi-dimensional quality assessment (relevance, completeness, coherence, source fidelity)
- [x] **Source Attribution:** Automated citation generation and validation
- [x] **Error Handling:** Circuit breaker patterns and graceful degradation
- [x] **Configuration System:** Flexible thresholds and feature toggles
- [x] **Testing:** 13 comprehensive tests with 100% pass rate
- [x] **Integration:** Enhanced RAG pipeline with backward compatibility
## 7. Web Application Completion
- [x] **Chat Interface:** β
**COMPLETED** - Implement a simple web chat interface for the `/` endpoint.
- [x] **Modern Chat UI:** Interactive chat interface with real-time messaging
- [x] **Message History:** Conversation display with user and assistant messages
- [x] **Source Citations:** Visual display of source documents and confidence scores
- [x] **Responsive Design:** Mobile-friendly interface with modern styling
- [x] **Error Handling:** Graceful error display and loading states
- [x] **System Health:** Status indicators and health monitoring
- [x] **API Endpoint:** Create the `/chat` API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets.
- [x] **UI/UX:** β
**COMPLETED** - Ensure the web interface is clean, user-friendly, and handles loading/error states gracefully.
- [x] **Testing:** Write end-to-end tests for the chat functionality.
## 7.5. Memory Management & Production Optimization β
**COMPLETED**
- [x] **Memory Architecture Redesign:** β
**COMPLETED** - Comprehensive memory optimization for cloud deployment:
- [x] **App Factory Pattern:** Migrated from monolithic to factory pattern with lazy loading
- **Impact:** 87% reduction in startup memory (400MB β 50MB)
- **Benefit:** Services initialize only when needed, improving resource efficiency
- [x] **Embedding Model Optimization:** Changed from `all-MiniLM-L6-v2` to `paraphrase-MiniLM-L3-v2`
- **Memory Savings:** 75-85% reduction (550-1000MB β 132MB)
- **Quality Impact:** <5% reduction in similarity scoring (acceptable trade-off)
- **Deployment Viability:** Enables deployment on Render free tier (512MB limit)
- [x] **Gunicorn Production Configuration:** Optimized for memory-constrained environments
- **Configuration:** Single worker, 2 threads, max_requests=50
- **Memory Control:** Prevent memory leaks with automatic worker restart
- **Performance:** Balanced for I/O-bound LLM operations
- [x] **Memory Management Utilities:** β
**COMPLETED** - Comprehensive memory monitoring and optimization:
- [x] **MemoryManager Class:** Context manager for memory tracking and cleanup
- [x] **Real-time Monitoring:** Memory usage tracking with automatic garbage collection
- [x] **Memory Statistics:** Detailed memory reporting for production monitoring
- [x] **Error Recovery:** Memory-aware error handling with graceful degradation
- [x] **Health Integration:** Memory metrics exposed via `/health` endpoint
- [x] **Database Pre-building Strategy:** β
**COMPLETED** - Eliminate deployment memory spikes:
- [x] **Local Database Building:** `build_embeddings.py` script for development
- [x] **Repository Commitment:** Pre-built vector database (25MB) committed to git
- [x] **Deployment Optimization:** Zero embedding generation on production startup
- [x] **Memory Impact:** Avoid 150MB+ memory spikes during embedding generation
- [x] **Production Deployment Optimization:** β
**COMPLETED** - Full production readiness:
- [x] **Memory Profiling:** Comprehensive memory usage analysis and optimization
- [x] **Performance Testing:** Load testing with memory constraints validation
- [x] **Error Handling:** Production-grade error recovery for memory pressure
- [x] **Monitoring Integration:** Real-time memory tracking and alerting
- [x] **Documentation:** Complete memory management documentation across all files
- [x] **Testing & Validation:** β
**COMPLETED** - Memory-aware testing infrastructure:
- [x] **Memory Constraint Testing:** All 138 tests pass with memory optimizations
- [x] **Performance Regression Testing:** Response time validation maintained
- [x] **Memory Leak Detection:** Long-running tests validate memory stability
- [x] **Production Simulation:** Testing in memory-constrained environments
## 8. Evaluation
- [ ] **Evaluation Set:** Create an evaluation set of 15-30 questions and corresponding "gold" answers covering various policy topics.
- [ ] **Metric Implementation:** Develop scripts to calculate:
- **Answer Quality:** Groundedness and Citation Accuracy.
- **System Metrics:** Latency (p50/p95).
- [ ] **Execution:** Run the evaluation and record the results.
- [ ] **Documentation:** Summarize the evaluation results in `design-and-evaluation.md`.
## 9. Final Documentation and Submission
- [x] **Design Document:** β
**COMPLETED** - Complete `design-and-evaluation.md` with comprehensive technical analysis:
- [x] **Memory Architecture Design:** Detailed analysis of memory-constrained architecture decisions
- [x] **Performance Evaluation:** Comprehensive memory usage, response time, and quality metrics
- [x] **Model Selection Analysis:** Embedding model comparison with memory vs quality trade-offs
- [x] **Production Deployment Evaluation:** Platform compatibility and scalability analysis
- [x] **Design Trade-offs Documentation:** Lessons learned and future considerations
- [x] **README:** β
**COMPLETED** - Comprehensive documentation with memory management focus:
- [x] **Memory Management Section:** Detailed memory optimization architecture and utilities
- [x] **Production Configuration:** Gunicorn, database pre-building, and deployment strategies
- [x] **Performance Metrics:** Memory usage breakdown and production performance data
- [x] **Setup Instructions:** Memory-aware development and deployment guidelines
- [x] **Deployment Documentation:** β
**COMPLETED** - Updated `deployed.md` with production details:
- [x] **Memory-Optimized Configuration:** Production memory profile and optimization results
- [x] **Performance Metrics:** Real-time memory monitoring and capacity analysis
- [x] **Production Features:** Memory management system and error handling documentation
- [x] **Deployment Pipeline:** CI/CD integration with memory validation
- [x] **Contributing Guidelines:** β
**COMPLETED** - Updated `CONTRIBUTING.md` with memory-conscious development:
- [x] **Memory Development Principles:** Guidelines for memory-efficient code patterns
- [x] **Memory Testing Procedures:** Development workflow for memory constraint validation
- [x] **Code Review Guidelines:** Memory-focused review checklist and best practices
- [x] **Production Testing:** Memory leak detection and performance validation procedures
- [ ] **Demonstration Video:** Record a 5-10 minute screen-share video demonstrating the deployed application, walking through the code architecture, explaining the evaluation results, and showing a successful CI/CD run.
- [ ] **Submission:** Share the GitHub repository with the grader and submit the repository and video links.
|