# RAG Application Project Plan
This plan outlines the steps to design, build, and deploy a Retrieval-Augmented Generation (RAG) application per the project requirements, with a focus on achieving a grade of 5. The approach prioritizes early deployment and continuous integration and follows Test-Driven Development (TDD) principles.
## 1. Foundational Setup
- [x] **Repository:** Create a new GitHub repository.
- [x] **Virtual Environment:** Set up a local Python virtual environment (`venv`).
- [x] **Initial Files:**
- Create `requirements.txt` with initial dependencies (`Flask`, `pytest`).
- Create a `.gitignore` file for Python.
- Create a `README.md` with initial setup instructions.
- Create placeholder files: `deployed.md` and `design-and-evaluation.md`.
- [x] **Testing Framework:** Establish a `tests/` directory and configure `pytest`.
## 2. "Hello World" Deployment
- [ ] **Minimal App:** Develop a minimal Flask application (`app.py`) with a `/health` endpoint that returns a JSON status object (a sketch follows this list).
- [ ] **Unit Test:** Write a test for the `/health` endpoint to ensure it returns a `200 OK` status and the correct JSON payload.
- [ ] **Local Validation:** Run the app and tests locally to confirm everything works.
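A minimal sketch of the app, assuming the file names above; the `{"status": "ok"}` payload shape is an illustrative choice:

```python
# app.py -- minimal Flask app with a health-check endpoint.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # Return a JSON status object with a 200 OK.
    return jsonify({"status": "ok"}), 200

if __name__ == "__main__":
    app.run(debug=True)
```

And a matching unit test using Flask's built-in test client:

```python
# tests/test_health.py -- unit test for the /health endpoint.
from app import app

def test_health_returns_ok():
    client = app.test_client()
    response = client.get("/health")
    assert response.status_code == 200
    assert response.get_json() == {"status": "ok"}
```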
## 3. CI/CD and Initial Deployment
- [ ] **Render Setup:** Create a new Web Service on Render and link it to the GitHub repository.
- [ ] **Environment Configuration:** Configure necessary environment variables on Render (e.g., `PYTHON_VERSION`).
- [ ] **GitHub Actions:** Create a CI/CD workflow (`.github/workflows/main.yml`), sketched after this list, that:
- Triggers on push/PR to the `main` branch.
- Installs dependencies from `requirements.txt`.
- Runs the `pytest` test suite.
- On success, triggers a deployment to Render.
- [ ] **Deployment Validation:** Push a change and verify that the workflow runs successfully and the application is deployed.
- [ ] **Documentation:** Update `deployed.md` with the live URL of the deployed application.
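A minimal workflow sketch under two assumptions not fixed by this plan: the Python version, and that Render's deploy hook URL is stored as a repository secret (the secret name `RENDER_DEPLOY_HOOK_URL` is illustrative):

```yaml
# .github/workflows/main.yml -- minimal CI/CD sketch.
name: CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest

  deploy:
    needs: test
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # POSTing to a Render deploy hook triggers a new deployment.
      - run: curl -fsSL -X POST "${{ secrets.RENDER_DEPLOY_HOOK_URL }}"
```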
## 4. Data Ingestion and Processing
- [ ] **Corpus Assembly:** Collect or generate 5-20 policy documents (PDF, TXT, MD) and place them in a `corpus/` directory.
- [ ] **Parsing Logic:** Implement and test functions to parse different document formats.
- [ ] **Chunking Strategy:** Implement and test a document chunking strategy (e.g., recursive character splitting with overlap; a simple sketch follows this list).
- [ ] **Reproducibility:** Set fixed seeds for any processes involving randomness (e.g., chunking, sampling) to ensure deterministic outcomes.
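A sketch of a fixed-size sliding-window chunker with overlap, a simpler stand-in for full recursive character splitting; the size defaults are illustrative, and the function is deterministic, so this step needs no seed:

```python
# chunking.py -- fixed-size chunking with overlap (illustrative defaults).
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```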
## 5. Embedding and Vector Storage
- [ ] **Vector DB Setup:** Integrate a vector database (e.g., ChromaDB) into the project.
- [ ] **Embedding Model:** Select and integrate a free embedding model (e.g., from HuggingFace).
- [ ] **Ingestion Pipeline:** Create a script (`ingest.py`), sketched after this list, that:
- Loads documents from the corpus.
- Chunks the documents.
- Embeds the chunks.
- Stores the embeddings in the vector database.
- [ ] **Testing:** Write tests to verify each step of the ingestion pipeline.
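A sketch of `ingest.py` using ChromaDB and a HuggingFace sentence-transformers model; the model name, collection name, paths, and the restriction to `.txt` files are illustrative choices, and `chunk_text` is the helper from the chunking sketch above:

```python
# ingest.py -- load, chunk, embed, and store the corpus (illustrative names).
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

from chunking import chunk_text  # helper from the chunking sketch

def ingest(corpus_dir: str = "corpus", db_path: str = "chroma_db") -> None:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection(name="policies")
    for doc_path in sorted(Path(corpus_dir).glob("*.txt")):
        text = doc_path.read_text(encoding="utf-8")
        chunks = chunk_text(text)
        # Stable, deterministic IDs keep re-ingestion reproducible.
        ids = [f"{doc_path.stem}-{i}" for i in range(len(chunks))]
        embeddings = model.encode(chunks).tolist()
        collection.add(
            ids=ids,
            documents=chunks,
            embeddings=embeddings,
            metadatas=[{"source": doc_path.name}] * len(chunks),
        )

if __name__ == "__main__":
    ingest()
```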
## 6. RAG Core Implementation
- [ ] **Retrieval Logic:** Implement a function to retrieve the top-k relevant document chunks from the vector store based on a user query (see the sketch after this list).
- [ ] **Prompt Engineering:** Design a prompt template that injects the retrieved context into the query for the LLM.
- [ ] **LLM Integration:** Connect to a free-tier LLM (e.g., via OpenRouter or Groq) to generate answers.
- [ ] **Guardrails:** Implement and test guardrails:
- Refuse to answer questions outside the corpus.
- Limit the length of the generated output.
- Ensure all answers cite the source document IDs/titles.
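A sketch of retrieval and prompt assembly against the collection from the ingestion sketch; the prompt wording (which also encodes the refusal and citation guardrails) and the `top_k` default are illustrative:

```python
# rag.py -- top-k retrieval and context-injecting prompt (illustrative names).
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(name="policies")

PROMPT_TEMPLATE = """Answer the question using ONLY the context below.
If the answer is not in the context, say you cannot answer.
Cite the source document of every claim in [brackets].

Context:
{context}

Question: {question}
Answer:"""

def retrieve(question: str, top_k: int = 4) -> list[dict]:
    # Embed the query with the same model used at ingestion time.
    embedding = model.encode([question]).tolist()
    results = collection.query(query_embeddings=embedding, n_results=top_k)
    return [
        {"text": doc, "source": meta["source"]}
        for doc, meta in zip(results["documents"][0], results["metadatas"][0])
    ]

def build_prompt(question: str) -> str:
    chunks = retrieve(question)
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```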
## 7. Web Application Completion
- [ ] **Chat Interface:** Implement a simple web chat interface for the `/` endpoint.
- [ ] **API Endpoint:** Create the `/chat` API endpoint that receives user questions (POST) and returns model-generated answers with citations and snippets (a sketch follows this list).
- [ ] **UI/UX:** Ensure the web interface is clean, user-friendly, and handles loading/error states gracefully.
- [ ] **Testing:** Write end-to-end tests for the chat functionality.
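A sketch of the `/chat` endpoint; `generate_answer` is a hypothetical helper wrapping the retrieval and LLM calls from the RAG core step, and the request/response field names are illustrative:

```python
# app.py -- /chat endpoint sketch; generate_answer is a hypothetical helper
# that wraps the retrieval and LLM calls from the RAG core step.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/chat")
def chat():
    payload = request.get_json(silent=True) or {}
    question = payload.get("question", "").strip()
    if not question:
        return jsonify({"error": "question is required"}), 400
    answer, citations = generate_answer(question)  # hypothetical RAG helper
    return jsonify({"answer": answer, "citations": citations})
```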
## 8. Evaluation
- [ ] **Evaluation Set:** Create an evaluation set of 15-30 questions and corresponding "gold" answers covering various policy topics.
- [ ] **Metric Implementation:** Develop scripts (see the latency sketch after this list) to calculate:
- **Answer Quality:** Groundedness and Citation Accuracy.
- **System Metrics:** Latency (p50/p95).
- [ ] **Execution:** Run the evaluation and record the results.
- [ ] **Documentation:** Summarize the evaluation results in `design-and-evaluation.md`.
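A sketch of the latency metric; `ask` is a caller-supplied function that sends one question through the deployed `/chat` endpoint and returns when the answer arrives:

```python
# evaluate.py -- p50/p95 latency over the evaluation set.
import statistics
import time
from typing import Callable

def measure_latency(ask: Callable[[str], str],
                    questions: list[str]) -> dict[str, float]:
    """Time each question end to end; report p50/p95 latency in seconds."""
    latencies = []
    for question in questions:
        start = time.perf_counter()
        ask(question)
        latencies.append(time.perf_counter() - start)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return {
        "p50": statistics.median(latencies),
        "p95": statistics.quantiles(latencies, n=100)[94],
    }
```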
## 9. Final Documentation and Submission
- [ ] **Design Document:** Complete `design-and-evaluation.md`, justifying all major design choices (embedding model, chunking strategy, vector store, LLM, etc.).
- [ ] **README:** Finalize the `README.md` with comprehensive setup, run, and testing instructions.
- [ ] **Demonstration Video:** Record a 5-10 minute screen-share video demonstrating the deployed application, walking through the code architecture, explaining the evaluation results, and showing a successful CI/CD run.
- [ ] **Submission:** Share the GitHub repository with the grader and submit the repository and video links.