AI Engineering Project
Project Overview
For this project, you will design, build, and evaluate a Retrieval-Augmented
Generation (RAG) LLM-based application that answers user questions about a corpus of
company policies & procedures. You will then deploy the application to a free-tier host
(e.g., Render, Railway) with a basic CI/CD pipeline (e.g., GitHub Actions) that triggers
deployment on push/PR when the app builds successfully. Finally, you will demonstrate
the system via a screen-share video showing key features of your deployed application
and a quick walkthrough of your design, evaluation, and CI/CD run. You can complete this
project either individually or in a group of no more than three people.
While you can fully hand-code this project if you wish, you are highly encouraged to
use leading AI code-generation models, AI IDEs, or async agents to help you rapidly
produce your solution, being sure to describe in broad terms how you made use of
them. Here are some examples of very useful AI tools you may wish to consider. You will
be graded on the quality and functionality of the application and how well it meets the
project requirements; no given proportion of the code is required to be hand-coded.
Learning Outcomes
When completed successfully, this project will enable you to:
● Demonstrate excellent AI engineering skills
● Demonstrate the ability to select appropriate AI application design and
architecture
● Implement a working LLM-based application including RAG
● Evaluate the performance of an LLM-based application
● Utilize AI tooling as appropriate
Project Description
First, assemble a small but coherent corpus of documents outlining company policies &
procedures: about 5–20 short markdown/HTML/PDF/TXT files totaling 30–120 pages.
You may author them yourself (with AI assistance) or use policies from your own
organization that may be used for this assignment. Students must use a
corpus they can legally include in the repo or load at runtime (e.g., your own synthetic
policies, your organization’s employee policy documents, etc.); no private or paid data is
required. Additionally, you should define success metrics for your application (see the
“Evaluation” step below), including at least one information-quality metric (e.g.,
groundedness or citation accuracy) and one system metric (e.g., latency).
Use free or zero-cost options when possible, e.g., OpenRouter’s free tier
(https://openrouter.ai/docs/api-reference/limits), Groq
(https://console.groq.com/docs/rate-limits), or your own paid API keys if you have them.
For embedding models, free-tier options are available from Cohere, Voyage,
HuggingFace, and others.
Complete the following steps to fully develop, deploy, and evaluate your application:
Environment and Reproducibility
○ Create a virtual environment (e.g., venv, conda).
○ List dependencies in requirements.txt (or environment.yml).
○ Provide a README.md with setup and run instructions.
○ Set fixed seeds where applicable (for deterministic chunking or
evaluation sampling); see the sketch below.
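For the seeding item above, here is a minimal sketch, assuming the evaluation questions live in a JSON file (the path and sample size are illustrative):

```python
# Deterministic evaluation sampling: fixing the seed makes every rerun
# draw the same subset of questions.
import json
import random

random.seed(42)  # fixed seed for reproducibility

with open("eval/questions.json", encoding="utf-8") as f:
    questions = json.load(f)  # assumed: a JSON list of question strings

sample = random.sample(questions, k=10)  # same 10 questions every run
```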
Ingestion and Indexing
○ Parse & clean documents (handle PDFs/HTML/md/txt).
○ Chunk documents (e.g., by headings or token windows with overlap).
○ Embed chunks with a free embedding model or a free-tier API.
○ Store the embedded chunks in a local or lightweight vector database
(e.g., Chroma) or a cloud-hosted vector store (e.g., Pinecone). A minimal
ingestion sketch follows this list.
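One way this could look end to end, assuming a local Chroma index and a free sentence-transformers embedding model; the corpus path, chunk sizes, and collection name are illustrative:

```python
# Minimal ingestion sketch: chunk markdown files, embed, store in Chroma.
# Assumes `pip install chromadb sentence-transformers`.
from pathlib import Path

import chromadb
from sentence_transformers import SentenceTransformer

CHUNK_SIZE, OVERLAP = 800, 100  # characters; tune for your corpus

def chunk(text: str) -> list[str]:
    """Split text into fixed-size windows with overlap."""
    step = CHUNK_SIZE - OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # free local embedding model
client = chromadb.PersistentClient(path="./index")
collection = client.get_or_create_collection("policies")

for doc in Path("corpus").glob("*.md"):  # illustrative corpus directory
    chunks = chunk(doc.read_text(encoding="utf-8"))
    if not chunks:
        continue
    collection.add(
        ids=[f"{doc.stem}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=model.encode(chunks).tolist(),
        metadatas=[{"source": doc.name}] * len(chunks),
    )
```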
Retrieval and Generation (RAG)
○ To build your RAG pipeline you may use frameworks such as LangChain to
handle retrieval, prompt chaining, and API calls, or implement these
manually.
○ Implement top-k retrieval with optional re-ranking.
○ Build a prompting strategy that injects retrieved chunks (and
citations/sources) into the LLM context.
○ Add basic guardrails (see the sketch after this list):
■ Refuse to answer outside the corpus (“I can only answer about our
policies”),
■ Limit output length,
■ Always cite source doc IDs/titles for answers.
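A framework-free sketch of top-k retrieval, prompt assembly, and the guardrails above, reusing `model` and `collection` from the ingestion sketch; the distance threshold, word limit, and the `call_llm` helper are illustrative assumptions (wire `call_llm` to your chat-completion provider, e.g., OpenRouter or Groq):

```python
# Top-k retrieval + prompt assembly with refusal, length, and citation guardrails.
def answer(question: str, k: int = 4) -> str:
    q_emb = model.encode([question]).tolist()
    hits = collection.query(query_embeddings=q_emb, n_results=k)
    docs = hits["documents"][0]
    metas = hits["metadatas"][0]
    dists = hits["distances"][0]

    # Guardrail: refuse when nothing in the corpus is close enough.
    if not docs or min(dists) > 0.8:  # threshold is corpus-dependent
        return "I can only answer questions about our company policies."

    # Inject retrieved chunks with bracketed source tags for citation.
    context = "\n\n".join(f"[{m['source']}] {d}" for d, m in zip(docs, metas))
    prompt = (
        "Answer using ONLY the context below. Cite the bracketed source "
        "for every claim, and keep the answer under 200 words.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)  # hypothetical helper wrapping your LLM API
```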
Web Application
○ Students can use Flask, Streamlit, or an alternative for the web app. LangChain
is recommended for orchestration, but is optional.
○ Endpoints/UI (see the Flask sketch below):
■ / - Web chat interface (text box for user input)
■ /chat - API endpoint that receives user questions (POST) and returns
model-generated answers with citations and snippets (link to source
and show snippet).
■ /health - returns a simple status as JSON.
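A minimal Flask sketch of the three endpoints, assuming the `answer()` function from the retrieval sketch and a simple `chat.html` template (both names are illustrative):

```python
# Minimal Flask app wiring the three required endpoints.
from flask import Flask, jsonify, render_template, request

app = Flask(__name__)

@app.route("/")
def index():
    # Simple chat page with a text box; chat.html is an assumed template.
    return render_template("chat.html")

@app.route("/chat", methods=["POST"])
def chat():
    question = request.get_json(force=True)["question"]
    # answer() is the RAG function sketched above; its output should
    # already carry citations and snippets.
    return jsonify({"answer": answer(question)})

@app.route("/health")
def health():
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(debug=True)
```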
Deployment
○ For production hosting, use the Render or Railway free tier; students may
alternatively use any other free-tier provider of their choice.
○ Configure environment variables (e.g., API keys, model endpoints, database
settings); see the sketch below.
○ Ensure the app is publicly accessible at a shareable URL.
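Reading configuration from environment variables might look like this (the variable names and the default path are illustrative assumptions):

```python
import os

# Fail fast if a required secret is missing; never hard-code keys in the repo.
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]  # assumed variable name
# Optional settings can fall back to sensible defaults.
CHROMA_PATH = os.environ.get("CHROMA_PATH", "./index")
```

Hosts such as Render and Railway let you set these values in the service dashboard, so the same code runs locally and in production.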
CI/CD
○ Minimal automated testing is sufficient for this assignment (a build/run
check, optional smoke test).
○ Create a GitHub Actions workflow that, on push/PR (see the sketch after
this list):
■ Installs dependencies,
■ Runs a build/start check (e.g., python -m pip install -r
requirements.txt and python -c "import app", or pytest -q if you add
tests),
■ On success on main, deploys to your host (Render/Railway action or
via webhook/API).
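A minimal sketch of such a workflow (in YAML, since GitHub Actions workflows are defined in YAML); the Python version and the RENDER_DEPLOY_HOOK secret name are assumptions, with Render's deploy-hook URL stored as a repository secret:

```yaml
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: python -m pip install -r requirements.txt
      - run: python -c "import app"  # build/start smoke check

  deploy:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # Trigger the host's deploy hook only after the build check passes.
      # RENDER_DEPLOY_HOOK is an assumed secret holding your deploy-hook URL.
      - run: curl -fsS -X POST "${{ secrets.RENDER_DEPLOY_HOOK }}"
```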
Evaluation of the LLM Application
○ Provide a small evaluation set of 15–30 questions covering various policy
topics (PTO, security, expenses, remote work, holidays, etc.). Report the
following (a minimal measurement sketch follows this list):
■ Answer Quality (required):
1. Groundedness: % of answers whose content is factually
consistent with and fully supported by the retrieved
evidence, i.e., the answer contains no information that is
absent from or contradicted by the context.
2. Citation Accuracy: % of answers whose listed citations
correctly point to the specific passage(s) that support the
information stated, i.e., the attribution is correct and not
misleading.
3. Exact/Partial Match (optional): % of answers that exactly or
partially match a short gold answer you provide.
■ System Metrics (required):
1. Latency (p50/p95) from request to answer for 10–20 queries.
■ Ablations (optional): compare retrieval k, chunk size, or prompt
variants.
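A minimal sketch for the required system metric plus an aggregate quality rate, assuming the `answer()` function from the retrieval sketch and hand-labelled 0/1 judgements per answer (the labelling itself can be manual or LLM-assisted):

```python
# Latency p50/p95 over a batch of queries, plus a percentage rate from
# per-answer 0/1 labels (1 = the answer passed the check).
import statistics
import time

def latency_stats(questions: list[str]) -> tuple[float, float]:
    times = []
    for q in questions:
        start = time.perf_counter()
        answer(q)  # the RAG function sketched earlier
        times.append(time.perf_counter() - start)
    cuts = statistics.quantiles(times, n=20)  # 19 cut points at 5% steps
    return statistics.median(times), cuts[18]  # (p50, p95) in seconds

def metric_rate(labels: list[int]) -> float:
    """Percentage of answers labelled 1; usable for groundedness or
    citation accuracy alike."""
    return 100 * sum(labels) / len(labels)
```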
Design Documentation
○ Briefly justify your design choices (embedding model, chunking, k, prompt
format, vector store).
Submission Guidelines
Your final submission should consist of two links:
● A link to an accessible software repository (a GitHub repo) containing all your
developed code. You must share your repository with the GitHub account
quantic-grader.
○ The GitHub repository should include a link to the deployed version of
your RAG LLM-based application (in the file deployed.md).
○ The GitHub repository must include a README.md file with setup and
run instructions.
○ The GitHub repository must also include a brief design and evaluation
document (design-and-evaluation.md) listing and explaining:
i) the design and architecture decisions made, and why they were made,
including technology choices
ii) a summary of your evaluation of your RAG system
● A link to a recorded screen-share demonstration video of the working RAG
LLM-based application, showing it in use via screen capture, with voiceover.
○ All group members must speak and be present on camera.
○ All group members must show their government ID.
○ The demonstration/presentation should be between 5 and 10 minutes long.
To submit your project, please click on the "Submit Project" button on your dashboard
and follow the steps provided. If you are submitting your project as a group, please
ensure only ONE member submits on behalf of the group. Please reach out to
[email protected] if you have any questions. Project grading typically takes
about 3–4 weeks to complete after the submission due date. There is no score penalty
for projects submitted after the due date; however, grading may be delayed.
Plagiarism Policy
Here at Quantic, we believe that learning is best accomplished by “doing”. This ethos
underpinned the design of our active learning platform, and it likewise informs our
approach to the completion of projects and presentations for our degree programs. We
expect that all of our graduates will be able to deploy the concepts and skills they’ve
learned over the course of their degree, whether in the workplace or in pursuit of
personal goals, and so it is in our students’ best interest that these assignments be
completed solely through their own efforts, with academic integrity.
Quantic takes academic integrity very seriously. We define plagiarism as: “Knowingly
representing the work of others as one’s own, engaging in any acts of plagiarism, or
referencing the works of others without appropriate citation.” This includes both misusing
or omitting proper citations for the works referenced and submitting someone else’s
work as your own. Quantic monitors all submissions for instances of plagiarism, and all
plagiarism, even unintentional, is considered a conduct violation. If you’re still not sure
what constitutes plagiarism, check out this two-minute presentation by our
librarian, Kristina. It is important to be conscientious when citing your sources. When in
doubt, cite! Kristina outlines the basics of best citation practices in this one-minute video.
You can also find more about our plagiarism policy here.
Project Rubric
Scores of 2 and above are considered passing. Students who receive a 1 or 0 will not get
credit for the assignment and must revise and resubmit to receive a passing grade.
Score 5
● Addresses ALL of the project requirements, including but not limited to:
○ Outstanding RAG application with correct responses and matching
citations; ingestion and indexing work
○ Excellent, well-structured application architecture
○ Public deployment on Render, Railway, or equivalent, fully functional
○ CI/CD runs on push/PR and deploys on success
○ Excellent documentation of design choices
○ Excellent evaluation results, including groundedness, citation
accuracy, and latency
○ Excellent, clear demo of features, design, and evaluation
Score 4
● Addresses MOST of the project requirements, including but not limited to:
○ Excellent RAG application with correct responses and generally
matching citations; ingestion and indexing work
○ Very good, well-structured application architecture
○ Public deployment on Render, Railway, or equivalent, almost fully
functional
○ CI/CD runs on push/PR and deploys on success
○ Very good documentation of design choices
○ Very good evaluation results, including groundedness, citation
accuracy, and latency
○ Very good, clear demo of features, design, and evaluation
Score 3
● Addresses SOME of the project requirements, including but not limited to:
○ Very good RAG application with mainly correct responses and
generally matching citations; ingestion and indexing work
○ Good, well-structured application architecture
○ Public deployment on Render, Railway, or equivalent, almost fully
functional
○ CI/CD runs on push/PR and deploys on success
○ Good documentation of design choices
○ Good evaluation results, including most of groundedness,
citation accuracy, and latency
○ Good, clear demo of features, design, and evaluation
Score 2
● Addresses FEW of the project requirements, including but not limited to:
○ Passable RAG application with limited correct responses and few
matching citations; ingestion and indexing work partially
○ Passable application architecture
○ Public deployment on Render, Railway, or equivalent, not fully
functional
○ CI/CD runs on push/PR and deploys on success
○ Passable documentation of design choices
○ Passable evaluation results, including only some of
groundedness, citation accuracy, and latency
○ Passable demo of features, design, and evaluation
Score 1
● Addresses the project, but MOST of the project requirements are missing,
including but not limited to:
○ Incomplete app; not deployed
○ No CI/CD
○ No or very limited evaluation
○ No design documentation
○ No demo of the application
Score 0
● The student either did not complete the assignment, plagiarized all or part
of the assignment, or completely failed to address the project requirements.