# Advanced RAG Chatbot - User Guide

## What's New?

### 1. Multiple Images & Texts Support in `/index` API

The `/index` endpoint now accepts multiple texts and multiple images in a single request (max 10 of each).

**Before:**

```python
# Old: Only 1 text and 1 image
data = {
    'id': 'doc1',
    'text': 'Single text',
}
files = {'image': open('image.jpg', 'rb')}
```

**After:**

```python
import requests

# New: Multiple texts and images (max 10 each)
data = {
    'id': 'doc1',
    'texts': ['Text 1', 'Text 2', 'Text 3'],  # Up to 10
}
# Repeat the 'images' field name once per file
files = [
    ('images', open('image1.jpg', 'rb')),
    ('images', open('image2.jpg', 'rb')),
    ('images', open('image3.jpg', 'rb')),  # Up to 10
]
response = requests.post('http://localhost:8000/index', data=data, files=files)
```

**Example with cURL:**

```bash
curl -X POST "http://localhost:8000/index" \
  -F "id=event123" \
  -F "texts=Music event in Hà Nội" \
  -F "texts=Takes place on 20/10/2025" \
  -F "texts=Venue: National Convention Center" \
  -F "images=@poster1.jpg" \
  -F "images=@poster2.jpg" \
  -F "images=@poster3.jpg"
```

### 2. Advanced RAG Pipeline in `/chat` API

The chat endpoint now runs a multi-stage RAG pipeline to improve response quality:

#### Key Improvements:

1. **Query Expansion**: Automatically expands your question with variations
2. **Multi-Query Retrieval**: Searches with multiple query variants
3. **Reranking**: Re-scores results for better relevance
4. **Contextual Compression**: Keeps only the most relevant parts
5. **Better Prompt Engineering**: Optimized prompts for the LLM

#### How to Use:

**Basic Usage (Auto-enabled):**

```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Are knives dangerous?',
    'use_rag': True,
    'use_advanced_rag': True,  # Default: True
    'hf_token': 'hf_xxxxx'
})
result = response.json()
print("Response:", result['response'])
print("RAG Stats:", result['rag_stats'])  # See pipeline statistics
```

**Advanced Configuration:**

```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'How do I create a new event?',
    'use_rag': True,
    'use_advanced_rag': True,
    # RAG Pipeline Options
    'use_query_expansion': True,  # Expand query with variations
    'use_reranking': True,        # Rerank results
    'use_compression': True,      # Compress context
    'score_threshold': 0.5,       # Min relevance score (0-1)
    'top_k': 5,                   # Number of documents to retrieve
    # LLM Options
    'max_tokens': 512,
    'temperature': 0.7,
    'hf_token': 'hf_xxxxx'
})
```

**Disable Advanced RAG (Use Basic):**

```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'Your question',
    'use_rag': True,
    'use_advanced_rag': False,  # Use basic RAG
})
```

## API Changes Summary

### `/index` Endpoint

**Old Parameters:**

- `id`: str (required)
- `text`: str (required)
- `image`: UploadFile (optional)

**New Parameters:**

- `id`: str (required)
- `texts`: List[str] (optional, max 10)
- `images`: List[UploadFile] (optional, max 10)

**Response:**

```json
{
  "success": true,
  "id": "doc123",
  "message": "Đã index thành công document doc123 với 3 texts và 2 images"
}
```

The `message` value is returned in Vietnamese; this one means "Successfully indexed document doc123 with 3 texts and 2 images".

### `/chat` Endpoint

**New Parameters:**

- `use_advanced_rag`: bool (default: True) - Enable the advanced RAG pipeline
- `use_query_expansion`: bool (default: True) - Expand the query with variations
- `use_reranking`: bool (default: True) - Rerank retrieved results
- `use_compression`: bool (default: True) - Compress the retrieved context
- `score_threshold`: float (default: 0.5) - Min relevance score (0-1)

**Response (New):**

```json
{
  "response": "AI generated answer...",
  "context_used": [...],
  "timestamp": "2025-10-29T...",
  "rag_stats": {
    "original_query": "Your question",
    "expanded_queries": ["Query variant 1", "Query variant 2"],
    "initial_results": 10,
    "after_rerank": 5,
    "after_compression": 5
  }
}
```

## Complete Examples

### Example 1: Index Multiple Social Media Posts

```python
from contextlib import ExitStack

import requests

# Index a social media event with multiple posts and images
data = {
    'id': 'event_festival_2025',
    'texts': [
        'Hà Nội International Music Festival 2025',
        'November 15-17, 2025',
        'Venue: Thống Nhất Park',
        'Line-up: Sơn Tùng MTP, Đen Vâu, Hoàng Thùy Linh',
        'Ticket prices: 500.000đ - 2.000.000đ'
    ]
}
image_paths = ['poster_festival.jpg', 'lineup.jpg', 'venue_map.jpg']

# ExitStack closes every opened image once the request completes
with ExitStack() as stack:
    files = [('images', stack.enter_context(open(path, 'rb'))) for path in image_paths]
    response = requests.post('http://localhost:8000/index', data=data, files=files)

print(response.json())
```

### Example 2: Advanced RAG Chat

```python
import requests

# Chat with advanced RAG
chat_response = requests.post('http://localhost:8000/chat', json={
    'message': 'When and where does the Hà Nội music festival take place?',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'score_threshold': 0.6,
    'hf_token': 'your_hf_token_here'
})
result = chat_response.json()

print("Answer:", result['response'])

print("\nRetrieved Context:")
for ctx in result['context_used']:
    print(f"- [{ctx['id']}] Confidence: {ctx['confidence']:.2%}")

print("\nRAG Pipeline Stats:")
print(f"- Original query: {result['rag_stats']['original_query']}")
print(f"- Query variants: {result['rag_stats']['expanded_queries']}")
print(f"- Documents retrieved: {result['rag_stats']['initial_results']}")
print(f"- After reranking: {result['rag_stats']['after_rerank']}")
```

## Performance Comparison

| Feature | Basic RAG | Advanced RAG |
|---------|-----------|--------------|
| Query Understanding | Single query | Multiple query variants |
| Retrieval Method | Direct vector search | Multi-query + hybrid |
| Result Ranking | Score from DB | Reranked with semantic similarity |
| Context Quality | Full text | Compressed, relevant parts only |
| Response Accuracy | Good | Better |
| Response Time | Faster | Slightly slower (extra expansion, reranking, and compression steps) |

## When to Use What?

**Use Basic RAG when:**

- You need fast response time
- Queries are straightforward
- Context is already well-structured

**Use Advanced RAG when:**

- You need higher accuracy
- Queries are complex or ambiguous
- Context documents are long
- You want better relevance

## Troubleshooting

### Error: "Tối đa 10 texts"

The message means "Maximum 10 texts": the request contains more than 10 texts. Send at most 10 per request.

### Error: "Tối đa 10 images"

The message means "Maximum 10 images": the request contains more than 10 images. Send at most 10 per request.

### RAG stats show 0 results

Your `score_threshold` might be too high, so every retrieved document is filtered out. Try lowering it (e.g., 0.3-0.5).

## Next Steps

To further improve RAG, consider:

1. **Add BM25 Hybrid Search**: Combine dense + sparse retrieval
2. **Use Cross-Encoder for Reranking**: Better than embedding similarity
3. **Implement Query Decomposition**: Break complex queries into sub-queries
4. **Add Citation/Source Tracking**: Show which document each fact comes from
5. **Integrate RAG-Anything**: For advanced multimodal document processing

For RAG-Anything integration (more complex), see: https://github.com/HKUDS/RAG-Anything
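
As a rough illustration of the first item under Next Steps (hybrid search), the sketch below merges a dense ranking and a sparse ranking with reciprocal rank fusion. This guide does not say how the two result lists should be combined, so the fusion method, the `reciprocal_rank_fusion` helper, and the document ids are all assumptions made for the example, not part of the chatbot's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by id for determinism
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: ids ranked by the vector search and by a BM25 index
dense_hits = ['doc_a', 'doc_b', 'doc_c']
sparse_hits = ['doc_c', 'doc_a', 'doc_d']

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Rank-based fusion avoids having to normalize BM25 scores against embedding similarities, which live on different scales. Documents that appear in both lists (`doc_a`, `doc_c`) rise above those found by only one retriever.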