# Multimodal PDF Guide: PDFs with Text + Images

## Overview

The system now supports **Multimodal PDFs**, that is, PDFs containing:
- ✅ Instructional text
- ✅ Image URLs (links to images)
- ✅ Markdown images: `![alt](url)`
- ✅ HTML images: `<img src="url">`

**Best suited for**: user guides with screenshots, tutorials with diagrams, documentation with visual aids.

---

## Why Multimodal?

### The Problem with Regular PDFs

Instructional PDFs often look like this:
```
Step 1: Open the home page
[See image: https://example.com/homepage.png]

Step 2: Click "Create new"
![Create button](https://example.com/create-button.png)

Step 3: Fill in the details
<img src="https://example.com/form.png" alt="Form" />
```

The **old PDF parser** extracts text only and **does not track image URLs**, so the chatbot cannot tell which images belong to which passage.

The **new multimodal PDF parser**:
- ✓ Extracts text
- ✓ Detects all image URLs
- ✓ Links images to the corresponding text chunks
- ✓ Stores the URLs in chunk metadata
- ✓ Returns images together with text when chatting
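
The linking step can be pictured with a minimal sketch. This is not the project's parser: the function name `chunk_with_images`, the fixed-size word chunking, and the simplified URL pattern are assumptions made for illustration. Only the metadata keys `has_images`, `num_images`, and `image_urls` mirror the fields used in the chat examples in this guide.

```python
import re

# Simplified, assumed pattern: an http(s) URL ending in a common image extension.
IMAGE_URL = re.compile(
    r"https?://[^\s<>\"')\]]+\.(?:png|jpe?g|gif|svg|webp)", re.IGNORECASE
)


def chunk_with_images(text, page, chunk_size=50):
    """Split text into chunks of `chunk_size` words and attach the image URLs found in each."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), chunk_size):
        chunk_text = " ".join(words[start:start + chunk_size])
        urls = IMAGE_URL.findall(chunk_text)
        chunks.append({
            "text": chunk_text,
            "page": page,
            "has_images": bool(urls),
            "num_images": len(urls),
            "image_urls": urls,
        })
    return chunks
```

Because the URLs are stored next to the text that mentioned them, a retriever that returns a chunk can return its images at the same time.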

---

## Comparison: Regular PDF vs Multimodal PDF

| Feature | Regular PDF (`/upload-pdf`) | Multimodal PDF (`/upload-pdf-multimodal`) |
|---------|---------------------------|-------------------------------------------|
| Extract text | ✓ | ✓ |
| Detect image URLs | ✗ | ✓ |
| Link images to chunks | ✗ | ✓ |
| Return images in chat | ✗ | ✓ |
| URL formats supported | ✗ | http://, https://, markdown, HTML |
| Use case | Simple text documents | User guides, tutorials, docs with images |

---

## Usage

### 1. Upload Multimodal PDF

**Endpoint:** `POST /upload-pdf-multimodal`

**Curl:**
```bash
curl -X POST "http://localhost:8000/upload-pdf-multimodal" \
  -F "file=@user_guide_with_images.pdf" \
  -F "title=System User Guide" \
  -F "description=User guide with screenshots" \
  -F "category=user_guide"
```

**Python:**
```python
import requests

with open('user_guide_with_images.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/upload-pdf-multimodal',
        files={'file': f},
        data={
            'title': 'User Guide with Screenshots',
            'category': 'user_guide'
        }
    )

response.raise_for_status()  # fail fast on HTTP errors
result = response.json()
print(f"Indexed: {result['chunks_indexed']} chunks")
print(f"Message: {result['message']}")  # the message text reports the image count
```

**Response:**
```json
{
  "success": true,
  "document_id": "pdf_multimodal_20251029_150000",
  "filename": "user_guide_with_images.pdf",
  "chunks_indexed": 25,
  "message": "PDF 'user_guide_with_images.pdf' indexed successfully with 25 chunks and 15 images"
}
```
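
The sample response reports the image count only inside the `message` string. If you want the numbers programmatically, one option is to parse that string, as in the sketch below. The helper `parse_index_counts` is hypothetical, and it assumes the message wording matches the sample above; if the server changes the wording, it returns `None`.

```python
import re

# Matches the "<N> chunks and <M> images" part of the sample message.
COUNTS = re.compile(r"(\d+)\s+chunks?\s+and\s+(\d+)\s+images?")


def parse_index_counts(message):
    """Return (chunks, images) parsed from the upload message, or None if the wording differs."""
    match = COUNTS.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

An image count of zero right after uploading a guide that should contain images is a quick signal that the URLs were not detected.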

### 2. Chat with Multimodal Context

```python
import requests

response = requests.post('http://localhost:8000/chat', json={
    'message': 'How do I create a new event?',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'hf_token': 'your_token'
})

result = response.json()

# Response text
print("Answer:", result['response'])

# Retrieved context with images
for ctx in result['context_used']:
    print(f"\n--- Source: Page {ctx['metadata']['page']} ---")
    print(f"Text: {ctx['metadata']['text'][:200]}...")

    # Check if this chunk has images
    if ctx['metadata'].get('has_images'):
        print(f"Images ({ctx['metadata']['num_images']}):")
        for img_url in ctx['metadata'].get('image_urls', []):
            print(f"  - {img_url}")
```

**Example Output:**
```
Answer: To create a new event, follow these steps:
1. Open the home page and click the "Create Event" button (see the illustration)
2. Fill in the event details...

--- Source: Page 5 ---
Text: Step 1: Open the home page and click the "Create Event" button...
Images (2):
  - https://example.com/homepage.png
  - https://example.com/create-button.png
```

---

## Preparing the PDF

### Supported Formats

The multimodal parser detects the following formats:

1. **Standard URLs:**
   ```
   See image: https://example.com/image.png
   Screenshot: http://cdn.example.com/screenshot.jpg
   ```

2. **Markdown Images:**
   ```markdown
   ![Homepage](https://example.com/homepage.png)
   ![Button](https://example.com/button.png)
   ```

3. **HTML Images:**
   ```html
   <img src="https://example.com/form.png" alt="Form" />
   <img src="http://example.com/result.jpg">
   ```

4. **Image Extensions:**
   ```
   https://example.com/pic.jpg
   https://example.com/chart.png
   https://example.com/diagram.svg
   ```
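
To make the formats above concrete, here is a minimal detection sketch using regular expressions. It is an illustration, not the project's `MultimodalPDFParser`: the function name `find_image_urls`, the patterns, and the list of extensions are all assumptions.

```python
import re

IMAGE_EXT = r"\.(?:png|jpe?g|gif|svg|webp)"
# Markdown image: ![alt](url)
MARKDOWN_IMG = re.compile(r"!\[[^\]]*\]\((https?://[^\s)]+)\)")
# HTML image: <img ... src="url" ...>
HTML_IMG = re.compile(r"<img[^>]*\bsrc=[\"'](https?://[^\"']+)[\"']", re.IGNORECASE)
# Plain URL ending in an image extension
PLAIN_URL = re.compile(r"https?://[^\s<>\"')\]]+" + IMAGE_EXT, re.IGNORECASE)


def find_image_urls(text):
    """Return image URLs in order of first appearance, without duplicates."""
    found = {}  # start offset -> longest URL matched at that offset
    for pattern in (MARKDOWN_IMG, HTML_IMG, PLAIN_URL):
        group = 1 if pattern.groups else 0  # capture group if present, else whole match
        for match in pattern.finditer(text):
            start, url = match.start(group), match.group(group)
            if len(url) > len(found.get(start, "")):
                found[start] = url
    seen, ordered = set(), []
    for start in sorted(found):
        url = found[start]
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered
```

Keying matches by their start offset keeps a Markdown or HTML image from being counted twice when the plain-URL pattern also matches the same address.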

### Best Practices

#### ✓ Good

**PDF Content Example:**
```
# Event Creation Guide

## Step 1: Open the Home Page

Go to the system's home page.

![Homepage Screenshot](https://docs.example.com/images/homepage.png)

You will see the main screen with a menu on the left.

## Step 2: Click "Create Event"

Find and click the "Create Event" button in the top-right corner.

![Create Event Button](https://docs.example.com/images/create-button.png)

## Step 3: Fill In the Details

Fill in the following details in the form:
- Event name
- Date and time
- Location

See a sample form: https://docs.example.com/images/event-form.png
```

**Why it works:**
- Clear structure (headings)
- Each step pairs text with an image
- URLs are explicit and easy to detect
- The context sits right next to the image

#### ✗ Avoid

```
See the images below [1] [2] [3]

[The images are at the end of the document]

...

[1] homepage.png
[2] button.png
[3] form.png
```

**Why it fails:**
- Image references carry no URLs
- Images are separated from their context
- Only filenames are given, not full URLs
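
If a source document already uses bare filenames, one way to repair it before generating the PDF is to rewrite each filename into a full URL. The sketch below assumes the images are hosted under a single base URL; `absolutize_image_refs` and the base URL in the usage line are hypothetical.

```python
import re
from urllib.parse import urljoin

# A bare image filename: not preceded by a character that would make it part of a URL or path.
BARE_IMAGE = re.compile(
    r"(?<![\w/.:-])([\w-]+\.(?:png|jpe?g|gif|svg|webp))\b", re.IGNORECASE
)


def absolutize_image_refs(text, base_url):
    """Replace bare image filenames with full URLs under `base_url`.

    `base_url` should end with "/", otherwise urljoin replaces its last path segment.
    """
    return BARE_IMAGE.sub(lambda m: urljoin(base_url, m.group(1)), text)


fixed = absolutize_image_refs("[1] homepage.png", "https://docs.example.com/images/")
print(fixed)
```

This only fixes the missing URLs. Moving each reference next to the step it illustrates still has to be done by hand.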

---

## Practical Example

### Creating a Multimodal Guide PDF

**File: `chatbot_guide_with_images.md`**

````markdown
# ChatbotRAG User Guide

## 1. Upload a PDF

### Step 1: Prepare the PDF file

Make sure your PDF file is ready.

![PDF File Icon](https://via.placeholder.com/150?text=PDF+File)

### Step 2: Use cURL or Python

**With cURL:**

```bash
curl -X POST "http://localhost:8000/upload-pdf-multimodal" \
  -F "file=@your_file.pdf"
```

![cURL Command Example](https://via.placeholder.com/400x100?text=cURL+Command)

**With Python:**

```python
import requests
# Upload code here
```

### Step 3: Verify the Upload

Check the upload result:

https://via.placeholder.com/500x300?text=Upload+Success+Message

## 2. Chat with the Chatbot

After uploading, you can ask the chatbot questions:

![Chat Interface](https://via.placeholder.com/600x400?text=Chat+Interface)

**Example questions:**
- "How do I upload a PDF?"
- "What are the steps to create an event?"

![Chat Example](https://via.placeholder.com/600x300?text=Chat+Example)

## 3. View the Results

The chatbot answers based on the PDF content:

https://via.placeholder.com/600x350?text=Chat+Response+with+Images
````

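**Convert to PDF:**

Note that pandoc normally renders `![alt](url)` as an embedded picture rather than as literal text, so only URLs that remain visible as text in the resulting PDF can be detected. Check the image count reported after upload.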
```bash
pandoc chatbot_guide_with_images.md -o chatbot_guide_with_images.pdf
```

**Upload:**
```bash
curl -X POST "http://localhost:8000/upload-pdf-multimodal" \
  -F "file=@chatbot_guide_with_images.pdf" \
  -F "title=ChatbotRAG Guide" \
  -F "category=user_guide"
```

---

## Advanced: Custom Image Handling

### Option 1: Local Images

If the images are stored locally, you need to host them:

```bash
# Simple HTTP server
cd /path/to/images
python -m http.server 8080

# Images available at:
# http://localhost:8080/image1.png
# http://localhost:8080/image2.png
```

In the PDF, reference them like this:
```
![Image](http://localhost:8080/image1.png)
```

### Option 2: Cloud Storage

Upload the images to cloud storage (AWS S3, Cloudinary, Imgur, etc.):

```python
# Example: Upload to Imgur
import requests

def upload_to_imgur(image_path):
    client_id = 'YOUR_CLIENT_ID'
    headers = {'Authorization': f'Client-ID {client_id}'}

    with open(image_path, 'rb') as img:
        response = requests.post(
            'https://api.imgur.com/3/image',
            headers=headers,
            files={'image': img}
        )

    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return response.json()['data']['link']

# Upload images
url1 = upload_to_imgur('screenshot1.png')
url2 = upload_to_imgur('screenshot2.png')

# Use URLs in PDF
print(f"![Screenshot 1]({url1})")
```

### Option 3: Embed Images as Base64

If the images are embedded in the PDF itself rather than referenced by URL, you can render each page and encode it as a base64 data URL:

```python
import base64
import io

import pypdfium2 as pdfium  # bitmap.to_pil() also requires Pillow to be installed


def extract_images_from_pdf(pdf_path):
    """Render every page to PNG and return it as a base64 data URL.

    Note: this captures whole pages, not the individual embedded images.
    """
    pdf = pdfium.PdfDocument(pdf_path)
    images = []

    for page_num in range(len(pdf)):
        page = pdf[page_num]
        # Render the page as an image at 2x scale
        bitmap = page.render(scale=2.0)
        pil_image = bitmap.to_pil()

        # Encode the rendered page as base64 PNG
        buffered = io.BytesIO()
        pil_image.save(buffered, format="PNG")
        img_str = base64.b64encode(buffered.getvalue()).decode()

        images.append({
            'page': page_num + 1,
            'base64': img_str,
            'url': f'data:image/png;base64,{img_str}'
        })

    return images
```

---

## Troubleshooting

### Images Are Not Detected

**Possible causes:**
- URLs are in the wrong format (missing `http://` or `https://`)
- URLs are split by a line break
- Incorrect Markdown syntax

**Solution:** test URL detection on a small sample:
```python
# Test URL detection
from multimodal_pdf_parser import MultimodalPDFParser

parser = MultimodalPDFParser()
test_text = """
See image: https://example.com/image.png
![Alt](https://example.com/pic.jpg)
"""

urls = parser.extract_image_urls(test_text)
print("Found URLs:", urls)
```
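
For the line-break cause in particular, text extracted from a PDF sometimes splits a long URL across two lines. The sketch below shows one heuristic for stitching such URLs back together before detection. It is an assumption-laden illustration (`rejoin_broken_urls` is a hypothetical helper, not part of the parser) and it can produce false joins when a line ends in a non-image URL and the next line happens to start with a filename.

```python
import re

IMAGE_EXT = re.compile(r"\.(?:png|jpe?g|gif|svg|webp)$", re.IGNORECASE)
URL_AT_LINE_END = re.compile(r"https?://[^\s<>\"')\]]*$")
TOKEN_AT_LINE_START = re.compile(r"[^\s<>\"')\]]+")


def rejoin_broken_urls(text):
    """Merge lines where an image URL was cut in two by a line break (heuristic)."""
    lines = text.split("\n")
    i = 0
    while i < len(lines) - 1:
        tail = URL_AT_LINE_END.search(lines[i])          # URL fragment ending this line
        head = TOKEN_AT_LINE_START.match(lines[i + 1])   # first token of the next line
        if (tail and head
                and not IMAGE_EXT.search(tail.group(0))
                and not head.group(0).lower().startswith(("http://", "https://"))
                and IMAGE_EXT.search(head.group(0))):
            # Merge the next line into this one so the URL becomes contiguous.
            lines[i] += lines.pop(i + 1)
            continue  # re-check the merged line
        i += 1
    return "\n".join(lines)
```

Running this over the extracted page text before calling the URL detector is one place to apply it. Whether that fits depends on how the parser is wired, which this guide does not show.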

### Chatbot Does Not Return Images

**Check:**
1. Verify that the PDF was indexed with the multimodal parser:
   ```bash
   curl http://localhost:8000/documents/pdf
   # Look for "type": "multimodal_pdf"
   ```

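2. Check that the metadata contains `image_urls`:
   ```python
   import requests
   payload = {'message': 'How do I create a new event?', 'use_rag': True}  # same fields as the chat example above
   response = requests.post('http://localhost:8000/chat', json=payload)
   for ctx in response.json()['context_used']:
       print(ctx['metadata'].get('image_urls', []))
   ```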

### Too Many Images → Large Chunks

**Solution:** Reduce the number of images per chunk by using smaller chunks:

```python
# In multimodal_pdf_parser.py
parser = MultimodalPDFParser(
    chunk_size=300,      # Smaller chunks
    chunk_overlap=30,
    extract_images=True
)
```

---

## Conclusion

### When to Use Multimodal PDF

✓ **Use `/upload-pdf-multimodal` when:**
- The PDF has illustrative images (screenshots, diagrams)
- The chatbot needs to reference images in its answers
- You have user guides or tutorials with visual instructions
- The documentation contains charts or tables as images

✓ **Use the regular `/upload-pdf` when:**
- The PDF contains plain text only
- Images are not needed in the context
- You have simple documents or FAQs

### Complete Workflow

1. **Create the PDF** with text + image URLs (Markdown/HTML)
2. **Upload** it via `/upload-pdf-multimodal`
3. **Verify** that the images were detected
4. **Chat**: images are automatically included in the context
5. **Display** the images in your UI
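
For the display step, a UI that renders Markdown can turn the retrieved context into image tags directly. The sketch below assumes each entry of `context_used` has the `metadata` layout shown in the chat examples (`page`, `image_urls`); the helper name `context_to_markdown` and the alt text are made up for illustration.

```python
def context_to_markdown(context_used):
    """Build a Markdown snippet listing each retrieved chunk followed by its images."""
    parts = []
    for i, ctx in enumerate(context_used, 1):
        meta = ctx.get("metadata", {})
        parts.append(f"**Source {i} (page {meta.get('page', '?')})**")
        # Chunks without images simply contribute no image lines.
        for url in meta.get("image_urls", []):
            parts.append(f"![Illustration from source {i}]({url})")
    return "\n\n".join(parts)
```

The returned string can be appended below the chatbot's answer so that the user sees the screenshots next to the text they illustrate.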

---

## Example: Full Workflow

```python
"""
Complete workflow: Create, upload, and chat with multimodal PDF
"""
import requests

# 1. Upload multimodal PDF
print("=== Uploading Multimodal PDF ===")
with open('user_guide_with_images.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/upload-pdf-multimodal',
        files={'file': f},
        data={'title': 'User Guide', 'category': 'guide'}
    )

result = response.json()
print(f"✓ Indexed: {result['chunks_indexed']} chunks")
print(f"✓ Message: {result['message']}")

# 2. Chat with multimodal context
print("\n=== Chatting ===")
response = requests.post('http://localhost:8000/chat', json={
    'message': 'How do I create a new event? Show me the illustrations.',
    'use_rag': True,
    'use_advanced_rag': True,
    'top_k': 3,
    'hf_token': 'your_token'
})

chat_result = response.json()
print(f"Answer: {chat_result['response']}\n")

# 3. Display context with images
print("=== Context with Images ===")
for i, ctx in enumerate(chat_result['context_used'], 1):
    print(f"\n[{i}] Page {ctx['metadata']['page']}, Confidence: {ctx['confidence']:.2%}")
    print(f"Text: {ctx['metadata']['text'][:150]}...")

    if ctx['metadata'].get('has_images'):
        print(f"Images ({ctx['metadata']['num_images']}):")
        for url in ctx['metadata']['image_urls']:
            print(f"  🖼️ {url}")
```

---

**Your PDFs with illustrations are now ready to work with the chatbot! 🎨📄**