Improve model card: Add metadata, links, and usage example
#1
by nielsr (HF Staff) - opened

README.md CHANGED
@@ -1,18 +1,73 @@
 ---
-license: cc-by-4.0
-datasets:
-- NingLab/MMECInstruct
 base_model:
 - meta-llama/Llama-3.2-3B-Instruct
+datasets:
+- NingLab/MMECInstruct
+license: cc-by-4.0
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---

 # CASLIE-S

-This repo contains the models for "Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data"
+This repo contains the models for "[Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data](https://huggingface.co/papers/2410.17337)".
+
+- 📄 [Paper](https://huggingface.co/papers/2410.17337)
+- 🌐 [Project Page](https://ninglab.github.io/CASLIE/)
+- 💻 [Code](https://github.com/ninglab/CASLIE)
+
+## Introduction
+We introduce [MMECInstruct](https://huggingface.co/datasets/NingLab/MMECInstruct), the first-ever, large-scale, and high-quality multimodal instruction dataset for e-commerce. We also develop CASLIE, a simple, lightweight, yet effective framework for integrating multimodal information. Leveraging MMECInstruct, we fine-tune a series of e-commerce Multimodal Foundation Models (MFMs) within CASLIE.
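+
+To get a quick look at the instruction data, you can load it with the `datasets` library. This is a minimal sketch: the exact split names and example fields are assumptions here, so check the dataset card for the actual schema.
+
+```python
+from datasets import load_dataset
+
+# Download MMECInstruct from the Hugging Face Hub
+dataset = load_dataset("NingLab/MMECInstruct")
+
+print(dataset)                  # available splits and their sizes
+first_split = next(iter(dataset))
+print(dataset[first_split][0])  # inspect one raw instruction example
+```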

 ## CASLIE Models
 The CASLIE-S model is instruction-tuned from the small base model [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).

+## Sample Usage
+
+To conduct multimodal inference with the CASLIE-S model using the Hugging Face `transformers` library, you can follow this example. The snippet loads the model and processor and performs basic image-text-to-text generation.
+
+```python
+import torch
+from transformers import AutoProcessor, AutoModelForCausalLM
+from PIL import Image
+
+# Load the model and processor.
+# `trust_remote_code=True` is necessary to load the custom model and processor definitions.
+model_path = "NingLab/CASLIE-S"
+processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_path, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
+)
+
+# Example: image and text input for a product description task.
+# Replace "image.png" with the actual path to your image file.
+try:
+    image = Image.open("image.png").convert("RGB")
+except FileNotFoundError:
+    print("Warning: 'image.png' not found. Using a dummy image; replace with a real image path.")
+    image = Image.new("RGB", (256, 256), color="red")
+
+question = "Describe the product in detail."
+
+# Prepare the conversation in chat-template format.
+# The "<image>" token is a placeholder that the processor replaces with image features.
+messages = [{"role": "user", "content": f"{question} <image>"}]
+
+# Apply the chat template, then process the text and image together.
+text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt").to(model.device)
+
+# Generate a response and decode only the newly generated tokens,
+# so the echoed prompt is not included in the printed response.
+output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
+response = processor.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+
+print(f"Question: {question}")
+print(f"Response: {response}")
+
+# For more advanced usage, specific tasks, and detailed inference scripts,
+# please refer to the project's official GitHub repository:
+# https://github.com/ninglab/CASLIE
+```
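+
+Note that sampling is enabled (`do_sample=True` with `temperature=0.7`), so outputs vary across runs; pass `do_sample=False` to `generate` for deterministic greedy decoding.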
+
 ## Citation
 ```bibtex
 @article{ling2024captions,