```python
# Load model directly. AutoModel resolves the custom model class shipped
# with the repository via trust_remote_code.
from transformers import AutoModel

model = AutoModel.from_pretrained("GSAI-ML/LLaDA-V", trust_remote_code=True, dtype="auto")
```

We introduce LLaDA-V, a competitive diffusion-based vision-language model that outperforms other diffusion-based multimodal large language models (MLLMs).
It was presented in the paper *LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning*.
Project Page: https://ml-gsai.github.io/LLaDA-V-demo/
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="GSAI-ML/LLaDA-V", trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]

pipe(text=messages)
```
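For lower-level control than the pipeline offers, you can prepare inputs yourself with the model's processor. The sketch below assumes the repository ships a chat template and a standard processor reachable through `AutoProcessor`; it stops before decoding, because diffusion models do not sample like autoregressive `generate()` and the exact entry point is defined by the repo's custom remote code.

```python
# Minimal sketch of manual input preparation. Assumptions: the repository
# exposes a processor and chat template via AutoProcessor; the decoding
# call itself is defined by the repo's custom diffusion-sampling code.
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("GSAI-ML/LLaDA-V", trust_remote_code=True, dtype="auto")
processor = AutoProcessor.from_pretrained("GSAI-ML/LLaDA-V", trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]

# Tokenize the text and preprocess the image in a single call.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
# Decoding: consult the repository's remote code for the diffusion
# sampling API; it is not a standard autoregressive generate() loop.
```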