leonli66 committed on
Commit 4a71137 · verified · 1 Parent(s): 1a56384

Update README.md

Files changed (1)
  1. README.md +1 -48
README.md CHANGED
@@ -40,54 +40,7 @@ Bagel‑Zebra‑CoT is fine-tuned from [Bagel‑7B](https://huggingface.co/ByteD

  ## Usage

- Here's a quick example to use the model with the `transformers` library:
-
- ```python
- from transformers import AutoProcessor, AutoModel
- from PIL import Image
- import torch
-
- # Load model and processor
- model_id = "multimodal-reasoning-lab/Bagel-Zebra-CoT"
- model = AutoModel.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto")
- processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
-
- # Example image and question (replace with your path and query)
- image_path = "test_images/image.png"
- image = Image.open(image_path).convert('RGB')
- question = "Subtract all cylinders. Add 1 red sphere. How many objects are left?"
-
- # Prepare inputs
- messages = [
- {
- "role": "user",
- "content": [
- {"type": "image", "image": image},
- {"type": "text", "text": question},
- ],
- }
- ]
-
- text = processor.apply_chat_template(
- messages, tokenize=False, add_generation_prompt=True
- )
- inputs = processor(
- text=[text],
- images=[image],
- padding=True,
- return_tensors="pt",
- )
- inputs = {k: v.to(model.device) for k, v in inputs.items()}
-
- # Generate response
- generated_ids = model.generate(**inputs, max_new_tokens=512)
-
- # Decode and print output
- output_text = processor.batch_decode(generated_ids, skip_special_tokens=False, clean_up_tokenization_spaces=False)[0]
- print(output_text)
- ```
-
- For more advanced usage, training details, and additional examples, please refer to the [official GitHub repository](https://github.com/multimodal-reasoning-lab/Bagel-Zebra-CoT).
+ For more interleaved text and image inference and training, please refer to the [official GitHub repository](https://github.com/multimodal-reasoning-lab/Bagel-Zebra-CoT).

  ---