Instructions to use TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill")
model = AutoModelForImageTextToText.from_pretrained("TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks

Google Colab
Kaggle

Local Apps

vLLM

How to use TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill

SGLang

How to use TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill to start chatting

Load model with FastModel

# Install Unsloth first (shell command): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill",
    max_seq_length=2048,
)

Docker Model Runner

How to use TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill with Docker Model Runner:

```
docker model run hf.co/TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill
```

question about TeichAI json gen

by CryptoAIM - opened Mar 27

Discussion

CryptoAIM

Mar 27

So TeichAI has made a JSON generator, right? Wasn't it tool gen or something? Anyway, can it also just create the prompts and topics? I'm still deciding what kind of LLM monster I'm buying or building and don't have the money yet, but I'll probably spend a lot of time creating datasets from open-source models for distillation. Also, you used a unique approach to distilling. Is there anything else you'd recommend? Style imitation is good and all, but I'd still like some actual new knowledge in these models. Or should I just continue pre-training instruct models and post-train using distillation, or what?

CompactAI

TeichAI org Mar 27

If you want your model to learn more, continue pretraining.
If you want your model to predict better tokens and make better use of what it already knows, then do proper distillation.
What I mean by proper distillation:
Use a local model, run prompts through it, record the top 50 candidate tokens it considered at every position of its output, and train on all of that data combined. This will result in a much smarter model. Currently TeichAI does not provide these types of datasets.
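
The collection step described above can be sketched in plain Python. This is only an illustration: the logits are toy values rather than output from a real model, `top_k_distribution` and `make_record` are hypothetical helper names, and treating the most likely token as the chosen one assumes greedy decoding.

```python
import math

def top_k_distribution(logits, k=50):
    """Softmax over raw logits, then keep the k most likely tokens."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    probs = {tok: value / total for tok, value in exps.items()}
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

def make_record(prefix, logits, k=50):
    """One training record: the prefix, the chosen token, and the remaining top-k tokens."""
    ranked = top_k_distribution(logits, k)
    (token, prob), alternatives = ranked[0], ranked[1:]
    return {"prefix": prefix, "token": token, "probability": prob,
            "alternatives": alternatives}

# Toy logits for a single position (assumed values, not from a real model).
# A real teacher would produce one such table per generated token.
toy_logits = {"Paris": 5.0, "London": 2.0, "Berlin": 1.5, "Rome": 1.0}
record = make_record("The capital of France is ", toy_logits, k=3)
print(record["token"], [tok for tok, _ in record["alternatives"]])
```

In practice the logits would come from the teacher's forward pass, and one record would be stored for every position of every generated answer.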

CompactAI

TeichAI org Mar 27

What proper distillation data looks like ->
Existing output: "The capital of France is "
Token: "Paris"
Probability: 0.85
Top 50 Alternatives:

London (0.05)
Berlin (0.03)
Rome (0.02)
...

This forces the model to learn what could be said at every position, and how to correct its mistakes if it says a wrong token :)
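
A record like the one above could be turned into a training signal roughly as follows. This is a minimal sketch under stated assumptions: only the four probabilities listed in the example are used and renormalised (the elided tail is not filled in), the student distributions are made-up values, and `soft_target_loss` is a hypothetical helper rather than anything TeichAI ships.

```python
import math

def soft_target_loss(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy of the student against the teacher's renormalised top-k distribution."""
    total = sum(teacher_probs.values())
    loss = 0.0
    for token, p in teacher_probs.items():
        target = p / total  # renormalise because the tail of the distribution is missing
        loss -= target * math.log(student_probs.get(token, 0.0) + eps)
    return loss

# Teacher probabilities taken from the example record above.
teacher = {"Paris": 0.85, "London": 0.05, "Berlin": 0.03, "Rome": 0.02}

# Hypothetical student predictions for the same position.
good_student = {"Paris": 0.80, "London": 0.08, "Berlin": 0.06, "Rome": 0.06}
bad_student = {"Paris": 0.10, "London": 0.70, "Berlin": 0.10, "Rome": 0.10}

# The student that agrees with the teacher receives the lower loss.
print(soft_target_loss(teacher, good_student) < soft_target_loss(teacher, bad_student))
```

Unlike training on the single token "Paris", this loss also tells the student how plausible each alternative was.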

armand0e

TeichAI org Mar 27

Yeah, if you're just working with open-source models, the best and most thorough method of distillation is called logit distillation. I suggest you read more about it. It can also transfer knowledge, though I think you will be limited to distilling large models into smaller models from the same family (e.g. Qwen3.5 27B -> Qwen3.5 4B).

CryptoAIM

Mar 28

Seriously? What if I did mixed RL? 2/3 distillation and 1/3 RLVR, so it still learns self-correction? I'll probably do a lot with models of around the same size but with different architectures and things like that. This would kill logit distillation for me.
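
One way such a 2/3 to 1/3 mix could be scheduled is a fixed interleaving of training steps. This is just a sketch of the idea: `objective_for_step` is a hypothetical helper, and a real training loop would call its own distillation or RLVR update where the labels appear.

```python
def objective_for_step(step, distill_steps=2, rlvr_steps=1):
    """Deterministic 2:1 interleaving: two distillation steps, then one RLVR step."""
    cycle = distill_steps + rlvr_steps
    return "distill" if step % cycle < distill_steps else "rlvr"

# Objective chosen for the first six training steps.
schedule = [objective_for_step(step) for step in range(6)]
print(schedule)
```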

Bob-the-Koala

Mar 28

> Yeah, if you're just working with open-source models, the best and most thorough method of distillation is called logit distillation. I suggest you read more about it. It can also transfer knowledge, though I think you will be limited to distilling large models into smaller models from the same family (e.g. Qwen3.5 27B -> Qwen3.5 4B).

Unless the two models share the same vocabulary, but not many models do that, so yeah.
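
Whether two models qualify can be checked by comparing their token-to-id mappings, since logit distillation needs index `i` to mean the same token in both. The sketch below uses toy vocabularies and a hypothetical helper name, `vocab_mismatches`.

```python
def vocab_mismatches(teacher_vocab, student_vocab):
    """Return tokens whose ids differ or that exist in only one vocabulary."""
    mismatched = {}
    for token in teacher_vocab.keys() | student_vocab.keys():
        teacher_id = teacher_vocab.get(token)
        student_id = student_vocab.get(token)
        if teacher_id != student_id:
            mismatched[token] = (teacher_id, student_id)
    return mismatched

# Toy vocabularies (token -> id), assumed for illustration only.
teacher_vocab = {"Paris": 0, "London": 1, "Berlin": 2}
same_family = {"Paris": 0, "London": 1, "Berlin": 2}
other_family = {"Paris": 0, "Lon": 1, "don": 2, "Berlin": 3}

print(vocab_mismatches(teacher_vocab, same_family))            # no mismatches
print(sorted(vocab_mismatches(teacher_vocab, other_family)))   # tokens that do not line up
```

With Hugging Face tokenizers, `tokenizer.get_vocab()` returns a mapping of this shape, so the same comparison could be run on two real models.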

CryptoAIM

Mar 28

> Unless the two models share the same vocabulary, but not many models do that, so yeah.

Can someone slowly change a model's vocab? Pre-phase (no training): benchmark the model and measure how often each vocab entry is used. First phase: RLVR while slowly changing the vocab (least-used entries first). The problem is that the major ones may be important, so you also do the same for the teacher (less aggressively), until both have the same vocab?

CryptoAIM

Mar 28

Or are there other workarounds?

CryptoAIM

Mar 28

Maybe, instead of using the other logits, sample one token per sentence, let the teacher generate multiple candidate solutions, and use the draft teacher or a reward model to rank the candidates.
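
The ranking part of this idea can be sketched as follows. Everything here is an assumption for illustration: the candidates are hard-coded strings standing in for teacher samples, and `toy_reward` is a hypothetical verifiable check standing in for a reward model.

```python
def rank_candidates(candidates, reward_fn):
    """Sort teacher candidates from highest to lowest reward."""
    return sorted(candidates, key=reward_fn, reverse=True)

def toy_reward(candidate, expected="4"):
    """Hypothetical verifiable reward: 1.0 if the last word matches the expected answer."""
    return 1.0 if candidate.strip().split()[-1] == expected else 0.0

# Stand-ins for several completions sampled from the teacher for one prompt.
candidates = ["2 + 2 = 5", "2 + 2 = 4", "the answer is 22"]
ranked = rank_candidates(candidates, toy_reward)
best = ranked[0]  # the candidate that would be kept as the training target
print(best)
```

Because only text and scores are exchanged, this kind of selection does not require teacher and student to share a vocabulary.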

Bob-the-Koala

Mar 28

Depends on your definition of slowly. It would basically be slower than starting from scratch, as the model would have definitions for things that do not exist anymore.

As for your second question, it could be kind of interesting. You would have to generate the actual probabilities though, and that could be kind of unreliable.