question about TeichAI json gen
So TeichAI has made a JSON generator, right? Wasn't it tool gen or something? Anyway, can it also just create the prompts and topics? I'm still debating what kind of LLM monster I'm buying or building and don't have the money yet, but I'll probably spend a lot of time creating datasets from open-source models for distillation. Also, you used a unique approach to distilling. Is there anything else you'd recommend? Style imitation is good and all, but I'd still like some actual new knowledge in these models. Or should I just continue pretraining instruct models and post-train using distillation, or what?
If you want your model to learn more, continue pretraining.
If you want your model to predict better tokens and make more use of what it already knows, then do proper distillation.
What I mean by proper distillation:
Use a local model, run prompts through it, record the top 50 candidate tokens (with their probabilities) it considered for every token in its output, and train on all of that data combined. This will result in a much smarter model. Currently TeichAI does not provide these types of datasets.
What proper distillation data looks like ->
Existing output: "The capital of France is "
Token: "Paris"
Probability: 0.85
Top 50 Alternatives:
- London (0.05)
- Berlin (0.03)
- Rome (0.02)
...
This forces the model to learn what could be said anywhere, and how to correct its mistakes if it says a wrong token :)
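A minimal sketch of how one such record could be built from a single position's logits. The function names, the record layout, and the toy numbers are illustrative assumptions, not a TeichAI format or real model output; with a real teacher the logits would come from the model's forward pass.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_record(context, logits, vocab, k=50):
    """Build one record: the most likely token plus the other top-k candidates."""
    probs = softmax(logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)[:k]
    chosen, chosen_prob = ranked[0]  # greedy pick, for illustration only
    return {
        "context": context,
        "token": chosen,
        "probability": chosen_prob,
        "alternatives": ranked[1:],
    }

# Toy vocabulary and logits (made-up numbers, not real model output).
vocab = ["Paris", "London", "Berlin", "Rome"]
logits = [4.0, 1.2, 0.7, 0.3]
record = top_k_record("The capital of France is ", logits, vocab, k=3)

print(record["token"], round(record["probability"], 2))
for token, prob in record["alternatives"]:
    print(f"- {token} ({prob:.2f})")
```

Storing only the top k entries per position keeps the dataset small compared with saving the full vocabulary distribution, at the cost of dropping the low-probability tail.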
ty
Yeah, if you're just doing open-source models, the best and most thorough method of distillation is called logit distillation. I suggest you read more about that. It can also transfer knowledge, though I think you will be limited to distilling large models into smaller models from the same family (e.g. Qwen3.5 27B -> Qwen3.5 4B).
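For readers new to the term, here is a rough sketch of the per-position loss commonly used in logit distillation: the KL divergence between temperature-softened teacher and student distributions. The function names and the temperature value are assumptions for illustration. The length check is only a minimal stand-in for the real requirement that both models index the same vocabulary, which is why the same-family restriction comes up.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Log-probabilities of temperature-scaled logits (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) at one token position."""
    if len(teacher_logits) != len(student_logits):
        # Minimal check only: equal size does not prove the vocabularies match.
        raise ValueError("teacher and student must share the same vocabulary")
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

# Toy logits over a shared 4-token vocabulary (made-up numbers).
teacher = [4.0, 1.2, 0.7, 0.3]
student = [2.0, 2.0, 0.5, 0.1]
print(distillation_loss(teacher, student))
print(distillation_loss(teacher, teacher))
```

In training, this value would be averaged over all positions and minimized with respect to the student's parameters only.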
Seriously? What if I did mixed RL? 2/3 distilling and 1/3 RLVR, so it still learns self-correction? I will probably do a lot with models of around the same size but with different architectures and things like that. This would kill logit distillation for me.
Unless two models share the same vocabulary, but not many models do that, so yeah.
Can someone slowly change a model's vocab? Pre-phase (no training): benchmark the model and measure how often each vocab entry is used. First phase: RLVR while slowly changing the vocab (least-used tokens first). The problem is that the major tokens may be important, so you also do the same for the teacher (less aggressively), until both have the same vocab?
Or are there other workarounds?
Maybe instead of using the other logits, sample one token per sentence, let the teacher generate multiple candidate solutions, and use the draft teacher or a reward model to rank the candidates.
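A small sketch of the candidate-ranking idea, assuming the candidates have already been generated by the teacher. The exact-match reward below is a hypothetical stand-in for a reward model or a verifier, and all names and strings are made up for illustration.

```python
def exact_match_reward(reference):
    """Return a toy reward function: 1.0 if the candidate matches the reference."""
    target = reference.strip().lower()

    def reward(candidate):
        return 1.0 if candidate.strip().lower() == target else 0.0

    return reward

def rank_candidates(candidates, reward_fn):
    """Sort candidates from highest to lowest reward (ties keep input order)."""
    return sorted(candidates, key=reward_fn, reverse=True)

# Hypothetical teacher samples for the prompt "What is the capital of France?"
candidates = ["London", "paris ", "Berlin"]
ranked = rank_candidates(candidates, exact_match_reward("Paris"))
best = ranked[0]
print(best)
```

Only the top-ranked candidate (or the top few) would then be kept as training text, so no teacher logits are needed and the teacher's vocabulary does not have to match the student's.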
Depends on your definition of slowly. It would basically be slower than starting from scratch, as the model would have definitions for things that do not exist anymore.
As for your second question, it could be kind of interesting. You would have to generate the actual probabilities though, and that could be kind of unreliable.
Gemini recommends Fast Vocabulary Transfer (FVT).
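A rough sketch of the idea as it is usually described: a token that exists in both vocabularies keeps its embedding, and a token that is new gets the average of the embeddings of the old-vocabulary pieces it splits into. The greedy tokenizer, the function names, and the 2-dimensional embeddings are toy assumptions, not the reference implementation.

```python
def greedy_tokenize(text, vocab):
    """Toy longest-match tokenizer over `vocab` (stand-in for the old tokenizer)."""
    pieces, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text!r} with the old vocabulary")
    return pieces

def transfer_embeddings(old_embeddings, new_vocab):
    """Initialize embeddings for `new_vocab` from `old_embeddings`."""
    new_embeddings = {}
    for token in new_vocab:
        if token in old_embeddings:
            # Shared token: copy the existing vector.
            new_embeddings[token] = list(old_embeddings[token])
        else:
            # New token: average the vectors of its old-vocabulary pieces.
            pieces = greedy_tokenize(token, old_embeddings)
            vectors = [old_embeddings[p] for p in pieces]
            new_embeddings[token] = [sum(dim) / len(vectors) for dim in zip(*vectors)]
    return new_embeddings

# Toy old vocabulary with made-up 2-dimensional embeddings.
old = {"dis": [1.0, 0.0], "till": [0.0, 1.0], "ation": [2.0, 2.0], "logit": [0.5, 0.5]}
new = transfer_embeddings(old, ["logit", "distill", "distillation"])
print(new)
```

The transferred embeddings are only a starting point; the model would still need further training on the new vocabulary afterwards.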
Seems promising
Hugging Face built General On-Policy Logit Distillation (GOLD), where you do not even have to change the vocab.
You have to use TRL though.
That's pretty cool, will need to check it out.
Yeah, you still need logits though…