
Ofir Zafrir

ofirzaf

AI & ML interests

Sparsity, Quantization, Model Compression

Recent Activity


upvoted an article 4 days ago

Article

Intel XPU Kernel Skill: LLM-driven Triton kernel optimization for the Hugging Face Kernel Hub

danf • 4 days ago • 9

updated a model 22 days ago

ofirzaf/Qwen3.6-35B-A3B-int4-ov

Updated 22 days ago • 21

published a model 22 days ago

ofirzaf/Qwen3.6-35B-A3B-int4-ov

Updated 22 days ago • 21

updated a model 22 days ago

ofirzaf/Qwen3.6-35B-A3B-DFlash-8bit-ov

Updated 22 days ago • 79

published a model 22 days ago

ofirzaf/Qwen3.6-35B-A3B-DFlash-8bit-ov

Updated 22 days ago • 79

upvoted an article 4 months ago

Article

Getting More from Your Test-Time Compute Budget with Portfolio Beam Search

danelbaz • Feb 24 • 8

updated a model 4 months ago

ofirzaf/hebrew-math-tutor-v1-W4A16-G128

4B • Updated Feb 10 • 1

New activity in OpenVINO/Phi-3-mini-FastDraft-50M-int8-sym-ov 4 months ago

Update README.md

#1 opened 4 months ago by ofirzaf

published a model 5 months ago

ofirzaf/hebrew-math-tutor-v1-W4A16-G128

4B • Updated Feb 10 • 1

upvoted a paper 5 months ago

Prune Once for All: Sparse Pre-Trained Language Models

Paper • 2111.05754 • Published Nov 10, 2021 • 2

upvoted an article 6 months ago

Article

DeepMath: A lightweight math reasoning Agent with smolagents

danf, mber, moshew • Dec 4, 2025 • 40

upvoted an article 9 months ago

Article

Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models

imargulis, ofirzaf, sguskin, guybd, pcuenq • Sep 29, 2025 • 25

published an article 9 months ago

Article

Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models

imargulis, ofirzaf, sguskin, guybd, pcuenq • Sep 29, 2025 • 25

liked a model 9 months ago

OpenVINO/Qwen3-pruned-6L-from-0.6B-int8-ov

Updated Sep 24, 2025 • 16 • 1

upvoted an article 10 months ago

Article

Breaking Language Barriers in Mathematical AI: Introducing Hebrew Math Tutor

danf • Sep 7, 2025 • 3

liked 2 models 12 months ago

OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov

Updated Dec 16, 2024 • 58 • 5

OpenVINO/Phi-4-mini-FastDraft-120M-int8-ov

Updated May 8, 2025 • 29 • 2

upvoted 2 articles about 1 year ago

Article

Introducing HELMET: Holistically Evaluating Long-context Language Models

hyen, gaotianyu1350, houminmin, kding1, danf, moshew, cdq10131 • Apr 16, 2025 • 42

Article

Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques

jmamou • Mar 24, 2025 • 20

upvoted a paper over 1 year ago

SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models

Paper • 2502.09390 • Published Feb 13, 2025 • 16
