head_dim * num_attention_heads != hidden_size
#4 opened 2 months ago by zhangchuanhu
[Evaluation] Best practice for evaluating Qwen3!!
#2 opened 6 months ago by wangxingjun778
Add languages tag
#1 opened 6 months ago by de-francophones