Why do we still need qk_rope_head_dim when using NoPE?

#6
by jimmyhbx - opened

Hi,

Thanks for sharing this great model. Just curious: why does Kimi-Linear still need to set qk_rope_head_dim=64 when it uses NoPE?

Moonshot AI org

@jimmyhbx The reason for this design is to reuse all of the current inference engines and keep the divergence from the open-source ecosystem minimal.

https://github.com/vllm-project/vllm/pull/27809/files#diff-6ffcb4f51daf85df32c7d35433c3393f1602663960f677ae61f55af1ed3ab524R150
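In other words, MLA-style kernels and KV-cache layouts already reserve a fixed-width "rope" slice of each query/key head, so keeping qk_rope_head_dim=64 lets the model drop into those code paths unchanged; with NoPE the rotation is simply not applied to that slice. Below is a minimal sketch of that idea. All names, dimensions, and the apply_rope_maybe helper are illustrative assumptions, not the actual Kimi-Linear or vLLM code:

```python
import torch

# Assumed head split (illustrative, not the real config values):
# MLA-style kernels expect each q/k head as [nope part | rope part].
qk_nope_head_dim = 128   # position-free portion
qk_rope_head_dim = 64    # slice the kernel reserves for RoPE

def apply_rope_maybe(q_rope, k_rope, cos, sin, use_nope=True):
    """Hypothetical helper: with NoPE, skip the rotation entirely.
    The 64-dim slice still exists, so the KV-cache layout matches
    what existing MLA kernels expect."""
    if use_nope:
        return q_rope, k_rope  # identity: no positional encoding

    # Standard rotary embedding on the reserved slice otherwise.
    def rotate_half(x):
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    q_out = q_rope * cos + rotate_half(q_rope) * sin
    k_out = k_rope * cos + rotate_half(k_rope) * sin
    return q_out, k_out

# The full head is assembled as [nope | rope] either way, so the
# downstream kernel sees the same shape with or without RoPE.
batch, heads, seq = 1, 4, 16
q_nope = torch.randn(batch, heads, seq, qk_nope_head_dim)
q_rope = torch.randn(batch, heads, seq, qk_rope_head_dim)
cos = torch.ones(seq, qk_rope_head_dim)
sin = torch.zeros(seq, qk_rope_head_dim)

q_rope, _ = apply_rope_maybe(q_rope, q_rope, cos, sin, use_nope=True)
q = torch.cat([q_nope, q_rope], dim=-1)
print(q.shape)  # torch.Size([1, 4, 16, 192])
```

The design choice this illustrates: keeping the rope slice in the config costs nothing at inference when NoPE makes it an identity, but it means engines like vLLM can serve the model with their existing MLA attention paths instead of a custom branch.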

yzhangcs changed discussion status to closed
