Why do we still need qk_rope_head_dim when using NoPE?
#6
by jimmyhbx - opened
Hi,
Thanks for sharing this great model. Just curious: why does Kimi-Linear still need to set qk_rope_head_dim=64 when using NoPE?
@jimmyhbx The reason for this design is to stay compatible with all current inference engines and to keep divergence from the open-source ecosystem to a minimum.
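A minimal sketch of the idea (hypothetical code, not the actual Kimi-Linear implementation; the constant names and helper functions here are illustrative assumptions): the head layout still reserves 64 "RoPE" dimensions, but under NoPE the rotary transform is simply skipped, so kernels and engines written for RoPE-shaped models work unchanged.

```python
# Hypothetical illustration of keeping a RoPE-shaped head layout under NoPE.
# Dimension values below mirror the discussion (qk_rope_head_dim=64); the
# NoPE dimension is an assumed example value.
QK_NOPE_HEAD_DIM = 128
QK_ROPE_HEAD_DIM = 64  # kept for engine compatibility even with NoPE

def split_head(vec):
    """Split one attention head's features into NoPE and RoPE slices."""
    assert len(vec) == QK_NOPE_HEAD_DIM + QK_ROPE_HEAD_DIM
    return vec[:QK_NOPE_HEAD_DIM], vec[QK_NOPE_HEAD_DIM:]

def apply_rotary(rope_slice, use_rope=False):
    # Under NoPE the rotary transform is a no-op: the 64 reserved dims
    # pass through unchanged, so the tensor shapes an engine expects
    # from a RoPE model are preserved.
    if not use_rope:
        return rope_slice
    raise NotImplementedError("rotary math elided in this sketch")

head = list(range(QK_NOPE_HEAD_DIM + QK_ROPE_HEAD_DIM))
nope, rope = split_head(head)
print(len(nope), len(apply_rotary(rope)))  # 128 64
```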
yzhangcs changed discussion status to closed