Why do we still need qk_rope_head_dim when using NoPE?
#6
by jimmyhbx · opened
Hi,
Thanks for sharing this great model. Just curious: why does Kimi-Linear still need to set qk_rope_head_dim=64 when using NoPE?
The reason for this design is compatibility: keeping qk_rope_head_dim set lets the model run on all current inference engines with minimal divergence from existing open-source implementations. @jimmyhbx
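To illustrate the compatibility point, here is a minimal sketch of the head-dimension split used in MLA-style attention layers. The parameter values, function name, and shapes below are illustrative assumptions, not taken from the Kimi-Linear source: the idea is that keeping qk_rope_head_dim=64 preserves the tensor layout existing engines expect, even when rotary embeddings are effectively unused.

```python
import numpy as np

# Assumed values for illustration only (not Kimi-Linear's actual config).
qk_nope_head_dim = 128  # position-free part of each query/key head
qk_rope_head_dim = 64   # rotary part, kept so engine tensor shapes match

def split_query(q: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a per-head query vector into (nope, rope) parts.

    With NoPE, the rope part may simply skip the rotary transform,
    but keeping its slot in the layout avoids reshaping the kernels
    that inference engines already ship for MLA-style attention.
    """
    return q[:qk_nope_head_dim], q[qk_nope_head_dim:]

q = np.arange(qk_nope_head_dim + qk_rope_head_dim, dtype=np.float32)
q_nope, q_rope = split_query(q)
print(q_nope.shape, q_rope.shape)  # (128,) (64,)
```

In other words, the field survives in the config so that off-the-shelf kernels and serving stacks work unmodified, not because rotary embeddings are functionally required.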
yzhangcs changed discussion status to closed