Eval requests: G4 E4B/MoE ARA vs SOMA
Hey, DontPlanToEnd, lad. I have risen from the dead to nag you with the following models. As you are aware, I have a thing for the tiny E4B multimodal model and for comparing and contrasting different ablation methods. Both were cooked on basically the same set of markers and score about the same on the PIQA benchmark. However, UGI should shed a brighter light on them for drawing conclusions.
https://huggingface.co/MuXodious/gemma-4-E4B-it-ARA-heresy
https://huggingface.co/MuXodious/gemma-4-E4B-it-SOMPOA-heresy
Highly optional at this time, as I have yet to cook their comparison counterpart and am unsure when I'll have it done:
https://huggingface.co/MuXodious/gemma-4-26B-A4B-it-ARA-heresy
https://huggingface.co/MuXodious/gemma-4-26B-A4B-it-SOMPOA-heresy
MuXodious/gemma-4-E4B-it-ARA-heresy and MuXodious/gemma-4-E4B-it-SOMPOA-heresy are giving me errors like this for some reason:
"""
ValueError: Following weights were not initialized from checkpoint: {'language_model.model.layers.33.self_attn.k_norm.weight', 'language_model.model.layers.31.self_attn.k_norm.weight', 'language_model.model.layers.38.self_attn.k_norm.weight'
...
"""
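In case it helps narrow this down, here is a minimal sketch of how the missing names could be summarized per layer. It assumes you already have two collections of tensor names: the ones the loader expects and the ones actually stored in the checkpoint. The function name `missing_by_suffix` and the sample name sets are hypothetical and only illustrate the idea; the three `k_norm` names are the ones from the error above, and the `q_norm` entry is made up for the example.

```python
import re
from collections import defaultdict


def missing_by_suffix(expected_names, checkpoint_names):
    """Map each missing parameter suffix to the sorted layer indices lacking it."""
    missing = set(expected_names) - set(checkpoint_names)
    layer_pattern = re.compile(r"layers\.(\d+)\.(.+)$")
    grouped = defaultdict(list)
    for name in missing:
        match = layer_pattern.search(name)
        if match:
            # group(1) is the layer index, group(2) is the parameter suffix
            grouped[match.group(2)].append(int(match.group(1)))
        else:
            # Not a per-layer tensor: keep the full name, no layer index
            grouped[name].append(-1)
    return {suffix: sorted(layers) for suffix, layers in grouped.items()}


# Hypothetical inputs: the k_norm names come from the error message,
# the q_norm name is invented to show a tensor that is present.
expected = {
    "language_model.model.layers.31.self_attn.k_norm.weight",
    "language_model.model.layers.33.self_attn.k_norm.weight",
    "language_model.model.layers.38.self_attn.k_norm.weight",
    "language_model.model.layers.31.self_attn.q_norm.weight",
}
in_checkpoint = {
    "language_model.model.layers.31.self_attn.q_norm.weight",
}

report = missing_by_suffix(expected, in_checkpoint)
print(report)
```

If every missing entry turns out to share one suffix, as in this toy input, that would point at a single tensor type being dropped when the weights were saved rather than a broken download, though that is only a guess from the truncated error.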
Added the 26Bs