For some weird reason google ai studio is different from ALL quantizations

#11
by aguspiza - opened

The gemma3n:e4b GGUF quantizations almost never reply to this prompt correctly:

"calculate 4095+4095 and write the result in hexadecimal"

Meanwhile the Google AI Studio version nails it ALWAYS, and even the e2b version does it perfectly sometimes.

After testing it in AI Studio several times it finally failed... so I guess AI Studio is running the full-precision model and it just fails less.
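For reference, the expected answer to the test prompt is easy to verify with a one-liner (4095 is 0xFFF, so the sum is 0x1FFE):

```python
# Ground truth for the test prompt: "calculate 4095+4095
# and write the result in hexadecimal".
result = 4095 + 4095
print(result)        # decimal sum
print(hex(result))   # hexadecimal form the model should produce
```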

Google org

Hi,

Apologies for the late reply, and thanks for bringing this to our attention. Your observation that the e2b version only "does it perfectly sometimes" while the AI Studio version "nails it ALWAYS" is consistent with quantization error being the cause. The e2b quantization is even more aggressive than e4b, leading to more frequent errors, while the full-precision model in Google AI Studio has no such numerical limitations and consistently produces the correct answer.

This is a well-known issue in the field of LLM quantization. The goal is to find the best balance between a model's size/speed and its performance on critical tasks. For tasks requiring high-fidelity numerical precision, a heavily quantized model may not be the right tool for the job; we recommend using the full-precision model for numerical tasks where precision is crucial.
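The size/accuracy trade-off described above can be illustrated with a toy sketch of symmetric uniform quantization (this is a simplified illustration, not the actual GGUF quantization scheme, which uses block-wise scales): the fewer bits available, the coarser the grid the weights are rounded to, and the larger the reconstruction error.

```python
import numpy as np

def quantize_dequantize(w, bits=4):
    # Symmetric uniform quantization: round each weight to one of
    # 2^(bits-1)-1 levels per sign, then map back to floats.
    # The difference from the original is the precision lost.
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale

w = np.array([0.013, -0.5, 0.731, 0.999])  # toy "weights"
for bits in (8, 4, 2):
    err = np.max(np.abs(w - quantize_dequantize(w, bits)))
    print(f"{bits}-bit max error: {err:.4f}")
```

Running this shows the maximum rounding error growing as the bit width shrinks, which is the same mechanism that makes aggressively quantized checkpoints fail on precision-sensitive prompts more often than full-precision ones.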

Thanks.
