BigCodeArena
Compare two AI models by sending them code and seeing their responses
Unveiling More Reliable Human Preferences in Code Generation via Execution