> ZeroEval's leaderboard on Hugging Face [0] actually shows that it beats even C...

nunodonato · on Aug 7, 2024

Which is a good hint that that benchmark sucks. No way 4o beats Sonnet 3.5.

lauralex · on Aug 7, 2024

Sonnet 3.5 has a lot of alignment issues. Many times it refused to answer simple coding questions I asked, just because it considered them "unsafe". 4o is much more relaxed. Regarding math, though, Sonnet is a bit better than 4o.

ashu1461 · on Aug 7, 2024

I think they have secretly released something that is better than 4. In our internal benchmarks, too, 4o mini is performing better than 4o.

rvnx · on Aug 7, 2024

The weirdest part is that, according to these benchmarks, the best model ultimately is supposed to be Gemini Pro.