QwQ-32B added to LiveBench: an open-source model small enough to run on an RTX 3090 outperforms Claude 3.7 Sonnet in most categories

https://livebench.ai/#/

The only categories it loses in are Coding (obviously) and Language.

EDIT: It might be even better than this. Apparently the Qwen team is absolutely certain their model beats R1 and has asked for a rerun, which will happen on Monday.

https://preview.redd.it/9luzfpwr4ene1.png?width=599&format=png&auto=webp&s=71cb2a763c25c7c902144f858b1333110e411386