QwQ-32B added to LiveBench: an open-source model small enough to run on a 3090 outperforms Claude 3.7 Sonnet in most categories
The only categories it loses in are Coding (obviously) and Language.
EDIT: It might do even better. Apparently the Qwen team is absolutely certain their model is better than R1 and wants a rerun, which will happen on Monday.