Bench predictions for new Claude model(s)?
My guess is ~75 on livebench for coding (lower than o3-mini-high), but more capable at real-world coding tasks though. Curious to hear what you all are expecting.
My guess is ~75 on livebench for coding (lower than o3-mini-high), but more capable at real-world coding tasks though. Curious to hear what you all are expecting.