

New Grok 3 release tops LLM leaderboards despite Musk-approved “based” opinions

Published on: February 19, 2025

Potential opinionated output aside, early reviews of Grok 3 seem to position the model family favorably against its competitors. For example, the model is currently topping the LMSYS Chatbot Arena leaderboard, which ranks AI language models in a blind popularity vibemarking contest.

Screenshot of a tweet from Elon Musk showing a Grok 3 response. Credit: X


AI researcher Andrej Karpathy tested Grok 3 and wrote on X, “As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI’s strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago, this timescale to state of the art territory is unprecedented.”



X Premium+ subscribers paying $50 monthly will receive first access to Grok 3. Leaks suggest a new SuperGrok plan will be $30 monthly or $300 annually, providing subscribers with additional features including unlimited image generation.

A multi-model family

Like AI models from other companies, the Grok 3 family contains several models, including a smaller "mini" version that trades accuracy for speed. xAI claims that Grok 3 outperforms OpenAI’s GPT-4o on certain mathematics and science benchmarks, including AIME, which is drawn from a competition mathematics exam, and GPQA, which tests graduate-level physics, biology, and chemistry knowledge.

Two models in the family, Grok 3 Reasoning and Grok 3 mini Reasoning, incorporate simulated reasoning features similar to OpenAI’s o3-mini and DeepSeek’s R1 models. Users can access these through a “Think” command or “Big Brain” mode in the Grok app. In addition, the Grok app now includes “DeepSearch,” a research tool that searches the internet and the X platform to create summaries of information, similar to Google’s and OpenAI’s Deep Research features.

xAI plans to add voice synthesis to the Grok app within a week and launch an enterprise API with DeepSearch capabilities in the following weeks. The company says it will also open-source the previous Grok 2 model once Grok 3 stabilizes, which Musk estimates will take several months.

This article was updated on February 19, 2025 at 6:53 AM to better contextualize Elon Musk’s post about Grok 3.