New Delhi: The AI race has been heating up all year, with OpenAI, Anthropic, and Google pushing out new frontier models. But this week a quieter player shook things up again: Chinese startup DeepSeek released its biggest model yet, DeepSeek-V3.1, on Hugging Face with little fanfare. At 685 billion parameters, it has already sparked debate in global AI circles.
What caught attention wasn’t just the scale but also its hybrid design. Instead of splitting reasoning and normal tasks into different models, DeepSeek merged both into one. For many developers, that’s the kind of efficiency they’ve been waiting for. And given its open-source MIT license, it has suddenly become accessible in ways that most U.S. models are not.
DeepSeek calls this a hybrid thinking model. It can switch between “thinking mode” and “non-thinking mode” by simply changing the chat template. That means the same model can handle casual queries and then shift into deeper reasoning when needed.
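For developers, that switch is exposed at the template level. The sketch below assumes the standard Hugging Face transformers tokenizer for the model and a `thinking` flag in its chat template; the flag name is an assumption here, so check the model card for the exact interface.

```python
# Minimal sketch: toggling DeepSeek-V3.1 between "thinking" and
# "non-thinking" mode via the Hugging Face chat template.
# The `thinking` keyword is an assumption about the template's interface;
# consult the model card on Hugging Face for the exact argument name.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [{"role": "user", "content": "Summarise this article in two lines."}]

# Non-thinking mode: quick, chat-style answers.
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False
)

# Thinking mode: the template inserts reasoning tokens before the final answer.
prompt_reasoning = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)
```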
A few standout features:
- A single 685-billion-parameter model that activates only a fraction of its parameters for each query, thanks to a “Mixture of Experts” architecture
- Hybrid “thinking” and “non-thinking” modes, switched through the chat template
- Long-context support extended to 128K tokens
- Open weights on Hugging Face under the MIT license

This “Mixture of Experts” design is key. It’s how DeepSeek manages to deliver strong results without forcing developers to pay a fortune in compute: each token is routed to only a handful of specialised “experts” rather than through the full network, as the sketch below illustrates.
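To make the idea concrete, here is a minimal, self-contained routing sketch in PyTorch. It is not DeepSeek’s implementation, and the sizes, expert count, and top-k value are placeholders; it only shows why a huge parameter count does not mean every parameter runs on every token.

```python
# Toy Mixture-of-Experts layer: a learned router sends each token to its
# top-k experts, so only a small slice of the total parameters is active
# per token. Dimensions and k are illustrative placeholders.
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (tokens, d_model)
        scores = self.router(x)                             # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # pick k experts per token
        weights = weights.softmax(dim=-1)                   # mixing weights for the k picks
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for expert_id, expert in enumerate(self.experts):
                mask = chosen[:, slot] == expert_id          # tokens routed here in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)   # torch.Size([10, 64]); only ~2 of 8 experts ran per token
```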
Early numbers look promising. DeepSeek-V3.1 scored 71.6% on the Aider coding benchmark, putting it slightly ahead of Anthropic’s Claude Opus 4. And the cost difference is jaw-dropping. According to reports, completing a coding task with DeepSeek costs around $1.01, compared to nearly $70 for proprietary competitors.
On reasoning tasks, testers claim it solved tough problems like the “bouncing ball in a rotating shape,” a challenge that often trips up even advanced models. It also showed strength in maths, building on the success of its predecessor, which had already done well on AIME and MATH-500.
DeepSeek isn’t just scaling up; it’s also tweaking its training recipe. V3.1 was built on top of the earlier V3 base using a two-phase long-context extension: the 32K phase was stretched to 630B tokens, while the 128K phase reached 209B tokens. It also uses the new UE8M0 FP8 scale format for training, which keeps it compatible with modern microscaling hardware.
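DeepSeek’s training code is not public, so the snippet below is only a format illustration: UE8M0 denotes an unsigned, exponent-only 8-bit scale, i.e. a shared power of two for a block of FP8 values, in the spirit of microscaling formats. The bias of 127 and the FP8 E4M3 maximum of 448 are conventional choices assumed here, not details taken from DeepSeek’s pipeline.

```python
# Format illustration only: UE8M0 as an unsigned, exponent-only 8-bit
# block scale (a power of two shared by a block of FP8 values).
# Bias 127 and the E4M3 max of 448 are conventional assumptions.
import math

def encode_ue8m0(scale: float, bias: int = 127) -> int:
    """Round a positive scale up to the next power of two and store the
    biased exponent in a single unsigned byte (0..255)."""
    exponent = math.ceil(math.log2(scale))
    return max(0, min(255, exponent + bias))

def decode_ue8m0(byte: int, bias: int = 127) -> float:
    """Recover the power-of-two scale from the stored byte."""
    return 2.0 ** (byte - bias)

# One shared scale per block, chosen so the block's largest value fits in FP8 E4M3.
block = [0.03, -1.7, 42.5, 7.9]
scale_byte = encode_ue8m0(max(abs(v) for v in block) / 448.0)
scaled = [v / decode_ue8m0(scale_byte) for v in block]   # now within E4M3 range
print(scale_byte, scaled)
```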
For context, the final training run of DeepSeek’s earlier V3 model was reported to cost just $5.6 million. Even if V3.1’s bill came in higher, that is still remarkably cheap compared with U.S. labs that spend hundreds of millions.
The release timing is telling. DeepSeek-V3.1 landed just weeks after OpenAI’s GPT-5 and Anthropic’s Claude Opus 4.1. By putting out a near-competitive open-weight model, DeepSeek is directly challenging the closed, expensive approach of U.S. firms.
Sam Altman himself admitted to CNBC recently that competition from Chinese open-source models pushed OpenAI to release its own open weights. And with DeepSeek climbing Hugging Face’s trending charts within hours, it’s clear the community is paying attention to technical merit above politics.
The real question now is whether DeepSeek has shelved its DeepSeek-R2 plans, since the hybrid design already covers reasoning. Its upcoming technical report may provide answers.
For now, DeepSeek-V3.1 is available on Hugging Face under the MIT license. Variants are expected, but details remain unclear. Developers are already experimenting, and industry analysts say this could mark another turning point in open-source AI.