New Delhi: The AI race has been heating up all year, with OpenAI, Anthropic, and Google pushing out new frontier models. But this week a quieter player shook things up again: Chinese startup DeepSeek released its biggest model yet, DeepSeek-V3.1, on Hugging Face with little fanfare. At 685 billion parameters, it has already sparked debate in global AI circles.
What caught attention wasn’t just the scale but also its hybrid design. Instead of splitting reasoning and normal tasks into different models, DeepSeek merged both into one. For many developers, that’s the kind of efficiency they’ve been waiting for. And given its open-source MIT license, it has suddenly become accessible in ways that most U.S. models are not.
DeepSeek calls this a hybrid thinking model. It can switch between “thinking mode” and “non-thinking mode” by simply changing the chat template. That means the same model can handle casual queries and then shift into deeper reasoning when needed.
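For developers, that switch is exposed at the template level. The sketch below assumes the standard Hugging Face transformers tokenizer for the model and a `thinking` flag in its chat template; the flag name is an assumption here, so check the model card for the exact interface.

```python
# Minimal sketch: toggling DeepSeek-V3.1 between "thinking" and
# "non-thinking" mode via the Hugging Face chat template.
# The `thinking` keyword is an assumption about the template's interface;
# consult the model card on Hugging Face for the exact argument name.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [{"role": "user", "content": "Summarise this article in two lines."}]

# Non-thinking mode: quick, chat-style answers.
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False
)

# Thinking mode: the template inserts reasoning tokens before the final answer.
prompt_reasoning = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)
```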
A few standout features:
- A single 685-billion-parameter model that activates only a fraction of its parameters for each query, thanks to a “Mixture of Experts” architecture
- Hybrid “thinking” and “non-thinking” modes, switched through the chat template
- Long-context support extended to 128K tokens
- Open weights on Hugging Face under the MIT license

This “Mixture of Experts” design is key. It’s how DeepSeek manages to deliver strong results without forcing developers to pay a fortune in compute: each token is routed to only a handful of specialised “experts” rather than through the full network, as the sketch below illustrates.
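To make the idea concrete, here is a minimal, self-contained routing sketch in PyTorch. It is not DeepSeek’s implementation, and the sizes, expert count, and top-k value are placeholders; it only shows why a huge parameter count does not mean every parameter runs on every token.

```python
# Toy Mixture-of-Experts layer: a learned router sends each token to its
# top-k experts, so only a small slice of the total parameters is active
# per token. Dimensions and k are illustrative placeholders.
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (tokens, d_model)
        scores = self.router(x)                             # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # pick k experts per token
        weights = weights.softmax(dim=-1)                   # mixing weights for the k picks
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for expert_id, expert in enumerate(self.experts):
                mask = chosen[:, slot] == expert_id          # tokens routed here in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)   # torch.Size([10, 64]); only ~2 of 8 experts ran per token
```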
Early numbers look promising. DeepSeek-V3.1 scored 71.6% on the Aider coding benchmark, putting it slightly ahead of Anthropic’s Claude Opus 4. And the cost difference is jaw-dropping. According to reports, completing a coding task with DeepSeek costs around $1.01, compared to nearly $70 for proprietary competitors.
On reasoning tasks, testers claim it solved tough problems like the “bouncing ball in a rotating shape,” a challenge that often trips up even advanced models. It also showed strength in maths, building on the success of its predecessor, which had already done well on AIME and MATH-500.
DeepSeek isn’t just scaling up; it’s also tweaking its training recipe. V3.1 was built on top of the earlier V3 base using a two-phase long-context extension: the 32K phase was stretched to 630B tokens, while the 128K phase reached 209B tokens. It also uses the new UE8M0 FP8 scale format for training, which keeps it compatible with modern microscaling hardware.
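DeepSeek’s training code is not public, so the snippet below is only a format illustration: UE8M0 denotes an unsigned, exponent-only 8-bit scale, i.e. a shared power of two for a block of FP8 values, in the spirit of microscaling formats. The bias of 127 and the FP8 E4M3 maximum of 448 are conventional choices assumed here, not details taken from DeepSeek’s pipeline.

```python
# Format illustration only: UE8M0 as an unsigned, exponent-only 8-bit
# block scale (a power of two shared by a block of FP8 values).
# Bias 127 and the E4M3 max of 448 are conventional assumptions.
import math

def encode_ue8m0(scale: float, bias: int = 127) -> int:
    """Round a positive scale up to the next power of two and store the
    biased exponent in a single unsigned byte (0..255)."""
    exponent = math.ceil(math.log2(scale))
    return max(0, min(255, exponent + bias))

def decode_ue8m0(byte: int, bias: int = 127) -> float:
    """Recover the power-of-two scale from the stored byte."""
    return 2.0 ** (byte - bias)

# One shared scale per block, chosen so the block's largest value fits in FP8 E4M3.
block = [0.03, -1.7, 42.5, 7.9]
scale_byte = encode_ue8m0(max(abs(v) for v in block) / 448.0)
scaled = [v / decode_ue8m0(scale_byte) for v in block]   # now within E4M3 range
print(scale_byte, scaled)
```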
For context, the final training run of DeepSeek’s earlier V3 model was reported to cost just $5.6 million. Even if V3.1’s bill came in higher, that is still remarkably cheap compared with U.S. labs that spend hundreds of millions.
The release timing is telling. DeepSeek-V3.1 landed just weeks after OpenAI’s GPT-5 and Anthropic’s Claude Opus 4.1. By putting out a near-competitive open-weight model, DeepSeek is directly challenging the closed, expensive approach of U.S. firms.
Sam Altman himself admitted to CNBC recently that competition from Chinese open-source models pushed OpenAI to release its own open weights. And with DeepSeek climbing Hugging Face’s trending charts within hours, it’s clear the community is paying attention to technical merit above politics.
The real question now is whether DeepSeek has shelved its DeepSeek-R2 plans, since the hybrid design already covers reasoning. Its upcoming technical report may provide answers.
For now, DeepSeek-V3.1 is available on Hugging Face under the MIT license. Variants are expected, but details remain unclear. Developers are already experimenting, and industry analysts say this could mark another turning point in open-source AI.