DeepSeek-V3.2 scores big in maths and coding, beats GPT-5 in key tests
DeepSeek-V3.2 has shown strong gains in reasoning and coding tasks, including top scores in global competitions like AIME and Codeforces. The Speciale version even surpasses GPT-5-High in several key benchmarks. With lower compute needs, the model could attract fast adoption in India.
New Delhi: China’s DeepSeek has rolled out DeepSeek-V3.2, and the company says the model combines strong reasoning, advanced agent abilities, and much greater compute efficiency.
DeepSeek says the model delivers “high computational efficiency with superior reasoning and agent performance.” The charts shared by the company show DeepSeek-V3.2 outperforming GPT-5-High and Claude-4.5-Sonnet in several reasoning tests and agent tasks.
DeepSeek-V3.2 aims to solve reasoning and tool-use together
The model builds on three big changes in its design. DeepSeek calls the first one DeepSeek Sparse Attention. The system cuts the compute load for long content while keeping accuracy high. That could help with large legal texts, long code files, or enterprise chat sessions.
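To make the idea concrete, here is a minimal sketch of top-k sparse attention in PyTorch. It shows the general principle of letting each query attend only to its most relevant positions; the function name, shapes, and top-k strategy are illustrative assumptions, not DeepSeek's actual DSA implementation, which would use optimised kernels that never build the full score matrix.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    # Toy illustration: each query keeps only its top_k highest-scoring
    # keys. A real sparse kernel would skip the dense score computation
    # entirely; this sketch computes it and masks for clarity.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # full (seq, seq) scores
    top_k = min(top_k, scores.size(-1))
    vals, idx = scores.topk(top_k, dim=-1)        # k best keys per query
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, idx, vals)                # -inf everywhere else
    weights = F.softmax(masked, dim=-1)           # softmax over kept keys only
    return weights @ v

q = k = v = torch.randn(1024, 64)                 # 1,024 tokens, 64-dim head
print(topk_sparse_attention(q, k, v, top_k=32).shape)  # torch.Size([1024, 64])
```

The payoff of designs like this is that attention cost stops scaling with the full sequence length, which is why long documents and long chat sessions benefit most.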
Second, the company has scaled up its reinforcement learning approach. This helps the model solve maths, logic, and competition-style problems better. DeepSeek said its high-compute variant Speciale “surpasses GPT-5” and performs close to Gemini-3.0-Pro in reasoning benchmarks.
The company highlighted one milestone in particular. It said the model achieved “Gold-medal performance in the 2025 International Mathematical Olympiad (IMO) and International Olympiad in Informatics (IOI).”
The third innovation is a new pipeline for training agent behaviour. It creates huge amounts of synthetic data for tool-based tasks, such as automated coding, browser actions, and file search.
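As a rough illustration of what such a pipeline might emit, the sketch below builds one hypothetical synthetic tool-use training record. Every field name, tool name, and structure here is invented for illustration; DeepSeek has not published this schema.

```python
import json
import random

# Hypothetical tool inventory; names are invented, not DeepSeek's.
TOOLS = ["run_shell", "browser_click", "search_files"]

def make_synthetic_trajectory(task_id: int) -> dict:
    # One synthetic record: an instruction, a simulated tool call,
    # the simulated tool output, and a final answer to learn from.
    tool = random.choice(TOOLS)
    return {
        "task_id": task_id,
        "instruction": f"Example task #{task_id} requiring the {tool} tool",
        "steps": [
            {"role": "assistant", "tool_call": {"name": tool, "arguments": {}}},
            {"role": "tool", "name": tool, "content": "<simulated tool output>"},
            {"role": "assistant", "content": "<final answer using the output>"},
        ],
    }

# Pipelines like this generate records at scale, then filter for quality.
dataset = [make_synthetic_trajectory(i) for i in range(1000)]
print(json.dumps(dataset[0], indent=2))
```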
DeepSeek-V3.2 benchmarks
Charts released by the team show the Speciale version leading in multiple tests:
• 96 percent Pass@1 in AIME 2025
• 99.2 percent in HMMT 2025
• Codeforces score of 2701
• 46.4 percent on Terminal Bench 2.0
• 80.3 percent on T2 Bench
In Codeforces, it ranks higher than GPT-5-High and very close to Gemini-3.0-Pro. In hands-on coding tasks like SWE-bench Verified, it is again in the top cluster.
Codeforces and software engineering benchmarks reflect real industry work such as debugging code, shipping fixes, or verifying production systems.
Changes for developers using the model
DeepSeek has also updated how the model formats conversations. It introduced a new “thinking with tools” flow and a separate “developer” role meant only for search-agent scenarios. The company shared Python code samples but warned that the output parser “is not suitable for production use without robust error handling.”
There is no Jinja template provided in this release.
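For context, a request using the new developer role might look something like the sketch below, assuming an OpenAI-compatible chat endpoint. The URL, model identifier, and placeholder API key are assumptions for illustration, not taken from DeepSeek's documentation.

```python
import requests

# Assumed OpenAI-compatible endpoint and model name; check DeepSeek's
# own docs before using these values.
API_URL = "https://api.deepseek.com/chat/completions"
payload = {
    "model": "deepseek-v3.2",  # hypothetical model identifier
    "messages": [
        # The new "developer" role, per the release, is meant for
        # search-agent scenarios rather than general chat.
        {"role": "developer", "content": "You are a search agent. Cite sources."},
        {"role": "user", "content": "Find the latest LTS release of Node.js."},
    ],
}
resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": "Bearer <API_KEY>"},  # placeholder key
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Production code would wrap the response parsing in error handling, which is exactly the caveat DeepSeek attached to its own samples.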
Focus on decentralised verification and trust
The company has released the model’s full answer submissions for IMO 2025, IOI, the ICPC World Finals, and CMO 2025 for the community to inspect. This allows independent checking of the exact work the model did, and it suggests DeepSeek wants credibility in reasoning rather than only leaderboard claims.
What it means for the AI race
DeepSeek-V3.2 looks designed to challenge US firms on two fronts at once: better reasoning and lower compute cost. That could appeal to India, where startups and institutions want stronger models without expensive GPUs.