Controversy Over US Government's Undervaluation of Chinese AI: The Gap Between NIST Reports and Technical Reality
A NIST evaluation report released on May 4, 2026, is facing criticism for underestimating Chinese AI capabilities. Data from independent organizations such as Stanford HAI put the performance gap between the top US and Chinese models at only 2.7%, and the rapid progress of DeepSeek and Alibaba is challenging US technological dominance.
On May 4, 2026, the Center for AI Standards and Innovation (CAISI), part of the US National Institute of Standards and Technology (NIST), released evaluation results stating that major Chinese AI models still lag US models by a significant margin. The tech community and industry experts have met the announcement with strong skepticism. Critics argue that NIST's evaluation methodology was designed to favor US frontier models and deliberately ignored the rapid efficiency gains achieved by Chinese labs such as DeepSeek and Alibaba.
When assessing DeepSeek V4 Pro, the CAISI evaluation applied a 'cost-performance filter' that excluded every major US model except GPT-5.4 mini from the comparison. Critics describe this as a convenient methodology tailored to policy objectives rather than one that reflects technical reality.
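To illustrate why the choice of filter matters, here is a minimal sketch of how a cost-based filter can change which models end up in a comparison. It does not reproduce CAISI's actual method, which the report excerpted here does not spell out. The `Model` class, the `cost_performance_filter` function, the `max_cost_ratio` parameter, and every name, score, and price below are hypothetical placeholders, not real figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    score: float             # hypothetical benchmark score (0-100)
    usd_per_m_tokens: float  # hypothetical price per million tokens


def cost_performance_filter(candidates, reference, max_cost_ratio=1.0):
    """Keep only candidates priced at or below max_cost_ratio x the reference price.

    Assumed rule for illustration only; the real filter may be defined differently.
    """
    ceiling = reference.usd_per_m_tokens * max_cost_ratio
    return [m for m in candidates if m.usd_per_m_tokens <= ceiling]


# Placeholder data: none of these numbers describe real models.
reference = Model("reference_model", score=80.0, usd_per_m_tokens=2.0)
candidates = [
    Model("frontier_a", score=92.0, usd_per_m_tokens=15.0),
    Model("frontier_b", score=90.0, usd_per_m_tokens=10.0),
    Model("small_c", score=78.0, usd_per_m_tokens=1.5),
]

kept = cost_performance_filter(candidates, reference)
best_overall = max(candidates, key=lambda m: m.score)
best_kept = max(kept, key=lambda m: m.score)

# Score gap of the best candidate relative to the reference, before and after filtering.
gap_unfiltered = best_overall.score - reference.score
gap_filtered = best_kept.score - reference.score

print([m.name for m in kept])
print(gap_unfiltered, gap_filtered)
```

With these placeholder numbers, only the cheapest candidate survives the filter, and the measured gap against the reference model changes sign depending on whether the filter is applied. The point is only that the comparison set, not just the benchmark, shapes the headline result.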
NIST's latest evaluation aligns with earlier analysis from the Brookings Institution, which suggested that Chinese AI models are several months behind the most advanced US models. Brookings argues that US models maintain a lead across a wide range of benchmarks, including mathematical reasoning, code generation, and long-horizon agentic tasks. However, the view shared by NIST and Brookings is significantly out of step with the technical indicators observed in the market.
Stanford HAI's 2026 Reality Check: A 2.7% Gap
According to the AI Index Report released in March 2026 by the Stanford Institute for Human-Centered AI (HAI), the performance gap between the top-tier US and Chinese models is only 2.7%. Since early 2025, models from the two countries have repeatedly traded places in performance rankings. In particular, since DeepSeek-R1 recorded performance on par with the top US models in February 2025, the gap has stayed in the single digits, and the prevailing analysis is that the two countries have effectively reached technical parity.
- DeepSeek V3.2 outperformed GPT-4.5 in mathematical reasoning and coding evaluations.
- Alibaba's Qwen 3 235B model received an S-tier rating on the Open LLM Leaderboard, gaining recognition for its technical reasoning capabilities.
- Meta's Llama 4 Scout remains an open-source leader with its 10-million-token context window, but Chinese models are closing in on it quickly.
- Chinese AI startup MiniMax raised $619 million in a Hong Kong IPO in January 2026.
Despite US chip export controls, Chinese companies are working around hardware limits by using highly efficient architectures and 'regulation-compliant' chips. Nvidia's H20 chip was designed to stay under the performance thresholds set by US export rules, yet it has occasionally outpaced the H100 on inference workloads. Chinese labs are maximizing algorithmic efficiency to overcome these hardware constraints, which in turn erodes the US advantage in computing resources.
As the gap between technical reality and government evaluations grows, the US Congress has stepped in to demand a more objective assessment. On May 2, 2026, US lawmakers began pushing legislation that would mandate the first comprehensive review of China's AI capabilities. Under the bill, the resulting report must be submitted within 180 days of the passage of the fiscal year 2027 national security and State Department budget. The move reflects distrust of existing government evaluations and suggests that the 180-day review could be a decisive moment in reshaping US technology strategy toward China.
Infrastructure Gap: The Last Bastion Maintained by the US
While the gap in model performance has narrowed, the US still holds a strong lead in data center infrastructure and in the scale of capital investment. According to a report by MeriTalk, the US leads China in securing large-scale computing resources and power infrastructure, a core competitive advantage for training next-generation models. Although China's technical efficiency is offsetting its hardware shortage, the absolute difference in the scale of physical infrastructure remains a hurdle China has yet to clear.
US companies are pouring enormous sums into expanding infrastructure, in contrast to the difficulties Chinese startups face in private capital markets. Although companies like MiniMax continue to raise funds successfully, the pace of infrastructure investment led by US Big Tech is regarded as a national-level strategic asset. As model performance levels out, this infrastructure advantage is likely to become the only factor that differentiates the US.
Ultimately, the results of the proposed 180-day comprehensive review are expected to be a watershed for the future strategic direction of the US. Whether the US tightens existing export controls to accelerate a technological blockade, or shifts its focus to accelerating domestic infrastructure and technical innovation, depends on this report. The resilience and efficiency shown by Chinese AI models pose fundamental questions for the US strategy of maintaining technological dominance.
| Model | Developer | Key Strength | Benchmark Status |
|---|---|---|---|
| DeepSeek V3.2 | DeepSeek (China) | Mathematics & Coding | Surpassed GPT-4.5 |
| GPT-5.4 mini | OpenAI (US) | General Knowledge | NIST CAISI Baseline |
| Qwen 3 235B | Alibaba (China) | Technical Reasoning | S-Tier Open LLM Leaderboard |
| Llama 4 Scout | Meta (US) | Context Window (10M) | Open-Source Leader |
Comparison of top Chinese and US models across specialized benchmarks.



This content is for information and commentary only and is not investment advice.