
Alibaba Launches Qwen 3.5: Open-Weight Model Claims to Beat GPT-5.2 and Claude on 80% of Benchmarks

Alibaba releases Qwen 3.5, a 397-billion-parameter mixture-of-experts model with visual agentic capabilities, claiming benchmark superiority over GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro while costing 60% less to run than its predecessor.


TechDrop Editorial


On February 16, 2026, Alibaba released Qwen 3.5, a mixture-of-experts model with 397 billion total parameters and 17 billion active parameters per inference pass. The model claims to outperform OpenAI's GPT-5.2, Anthropic's Claude Opus 4.5, and Google's Gemini 3 Pro on 80% of evaluated benchmarks — though these are self-reported results that have not been independently verified.

Architecture and Performance

Qwen 3.5 uses a mixture-of-experts (MoE) architecture, which routes each input token to a small subset of the model's parameters rather than activating all of them. With 17 billion active parameters out of 397 billion total — roughly 4% of the model engaged per token — Qwen 3.5 achieves a favorable trade-off between capability and compute cost: it delivers frontier-class performance while costing 60% less to run than its predecessor and processing large workloads 8x faster.
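The routing idea can be sketched in a few lines. This is a generic top-k gating illustration, not Qwen 3.5's actual router: Alibaba has not disclosed the expert count, k, or gating function assumed here, and the scores below are made up.

```python
import math

def top_k_route(router_scores, k=2):
    """Select the k highest-scoring experts for one token and
    softmax-normalize their scores into mixing weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:k]
    exp_scores = [math.exp(router_scores[i]) for i in chosen]
    total = sum(exp_scores)
    weights = [s / total for s in exp_scores]
    return chosen, weights

# Hypothetical router scores for one token over 8 experts.
scores = [0.1, 2.3, -0.5, 1.7, 0.0, -1.2, 0.9, 0.4]
experts, weights = top_k_route(scores, k=2)
# experts -> [1, 3]: only these two experts' parameters run for this token;
# the other six stay idle, which is where the compute savings come from.
```

Because only the selected experts execute, total parameter count (capacity) and per-token FLOPs (cost) decouple — the trade-off the 17B-active / 397B-total split exploits.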

The benchmark scores are competitive with the best Western closed-source models: 83.6 on LiveCodeBench v6 (coding), 91.3 on AIME26 (mathematical reasoning), and 88.4 on GPQA Diamond (graduate-level science). These scores place Qwen 3.5 in the same performance tier as GPT-5.2 and Claude Opus 4.5 on standardized evaluations, though benchmark performance does not always translate linearly to real-world application quality.

Visual Agentic Capabilities

Qwen 3.5 introduces what Alibaba calls "visual agentic capabilities" — the ability to autonomously interact with mobile and desktop applications by observing screen content and generating actions. This positions Qwen 3.5 alongside Anthropic's computer use feature and Google's Project Astra as a model that can operate software interfaces rather than just generate text about them. The practical applications include automated testing, workflow automation, and accessibility assistance.
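The observe-act pattern behind such agents can be sketched as a simple loop. Everything here is illustrative: Alibaba has not published Qwen 3.5's agent API, so `propose_action` is a stub standing in for the model call (a real agent would send a screenshot to a vision endpoint and parse the returned action).

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                      # e.g. "click", "type", "done"
    payload: dict = field(default_factory=dict)

def propose_action(screen_text, goal):
    """Stand-in for the model: map the current observation to the next action.
    Here we just check whether the goal text is visible on screen."""
    if goal in screen_text:
        return Action("done")
    return Action("click", {"target": "next step"})

def run_agent(goal, observations):
    """Observe-act loop: act on each screen state until the model signals completion."""
    trace = []
    for screen_text in observations:
        action = propose_action(screen_text, goal)
        trace.append(action.kind)
        if action.kind == "done":
            break
    return trace

# Hypothetical sequence of screen states during a checkout flow.
trace = run_agent("order confirmed",
                  ["home screen", "checkout page", "order confirmed"])
# trace -> ["click", "click", "done"]
```

The same loop structure underlies automated UI testing and workflow automation: the model replaces the hand-written rule, and the executor replaces the stubbed click.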

Language Coverage

The model supports 201 languages and dialects, up from 82 in the previous Qwen generation. This expansion significantly broadens the model's addressable market in regions that are underserved by English-centric Western models, and is consistent with Alibaba's commercial interest in serving customers across Asia, the Middle East, Africa, and Latin America through its cloud and e-commerce platforms.

Open Weights and Market Impact

Qwen 3.5 is available as an open-weight model, meaning the trained parameters are publicly downloadable and can be run, fine-tuned, and deployed by any organization. This positions it as a direct alternative to Meta's Llama series and a challenge to the closed-source models from OpenAI and Anthropic that require API access and per-token pricing.

The release timing — on the eve of the Chinese Lunar New Year — was strategic, landing ahead of an anticipated DeepSeek V4 release. The Chinese AI ecosystem is now producing multiple frontier-competitive models per quarter, compressing the capability gap between Chinese and American labs and intensifying the competitive pressure on pricing, performance, and openness across the global AI market.
