AI & Machine Learning

Alibaba Releases Qwen 3.5 Small Models for On-Device AI from 0.8B to 9B Parameters

Alibaba's Qwen team launches the Qwen 3.5 Small Model Series — five models from 0.8B to 9B parameters designed for edge deployment — with the 9B flagship matching or surpassing larger open-source models on coding and reasoning benchmarks.


TechDrop Editorial


Alibaba's Qwen team has launched the Qwen 3.5 Small Model Series, a family of five language models ranging from 0.8 billion to 9 billion parameters. The models are designed for on-device deployment on smartphones, edge servers, and embedded systems, targeting environments where cloud connectivity is unreliable or where data privacy requirements prevent sending information to remote servers.

Model Lineup

The series includes models at 0.8B, 1.5B, 3B, 6B, and 9B parameters, each available in both Base and Instruct variants. The 9B flagship is the headline performer: on GPQA Diamond (graduate-level science questions), it scores 81.7, surpassing several models more than 10x its size. On LiveCodeBench, a coding benchmark, the 9B model performs competitively with models in the 30-70B parameter range. The smaller models trade capability for efficiency, with the 0.8B model compact enough to run on mid-range smartphones.
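For a sense of why these sizes map to these devices, a back-of-the-envelope sketch of weight memory is useful. The parameter counts below come from the announcement; the bytes-per-weight figures are common quantization assumptions (16-bit and 4-bit), not official deployment numbers, and the estimate ignores KV cache and activation memory.

```python
def weight_memory_gb(params_billions: float, bytes_per_weight: float) -> float:
    """Approximate weight storage in GB (weights only; no KV cache/activations)."""
    return params_billions * 1e9 * bytes_per_weight / 1e9

# Estimate each Qwen 3.5 Small size at fp16 and 4-bit quantization.
for size in (0.8, 1.5, 3, 6, 9):
    fp16 = weight_memory_gb(size, 2.0)   # 16-bit weights: 2 bytes each
    int4 = weight_memory_gb(size, 0.5)   # 4-bit quantized: 0.5 bytes each
    print(f"{size}B params: ~{fp16:.1f} GB fp16, ~{int4:.1f} GB int4")
```

Under these assumptions, the 0.8B model needs only ~0.4 GB of weight memory at 4-bit, which is plausibly why it fits on mid-range phones, while the 9B model at 4-bit (~4.5 GB) sits closer to flagship-phone or edge-server territory.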

Edge AI Applications

The small model series targets applications where running inference locally is either necessary or preferred: voice assistants that work offline, on-device document processing for privacy-sensitive industries (healthcare, legal, finance), smart home devices, and automotive infotainment systems. For these use cases, the ability to run a capable language model directly on the device — without network latency or cloud API costs — provides both a better user experience and compliance with data residency requirements.

Competitive Context

The release completes a nine-model launch across 16 days by the Qwen team, demonstrating Alibaba's aggressive push to establish Qwen as a dominant open-source AI platform. The small model series competes directly with Meta's Llama family, Google's Gemma, Microsoft's Phi, and Apple's on-device models. With the models available on Hugging Face and ModelScope, developers can download and deploy them without licensing restrictions — continuing the pattern of Chinese AI labs using open-source distribution to build ecosystem adoption.
