Qwen 3.7 Preview Scored 57 AAI: 27B/35B Open Weights Next
Qwen 3.7 Max Preview landed on Alibaba’s API on May 19, 2026, and scored 57 on Artificial Analysis’s Intelligence Index. That’s a five-point jump over Qwen 3.6 Max Preview’s 52. Artificial Analysis (AA) places the new model at #1 of 218 ranked entries on its public leaderboard. On Arena AI’s text leaderboard the model sits at 1,489 Elo, ranked #14 overall.
What InsiderLLM readers actually want to know is different. When do the open 27B and 35B variants drop, will they work with the same llama.cpp / MTP / DFlash pipeline that runs Qwen 3.6 today, and is it worth holding off on a hardware purchase to wait?
This article walks through what’s verified about the 57 AAI score, what’s been announced for the open weights, and what InsiderLLM still doesn’t know. When the open weights ship, the firsthand bench on RTX 3090 plus RTX 3060 will land within 24 hours of release.
The scoring news
Artificial Analysis published its evaluation of Qwen3.7 Max on May 19, 2026. The headline numbers (source):
- Intelligence Index: 57
- Rank: #1 of 218 ranked models on AA
- Context window: 1 million tokens
- Modality: text in, text out (no image input)
- Mode: reasoning model only (“This page shows the reasoning version”)
- Output tokens generated during eval: 97 million, against a 26 million median for the evaluated set
The 97M output figure is worth a pause. AA describes the model as “very verbose in comparison to the average.” Reasoning models trade tokens for accuracy, and Qwen 3.7’s verbosity sits on the high end even within that category. That has cost implications for anyone running the Max API at scale, and it has timing implications for anyone planning to run an open variant locally.
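To make the verbosity gap concrete, here is a minimal sketch that computes the ratio from the two AA figures above and converts it into rough wall-clock time for a local run. The 2,000-token median response is a hypothetical number chosen for illustration. The 60 tok/s decode rate is the Qwen 3.6 27B MTP figure on an RTX 3090, borrowed as a stand-in. Neither is a measurement of any Qwen 3.7 model.

```python
# Output-token totals reported by Artificial Analysis for the eval run.
QWEN37_MAX_OUTPUT_TOKENS = 97_000_000
MEDIAN_OUTPUT_TOKENS = 26_000_000


def verbosity_ratio(model_tokens: int, median_tokens: int) -> float:
    """How many times more output tokens the model produced than the median."""
    return model_tokens / median_tokens


def generation_seconds(tokens: float, tokens_per_second: float) -> float:
    """Wall-clock time to decode `tokens` at a steady decode rate."""
    return tokens / tokens_per_second


ratio = verbosity_ratio(QWEN37_MAX_OUTPUT_TOKENS, MEDIAN_OUTPUT_TOKENS)

# ASSUMPTIONS (hypothetical, for illustration only):
ASSUMED_MEDIAN_RESPONSE_TOKENS = 2_000  # not an AA figure
ASSUMED_LOCAL_TOK_PER_SEC = 60.0        # Qwen 3.6 27B + MTP on RTX 3090, not 3.7

median_secs = generation_seconds(
    ASSUMED_MEDIAN_RESPONSE_TOKENS, ASSUMED_LOCAL_TOK_PER_SEC
)
verbose_secs = generation_seconds(
    ASSUMED_MEDIAN_RESPONSE_TOKENS * ratio, ASSUMED_LOCAL_TOK_PER_SEC
)

print(f"verbosity ratio: {ratio:.2f}x")
print(f"median-length response: {median_secs:.0f} s")
print(f"same task at Qwen 3.7 Max verbosity: {verbose_secs:.0f} s")
```

The ratio works out to roughly 3.7x. Whether an open 27B or 35B variant would be equally verbose is unknown, so treat the timing lines as a way to reason about the trade-off, not as a forecast.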
The Intelligence Index aggregates ten evaluations: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity’s Last Exam, GPQA Diamond, and CritPt. The 57 score reflects performance across that whole suite, not any single eval.
For comparison, Qwen 3.6 Max Preview sits at 52 AAI under the same methodology. That’s a +5 absolute jump, or +9.6% relative. Both Max releases are reasoning-classified hosted models with verbose-output profiles.
On Arena AI’s text leaderboard, Qwen 3.7 Max Preview shows 1,489 Elo at rank #14 overall (verified May 20, 2026). Arena Elo is human pairwise preference, a different signal from AA’s aggregate eval score. Both numbers are based on the hosted API model. Neither has any direct relationship to open-weight inference on consumer hardware.
That last point is the whole reason this article exists.
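For readers who want to see what an Elo gap means as pairwise preference, here is a small sketch using the textbook Elo expectation formula. The opponent ratings are hypothetical, chosen only to show the shape of the curve, and Arena's own rating fit may differ in its details, so this is a rough reading aid and not a reproduction of Arena's methodology.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expectation that A is preferred over B in one comparison."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


QWEN37_MAX_PREVIEW_ELO = 1489  # Arena text leaderboard figure quoted above

# HYPOTHETICAL opponents rated 0 to 100 points higher (not real leaderboard entries).
for gap in (0, 25, 50, 100):
    opponent = QWEN37_MAX_PREVIEW_ELO + gap
    p = expected_win_rate(QWEN37_MAX_PREVIEW_ELO, opponent)
    print(f"opponent at {opponent} (+{gap}): preferred {p:.1%} of the time")
```

Under this formula, even a 100-point deficit still leaves the lower-rated model preferred in a bit over a third of comparisons, which is one reason a rank of #14 on Arena and #1 on AA are not contradictory.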
The open-weights situation
Qwen 3.7 ships in two announced tiers:
- Max / Plus preview. Hosted via Alibaba’s API. No weights public. Following the pattern of the Qwen Max line in 3.5 and 3.6, weights are unlikely to be released for the Max tier.
- 27B and 35B open variants. Announced as forthcoming. No public release date. No weights or GGUFs available as of May 20, 2026.
Why the distinction matters: the 57 AAI score is for the Max model. Local AI readers care about the 27B and 35B. There is no reliable way to predict open-weights quality from Max scoring, and Alibaba has not claimed parity. The only precedent is 3.6: Max Preview scored 52 AAI and the open 27B (reasoning) scored 46 AAI on the same methodology, a six-point gap. If 3.7 follows the same pattern, the open 27B lands around 51 AAI (57 minus 6). That would be essentially tied with 3.6 Max Preview (52), well below 3.7 Max (57), and still in the strong open-weights tier. Treat it as a rough guide from a single data point, not a forecast.
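The extrapolation is simple enough to write down. The sketch below applies the 3.6 Max-to-open gap to the 3.7 Max score. The function name is illustrative, and the core assumption (that the gap carries over unchanged between generations) is exactly the part nobody can verify until weights ship.

```python
# AAI scores quoted in this article.
QWEN36_MAX_PREVIEW = 52
QWEN36_OPEN_27B_REASONING = 46
QWEN37_MAX_PREVIEW = 57


def extrapolate_open_score(new_max: int, old_max: int, old_open: int) -> int:
    """Apply the previous generation's Max-to-open gap to the new Max score.

    ASSUMPTION: the gap is constant across generations. This rests on one
    data point and is not something Alibaba has claimed.
    """
    gap = old_max - old_open
    return new_max - gap


estimate = extrapolate_open_score(
    QWEN37_MAX_PREVIEW, QWEN36_MAX_PREVIEW, QWEN36_OPEN_27B_REASONING
)
print(f"extrapolated open 27B score: {estimate} AAI")  # prints 51
```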
The architecture for the 27B and 35B is unannounced. The following is informed speculation, flagged as speculation:
- The 27B will probably stay dense, inheriting Qwen 3.6’s hybrid Gated DeltaNet plus Gated Attention layout.
- The 35B is likely to remain a Mixture-of-Experts model, plausibly carrying forward the 35B-A3B configuration (3 billion active parameters per token). This is a guess, not a confirmed spec.
- MTP layer availability depends on whether Alibaba ships MTP-trained variants the way they did for parts of the Qwen 3.6 family. If they don’t, the community would need to train MTP layers from scratch.
- DFlash and EAGLE3 compatibility depends on draft model training. None exists yet for 3.7.
- Expected VRAM range for the 27B at Q4_K_M: roughly 15-16 GiB, similar to 3.6. For 35B-A3B at Q4: roughly 18-22 GiB. Both numbers are extrapolations from 3.6 and should not be quoted as confirmed.
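The VRAM ranges above can be sanity-checked with a back-of-envelope calculation. This is a minimal sketch under stated assumptions: nominal parameter counts of 27 billion and 35 billion, an average of about 4.85 bits per weight for a Q4_K_M-style quant (an approximation, since the real figure varies by model and tensor mix), and weights only. KV cache, activations, and runtime overhead are excluded, so real usage will be higher. The 3-billion-active figure is the guess flagged above, not a confirmed spec.

```python
GIB = 1024 ** 3

# ASSUMPTION: approximate average bits per weight for a Q4_K_M-style quant.
ASSUMED_Q4_BITS_PER_WEIGHT = 4.85


def weight_footprint_gib(params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights alone, in GiB (no KV cache or overhead)."""
    return params * bits_per_weight / 8 / GIB


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of an MoE model's parameters used per token."""
    return active_params / total_params


dense_27b = weight_footprint_gib(27e9, ASSUMED_Q4_BITS_PER_WEIGHT)
moe_35b = weight_footprint_gib(35e9, ASSUMED_Q4_BITS_PER_WEIGHT)

print(f"27B dense weights:  {dense_27b:.1f} GiB")
print(f"35B MoE weights:    {moe_35b:.1f} GiB")
# An MoE model still keeps every expert resident in memory, even though
# only a fraction of the parameters run for each token.
print(f"35B-A3B active share per token: {active_fraction(3e9, 35e9):.1%}")
```

Both results fall inside the ranges quoted above. The last line shows why a MoE model can decode quickly without being cheap on memory: the footprint tracks total parameters, while per-token compute tracks active ones.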
GGUF release timing typically lags weight drop by 24 to 72 hours. Bartowski, Unsloth, and RDson have shipped Qwen quants within that window on previous releases. None of that timing is guaranteed for 3.7.
What this means for local AI buyers
Four scenarios with clear guidance.
Already running Qwen 3.6 27B or 35B on a 24GB card. Stay put. The stack works, MTP is mainline-bound through PR #22673, and the DFlash plus MTP comparison on the same hardware documents speedups of 1.5x to 2.5x that are available today. Wait for weights, wait for benches, then re-evaluate.
Planning a hardware purchase soon (RTX 4090, RTX 5090, Mac Studio). No reason to delay. 24GB runs the 27B comfortably. 32GB and up handles the 35B with room to spare. The hardware decision is independent of which Qwen generation a buyer targets, because consumer-GPU VRAM ceilings move slowly compared to model releases.
Running Qwen 3.5 or older locally. The 3.6 family is the more practical upgrade. It’s available now, it reaches 60 tok/s with MTP on an RTX 3090, and full coverage of the 27B dense versus 35B MoE choice is already published. Skipping 3.6 to wait for 3.7 means leaving real performance on the table for an unknown release window.
API users evaluating Qwen 3.7 Max. That is a different question from local inference. The hosted Max preview competes with Claude Opus, GPT-5.5, and DeepSeek R1 on reasoning quality and per-token cost. InsiderLLM’s domain is local hardware. Cloud comparisons are out of scope here.
What’s still missing
Five concrete unknowns, none of which the launch announcement addressed:
- Local inference performance. Zero benches are possible until weights ship. Anyone publishing numbers today is either repackaging API results or guessing.
- MTP layer availability. If Alibaba doesn’t ship MTP-trained variants, mainline llama.cpp speculative decoding via PR #22673 won’t work day-zero on 3.7. The community would have to train MTP layers post-release.
- GGUF release timing. Estimated at 24-72 hours after weights drop based on prior cadence. Not guaranteed.
- DFlash / EAGLE3 compatibility. Depends on draft model training that hasn’t happened yet. The speculative decoding primer covers why draft availability matters.
- MoE variant configuration. Whether the 35B is A3B-style, a different MoE shape, or replaced entirely is unknown.
Anyone claiming firm answers on these points today is speculating. InsiderLLM’s commitment is to flag the speculation as such until evidence ships.
How InsiderLLM is preparing
When the open weights drop, the editorial plan is:
- Day 0. Initial article with download links, model card analysis, build requirements, and setup notes for llama.cpp.
- Day 1-2. Firsthand bench on RTX 3090 plus RTX 3060 12GB using am17an’s gist harness, the same nine-prompt setup as the existing 3.6 benches.
- Day 3-5. MTP compatibility report (does PR #22673 understand the 3.7 layers?) and DFlash compatibility report (does an EAGLE3 draft exist for 3.7?).
- Week 1. Head-to-head Qwen 3.6 versus Qwen 3.7 on identical hardware, identical harness.
Reference points readers can use today, all firsthand:
- Qwen 3.6 complete guide
- Wicked Fast Qwen 3.6 27B with MTP on RTX 3090, 60 tok/s on Miu
- Best way to run Qwen 3.6 35B MoE locally
- DFlash vs MTP head-to-head on RTX 3090
- Speculative decoding explained
What we’re watching today (May 20)
Alibaba Cloud Summit is today. Possible outcomes:
- Best case. 27B and 35B open weights drop during the keynote, with day-zero Hugging Face uploads and Apache 2.0 licensing.
- Likely case. Announcement with a vague timeline (“coming weeks” or similar) and no immediate weight release.
- Worst case. Continued emphasis on Max and Plus tiers with no concrete open-weights commitment.
Whichever path the summit takes, this article will be updated as facts settle, and the firsthand bench piece will ship within 24 hours of weights becoming available.