
ClawBench Top Large Models: What the Current Top 10 Reveal About Model Performance

Written by Ivy Chen
Last updated: March 31, 2026 · Expert Verified

If you are searching for the ClawBench top large models, you probably do not want a long, abstract explanation of benchmark theory.

You want to know which models are near the top, how they compare with each other, and what the leaderboard actually tells you about real model performance.

That is the right way to read this kind of ranking.

ClawBench matters because it is closer to agent-style evaluation than a normal static leaderboard, but this article is mainly about the models themselves: who is ranking near the top, what kind of strengths those models appear to have, and how to interpret the differences between them.

TL;DR

  • The current top 10 on ClawBench are led by GLM-5-Turbo, Doubao-Seed-2.0-lite, GPT-5.4, MiniMax-M2.5, and MiniMax-M2.7.
  • The leaderboard is tight, which suggests the strongest models are competing in a narrow performance band.
  • The most interesting differences are not only score differences, but also cost, speed, and value tradeoffs.
  • Some models look strongest on raw Claw Score, while others stand out more on efficiency or deployment practicality.
  • The best model depends on whether you care most about benchmark leadership, lower cost, faster speed, or overall balance.

The Current Top 10 Large Models on ClawBench

Based on the leaderboard screenshot used for this article, the current ClawBench top 10 large models are:

  1. GLM-5-Turbo — Claw Score 93.9
  2. Doubao-Seed-2.0-lite — Claw Score 93.1
  3. GPT-5.4 — Claw Score 92.2
  4. MiniMax-M2.5 — Claw Score 92.1
  5. MiniMax-M2.7 — Claw Score 91.7
  6. GLM-5 — Claw Score 91.7
  7. Claude Opus 4.5 — Claw Score 91.5
  8. Qwen3.5-35B-A3B — Claw Score 91.4
  9. MiMo-V2-Omni — Claw Score 91.2
  10. Qwen3.5-397B-A17B — Claw Score 90.0

The first thing to notice is how compressed the ranking is. The spread from first to tenth is just 3.9 points (93.9 down to 90.0), far less dramatic than people might expect from a “top models” chart, which usually means that absolute ranking position is only one part of the story.
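To make that concrete, here is a minimal Python sketch that treats the table above as data and computes the first-to-tenth spread:

```python
# The top-10 snapshot as data; scores taken from the table above.
top10 = {
    "GLM-5-Turbo": 93.9,
    "Doubao-Seed-2.0-lite": 93.1,
    "GPT-5.4": 92.2,
    "MiniMax-M2.5": 92.1,
    "MiniMax-M2.7": 91.7,
    "GLM-5": 91.7,
    "Claude Opus 4.5": 91.5,
    "Qwen3.5-35B-A3B": 91.4,
    "MiMo-V2-Omni": 91.2,
    "Qwen3.5-397B-A17B": 90.0,
}

scores = list(top10.values())
spread = max(scores) - min(scores)
print(f"First-to-tenth spread: {spread:.1f} points")  # 3.9
print(f"Relative gap: {spread / max(scores):.1%}")    # 4.2%
```

A 3.9-point gap, roughly four percent of the leader's score, reads as a narrow band rather than a cliff.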

The second thing to notice is that the leaderboard is not monopolized by one provider. It includes models from Z.ai, ByteDance, OpenAI, MiniMax, Anthropic, Alibaba, and Xiaomi, which makes the comparison more useful because it captures several different product philosophies rather than a single ecosystem.

What the Top of the Leaderboard Looks Like

The top tier is currently built around five names:

  • GLM-5-Turbo
  • Doubao-Seed-2.0-lite
  • GPT-5.4
  • MiniMax-M2.5
  • MiniMax-M2.7

That group is important because it represents the models that are closest to the benchmark ceiling right now.

But they are not all “top” in the same way.

Some appear strongest because of raw Claw Score. Some look more attractive because of cost efficiency. Some stand out more because of speed. And some seem strongest if you want a more balanced score-cost profile rather than an all-out push for first place.

That is why reading a leaderboard well means looking at more than the first column.

GLM-5-Turbo: The Current Leader

GLM-5-Turbo is currently in first place with a 93.9 Claw Score.

That makes it the headline leader of the chart and the clearest answer to the question, “Which model is currently on top?”

What makes GLM-5-Turbo especially notable is that it does not appear to win only on raw score. Based on the leaderboard snapshot, it also looks more practical than some nearby premium competitors on cost. That matters because a first-place model is much more interesting when its economics do not immediately price it out of real deployment.

So the strongest takeaway here is not only that GLM-5-Turbo leads. It is that it currently looks like a leader without the same kind of pricing penalty attached to some other frontier-tier models.

Doubao-Seed-2.0-lite: The Most Interesting Value Story

If there is one model in the top 10 that immediately stands out on value, it is Doubao-Seed-2.0-lite.

It ranks second with a 93.1 Claw Score, which already puts it extremely close to the top of the leaderboard. But the more interesting part is that its listed cost looks far lower than several nearby competitors, while its value metric appears much stronger.

That changes the interpretation completely.

Doubao-Seed-2.0-lite does not just look like a strong model. It looks like one of the most attractive score-to-cost options near the top of the table. For teams that care about production economics rather than just bragging rights, that can matter more than the difference between first and second place.

GPT-5.4: Premium Performance With a Premium Cost

GPT-5.4 ranks third with a 92.2 Claw Score.

That result keeps it firmly in the top tier, and it supports the idea that OpenAI remains highly competitive in agent-style benchmark settings. Readers who want the official product context can compare against the OpenAI platform documentation.

But the leaderboard also makes something else clear: GPT-5.4 appears significantly more expensive than many of the models around it.

That does not make it weak. It makes it a different kind of choice.

A model like this may still be very attractive if your priority is premium performance, broad ecosystem familiarity, or trust in a mature provider stack. But if your primary goal is maximizing performance per unit cost, the chart suggests there are other models that may look more efficient.

MiniMax-M2.5 and MiniMax-M2.7: The Balance Play

The two MiniMax entries are especially interesting because together they look like a statement about balance.

MiniMax-M2.5 is especially notable because it is very close to GPT-5.4 on score while appearing much cheaper. That alone makes it one of the strongest efficiency-oriented entries in the upper leaderboard.

MiniMax-M2.7 is slightly lower on score and appears slower than some nearby alternatives, but it still remains firmly inside the top five. That suggests the MiniMax family is not just competitive in one narrow way. It looks like a serious contender across the board.

For many operators, that kind of near-top performance with more practical economics can be more attractive than chasing the absolute number one spot.

GLM-5 vs GLM-5-Turbo: A Useful Internal Comparison

One of the most informative parts of the leaderboard is that it includes both GLM-5-Turbo and GLM-5.

  • GLM-5-Turbo: 93.9
  • GLM-5: 91.7

That comparison matters because it shows the Turbo variant is not just a cheaper or simplified branch. On this leaderboard, it is actually the higher-ranked one.

That makes the result especially practical. It suggests that in this benchmark setting, the Turbo line may currently offer a stronger performance story than the base model, not just a cheaper one.

When a cheaper or more deployment-friendly variant outranks its sibling, people should pay attention.

Claude Opus 4.5: Strong, But Expensive

Claude Opus 4.5 comes in seventh with a 91.5 score.

That is still a top-tier result. It confirms Anthropic remains highly relevant in serious model comparisons, and readers looking for product context can check the official Claude page.

But the ClawBench snapshot also makes the tradeoff visible. Claude Opus 4.5 appears to carry one of the highest listed costs in the top 10.

That means the model may still be a strong fit when quality matters more than price. But if you are reading the leaderboard through a deployment lens, the question becomes harder. You are not asking only, “Is Claude Opus 4.5 good?” You are asking, “Is it good enough to justify this cost relative to nearby alternatives?”

That is a more serious question, and it is the kind of question rankings like this should provoke.

The Qwen Entries: Open-Weights Strength Still Matters

The presence of Qwen3.5-35B-A3B and Qwen3.5-397B-A17B in the top 10 is important.

  • Qwen3.5-35B-A3B ranks eighth at 91.4
  • Qwen3.5-397B-A17B ranks tenth at 90.0

The first takeaway is obvious: the Qwen family is still highly competitive in this benchmark context.

The second takeaway is more practical. Qwen models tend to attract attention not only because of performance, but because of deployment flexibility and the broader open-weights ecosystem around them. The official Qwen GitHub organization is useful if you want that ecosystem context.

That means their presence in the top 10 is not just technically interesting. It matters for teams that want stronger control over infrastructure, model access, or customization paths.

MiMo-V2-Omni: The Speed Story

MiMo-V2-Omni ranks ninth at 91.2, but what makes it especially interesting is not only the score.

It also appears to be one of the faster entries on the leaderboard.

That matters because speed often gets undervalued in benchmark discussions. In real products, speed can shape the entire user experience. A slightly lower-ranked model that responds much faster may create a better workflow in practice than a higher-ranked model with heavier latency.

So MiMo-V2-Omni stands out as a reminder that not every useful model story is a raw-score story.
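As a toy illustration of that tradeoff, consider folding speed into a single utility number. The latency figures and the penalty weight below are invented for the example, not values from the leaderboard:

```python
# Toy utility combining score and latency. Latencies and the penalty
# weight are INVENTED for illustration; read the real values off the
# leaderboard's speed column before drawing conclusions.
def utility(score: float, latency_s: float, penalty_per_s: float = 0.5) -> float:
    """Higher score is better; each second of latency costs penalty_per_s points."""
    return score - penalty_per_s * latency_s

fast_model = utility(91.2, latency_s=2.0)  # a MiMo-V2-Omni-style entry
slow_model = utility(92.1, latency_s=6.0)  # a higher-scoring but slower entry
print(fast_model, slow_model)  # 90.2 vs 89.1 -> the faster model wins here
```

Under that toy penalty, the faster, slightly lower-scoring model comes out ahead, which is exactly the kind of inversion a raw-score ranking hides.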

What the Top 10 Reveals About the Market

The current leaderboard reveals a few broader patterns.

1. The top tier is crowded

There is no massive performance cliff between first and tenth place. That means the frontier is competitive.

2. Cost matters more than ever

Several of the most notable entries stand out precisely because they are not the most expensive ones.

3. Speed is still underrated

A model that is fast enough and strong enough may be more useful than a model that is slightly better but much slower.

4. Open ecosystems still matter

The Qwen entries show that open-weights families are still part of serious benchmark conversations.

A Short Note on How ClawBench Evaluates Models

Since this article is mainly about model performance rather than benchmark theory, the short version is enough.

According to the official ClawBench repository, the benchmark runs models inside an isolated sandbox across 30 advanced tasks spanning five business scenarios: Office Collaboration, Information Retrieval and Research, Content Creation, Data Processing and Analysis, and Software Engineering.

It uses three grading approaches:

  • Automated grading for deterministic tasks
  • LLM judge grading for qualitative tasks
  • Hybrid grading for workflows that need both hard checks and softer judgment

That matters because the ranking is trying to capture agent-style performance rather than simple one-shot answer quality.
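The actual grading code lives in the ClawBench repository. Purely to illustrate the hybrid idea, a grader that gates part of the grade on a hard deterministic check and fills the rest from an LLM-judge score might look like the sketch below. The function, weighting, and example strings are assumptions for illustration, not ClawBench's real API:

```python
from dataclasses import dataclass

@dataclass
class GradeResult:
    passed_check: bool   # deterministic portion
    judge_score: float   # LLM-judge portion, 0.0 to 1.0
    final: float         # combined grade

def hybrid_grade(output: str, required: str, judge_score: float,
                 check_weight: float = 0.5) -> GradeResult:
    """Illustrative hybrid grader: a hard check earns a fixed share of
    the grade, and an LLM-judge score supplies the qualitative rest.
    This only sketches the idea described above, not ClawBench itself."""
    passed = required in output                  # deterministic check
    hard_part = check_weight if passed else 0.0
    soft_part = (1.0 - check_weight) * judge_score
    return GradeResult(passed, judge_score, hard_part + soft_part)

# Example: the output contains the required artifact and the judge
# rates the write-up 0.8, so the final grade is 0.5 + 0.4 = 0.9.
print(hybrid_grade("report.csv written; summary attached", "report.csv", 0.8))
```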

How to Read This Ranking Correctly

The smartest way to read the current ClawBench top large models table is not to ask only who is number one.

Instead, ask:

  • Which model leads on raw score?
  • Which model looks strongest on value?
  • Which model looks strongest on speed?
  • Which model looks best for open deployment flexibility?
  • Which model looks most practical for the kind of system you actually want to build?

That gives you a much more useful reading of the chart.
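If you want to do that reading programmatically, a small sketch like the following re-ranks models by score per unit cost. The scores come from the table above; the cost figures are placeholders, so substitute the leaderboard's own cost column before drawing any conclusions:

```python
# Illustrative re-ranking by score-per-cost. The cost figures are
# PLACEHOLDERS, not the leaderboard's real numbers.
models = [
    # (name, claw_score, hypothetical_cost_per_run_usd)
    ("GLM-5-Turbo",          93.9, 1.20),
    ("Doubao-Seed-2.0-lite", 93.1, 0.40),
    ("GPT-5.4",              92.2, 3.00),
]

by_value = sorted(models, key=lambda m: m[1] / m[2], reverse=True)
for name, score, cost in by_value:
    print(f"{name:24s} score={score:.1f} value={score / cost:.1f}")
```

With numbers like these, a model that ranks second on raw score can easily rank first on value, which is the kind of reordering the questions above are meant to surface.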

Final Verdict

If you want the clearest conclusion, it is this: the current ClawBench top large models ranking is most valuable when you read it as a performance map, not just a race.

Yes, GLM-5-Turbo currently leads. Yes, Doubao-Seed-2.0-lite, GPT-5.4, and the MiniMax entries are close behind. But the bigger story is how differently these models seem to win.

Some are strongest on raw Claw Score. Some are more attractive on cost. Some look better on speed. Some matter because of ecosystem flexibility.

That is why this leaderboard is useful. It does not only tell you who is in front. It helps you see what kind of “best” each model might represent.

FAQ

What are the current top large models on ClawBench?

Based on the leaderboard screenshot used here, the current top 10 are GLM-5-Turbo, Doubao-Seed-2.0-lite, GPT-5.4, MiniMax-M2.5, MiniMax-M2.7, GLM-5, Claude Opus 4.5, Qwen3.5-35B-A3B, MiMo-V2-Omni, and Qwen3.5-397B-A17B.

Which model currently ranks first on ClawBench?

GLM-5-Turbo currently ranks first with a Claw Score of 93.9.

Why is Doubao-Seed-2.0-lite especially notable?

Because it ranks near the very top while also appearing much stronger on cost-value tradeoffs than several nearby competitors.
