OpenAI released GPT-5.5 — codenamed "Spud" — on April 23, 2026. According to OpenAI co-founder and president Greg Brockman, the model marks a significant step "towards more agentic and intuitive computing." In plain terms: it's less about chatting and more about finishing tasks across multiple tools and steps without you managing every move.
GPT-5.5 scores 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro — benchmarks that test complex command-line workflows and real-world GitHub issue resolution. OpenAI's Chief Research Officer Mark Chen described it as showing "meaningful gains on scientific and technical research workflows," with potential to assist in areas like drug discovery. It's OpenAI's strongest agentic model to date.
Community reaction online has been more measured than the press releases suggest. Some engineers call it the first non-Anthropic model worth taking seriously in months. Others say it still listens too literally and hallucinates more than Claude. Here's the full breakdown — what GPT-5.5 actually changes, where it falls short, and who should switch.
TL;DR
| Release date | April 23, 2026 |
| --- | --- |
| Codename | Spud |
| Biggest upgrade | Agentic coding, multi-step workflows, computer use |
| API pricing | $5 / $30 per 1M tokens (input / output) |
| Availability | ChatGPT Plus, Pro, Business, Enterprise; API |
What Is GPT-5.5?
GPT-5.5 is OpenAI's latest large language model, built on a new base codenamed "Spud." Released April 23, 2026, it's available in ChatGPT and through the OpenAI API — and it was designed with a different goal than most previous models.
Instead of excelling at single-turn Q&A, GPT-5.5 is purpose-built for agentic tasks: give it a messy multi-step instruction, and it figures out the plan, picks the right tools, checks its own work, and keeps going until the job is actually done. Think of the difference between GPT-5.4 and GPT-5.5 like this — GPT-5.4 was a smart intern who needed clear, step-by-step instructions. GPT-5.5 is more like a competent contractor: tell it what you want, and it handles the how.
The model supports a 1M token context window and is compatible with streaming, function calling, structured outputs, web search, file search, image generation, code interpreter, computer use, and MCP.
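To make the tool-use support concrete, here is a minimal sketch of what a function-calling request to a model like this could look like. The payload shape follows common chat-completions conventions; the `"gpt-5.5"` model identifier, the `run_shell` tool, and every field name here are illustrative assumptions, not taken from official GPT-5.5 documentation.

```python
# Sketch of a tool-enabled, streaming request payload for a hypothetical
# "gpt-5.5" model. Field names follow widely used chat-completions
# conventions and are illustrative only.

def build_agentic_request(task: str) -> dict:
    """Assemble a request that enables streaming and exposes one tool."""
    return {
        "model": "gpt-5.5",           # hypothetical model identifier
        "stream": True,               # stream tokens as they arrive
        "messages": [{"role": "user", "content": task}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "run_shell",  # illustrative tool the agent may call
                "description": "Execute a shell command and return stdout.",
                "parameters": {
                    "type": "object",
                    "properties": {"command": {"type": "string"}},
                    "required": ["command"],
                },
            },
        }],
    }

request = build_agentic_request("List the ten largest files in the repo.")
```

In an agentic loop, the client would send this payload, execute any tool calls the model emits, and feed the results back until the model signals the task is done.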
GPT-5.5 Features: What GPT-5.5 Actually Changed
It understands the task earlier and asks for less guidance
The most consistent user feedback is that GPT-5.5 requires fewer clarification prompts. McKay Wrigley, an AI developer, called it "incredible" and said "the level to which I trust it for engineering is amazing" — noting he'd pick it as his single model for coding work if forced to choose.
Rory Watts described the gap from GPT-5.4 as significant: "Fast, efficient... quite easily the best experience for daily work I've ever had with an agent. Understands intent, rarely makes obvious mistakes, great with tools."
Compared to GPT-5.4, GPT-5.5:
- Understands task requirements earlier in the conversation
- Uses tools more effectively across multi-step workflows
- Checks its own work before presenting output
- Continues pursuing a goal until the task is complete — not just until it has something to say
Agentic coding: Terminal-Bench 82.7%, SWE-Bench Pro 58.6%
GPT-5.5 achieves 82.7% on Terminal-Bench 2.0 — which tests complex command-line workflows requiring planning, iteration, and tool coordination — and 58.6% on SWE-Bench Pro, which evaluates real-world GitHub issue resolution. According to OpenAI, it solves more tasks end-to-end in a single pass than any previous model.
The gains show up most clearly inside Codex, OpenAI's AI coding assistant. OpenAI's Finance team used Codex to review 24,771 K-1 tax forms totaling 71,637 pages — a workflow that completed two weeks faster than the prior year's manual process.
Speed without sacrificing intelligence
Earlier OpenAI reasoning models had a latency problem. GPT-5.5 addresses this directly: it matches GPT-5.4's per-token latency in real-world serving while performing at a meaningfully higher intelligence level. For typical prompt lengths in agentic workflows (500–2,000 tokens of context), responses start arriving roughly 20–30% faster than GPT-5.4.
It also uses significantly fewer tokens to complete equivalent Codex tasks — which matters because the improved efficiency offsets the higher headline price for most use cases.
GPT-5.5 Use Cases: What It's Actually Good At
Agentic coding and debugging at scale
GPT-5.5 performs best on well-defined engineering tasks where the scope is clear but the path is complex. Real examples from OpenAI's internal teams:
- Comms team: Used Codex to analyze six months of speaking request data, build a scoring and risk framework, and validate an automated Slack agent — so low-risk requests could be handled automatically while higher-risk ones still routed to humans.
- Finance team: Reviewed 24,771 K-1 tax forms (71,637 pages) in a structured workflow that excluded personal information and cut processing time by two weeks.
- Go-to-Market team: Automated weekly business report generation, saving 5–10 hours per week.
Worth noting: Several engineers said Codex with GPT-5.5 follows instructions too literally. If you give it a terse or implicit request, it does exactly what you said — not what you meant. Claude Code currently has an edge here on intent inference.
Research and data synthesis
According to Mark Chen, GPT-5.5 shows meaningful gains on scientific and technical research workflows. For business users, practical applications include:
- Pulling together findings from multiple documents or data sources
- Producing structured reports with a defined methodology
- Cross-referencing information across a long research thread without losing context
The 1M token context window makes GPT-5.5 viable for tasks where you need to hold a large amount of material in mind simultaneously — analyzing a lengthy contract, synthesizing a literature review, or processing a large dataset with a consistent rubric.
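As a rough sanity check before sending a large document, you can estimate whether it fits in the window. The ~4 characters-per-token ratio below is a common back-of-the-envelope heuristic for English prose, not an official tokenizer figure; use a real tokenizer for exact counts.

```python
# Back-of-the-envelope check of whether a document fits a 1M-token context
# window. CHARS_PER_TOKEN ~= 4 is a rough heuristic for English text, not
# an official figure for any specific tokenizer.

CONTEXT_WINDOW = 1_000_000   # tokens, per the article
CHARS_PER_TOKEN = 4          # rough heuristic for English prose

def fits_in_context(text: str, reserve_for_output: int = 50_000) -> bool:
    """Estimate token count and leave headroom for the model's output."""
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

# A ~300-page contract at ~3,000 characters/page is ~900k chars, ~225k tokens.
contract = "x" * 900_000
print(fits_in_context(contract))  # True: well under 1M tokens
```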
Business automation and document workflows
GPT-5.5 can operate software, create documents and spreadsheets, and coordinate across tools to finish multi-step knowledge work. For non-technical business users, this makes it useful for tasks that would otherwise require manual coordination between apps.
OpenAI specifically highlights its ability to "move across tools until a task is finished" — a meaningful shift from models that output text and stop.
GPT-5.5 Pricing and Availability in 2026
ChatGPT plan access:
- GPT-5.5 and GPT-5.5 Thinking: Plus, Pro, Business, Enterprise
- GPT-5.5 Pro: Pro, Business, Enterprise (higher accuracy)
- Codex (powered by GPT-5.5): Plus, Pro, Business, Enterprise, Edu, Go
API pricing (available from April 24, 2026):
| Model | Input per 1M tokens | Output per 1M tokens |
| --- | --- | --- |
| GPT-5.5 | $5 | $30 |
| GPT-5.5 Pro | $30 | $180 |
| Batch / Flex | 50% of standard | 50% of standard |
| Priority | 2.5× standard | 2.5× standard |
GPT-5.5 is priced higher than GPT-5.4, but OpenAI reports that token efficiency gains on Codex tasks make the actual per-task cost comparable or lower for agentic workloads. According to the OpenAI API documentation, a 1M token context window is supported across both standard and Pro tiers.
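For budgeting, the table above translates directly into a small cost calculator. This is a sketch using only the rates quoted in this article; the model keys and tier names are labels chosen for illustration, not official API identifiers.

```python
# Cost calculator for the GPT-5.5 API prices quoted above (USD per 1M
# tokens). Batch/Flex at 50% and Priority at 2.5x follow the table; the
# model keys are illustrative labels, not official API identifiers.

PRICES = {  # (input, output) price per 1M tokens
    "gpt-5.5":     (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}

TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.5}

def request_cost(model: str, input_tokens: int, output_tokens: int,
                 tier: str = "standard") -> float:
    """Return the USD cost of one request at the quoted rates."""
    in_price, out_price = PRICES[model]
    cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return round(cost * TIER_MULTIPLIER[tier], 4)

# 100k tokens in, 10k out on standard GPT-5.5:
print(request_cost("gpt-5.5", 100_000, 10_000))  # 0.8
```

The same agentic task run at Batch rates would cost half as much, which is why OpenAI's token-efficiency claim matters: fewer tokens per task compounds with tier discounts.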
GPT-5.5 vs. Competitors: Who Is It For?
| Model | Best for | Watch out for | Who it's for |
| --- | --- | --- | --- |
| GPT-5.5 | Agentic coding, multi-step automation, well-defined tasks | Literal instruction-following, hallucinations | Developers running complex pipelines |
| GPT-5.5 Pro | High-accuracy scientific or enterprise tasks | Cost ($180/M output tokens) | Research teams, enterprise ML |
| Claude Opus 4.7 | Intent inference, planning, ambiguous instructions | Session limits on lower plans | Writers, strategists, implicit requests |
| Gemini 3.1 Pro | Vision tasks, multimodal workflows | Weaker on pure-text agentic coding | Teams in Google Workspace |
According to Zvi Mowshowitz's LessWrong analysis, this marks the first time in roughly four months that a non-Anthropic model represents serious competition for agentic and coding use cases. His summary of the consensus: "Basically everyone thinks this is a solid upgrade."
Tom's Guide tested GPT-5.5 against Claude Opus 4.7 across seven categories and found Claude winning each — but praised GPT-5.5's speed. Many power users are adopting a hybrid approach: GPT-5.5 for well-specified engineering tasks, Claude for tasks requiring intent inference or ambiguous instructions.
What Early Users Are Actually Saying About GPT-5.5
Reception in MacRumors forums and AI communities has been more nuanced than launch headlines suggest.
What users praised:
- Speed: noticeably faster first-token response, especially on longer prompts
- Agentic reliability: finishes multi-step tasks without constant nudges
- Coding accuracy on well-scoped problems
What users criticized:
- Hallucination rate: makes more factual claims per response, and doesn't always flag uncertainty the way Claude does
- Literal instruction-following: executes exactly what you typed, not what you intended
- Market fatigue: "Every time I breathe, some LLM is getting released" was one comment that captured broader sentiment
One commenter on MacRumors who switched from ChatGPT to Claude noted they were "intrigued" by GPT-5.5 but sticking with Claude for now. Another said GPT-5.5 was "too conservative when it comes to actually making code changes — which improves token efficiency but comes at the cost of correctness."
The safety story is stronger than past releases: OpenAI collected feedback from nearly 200 trusted early-access partners and ran targeted testing across cybersecurity and biology before launch. The full framework is documented in the GPT-5.5 System Card.
Conclusion
GPT-5.5 is a meaningful upgrade for agentic coding and multi-step automation. It understands tasks earlier, completes more in a single pass, and is measurably faster than GPT-5.4 without giving up intelligence. For developers running complex pipelines, it's worth evaluating now — especially for Codex-based workflows.
For everyone watching AI evolve, the bigger takeaway is this: the ability to autonomously plan, execute, and self-check across multi-step tasks is now standard at the frontier. GPT-5.5 isn't the only model doing it — but it's one of the best-executed versions so far.
Frequently Asked Questions
What is GPT-5.5 and when was it released?
GPT-5.5 is OpenAI's latest large language model, codenamed "Spud," released on April 23, 2026. It's built for agentic tasks — multi-step workflows involving planning, tool use, and autonomous execution — rather than single-turn conversation.
How much does GPT-5.5 cost via the API?
GPT-5.5 is priced at $5 per 1M input tokens and $30 per 1M output tokens. GPT-5.5 Pro costs $30 per 1M input tokens and $180 per 1M output tokens. Batch and Flex pricing are available at half the standard rate, and Priority processing at 2.5× standard.
Is GPT-5.5 better than Claude Opus 4.7?
It depends on the task. GPT-5.5 outperforms Claude on well-defined agentic coding benchmarks (Terminal-Bench 82.7%). Claude Opus 4.7 is generally preferred for tasks requiring intent inference or ambiguous instructions. Tom's Guide found Claude winning in seven head-to-head categories, while praising GPT-5.5's speed. Most power users run both.
What is the GPT-5.5 "Spud" codename?
"Spud" is the internal project name OpenAI used during development of GPT-5.5's base model. Codenames are common practice at OpenAI and don't carry product significance after launch.
Can I use GPT-5.5 for free?
GPT-5.5 is available on ChatGPT paid plans starting with Plus ($20/month). It is not available on the free tier. API access requires a billing-enabled OpenAI account.
What is the difference between GPT-5.5 and GPT-5.5 Pro?
GPT-5.5 Pro offers higher accuracy for complex scientific and enterprise-grade tasks. It costs significantly more ($30/$180 per 1M tokens vs. $5/$30) and is limited to Pro, Business, and Enterprise plans. For most developers, standard GPT-5.5 is the right starting point.