Deep Thinking in Claude Code
When to use extended thinking, how ultrathink works, and which tasks actually benefit from longer reasoning time.
---
title: How Claude Code's extended thinking actually works
description: Extended thinking is on by default in Claude Code. Here's how to control it, when to push it harder, and when to leave it alone.
author: FractionalSkill
---
How Claude Code's extended thinking actually works
Most operators assume Claude Code works like a chatbot -- you send a prompt, it generates a response. The mental model is linear: prompt in, answer out.
The real process has an extra step most people never see.
Before responding to anything, Claude Code performs internal reasoning. It works through the problem, considers approaches, checks for gaps, and then writes the response. That reasoning phase is extended thinking. It runs by default on every session. You don't have to turn it on.
What you can control is how deep that reasoning goes -- and when to push it much harder than the default.
What extended thinking is and what it costs you
Extended thinking is Claude's internal scratchpad. Before it writes a single word of its response, it reasons through the problem in a separate pass. That reasoning is what makes it possible for Claude to catch mistakes before they appear in the output, weigh tradeoffs between approaches, and produce a plan that actually holds together.
On current models (Opus 4.6 and Sonnet 4.6), this reasoning is adaptive. Claude allocates more or less thinking based on how complex the task appears. A simple question about a file gets light thinking. A multi-step architectural decision gets more. The unit Anthropic uses to measure this reasoning work is "thinking tokens" -- the tokens generated during that internal reasoning pass, billed in addition to the response you read.
You're charged for those thinking tokens whether you see them or not. That's the tradeoff worth knowing upfront.
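To make that tradeoff concrete, here is a rough back-of-the-envelope sketch. It assumes thinking tokens are charged at the same per-token rate as the visible response, and every number in it (token counts and the price per million tokens) is a placeholder rather than real pricing or measured usage.

```python
def estimate_turn_cost(thinking_tokens: int, response_tokens: int,
                       usd_per_million_output: float) -> float:
    """Estimate the output-side cost of one turn in US dollars.

    Assumption: thinking tokens and response tokens share one rate.
    """
    billed_tokens = thinking_tokens + response_tokens
    return billed_tokens * usd_per_million_output / 1_000_000


# Placeholder numbers for illustration only.
light = estimate_turn_cost(2_000, 500, usd_per_million_output=10.0)
deep = estimate_turn_cost(30_000, 500, usd_per_million_output=10.0)
print(f"light turn: ${light:.4f}, deep turn: ${deep:.4f}")
```

The response is the same length in both cases. The difference in cost comes entirely from the reasoning you never read unless you open verbose mode.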
By default, thinking appears as a collapsed stub in the UI. You can see the full internal reasoning by pressing Ctrl+O to toggle verbose mode -- the thinking shows up as gray italic text above Claude's response. For most sessions, you won't need to read it. But when Claude produces something unexpected, opening verbose mode lets you trace exactly where its reasoning went wrong.
> What you're actually paying for: Thinking tokens are billed even when the summary is collapsed. If you're running long sessions on complex work, those tokens add up. You can limit the budget with the MAX_THINKING_TOKENS environment variable, or disable thinking entirely with Option+T (macOS) or Alt+T (Windows/Linux).
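If you launch Claude Code from a script, one way to cap the budget is to set the variable in the child process environment. This is a minimal sketch, assuming the variable is spelled MAX_THINKING_TOKENS and takes an integer. The helper name and the value 8000 are illustrative choices, not recommendations.

```python
import os


def env_with_thinking_budget(max_tokens: int, base_env=None) -> dict:
    """Return a copy of the environment with a thinking-token cap set."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    env = dict(os.environ if base_env is None else base_env)
    env["MAX_THINKING_TOKENS"] = str(max_tokens)  # env values must be strings
    return env


# Demo with a tiny stand-in environment (8000 is an arbitrary example cap).
env = env_with_thinking_budget(8000, base_env={"PATH": "/usr/bin"})
print(env["MAX_THINKING_TOKENS"])

# To launch with the cap (requires the claude CLI on your PATH):
#   import subprocess
#   subprocess.run(["claude"], env=env_with_thinking_budget(8000))
```

Building a copy instead of mutating os.environ keeps the cap scoped to the session you launch, so other tools in the same script are unaffected.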
The one keyword that actually changes reasoning depth
Here's a distinction the official documentation makes clearly, and that most people using Claude Code get wrong.
Phrases like "think hard," "think more carefully," or "reason through this" do not change how much thinking Claude allocates. They're read as regular prompt instructions -- natural language that Claude interprets, not a mechanism that adjusts the thinking token budget. Claude will sound more deliberate, but it isn't actually doing more reasoning work under the hood.
The word that does work is: ultrathink
Include ultrathink anywhere in your prompt, and Claude Code sets its effort level to high for that turn. On Opus 4.6 and Sonnet 4.6, that means the model dynamically allocates significantly more thinking tokens before it responds. It's documented in the official Claude Code reference as a single-turn override that doesn't permanently change your effort setting.
You: ultrathink a migration plan for moving our client reporting
system from manual spreadsheets to automated data pulls.
Current state: 4 clients, 3 different Excel templates,
one shared Google Sheet with version conflicts.
This is different from asking Claude to "think carefully." The ultrathink keyword is mechanically recognized -- it changes resource allocation, not just tone.
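One way to picture "mechanically recognized" is as a literal keyword check that runs before the model sees the prompt. The sketch below is a mental model only, not Claude Code's actual implementation. The function name, the case-insensitive whole-word match, and the effort labels are all assumptions made for illustration.

```python
import re

# Assumption: the keyword is matched as a whole word, ignoring case.
ULTRATHINK = re.compile(r"\bultrathink\b", re.IGNORECASE)


def effort_for_turn(prompt: str, session_effort: str = "default") -> str:
    """Return the effort level this toy model would use for one turn."""
    if ULTRATHINK.search(prompt):
        return "high"       # single-turn override
    return session_effort   # phrases like "think hard" change nothing here


print(effort_for_turn("ultrathink a migration plan"))        # high
print(effort_for_turn("think hard about a migration plan"))  # default
```

In this toy model, "think hard" still reaches Claude as part of the prompt text, so it can influence tone. It just never touches the effort setting.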
The practical difference shows up on hard problems. Complex architectural decisions, debugging sessions where previous attempts have failed, and multi-client workflow designs where the wrong choice creates rework across several engagements are all situations where the deeper reasoning pass produces meaningfully better output.
> Use ultrathink selectively. It costs more and takes longer. For routine tasks -- drafting a client update, reformatting a document, running a search -- default thinking is appropriate. Save the keyword for decisions that are hard to reverse.
How to configure thinking depth across a session
Beyond the single-turn ultrathink override, Claude Code gives you several ways to tune reasoning depth across an entire session or globally.
Effort levels control thinking depth for the current session on Opus 4.6 and Sonnet 4.6. Run /effort inside a session to adjust, or set the CLAUDE_CODE_EFFORT_LEVEL environment variable before launching.
| Effort level | What it does | When to deploy it |
|---|---|---|
| Low | Minimal thinking allocation | Drafts, reformatting, low-stakes queries |
| Default | Adaptive reasoning | Most everyday work |
| High | Maximum thinking allocation | Architecture, planning, complex debugging |
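If you want the table as a repeatable rule instead of a judgment call each time, it can be written down as a small lookup. This is a sketch under assumptions: the task category names are invented for illustration, and the lowercase level strings are a guess at how you might pass the value to /effort or the environment variable.

```python
# Hypothetical task categories mapped to the effort levels in the table.
EFFORT_BY_TASK = {
    "draft": "low",
    "reformat": "low",
    "quick_query": "low",
    "architecture": "high",
    "planning": "high",
    "complex_debugging": "high",
}


def suggested_effort(task_kind: str) -> str:
    """Return the suggested effort level, falling back to the default."""
    return EFFORT_BY_TASK.get(task_kind, "default")


for kind in ("reformat", "architecture", "client_update"):
    print(kind, "->", suggested_effort(kind))
```

Anything not listed falls through to the default, which matches the table's guidance that adaptive reasoning covers most everyday work.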
/config lets you toggle thinking mode globally. If you disable thinking here, it stays off until you re-enable it. This persists in ~/.claude/settings.json as alwaysThinkingEnabled.
Setting showThinkingSummaries: true in your project's settings.json expands the thinking display by default, so you see the full reasoning in every session without having to toggle verbose mode manually. This is useful when you're debugging a complex engagement workflow and want to track Claude's reasoning at each step.
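If you manage these settings across several client projects, a small script can apply them without clobbering whatever else is in the file. The sketch below is a hedged example: the helper name is made up, the demo writes to a temporary file rather than a real settings path, and "someOtherKey" is a placeholder for existing settings. Only the two key names come from the text above.

```python
import json
import tempfile
from pathlib import Path


def update_settings(path: Path, **changes) -> dict:
    """Merge key/value changes into a JSON settings file and return the result."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings.update(changes)  # existing keys not named in changes are kept
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


# Demo against a throwaway file so no real configuration is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "settings.json"
    demo.write_text('{"someOtherKey": 1}')
    result = update_settings(demo, showThinkingSummaries=True)
    print(result)
```

The same call with alwaysThinkingEnabled=False would mirror what the /config toggle persists. Back up the real file first, since an empty or malformed settings file will make json.loads raise.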
For multi-client work, the most practical setup is to leave effort at default, use ultrathink for the moments that genuinely warrant deep reasoning, and keep verbose mode off until something goes wrong.
Pairing extended thinking with plan mode
Extended thinking compounds when combined with plan mode. This is where the feature earns its place in a fractional workflow.
Plan mode (activated with Shift+Tab twice, or --permission-mode plan at startup) prevents Claude from writing or editing any files. It can only read and reason. When you add ultrathink to a prompt in plan mode, Claude spends its full reasoning capacity producing a plan -- no file edits, no premature execution, no assumptions written into your project before you've reviewed them.
For client engagements where a wrong architectural decision creates days of cleanup, this two-pass workflow is worth the extra time:
1. Start in plan mode
2. Prompt with ultrathink and a specific, bounded problem
3. Read the plan
4. Adjust it before Claude touches a single file
5. Switch to regular mode to execute
You: ultrathink a plan to consolidate our three client onboarding
workflows into one repeatable process. Each client currently
has separate folders, separate templates, and different
naming conventions. We want one structure that works for all.
Claude will read your file structure, reason through the consolidation, and produce a written plan with tradeoffs before making any changes. The ultrathink keyword means it won't take shortcuts in that reasoning phase.
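The first two steps of that workflow can be scripted if you start sessions from a launcher. This is a sketch, not a definitive recipe: the --permission-mode plan flag comes from the text above, while passing the opening prompt as a positional argument and the helper name are assumptions you should check against your installed version.

```python
import shlex


def plan_mode_command(problem: str) -> list:
    """Build a claude invocation that starts in plan mode with ultrathink."""
    prompt = f"ultrathink {problem.strip()}"
    # Assumption: the CLI accepts the initial prompt as a positional argument.
    return ["claude", "--permission-mode", "plan", prompt]


cmd = plan_mode_command(
    "a plan to consolidate our three client onboarding workflows"
)
print(shlex.join(cmd))  # shell-safe string you can paste into a terminal

# To actually start the session (requires the claude CLI on your PATH):
#   import subprocess
#   subprocess.run(cmd)
```

Keeping the command as a list avoids quoting problems when the prompt contains apostrophes or client names with spaces.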
> For new or complex engagements: Run plan mode with ultrathink on the first session when you're picking up an unfamiliar codebase or designing a new workflow. You want the full reasoning pass before Claude starts writing anything.
What to check when thinking doesn't help
Extended thinking improves output on hard problems. It doesn't fix problems caused by missing context.
If Claude produces a weak plan even after ultrathink, the issue is usually one of these:
- The prompt is underspecified. Claude can reason deeply on a vague question and still produce a vague answer. Give it the details that matter: current state, desired outcome, and the specific constraints your client has.
- The right files aren't in scope. Claude's thinking operates on what it can read. Use @filename to pull specific files into the conversation before asking for a plan.
- The effort level was overridden. Check whether MAX_THINKING_TOKENS is set too low in your environment, or whether someone disabled thinking in the global config. Run /config to verify the current state.
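The configuration half of that checklist is easy to automate. Here is a minimal sketch, assuming the variable is spelled MAX_THINKING_TOKENS and that the global toggle is stored as alwaysThinkingEnabled. The function name and the 1024-token threshold for "too low" are arbitrary illustrations, not documented values.

```python
def thinking_checks(env: dict, settings: dict, min_budget: int = 1024) -> list:
    """Return human-readable findings about settings that may limit thinking."""
    findings = []

    raw = env.get("MAX_THINKING_TOKENS")
    if raw is not None:
        try:
            budget = int(raw)
        except ValueError:
            findings.append(f"MAX_THINKING_TOKENS is not an integer: {raw!r}")
        else:
            if budget < min_budget:
                findings.append(
                    f"MAX_THINKING_TOKENS={budget} is below {min_budget}"
                )

    # Only an explicit false counts; a missing key means the default applies.
    if settings.get("alwaysThinkingEnabled") is False:
        findings.append("alwaysThinkingEnabled is false in settings")

    return findings


problems = thinking_checks(
    {"MAX_THINKING_TOKENS": "200"}, {"alwaysThinkingEnabled": False}
)
for line in problems:
    print("check:", line)
```

In real use you would pass os.environ and the parsed contents of your settings file. An empty list means the configuration is not the culprit, which points back to the prompt or the files in scope.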
The reasoning capacity is only as useful as the inputs it has to work with. Extended thinking amplifies good context. It doesn't substitute for it.