Reasoning Models: When Thinking Time Pays Off
How reasoning models differ from regular models, why they need less instruction, and when the extra cost and latency are worth it.
Reasoning models don't want your instructions
Everything you've learned about prompting so far -- assigning a role, structuring step-by-step instructions, specifying output format in detail -- works beautifully with flagship and fast models. Reasoning models throw most of that out the window.
A reasoning model is a distinct category of AI that spends computational resources "thinking" before it responds. Where a flagship model like GPT-4o or Claude Sonnet generates its answer token by token in a straight line, a reasoning model like OpenAI's o1 or o1 Pro pauses, reasons through the problem internally, and then produces a response. That internal reasoning step is what makes these models a different tool entirely.
The catch is that people prompt them wrong. A contributor to the widely shared Latent Space piece "o1 isn't a chat model" initially believed o1 Pro was terrible -- not because the model was bad, but because they were prompting it the same way they prompted flagship models: detailed step-by-step instructions, specific role assignments, rigid formatting constraints. The model fought against all of it.
The reason is straightforward. When you give a reasoning model granular instructions, you're constraining the very thinking process that makes it valuable. You're telling a senior consultant exactly how to do their job instead of telling them what outcome you need.
> The shift in one sentence: With flagship models, you tell the model how to think. With reasoning models, you tell the model what to think about.
How reasoning models actually work
The technical term is "inference-time compute." In practical terms, it means the model is allowed to use computing resources to reason at the moment you send your prompt, not only during training.
Flagship models generate responses immediately. They predict the next word based on patterns learned during training. Fast, efficient, and highly capable for most tasks. But they don't pause to reconsider or check their own logic.
Reasoning models take a different approach. Before generating their visible response, they work through the problem internally. They consider multiple angles, verify their own logic, and catch errors that a flagship model would miss. This is why they're slower and cost more per request -- they're doing real work before you see any output.
Here's what this looks like from your side of the screen:
```
You send a prompt to o1 Pro...

[Thinking...]
[Thinking...]
[Thinking...]

(30-90 seconds later)

o1 Pro: Here is my analysis...
```
That waiting period isn't a bug. It's the model reasoning through your problem. The longer and more complex the task, the more thinking time it takes.
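A rough intuition for inference-time compute is a draft-and-verify loop: spend compute generating candidate answers and checking them before committing to one. The sketch below is a toy analogy, not how any vendor actually implements reasoning; `generate_candidate` and `verify` are hypothetical stand-ins for the model's internal drafting and self-checking.

```python
import random

def generate_candidate(problem, rng):
    # Stand-in for a model's first-draft answer (hypothetical).
    # It sums a list of numbers but sometimes drops the last term,
    # simulating the kind of slip a single forward pass can make.
    return sum(problem) if rng.random() > 0.3 else sum(problem[:-1])

def verify(problem, answer):
    # Stand-in for an internal self-check against the ground truth.
    return answer == sum(problem)

def flagship_answer(problem, rng):
    # One pass, no self-check: the first draft is the final answer.
    return generate_candidate(problem, rng)

def reasoning_answer(problem, rng, budget=8):
    # Spend inference-time compute: draft, check, redraft until the
    # self-check passes or the thinking budget runs out. More budget
    # means more "thinking time" and a lower error rate.
    answer = None
    for _ in range(budget):
        answer = generate_candidate(problem, rng)
        if verify(problem, answer):
            break
    return answer
```

The design point the toy captures: the flagship path is fast but keeps whatever its first draft produced, while the reasoning path trades latency for verified output.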
| Characteristic | Flagship models (GPT-4o, Sonnet) | Reasoning models (o1, o1 Pro) |
|---|---|---|
| Response time | 2-10 seconds | 30-120 seconds |
| Cost per request | $$ | $$$-$$$$ |
| Prompting style | Detailed instructions | High-level briefs |
| Role assignment | Very effective | Often counterproductive |
| Step-by-step prompts | Improves output | Constrains thinking |
| Best at | Writing, analysis, formatting | Complex logic, math, code |
| Error rate on logic | Moderate | Measurably lower |
> Operator tip: Reasoning models are not better flagship models. They are a different tool for different jobs. Running a weekly status report through o1 Pro is like hiring a structural engineer to hang a picture frame. It will work, but you're burning time and budget for zero additional quality.
Write briefs, not prompts
The single most practical change when working with reasoning models is how you structure what you send them. Stop writing prompts. Start writing briefs.
A brief is closer to what you'd hand a senior consultant or a product manager. You define the goal, provide the relevant context, specify the constraints, and then get out of the way. You don't tell them how to think through the problem. That's what you're paying them for.
Here's the same task written two ways:
Flagship model prompt (detailed instructions):
```
You are a senior financial analyst. Analyze the following quarterly
revenue data. First, calculate year-over-year growth for each
product line. Then identify which product lines are growing above
15%. Then rank them by growth rate. Present the results in a
markdown table with columns for product line, Q3 revenue, Q3 prior
year revenue, and YoY growth percentage. Use two decimal places for
all percentages.
```
Reasoning model brief (goals and context):
```
Analyze this quarterly revenue data. Identify which product lines
are outperforming and which are falling behind. Flag anything that
would matter in a board presentation.

Here is the data:
[paste revenue data]

Take all the time you need.
```
The flagship prompt is prescriptive. Every step is laid out. That works because flagship models follow instructions linearly and don't reason about why those steps matter.
The reasoning model brief states the goal and provides the raw material. The model figures out the right analytical approach on its own. It will calculate growth rates, spot anomalies, and flag patterns you didn't think to ask about -- because you didn't constrain it to only the steps you listed.
Three elements every reasoning model brief needs:
- The goal. What outcome are you after? Be specific about the deliverable, not the process. "Present a complete plan to solve this problem" is better than listing every step the model should take.
- The context. Dump everything relevant. Background documents, data, constraints, prior decisions. Reasoning models handle large context well and use it to inform their thinking. More raw material means better reasoning.
- Permission to think. This sounds strange, but phrases like "take all the time you need" actually affect how reasoning models allocate their compute budget. Give them room to work.
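If you write briefs often, the three elements can be templated so none of them gets skipped. A minimal sketch -- the `build_brief` helper is hypothetical, not any vendor's API, and the structure simply mirrors the three elements above:

```python
def build_brief(goal: str, context: str) -> str:
    """Assemble a reasoning-model brief from its three elements."""
    return "\n\n".join([
        goal.strip(),                    # the goal: outcome, not process
        f"Context:\n{context.strip()}",  # the context: raw material
        "Take all the time you need.",   # permission to think
    ])

brief = build_brief(
    goal="Analyze this quarterly revenue data. Identify which product "
         "lines are outperforming and which are falling behind.",
    context="[paste revenue data]",
)
```

Note what the template deliberately omits: no role assignment, no numbered steps, no output-format block.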
> What to remove from your prompts: Role assignments ("You are a..."), step-by-step instructions, and rigid output formatting. Reasoning models reason best when they choose their own path through the problem. Add formatting constraints only if the output structure genuinely matters for your deliverable.
When reasoning models earn their cost
Reasoning models are expensive. They're slow. For 80% of the work fractional leaders produce, a flagship model handles the task faster and cheaper with no quality loss. The question isn't whether reasoning models are powerful. It's whether the task on your desk right now actually needs that power.
Use a reasoning model when:
- The task involves multi-step logic where errors compound. Financial modeling, contract analysis with interdependent clauses, or debugging code where a missed condition breaks the entire workflow.
- A flagship model keeps getting the same thing wrong. If you've iterated on a prompt three times and the flagship model still produces logical errors, switch to a reasoning model instead of adding more instructions.
- You need the model to catch things you didn't think to ask about. Reasoning models surface edge cases, contradictions, and patterns that prescriptive prompts would miss entirely.
- The cost of an incorrect output exceeds the cost of the model. A board deck with a calculation error costs you credibility. Spending an extra dollar on a reasoning model for that specific analysis is cheap insurance.
Stick with flagship models when:
- The task is primarily about writing, formatting, or communication. Status reports, client emails, proposals, meeting recaps. Flagship models handle these faster and the output quality is identical.
- You already know the exact steps the model should follow. If you can prescribe the process, a flagship model will execute it faithfully. Reasoning models add nothing here.
- Speed matters. When you need a deliverable in the next five minutes, a 90-second thinking delay per request adds up fast.
Decision framework:
```
Is the task about WRITING or FORMATTING?
  → Flagship model. Every time.

Is the task about LOGIC, MATH, or DEBUGGING?
  → Test with flagship first.
  → If flagship fails after 2-3 iterations, switch to reasoning.

Is an error in this output genuinely costly?
  → Reasoning model. The insurance is worth it.
```
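The framework above can be sketched as a small routing function. This is illustrative only: the tier names are placeholders, and the failure threshold simply encodes the "2-3 iterations" heuristic from the framework, not measured data.

```python
def pick_model(task_type: str, error_costly: bool,
               flagship_failures: int = 0) -> str:
    """Route a task to a model tier per the decision framework."""
    # Writing and formatting go to the flagship tier. Every time.
    if task_type in ("writing", "formatting"):
        return "flagship"
    # Logic, math, and debugging: test with flagship first, and
    # escalate only after repeated failed iterations.
    if task_type in ("logic", "math", "debugging") and flagship_failures >= 3:
        return "reasoning"
    # If an error is genuinely costly, pay for the insurance.
    if error_costly:
        return "reasoning"
    # Default to the cheaper, faster tier.
    return "flagship"
```

Encoding the routing decision, even informally, is what makes the one-week usage audit in the tip below possible: you can count how often each branch fires.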
> Operator tip: The most common mistake is defaulting to reasoning models because they feel more powerful. Track your actual usage for one week. Most operators find that fewer than 20% of their tasks benefit from reasoning-tier models. Route the other 80% to flagship or fast models and save both time and spend.
Put this into practice this week
The gap between operators who get good AI output and operators who get exceptional output often comes down to model selection. Reasoning models are the sharpest tool in the set, but only for specific problems.
Here's your action plan:
1. Identify one task this week that involves multi-step logic. Financial analysis, a complex client recommendation, or a process that a flagship model keeps getting wrong.
2. Write a brief, not a prompt. State the goal, dump the context, add "take all the time you need" at the end. Skip the role assignment and step-by-step instructions.
3. Compare the output to your flagship result. Run the same task through both a flagship model and a reasoning model. The difference will teach you more about when to use each one than any guide can.
The models keep improving quarter over quarter. The names will change. The specific pricing will shift. But the principle stays fixed: match the model to the task, and adjust your prompting style to fit the model you're using. Reasoning models want your context and your goals. Give them that, and stay out of their way.