Prompt Engineering: The Absolute Basics

What prompting is, how tokens work, context windows, temperature, and pricing. The foundation every operator needs before writing a single prompt.

7 min read

Every AI model speaks the same language, and it's yours

The barrier to commanding AI models is not technical skill. It's not knowing Python or JavaScript or any programming language. The barrier is knowing how to write clear instructions in plain English.

That's what prompting is. You write a set of instructions in natural language, and the model executes them. Think of a prompt as a work order you'd hand to a contractor: the clearer and more specific the instructions, the better the result.

The way you construct a prompt changes everything. Not only what the model can do, but how well it does it. Two people can use the same model on the same task and get wildly different outputs. The difference is the prompt.

Here is what that looks like in practice:

You: Help me with this report.

Claude: Sure, I can help! What kind of report are you working on?

Compare that to:

You: Rewrite the executive summary of this Q3 engagement report.
     Make it 150 words or less. Use a confident but neutral tone.
     Focus on the three metrics the client cares about most:
     retention rate, NPS score, and monthly recurring revenue.

Claude: Here's your revised executive summary:

     Q3 delivered measurable progress across all three priority
     metrics. Client retention held at 94%, up from 91% in Q2...

Same model. Same subscription. The second prompt produced a client-ready deliverable. The first produced a follow-up question.

> Prompt engineering is a meta-skill. The better you get at constructing prompts, the more capable every AI model becomes in your hands. It applies to every model, every tool, and every task you'll encounter in this course.

Tokens are how AI models read your words

When you type a prompt, the model doesn't read it the way you do. It breaks your text into chunks called tokens. A token is roughly four characters of English text. Sometimes a single word is one token. Sometimes it's two or three.

The word "birthday" is a single token. A short sentence like "my birthday is this week" breaks into five or six tokens, depending on the model's tokenizer.

You don't need to count tokens manually. But you do need to understand that tokens are the unit of measurement for three things that directly affect your work:

  • How much information you can feed into a model (the context window)
  • How long a response you can get back (max output tokens)
  • How much you pay (pricing is per token)
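The four-characters-per-token rule of thumb is enough for back-of-envelope planning. Here is a minimal sketch of that heuristic; it is not a real tokenizer, and actual counts vary by model:

```python
# Rough token estimate using the ~4 characters of English per token rule.
# A heuristic sketch only; real tokenizers split text differently per model.

def estimate_tokens(text: str) -> int:
    """Approximate token count: about 4 characters of English per token."""
    return max(1, round(len(text) / 4))

prompt = "Rewrite the executive summary of this Q3 engagement report."
print(estimate_tokens(prompt))  # → 15
```

For exact counts, use the tokenizer tools linked at the end of this section; the heuristic is for quick sizing, not billing.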

Context windows determine how much a model can handle at once. This is the total number of tokens a model can process in a single request, your prompt plus the model's response combined. A bigger context window means you can include more background material, longer documents, and more detailed instructions.

Max output tokens cap the length of each response. If you ask a model to write an entire 50-page report in one shot and the model's output limit is 16,000 tokens, you're not getting 50 pages. You're getting roughly 12,000 words at most, and often less.

| Model | Context Window | Max Output Tokens |
|---|---|---|
| GPT-4o | 128,000 tokens | 16,384 tokens |
| Claude 3.5 Sonnet | 200,000 tokens | 8,192 tokens |
| Gemini 1.5 Pro | 1,000,000 tokens | 8,192 tokens |

These numbers change as new model versions ship. The pattern to understand is this: context windows keep growing, and output limits are growing more slowly. You can fit an entire book into some models. You cannot get an entire book back out in one response.
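One practical consequence: input and output share the same context window, so a pre-flight check is just addition. A minimal sketch, using the limits from the table above (the model names and numbers here are illustrative and go stale as new versions ship):

```python
# Pre-flight check: will the prompt plus the desired response fit inside a
# model's context window? Limits mirror the table above and will change as
# new model versions ship; treat them as examples, not a live reference.

CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3.5-sonnet": 200_000,
    "gemini-1.5-pro": 1_000_000,
}

def fits_in_context(model: str, input_tokens: int, output_tokens: int) -> bool:
    """Input and output tokens share one context window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOWS[model]

print(fits_in_context("gpt-4o", 120_000, 16_000))  # False: 136,000 > 128,000
print(fits_in_context("gpt-4o", 100_000, 16_000))  # True
```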

> What this means for operators. If you're working across three client engagements and want to feed a model background documents for each one, context windows matter. A model with 128K tokens can handle roughly 96,000 words of input, enough for most deliverables. A model with 1M tokens can handle an entire project archive.

Temperature controls creativity vs. consistency

Every major AI model has a setting called temperature. You won't see it in ChatGPT or the Claude chat interface, but it appears in tools like OpenAI's Playground, Anthropic's Workbench, and Google's AI Studio. Since this course works inside those tools, it's worth knowing what temperature does.

Temperature controls how predictable the model's output is. The scale typically runs from 0 to 2, though most practical use stays between 0 and 1.

  • Low temperature (0 to 0.3): The model picks the most probable next word almost every time. Outputs are consistent, factual, and repeatable. Run the same prompt twice and you get nearly identical results.
  • High temperature (0.7 to 1.0+): The model introduces more randomness. It picks less obvious words, takes creative risks, varies its phrasing. Outputs are more original but less predictable.
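Under the hood, temperature is a divisor applied to the model's raw scores before it picks the next token. A small sketch makes the effect concrete; the scores below are made up for illustration:

```python
# What temperature does mechanically: the model divides its raw scores
# (logits) by the temperature before converting them to probabilities.
# Low temperature sharpens the distribution toward the top token; high
# temperature flattens it. The logits here are invented for illustration.

import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)

print(round(low[0], 3))   # top token dominates at low temperature (~0.99)
print(round(high[0], 3))  # probability spreads out at high temperature (~0.63)
```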

Here is how to think about it for real work:

| Task | Recommended Temperature | Why |
|---|---|---|
| Financial analysis | 0 - 0.2 | You need consistent, accurate numbers |
| Client email drafts | 0.3 - 0.5 | Professional tone with some natural variation |
| Brainstorming ideas | 0.7 - 0.9 | You want variety and unexpected angles |
| Creative writing | 0.8 - 1.0 | Maximum originality and stylistic range |
| Code generation | 0 - 0.2 | Precision matters more than creativity |

For most operator workflows, keep temperature low. When you're producing client deliverables, you want consistency. When a client receives a weekly status report, it should read like the same person wrote it every time. Low temperature gives you that.

Save higher temperature for brainstorming sessions or early-stage ideation where you actually want the model to surprise you.

> The practical rule. If the output needs to be right, use low temperature. If the output needs to be interesting, use high temperature. When in doubt, 0.3 is a safe default for professional work.

How pricing works and why it matters

AI models charge based on token usage. You pay for the tokens going in (your prompt) and the tokens coming out (the model's response). The rates differ by model.

This pricing structure means two things for operators running multiple client engagements:

Longer prompts cost more. If you paste an entire 30-page document into every request, you're burning input tokens. A more targeted approach, pulling out the relevant section and providing specific instructions, keeps costs down and often produces better results.

Bigger models cost more per token. The most capable models charge higher rates. For routine tasks like reformatting notes or generating email drafts, a smaller and cheaper model handles the work fine. Reserve the premium models for tasks that demand stronger reasoning.

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |

The price difference between a flagship model and its smaller variant is often 10x or more. That gap adds up fast when you're processing client documents across multiple engagements every week.
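The per-request math is simple enough to sketch. This uses the rates from the table above; prices change over time, so treat the numbers as examples, not a live price list:

```python
# Per-request cost math using the rates from the table above. Prices are per
# million tokens and change over time; treat these figures as examples.

PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3.5-haiku": (0.80, 4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Summarizing a 30-page document (~20,000 tokens in, ~1,000 tokens out):
print(f"${request_cost('gpt-4o', 20_000, 1_000):.4f}")       # flagship: $0.0600
print(f"${request_cost('gpt-4o-mini', 20_000, 1_000):.4f}")  # mini: $0.0036
```

A single request is cheap either way; the roughly 17x gap between the two models is what compounds across hundreds of documents.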

> The operator's approach to model costs. Match the model to the task. Use cheaper models for formatting, summarization, and first drafts. Switch to premium models when you need deep analysis, complex reasoning, or work that goes directly to the client without editing. This isn't about cutting corners. It's about deploying the right tool for each task.

> Before you worry about costs. Most individual operators won't spend more than $20-50 per month on API usage if they're working through chat interfaces. The pricing math matters more when you start building automated workflows that run hundreds of prompts per day. We'll cover that in later sections.

What to do next

You now have the working vocabulary for everything that follows in this course. Prompts are instructions. Tokens are the unit of measurement. Context windows and output limits define the boundaries. Temperature controls creativity. Pricing scales with usage.

Here is how to put this into practice today:

1. Open the OpenAI Tokenizer and paste in a paragraph from a recent client deliverable. See how many tokens it produces. Get a feel for the relationship between your words and the model's units.
2. Open ChatGPT, Claude, or Gemini and try the before-and-after prompt pattern from the first section. Write a vague prompt, then rewrite it with specific constraints. Compare the outputs.
3. Check the OpenAI Models page, Anthropic Models page, or Gemini Models page and look at the context windows and output limits for two or three models. Start building a mental map of what each model can handle.

The next section walks through the specific tools, like OpenAI's Playground and Anthropic's Workbench, where you'll start building and testing real prompts with full control over every parameter we covered here.
