Prompt Engineering: The Absolute Basics
What prompting is, how tokens work, context windows, temperature, and pricing. The foundation every operator needs before writing a single prompt.
Every AI model speaks the same language, and it's yours
The barrier to commanding AI models is not technical skill. It's not knowing Python or JavaScript or any programming language. The barrier is knowing how to write clear instructions in plain English.
That's what prompting is. You write a set of instructions in natural language, and the model executes them. Think of a prompt as a work order you'd hand to a contractor: the clearer and more specific the instructions, the better the result.
The way you construct a prompt changes everything. Not only what the model can do, but how well it does it. Two people can use the same model on the same task and get wildly different outputs. The difference is the prompt.
Here is what that looks like in practice:
You: Help me with this report.
Claude: Sure, I can help! What kind of report are you working on?
Compare that to:
You: Rewrite the executive summary of this Q3 engagement report.
Make it 150 words or less. Use a confident but neutral tone.
Focus on the three metrics the client cares about most:
retention rate, NPS score, and monthly recurring revenue.
Claude: Here's your revised executive summary:
Q3 delivered measurable progress across all three priority
metrics. Client retention held at 94%, up from 91% in Q2...
Same model. Same subscription. The second prompt produced a client-ready deliverable. The first produced a follow-up question.
> Prompt engineering is a meta-skill. The better you get at constructing prompts, the more capable every AI model becomes in your hands. It applies to every model, every tool, and every task you'll encounter in this course.
Tokens are how AI models read your words
When you type a prompt, the model doesn't read it the way you do. It breaks your text into chunks called tokens. A token is roughly four characters of English text. Sometimes a single word is one token. Sometimes it's two or three.
The word "birthday" is typically a single token. The sentence "my birthday is this week" comes out to roughly six tokens, though exact counts vary by tokenizer.
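The four-characters-per-token rule of thumb can be sketched as a quick estimator. This is only the back-of-the-envelope math described above; real tokenizers (such as OpenAI's tokenizer) give exact counts that will differ.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count: about 4 characters of English per token."""
    return max(1, round(len(text) / 4))

# The 24-character sentence from the text lands at 6 tokens by this heuristic:
print(estimate_tokens("my birthday is this week"))
```

Use this only for rough budgeting; short words like "birthday" show where the heuristic and a real tokenizer disagree.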
You don't need to count tokens manually. But you do need to understand that tokens are the unit of measurement for three things that directly affect your work:
- How much information you can feed into a model (context window)
- How long a response you can get back (max output tokens)
- How much you pay (pricing is per token)
Context windows determine how much a model can handle at once. This is the total number of tokens a model can process in a single request: your prompt plus the model's response combined. A bigger context window means you can include more background material, longer documents, and more detailed instructions.
Max output tokens cap the length of each response. If you ask a model to write an entire 50-page report in one shot and the model's output limit is 16,000 tokens, you're not getting 50 pages. You're getting roughly 12,000 words at most, and often less.
| Model | Context Window | Max Output Tokens |
|---|---|---|
| GPT-4o | 128,000 tokens | 16,384 tokens |
| Claude 3.5 Sonnet | 200,000 tokens | 8,192 tokens |
| Gemini 1.5 Pro | 1,000,000 tokens | 8,192 tokens |
These numbers change as new model versions ship. The pattern to understand is this: context windows keep growing, and output limits are growing more slowly. You can fit an entire book into some models. You cannot get an entire book back out in one response.
> What this means for operators. If you're working across three client engagements and want to feed a model background documents for each one, context windows matter. A model with 128K tokens can handle roughly 96,000 words of input, enough for most deliverables. A model with 1M tokens can handle an entire project archive.
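The budgeting math above can be sketched as a simple fit check. The tokens-per-word ratio (~1.33, the inverse of the 0.75 words per token used in the text) is a rough average for English prose, not an exact tokenizer count, and the limits passed in are illustrative.

```python
TOKENS_PER_WORD = 1.33  # rough average for English prose, not an exact count

def fits(input_words: int, output_words: int,
         context_window: int, max_output: int) -> bool:
    """Check a request against a model's context window and output cap."""
    input_tokens = int(input_words * TOKENS_PER_WORD)
    output_tokens = int(output_words * TOKENS_PER_WORD)
    within_output_cap = output_tokens <= max_output
    within_window = input_tokens + output_tokens <= context_window
    return within_output_cap and within_window

# A 90,000-word archive plus a 2,000-word summary against a 128K-token window:
print(fits(90_000, 2_000, context_window=128_000, max_output=16_384))
```

The same call against a smaller window fails on the combined total, which is exactly the "you can fit a book in, but not get one back out" asymmetry described above.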
Temperature controls creativity vs. consistency
Every major AI model has a setting called temperature. You won't see it in ChatGPT or the Claude chat interface, but it appears in tools like OpenAI's Playground, Anthropic's Workbench, and Google's AI Studio. Since this course works inside those tools, it's worth knowing what temperature does.
Temperature controls how predictable the model's output is. The scale typically runs from 0 to 2, though most practical use stays between 0 and 1.
- Low temperature (0 to 0.3): The model picks the most probable next word almost every time. Outputs are consistent, factual, and repeatable. Run the same prompt twice and you get nearly identical results.
- High temperature (0.7 to 1.0+): The model introduces more randomness. It picks less obvious words, takes creative risks, varies its phrasing. Outputs are more original but less predictable.
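Under the hood, temperature rescales the model's scores for candidate next words before one is sampled. This minimal sketch uses made-up scores for three hypothetical candidates (the numbers are illustrative, not from any real model) to show how low temperature sharpens the distribution and high temperature flattens it:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # hypothetical scores for three candidate words

for t in (0.2, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

At temperature 0.2 nearly all the probability collapses onto the top candidate, which is why low-temperature runs are so repeatable; at 1.0 the runners-up keep a real chance of being picked. (A setting of exactly 0 is usually implemented as always taking the top candidate, since dividing by zero is undefined.)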
Here is how to think about it for real work:
| Task | Recommended Temperature | Why |
|---|---|---|
| Financial analysis | 0 - 0.2 | You need consistent, accurate numbers |
| Client email drafts | 0.3 - 0.5 | Professional tone with some natural variation |
| Brainstorming ideas | 0.7 - 0.9 | You want variety and unexpected angles |
| Creative writing | 0.8 - 1.0 | Maximum originality and stylistic range |
| Code generation | 0 - 0.2 | Precision matters more than creativity |
For most operator workflows, keep temperature low. When you're producing client deliverables, you want consistency. When a client receives a weekly status report, it should read like the same person wrote it every time. Low temperature gives you that.
Save higher temperature for brainstorming sessions or early-stage ideation where you actually want the model to surprise you.
> The practical rule. If the output needs to be right, use low temperature. If the output needs to be interesting, use high temperature. When in doubt, 0.3 is a safe default for professional work.
How pricing works and why it matters
AI models charge based on token usage. You pay for the tokens going in (your prompt) and the tokens coming out (the model's response). The rates differ by model.
This pricing structure means two things for operators running multiple client engagements:
Longer prompts cost more. If you paste an entire 30-page document into every request, you're burning input tokens. A more targeted approach, pulling out the relevant section and providing specific instructions, keeps costs down and often produces better results.
Bigger models cost more per token. The most capable models charge higher rates. For routine tasks like reformatting notes or generating email drafts, a smaller and cheaper model handles the work fine. Reserve the premium models for tasks that demand stronger reasoning.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o mini | $0.15 | $0.60 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| Claude 3.5 Haiku | $0.80 | $4.00 |
The price difference between a flagship model and its smaller variant is often 10x or more. That gap adds up fast when you're processing client documents across multiple engagements every week.
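The per-token arithmetic behind that gap can be sketched as a small calculator. The rates below mirror the table above; they drift as pricing changes, so treat them as a snapshot, not a reference.

```python
# (input $/1M tokens, output $/1M tokens), per the rates listed above
RATES = {
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3.5-haiku":  (0.80, 4.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at per-million-token rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens / 1_000_000 * in_rate
            + output_tokens / 1_000_000 * out_rate)

# A 10,000-token prompt with a 2,000-token reply, flagship vs. small variant:
for model in ("gpt-4o", "gpt-4o-mini"):
    print(model, round(request_cost(model, 10_000, 2_000), 4))
```

Running the same request through both models makes the 10x-plus spread concrete: the flagship costs cents, the small variant fractions of a cent, and that ratio compounds across hundreds of weekly requests.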
> The operator's approach to model costs. Match the model to the task. Use cheaper models for formatting, summarization, and first drafts. Switch to premium models when you need deep analysis, complex reasoning, or work that goes directly to the client without editing. This isn't about cutting corners. It's about deploying the right tool for each task.
> Before you worry about costs. Most individual operators won't spend more than $20-50 per month on API usage if they're working through chat interfaces. The pricing math matters more when you start building automated workflows that run hundreds of prompts per day. We'll cover that in later sections.
What to do next
You now have the working vocabulary for everything that follows in this course. Prompts are instructions. Tokens are the unit of measurement. Context windows and output limits define the boundaries. Temperature controls creativity. Pricing scales with usage.
Here is how to put this into practice today:
1. Open the OpenAI Tokenizer and paste in a paragraph from a recent client deliverable. See how many tokens it produces. Get a feel for the relationship between your words and the model's units.
2. Open ChatGPT, Claude, or Gemini and try the before-and-after prompt pattern from the first section. Write a vague prompt, then rewrite it with specific constraints. Compare the outputs.
3. Check the OpenAI Models page, Anthropic Models page, or Gemini Models page and look at the context windows and output limits for two or three models. Start building a mental map of what each model can handle.
The next section walks through the specific tools, like OpenAI's Playground and Anthropic's Workbench, where you'll start building and testing real prompts with full control over every parameter we covered here.