
XML Prompting: Structured Output for Complex Tasks

When and why to deploy XML as an AI output format. Covers tag structure, CDATA wrapping, and the code changes prompt pattern.



What XML prompting is and why it exists

Most operators who have worked with JSON output prompting hit a wall at the same spot: the task requires deeply nested structures, multiple files worth of code, or output that mixes content with metadata. JSON handles flat data extraction well. But when your output needs to represent hierarchical relationships and feed directly into automated parsing, you need a different format.

XML (Extensible Markup Language) is a structured output format where you define your own tag names. JSON also lets you name keys, but its containers close with anonymous brackets. XML names both the opening and the closing tag of every element, and you choose names that match their purpose. A tag called <file_path> tells both the model and your parser exactly what that data represents. A tag called <code_changes> wraps related output in a named container that can be parsed, validated, and routed programmatically.

The practical difference for client engagements. When you're prompting a model to generate code across multiple files, track which files changed, or orchestrate multi-step agent workflows, XML gives you explicit semantic structure that survives complex nesting. JSON at the same depth becomes a maze of brackets where a single missing comma breaks the entire document.

Here is the key distinction: JSON is a data exchange format. XML is a document markup format. That difference matters when your output looks more like a structured document (code changes, agent task plans, nested configuration) than a data record (contacts, flights, inventory).

When to reach for XML:

Your AI output contains code that needs to be extracted and written to files.
You need to track operations (create, update, delete) alongside file content.
Your output has three or more levels of nesting.
The output will be parsed by an automated system, not read by a human.

XML basics for non-developers

XML follows one core pattern: everything lives inside named tags that open and close. If you've ever looked at HTML, you already recognize the structure.

<client_report>
  <client_name>Acme Corp</client_name>
  <quarter>Q1 2026</quarter>
  <status>On track</status>
</client_report>
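
A parser treats this document as a tree rooted at the outermost tag, and every value is addressed by its tag name. Here is a minimal sketch of reading the report, assuming Python and its standard-library xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

report_xml = """<client_report>
  <client_name>Acme Corp</client_name>
  <quarter>Q1 2026</quarter>
  <status>On track</status>
</client_report>"""

# Parse the document; the outermost tag becomes the root element.
root = ET.fromstring(report_xml)

# Look each child up by tag name, so the order of the tags doesn't matter.
report = {tag: root.findtext(tag) for tag in ("client_name", "quarter", "status")}

print(root.tag)
print(report)
```

Lookups go by name, so the model can emit the fields in any order without breaking the consumer.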

Three rules cover almost everything you need:

Every opening tag needs a closing tag. <client_name> must be followed by </client_name>. Miss the closing tag and the entire document breaks.
Tags nest inside each other. A <client_report> tag can contain <client_name>, <quarter>, and <status> tags. The inner tags must close before the outer tag closes.
Tag names are semantic. You pick names that describe the content: <file_path> and <code_changes> say exactly what they hold, where a generic name says nothing. The model reads these names as instructions for what to put inside them.
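
Any XML parser enforces the first two rules mechanically. The third is a convention: the parser ignores it, but the model relies on it. The sketch below uses Python's standard-library parser to show what "breaks" means in practice. The is_well_formed helper is our own illustration, not a library function.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Rule 1: every opening tag needs a closing tag.
good = "<client_report><status>On track</status></client_report>"
missing_close = "<client_report><status>On track</client_report>"

# Rule 2: inner tags must close before the outer tag closes.
bad_nesting = "<client_report><status>On track</client_report></status>"

for label, text in [("good", good), ("missing close", missing_close), ("bad nesting", bad_nesting)]:
    print(f"{label}: {is_well_formed(text)}")
```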

That's the entire syntax you need. You don't need to learn XML schemas, DTDs, or namespaces. You need to understand the open-nest-close pattern well enough to write a template the model can fill in.

Attributes add metadata without adding nesting. A tag can carry extra information inside its opening bracket:

<file file_operation="UPDATE">
  <file_path>src/components/Dashboard.tsx</file_path>
  <file_code>// updated code here</file_code>
</file>

The file_operation="UPDATE" attribute tells your parser what action to take without requiring a separate nested tag. JSON has no attribute syntax, so you'd need another key-value pair at the same nesting level as the path and code.
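
Here is the same file entry read both ways. This is a Python sketch, assuming xml.etree.ElementTree for the XML and the standard json module for a hypothetical JSON version of the same record:

```python
import json
import xml.etree.ElementTree as ET

xml_text = """<file file_operation="UPDATE">
  <file_path>src/components/Dashboard.tsx</file_path>
  <file_code>// updated code here</file_code>
</file>"""

file_el = ET.fromstring(xml_text)
# The operation rides on the opening tag as an attribute.
operation = file_el.get("file_operation")
path = file_el.findtext("file_path")

# Hypothetical JSON equivalent: the operation becomes a sibling key of path and code.
json_text = json.dumps({
    "file_operation": "UPDATE",
    "file_path": "src/components/Dashboard.tsx",
    "file_code": "// updated code here",
})
record = json.loads(json_text)

print(operation, path)
print(record["file_operation"], record["file_path"])
```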

Anatomy of an XML prompt

Effective XML prompts follow a three-part structure: task description, output rules as bullet points, then the XML template. Here is a real prompt for tracking code changes across multiple files:

You are a senior software engineer. Review the following codebase
and implement the requested feature. Output all changed files
in the XML format specified below.

At the end of your response, include the following XML section
with your code changes:

- Always output the full file content, never use placeholders
  or partial code
- Enclose the entire XML block in a markdown code block
- Include ALL changed files in a single <code_changes> block
- Use file paths relative to the project root
- Wrap all code content in CDATA sections

<code_changes>
  <changed_files>
    <file>
      <file_operation>CREATE | UPDATE | DELETE</file_operation>
      <file_path>path/to/file.ext</file_path>
      <file_code><![CDATA[
        // Full file content goes here
      ]]></file_code>
    </file>
  </changed_files>
</code_changes>
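
On the receiving end, your automation has to find the <code_changes> block inside the model's reply and turn each <file> into something it can act on. The sketch below assumes Python's standard library; extract_code_changes, FileChange, and the sample response are hypothetical. It works in three steps:

- It slices on the opening and closing tags, so the markdown fence and any prose around the block are ignored.
- It parses the slice and reads each <file> entry.
- It rejects any file_operation other than CREATE, UPDATE, or DELETE.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

VALID_OPERATIONS = {"CREATE", "UPDATE", "DELETE"}


@dataclass
class FileChange:
    operation: str
    path: str
    code: str


def extract_code_changes(response: str) -> list[FileChange]:
    """Slice the <code_changes> block out of a model response and parse it."""
    opening, closing = "<code_changes>", "</code_changes>"
    start, end = response.find(opening), response.rfind(closing)
    if start == -1 or end == -1:
        raise ValueError("response has no complete <code_changes> block")
    # Slicing on the tags skips the surrounding prose and markdown fence.
    root = ET.fromstring(response[start : end + len(closing)])

    changes = []
    for file_el in root.iter("file"):
        operation = (file_el.findtext("file_operation") or "").strip()
        if operation not in VALID_OPERATIONS:
            raise ValueError(f"unknown file_operation: {operation!r}")
        changes.append(FileChange(
            operation=operation,
            path=(file_el.findtext("file_path") or "").strip(),
            # CDATA content arrives as plain text; left untouched so indentation survives.
            code=file_el.findtext("file_code") or "",
        ))
    return changes


# Hypothetical model response: prose, then the XML inside a markdown code block.
fence = "`" * 3
response = (
    "Here are the changes.\n\n"
    + fence + "xml\n"
    + "<code_changes>\n"
    + "  <changed_files>\n"
    + "    <file>\n"
    + "      <file_operation>UPDATE</file_operation>\n"
    + "      <file_path>src/utils/compare.ts</file_path>\n"
    + "      <file_code><![CDATA[export const isSmaller = (a: number, b: number) => a < b && a !== b;\n]]></file_code>\n"
    + "    </file>\n"
    + "  </changed_files>\n"
    + "</code_changes>\n"
    + fence + "\n"
)

for change in extract_code_changes(response):
    print(change.operation, change.path)
```

Writing the files to disk (and deleting on DELETE) is left out here. That step should run only after every entry has parsed and passed your checks.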

Each layer of this prompt does specific work.

Prompt Layer	Purpose	What Breaks Without It
Role assignment	Sets engineering precision context	Model writes explanatory prose instead of code
Bullet-point rules	Constrains output format	Partial code snippets, missing files, broken paths
XML template	Shows exact tag structure	Model invents its own tag names, inconsistent output
CDATA wrapper	Protects code from parsing conflicts	Angle brackets in code break the XML structure

Why CDATA matters. Code is full of characters that XML treats as special syntax. To a parser, an angle bracket < in a JavaScript comparison looks like the start of a tag, and an & looks like the start of an entity reference. CDATA sections tell the parser: "Everything between <![CDATA[ and ]]> is raw content. Don't parse it." Without CDATA, any code containing < or & makes your XML malformed and breaks automated parsing. A comparison like a < b && c is enough to do it.
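
The failure and the fix are easy to reproduce. This Python sketch uses the standard-library parser plus a hypothetical cdata helper. It feeds the same JavaScript line to the parser with and without a CDATA wrapper.

It also handles the one sequence a CDATA section cannot contain, ]]>. The helper splits it across two adjacent sections, which the parser reads back as one continuous string.

```python
import xml.etree.ElementTree as ET

code = "if (a < b && c) { run(); }"


def cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any ']]>' so it can't end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Without CDATA, the bare < and & make the document malformed.
try:
    ET.fromstring(f"<file_code>{code}</file_code>")
    unwrapped_ok = True
except ET.ParseError as err:
    unwrapped_ok = False
    print("unwrapped code fails:", err)

# Inside CDATA the same characters come back as plain text.
wrapped = ET.fromstring(f"<file_code>{cdata(code)}</file_code>")
print(wrapped.text)

# Code that itself contains ']]>' survives once the helper splits it.
tricky = "x = arr[arr[0]]>0"
print(ET.fromstring(f"<file_code>{cdata(tricky)}</file_code>").text)
```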


The bullet-point rules are where most prompts fall apart. "Always output the full file content" stops the model from writing // ... rest of file unchanged. "Use relative file paths" ensures your automation maps files to the correct locations. These aren't suggestions. They're constraints.
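
Both rules can also be checked mechanically before anything is written to disk. The sketch below is minimal and assumes POSIX-style paths. check_file and the placeholder patterns are hypothetical, and the patterns will need tuning to the placeholders your models actually produce.

```python
import re
from pathlib import PurePosixPath

# Hypothetical placeholder patterns; tune them to what your models actually emit.
PLACEHOLDER_PATTERNS = [
    re.compile(r"(//|#)\s*\.\.\."),  # "// ..." or "# ..."
    re.compile(r"rest of (the )?file", re.IGNORECASE),
]


def check_file(path: str, code: str) -> list[str]:
    """Return the rule violations for one <file> entry (an empty list means it passes)."""
    problems = []
    p = PurePosixPath(path)
    if p.is_absolute():
        problems.append(f"{path}: path must be relative to the project root")
    if ".." in p.parts:
        problems.append(f"{path}: path climbs out of the project root")
    if any(pattern.search(code) for pattern in PLACEHOLDER_PATTERNS):
        problems.append(f"{path}: code looks partial (placeholder comment found)")
    return problems


print(check_file("src/app.ts", "export const answer = 42;\n"))
print(check_file("/home/me/app.ts", "// ... rest of file unchanged"))
```
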

When XML beats JSON

This isn't about preference. It's about what your output contains.

Deploy XML when:

Code generation with file tracking. Any task where the model produces code that your automation needs to extract and write to specific file paths. The CDATA wrapper alone makes XML the right choice. JSON has no raw-text equivalent: code must sit inside a string with every quote, backslash, and newline escaped, which is error-prone for a model across a long file.
Deeply nested hierarchies. When your output has four or more levels of nesting, XML's named closing tags keep the structure visible, because a tag like </changed_files> tells you which container just ended. JSON at that depth is a run of identical brackets, where a single missing comma or brace is hard to locate.
Mixed content and metadata. XML attributes let you attach operation types or status flags directly to a tag without adding nesting depth. A tag like <file file_operation="UPDATE"> carries that metadata inline; in JSON, each attribute becomes another sibling key.
Agent orchestration. When one AI agent's output becomes another agent's input, XML's named tags create natural parsing boundaries. Each agent extracts exactly the section it needs by tag name.
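
Here is a sketch of that hand-off in Python. It assumes the planning agent wraps its output in a single root element. The <agent_output>, <analysis>, <task_plan>, and <task> tag names and the owner attribute are illustrative, not a standard. Each downstream consumer pulls only the section it needs by name.

```python
import xml.etree.ElementTree as ET

# Hypothetical output from a planning agent; the tag names are illustrative.
planner_output = """<agent_output>
  <analysis>The dashboard re-renders on every keystroke.</analysis>
  <task_plan>
    <task id="1" owner="coder">Memoize the chart component</task>
    <task id="2" owner="reviewer">Compare render counts before and after</task>
  </task_plan>
</agent_output>"""


def section(xml_text: str, tag: str) -> ET.Element:
    """Return the first element named `tag` anywhere in the document."""
    found = ET.fromstring(xml_text).find(f".//{tag}")
    if found is None:
        raise KeyError(f"missing <{tag}> section")
    return found


def tasks_for(xml_text: str, owner: str) -> list[str]:
    """What a downstream agent would read: only the tasks assigned to it."""
    plan = section(xml_text, "task_plan")
    return [task.text for task in plan.findall("task") if task.get("owner") == owner]


print(section(planner_output, "analysis").text)
print(tasks_for(planner_output, "coder"))
```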

Stick with JSON when:

Your output is flat or lightly nested (two levels or fewer)
You're extracting structured data records (contacts, inventory, transactions)
The downstream consumer is a REST API, webhook, or automation platform like Zapier
You need the smallest possible token count in the response

Factor	XML	JSON
Code content in output	CDATA prevents conflicts	Code must be escaped inside strings
Nesting depth 4+	Explicit open/close tags	Bracket matching becomes fragile
Attributes on elements	Built into syntax	Requires extra keys
Flat data extraction	Verbose for simple records	Lighter, fewer tokens
API/webhook integration	Requires conversion step	Native format for most APIs
Token efficiency	More verbose	20-40% fewer tokens

If your output contains code or deeply nested document structures, use XML. If your output contains data records, use JSON.

Model requirements for XML

Not every model can reliably produce well-formed XML, especially with nested CDATA sections and consistent tag names across long outputs.

Reasoning-class models handle XML best. Claude 3.5 Sonnet and above, GPT-4 class models, and reasoning models (o1, o3) produce the most reliable XML output. They maintain tag consistency across long responses, correctly nest CDATA sections, and follow templates without drifting.

Mid-tier models work for simple XML. Claude 3.5 Haiku, GPT-4o mini, and similar models produce valid XML for straightforward templates with two or three nesting levels. They struggle when the template requires CDATA inside deeply nested structures or when output spans hundreds of lines.

Small or local models are unreliable for XML. Models under 13B parameters frequently produce malformed XML: unclosed tags, broken CDATA sections, invented tag names. If you're running local models, stick to JSON.

Model Tier	XML Reliability	Best For
Reasoning-class (Claude 3.5 Sonnet+, GPT-4, o1/o3)	High	Complex templates, code generation, multi-file output
Mid-tier (Claude Haiku, GPT-4o mini)	Moderate	Simple templates, 2-3 nesting levels, short output
Small/local (under 13B parameters)	Low	Not recommended for XML tasks

Practical guidance for client engagements. When deploying XML prompts in production workflows, use the most capable model available for the generation step. The cost difference between tiers is negligible compared to debugging malformed output in an automated pipeline. If budget is a constraint, run a cheaper model for analysis and planning, then route final XML generation to a reasoning-class model.
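
That split can be wired as a pipeline step with a validation gate, so malformed XML never reaches the file-writing stage. The sketch assumes each model is wrapped as a plain Python function from prompt text to response text. The Model type, parse_block, and plan_then_generate are hypothetical, and no provider SDK is assumed. The generation step is retried once if the XML is missing, truncated, or malformed.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical model interface: a function that takes a prompt and returns response text.
Model = Callable[[str], str]


def parse_block(response: str, tag: str = "code_changes") -> ET.Element:
    """Slice out <tag>...</tag> and parse it; raise ValueError if missing or malformed."""
    opening, closing = f"<{tag}>", f"</{tag}>"
    start, end = response.find(opening), response.rfind(closing)
    if start == -1 or end == -1:
        raise ValueError(f"no complete {opening} block (truncated output?)")
    try:
        return ET.fromstring(response[start : end + len(closing)])
    except ET.ParseError as err:
        raise ValueError(f"malformed XML: {err}") from err


def plan_then_generate(task: str, planner: Model, generator: Model, retries: int = 1) -> ET.Element:
    """The cheaper model plans; the stronger model writes the XML, which is validated before use."""
    plan = planner(f"Plan the code changes for this task:\n{task}")
    last_error = None
    for _ in range(retries + 1):
        try:
            return parse_block(generator(f"Implement this plan as <code_changes> XML:\n{plan}"))
        except ValueError as err:
            last_error = err
    raise RuntimeError(f"XML failed validation after {retries + 1} attempts: {last_error}")


# Stub models for a dry run: the first generation is cut off, the retry is complete.
replies = iter([
    "<code_changes><changed_files>",
    "<code_changes><changed_files></changed_files></code_changes>",
])
root = plan_then_generate(
    "Add a dark-mode toggle",
    planner=lambda prompt: "1. Add toggle state\n2. Update Dashboard.tsx",
    generator=lambda prompt: next(replies),
)
print(root.tag)
```

The stub models stand in for real API calls. In production, the planner and generator would call your cheaper and stronger models respectively.
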

XML also uses roughly 20-40% more tokens than JSON for the same content. A model with a 200K-token context window has room for that overhead, but truncation also depends on the output limit. A response capped at 8K tokens may stop mid-tag, producing output your parser cannot read.

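
When a response does stop mid-tag, the files that closed before the cut may still be usable. This guide doesn't otherwise cover it, but one way to recover them is incremental parsing with the standard library's ET.XMLPullParser. It reports each element as it closes, so you can keep the finished <file> entries and re-request the rest. salvage_files and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def salvage_files(response: str) -> tuple[list[str], bool]:
    """Return (paths of <file> entries that closed, whether the whole block closed)."""
    opening, closing = "<code_changes>", "</code_changes>"
    start = response.find(opening)
    if start == -1:
        return [], False
    end = response.rfind(closing)
    # Feed up to the closing tag if there is one, else everything after the opening tag.
    chunk = response[start:] if end == -1 else response[start : end + len(closing)]

    parser = ET.XMLPullParser(events=("end",))
    parser.feed(chunk)
    completed = []
    try:
        for _event, elem in parser.read_events():
            if elem.tag == "file":
                completed.append(elem.findtext("file_path"))
        parser.close()  # raises ParseError if the document never finished
        return completed, True
    except ET.ParseError:
        return completed, False


complete_file = (
    "<file><file_operation>CREATE</file_operation><file_path>src/a.ts</file_path>"
    "<file_code><![CDATA[export const a = 1;]]></file_code></file>"
)
truncated = (
    "<code_changes><changed_files>" + complete_file
    + "<file><file_operation>UPDATE</file_operation><file_path>src/b.ts</file_path>"
    + "<file_code><![CDATA[if (a <"
)
print(salvage_files(truncated))
print(salvage_files("<code_changes><changed_files>" + complete_file + "</changed_files></code_changes>"))
```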
Part 2 of this series covers XML output templates, validation patterns, and how to build a reusable prompt library so your team stops rebuilding the same templates for every new client engagement.




        
          