Optimizing Claude Code: Karpathy's 10x Agentic Workflow Guide
10x your Claude Code reliability with Andrej Karpathy's structured prompting. This guide details setup, agentic workflows, tool use, and self-correction for advanced AI development.


📋 At a Glance
- Difficulty: Advanced
- Time required: 45-90 minutes (for initial setup and understanding core concepts)
- Prerequisites: Python 3.9+, an active Anthropic API key, fundamental understanding of LLMs and API interaction, basic Python development experience.
- Works on: OS-agnostic (Python library compatible with Windows, macOS, Linux).
How Do Karpathy's Principles Enhance Claude Agentic Workflows?
Andrej Karpathy's principles fundamentally transform how developers interact with LLMs, moving beyond simple conversational prompts to establish a robust, almost programmatic interface for agentic systems. By imposing strict structure and explicit instructions, these methods enable Claude to operate with greater determinism, reduced hallucination, and improved self-correction, leading to a significant increase in the reliability and efficiency of AI-driven code generation and task execution. This shift is critical for building production-grade AI agents that can consistently perform complex, multi-step operations without constant human oversight.
Karpathy's approach centers on making the LLM's internal "thought process" explicit and observable, effectively turning the model into a transparent, debuggable state machine. This is achieved by segmenting the prompt into distinct, machine-readable components using XML-like tags. Instead of merely asking the model to perform a task, you instruct it on how to think, what tools to use, and how to present its final output. This structured dialogue minimizes ambiguity, forces the model to reason step-by-step, and provides clear checkpoints for validation and self-correction, which are paramount for agentic reliability.
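As a loose illustration of this state-machine framing (the state names and transition rules below are our own shorthand for the tagged protocol, not part of any Anthropic API), the agent loop can be sketched as a small finite-state machine:

```python
from enum import Enum, auto

class AgentState(Enum):
    THINK = auto()    # analyze the request, form a plan (<thought>)
    ACT = auto()      # emit code for execution (<tool_code>)
    OBSERVE = auto()  # receive tool output (<tool_code_output>)
    REFLECT = auto()  # reflect on results, refine the plan (<scratchpad>)
    DONE = auto()     # final answer produced (<final_answer>)

def next_state(state: AgentState, needs_tool: bool, has_answer: bool) -> AgentState:
    """Deterministic transition function mirroring the tagged protocol."""
    if has_answer:
        return AgentState.DONE
    if state is AgentState.THINK:
        return AgentState.ACT if needs_tool else AgentState.DONE
    if state is AgentState.ACT:
        return AgentState.OBSERVE
    if state is AgentState.OBSERVE:
        return AgentState.REFLECT
    if state is AgentState.REFLECT:
        return AgentState.ACT if needs_tool else AgentState.DONE
    return state
```

In practice the model itself chooses the transitions; the value of the framing is that each transition is observable in the tagged output, so your application can validate it at every step.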
How Do I Structure Prompts for Optimal Claude Code Performance?
Structuring prompts with specific XML-like tags is the cornerstone of Karpathy's method, enforcing a clear, parseable communication protocol between your application and Claude. This explicit segmentation of instructions, reasoning, tool definitions, and desired output dramatically improves the model's ability to follow complex directives, reduces the likelihood of generating irrelevant information, and makes its internal decision-making process transparent and debuggable. By standardizing the input and output formats, you enable programmatic parsing of Claude's responses, facilitating automated tool execution and iterative self-correction loops.
The core idea is to break down the prompt into distinct functional blocks, each delimited by a unique XML tag. This is superior to generic markdown or free-form text because it creates unambiguous boundaries that Claude is highly trained to respect, especially newer models like Claude 3.5 Sonnet or Opus. This strict structure allows your application to reliably extract specific pieces of information (e.g., the code to execute, the agent's internal thoughts, the final answer) without relying on fuzzy pattern matching.
Step 1: Define the System Prompt for Agentic Behavior
What: Create a system role message that establishes the agent's persona, overall goal, and the strict communication protocol it must follow, including the specific XML tags for its internal reasoning and output.
Why: The system prompt is the guiding contract for the agent. It dictates its fundamental behavior, constraints, and the expected format of its responses. Explicitly defining the XML tags here ensures Claude understands and adheres to the structured output requirements from the outset.
How: Construct a system prompt that includes a clear persona, task instructions, and the mandatory output structure. For agentic tasks, Karpathy often advocates for tags like <tool_code>, <scratchpad>, <thought>, <tool_code_output>, and <final_answer>.
# Python
system_prompt_template = """
You are a highly capable AI assistant specializing in Python code generation and execution.
Your primary goal is to solve complex programming tasks by thinking step-by-step,
using provided tools, and self-correcting based on tool outputs.
You operate within a strict XML-like communication protocol.
Always enclose your internal reasoning in `<thought>` tags.
When you decide to execute code, wrap it in `<tool_code>` tags.
The output of any executed tool code will be provided to you within `<tool_code_output>` tags.
If you need to reflect on tool output or plan your next step, use `<scratchpad>` tags.
Once you have arrived at the final answer or solution, present it within `<final_answer>` tags.
Your process should always be:
1. `<thought>`: Analyze the request and formulate a plan.
2. `<tool_code>`: Write and execute Python code if necessary for the plan.
3. `<tool_code_output>`: Observe the output from the tool execution.
4. `<scratchpad>`: Reflect on the tool output, refine the plan, or identify errors.
5. Repeat steps 2-4 if further tool use or refinement is needed.
6. `<final_answer>`: Provide the complete solution.
Do not deviate from this XML structure. Ensure all tags are properly closed.
"""
Verify: No direct verification for the prompt itself, but its effectiveness will be seen in subsequent steps when Claude generates structured responses.
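If you want a cheap automated check anyway, one option (a helper of our own devising, not part of any SDK) is to lint the system prompt for the tags the protocol depends on before ever calling the API:

```python
REQUIRED_TAGS = ["thought", "tool_code", "tool_code_output", "scratchpad", "final_answer"]

def missing_protocol_tags(system_prompt: str, required=REQUIRED_TAGS) -> list:
    """Return the protocol tags the system prompt never mentions."""
    return [tag for tag in required if f"<{tag}>" not in system_prompt]

# Example: a prompt that forgets to define <scratchpad>
incomplete = "Use <thought>, <tool_code>, <tool_code_output>, and <final_answer> tags."
print(missing_protocol_tags(incomplete))  # ['scratchpad']
```

Running this against your real system prompt should return an empty list; anything else means the agent was never told about a tag your parsing code expects.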
Step 2: Craft User Prompts with Task Specifications
What: Provide the specific task or problem description within the user role message, ensuring it aligns with the agent's defined capabilities and the expected structured workflow.
Why: The user prompt gives the agent its immediate objective. While the system prompt defines how the agent works, the user prompt defines what it should work on. Keeping it concise and focused on the task prevents ambiguity.
How: Formulate your task clearly, detailing inputs, constraints, and desired outcomes.
# Python
user_task = "Calculate the 10th Fibonacci number using Python code, then print the result."
Verify: The clarity of the task will be reflected in Claude's initial <thought> and subsequent actions.
Step 3: Implement the Claude API Call with Structured Messaging
What: Send both the system and user prompts to the Anthropic API, instructing Claude to generate a response following the defined structure.
Why: This is the actual interaction with the LLM. Anthropic's Messages API takes the system prompt as a top-level system parameter and the conversation turns (user/assistant roles) in the messages array; following this format consistently is crucial for reliable agent behavior.
How: Utilize the anthropic Python client to send the messages. Specify the model (e.g., claude-3-5-sonnet-20240620, or whichever capable Claude model is current when you read this) and a high max_tokens to allow for full reasoning and code generation.
# Python
import os
from anthropic import Anthropic
# Ensure your Anthropic API key is set as an environment variable
# ANTHROPIC_API_KEY="sk-..."
# > ⚠️ Warning: Model version strings change over time. Replace 'claude-3-5-sonnet-20240620'
# > with the most current and capable Claude model available to you.
MODEL_NAME = "claude-3-5-sonnet-20240620"  # Update to the latest capable model
def call_claude_agent(system_prompt: str, user_prompt: str, client: Anthropic) -> str:
    # The Anthropic Messages API takes the system prompt as a top-level
    # `system` parameter; the messages array holds only user/assistant turns.
    response = client.messages.create(
        model=MODEL_NAME,
        max_tokens=2000,  # Sufficient tokens for complex reasoning and code
        system=system_prompt,
        messages=[
            {"role": "user", "content": user_prompt}
        ]
    )
    return response.content[0].text
# Example client initialization (assuming ANTHROPIC_API_KEY is set)
client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# Initial call
raw_response = call_claude_agent(system_prompt_template, user_task, client)
print(raw_response)
Verify: Examine the raw_response. You should see Claude's output strictly adhering to the XML tags defined in the system prompt.
> ✅ Expected Output (truncated example):
<thought>
I need to calculate the 10th Fibonacci number. I will write a Python function to do this and then execute it.
</thought>
<tool_code>
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
print(fibonacci(10))
</tool_code>
How Do I Implement Tool Use and Self-Correction with Claude?
Integrating tool use and self-correction mechanisms is vital for building truly autonomous and robust AI agents that can overcome their own limitations and interact with the real world. Tool use allows Claude to execute external code, query databases, access APIs, or perform calculations beyond its inherent capabilities. Self-correction, guided by the structured prompt, enables the agent to analyze tool outputs, identify errors, and iteratively refine its approach without human intervention, significantly enhancing reliability and reducing the need for manual debugging.
The ability for Claude to generate code, execute it, and then interpret the results is a powerful agentic loop. Karpathy's method formalizes this loop through explicit prompt structuring. When Claude generates <tool_code>, your application executes it. The output of this execution is then fed back to Claude within <tool_code_output> tags, prompting the model to reflect in <scratchpad> or <thought> tags and decide on its next action. This feedback loop is the essence of self-correction.
Step 1: Parse Claude's Structured Output
What: Extract specific content (e.g., <tool_code>, <final_answer>) from Claude's XML-formatted response using a robust parsing method.
Why: To programmatically act on Claude's instructions, you need to reliably extract the code it wants to run or the final answer it has produced. Regular expressions or an XML parser are suitable for this.
How: Use Python's re module to extract content between specific tags.
# Python
import re
from typing import Optional

def extract_tag_content(response_text: str, tag_name: str) -> Optional[str]:
    # Use a non-greedy match for the content within the tag
    # (Optional[str] keeps the annotation compatible with Python 3.9)
    match = re.search(rf"<{tag_name}>(.*?)</{tag_name}>", response_text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return None
# Assuming raw_response contains the XML-like output from Claude
thought = extract_tag_content(raw_response, "thought")
tool_code = extract_tag_content(raw_response, "tool_code")
final_answer = extract_tag_content(raw_response, "final_answer")
if thought:
print(f"Agent's Thought:\n{thought}\n")
if tool_code:
print(f"Code to execute:\n{tool_code}\n")
if final_answer:
print(f"Final Answer:\n{final_answer}\n")
Verify: Print the extracted components to confirm they match the content within the respective tags in Claude's raw output.
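Note that re.search only returns the first occurrence. If Claude emits several <tool_code> blocks in a single turn, a findall-based variant (a small extension of the same idea, using the same regex pattern) captures them all in order:

```python
import re

def extract_all_tag_contents(response_text: str, tag_name: str) -> list:
    """Return every block of content wrapped in the given tag, in document order."""
    return [m.strip() for m in re.findall(
        rf"<{tag_name}>(.*?)</{tag_name}>", response_text, re.DOTALL)]

sample = (
    "<tool_code>print(1)</tool_code>\n"
    "<thought>next step</thought>\n"
    "<tool_code>print(2)</tool_code>"
)
print(extract_all_tag_contents(sample, "tool_code"))  # ['print(1)', 'print(2)']
```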
Step 2: Implement a Code Execution Environment (Tool)
What: Create a safe and isolated environment to execute the Python code generated by Claude. This acts as the "tool" Claude uses.
Why: Directly executing arbitrary code from an LLM in your main application is a significant security risk. A sandboxed environment prevents malicious or erroneous code from affecting your system.
How: A simple approach for demonstration is to use exec() within a controlled scope, but for production, consider dedicated sandboxing libraries (e.g., restrictedpython, docker containers, or serverless functions). For this guide, we'll use a basic exec with output capture.
# Python
import io
import sys
def execute_python_code(code_string: str) -> str:
    old_stdout = sys.stdout
    redirected_output = io.StringIO()
    sys.stdout = redirected_output
    try:
        # Define a dictionary for the execution environment
        # This isolates the executed code from the rest of your program's scope
        exec_globals = {}
        exec(code_string, exec_globals)
        output = redirected_output.getvalue()
    except Exception as e:
        output = f"Execution Error: {e}"
    finally:
        sys.stdout = old_stdout  # Restore stdout
    return output

# Example usage (assuming tool_code was extracted)
if tool_code:
    tool_output = execute_python_code(tool_code)
    print(f"Tool Code Output:\n{tool_output}\n")
else:
    tool_output = "No tool code provided by agent."
Verify: Run the execute_python_code function with a simple print("Hello") and confirm "Hello" is captured in tool_output. Then, run it with the Fibonacci code generated by Claude and verify the correct number is printed.
> ✅ Expected Output for Fibonacci code:
Tool Code Output:
55
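For slightly stronger isolation than in-process exec(), a common middle ground is to run the generated code in a separate Python interpreter via subprocess with a timeout. This sketch is our own suggestion and still not a true sandbox (a container or restricted runtime remains preferable for production), but it at least contains crashes and infinite loops:

```python
import subprocess
import sys

def execute_python_code_subprocess(code_string: str, timeout_s: int = 10) -> str:
    """Run code in a fresh interpreter; capture stdout/stderr; enforce a timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code_string],
            capture_output=True, text=True, timeout=timeout_s,
        )
        if result.returncode != 0:
            return f"Execution Error: {result.stderr.strip()}"
        return result.stdout
    except subprocess.TimeoutExpired:
        return f"Execution Error: timed out after {timeout_s}s"

print(execute_python_code_subprocess("print(40 + 2)"))  # 42
```

The timeout matters more than it may look: agent-generated code occasionally contains accidental infinite loops, and without it the whole workflow hangs.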
Step 3: Implement the Self-Correction Loop
What: Feed the tool_code_output back to Claude as part of the conversation history, allowing it to analyze the results and decide on its next action, potentially generating more code or a final answer.
Why: This feedback loop is where self-correction happens. Claude receives concrete evidence of its code's performance and can adjust its plan if errors occurred or if further steps are needed. This iterative process is crucial for handling complex tasks reliably.
How: Append the tool output to the messages array, wrapped in the <tool_code_output> tag, and call the API again. Continue this loop until Claude provides a <final_answer>.
# Python
def run_agentic_workflow(system_prompt: str, initial_user_prompt: str, client: Anthropic, max_iterations: int = 5) -> str:
    # The system prompt goes in the top-level `system` parameter;
    # `messages` must contain only alternating user/assistant turns.
    messages = [
        {"role": "user", "content": initial_user_prompt}
    ]
    for i in range(max_iterations):
        print(f"\n--- Agent Turn {i+1} ---")
        try:
            response = client.messages.create(
                model=MODEL_NAME,
                max_tokens=2000,
                system=system_prompt,
                messages=messages
            )
            claude_response_text = response.content[0].text
            print(f"Claude's Response:\n{claude_response_text}")
            thought = extract_tag_content(claude_response_text, "thought")
            tool_code = extract_tag_content(claude_response_text, "tool_code")
            scratchpad = extract_tag_content(claude_response_text, "scratchpad")
            final_answer = extract_tag_content(claude_response_text, "final_answer")
            if final_answer:
                print("\n--- FINAL ANSWER ---")
                return final_answer
            elif tool_code:
                print("Executing tool code...")
                tool_output = execute_python_code(tool_code)
                print(f"Tool Output:\n{tool_output}")
                # Append tool output to messages for Claude's next turn
                messages.append({"role": "assistant", "content": claude_response_text})
                messages.append({"role": "user", "content": f"<tool_code_output>{tool_output}</tool_code_output>"})
            elif scratchpad:
                print("Agent is reflecting in scratchpad...")
                # A short user nudge keeps roles alternating, as the API requires
                messages.append({"role": "assistant", "content": claude_response_text})
                messages.append({"role": "user", "content": "Continue based on your scratchpad."})
            else:
                print("Agent did not provide tool code or final answer. Continuing...")
                messages.append({"role": "assistant", "content": claude_response_text})
                messages.append({"role": "user", "content": "Please continue, following the XML protocol."})
        except Exception as e:
            print(f"API Call or Processing Error: {e}")
            messages.append({"role": "assistant", "content": f"<error>An error occurred: {e}</error>"})
            messages.append({"role": "user", "content": "An error occurred during processing. Please review and try again."})
    return "Agent failed to provide a final answer within the maximum iterations."
# Run the full workflow
final_result = run_agentic_workflow(system_prompt_template, user_task, client)
print(f"\nFinal Result from Agent: {final_result}")
Verify: The run_agentic_workflow function should execute, showing multiple turns of Claude generating code, your system executing it, and Claude processing the output, eventually leading to a <final_answer> with the correct Fibonacci number.
> ✅ Expected Final Output:
Final Result from Agent: The 10th Fibonacci number is 55.
What Are the Best Practices for Managing State and Context in Claude Agents?
Effectively managing state and context is paramount for building sophisticated Claude agents that can handle multi-step tasks, maintain coherence over extended interactions, and avoid context window limitations. As agents perform complex operations, they accumulate information from previous turns, tool outputs, and external data sources. Without proper context management, the agent can "forget" crucial details, exceed the model's token limit, or produce inconsistent results. This involves strategies for summarizing past interactions, retrieving relevant information from memory, and strategically injecting context into subsequent prompts.
The Karpathy method, by making the agent's thought process explicit, naturally aids context management. The <scratchpad> and <thought> tags serve as internal memory for the agent, allowing it to summarize and reflect on its current state. However, for truly long-running or data-intensive tasks, external memory systems become necessary.
1. Summarizing Conversation History
What: Periodically summarize older parts of the conversation history to condense information and reduce token usage.
Why: Direct concatenation of all prior messages quickly exhausts the LLM's context window. Summarization distills the essence of past interactions, preserving critical information while discarding verbose details.
How: Use Claude itself to summarize previous turns. When the message history approaches a certain token threshold (e.g., 75% of the model's context window), send the older messages to Claude with a specific system prompt asking it to generate a concise summary. Replace the original messages with this summary.
# Python
def summarize_history(client: Anthropic, conversation_history: list) -> str:
    summary_prompt = """
You are an AI assistant tasked with summarizing conversation history for another AI agent.
Review the provided conversation and extract only the critical information, decisions, and outcomes.
Focus on the overall goal, key steps taken, and any remaining open questions or problems.
Present the summary concisely, preferably in bullet points or a short paragraph.
Do not add new information or conversational filler.
"""
    history_content = "\n".join([f"{msg['role']}: {msg['content']}" for msg in conversation_history])
    response = client.messages.create(
        model=MODEL_NAME,  # Use a capable model for summarization
        max_tokens=500,
        system=summary_prompt,
        messages=[
            {"role": "user", "content": f"<history>{history_content}</history>\n\nProvide a concise summary of the critical points."}
        ]
    )
    return response.content[0].text
# Example of integrating summarization into the workflow loop
# (This is a conceptual addition, not directly runnable without a full token counter)
# if calculate_tokens(messages) > TOKEN_THRESHOLD:
#     old_messages = messages[:-N]  # Keep recent N messages, summarize older ones
#     summary = summarize_history(client, old_messages)
#     messages = [{"role": "user", "content": f"<summary_of_past_conversation>{summary}</summary_of_past_conversation>"},
#                 *messages[-N:]]  # Prepend summary, keep recent messages
#     # (the system prompt is passed separately via the `system` parameter)
Verify: Manually test the summarize_history function with a multi-turn conversation. The output summary should accurately capture the key points without unnecessary detail.
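The token-threshold check stubbed out above needs some kind of counter. Absent an exact tokenizer, a common rough rule of thumb is about 4 characters per token for English text; treat the ratio, the context-window size, and the 75% threshold below as assumptions to tune for your model:

```python
def estimate_tokens(messages: list) -> int:
    """Crude token estimate: ~4 characters per token for English text."""
    total_chars = sum(len(m["content"]) for m in messages)
    return total_chars // 4

def should_summarize(messages: list, context_window: int = 200_000,
                     fraction: float = 0.75) -> bool:
    """True once the estimated token count passes the chosen fraction of the window."""
    return estimate_tokens(messages) > context_window * fraction

msgs = [{"role": "user", "content": "x" * 1000}]
print(estimate_tokens(msgs))  # 250
```

For production use, an exact count from the provider's token-counting API is preferable; the heuristic is just enough to keep the summarization trigger self-contained.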
2. External Memory and Retrieval-Augmented Generation (RAG)
What: Store relevant information (e.g., documentation, past code snippets, database schemas) in an external vector database and retrieve it dynamically based on the agent's current task.
Why: LLMs have limited context windows and cannot store all necessary domain-specific knowledge. RAG allows agents to access a vast external knowledge base, ensuring they have the most up-to-date and relevant information without explicitly embedding it in every prompt.
How:
- Embed Documents: Convert your knowledge base documents into vector embeddings using an embedding model (e.g., text-embedding-3-small, cohere-embed-v3).
- Store in Vector DB: Store these embeddings in a vector database (e.g., Pinecone, Weaviate, ChromaDB, Qdrant).
- Query: When the agent needs information, take its current query or thought, embed it, and query the vector DB for semantically similar documents.
- Inject Context: Include the retrieved relevant document chunks in the agent's prompt, typically within a <context> or <knowledge> tag.
# Python (Conceptual example, requires vector DB setup)
# from qdrant_client import QdrantClient
# from qdrant_client.http.models import PointStruct, VectorParams, Distance
# from anthropic import Anthropic
# Assume `embedding_model` is an embedding client (note: Anthropic does not provide
# an embeddings endpoint; a third-party provider such as Voyage AI is typically used)
# Assume `qdrant_client` is initialized and `collection_name` exists
def retrieve_context(query: str, qdrant_client, embedding_model, collection_name: str, top_k: int = 3) -> list:
    query_embedding = embedding_model.embed_query(query=query).embedding
    search_result = qdrant_client.search(
        collection_name=collection_name,
        query_vector=query_embedding,
        limit=top_k
    )
    return [hit.payload["text"] for hit in search_result if hit.payload and "text" in hit.payload]

# Modify the agent's prompt to include retrieved context
# (within the run_agentic_workflow loop or before the initial call)
# if agent_needs_external_info:
#     context_docs = retrieve_context(thought, qdrant_client, embedding_model, "my_knowledge_base")
#     context_str = "\n".join(context_docs)
#     messages.append({"role": "user", "content": f"<context>{context_str}</context>\n\nContinue with your task."})
Verify: Test your RAG pipeline independently. Query it with questions relevant to your knowledge base and confirm it returns accurate, concise document chunks. When integrated, observe if the agent uses the provided context in its reasoning.
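Before wiring up a full vector database, the retrieval step can be prototyped entirely in memory. This sketch (our own; the toy two-dimensional embeddings stand in for a real embedding model) shows the core ranking logic: score candidate documents by cosine similarity and return the top-k texts:

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, docs, top_k=3):
    """docs: list of (embedding, text) pairs. Returns top_k texts by similarity."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]

docs = [
    ([1.0, 0.0], "Fibonacci implementation notes"),
    ([0.0, 1.0], "Database schema overview"),
    ([0.9, 0.1], "Memoized recursion examples"),
]
print(retrieve_top_k([1.0, 0.0], docs, top_k=2))
# ['Fibonacci implementation notes', 'Memoized recursion examples']
```

Swapping in real embeddings and a vector database changes the storage and the embedding call, but not this ranking logic, which is useful to have verified in isolation.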
How Do I Set Up My Development Environment for Claude Code?
Setting up a robust and isolated development environment is the foundational step for working with Claude Code and implementing agentic workflows. A dedicated environment ensures that project dependencies are managed cleanly, avoiding conflicts with other Python projects on your system. It also provides a consistent base for developing, testing, and deploying your Claude-powered applications. Proper API key management is critical for security and authentication with Anthropic's services.
This setup focuses on Python, the primary language for interacting with Anthropic's official client library. It covers virtual environment creation, package installation, and secure API key configuration, which are standard best practices for any professional Python development.
Step 1: Install Python and Create a Virtual Environment
What: Install Python 3.9 or higher and create a virtual environment for your project.
Why: Python is the language of choice for the Anthropic client library. A virtual environment (venv) isolates your project's dependencies, preventing conflicts and ensuring reproducibility.
How:
- Install Python: Download and install Python from python.org if you don't have it. Ensure it's added to your system's PATH.
- Create Project Directory: Navigate to your desired project location in your terminal.
- Create Virtual Environment:
- macOS/Linux:
  # Bash / Zsh
  mkdir claude_agent_project
  cd claude_agent_project
  python3.9 -m venv .venv  # Use python3.10, python3.11, etc., if preferred
- Windows (Command Prompt):
  mkdir claude_agent_project
  cd claude_agent_project
  py -3.9 -m venv .venv
- Windows (PowerShell):
  mkdir claude_agent_project
  cd claude_agent_project
  py -3.9 -m venv .venv
Verify: After creation, you should see a .venv directory within your claude_agent_project folder.
Step 2: Activate the Virtual Environment
What: Activate the newly created virtual environment.
Why: Activating the virtual environment ensures that any Python packages you install or scripts you run will use the dependencies within this environment, not your global Python installation.
How:
- macOS/Linux:
  # Bash / Zsh
  source .venv/bin/activate
- Windows (Command Prompt):
  .venv\Scripts\activate.bat
- Windows (PowerShell):
  .venv\Scripts\Activate.ps1
Verify: Your terminal prompt should change to include (.venv) or a similar indicator, signifying that the virtual environment is active.
> ✅ Expected Output:
(.venv) user@host:~/claude_agent_project$
Step 3: Install the Anthropic Python Client
What: Install the official Anthropic Python client library within your active virtual environment.
Why: This library provides the necessary functions and classes to interact with the Claude API, including sending messages, handling responses, and managing authentication.
How: Use pip to install the package. Specify a version to ensure consistency, or use ~= for compatible upgrades.
# Bash / Zsh / PowerShell / Cmd (after activating venv)
pip install anthropic~=0.25.0  # Pin a known-good version; check PyPI for the latest stable release
Verify: Run a simple Python command to check the installed version.
# Bash / Zsh / PowerShell / Cmd (after activating venv)
python -c "import anthropic; print(anthropic.__version__)"
> ✅ Expected Output:
0.25.0 # Or whatever version you installed
Step 4: Configure Your Anthropic API Key
What: Securely set your Anthropic API key as an environment variable.
Why: Your API key authenticates your requests to Anthropic's services. Storing it directly in code is insecure and exposes it if your code is shared. Environment variables are a standard and secure way to manage sensitive credentials.
How:
- Obtain API Key: Get your API key from the Anthropic console.
- Set Environment Variable:
  - macOS/Linux (for current session):
    export ANTHROPIC_API_KEY="sk-ant-api03-..."  # Replace with your actual key
  - Windows (Command Prompt, for current session):
    set ANTHROPIC_API_KEY=sk-ant-api03-...
  - Windows (PowerShell, for current session):
    $env:ANTHROPIC_API_KEY="sk-ant-api03-..."
- Persistent (recommended for development): Add the export or set command to your shell's profile file (e.g., ~/.bashrc, ~/.zshrc, ~/.profile for Linux/macOS, or system environment variables for Windows). Remember to source your profile file after editing.
- Using a .env file: For local development, python-dotenv is a common choice.
  pip install python-dotenv
  Create a .env file in your project root:
  ANTHROPIC_API_KEY="sk-ant-api03-..."
  Then, in your Python script:
  import os
  from anthropic import Anthropic
  from dotenv import load_dotenv
  load_dotenv()  # Loads variables from .env file
  client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
Verify: In your Python script, attempt to create an Anthropic client instance and make a simple API call (e.g., a basic message creation). If the key is set correctly, the call will succeed.
# Python
import os
from anthropic import Anthropic

# If using .env file
# from dotenv import load_dotenv
# load_dotenv()

try:
    client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # Or your MODEL_NAME constant
        max_tokens=10,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(f"> ✅ API key configured correctly. Response start: {response.content[0].text[:20]}...")
except Exception as e:
    print(f"> ❌ API key configuration failed or invalid: {e}")
    print("Ensure ANTHROPIC_API_KEY environment variable is set correctly.")
When Claude Code with Karpathy's Method Is NOT the Right Choice
While Karpathy's structured prompting method significantly enhances Claude's reliability for agentic workflows, it introduces overhead that makes it unsuitable or suboptimal for certain use cases. Understanding these limitations is crucial for making informed architectural decisions and avoiding unnecessary complexity or cost.
- Simple, Single-Turn Prompts: For straightforward questions or single-shot content generation where no complex reasoning, tool use, or self-correction is required (e.g., "Summarize this paragraph," "Generate a short poem"), the overhead of defining XML tags for thoughts, scratchpads, and code execution is excessive. A simpler, direct prompt will achieve the desired result with fewer tokens and lower latency. The structured approach adds unnecessary complexity when the task is inherently simple.
- Latency-Critical, Real-Time Applications: The iterative nature of Karpathy's method, involving multiple turns for reasoning, tool execution, and self-correction, inherently increases latency. Each turn requires an API call, network round trip, and model inference time. For applications demanding near-instantaneous responses (e.g., live chatbots, real-time code suggestions in an IDE, interactive UI generation), this multi-step process can introduce unacceptable delays. Simpler, faster models with less intricate prompting might be preferred, even if they are slightly less robust.
- Strict Token Budget Constraints: The explicit XML tags and verbose reasoning in <thought> and <scratchpad> tags consume a significant number of tokens, especially for complex problems. While this improves reliability, it directly translates to higher API costs and quicker hits against context window limits. For applications operating under extremely tight token budgets where every character counts, the verbosity of structured prompting might be prohibitive. In such cases, carefully crafted, concise prompts without explicit meta-tags might be more economical, assuming the task complexity allows for it.
- Models Not Fine-Tuned for Structured Output: While Claude models are generally excellent at adhering to structured output, not all LLMs or older versions are equally adept. Applying this method to models not specifically trained or fine-tuned to respect complex XML-like tags might lead to inconsistent parsing or outright failure to follow the instructions, resulting in a worse experience than a simpler, more direct prompt. Always verify a model's capabilities with structured prompting before fully committing to this methodology.
- When Output Format Is Strictly Unstructured Text: If the ultimate desired output is purely natural language text without any need for programmatic parsing or further processing (e.g., a creative story, a conversational response), forcing an XML structure on the final output can be counterproductive. While internal reasoning can still be structured, the final delivery should match the requirement; otherwise, an additional parsing/formatting step is needed to strip the tags, adding unnecessary complexity.
Frequently Asked Questions
What is an agentic workflow in the context of LLMs? An agentic workflow describes a system where an LLM acts as an autonomous "agent" that can reason, plan, execute tools (like code interpreters or external APIs), observe results, and self-correct to achieve a given goal. Unlike a simple chatbot, an agent iterates through a series of steps, making decisions based on its observations and internal state.
How does Claude's tool_use feature compare to function calling in other LLMs?
Claude's tool_use (often integrated into agentic workflows via prompt structure) allows the model to output specific JSON describing a tool call, which your application then executes. This is conceptually similar to "function calling" in models like OpenAI's GPT series, where the model outputs structured data indicating a function to be invoked. The primary difference often lies in the specific JSON schema and how the model is prompted to generate these calls, with Karpathy's method focusing on explicit XML tags to guide Claude's thought process leading up to and after tool use.
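For comparison with the XML approach used throughout this guide, Claude's native tool use declares each tool as a JSON Schema block passed via the tools parameter of messages.create. The tool name and schema below are illustrative placeholders, not part of this guide's workflow:

```python
# A minimal tool definition for Anthropic's native tool use.
# The tool name ("run_python") and its schema are our own example.
run_python_tool = {
    "name": "run_python",
    "description": "Execute a Python snippet and return its stdout.",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {"type": "string", "description": "Python source to execute"},
        },
        "required": ["code"],
    },
}

# Passed to the API roughly as:
# client.messages.create(model=MODEL_NAME, max_tokens=1024,
#                        tools=[run_python_tool], messages=[...])
# Claude then responds with a structured `tool_use` content block
# instead of free text, which your application executes and returns.
print(run_python_tool["name"])
```

The trade-off mirrors the discussion above: native tool use gives you provider-validated JSON instead of regex parsing, while the XML protocol exposes more of the model's intermediate reasoning.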
My agent gets stuck in a loop, what do I do? Agentic loops often occur when the agent fails to converge on a solution or correctly identify a stopping condition. To mitigate this:
- Set a max_iterations limit: Implement a hard stop after a predefined number of turns in your run_agentic_workflow function.
- Improve self-correction prompts: Ensure your system prompt explicitly instructs the agent on how to handle errors, retry, or declare failure. Provide examples of successful completion.
- Explicit STOP token/tag: Instruct Claude to output a specific tag (e.g., <STOP_SEQUENCE>) when it believes it has completed the task or cannot proceed, allowing your application to break the loop.
- Refine tool outputs: Ensure tool outputs are clear and concise, providing unambiguous feedback to the agent. Ambiguous or overly verbose tool outputs can confuse the agent.
Quick Verification Checklist
- Python 3.9+ is installed and accessible.
- A virtual environment (.venv) is created and activated for your project.
- The anthropic Python client is installed within the virtual environment.
- Your ANTHROPIC_API_KEY is securely set as an environment variable or via .env.
- A basic API call to Claude (e.g., client.messages.create) successfully returns a response.
- Your system prompt clearly defines the agent's role and the XML-like communication protocol.
- You can successfully extract content from Claude's structured responses (e.g., <tool_code>, <final_answer>).
- Your code execution environment correctly runs generated Python code and captures its output.
- The agentic workflow loop correctly feeds tool outputs back to Claude for self-correction.
- The agent can successfully complete a simple multi-step task (e.g., Fibonacci calculation) and provide a <final_answer>.
Last updated: July 29, 2024
