2026-04-20

The smallest useful AI agent: 200 lines of Python

The industry is currently obsessed with selling you an "agent platform." Every week, a new startup launches with a glossy landing page, offering a complex architecture of nodes, edges, vector databases, and proprietary orchestration engines. They promise to abstract away the difficulty of building autonomous systems. The reality is far simpler, and frankly, far more boring.

Most agent frameworks are far heavier than the work they actually do. They obscure the core mechanics of LLM orchestration behind layers of abstraction, making it difficult to debug when things inevitably go wrong. When an agent hallucinates a tool call or gets stuck in a loop, a heavy framework forces you to read through its source code just to understand how it formatted the prompt. You find yourself fighting the framework's abstractions rather than solving your actual problem.

The truth about AI agents is that they are not magic. They are simply while-loops wrapped around an API call.

If you are a developer curious about how agents work, the absolute worst way to learn is by reading the documentation for a massive agent framework. The best way to learn is by building one from scratch. A genuinely useful, autonomous agent that can write code, run terminal commands, and navigate your file system fits comfortably in under 200 lines of zero-dependency Python. Building it will teach you more in an hour than a week of fighting a platform's proprietary syntax.

In this post, we'll write that agent.

What an agent minimally needs

Strip away the marketing, and an autonomous agent requires exactly five components to function:

  1. A Loop: The core engine. It allows the model to act, observe the result, and act again. Without a loop, you just have a chatbot.
  2. A Tool Registry: A way to describe capabilities to the model and map its requests to actual Python functions.
  3. Context Management: A mechanism to prevent the conversation history from exceeding the model's context window.
  4. The Model Call: The actual network request to an LLM provider (like Anthropic, OpenAI, or Google).
  5. A Stopping Condition: A rule to break the loop, either because the task is complete, or because the agent has exceeded its allowed number of steps.

That is it. You do not need a vector database. You do not need LangChain. You do not need a graph abstraction. You just need basic control flow and string manipulation.

Let's implement these five components. We will use Anthropic's Claude 3.5 Sonnet because its tool-calling capabilities are currently exceptional, but the same loop works with any model that supports tool calling; only the request and response formats differ between providers. We will use nothing but Python's standard library. No requests, no langchain, no openai SDK. Just pure, unadulterated Python.

The 200-Line Implementation

Here is the complete code. You can copy this, save it as agent.py, set your ANTHROPIC_API_KEY, and immediately start using it to automate tasks on your machine.

import os
import json
import urllib.error
import urllib.request
import subprocess
from typing import Dict, List, Any

API_KEY = os.environ.get("ANTHROPIC_API_KEY")
if not API_KEY:
    raise ValueError("ANTHROPIC_API_KEY environment variable is required.")

# ---------------------------------------------------------
# 1. THE MODEL CALL
# ---------------------------------------------------------
def call_llm(messages: List[Dict[str, Any]], system: str, tools: List[Dict[str, Any]]) -> Dict[str, Any]:
    url = "https://api.anthropic.com/v1/messages"
    headers = {
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json"
    }
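    data = {
        # Set ANTHROPIC_MODEL to swap in a different (e.g. newer) model without editing the code.
        "model": os.environ.get("ANTHROPIC_MODEL", "claude-3-5-sonnet-20241022"),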
        "max_tokens": 4096,
        "system": system,
        "messages": messages,
        "tools": tools
    }
    req = urllib.request.Request(url, data=json.dumps(data).encode("utf-8"), headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=120) as response:  # don't hang forever on a stalled connection
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as e:
        print(f"API Error: {e.read().decode('utf-8')}")
        raise

# ---------------------------------------------------------
# 2. THE TOOL REGISTRY (Implementation)
# ---------------------------------------------------------
def execute_shell(command: str) -> str:
    """Run a command in the shell and return stdout/stderr."""
    try:
        result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
        output = f"STDOUT:\n{result.stdout}\nSTDERR:\n{result.stderr}"
        return output[:8000] # Prevent massive output from breaking context
    except Exception as e:
        return f"Error executing command: {e}"

def read_file(path: str) -> str:
    """Read a local file."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            return f.read()[:8000]
    except Exception as e:
        return f"Error reading file: {e}"

def search_dir(pattern: str, dir_path: str = ".") -> str:
    """Regex search across files in a directory."""
    import re
    results = []
    try:
        regex = re.compile(pattern)
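        for root, dirs, files in os.walk(dir_path):
            # Skip VCS metadata and dependency folders, which mostly produce noise
            dirs[:] = [d for d in dirs if d not in (".git", "node_modules")]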
            for file in files:
                filepath = os.path.join(root, file)
                try:
                    with open(filepath, "r", encoding="utf-8") as f:
                        for i, line in enumerate(f):
                            if regex.search(line):
                                results.append(f"{filepath}:{i+1}:{line.strip()}")
                                if len(results) > 100:
                                    return "\n".join(results) + "\n...truncated"
                except Exception:
                    pass # Ignore unreadable files (binaries, etc)
        return "\n".join(results) if results else "No matches found."
    except Exception as e:
        return f"Error searching: {e}"

# ---------------------------------------------------------
# 2. THE TOOL REGISTRY (Schema definitions)
# ---------------------------------------------------------
TOOL_DEFINITIONS = [
    {
        "name": "execute_shell",
        "description": "Execute a shell command on the host machine. Returns stdout and stderr.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string", "description": "The bash command to run."}},
            "required": ["command"]
        }
    },
    {
        "name": "read_file",
        "description": "Read the contents of a local file.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Absolute or relative file path."}},
            "required": ["path"]
        }
    },
    {
        "name": "search_dir",
        "description": "Search for a regex pattern within files in a directory.",
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex pattern to search for."},
                "dir_path": {"type": "string", "description": "Directory to search in."}
            },
            "required": ["pattern"]
        }
    }
]

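def dispatch_tool(name: str, args: Dict[str, Any]) -> str:
    """Route a tool_use request to its Python function; errors go back to the model as text."""
    try:
        if name == "execute_shell":
            return execute_shell(args["command"])
        if name == "read_file":
            return read_file(args["path"])
        if name == "search_dir":
            return search_dir(args["pattern"], args.get("dir_path", "."))
    except KeyError as e:
        # A malformed tool call should not crash the loop; let the model retry.
        return f"Missing required argument for {name}: {e}"
    return f"Unknown tool: {name}"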

# ---------------------------------------------------------
# 3, 4, 5. CONTEXT, LOOP, AND STOPPING CONDITION
# ---------------------------------------------------------
def run_agent(task: str):
    system_prompt = "You are an autonomous CLI agent. Solve the user's task using the provided tools."
    messages = [{"role": "user", "content": task}]
    
    # 5. Stopping Condition (max 15 iterations)
    for step in range(15): 
        print(f"\n--- Step {step + 1} ---")
        
        # 4. The Model Call
        response = call_llm(messages, system_prompt, TOOL_DEFINITIONS)
        
        message_content = []
        tool_uses = []
        
        for block in response.get("content", []):
            if block["type"] == "text":
                print(f"Thought: {block['text']}")
                message_content.append(block)
            elif block["type"] == "tool_use":
                print(f"Tool Call: {block['name']} with {block['input']}")
                tool_uses.append(block)
                message_content.append(block)
        
        messages.append({"role": "assistant", "content": message_content})
        
        # 5. Stopping Condition (task complete)
        if not tool_uses:
            print("\nTask complete.")
            break
            
        tool_results = []
        for tool in tool_uses:
            result_text = dispatch_tool(tool["name"], tool["input"])
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool["id"],
                "content": result_text
            })
            
        messages.append({"role": "user", "content": tool_results})
        
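        # 3. Context Management (Simple truncation)
        # Require more than 7 messages so messages[-6:] cannot overlap messages[0].
        if len(json.dumps(messages)) > 50000 and len(messages) > 7:
            print("\n[Context limit approaching, truncating early history...]")
            # Keep the task plus the last three assistant/tool-result pairs; messages[-6:]
            # starts on an assistant turn, so roles alternate and tool_use/tool_result stay paired.
            messages = [messages[0]] + messages[-6:]
    else:
        # 5. Stopping Condition (step budget exhausted)
        print("\nStep limit reached before the task was completed.")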

if __name__ == "__main__":
    import sys
    task = sys.argv[1] if len(sys.argv) > 1 else "List the files in the current directory."
    run_agent(task)

At under 200 lines, this script is a complete, working AI agent. It can read codebases, write scripts, run tests, and iteratively fix its own errors.

Breaking down the implementation

The Tool Registry is arguably the most critical part. We give the agent exactly three tools: execute_shell, read_file, and search_dir. This trio is deceptively powerful. With execute_shell, the agent can do almost anything a developer can do at a terminal: run git commands, install npm packages, run linters, or even write new files via echo or cat. We wrap each tool in a simple Python function that catches exceptions and returns them as strings, so the model can observe its own failures instead of crashing the loop.
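Adding a tool always means three things: a Python function, a JSON schema the model sees, and a route from the tool name to the function. As a sketch, those can live in a single dict. The example below adds a hypothetical write_file tool (it is not part of the script above) so the agent no longer has to write files through echo or sed; the names TOOLS, tool_definitions, and dispatch are illustrative assumptions, not a library API.

```python
import os
from typing import Any, Callable, Dict, List, Tuple

def write_file(path: str, content: str) -> str:
    """Hypothetical extra tool: create or overwrite a file with the given text."""
    try:
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)  # create missing parent directories
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        return f"Wrote {len(content)} characters to {path}"
    except Exception as e:
        return f"Error writing file: {e}"

# One entry per tool: the Python function plus the schema the model sees.
TOOLS: Dict[str, Tuple[Callable[..., str], Dict[str, Any]]] = {
    "write_file": (
        write_file,
        {
            "name": "write_file",
            "description": "Create or overwrite a local file with the given content.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path to write."},
                    "content": {"type": "string", "description": "Full new file content."},
                },
                "required": ["path", "content"],
            },
        },
    ),
}

def tool_definitions() -> List[Dict[str, Any]]:
    """Schemas to send as the "tools" field of the API request."""
    return [schema for _, schema in TOOLS.values()]

def dispatch(name: str, args: Dict[str, Any]) -> str:
    """Look up the tool by name and call it with the model's arguments as keywords."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    fn, _ = TOOLS[name]
    try:
        return fn(**args)
    except TypeError as e:  # missing or unexpected arguments from the model
        return f"Bad arguments for {name}: {e}"
```

Because dispatch passes the model's input as keyword arguments, the schema's property names must match the function's parameter names exactly; in exchange, registering a new tool becomes a one-entry change.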

The Loop is simply a for loop bounded to 15 iterations. The user's task seeds the message list once, before the loop starts; on each iteration we send the whole history to the API and parse the response. If the model decides to use a tool, we intercept that request, execute the corresponding Python function, and append the result to the history as a "user" message containing a tool_result block tagged with the id of the tool_use it answers. This ping-pong continues until the model responds with plain text instead of a tool call, or the step budget runs out.
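To see the ping-pong without spending API credits, here is a minimal sketch of the same control flow with the network call injected as a parameter and replaced by a scripted fake model. The names run_loop and make_fake_model, and the id "toolu_1", are hypothetical; the block dicts mirror the text, tool_use, and tool_result shapes the script already uses.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]

def run_loop(task: str,
             call_model: Callable[[List[Message]], Dict[str, Any]],
             dispatch: Callable[[str, Dict[str, Any]], str],
             max_steps: int = 15) -> List[Message]:
    """Core control flow of run_agent (minus printing and truncation), model injected."""
    messages: List[Message] = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        response = call_model(messages)
        content = response.get("content", [])
        messages.append({"role": "assistant", "content": content})
        tool_uses = [b for b in content if b["type"] == "tool_use"]
        if not tool_uses:  # plain-text answer: the task is done
            break
        # Every tool_use id gets a matching tool_result in the next user turn.
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": t["id"],
             "content": dispatch(t["name"], t["input"])}
            for t in tool_uses
        ]})
    return messages

def make_fake_model() -> Callable[[List[Message]], Dict[str, Any]]:
    """Scripted stand-in for the API: first requests a tool, then answers in text."""
    replies = iter([
        {"content": [{"type": "tool_use", "id": "toolu_1",
                      "name": "execute_shell", "input": {"command": "ls"}}]},
        {"content": [{"type": "text", "text": "The directory contains agent.py."}]},
    ])
    return lambda messages: next(replies)
```

For example, run_loop("List the files", make_fake_model(), lambda name, args: "agent.py\n") returns four messages: the task, the tool request, its result, and the final answer.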

Context Management is handled brutally but effectively. As the agent loops, the messages list grows. Since every API call resends the entire history (which costs money) and context windows are finite, we monitor the approximate size of the payload, measured as the length of its JSON serialization. If it exceeds a threshold, we slice the list: we keep the original task, discard the middle steps, and keep only the most recent exchanges. Some frameworks use summarization or vector retrieval here instead, but for short, self-contained tasks simple truncation is often enough.
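A slightly more careful variant of the same truncation, as a sketch: instead of always keeping exactly six messages, it drops the oldest assistant/tool-result pairs one pair at a time until the serialized history fits a character budget. Removing whole pairs keeps roles alternating and never separates a tool_result from its tool_use. The function name truncate_history and its parameters are hypothetical; the 50,000-character default simply matches the script's threshold and is not an API limit.

```python
import json
from typing import Any, Dict, List

Message = Dict[str, Any]

def truncate_history(messages: List[Message], max_chars: int = 50_000,
                     min_pairs: int = 1) -> List[Message]:
    """Drop the oldest (assistant, user) pairs after the task until the
    JSON-serialized history fits in max_chars, keeping at least min_pairs."""
    task, rest = messages[:1], messages[1:]
    # rest is [assistant, user, assistant, user, ...]; remove from the front two at a time
    while len(rest) > 2 * min_pairs and len(json.dumps(task + rest)) > max_chars:
        rest = rest[2:]
    return task + rest
```

In run_agent, this would replace the fixed slice with messages = truncate_history(messages).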

A Real Run-Through

What does this look like in practice? Let's give our sub-200-line agent a real software engineering task.

python agent.py "Find where the database connection is initialized, update the timeout setting from 30s to 60s, and run the test suite."

Step 1: The agent realizes it needs to locate the database connection.
  Thought: I need to find the database connection logic to update the timeout.
  Tool Call: search_dir with {"pattern": "timeout"}
  Result: Returns several matches, including src/db/connection.ts:15: timeout: 30000.

Step 2: The agent modifies the file. It uses the shell tool because it doesn't have a dedicated "write" tool.
  Thought: I'll use sed to replace the timeout value in src/db/connection.ts.
  Tool Call: execute_shell with {"command": "sed -i '' 's/timeout: 30000/timeout: 60000/' src/db/connection.ts"}
  Result: STDOUT: STDERR: (Success, no output.)
  (The -i '' form is BSD/macOS sed syntax; on Linux, GNU sed expects -i without a separate empty argument.)

Step 3: The agent runs the test suite.
  Thought: Now I need to run the test suite to verify the change didn't break anything.
  Tool Call: execute_shell with {"command": "npm test"}
  Result: STDOUT: 42 passing (2s). STDERR: (empty)

Step 4: The agent concludes.
  Thought: The timeout has been updated and the tests pass. The task is complete.
  (No tool call, so the loop terminates.)

In four steps, the script traversed a file system, executed a precise regex replacement, ran a Node.js test suite, and verified its own work. It did this without LangChain, without AutoGen, and without a single node-edge graph visualization.
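Step 2's in-place edit relies on sed, whose -i flag takes different arguments on macOS (BSD) and Linux (GNU). A dedicated edit tool sidesteps that portability trap. Below is a sketch of a hypothetical replace_in_file tool (not part of the script) that swaps exactly one occurrence of a string and refuses ambiguous edits, so the model gets a clear error instead of a silent multi-match change. The function name, messages, and schema are illustrative assumptions.

```python
def replace_in_file(path: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new` in a UTF-8 text file."""
    if not old:
        return "Error: old text must not be empty."
    try:
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()
        count = text.count(old)
        if count != 1:
            # Zero or several matches: report back so the model can make `old` more specific.
            return f"Expected exactly 1 match for the old text, found {count}."
        with open(path, "w", encoding="utf-8") as f:
            f.write(text.replace(old, new, 1))
        return f"Replaced 1 occurrence in {path}."
    except Exception as e:
        return f"Error editing file: {e}"

# Schema to append to TOOL_DEFINITIONS (plus a matching branch in dispatch_tool).
REPLACE_IN_FILE_SCHEMA = {
    "name": "replace_in_file",
    "description": "Replace exactly one occurrence of old text with new text in a file.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File to edit."},
            "old": {"type": "string", "description": "Exact text to find; must match once."},
            "new": {"type": "string", "description": "Replacement text."},
        },
        "required": ["path", "old", "new"],
    },
}
```

With this registered, Step 2 would become replace_in_file("src/db/connection.ts", "timeout: 30000", "timeout: 60000"), which behaves the same on every operating system.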

What it's missing (and why you might not care)

If you read through the code, you will immediately spot the corners that were cut.

No Sandboxing: Our execute_shell tool runs arbitrary commands on your host machine with your user's permissions. If the agent decides to run rm -rf /, your files are gone. Production frameworks typically isolate agent execution in Docker containers or specialized sandboxes. For personal use, you can shrink the risk by running the agent in a throwaway directory or container, or by requiring approval before each command. Watching the terminal output on its own only tells you what has already run, and the model's alignment is a mitigation, not a guarantee. Run it at your own risk.
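One cheap mitigation is a human-in-the-loop gate: wrap the shell tool so that every command has to be approved before it runs. This is a sketch, not a sandbox. The names run_command and guarded_shell, and the injectable ask and run parameters (handy for testing), are hypothetical; a refused command is returned to the model as text so it can choose a different approach.

```python
import subprocess
from typing import Callable

def run_command(command: str) -> str:
    """Mirrors the script's execute_shell: run the command, capture and cap output."""
    try:
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=30)
        return f"STDOUT:\n{result.stdout}\nSTDERR:\n{result.stderr}"[:8000]
    except Exception as e:
        return f"Error executing command: {e}"

def guarded_shell(command: str,
                  ask: Callable[[str], str] = input,
                  run: Callable[[str], str] = run_command) -> str:
    """Show the command to a human first; only an explicit yes lets it run."""
    answer = ask(f"Agent wants to run:\n  {command}\nAllow? [y/N] ")
    if answer.strip().lower() not in ("y", "yes"):
        # Tell the model it was refused, so it can try something else.
        return "Command was not run: the user declined it."
    return run(command)
```

Swapping execute_shell(args["command"]) for guarded_shell(args["command"]) in dispatch_tool makes the loop block until you answer, which is what turns "watching the terminal" into actual oversight. The trade-off is that the agent can no longer run unattended.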

Naive Context Truncation: Simply chopping off the middle of the conversation history works until the agent needs a piece of information from Step 2 during Step 12. More robust systems maintain a rolling summary of discarded context, or utilize an external "Memory" tool that the agent can actively read and write to.
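An external memory tool can be as small as a JSON file the agent writes notes into and reads back later, so important facts survive truncation. A minimal sketch, assuming a hypothetical memory.json scratchpad and two hypothetical tools, remember and recall, whose schemas would be appended to TOOL_DEFINITIONS alongside the existing three.

```python
import json
import os
from typing import Dict

MEMORY_PATH = "memory.json"  # hypothetical location of the scratchpad

def _load(path: str) -> Dict[str, str]:
    """Read all notes, treating a missing file as empty memory."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def remember(key: str, value: str, path: str = MEMORY_PATH) -> str:
    """Store a note under a key; the file lives outside the message history."""
    notes = _load(path)
    notes[key] = value
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notes, f, indent=2)
    return f"Saved note '{key}'."

def recall(key: str = "", path: str = MEMORY_PATH) -> str:
    """Return one note, or list every note when no key is given."""
    notes = _load(path)
    if not key:
        return "\n".join(f"{k}: {v}" for k, v in notes.items()) or "Memory is empty."
    return notes.get(key, f"No note named '{key}'.")

MEMORY_TOOL_DEFINITIONS = [
    {
        "name": "remember",
        "description": "Save a short note that must survive context truncation.",
        "input_schema": {
            "type": "object",
            "properties": {"key": {"type": "string"}, "value": {"type": "string"}},
            "required": ["key", "value"],
        },
    },
    {
        "name": "recall",
        "description": "Read a saved note by key, or all notes if no key is given.",
        "input_schema": {"type": "object", "properties": {"key": {"type": "string"}}},
    },
]
```

In the Step 2 versus Step 12 scenario, the agent would call remember("db_file", "src/db/connection.ts") early and recall("db_file") after the early history has been truncated.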

No Multi-Agent Architecture: This script is a solo worker. It cannot spawn a "Reviewer Agent" to check its code, or hand off a complex architectural decision to an "Architect Agent."

These are genuine limitations. However, for the vast majority of day-to-day developer tasks (refactoring a class, writing a boilerplate test suite, or grepping logs for an error), you do not need a multi-agent system with semantic memory. You just need a while-loop and a shell connection.

Where the framework pays for itself

It is important to acknowledge that agent frameworks exist for a reason.

The sub-200-line script starts to crack when state becomes complex. If you are building a product where agents run asynchronously for hours, you need a framework to handle persistent state, so an agent can pause, serialize its state to a database, and resume later. If you are serving thousands of concurrent users, you need robust queueing, rate-limiting, and error-handling that our primitive try/except blocks cannot provide.
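It helps to see how small the crude version of that first requirement is. Because this agent's entire state is the messages list plus a step counter, a basic pause and resume is just JSON serialization. A sketch under that assumption, with hypothetical function names and a local file standing in for the database; a real service would still need locking, schema versioning, and a job queue on top.

```python
import json
import os
from typing import Any, Dict, List

def save_state(messages: List[Dict[str, Any]], step: int, path: str) -> None:
    """Persist the conversation and step counter, writing to a temp file first."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"step": step, "messages": messages}, f)
    os.replace(tmp, path)  # atomic rename on POSIX, so readers never see half a file

def load_state(path: str) -> Dict[str, Any]:
    """Return the saved state, or a fresh one if nothing has been saved yet."""
    if not os.path.exists(path):
        return {"step": 0, "messages": []}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

run_agent could call save_state at the end of each iteration and, on startup, continue from load_state(path)["messages"] when that list is non-empty. Everything beyond that (concurrency, retries, many users) is where a framework starts paying for itself.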

Frameworks like LangGraph or AutoGen excel when you need deterministic control over non-deterministic systems. When you need to guarantee that an agent must pass its output to a specific validation node before returning to the user, a framework provides the scaffolding to enforce those rules.

But for personal automation, local development workflows, or simply understanding the paradigm, jumping straight into these frameworks is a mistake.

Conclusion

The AI industry has an unfortunate habit of wrapping simple concepts in impenetrable marketing jargon. An "Agent" is not a sentient digital worker. It is a language model enclosed in a loop, given access to functions, and allowed to repeatedly call them until a condition is met.

Before you invest weeks into learning a complex platform, write your own 200-line agent. See how it fails. Watch it hallucinate a bash command, read its own error, and correct itself. Once you understand the raw mechanics of an autonomous loop, you will be in a far better position to evaluate whether you actually need a framework, or if a simple Python script was all you needed all along.