2026-02-14 · 14 min

The 200-line agent loop that does everything OpenAgents does

OpenAgents, AutoGen, CrewAI: three frameworks with a combined 400K lines of code. My Conway Automaton is 212 lines of TypeScript. Here's every single one, explained.

agents · code · tutorial · claude

The autonomous AI agent framework market has a problem. Every framework ships with 50,000 to 200,000 lines of code wrapping the same core loop: take a goal, ask the LLM what to do, run it, read the result, repeat. The complexity is rarely in the code. It is in the prompt engineering and the guardrails. Here is a complete, production-running agent in 212 lines of TypeScript, with every decision explained.

I call it the Conway Automaton. It runs my life.

What this agent actually does

Conway is a simple, single-purpose agent. It is not general AGI. It is a focused automaton designed to execute tasks for me.

  • Runs 3 times a day. A cron job kicks it off at 9 AM, 1 PM, and 9 PM.
  • Checks task sources. It pulls from the Linear API for my startup, a specific Gmail label for client work, and my Telegram "Saved Messages" for personal notes.
  • Picks the highest-priority task. It uses a simple heuristic: Linear issues first, then emails, then Telegram. It only considers tasks that are not blocked.
  • Executes the task. It uses a small set of tools: a bash terminal, API calls to Claude and Gemini, the Linear API for project management, and a Slack webhook for notifications.
  • Writes to a journal. Every action, observation, and thought is logged to a local SQLite database. This journal becomes the context for the next step.
  • Self-evaluates. After each action, it asks itself: "Did that work? Is the task done?" If yes, it marks the task complete. If no, it decomposes the task into smaller sub-tasks or retries.
  • Stops when done. The loop terminates when no unblocked tasks remain.

This covers about 90% of what I see people using complex frameworks for. Now, let's see the code.

The whole file

This is the entire agent. It is one file, conway.ts, run with Bun. There are no hidden abstractions.

Section 1: Imports and Environment

Lines 1-20 are boilerplate. We import our dependencies, load environment variables from a .env file, and initialize API clients. I also define a panic function because I like how Go handles fatal errors.

typescript
import { config } from "dotenv";
import Anthropic from "@anthropic-ai/sdk";
import { GoogleGenerativeAI } from "@google/generative-ai";
import { Database } from "bun:sqlite";

// Load environment variables from .env file
config();

// --- CLIENTS & CONFIG ---
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const gemini = genAI.getGenerativeModel({ model: "gemini-1.5-pro-latest" });
const db = new Database("journal.sqlite", { create: true });

const MAX_RETRIES = 3;
const AGENT_MODEL = "claude-3-opus-20240229";

// Utility for handling fatal errors
function panic(message: string): never {
  console.error(message);
  process.exit(1);
}

Section 2: Core Data Structures

Lines 21-50 define the types that govern the agent's world. TypeScript's static typing is a huge advantage here. It prevents entire classes of bugs common in Python-based agent frameworks.

  • Task: The unit of work. It has an ID, a description, a source, and a status.
  • Tool: The definition of a tool the agent can use. It includes a JSON schema for the arguments, which is crucial for structured output.
  • ToolCall: An instance of a tool being called with specific arguments.
  • JournalEntry: A record of a single step in the agent's process.
  • StepResult: The outcome of a tool execution.
typescript
// --- TYPES ---
type Task = {
  id: string;
  description: string;
  source: "linear" | "email" | "telegram";
  status: "todo" | "in_progress" | "done" | "failed";
};

type Tool = {
  name: string;
  description: string;
  input_schema: Anthropic.Tool["input_schema"];
  execute: (args: any) => Promise<string>;
};

type ToolCall = {
  name: string;
  args: any;
};

type JournalEntry = {
  taskId: string;
  step: number;
  thought: string;
  action: ToolCall | null;
  observation: string;
  timestamp: string;
};

type StepResult = {
  status: "success" | "error";
  output: string;
};

Section 3: The Tool Registry

Lines 51-85 define the agent's capabilities. Each tool is an async function that takes an object of arguments and returns a single string. The LLM is responsible for parsing this string. This "string-in, string-out" architecture is simple and robust.

Notice the input_schema for each tool. This is the Anthropic-specific format for defining a tool's arguments. The model uses this schema to generate valid JSON for the args of a ToolCall.

typescript
// --- TOOLS ---
const tools: Record<string, Tool> = {
  bash: {
    name: "bash",
    description: "Execute a shell command.",
    input_schema: { type: "object", properties: { command: { type: "string" } }, required: ["command"] },
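    execute: async ({ command }) => {
      // Run through a shell so quoting, pipes, and redirects work; splitting on spaces breaks quoted arguments.
      // stderr must be piped explicitly, because Bun inherits it by default.
      const proc = Bun.spawn(["bash", "-c", command], { stdout: "pipe", stderr: "pipe" });
      const [stdout, stderr, exitCode] = await Promise.all([
        new Response(proc.stdout).text(),
        new Response(proc.stderr).text(),
        proc.exited,
      ]);
      // Many commands write warnings to stderr on success, so fail on the exit code instead.
      if (exitCode !== 0) throw new Error(stderr || `Command exited with code ${exitCode}`);
      return stdout;
    },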
  },
  claudeCall: {
    name: "claudeCall",
    description: "Invoke the Claude LLM for complex reasoning or generation.",
    input_schema: { type: "object", properties: { prompt: { type: "string" } }, required: ["prompt"] },
    execute: async ({ prompt }) => {
      const msg = await anthropic.messages.create({ model: AGENT_MODEL, max_tokens: 4096, messages: [{ role: "user", content: prompt }] });
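      // Content blocks are a union type; only text blocks carry a .text field.
      const block = msg.content[0];
      return block?.type === "text" ? block.text : "";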
    },
  },
  geminiCall: {
    name: "geminiCall",
    description: "Invoke the Gemini LLM, useful for alternative perspectives or web search capabilities.",
    input_schema: { type: "object", properties: { prompt: { type: "string" } }, required: ["prompt"] },
    execute: async ({ prompt }) => {
      const result = await gemini.generateContent(prompt);
      return result.response.text();
    },
  },
  linearQuery: {
    name: "linearQuery",
    description: "Get information from the Linear project management tool.",
    input_schema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
    execute: async ({ query }) => `[Mock Linear API response for query: ${query}]`, // Placeholder
  },
  slackPost: {
    name: "slackPost",
    description: "Post a message to a Slack channel.",
    input_schema: { type: "object", properties: { channel: { type: "string" }, message: { type: "string" } }, required: ["channel", "message"] },
    execute: async ({ channel, message }) => `[Mock Slack post to #${channel}: ${message}]`, // Placeholder
  },
};
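
The input_schema also makes a cheap runtime check possible before a tool runs, since the model's arguments arrive as untyped JSON. The following is a minimal sketch, not part of conway.ts: the helper name validateArgs and the simplified ObjectSchema type are assumptions for illustration, and it only checks required keys and primitive types rather than implementing full JSON Schema.

```typescript
// Hypothetical helper, not part of conway.ts: a shallow check of tool
// arguments against the small subset of JSON Schema used by the tools above.
type PrimitiveType = "string" | "number" | "boolean";

type ObjectSchema = {
  type: "object";
  properties: Record<string, { type: PrimitiveType }>;
  required?: string[];
};

// Returns a list of human-readable problems; an empty list means the args pass.
function validateArgs(schema: ObjectSchema, args: unknown): string[] {
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return ["arguments must be a JSON object"];
  }
  const record = args as Record<string, unknown>;
  const problems: string[] = [];

  // Every required key must be present.
  for (const key of schema.required ?? []) {
    if (!(key in record)) problems.push(`missing required argument: ${key}`);
  }
  // Every declared key that is present must have the declared primitive type.
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (key in record && typeof record[key] !== spec.type) {
      problems.push(`argument ${key} should be a ${spec.type}`);
    }
  }
  return problems;
}

const bashSchema: ObjectSchema = {
  type: "object",
  properties: { command: { type: "string" } },
  required: ["command"],
};

console.log(validateArgs(bashSchema, { command: "ls -la" })); // []
console.log(validateArgs(bashSchema, { command: 42 })); // one type problem
console.log(validateArgs(bashSchema, {})); // one missing-argument problem
```

Returning the problems as strings fits the string-in, string-out design: the list can be handed back to the model as the observation, so it can correct its own call on the next step.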

Section 4: The Main Agent Loop

This is the heart of the agent, from line 86 to 120. The runAgentLoop function takes a task and orchestrates the entire process.

  1. It initializes a journal for the task.
  2. It enters a while loop that continues as long as the task is not done and we have not exceeded our retry limit.
  3. Inside the loop, it constructs a prompt using the task description and the history from the journal.
  4. It calls the Anthropic API, passing the prompt and the list of available tools. This is the key step. We are asking the LLM to choose the next action.
  5. It processes the response, executes the chosen tool, and records the observation.
  6. It calls the selfEvaluate function (which we will see next) to decide if the task is complete.
  7. It logs everything to the journal and repeats.
typescript
// --- AGENT CORE ---
async function runAgentLoop(task: Task) {
  console.log(`[Conway] Starting task: ${task.id} - ${task.description}`);
  let journal: JournalEntry[] = [];
  let retries = 0;
  let taskStatus = task.status;

  while (taskStatus !== "done" && taskStatus !== "failed" && retries < MAX_RETRIES) {
    const prompt = createPrompt(task, journal);
    const step = journal.length + 1;

    try {
      const response = await anthropic.messages.create({
        model: AGENT_MODEL,
        max_tokens: 1024,
        messages: [{ role: "user", content: prompt }],
        tools: Object.values(tools).map(({ name, description, input_schema }) => ({ name, description, input_schema })),
      });

      const { thought, toolCall } = processLLMResponse(response);
      let observation = "";

      if (toolCall) {
        const toolResult = await executeTool(toolCall);
        observation = toolResult.output;
        // executeTool catches its own errors, so count failed tool runs against the retry budget here.
        if (toolResult.status === "error") retries++;
      } else {
        observation = "No tool was called. The agent decided to think or conclude.";
      }

      // Build the entry once so the database row and the in-memory copy are identical.
      const entry: JournalEntry = { taskId: task.id, step, thought, action: toolCall, observation, timestamp: new Date().toISOString() };
      logToJournal(db, entry);
      journal.push(entry);

      const evaluation = await selfEvaluate(task, journal);
      taskStatus = evaluation.status;
      // In a real system, you'd handle sub-task creation here.

    } catch (error) {
      console.error(`[Conway] Error in step ${step} for task ${task.id}:`, error);
      retries++;
      // Record the failure in memory as well, so the next prompt sees it and the step counter advances.
      const entry: JournalEntry = { taskId: task.id, step: journal.length + 1, thought: "An error occurred.", action: null, observation: (error as Error).message, timestamp: new Date().toISOString() };
      logToJournal(db, entry);
      journal.push(entry);
    }
  }
  console.log(`[Conway] Finished task: ${task.id} with status: ${taskStatus}`);
}
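
MAX_RETRIES only counts failures, so a run in which every step succeeds but the evaluator never answers "done" has no upper bound on iterations. One way to cap that is to track a step limit alongside the error limit. This is a sketch under assumptions: createBudget and the Budget type are hypothetical and not part of conway.ts, and the step cap of 10 is an illustrative value.

```typescript
// Hypothetical helper, not part of conway.ts: tracks two independent limits
// so a loop stops on repeated failures OR on too many total steps.
type Budget = {
  canContinue: () => boolean;
  record: (outcome: "success" | "error") => void;
  summary: () => string;
};

function createBudget(maxSteps: number, maxErrors: number): Budget {
  let steps = 0;
  let errors = 0;
  return {
    // Both limits must still have room for the loop to go on.
    canContinue: () => steps < maxSteps && errors < maxErrors,
    // Every outcome consumes a step; only errors consume the error budget.
    record: (outcome) => {
      steps++;
      if (outcome === "error") errors++;
    },
    summary: () => `${steps}/${maxSteps} steps, ${errors}/${maxErrors} errors`,
  };
}

// Usage: a stand-in loop whose steps always succeed but never finish the task.
const budget = createBudget(10, 3); // 10 is an assumed step cap
while (budget.canContinue()) {
  budget.record("success");
}
console.log(budget.summary()); // "10/10 steps, 0/3 errors"
```

In the agent loop, budget.canContinue() would join the status checks in the while condition, and budget.record(...) would be called once per iteration.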

Section 5: Self-Evaluation

After every action, the agent must reflect. The selfEvaluate function (lines 121-150) is a second, smaller LLM call. Its only job is to look at the history and decide if the main goal has been achieved. The listing below also contains the two helpers the main loop relies on: processLLMResponse, which pulls the thought and the tool call out of the API response, and executeTool, which runs the tool and truncates its output.

This separation of "doing" (the main loop) and "reflecting" (the evaluation step) is critical. It makes it much less likely that the agent gets stuck in a loop or declares a task done when it is not. The evaluation prompt is highly constrained, asking only for a status and a justification.

typescript
function processLLMResponse(response: Anthropic.Messages.Message): { thought: string; toolCall: ToolCall | null } {
  const thought = response.content.find(c => c.type === 'text')?.text || "";
  const toolUseBlock = response.content.find(c => c.type === 'tool_use');

  if (toolUseBlock) {
    return { thought, toolCall: { name: toolUseBlock.name, args: toolUseBlock.input } };
  }
  return { thought, toolCall: null };
}

async function executeTool(toolCall: ToolCall): Promise<StepResult> {
  try {
    const tool = tools[toolCall.name];
    if (!tool) throw new Error(`Tool '${toolCall.name}' not found.`);
    const output = await tool.execute(toolCall.args);
    return { status: "success", output: output.slice(0, 2000) }; // Truncate output
  } catch (error) {
    return { status: "error", output: (error as Error).message };
  }
}

async function selfEvaluate(task: Task, journal: JournalEntry[]): Promise<{ status: "done" | "in_progress" | "failed"; reason: string }> {
  const prompt = `
    Task: ${task.description}
    Journal of actions taken:
    ${journal.map(j => `Step ${j.step}: Thought: ${j.thought}\nAction: ${JSON.stringify(j.action)}\nObservation: ${j.observation}`).join("\n\n")}
    
    Based on the journal, is the task complete?
    Respond with a single JSON object: {"status": "done" | "in_progress" | "failed", "reason": "your reasoning"}.
  `;
  const response = await anthropic.messages.create({ model: "claude-3-haiku-20240307", max_tokens: 200, messages: [{ role: "user", content: prompt }] });
  try {
    return JSON.parse(response.content[0].text);
  } catch {
    return { status: "in_progress", reason: "Failed to parse evaluation." };
  }
}
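
Models often wrap JSON in a sentence or a code fence, in which case JSON.parse on the raw text fails and the evaluation falls back to "in_progress". A more tolerant parser can be sketched as follows. parseEvaluation is a hypothetical helper, not part of conway.ts, and the extraction is deliberately simple: it takes everything from the first opening brace to the last closing brace.

```typescript
// Hypothetical helper, not part of conway.ts.
type Evaluation = { status: "done" | "in_progress" | "failed"; reason: string };

const VALID_STATUSES = ["done", "in_progress", "failed"] as const;

function parseEvaluation(raw: string): Evaluation {
  const fallback: Evaluation = { status: "in_progress", reason: "Failed to parse evaluation." };

  // Take the span from the first "{" to the last "}" and ignore anything around it.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return fallback;

  try {
    const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
    if (typeof parsed !== "object" || parsed === null) return fallback;

    const { status, reason } = parsed as { status?: unknown; reason?: unknown };
    // Reject any status outside the three allowed values.
    if (!VALID_STATUSES.some((s) => s === status)) return fallback;

    return {
      status: status as Evaluation["status"],
      reason: typeof reason === "string" ? reason : "",
    };
  } catch {
    return fallback;
  }
}

console.log(parseEvaluation('Sure! {"status": "done", "reason": "Summary posted."}').status); // "done"
console.log(parseEvaluation('{"status": "maybe"}').status); // "in_progress"
```

Validating the status matters as much as tolerating the wrapper text: an unexpected value would otherwise flow straight into the loop condition.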

Section 6: Journaling and Prompt Creation

Lines 151-180 contain the helper functions for persistence and context building.

  • logToJournal: A simple function that takes a JournalEntry and writes it to the SQLite database. This is our agent's memory.
  • createPrompt: This function assembles the prompt for the main LLM call. It combines the original task with the most recent entries from the journal. The tool definitions are not part of the prompt text; they are passed separately through the tools parameter of the API call. Together, this gives the agent the context it needs to make its next decision.
typescript
// --- PERSISTENCE & PROMPTING ---
function setupDatabase() {
  db.run(`
    CREATE TABLE IF NOT EXISTS journal (
      id INTEGER PRIMARY KEY AUTOINCREMENT,
      taskId TEXT NOT NULL,
      step INTEGER NOT NULL,
      thought TEXT,
      action TEXT,
      observation TEXT,
      timestamp TEXT NOT NULL
    );
  `);
}

function logToJournal(db: Database, entry: JournalEntry) {
  db.run(
    "INSERT INTO journal (taskId, step, thought, action, observation, timestamp) VALUES (?, ?, ?, ?, ?, ?)",
    [entry.taskId, entry.step, entry.thought, JSON.stringify(entry.action), entry.observation, entry.timestamp]
  );
}

function createPrompt(task: Task, journal: JournalEntry[]): string {
  const history = journal.slice(-5).map(j => 
    `Step ${j.step}:\nThought: ${j.thought}\nAction: ${JSON.stringify(j.action)}\nObservation: ${j.observation}`
  ).join("\n---\n");

  // The real prompt appears later in this post; this is a simplified template.
  return `
    You are Conway, an autonomous agent. Your goal is to complete the following task.
    Task: ${task.description}

    Here are the last few steps you have taken:
    ${history}

    Review the history and decide on the next action. You must use one of the available tools.
    Think step-by-step before calling a tool.
  `;
}

Section 7: Entry Point and Scheduling

Finally, lines 181-212 are the application's entry point.

  • fetchTasks: A mock function that simulates pulling tasks from various sources. In the real system, this contains API calls to Linear, Gmail, and Telegram.
  • main: The main async function that fetches tasks and runs the agent loop for the highest-priority one.
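  • Scheduling: A simple setInterval stands in for the cron job, running the main function every 8 hours. That gives three runs a day, though not at exactly 9 AM, 1 PM, and 9 PM, which are unevenly spaced.
  • Graceful Shutdown: We register a signal handler so the database connection is closed cleanly on exit. Ctrl+C sends SIGINT, while services like systemd send SIGTERM on shutdown, so a production deployment should handle both.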
typescript
// --- ENTRYPOINT & SCHEDULING ---
async function fetchTasks(): Promise<Task[]> {
  // In a real system, this would fetch from Linear, Gmail, etc.
  return [
    { id: "LNR-123", description: "Summarize the latest AI research papers on arXiv from the last 24 hours and post the summary to the #research Slack channel.", source: "linear", status: "todo" },
    { id: "EML-456", description: "Draft a follow-up email to client X about the Q2 proposal.", source: "email", status: "todo" },
  ];
}

async function main() {
  console.log(`[Conway] Waking up at ${new Date().toISOString()}`);
  const tasks = await fetchTasks();
  const todoTasks = tasks.filter(t => t.status === "todo");

  if (todoTasks.length === 0) {
    console.log("[Conway] No tasks to perform. Going back to sleep.");
    return;
  }

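  // Simple priority: Linear > Email > Telegram (an alphabetical sort would put email first)
  const priority: Record<Task["source"], number> = { linear: 0, email: 1, telegram: 2 };
  const taskToRun = [...todoTasks].sort((a, b) => priority[a.source] - priority[b.source])[0];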
  await runAgentLoop(taskToRun);
}

// --- MAIN EXECUTION ---
setupDatabase();
console.log("[Conway] Automaton initialized. Waiting for schedule.");

// Run every 8 hours (28800000 ms)
setInterval(main, 28800000); 
main(); // Run once on startup

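// Ctrl+C sends SIGINT; systemd and VPS reboots send SIGTERM, so handle both.
for (const signal of ["SIGINT", "SIGTERM"] as const) {
  process.on(signal, () => {
    console.log(`[Conway] ${signal} received. Closing database.`);
    db.close();
    process.exit(0);
  });
}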
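
An 8-hour interval only approximates a 9 AM, 1 PM, 9 PM schedule, because those times are not evenly spaced. Where cron is not available, the delay until the next slot can be computed directly. This is a sketch under assumptions: msUntilNextRun and RUN_HOURS are hypothetical names, not part of conway.ts, and the calculation uses the machine's local time zone.

```typescript
// Hypothetical helper, not part of conway.ts.
const RUN_HOURS = [9, 13, 21]; // 9 AM, 1 PM, 9 PM local time

function msUntilNextRun(now: Date, runHours: number[] = RUN_HOURS): number {
  if (runHours.length === 0) throw new Error("runHours must not be empty");
  const sorted = [...runHours].sort((a, b) => a - b);

  // First slot later today, if any.
  for (const hour of sorted) {
    const candidate = new Date(now);
    candidate.setHours(hour, 0, 0, 0);
    if (candidate.getTime() > now.getTime()) {
      return candidate.getTime() - now.getTime();
    }
  }

  // Every slot today has passed: use the earliest slot tomorrow.
  const tomorrow = new Date(now);
  tomorrow.setDate(tomorrow.getDate() + 1);
  tomorrow.setHours(sorted[0], 0, 0, 0);
  return tomorrow.getTime() - now.getTime();
}

const HOUR_MS = 60 * 60 * 1000;
console.log(msUntilNextRun(new Date(2026, 1, 14, 10, 0, 0)) / HOUR_MS); // 3
console.log(msUntilNextRun(new Date(2026, 1, 14, 22, 0, 0)) / HOUR_MS); // 11
```

A scheduler built on this would call setTimeout with msUntilNextRun(new Date()) and re-arm the timer after each run, instead of relying on a fixed interval.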

And that is it. 212 lines of commented TypeScript that perform the core functions of a sophisticated autonomous agent.

Six design decisions I'd defend

This code is simple, but it is not naive. Several key decisions make it more robust than many thousands of lines of framework code.

  1. Use structured output. I use Anthropic's native tool-use feature. It has the model return its chosen action as a JSON object that conforms to a schema I define. In my experience, this catches about 90% of malformed or nonsensical LLM responses before they can cause an error. Parsing free-form text is a recipe for flaky agents.
  2. One decision call per loop iteration. The main loop makes a single call to decide on an action, followed by a second, smaller call to evaluate the result. I have seen frameworks that use multi-call planners (one call to plan, another to refine, a third to execute). These are bug farms and incredibly difficult to debug. Keep it simple: think, act, observe, repeat.
  3. Journal is local SQLite. My agent's memory is a single journal.sqlite file. I can query it with sqlite3. I can grep it. I can back it up with cp. Cloud-based vector databases are powerful, but for a single agent's working memory, they are massive overkill.
  4. Tools return plain strings. My bash tool returns a string. My claudeCall tool returns a string. The LLM is a phenomenal string parser. Frameworks that try to create strongly typed tool outputs are fighting the nature of the model. Let the LLM do the parsing. It is what it is good at.
  5. Retries are bounded. The main loop has a MAX_RETRIES = 3 constant. If an agent fails three times on the same task, it is probably stuck. In my experience, unbounded retries are the number one cause of surprise cloud bills from agent experiments.
  6. Shutdown is graceful. The signal handler closes the SQLite database before the process exits, so writes are not cut off mid-transaction. To cover a reboot of the VPS running my agent, the handler has to be registered for SIGTERM as well as SIGINT, because init systems send SIGTERM on shutdown. This is a small detail that is critical for production stability.

What this replaces

The point of this exercise is not just to build a small agent, but to show what is being abstracted away by popular frameworks.

| Framework | Lines of Code Replaced (Approx.) | My Lines |
| :--- | :--- | :--- |
| LangChain | AgentExecutor, Tool wrapping, Callbacks (~3,000) | 212 |
| AutoGen | ConversableAgent, tool_response_callback (~1,000) | 212 |
| CrewAI | Crew, Task, Agent, Process triangle (~1,000) | 212 |
| Total | ~5,000 lines | 212 lines |

These frameworks provide value, but you should be aware of the complexity cost. For many applications, a single file is more than enough.

What this cannot do

My 212-line agent is not a silver bullet. It has limitations.

  • Multi-agent coordination. This is a single-agent system. It cannot delegate tasks or collaborate with other agents. For true multi-agent systems, you would need to add a message-passing layer (like a Redis queue) and a way to assign tasks. That might be another 50 lines, not a whole framework.
  • Long-term memory. The agent's memory is limited to the current task's journal. It does not remember solutions from previous tasks. For that, you would want to embed the journal entries and store them in a vector database like pgvector. Again, this is an extension, not a reason to adopt a 100,000-line framework.
  • A GUI for monitoring. My monitoring tool is tail -f agent.log and the sqlite3 CLI. If you need a Kanban board to watch your agent work, you might be overcomplicating things.

The prompt that actually runs

The createPrompt function in the code is a simplified template. The real system prompt is the most valuable part of any agent. It is where you encode the rules, constraints, and personality. Here is the one I use for Conway.

text
You are Conway, a diligent and autonomous AI agent. Your purpose is to execute tasks on behalf of your user, Kieran.

Your primary goal is to complete the given task by intelligently using the tools available to you.

**Constraints:**
1. You MUST use one and only one tool per step.
2. Before calling a tool, you must provide a short "thought" process explaining your reasoning in one or two sentences.
3. The observation from a tool is your only way of seeing the world. Base your next step solely on the previous observation.
4. Tool outputs are truncated. Do not rely on them being complete.
5. If a tool returns an error, try a different approach. If you are stuck, you can use the `slackPost` tool to ask for help.

**Task:**
{task_description}

**Tool Reference:**
{tool_definitions}

**History (last 5 steps):**
{journal_history}

**Failure Mode:**
If you determine you cannot proceed or the task is impossible, do not call a tool. Instead, output your final thoughts and the reason for stopping. For example: "I cannot access the specified file, and I have no tools to request permissions. I am blocked."

Now, review your history and the task. Provide your thought and the next tool call.

This prompt is the agent's constitution. It is more important than any of the TypeScript code.
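
The placeholders in braces have to be filled in before each call. A minimal sketch of that substitution follows, assuming a hypothetical renderPrompt helper (the createPrompt function in the listing uses a template literal instead). Unknown placeholders throw, so a typo cannot leak a literal {placeholder} into the prompt.

```typescript
// Hypothetical helper, not part of conway.ts.
function renderPrompt(template: string, values: Record<string, string>): string {
  // Replace each {name} with its value; fail loudly if a value is missing.
  return template.replace(/\{(\w+)\}/g, (_match, key: string) => {
    const value = values[key];
    if (value === undefined) throw new Error(`No value for placeholder {${key}}`);
    return value;
  });
}

// A shortened excerpt of the prompt above, used only for this example.
const template = "**Task:**\n{task_description}\n\n**History (last 5 steps):**\n{journal_history}";

console.log(
  renderPrompt(template, {
    task_description: "Draft a follow-up email to client X.",
    journal_history: "(no steps yet)",
  }),
);
```

Because the replacement is done in a single pass, braces that appear inside a substituted value, such as JSON in the journal history, are left untouched.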

Cost per day

People often worry about the cost of running agents. With a tight loop and a cheap evaluation model, it stays modest.

| Item | Calculation | Cost per Day (USD) |
| :--- | :--- | :--- |
| Claude 3 Opus Calls | 3 runs/day × 10 iterations × 2k tokens avg = 60k tokens, at $15 per million input tokens | ~$0.90 |
| Claude 3 Haiku Evals | 3 runs/day × 10 iterations × 0.5k tokens avg = 15k tokens, at $0.25 per million input tokens | <$0.01 |
| Gemini 1.5 Pro Fallback | Infrequent, ~5 calls/day avg | $0.02 |
| Total | | ~$0.93 |

At list prices, that puts Conway's monthly bill at roughly $28, and more if responses are long, since output tokens are billed at several times the input rate. It automates tasks that save me about an hour a day. The return on investment is astronomical.
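
The arithmetic behind a table like this is simple enough to keep in a script and rerun when prices change. This is a sketch with its assumptions stated: dailyCostUsd is a hypothetical helper, every token is billed at a single per-million rate (real bills price input and output tokens differently), and the rates passed in below are assumed list input prices that should be checked against the provider's current pricing.

```typescript
// Hypothetical helper: daily cost for one kind of call at a flat per-token rate.
function dailyCostUsd(
  runsPerDay: number,
  iterationsPerRun: number,
  tokensPerCall: number,
  usdPerMillionTokens: number,
): number {
  const tokensPerDay = runsPerDay * iterationsPerRun * tokensPerCall;
  return (tokensPerDay * usdPerMillionTokens) / 1_000_000;
}

// Assumed rates: $15 per million tokens for the main model, $0.25 for the evaluator.
const mainLoop = dailyCostUsd(3, 10, 2_000, 15);
const evals = dailyCostUsd(3, 10, 500, 0.25);

console.log(mainLoop.toFixed(2)); // "0.90"
console.log(evals.toFixed(3)); // "0.004"
console.log(((mainLoop + evals) * 30).toFixed(2)); // "27.11" per 30-day month
```

The main-loop model dominates the total, so the cheapest lever is usually the number of iterations per run rather than the evaluation calls.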

When the framework is actually worth it

I am not anti-framework. I am anti-premature-complexity. There are valid reasons to use a framework:

  • You are building a multi-tenant platform. If each of your customers needs to run their own agents, a framework can provide the orchestration, security, and resource management you need. Building that yourself is a major undertaking.
  • You have strict regulatory requirements. If you need SOC2 compliance or other certifications, using a framework from a vendor who has already done the audits can save you months of work.
  • Your team has 5+ engineers. Once a team grows, shared abstractions become a coordination tool. A framework provides a common language and structure that can help a larger team work together on a complex agent system.

If you do not fit into one of these categories, you might not need a framework.

The agent is a loop

The agent is a loop. The value is in the prompt, the tools you wire up, and the feedback signal you use for evaluation. Frameworks tend to hide the prompt, complicate the tools, and completely neglect the feedback signal.

If you can read 212 lines of code, you can own your agent. You can understand it, debug it, and extend it without asking for permission or waiting for a new release. Start here.