Designing AI Agents that Use Tools

01 An agent is a loop, not a prompt

A single LLM call answers a question. An agent runs a loop: send the task plus the tools it is allowed to use, let the model decide whether to call one, execute the call, feed the result back, and repeat until the model produces a final answer (or hits a step limit). The model plans; your code executes.

tsx

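async function runAgent(task, tools, maxSteps = 8) {
  const messages = [{ role: "user", content: task }];
  // The model needs each tool's name and description, not just its input schema.
  const toolSpecs = tools.map(t => ({ name: t.name, description: t.description, schema: t.schema }));

  for (let step = 0; step < maxSteps; step++) {
    const res = await llm.chat({ messages, tools: toolSpecs });
    if (!res.toolCalls?.length) return res.text; // no tool calls: the model is done

    // Keep any text the model emitted alongside its tool calls.
    messages.push({ role: "assistant", content: res.text, toolCalls: res.toolCalls });
    for (const call of res.toolCalls) {
      const tool = tools.find(t => t.name === call.name);
      let result;
      try {
        // Unknown tools and thrown errors go back to the model as results instead of crashing the loop.
        result = tool ? await tool.run(call.input) : `error: unknown tool "${call.name}"`;
      } catch (err) {
        result = `error: ${err.message}`;
      }
      messages.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }
  throw new Error("agent exceeded its step budget");
}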
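To watch the loop run without a real LLM, here is a typed sketch in which the model is just a function from messages to a reply. The message shapes, scriptedModel, loopingModel, and the add tool are all hypothetical stand-ins: the scripted model asks for one tool call and then answers from its result, while the looping one never stops, which exercises the step budget.

```typescript
// Hypothetical, minimal message format for this sketch (not any provider's API).
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type Model = (messages: Message[]) => Promise<ModelReply>;
type Tool = { name: string; run: (input: Record<string, unknown>) => Promise<string> };

async function runAgent(model: Model, task: string, tools: Tool[], maxSteps = 8): Promise<string> {
  const messages: Message[] = [{ role: "user", content: task }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(messages);
    if (reply.toolCalls.length === 0) return reply.text; // no tool calls: final answer
    messages.push({ role: "assistant", content: reply.text, toolCalls: reply.toolCalls });
    for (const call of reply.toolCalls) {
      const tool = tools.find((t) => t.name === call.name);
      let content: string;
      try {
        content = tool ? await tool.run(call.input) : `error: unknown tool ${call.name}`;
      } catch (err) {
        content = `error: ${String(err)}`; // failures become observations, not crashes
      }
      messages.push({ role: "tool", toolCallId: call.id, content });
    }
  }
  throw new Error("agent exceeded its step budget");
}

// Scripted stand-in for an LLM: requests one tool call, then answers from its result.
const scriptedModel: Model = async (messages) => {
  const last = messages[messages.length - 1];
  if (last.role === "tool") return { text: `The sum is ${last.content}.`, toolCalls: [] };
  return { text: "", toolCalls: [{ id: "call_1", name: "add", input: { a: 2, b: 3 } }] };
};

// A model that never stops calling tools, to exercise the step budget.
const loopingModel: Model = async () => ({
  text: "",
  toolCalls: [{ id: "call_x", name: "add", input: { a: 1, b: 1 } }],
});

const add: Tool = {
  name: "add",
  run: async (input) => String(Number(input.a) + Number(input.b)),
};
```

Calling runAgent(scriptedModel, "What is 2 + 3?", [add]) resolves to "The sum is 5.", and passing loopingModel with maxSteps set to 3 rejects with the step-budget error. In a real system, scriptedModel would be replaced by a thin wrapper around your provider's chat API that maps its responses into the ModelReply shape.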

02 Tools are typed functions with a schema

Each tool is a name, a JSON schema describing its inputs, and a function to run. The schema is what the model reads to decide when and how to call the tool, so write it like documentation: say what each field means and name the specific mistake it guards against, as the app_id description below does.

tsx

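const createWorkflow = {
  name: "create_workflow",
  description: "Create a new workflow in an existing app",
  schema: {
    type: "object",
    properties: {
      name:   { type: "string", description: "Human-readable workflow name" },
      app_id: { type: "string", description: "Target app: use the slug, not the label" },
    },
    required: ["name", "app_id"],
    additionalProperties: false, // reject fields the model invents
  },
  run: async ({ name, app_id }) => {
    try {
      const wf = await api.post("/workflows", { name, app_id });
      return JSON.stringify({ ok: true, id: wf.id });
    } catch (err) {
      // Report the failure as data so the model can correct its input and retry.
      return JSON.stringify({ ok: false, error: String(err) });
    }
  },
};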
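Because the model fills in the arguments, it helps to treat them as untrusted input and check them against the schema before the tool runs. The sketch below assumes a hand-rolled validateInput helper (hypothetical, not from any library) that covers only a small subset of JSON Schema: an object of string fields, a required list, and optionally additionalProperties: false. A full JSON Schema validator would take its place in production.

```typescript
// Subset of JSON Schema covered by this sketch: an object whose fields are strings.
type StringProp = { type: "string"; description?: string };
type ObjectSchema = {
  type: "object";
  properties: Record<string, StringProp>;
  required?: string[];
  additionalProperties?: boolean; // JSON Schema treats a missing value as "allowed"
};

const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical helper: returns a list of problems, empty when the input is valid.
function validateInput(schema: ObjectSchema, input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["input must be a JSON object"];
  }
  const obj = input as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!has(obj, key)) errors.push(`missing required field "${key}"`);
  }
  for (const key of Object.keys(obj)) {
    if (!has(schema.properties, key)) {
      if (schema.additionalProperties === false) errors.push(`unknown field "${key}"`);
    } else if (typeof obj[key] !== "string") {
      errors.push(`field "${key}" must be a string`);
    }
  }
  return errors;
}

const createWorkflowSchema: ObjectSchema = {
  type: "object",
  properties: {
    name: { type: "string", description: "Human-readable workflow name" },
    app_id: { type: "string", description: "Target app: use the slug, not the label" },
  },
  required: ["name", "app_id"],
  additionalProperties: false,
};
```

For example, validateInput(createWorkflowSchema, { name: "Onboarding", app: "crm" }) returns ['missing required field "app_id"', 'unknown field "app"']. Returning those messages to the model as the tool result, instead of calling the API, gives it what it needs to fix the call on the next step.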

03 Ground answers with retrieval (RAG)

For questions about a customer's own data, don't trust the model's memory: give it a search tool backed by your knowledge base, scoped to that tenant, and require it to cite the documents it used. Refusing to make claims that the retrieved evidence doesn't support is much of what makes the output trustworthy.

tsx

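// Bind the tenant when the tool is built, so the model can't choose whose data it searches.
const makeSearchDocs = (tenantId) => ({
  name: "search_docs",
  description: "Search this customer's knowledge base; cite results by id",
  schema: {
    type: "object",
    properties: { query: { type: "string", description: "Natural-language search query" } },
    required: ["query"],
  },
  run: async ({ query }) => {
    const hits = await kb.retrieve(query, { topK: 5, tenantId });   // tenant-scoped
    // Return text + ids so the model can cite its sources.
    return JSON.stringify(hits.map(h => ({ id: h.id, text: h.chunk })));
  },
});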
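Requiring citations only helps if something checks them. One way to enforce this, sketched below, is to compare the ids an answer cites with the ids search_docs actually returned during the run, and reject the answer if it cites nothing or cites an id that was never retrieved. The [doc:<id>] citation format and the checkCitations helper are assumptions for this sketch; the system prompt would need to ask the model for that format.

```typescript
// Assumed convention: the model cites each supporting chunk inline as [doc:<id>],
// using the ids that search_docs returned.
type CitationCheck = { ok: true; cited: string[] } | { ok: false; reason: string };

function checkCitations(answer: string, retrievedIds: Set<string>): CitationCheck {
  const pattern = /\[doc:([^\]]+)\]/g; // fresh regex per call, so lastIndex starts at 0
  const cited: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) cited.push(match[1]);

  if (cited.length === 0) return { ok: false, reason: "answer cites no sources" };
  const unretrieved = cited.filter((id) => !retrievedIds.has(id));
  if (unretrieved.length > 0) {
    return { ok: false, reason: `cites ids that were never retrieved: ${unretrieved.join(", ")}` };
  }
  // Deduplicate while keeping first-seen order.
  return { ok: true, cited: cited.filter((id, i) => cited.indexOf(id) === i) };
}
```

When the check fails, the agent can send the reason back to the model and ask for a revised answer, or fall back to saying the knowledge base doesn't cover the question. This only verifies that citations point at chunks that were really retrieved, not that each chunk supports the sentence it is attached to.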

04 Pause for a human, then resume

Many real workflows can't be fully autonomous. When an agent reaches a decision that needs sign-off, it should checkpoint its state, emit a “waiting for human” status, and stop, then resume from the exact same place once a person responds. Durability across that boundary is the difference between a demo and production.

tsx

// At a human-in-the-loop node: persist and stop.
if (node.type === "await_approval") {
  await store.checkpoint(runId, { messages, cursor: node.id });
  await store.setStatus(runId, "WAITING_FOR_HUMAN");
  return; // the run is durable — nothing is lost
}

// Later, a webhook resumes it from the checkpoint.
async function onApproval(runId, decision) {
  const state = await store.load(runId);
  state.messages.push({ role: "tool", content: "approval: " + decision });
  await resumeAgent(runId, state);
}
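The pause/resume boundary can be sketched end to end with an in-memory store. MemoryStore and this onApproval signature are hypothetical stand-ins for the store and webhook handler above; the JSON round trip mimics a real database, so only what was checkpointed survives. The sketch also adds one guard the snippet above leaves implicit: a run resumes at most once, so a duplicated or late webhook is ignored.

```typescript
// Hypothetical in-memory stand-in for a durable store (a database table in practice).
type Status = "RUNNING" | "WAITING_FOR_HUMAN" | "DONE";
type Message = { role: string; content: string };
type Checkpoint = { messages: Message[]; cursor: string };

class MemoryStore {
  private rows = new Map<string, { status: Status; state: string }>();

  // Writes state and status together, so a crash can't save one without the other.
  checkpoint(runId: string, state: Checkpoint): void {
    this.rows.set(runId, { status: "WAITING_FOR_HUMAN", state: JSON.stringify(state) });
  }
  // Returns a fresh copy parsed from JSON, like reading back from a database.
  load(runId: string): { status: Status; state: Checkpoint } | undefined {
    const row = this.rows.get(runId);
    return row && { status: row.status, state: JSON.parse(row.state) as Checkpoint };
  }
  setStatus(runId: string, status: Status): void {
    const row = this.rows.get(runId);
    if (row) row.status = status;
  }
}

// Resume at most once: a duplicate or late webhook finds the run no longer waiting.
function onApproval(store: MemoryStore, runId: string, decision: string): Checkpoint | null {
  const row = store.load(runId);
  if (!row || row.status !== "WAITING_FOR_HUMAN") return null;
  store.setStatus(runId, "RUNNING");
  const state = row.state;
  state.messages.push({ role: "tool", content: "approval: " + decision });
  return state; // hand this to the agent loop to continue from state.cursor
}
```

The status check and the switch to RUNNING are safe here because the store is synchronous and single-threaded. With a real database they should happen atomically (for example in one transaction), since two deliveries of the same webhook could otherwise both see the run as waiting.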

05 Context is everything

An agent is only as good as what it can see. Build its prompt in layers: a persistent charter (who it is and its limits), the user's roles, the data it's operating on, the available tools, and the current task. A common failure is a callback, such as an approval webhook, that wakes the agent with only the latest message instead of the accumulated context, so make “never lose state” an explicit design goal.
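A minimal sketch of layered context, assuming hypothetical Layers and Message types whose fields mirror the layers listed above. buildSystemPrompt assembles the layers into one system prompt, and wake, the function a callback would call, keeps every earlier turn and appends the new event rather than starting from the latest message alone.

```typescript
// Hypothetical layer names taken from the list above; the ordering is a design choice.
type Layers = {
  charter: string; // who the agent is and what it must never do
  userRoles: string[];
  data: string; // the records the agent is operating on
  toolNames: string[];
  task: string;
};
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

function buildSystemPrompt(l: Layers): string {
  return [
    `## Charter\n${l.charter}`,
    `## User roles\n${l.userRoles.join(", ")}`,
    `## Data\n${l.data}`,
    `## Tools\n${l.toolNames.join(", ")}`,
    `## Current task\n${l.task}`,
  ].join("\n\n");
}

// Waking the agent (e.g. from a callback) rebuilds the system layer, keeps the
// whole conversation history, and appends the new event at the end.
function wake(layers: Layers, history: Message[], event: Message): Message[] {
  const withoutSystem = history.filter((m) => m.role !== "system");
  return [{ role: "system", content: buildSystemPrompt(layers) }, ...withoutSystem, event];
}
```

Rebuilding the system layer on every wake lets changes to the charter or data take effect, while the conversation history is carried forward intact.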