AI & ML

Building AI Agents with LangChain: A Practical Guide

Learn how to build production-ready AI agents using LangChain and LLMs. Real code, real gotchas, and lessons from deploying agents in production.

Jay Salot

Senior Full Stack AI Engineer

June 29, 2026 · 7 min read

Last month, I built an AI agent that monitors our Slack channels, answers customer questions, and escalates complex issues to our support team. It saved us about 15 hours a week. The gotcha? My first version hallucinated pricing information and nearly caused a disaster before we caught it in staging.

Building AI agents isn't just about calling an LLM API and hoping for the best. You need structure, guardrails, and a solid understanding of how these systems actually work. This is what I learned building agents with LangChain that actually make it to production.

What Are AI Agents (Really)?

An AI agent is a program that uses an LLM to decide which actions to take. Instead of following a predetermined flow, it reasons about what to do next based on the context and the tools available to it.

Think of it this way: a chatbot follows a script. An agent figures out the script as it goes.

The core loop looks like this:

Agent receives a task
LLM decides what action to take (or if it's done)
Agent executes that action using a tool
Results feed back into the LLM
Repeat until task is complete
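
To make the loop concrete, here is a minimal sketch of it in plain TypeScript with no LangChain involved. The names (Decision, runAgentLoop, fakeDecide, demoTools) are made up for illustration, and fakeDecide stands in for the LLM call. LangChain's AgentExecutor does considerably more, but the shape is roughly this:

```typescript
// Hypothetical types for illustration; LangChain's real interfaces are richer.
type Decision =
  | { type: "finish"; output: string }
  | { type: "action"; tool: string; input: string };

type ToolFn = (input: string) => Promise<string>;
type Decide = (task: string, observations: string[]) => Promise<Decision>;

async function runAgentLoop(
  task: string,
  decide: Decide,
  tools: Record<string, ToolFn>,
  maxSteps = 5,
): Promise<string> {
  const observations: string[] = [];

  for (let step = 0; step < maxSteps; step++) {
    // Ask the "LLM" what to do next, given everything observed so far
    const decision = await decide(task, observations);
    if (decision.type === "finish") return decision.output;

    const tool = tools[decision.tool];
    // Feed unknown-tool errors back as observations instead of crashing
    const result = tool
      ? await tool(decision.input)
      : `Unknown tool: ${decision.tool}`;
    observations.push(`${decision.tool} -> ${result}`);
  }

  // Hard stop so a confused model cannot loop forever
  throw new Error(`Agent did not finish within ${maxSteps} steps`);
}

// Stand-in for the LLM: look up the user once, then answer.
const fakeDecide: Decide = async (_task, observations) =>
  observations.length === 0
    ? { type: "action", tool: "lookup", input: "ada@example.com" }
    : { type: "finish", output: `Done after: ${observations[0]}` };

const demoTools: Record<string, ToolFn> = {
  lookup: async (email) => `plan=pro for ${email}`,
};
```

The maxSteps cap is the part worth copying: without it, a model that keeps picking the wrong tool never terminates.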

In practice, this means your agent can chain together multiple API calls, database queries, or calculations without you hardcoding every possible path.

Setting Up LangChain Agents

I'm using TypeScript here because honestly, working with complex LLM responses without types is asking for runtime errors. Here's the basic setup:

npm install langchain @langchain/openai @langchain/core zod

Then the basic agent structure:

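import { ChatOpenAI } from "@langchain/openai";
import { AgentExecutor, createOpenAIFunctionsAgent } from "langchain/agents";
import { pull } from "langchain/hub";
import type { ChatPromptTemplate } from "@langchain/core/prompts";
import type { StructuredToolInterface } from "@langchain/core/tools";

const llm = new ChatOpenAI({
  modelName: "gpt-4-turbo-preview",
  temperature: 0,
});

// We'll add tools next. The tools below take structured (Zod-typed) input,
// so the array is typed with the structured tool interface.
const tools: StructuredToolInterface[] = [];

// Pull a ready-made prompt for OpenAI function-calling agents from LangChain Hub
const prompt = await pull<ChatPromptTemplate>("hwchase17/openai-functions-agent");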

const agent = await createOpenAIFunctionsAgent({
  llm,
  tools,
  prompt,
});

const agentExecutor = new AgentExecutor({
  agent,
  tools,
  verbose: true,
});

The temperature: 0 bit is important. For agents, you want consistent behavior, not creativity. Temperature 0 doesn't guarantee identical output on every run, but it removes most of the variation. I learned this the hard way when a customer-facing agent gave different answers to the same question.

Building Tools for Agents

Tools are where the magic happens. Each tool is a function your agent can call. The LLM decides when and how to use them based on their descriptions.

Simple Tool Example

Here's a tool that fetches user data from our PostgreSQL database:

import { DynamicStructuredTool } from "@langchain/core/tools";
import { z } from "zod";
import { pool } from "./db";

const getUserTool = new DynamicStructuredTool({
  name: "get_user_info",
  description: "Fetches user information from the database. Use this when you need details about a specific user. Input should be the user's email address.",
  schema: z.object({
    email: z.string().email().describe("The user's email address"),
  }),
  func: async ({ email }) => {
    try {
      const result = await pool.query(
        "SELECT id, name, email, plan, created_at FROM users WHERE email = $1",
        [email]
      );
      
      if (result.rows.length === 0) {
        return "No user found with that email.";
      }
      
      return JSON.stringify(result.rows[0]);
    } catch (error) {
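      // `error` is `unknown` under strict TypeScript, so narrow it before reading .message
      const message = error instanceof Error ? error.message : String(error);
      return `Error fetching user: ${message}`;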
    }
  },
});

The description is critical. The LLM uses it to decide when to call this tool. Be specific about what it does and when to use it.

Tool That Calls External APIs

We have an agent that checks our deployment status via the GCP API:

const checkDeploymentTool = new DynamicStructuredTool({
  name: "check_deployment_status",
  description: "Checks the status of a Cloud Run service deployment. Returns the current status, latest revision, and traffic split.",
  schema: z.object({
    serviceName: z.string().describe("The name of the Cloud Run service"),
    region: z.string().default("us-central1"),
  }),
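  func: async ({ serviceName, region }) => {
    try {
      // @google-cloud/run exposes the Cloud Run Admin API v2 client as ServicesClient
      const { ServicesClient } = await import("@google-cloud/run");
      const client = new ServicesClient();

      const [service] = await client.getService({
        name: `projects/${process.env.GCP_PROJECT}/locations/${region}/services/${serviceName}`,
      });

      // v2 Service objects carry these fields at the top level, not under `status`
      return JSON.stringify({
        status: service.terminalCondition?.state,
        latestRevision: service.latestReadyRevision,
        traffic: service.trafficStatuses,
      });
    } catch (error) {
      // Return the failure as text so the agent can react instead of crashing
      const message = error instanceof Error ? error.message : String(error);
      return `Error checking deployment: ${message}`;
    }
  },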
});
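
Tools that call external services fail in more ways than a local database lookup, and repeating the same try/catch in every tool gets old quickly. One option is a small wrapper that turns thrown errors into a string the LLM can read and react to. This is a sketch: withErrorString is a hypothetical helper, not part of LangChain, and flakyStatus is a stand-in for a real API call.

```typescript
// Hypothetical helper: wraps any async tool function so that thrown errors
// are returned to the agent as a readable string instead of propagating.
function withErrorString<A>(
  label: string,
  fn: (args: A) => Promise<string>,
): (args: A) => Promise<string> {
  return async (args: A) => {
    try {
      return await fn(args);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return `Error in ${label}: ${message}`;
    }
  };
}

// Stand-in for a flaky API call
const flakyStatus = async ({ serviceName }: { serviceName: string }) => {
  if (serviceName === "missing") throw new Error("service not found");
  return JSON.stringify({ serviceName, status: "ready" });
};

// The wrapped function has the same signature, so it can be passed as a tool's func
const safeStatus = withErrorString("check_deployment_status", flakyStatus);
```

Returning the error as text lets the agent try something else or escalate, rather than having the whole run abort on one failed call.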

Handling Agent Memory

Agents need memory to maintain context across interactions. LangChain offers a few options, but I've found BufferMemory works well for most cases:

import { BufferMemory } from "langchain/memory";
import { MessagesPlaceholder } from "@langchain/core/prompts";

const memory = new BufferMemory({
  memoryKey: "chat_history",
  returnMessages: true,
});

const agentExecutor = new AgentExecutor({
  agent,
  tools,
  memory,
  verbose: true,
});

For production, you'll want to persist this to Redis or your database. Here's how I do it with Redis:

import { RedisChatMessageHistory } from "@langchain/redis";
import { createClient } from "redis";

const redisClient = createClient({
  url: process.env.REDIS_URL,
});
await redisClient.connect();

const getMemoryForSession = (sessionId: string) => {
  return new BufferMemory({
    chatHistory: new RedisChatMessageHistory({
      sessionId,
      client: redisClient,
    }),
    memoryKey: "chat_history",
    returnMessages: true,
  });
};

The gotcha here is memory management. If you're not careful, the conversation history grows unbounded and you hit token limits. I use a sliding window approach that keeps only the last 10 exchanges.
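
As a rough illustration of the sliding window idea, here is a standalone trimming function. The ChatMessage shape and trimToLastExchanges name are assumptions for this sketch, not LangChain types, and it assumes messages strictly alternate between human and AI.

```typescript
// Minimal message shape for illustration; LangChain uses BaseMessage objects.
interface ChatMessage {
  role: "human" | "ai";
  content: string;
}

// Keep only the last `maxExchanges` human/AI pairs.
// Assumes messages strictly alternate human -> ai, so one exchange = 2 messages.
function trimToLastExchanges(
  messages: ChatMessage[],
  maxExchanges = 10,
): ChatMessage[] {
  const maxMessages = maxExchanges * 2;
  return messages.length <= maxMessages
    ? messages
    : messages.slice(messages.length - maxMessages);
}

// 12 exchanges in, 10 exchanges out
const history: ChatMessage[] = [];
for (let i = 1; i <= 12; i++) {
  history.push({ role: "human", content: `question ${i}` });
  history.push({ role: "ai", content: `answer ${i}` });
}
const trimmed = trimToLastExchanges(history);
console.log(trimmed.length, trimmed[0].content); // prints: 20 question 3
```

LangChain also ships a BufferWindowMemory class in langchain/memory whose k option is meant for this kind of windowing, which may be simpler than trimming by hand. Note that a window counts messages, not tokens, so a few very long messages can still blow the limit.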

Production Gotchas and Guardrails

This is the stuff that bit me in production.

Rate Limiting and Retries

LLM APIs rate limit aggressively. Your agent might make 5-10 API calls per task. That adds up fast.

const llm = new ChatOpenAI({
  modelName: "gpt-4-turbo-preview",
  temperature: 0,
  maxRetries: 3,
  timeout: 30000,
  maxConcurrency: 5,
});
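
The maxRetries setting only covers calls made through the LLM client. Tools that hit other rate-limited APIs need their own retry handling. Below is a minimal sketch of exponential backoff with jitter. withRetry is a hypothetical helper, and flaky simulates a call that succeeds on its third attempt.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
// The delay doubles each attempt (baseMs, 2*baseMs, 4*baseMs, ...) plus random
// jitter so that many clients do not retry in lockstep.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break; // out of retries, rethrow below
      const delay = baseMs * 2 ** attempt + Math.random() * baseMs;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Stand-in for a rate-limited call that succeeds on the third attempt
let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error("429 Too Many Requests");
  return "ok";
};
```

This version retries on every error, which is too blunt for real use. In practice you would probably retry only on rate-limit and transient server errors and fail fast on everything else.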

I also wrap the entire agent execution in a timeout:

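const executeWithTimeout = async (input: string, timeoutMs: number = 60000) => {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Promise<never> keeps the race result typed as the agent's output
  const timeoutPromise = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Agent execution timeout")), timeoutMs);
  });

  try {
    // This stops waiting after timeoutMs; it does not cancel LLM calls already in flight
    return await Promise.race([agentExecutor.invoke({ input }), timeoutPromise]);
  } finally {
    // Clear the timer so it doesn't linger after the agent finishes
    clearTimeout(timer);
  }
};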

Validating Agent Outputs

Never trust agent outputs blindly. I use Zod to validate critical responses:

const PricingResponseSchema = z.object({
  plan: z.enum(["free", "pro", "enterprise"]),
  price: z.number().min(0),
  currency: z.literal("USD"),
});

const result = await agentExecutor.invoke({ input: userQuery });

try {
  const validated = PricingResponseSchema.parse(JSON.parse(result.output));
  // Safe to use
} catch (error) {
  // Log error, return fallback, alert humans
  logger.error("Agent produced invalid pricing data", { error, output: result.output });
  return fallbackResponse;
}
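
One thing to watch with this pattern: JSON.parse assumes the agent returned bare JSON, and models often wrap it in a sentence or a code fence. A tolerant extraction step before validation can help. This is a sketch under that assumption. extractJson is a hypothetical helper, and the reply text and price are made-up sample data.

```typescript
// Hypothetical helper: pulls a JSON object out of an agent reply that may
// wrap it in surrounding prose. It takes everything from the first "{" to the
// last "}" and returns null instead of throwing so callers can fall back cleanly.
function extractJson(output: string): unknown {
  const start = output.indexOf("{");
  const end = output.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(output.slice(start, end + 1));
  } catch {
    return null;
  }
}

// Made-up sample reply with prose around the JSON payload
const reply =
  'Here is the pricing: {"plan":"pro","price":49,"currency":"USD"} Anything else?';
const parsed = extractJson(reply) as { plan: string; price: number } | null;
console.log(parsed?.plan, parsed?.price); // prints: pro 49
console.log(extractJson("I'm not sure about pricing.")); // prints: null
```

The extracted value still needs to go through the schema. This step only makes parsing less brittle; it doesn't make the content trustworthy.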

Cost Monitoring

Agents are expensive. Each task might cost $0.10-$0.50 depending on complexity. Track this:

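// LangChain.js has no cost-tracking callback like Python's, as far as I know,
// so sum token usage across every LLM call the agent makes during one run.
let totalTokens = 0;

const result = await agentExecutor.invoke(
  { input },
  {
    callbacks: [
      {
        handleLLMEnd(output) {
          // ChatOpenAI reports usage on llmOutput.tokenUsage (may be missing when streaming)
          totalTokens += output.llmOutput?.tokenUsage?.totalTokens ?? 0;
        },
      },
    ],
  }
);

// Convert tokens to dollars with your model's current prices
logger.info("Agent execution usage", { totalTokens });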
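
Token counts only become a dollar figure once you multiply by your model's prices. Here is a sketch of that arithmetic. The TokenPrices shape and estimateCostUsd name are hypothetical, and the prices are placeholders rather than real rates.

```typescript
// Per-1K-token prices in USD. These numbers are placeholders, not real
// pricing; look up the current rates for your model.
interface TokenPrices {
  promptPer1K: number;
  completionPer1K: number;
}

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

function estimateCostUsd(usage: TokenUsage, prices: TokenPrices): number {
  const cost =
    (usage.promptTokens / 1000) * prices.promptPer1K +
    (usage.completionTokens / 1000) * prices.completionPer1K;
  // Round to 4 decimal places to avoid floating-point noise in logs
  return Math.round(cost * 10000) / 10000;
}

const placeholderPrices: TokenPrices = { promptPer1K: 0.01, completionPer1K: 0.03 };
const cost = estimateCostUsd(
  { promptTokens: 8000, completionTokens: 1500 },
  placeholderPrices,
);
console.log(cost); // prints: 0.125
```

Prompt and completion tokens are priced separately here because agents tend to be prompt-heavy: every loop iteration resends the history and tool results.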

Deploying Agents on Cloud Run

I run most of my agents on GCP Cloud Run because they're event-driven and don't need to be always-on. Here's the Express setup:

import express from "express";

const app = express();
app.use(express.json());

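app.post("/agent/execute", async (req, res) => {
  const { sessionId, input } = req.body;

  if (typeof sessionId !== "string" || typeof input !== "string") {
    res.status(400).json({ error: "sessionId and input must be strings" });
    return;
  }

  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    // Per-session executor so each conversation reads and writes its own history
    const executor = new AgentExecutor({
      agent,
      tools,
      memory: getMemoryForSession(sessionId),
    });

    // Race this executor (not the shared agentExecutor) against a 60s timeout,
    // otherwise the session memory above is never used
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("Agent execution timeout")), 60000);
    });
    const result = await Promise.race([executor.invoke({ input }), timeout]);

    res.json({ output: result.output });
  } catch (error) {
    logger.error("Agent execution failed", { error, sessionId, input });
    res.status(500).json({ error: "Agent execution failed" });
  } finally {
    clearTimeout(timer);
  }
});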

const port = process.env.PORT || 8080;
app.listen(port, () => {
  console.log(`Agent service running on port ${port}`);
});

The Dockerfile is straightforward:

FROM node:20-slim
WORKDIR /app
COPY package*.json ./
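# Install dev dependencies too; the build step needs them (e.g. the TypeScript compiler)
RUN npm ci
COPY . .
RUN npm run build
# Remove dev dependencies from the final image
RUN npm prune --omit=dev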
CMD ["node", "dist/index.js"]

Real-World Example: Support Agent

Here's a simplified version of the support agent I mentioned at the start:

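import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";

// searchDocs and notifySlack are app-specific helpers that aren't shown here
const tools = [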
  getUserTool,
  new DynamicStructuredTool({
    name: "search_documentation",
    description: "Searches our product documentation for relevant articles",
    schema: z.object({
      query: z.string(),
    }),
    func: async ({ query }) => {
      // In reality, this hits our vector DB with embeddings
      const results = await searchDocs(query);
      return JSON.stringify(results);
    },
  }),
  new DynamicStructuredTool({
    name: "escalate_to_human",
    description: "Escalates the conversation to a human support agent. Use when the question is too complex or involves account changes.",
    schema: z.object({
      reason: z.string(),
      priority: z.enum(["low", "medium", "high"]),
    }),
    func: async ({ reason, priority }) => {
      await notifySlack(reason, priority);
      return "Escalated to human support. They'll respond within 1 hour.";
    },
  }),
];

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful customer support agent. Be concise and accurate. If you're unsure about pricing or account changes, escalate to a human. Never make up information."],
  new MessagesPlaceholder("chat_history"),
  ["human", "{input}"],
  new MessagesPlaceholder("agent_scratchpad"),
]);

This agent handles about 60% of our support queries autonomously. The other 40% get escalated appropriately. The key is the combination of the system prompt and the escalation tool: the agent knows its limits.

Key Takeaways

Building production AI agents with LangChain requires more than just plugging in an LLM. Here's what matters:

Tool descriptions are critical—spend time crafting them precisely
Always validate agent outputs, especially for critical operations
Set hard timeouts and implement proper error handling
Monitor costs religiously—agents can get expensive fast
Use temperature 0 for more consistent behavior in production
Persist memory to Redis or a database for multi-turn conversations
Give agents an escape hatch (like the escalate_to_human tool)

The agents I've built save our team dozens of hours each week, but they took iteration to get right. Start simple, add guardrails, and expand capabilities gradually. And for the love of everything, test thoroughly before letting them talk to customers.

Tags: LangChain, AI Agents, LLMs, TypeScript, OpenAI
