Building with Anthropic's Claude API: A Dev's Guide
A practical guide to integrating Anthropic's Claude API into production apps, from stock SDK patterns to custom streaming implementations.
Last month, I rebuilt our customer support chatbot to use Anthropic's Claude instead of OpenAI's GPT-4. The decision wasn't just about model performance—it came down to how their SDK behaves in production, rate limiting quirks, and honestly, some cost considerations that made the CFO happy. Here's what I learned building real applications with Anthropic's APIs.
Why Anthropic Over Alternatives
I'm not here to start a model war, but Claude 3.5 Sonnet has some specific strengths that matter for production web apps. The context window is massive (200k tokens), which means I can dump entire codebases or long conversation histories without aggressive truncation logic. More importantly, Claude seems better at following structured output instructions—critical when you're parsing responses into TypeScript interfaces.
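That instruction-following claim pays off most when you put a small validation layer between the raw reply and your types. A minimal sketch, with made-up names (`TicketTriage`, `parseTriage`) and a hand-rolled guard standing in for a schema library like Zod:

```typescript
// Hypothetical shape we ask Claude to emit as JSON somewhere in its reply.
interface TicketTriage {
  priority: 'low' | 'medium' | 'high';
  summary: string;
}

// Pull the first JSON object out of a model reply and validate it into
// the interface. Returns null instead of throwing on malformed output.
function parseTriage(responseText: string): TicketTriage | null {
  const match = responseText.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    const data = JSON.parse(match[0]);
    if (
      (data.priority === 'low' ||
        data.priority === 'medium' ||
        data.priority === 'high') &&
      typeof data.summary === 'string'
    ) {
      return { priority: data.priority, summary: data.summary };
    }
    return null;
  } catch {
    return null;
  }
}
```

The null return forces the caller to decide what to do with off-spec output (retry, fall back, log) instead of letting a bad parse propagate.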
The gotcha? Anthropic's stock SDK doesn't have as many community libraries and integrations as OpenAI's ecosystem. You'll be writing more glue code yourself. In practice, this bit me when trying to integrate with LangChain—the Anthropic adapters lagged behind in feature parity.
Getting Started with the SDK
Installation is straightforward. I'm using TypeScript for everything because type safety with AI responses saves debugging time later.
npm install @anthropic-ai/sdk
# or
pnpm add @anthropic-ai/sdk
Here's the stock implementation pattern I use in most projects:
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

async function generateResponse(userMessage: string) {
  try {
    const message = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      messages: [
        {
          role: 'user',
          content: userMessage,
        },
      ],
    });

    return message.content[0].type === 'text'
      ? message.content[0].text
      : null;
  } catch (error) {
    console.error('Anthropic API error:', error);
    throw error;
  }
}
This works, but it's the happy path. In production, you need to handle rate limits, streaming, and conversation context.
Handling Streaming Responses
For any user-facing chat interface, streaming is non-negotiable. Users expect to see responses appear word-by-word, not wait 10 seconds for a wall of text. The stock streaming API from Anthropic is actually cleaner than OpenAI's in my opinion.
Basic Streaming Implementation
async function streamResponse(userMessage: string) {
  const stream = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: userMessage }],
    stream: true,
  });

  for await (const chunk of stream) {
    if (
      chunk.type === 'content_block_delta' &&
      chunk.delta.type === 'text_delta'
    ) {
      process.stdout.write(chunk.delta.text);
    }
  }
}
Streaming to Next.js API Routes
Here's where it gets interesting. In a Next.js 14 app with the App Router, a route handler can return a web ReadableStream directly. This pattern sends chunks to the client as they arrive:
// app/api/chat/route.ts
import { NextRequest } from 'next/server';
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

export async function POST(req: NextRequest) {
  const { message } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      try {
        const apiStream = await anthropic.messages.create({
          model: 'claude-3-5-sonnet-20241022',
          max_tokens: 2048,
          messages: [{ role: 'user', content: message }],
          stream: true,
        });

        for await (const chunk of apiStream) {
          if (
            chunk.type === 'content_block_delta' &&
            chunk.delta.type === 'text_delta'
          ) {
            controller.enqueue(
              encoder.encode(`data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`)
            );
          }
        }

        controller.close();
      } catch (error) {
        controller.error(error);
      }
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
}
This bit me initially: make sure you're sending Server-Sent Events (SSE) format with the data: prefix and double newlines. The stock EventSource API on the client expects this exact format.
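On the client side, the browser's stock EventSource only does GET requests, so a POST route like this one needs a fetch-based reader instead. A sketch, with helper names of my own (`parseSSE`, `readChat`):

```typescript
// Minimal SSE frame parser: feed it the accumulated text buffer, get
// back the complete events plus the unconsumed remainder.
function parseSSE(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  let rest = buffer;
  let idx: number;
  while ((idx = rest.indexOf('\n\n')) !== -1) {
    const frame = rest.slice(0, idx);
    rest = rest.slice(idx + 2);
    for (const line of frame.split('\n')) {
      if (line.startsWith('data: ')) events.push(line.slice(6));
    }
  }
  return { events, rest };
}

// Browser-side consumption with fetch and a stream reader. Partial
// frames are carried over in `buffer` until their closing newline pair
// arrives in a later chunk.
async function readChat(message: string, onText: (t: string) => void) {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = parseSSE(buffer);
    buffer = rest;
    for (const e of events) onText(JSON.parse(e).text);
  }
}
```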
Managing Conversation Context
Claude doesn't maintain conversation state for you. Every API call is stateless. You need to send the entire conversation history with each request. Here's how I structure this in a typical Express or NestJS backend:
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

class ConversationManager {
  private conversations = new Map<string, Message[]>();

  async addMessage(
    conversationId: string,
    role: 'user' | 'assistant',
    content: string
  ) {
    const messages = this.conversations.get(conversationId) || [];
    messages.push({ role, content });

    // Keep only the last 20 messages to avoid hitting token limits.
    // An even cap matters: histories should still start with a user turn.
    if (messages.length > 20) {
      messages.splice(0, messages.length - 20);
    }

    this.conversations.set(conversationId, messages);
  }

  async getClaude(
    conversationId: string,
    userMessage: string
  ): Promise<string> {
    await this.addMessage(conversationId, 'user', userMessage);
    const messages = this.conversations.get(conversationId)!;

    const response = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      messages,
    });

    const assistantMessage = response.content[0].type === 'text'
      ? response.content[0].text
      : '';

    await this.addMessage(conversationId, 'assistant', assistantMessage);
    return assistantMessage;
  }
}
In production, I store this in Redis with a TTL. The in-memory Map is fine for development but won't scale across multiple Node processes.
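Here's one sketch of what the Redis version can look like: a list per conversation, trimmed with LTRIM and expired with a TTL. The interface, key scheme, and class name are my own inventions; in production the `RedisLike` slot is filled by an ioredis client, while tests can use an in-memory stub:

```typescript
type Role = 'user' | 'assistant';
interface Message { role: Role; content: string; }

// The slice of Redis commands the store needs. Keeping it as an
// interface means no hard dependency on a particular client library.
interface RedisLike {
  rpush(key: string, value: string): Promise<number>;
  ltrim(key: string, start: number, stop: number): Promise<unknown>;
  expire(key: string, seconds: number): Promise<unknown>;
  lrange(key: string, start: number, stop: number): Promise<string[]>;
}

class RedisConversationStore {
  constructor(
    private redis: RedisLike,
    private maxMessages = 20,
    private ttlSeconds = 60 * 60 * 24, // drop idle conversations after a day
  ) {}

  private key(conversationId: string) {
    return `chat:history:${conversationId}`;
  }

  async append(conversationId: string, role: Role, content: string) {
    const key = this.key(conversationId);
    await this.redis.rpush(key, JSON.stringify({ role, content }));
    // Keep only the newest N entries so the list never grows unbounded.
    await this.redis.ltrim(key, -this.maxMessages, -1);
    await this.redis.expire(key, this.ttlSeconds);
  }

  async load(conversationId: string): Promise<Message[]> {
    const raw = await this.redis.lrange(this.key(conversationId), 0, -1);
    return raw.map((entry) => JSON.parse(entry) as Message);
  }
}
```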
Cost Optimization Strategies
This is where Anthropic's stock pricing model matters. Claude 3.5 Sonnet costs $3 per million input tokens and $15 per million output tokens. When you're sending full conversation histories with every request, those input tokens add up fast.
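Those two rates make per-request cost easy to log. A back-of-envelope helper (prices hardcoded as of this writing; check Anthropic's pricing page before trusting them). The real token counts come back on every response under `usage.input_tokens` and `usage.output_tokens`:

```typescript
// Claude 3.5 Sonnet list prices in USD per million tokens, at time of
// writing. Treat these constants as assumptions, not a source of truth.
const INPUT_PER_MTOK = 3;
const OUTPUT_PER_MTOK = 15;

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PER_MTOK +
    (outputTokens / 1_000_000) * OUTPUT_PER_MTOK
  );
}
```

Wiring this into your request logging is what surfaces runaway loops before they surface on the invoice.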
Token Counting and Budgeting
Anthropic doesn't ship a local tokenizer in their SDK (annoying; there is a server-side token-counting endpoint, but that costs an extra round trip). I use the tiktoken library, which is tuned for OpenAI models but close enough for estimation:
import { encoding_for_model } from 'tiktoken';

function estimateTokens(text: string): number {
  const encoder = encoding_for_model('gpt-4');
  const tokens = encoder.encode(text);
  encoder.free();
  return tokens.length;
}

function truncateConversation(
  messages: Message[],
  maxTokens: number = 10000
): Message[] {
  let totalTokens = 0;
  const truncated: Message[] = [];

  // Iterate from newest to oldest
  for (let i = messages.length - 1; i >= 0; i--) {
    const messageTokens = estimateTokens(messages[i].content);
    if (totalTokens + messageTokens > maxTokens) break;
    truncated.unshift(messages[i]);
    totalTokens += messageTokens;
  }

  return truncated;
}
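A quick sanity check of that newest-to-oldest walk, with a chars-per-token heuristic standing in for tiktoken so it runs without the dependency (the helper names here are demo-only):

```typescript
interface Message { role: 'user' | 'assistant'; content: string; }

// Rough stand-in for a real tokenizer: ~4 characters per token.
const roughTokens = (text: string) => Math.ceil(text.length / 4);

// Same newest-to-oldest walk as the article's version, parameterized
// over the estimator so any tokenizer can be plugged in.
function truncate(
  messages: Message[],
  maxTokens: number,
  estimate: (text: string) => number = roughTokens,
): Message[] {
  let total = 0;
  const kept: Message[] = [];
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimate(messages[i].content);
    if (total + cost > maxTokens) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}
```

The important property: when the budget runs out, it's the oldest messages that get dropped, never the most recent turn.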
Error Handling and Retries
The stock Anthropic SDK throws errors, but the error types aren't always clear. I wrap API calls with custom retry logic because rate limits happen, especially during traffic spikes.
async function callClaudeWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3
): Promise<T> {
  let lastError: Error;

  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error;

      // Don't retry on authentication errors
      if (error.status === 401 || error.status === 403) {
        throw error;
      }

      // Exponential backoff for rate limits
      if (error.status === 429) {
        const delay = Math.pow(2, i) * 1000;
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }

      // Retry on 5xx errors
      if (error.status >= 500) {
        await new Promise(resolve => setTimeout(resolve, 1000));
        continue;
      }

      throw error;
    }
  }

  throw lastError!;
}
Production Deployment Considerations
I run most of my AI services on GCP Cloud Run because the auto-scaling just works and cold starts aren't as painful as Lambda for Node.js. Here's what matters:
Environment Variables
Never hardcode API keys. Use Google Secret Manager or AWS Systems Manager Parameter Store. In Cloud Run:
gcloud run deploy claude-api \
  --image gcr.io/project/claude-api \
  --set-secrets ANTHROPIC_API_KEY=anthropic-key:latest \
  --timeout 300s \
  --memory 1Gi
Timeout Configuration
Claude can take 20-30 seconds for complex requests. Make sure your serverless function timeout is configured appropriately. Cloud Run's default request timeout is 300s, which is usually enough, but I set it explicitly with --timeout so my AI endpoints don't depend on a platform default.
Lessons Learned
After shipping three production features using Anthropic's APIs, here's what I wish I knew upfront:
- Prompt caching is your friend: Anthropic recently released prompt caching. If you're sending the same system prompt or context repeatedly, enable it. Cut my costs by 40%.
- Monitor token usage religiously: Set up CloudWatch or GCP Monitoring alerts. We got a surprise $800 bill one month from a runaway loop sending massive contexts.
- The stock SDK is stable: Unlike some AI libraries that break APIs every week, Anthropic has been consistent. Upgrading hasn't broken anything yet.
- Type safety matters: Claude's responses can be unpredictable. Using Zod or TypeScript strict parsing on outputs saves debugging time.
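On the prompt-caching point: it's opt-in via `cache_control` markers on the content blocks you want cached. A sketch of the request shape, with a placeholder system prompt (double-check the current API docs, since caching details have been evolving):

```typescript
// A large, stable system prompt is the ideal caching candidate: mark it
// ephemeral and Anthropic caches that prefix across requests.
const SYSTEM_PROMPT = 'You are a support agent for ExampleCo...'; // placeholder

// Request body for a cached system prompt, built as a plain object so
// this sketch doesn't depend on SDK type exports.
const params = {
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  system: [
    {
      type: 'text' as const,
      text: SYSTEM_PROMPT,
      cache_control: { type: 'ephemeral' as const },
    },
  ],
  messages: [{ role: 'user' as const, content: 'Where is my order?' }],
};

// Then: const res = await anthropic.messages.create(params);
// The response usage block reports cache reads vs. cache writes, which
// is how you verify the 40% savings actually materialize.
```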
The key to using any AI API in production isn't just calling it correctly—it's handling the 99% of cases where something goes slightly wrong.
Conclusion
Building with Anthropic's Claude API shares a lot of patterns with other LLM providers, but the stock implementation details matter. The SDK is clean and TypeScript-friendly, streaming works well with modern frameworks like Next.js, and the cost model is competitive if you optimize token usage.
My recommendation? Start with the stock SDK patterns I've shown here, add retry logic and proper error handling from day one, and monitor token usage closely. The code I've shared is running in production handling thousands of requests daily. It's not perfect, but it's battle-tested.
The AI space moves fast. What works today might need adjustment in six months. But these foundational patterns—streaming responses, conversation management, cost optimization—will remain relevant regardless of which model or provider you're using.