Building with Anthropic's Claude API: A Dev's Guide
A practical guide to integrating Anthropic's Claude API into production apps, from stock SDK patterns to custom streaming implementations.
Last month, I rebuilt our customer support chatbot to use Anthropic's Claude instead of OpenAI's GPT-4. The decision wasn't just about model performance—it came down to how their SDK behaves in production, rate limiting quirks, and honestly, some cost considerations that made the CFO happy. Here's what I learned building real applications with Anthropic's APIs.
Why Anthropic Over Alternatives
I'm not here to start a model war, but Claude 3.5 Sonnet has some specific strengths that matter for production web apps. The context window is massive (200k tokens), which means I can dump entire codebases or long conversation histories without aggressive truncation logic. More importantly, Claude seems better at following structured output instructions—critical when you're parsing responses into TypeScript interfaces.
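That instruction-following claim pays off most when you put a small validation layer between the raw reply and your types. A minimal sketch, with made-up names (`TicketTriage`, `parseTriage`) and a hand-rolled guard standing in for a schema library like Zod:

```typescript
// Hypothetical shape we ask Claude to emit as JSON somewhere in its reply.
interface TicketTriage {
  priority: 'low' | 'medium' | 'high';
  summary: string;
}

// Pull the first JSON object out of a model reply and validate it into
// the interface. Returns null instead of throwing on malformed output.
function parseTriage(responseText: string): TicketTriage | null {
  const match = responseText.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    const data = JSON.parse(match[0]);
    if (
      (data.priority === 'low' ||
        data.priority === 'medium' ||
        data.priority === 'high') &&
      typeof data.summary === 'string'
    ) {
      return { priority: data.priority, summary: data.summary };
    }
    return null;
  } catch {
    return null;
  }
}
```

The null return forces the caller to decide what to do with off-spec output (retry, fall back, log) instead of letting a bad parse propagate.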
The gotcha? Anthropic's stock SDK doesn't have as many community libraries and integrations as OpenAI's ecosystem. You'll be writing more glue code yourself. In practice, this bit me when trying to integrate with LangChain—the Anthropic adapters lagged behind in feature parity.
Getting Started with the SDK
Installation is straightforward. I'm using TypeScript for everything because type safety with AI responses saves debugging time later.
npm install @anthropic-ai/sdk
# or
pnpm add @anthropic-ai/sdk
Here's the stock implementation pattern I use in most projects:
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

async function generateResponse(userMessage: string) {
  try {
    const message = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      messages: [
        {
          role: 'user',
          content: userMessage,
        },
      ],
    });

    return message.content[0].type === 'text'
      ? message.content[0].text
      : null;
  } catch (error) {
    console.error('Anthropic API error:', error);
    throw error;
  }
}
This works, but it's the happy path. In production, you need to handle rate limits, streaming, and conversation context.
Handling Streaming Responses
For any user-facing chat interface, streaming is non-negotiable. Users expect to see responses appear word-by-word, not wait 10 seconds for a wall of text. The stock streaming API from Anthropic is actually cleaner than OpenAI's in my opinion.
Basic Streaming Implementation
async function streamResponse(userMessage: string) {
  const stream = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: userMessage }],
    stream: true,
  });

  for await (const chunk of stream) {
    if (
      chunk.type === 'content_block_delta' &&
      chunk.delta.type === 'text_delta'
    ) {
      process.stdout.write(chunk.delta.text);
    }
  }
}
Streaming to Next.js API Routes
Here's where it gets interesting. In a Next.js 14 app with the App Router, a route handler can return a web ReadableStream directly. This pattern sends chunks to the client as they arrive:
// app/api/chat/route.ts
import { NextRequest } from 'next/server';
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY!,
});

export async function POST(req: NextRequest) {
  const { message } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      try {
        const apiStream = await anthropic.messages.create({
          model: 'claude-3-5-sonnet-20241022',
          max_tokens: 2048,
          messages: [{ role: 'user', content: message }],
          stream: true,
        });

        for await (const chunk of apiStream) {
          if (
            chunk.type === 'content_block_delta' &&
            chunk.delta.type === 'text_delta'
          ) {
            controller.enqueue(
              encoder.encode(`data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`)
            );
          }
        }

        controller.close();
      } catch (error) {
        controller.error(error);
      }
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
}
This bit me initially: make sure you're sending Server-Sent Events (SSE) format with the data: prefix and double newlines. The stock EventSource API on the client expects this exact format.
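On the client side, the browser's stock EventSource only does GET requests, so a POST route like this one needs a fetch-based reader instead. A sketch, with helper names of my own (`parseSSE`, `readChat`):

```typescript
// Minimal SSE frame parser: feed it the accumulated text buffer, get
// back the complete events plus the unconsumed remainder.
function parseSSE(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  let rest = buffer;
  let idx: number;
  while ((idx = rest.indexOf('\n\n')) !== -1) {
    const frame = rest.slice(0, idx);
    rest = rest.slice(idx + 2);
    for (const line of frame.split('\n')) {
      if (line.startsWith('data: ')) events.push(line.slice(6));
    }
  }
  return { events, rest };
}

// Browser-side consumption with fetch and a stream reader. Partial
// frames are carried over in `buffer` until their closing newline pair
// arrives in a later chunk.
async function readChat(message: string, onText: (t: string) => void) {
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = parseSSE(buffer);
    buffer = rest;
    for (const e of events) onText(JSON.parse(e).text);
  }
}
```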
Managing Conversation Context
Claude doesn't maintain conversation state for you. Every API call is stateless. You need to send the entire conversation history with each request. Here's how I structure this in a typical Express or NestJS backend:
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

class ConversationManager {
  private conversations = new Map<string, Message[]>();

  async addMessage(
    conversationId: string,
    role: 'user' | 'assistant',
    content: string
  ) {
    const messages = this.conversations.get(conversationId) || [];
    messages.push({ role, content });

    // Keep only the last 20 messages to avoid hitting token limits.
    // An even cap matters: histories should still start with a user turn.
    if (messages.length > 20) {
      messages.splice(0, messages.length - 20);
    }

    this.conversations.set(conversationId, messages);
  }

  async getClaude(
    conversationId: string,
    userMessage: string
  ): Promise<string> {
    await this.addMessage(conversationId, 'user', userMessage);
    const messages = this.conversations.get(conversationId)!;

    const response = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      messages,
    });

    const assistantMessage = response.content[0].type === 'text'
      ? response.content[0].text
      : '';

    await this.addMessage(conversationId, 'assistant', assistantMessage);
    return assistantMessage;
  }
}
In production, I store this in Redis with a TTL. The in-memory Map is fine for development but won't scale across multiple Node processes.
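Here's one sketch of what the Redis version can look like: a list per conversation, trimmed with LTRIM and expired with a TTL. The interface, key scheme, and class name are my own inventions; in production the `RedisLike` slot is filled by an ioredis client, while tests can use an in-memory stub:

```typescript
type Role = 'user' | 'assistant';
interface Message { role: Role; content: string; }

// The slice of Redis commands the store needs. Keeping it as an
// interface means no hard dependency on a particular client library.
interface RedisLike {
  rpush(key: string, value: string): Promise<number>;
  ltrim(key: string, start: number, stop: number): Promise<unknown>;
  expire(key: string, seconds: number): Promise<unknown>;
  lrange(key: string, start: number, stop: number): Promise<string[]>;
}

class RedisConversationStore {
  constructor(
    private redis: RedisLike,
    private maxMessages = 20,
    private ttlSeconds = 60 * 60 * 24, // drop idle conversations after a day
  ) {}

  private key(conversationId: string) {
    return `chat:history:${conversationId}`;
  }

  async append(conversationId: string, role: Role, content: string) {
    const key = this.key(conversationId);
    await this.redis.rpush(key, JSON.stringify({ role, content }));
    // Keep only the newest N entries so the list never grows unbounded.
    await this.redis.ltrim(key, -this.maxMessages, -1);
    await this.redis.expire(key, this.ttlSeconds);
  }

  async load(conversationId: string): Promise<Message[]> {
    const raw = await this.redis.lrange(this.key(conversationId), 0, -1);
    return raw.map((entry) => JSON.parse(entry) as Message);
  }
}
```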
Cost Optimization Strategies
This is where Anthropic's stock pricing model matters. Claude 3.5 Sonnet costs $3 per million input tokens and $15 per million output tokens. When you're sending full conversation histories with every request, those input tokens add up fast.
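Those two rates make per-request cost easy to log. A back-of-envelope helper (prices hardcoded as of this writing; check Anthropic's pricing page before trusting them). The real token counts come back on every response under `usage.input_tokens` and `usage.output_tokens`:

```typescript
// Claude 3.5 Sonnet list prices in USD per million tokens, at time of
// writing. Treat these constants as assumptions, not a source of truth.
const INPUT_PER_MTOK = 3;
const OUTPUT_PER_MTOK = 15;

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PER_MTOK +
    (outputTokens / 1_000_000) * OUTPUT_PER_MTOK
  );
}
```

Wiring this into your request logging is what surfaces runaway loops before they surface on the invoice.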
Token Counting and Budgeting
Anthropic doesn't ship a local tokenizer in their SDK (annoying; there is a server-side token-counting endpoint, but that costs an extra round trip). I use the tiktoken library, which is tuned for OpenAI models but close enough for estimation:
import { encoding_for_model } from 'tiktoken';

function estimateTokens(text: string): number {
  const encoder = encoding_for_model('gpt-4');
  const tokens = encoder.encode(text);
  encoder.free();
  return tokens.length;
}

function truncateConversation(
  messages: Message[],
  maxTokens: number = 10000
): Message[] {
  let totalTokens = 0;
  const truncated: Message[] = [];

  // Iterate from newest to oldest
  for (let i = messages.length - 1; i >= 0; i--) {
    const messageTokens = estimateTokens(messages[i].content);
    if (totalTokens + messageTokens > maxTokens) break;
    truncated.unshift(messages[i]);
    totalTokens += messageTokens;
  }

  return truncated;
}
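A quick sanity check of that newest-to-oldest walk, with a chars-per-token heuristic standing in for tiktoken so it runs without the dependency (the helper names here are demo-only):

```typescript
interface Message { role: 'user' | 'assistant'; content: string; }

// Rough stand-in for a real tokenizer: ~4 characters per token.
const roughTokens = (text: string) => Math.ceil(text.length / 4);

// Same newest-to-oldest walk as the article's version, parameterized
// over the estimator so any tokenizer can be plugged in.
function truncate(
  messages: Message[],
  maxTokens: number,
  estimate: (text: string) => number = roughTokens,
): Message[] {
  let total = 0;
  const kept: Message[] = [];
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimate(messages[i].content);
    if (total + cost > maxTokens) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}
```

The important property: when the budget runs out, it's the oldest messages that get dropped, never the most recent turn.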
Error Handling and Retries
The stock Anthropic SDK throws errors, but the error types aren't always clear. I wrap API calls with custom retry logic because rate limits happen, especially during traffic spikes.
async function callClaudeWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3
): Promise<T> {
  let lastError: Error;

  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error;

      // Don't retry on authentication errors
      if (error.status === 401 || error.status === 403) {
        throw error;
      }

      // Exponential backoff for rate limits
      if (error.status === 429) {
        const delay = Math.pow(2, i) * 1000;
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }

      // Retry on 5xx errors
      if (error.status >= 500) {
        await new Promise(resolve => setTimeout(resolve, 1000));
        continue;
      }

      throw error;
    }
  }

  throw lastError!;
}
Production Deployment Considerations
I run most of my AI services on GCP Cloud Run because the auto-scaling just works and cold starts aren't as painful as Lambda for Node.js. Here's what matters:
Environment Variables
Never hardcode API keys. Use Google Secret Manager or AWS Systems Manager Parameter Store. In Cloud Run:
gcloud run deploy claude-api \
  --image gcr.io/project/claude-api \
  --set-secrets ANTHROPIC_API_KEY=anthropic-key:latest \
  --timeout 300s \
  --memory 1Gi
Timeout Configuration
Claude can take 20-30 seconds for complex requests. Make sure your serverless function timeout is configured appropriately. Cloud Run's default request timeout is 300s, which is usually enough, but I set it explicitly with --timeout so my AI endpoints don't depend on a platform default.
Lessons Learned
After shipping three production features using Anthropic's APIs, here's what I wish I knew upfront:
- Prompt caching is your friend: Anthropic recently released prompt caching. If you're sending the same system prompt or context repeatedly, enable it. Cut my costs by 40%.
- Monitor token usage religiously: Set up CloudWatch or GCP Monitoring alerts. We got a surprise $800 bill one month from a runaway loop sending massive contexts.
- The stock SDK is stable: Unlike some AI libraries that break APIs every week, Anthropic has been consistent. Upgrading hasn't broken anything yet.
- Type safety matters: Claude's responses can be unpredictable. Using Zod or TypeScript strict parsing on outputs saves debugging time.
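On the prompt-caching point: it's opt-in via `cache_control` markers on the content blocks you want cached. A sketch of the request shape, with a placeholder system prompt (double-check the current API docs, since caching details have been evolving):

```typescript
// A large, stable system prompt is the ideal caching candidate: mark it
// ephemeral and Anthropic caches that prefix across requests.
const SYSTEM_PROMPT = 'You are a support agent for ExampleCo...'; // placeholder

// Request body for a cached system prompt, built as a plain object so
// this sketch doesn't depend on SDK type exports.
const params = {
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024,
  system: [
    {
      type: 'text' as const,
      text: SYSTEM_PROMPT,
      cache_control: { type: 'ephemeral' as const },
    },
  ],
  messages: [{ role: 'user' as const, content: 'Where is my order?' }],
};

// Then: const res = await anthropic.messages.create(params);
// The response usage block reports cache reads vs. cache writes, which
// is how you verify the 40% savings actually materialize.
```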
The key to using any AI API in production isn't just calling it correctly—it's handling the 99% of cases where something goes slightly wrong.
Conclusion
Building with Anthropic's Claude API shares a lot of patterns with other LLM providers, but the stock implementation details matter. The SDK is clean and TypeScript-friendly, streaming works well with modern frameworks like Next.js, and the cost model is competitive if you optimize token usage.
My recommendation? Start with the stock SDK patterns I've shown here, add retry logic and proper error handling from day one, and monitor token usage closely. The code I've shared is running in production handling thousands of requests daily. It's not perfect, but it's battle-tested.
The AI space moves fast. What works today might need adjustment in six months. But these foundational patterns—streaming responses, conversation management, cost optimization—will remain relevant regardless of which model or provider you're using.