AI & ML · 8 min read

Prompt Engineering Techniques for Software Developers

Practical prompt engineering strategies I use daily when building LLM-powered features in production web applications.

Jay Salot

Senior Full Stack AI Engineer

June 22, 2026 · 8 min read

Last year, I shipped a customer support chatbot that answered questions about our API documentation. The first version was embarrassingly bad—it hallucinated endpoints, mixed up parameters, and occasionally told users to "just read the docs" when they were literally asking the docs. The problem wasn't the model. It was my prompts.

Prompt engineering isn't some mystical art. It's a skill like writing good API contracts or database queries. You need to be precise, understand the constraints, and test your assumptions. Here's what I've learned building AI features into production JavaScript applications.

Why Prompt Engineering Matters for Developers

When you're integrating OpenAI, Claude, or any LLM into your Node.js backend, the prompt is your interface. Just like you wouldn't write sloppy SQL queries or vague API requests, you can't be vague with prompts and expect consistent results.

I've seen developers treat prompts as an afterthought—concatenating strings together, throwing user input directly into the context, and wondering why their LangChain pipeline produces garbage. In practice, prompt quality directly impacts:

Response accuracy and relevance
Token usage (which equals money at scale)
Latency—better prompts often need fewer retries
The mental overhead of handling edge cases

The gotcha here is that prompts are code. They need versioning, testing, and iteration just like your TypeScript functions.

Structure Your Prompts Like Function Contracts

The single biggest improvement to my prompt quality came from thinking about them like function signatures. You have inputs, expected outputs, and constraints.

Here's a prompt I used early on:

const prompt = `Answer this question: ${userQuestion}`;

Terrible. No context, no output format, no guardrails. Here's the structured version:

const prompt = `You are a technical support assistant for our REST API.

Your knowledge base:
${relevantDocs}

User question: ${userQuestion}

Provide a JSON response with this exact structure:
{
  "answer": "your detailed answer here",
  "relatedEndpoints": ["array of relevant API endpoints"],
  "confidence": "high|medium|low"
}

If you cannot answer based on the provided documentation, set confidence to "low" and explain what information is missing.`;

Notice the specificity. I tell it what role to play, provide scoped context, define the exact output format (JSON makes parsing trivial), and handle the uncertainty case explicitly.
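
One way to keep that template in a single place is a small builder function. This is a minimal sketch, not the code from the original project: `buildSupportPrompt` and the `PromptInputs` type are names made up for illustration.

```typescript
// Hypothetical helper: assembles the structured support prompt from its parts.
interface PromptInputs {
  relevantDocs: string; // documentation excerpts used to ground the answer
  userQuestion: string; // the raw question from the user
}

function buildSupportPrompt({ relevantDocs, userQuestion }: PromptInputs): string {
  return [
    'You are a technical support assistant for our REST API.',
    '',
    'Your knowledge base:',
    relevantDocs,
    '',
    `User question: ${userQuestion}`,
    '',
    'Provide a JSON response with this exact structure:',
    '{',
    '  "answer": "your detailed answer here",',
    '  "relatedEndpoints": ["array of relevant API endpoints"],',
    '  "confidence": "high|medium|low"',
    '}',
    '',
    'If you cannot answer based on the provided documentation, set confidence to "low" and explain what information is missing.',
  ].join('\n');
}
```

Keeping role, context, question, and output format as separate lines makes each part easy to change and diff on its own.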

Defining Output Formats

Requesting structured output—especially JSON—is crucial when you're building real features. I use TypeScript interfaces to define what I expect back:

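import OpenAI from 'openai';

// Reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

interface SupportResponse {
  answer: string;
  relatedEndpoints: string[];
  confidence: 'high' | 'medium' | 'low';
}

async function askSupport(question: string): Promise<SupportResponse> {
  // buildPrompt is assumed to wrap the structured template shown earlier.
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: buildPrompt(question) }],
    temperature: 0.3, // low temperature keeps the output format stable
  });

  // content is typed as string | null, so guard before parsing.
  const content = response.choices[0]?.message.content;
  if (!content) {
    throw new Error('Model returned an empty response');
  }

  // JSON.parse can still throw on malformed output; callers should catch.
  return JSON.parse(content) as SupportResponse;
}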

Yes, the model can still return malformed JSON. I wrap this in try-catch and have a fallback prompt that's even more explicit about JSON formatting when parsing fails.
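
A sketch of what the defensive parsing step might look like. The function names `isSupportResponse` and `parseSupportResponse` are illustrative assumptions, and the retry with a stricter prompt is left to the caller because it depends on your client code.

```typescript
interface SupportResponse {
  answer: string;
  relatedEndpoints: string[];
  confidence: 'high' | 'medium' | 'low';
}

// Type guard: checks that the parsed value has the shape the prompt asked for.
function isSupportResponse(value: unknown): value is SupportResponse {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.answer === 'string' &&
    Array.isArray(v.relatedEndpoints) &&
    v.relatedEndpoints.every((e) => typeof e === 'string') &&
    (v.confidence === 'high' || v.confidence === 'medium' || v.confidence === 'low')
  );
}

// Returns the parsed response, or null when the text is not valid JSON
// or does not match the expected shape.
function parseSupportResponse(raw: string): SupportResponse | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isSupportResponse(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

A `null` result is the signal to re-ask with the more explicit fallback prompt. Checking the shape as well as the syntax catches the case where the model returns valid JSON with a missing or misspelled field.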

Few-Shot Prompting with Real Examples

Few-shot prompting means giving the model examples of what you want. This technique has saved me countless times when dealing with domain-specific tasks.

I had to build a feature that extracted structured data from unstructured user feedback. Zero-shot prompting (just asking without examples) gave inconsistent results. Adding three examples made it rock-solid:

const fewShotPrompt = `Extract structured feedback from user comments.

Example 1:
Input: "The dashboard is slow and the export button doesn't work"
Output: {"issues": [{"area": "performance", "severity": "medium"}, {"area": "export-feature", "severity": "high"}]}

Example 2:
Input: "Love the new UI! But I can't figure out how to filter by date"
Output: {"issues": [{"area": "filtering", "severity": "low"}], "positive": ["ui-redesign"]}

Example 3:
Input: "Everything works great"
Output: {"issues": [], "positive": ["general"]}

Now process this feedback:
Input: "${userFeedback}"
Output:`;

The model learns the pattern from examples. The key is making your examples cover different scenarios—positive feedback, multiple issues, edge cases.
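
If the examples live in data rather than in a hand-written string, swapping or reordering them gets easier. A minimal sketch under that assumption; `FewShotExample` and `buildFewShotPrompt` are hypothetical names, and inputs are serialized with `JSON.stringify` so quotes inside user text are escaped.

```typescript
interface FewShotExample {
  input: string;
  output: unknown; // serialized with JSON.stringify when the prompt is built
}

// Builds a few-shot prompt: instruction, numbered examples, then the new input.
function buildFewShotPrompt(
  instruction: string,
  examples: FewShotExample[],
  input: string
): string {
  const shots = examples.map(
    (ex, i) =>
      `Example ${i + 1}:\nInput: ${JSON.stringify(ex.input)}\nOutput: ${JSON.stringify(ex.output)}`
  );
  const task = `Now process this feedback:\nInput: ${JSON.stringify(input)}\nOutput:`;
  return [instruction, ...shots, task].join('\n\n');
}
```

Ending the prompt with `Output:` mirrors the examples, which nudges the model to continue with JSON in the same shape.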

When Few-Shot Becomes Too Much

I've made the mistake of including 10+ examples in a single prompt. You hit diminishing returns and waste tokens. In my testing, 3-5 well-chosen examples work best for most classification and extraction tasks.

If you need more examples, consider fine-tuning instead. I fine-tuned a GPT-3.5 model for a specific categorization task with 500 examples, and it outperformed GPT-4 with few-shot prompting while being faster and cheaper.

Chain-of-Thought for Complex Reasoning

Chain-of-thought (CoT) prompting asks the model to show its work before giving an answer. This sounds fluffy but genuinely improves accuracy for anything involving logic or multi-step reasoning.

I use this for a code review assistant that analyzes TypeScript PRs:

const reviewPrompt = `You are reviewing this TypeScript code change:

${codeDiff}

Before giving your verdict, think through:
1. What is this code trying to accomplish?
2. Are there any potential runtime errors?
3. Are there performance concerns?
4. Does it follow our style guide: ${styleGuideSnippet}?

Think step-by-step, then provide your final assessment as:
- APPROVE
- REQUEST_CHANGES with specific issues
- COMMENT with suggestions`;

Explicitly asking for step-by-step thinking improves the quality of the final verdict. The model is less likely to jump to conclusions or miss subtle issues.
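
Because the reasoning comes first, the verdict has to be pulled out of free text afterwards. One possible sketch, assuming the model uses the three keywords from the prompt verbatim; `extractVerdict` is a made-up helper name.

```typescript
type Verdict = 'APPROVE' | 'REQUEST_CHANGES' | 'COMMENT';

// Returns the last verdict keyword in the response, since the step-by-step
// reasoning comes first and the final assessment comes at the end.
function extractVerdict(response: string): Verdict | null {
  const matches = response.match(/\b(APPROVE|REQUEST_CHANGES|COMMENT)\b/g);
  if (!matches) return null;
  return matches[matches.length - 1] as Verdict;
}
```

Taking the last match rather than the first avoids misreading a keyword that the model mentions while reasoning. A `null` result means the response needs a retry or a manual look.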

Zero-Shot CoT

Honestly, sometimes just adding "Let's think step by step" to your prompt works. It's almost silly how effective this is:

const prompt = `${complexQuestion}

Let's think step by step:`;

I use this for debugging help. When I paste an error message and ask for solutions, adding that phrase gives more thorough, logical explanations instead of immediate guesses.

Role Prompting and System Messages

With chat-based models (GPT-4, Claude), you have system messages and user messages. System messages set the context and behavior. Use them.

const messages = [
  {
    role: 'system',
    content: 'You are a senior TypeScript developer with expertise in Next.js and React. You provide concise, production-ready code examples. You never suggest deprecated APIs or unsafe patterns.'
  },
  {
    role: 'user',
    content: userQuestion
  }
];

The system message is where I define the persona, constraints, and tone. It's separate from the user input, which means I can change system prompts without affecting the user-facing interface.

One gotcha: some developers stuff everything into the system message. I've found that works well for behavior and tone, but actual task instructions often work better in the user message. Test both.
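
To compare the two placements, it helps to generate both message layouts from the same inputs. A small sketch; the function names are hypothetical, and the role-based message shape follows the OpenAI chat format shown above (other providers may pass the system prompt differently).

```typescript
type ChatMessage = { role: 'system' | 'user'; content: string };

// Variant A: persona in the system message, task instructions in the user message.
function taskInUserMessage(persona: string, task: string, question: string): ChatMessage[] {
  return [
    { role: 'system', content: persona },
    { role: 'user', content: `${task}\n\n${question}` },
  ];
}

// Variant B: persona and task instructions both in the system message.
function taskInSystemMessage(persona: string, task: string, question: string): ChatMessage[] {
  return [
    { role: 'system', content: `${persona}\n\n${task}` },
    { role: 'user', content: question },
  ];
}
```

Running the same questions through both variants gives a like-for-like comparison instead of a hunch.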

Handling Context Windows and Retrieval

Even with large context windows (128k+ tokens), you shouldn't dump your entire codebase into a prompt. It's expensive and slow, and models tend to recall details less reliably from very long contexts.

I use RAG (Retrieval-Augmented Generation) to cherry-pick relevant context. Here's a simplified version of what I do:

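import { Pinecone } from '@pinecone-database/pinecone';
import { OpenAIEmbeddings } from '@langchain/openai';

// Assumes PINECONE_API_KEY and OPENAI_API_KEY are set in the environment.
// 'api-docs' is a placeholder index name.
const pineconeIndex = new Pinecone().index('api-docs');

async function getRelevantDocs(query: string, limit: number = 3): Promise<string[]> {
  const embeddings = new OpenAIEmbeddings();
  const queryEmbedding = await embeddings.embedQuery(query);

  const results = await pineconeIndex.query({
    vector: queryEmbedding,
    topK: limit,
    includeMetadata: true
  });

  // metadata is optional on each match, so skip matches without text.
  return results.matches
    .map(m => m.metadata?.text)
    .filter((text): text is string => typeof text === 'string');
}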

async function answerWithContext(question: string) {
  const relevantChunks = await getRelevantDocs(question);
  
  const prompt = `Context:
${relevantChunks.join('\n\n')}

Question: ${question}

Answer based only on the provided context:`;
  
  // send to LLM...
}

This retrieves only the most semantically relevant documentation chunks. My vector store has about 50k chunks, but I only send the top 3-5 to the LLM. Massive savings in tokens and latency.
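
Conceptually, the vector store ranks chunks by how close their embeddings are to the query embedding. The in-memory sketch below shows that idea with cosine similarity. It is an illustration only: a real vector database uses approximate indexes rather than a full scan, and `Chunk`, `cosineSimilarity`, and `topK` are names chosen for this example.

```typescript
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query and returns the k best texts.
function topK(query: number[], chunks: Chunk[], k: number): string[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((c) => c.text);
}
```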

Chunk Size Matters

When I first set up RAG, I chunked documents into 100-token pieces. Too small: I lost context. Then I tried 2000-token chunks. Too big: retrieval wasn't precise enough. I settled on 500-800 tokens with a 100-token overlap between chunks. Getting chunk size wrong bit me for a while, and I only found the sweet spot through testing.
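
A sliding window is one common way to implement overlapping chunks. The sketch below works on an array of tokens that has already been produced. How you tokenize is an assumption left open here (a real setup would use the tokenizer that matches the embedding model), and `chunkWithOverlap` is a hypothetical helper name.

```typescript
// Splits tokens into windows of chunkSize, each sharing `overlap` tokens
// with the previous window.
function chunkWithOverlap(tokens: string[], chunkSize: number, overlap: number): string[][] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('chunkSize must be positive and overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    // Stop once a window has reached the end, to avoid a tail chunk
    // that only repeats overlap tokens.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}
```

With `chunkSize` 600 and `overlap` 100, each window starts 500 tokens after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.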

Iteration and Testing Strategies

Here's the thing nobody tells you: your first prompt will suck. Your tenth might still suck. You need a process for iteration.

I maintain a test suite of expected inputs and outputs:

const testCases = [
  {
    input: 'How do I authenticate with the API?',
    expectedKeywords: ['API key', 'Authorization header', 'Bearer'],
    shouldNotInclude: ['username', 'password']
  },
  {
    input: 'The API is returning 429 errors',
    expectedKeywords: ['rate limit', 'retry', 'backoff'],
    shouldNotInclude: ['delete your account']
  }
];

async function testPrompt(promptTemplate: string) {
  for (const test of testCases) {
    const response = await runPrompt(promptTemplate, test.input);
    // assert expected keywords present
    // assert forbidden content absent
    // log results for comparison
  }
}

I run this test suite on every prompt change. It's not perfect—LLMs are probabilistic—but it catches regressions and helps me compare prompt versions objectively.
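
The three commented steps in `testPrompt` could be filled in along these lines. This is a sketch, not the original test harness: `checkResponse` and the result type are illustrative, and matching is case-insensitive substring search, which is a deliberately crude assumption.

```typescript
interface PromptTestCase {
  input: string;
  expectedKeywords: string[];
  shouldNotInclude: string[];
}

interface PromptTestResult {
  passed: boolean;
  missingKeywords: string[]; // expected keywords the response lacked
  forbiddenFound: string[]; // forbidden phrases the response contained
}

// Checks one model response against one test case.
function checkResponse(test: PromptTestCase, response: string): PromptTestResult {
  const haystack = response.toLowerCase();
  const missingKeywords = test.expectedKeywords.filter(
    (k) => !haystack.includes(k.toLowerCase())
  );
  const forbiddenFound = test.shouldNotInclude.filter((k) =>
    haystack.includes(k.toLowerCase())
  );
  return {
    passed: missingKeywords.length === 0 && forbiddenFound.length === 0,
    missingKeywords,
    forbiddenFound,
  };
}
```

Returning the missing and forbidden lists, rather than a bare boolean, makes it easier to see why one prompt version does worse than another.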

Prompt Versioning

I keep prompts in version control with semantic versioning. When I deploy a new prompt to production, it goes through the same CI/CD pipeline as code:

// prompts/support-assistant/v2.1.0.ts
export const systemPrompt = `You are a technical support assistant...`;
export const userPromptTemplate = (question: string, context: string) => 
  `Context: ${context}\n\nQuestion: ${question}`;

This lets me roll back bad prompts, A/B test versions, and track which prompt version is running in each environment.
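
Rollback and per-environment pinning can be as simple as choosing a version at startup. A minimal sketch, assuming plain `major.minor.patch` version strings with no pre-release tags; `PromptVersion`, `compareSemver`, and `selectPrompt` are hypothetical names.

```typescript
interface PromptVersion {
  version: string; // semantic version, e.g. "2.1.0"
  systemPrompt: string;
}

// Compares two "major.minor.patch" strings numerically, part by part.
function compareSemver(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Returns the pinned version if one is given, otherwise the highest available.
function selectPrompt(versions: PromptVersion[], pinned?: string): PromptVersion | undefined {
  if (pinned !== undefined) {
    return versions.find((v) => v.version === pinned);
  }
  return [...versions].sort((x, y) => compareSemver(y.version, x.version))[0];
}
```

Rolling back is then a config change (pin the previous version) rather than a code revert. Comparing parts numerically matters because a string sort would rank "2.9.3" above "2.10.0".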

Common Pitfalls to Avoid

A few mistakes I've made (so you don't have to):

Injection attacks: Users can manipulate your prompts by including instructions in their input. I sanitize user input and use delimiters to clearly separate instructions from data: User input: """${sanitize(userInput)}"""
Ignoring temperature: Temperature controls randomness. For deterministic tasks (extraction, classification), use low temperature (0.0-0.3). For creative tasks, higher is fine (0.7-1.0).
Not setting max_tokens: Cap your output length or you'll get surprise bills. I always set reasonable limits.
Over-engineering: Sometimes a simple, direct prompt works better than a complex chain. Start simple, add complexity only when needed.
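
For the injection point above, the article does not show what `sanitize` does, so here is one possible interpretation: strip the delimiter from the input so it cannot close the quoted block early, then wrap it. `wrapUserInput` is a made-up name, and this lowers the risk rather than removing it, since a model can still follow instructions that appear inside the delimiters.

```typescript
const DELIMITER = '"""';

// Removes every occurrence of the delimiter from the user input, then wraps
// the cleaned input in delimiters on their own lines.
function wrapUserInput(userInput: string): string {
  const cleaned = userInput.split(DELIMITER).join('');
  return `User input:\n${DELIMITER}\n${cleaned}\n${DELIMITER}`;
}
```

Putting the delimiters on their own lines keeps them distinct even when the input itself ends with a quote character.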

Wrapping Up

Prompt engineering isn't magic. It's about clarity, structure, and iteration—skills you already have as a software developer.

The techniques that work best for me: structured prompts with defined outputs, few-shot examples for consistency, chain-of-thought for reasoning tasks, strategic use of RAG to manage context, and rigorous testing with version control.

Your mileage will vary based on your use case. The model you're using matters too—Claude handles instruction-following differently than GPT-4, which behaves differently than open-source models. Test everything with your specific models and data.

Start with simple, clear prompts. Add techniques incrementally when you hit limitations. Treat prompts like production code, because that's what they are.

