# LLM Prompt Engineering Patterns

> Every prompt engineering technique that works in 2026 — chain-of-thought, RAG, ReAct, and production-grade patterns

Prompt engineering has evolved from simple tricks into a systematic discipline. This map covers every major technique — chain-of-thought, tree-of-thought, few-shot, ReAct, RAG, structured outputs, function calling, and prompt security — organized by complexity from beginner fundamentals to production-grade agentic patterns.

_Category: technology · Tags: #prompt-engineering #llm #chain-of-thought #rag #ai-agents #few-shot #function-calling #gpt #prompt-engineering-guide #chain-of-thought-prompting #rag-explained #best-prompting-techniques #llm-prompt-patterns · Published: 2026-03-23_

## Foundational Patterns

### Role & Persona Prompting

Role prompting and persona prompting: assign the model a specific identity before giving it a task. 'You are a senior security engineer conducting a code review' activates different knowledge, vocabulary, and reasoning patterns than a bare question. The CRAFT framework formalizes this: Context, Role, Action, Format, and Tone. Role prompting is the simplest technique but remains one of the most impactful — it sets the frame for everything that follows.

### Few-Shot Prompting

Provide 2-5 examples of input-output pairs before your actual request. The model pattern-matches on your examples far more reliably than on abstract instructions. Few-shot works best for classification, formatting, tone matching, and structured extraction. Zero-shot (no examples) works for simple tasks; one-shot is often enough for formatting; few-shot becomes essential when precision matters and the task is ambiguous.

### Instruction Tuning & System Prompts

System prompts set behavioral guardrails that persist across an entire conversation. Unlike user messages, they establish identity, constraints, and output format from the start. Modern instruction-tuned models respond better to clear, structured system prompts than to conversational requests. Use XML tags to separate sections: <ROLE>, <RULES>, <OUTPUT_FORMAT>. Structured system prompts reduce ambiguity and produce more consistent outputs, especially with GPT-5 and Claude.

### Constraint & Negative Prompting

Tell the model what NOT to do. 'Do not use jargon. Never start with Unfortunately. Keep responses under 200 words.' Constraints narrow the output space and prevent the model's worst habits — verbosity, hedging, and generic filler. Negative instructions are often more effective than positive ones because they eliminate failure modes directly rather than hoping the model infers them from a positive instruction.

## Reasoning Techniques

### Chain-of-Thought (CoT)

Chain-of-thought prompting is the breakthrough reasoning technique: ask the model to 'think step by step' or show worked examples with intermediate reasoning. CoT dramatically improves accuracy on math, logic, coding, and multi-step problems. The key insight — discovered by Wei et al. at Google — is that LLMs reason better when they write their reasoning out rather than jumping directly to answers. CoT only yields gains with models over ~100B parameters; smaller models produce illogical chains that reduce accuracy.

### Tree-of-Thought (ToT)

Tree-of-Thought (ToT) prompting is an advanced extension of CoT that generates multiple reasoning paths simultaneously and evaluates them like a search tree. Each node is an intermediate reasoning step; branches explore alternative approaches. The model evaluates which branches are most promising using breadth-first or depth-first search. ToT excels at strategic planning, creative writing with constraints, and puzzle-solving — any task where exploring multiple paths before committing outperforms linear reasoning.

### Self-Consistency

Generate multiple chain-of-thought answers to the same question (using temperature > 0), then take the majority vote on the final answer. This technique, proposed by Wang et al., significantly improves accuracy over single-path CoT because different reasoning paths may make different errors, but the correct answer appears most frequently. It trades compute cost for reliability — essential for high-stakes applications like medical diagnosis or legal analysis.

### Self-Reflection & Self-Critique

Ask the model to review and critique its own output: 'Now review your answer. What might be wrong? What did you miss? Revise if needed.' This metacognitive loop catches errors that single-pass generation misses. Constitutional AI extends this by giving the model explicit principles to evaluate against. Self-reflection is especially powerful for code generation — have the model write tests, then run them mentally against its own code to find bugs before you do.

## Production Patterns

### Retrieval-Augmented Generation (RAG)

RAG (Retrieval-Augmented Generation) is the dominant production pattern: retrieve relevant documents from a knowledge base, inject them into the prompt as context, and have the model generate answers grounded in that data. RAG solves the hallucination problem for domain-specific questions by providing the model with real source material rather than relying on its training data. The architecture: query → embedding search → top-K retrieval → prompt assembly → generation → citation. Vector databases like Pinecone, Weaviate, and pgvector power the retrieval step.

### Structured Outputs

Force the model to return data in a specific format — JSON, YAML, XML, or a custom schema. Modern APIs (OpenAI, Anthropic) support native structured output modes that guarantee valid JSON conforming to a provided JSON Schema. This eliminates fragile regex parsing and makes LLM outputs directly consumable by downstream systems. Use structured outputs whenever the LLM output feeds into code rather than being read by a human.

### Function Calling & Tool Use

LLM function calling explained: define available functions (tools) with their parameter schemas, and let the model decide when and how to call them. The model does not execute the function itself — it outputs a structured request specifying which function to call and with what arguments. Your application executes the function and feeds the result back. This is the foundation of all agentic AI: the model reasons about when to search the web, query a database, send an email, or call an API.

### Prompt Chaining & Pipelines

Break complex tasks into a sequence of simpler prompts where each step's output becomes the next step's input. Example: Step 1 extracts key entities from a document, Step 2 classifies them, Step 3 generates a summary using only the classified entities. Chaining produces better results than a single monolithic prompt because each step is focused and verifiable. It also enables mixing models — use a cheap model for extraction, an expensive one for reasoning.

## Agentic Patterns

### ReAct: Reasoning + Acting

The foundational agentic pattern: the model alternates between Thought (reasoning about what to do), Action (calling a tool or function), and Observation (reading the result). This loop continues until the task is complete. ReAct, proposed by Yao et al., enables models to solve problems that require real-world interaction — web searches, database queries, API calls — while maintaining a transparent chain of reasoning. LangChain, LangGraph, and CrewAI implement ReAct as their core agent loop.

### Multi-Agent Orchestration

Instead of one model doing everything, assign specialized agents to different subtasks: a Researcher agent searches the web, a Coder agent writes code, a Reviewer agent checks quality, and an Orchestrator agent coordinates them. Frameworks like CrewAI, AutoGen, and LangGraph enable multi-agent workflows where agents collaborate, debate, and refine each other's work. This mirrors how human teams operate — specialization and review produce better results than solo generalists.

### Model Context Protocol (MCP)

An open standard created by Anthropic that allows AI models to securely access external tools, data sources, and APIs through a unified protocol. MCP replaces the fragmented world of custom tool integrations with a standardized interface — similar to how USB standardized hardware connections. Any MCP-compatible tool works with any MCP-compatible model. This is the infrastructure layer that makes agentic AI practical at enterprise scale.

## Security & Reliability

### Prompt Injection Attacks

What is prompt injection? The most critical security risk in LLM applications. Direct injection: a user crafts input that overrides the system prompt ('ignore all previous instructions and...'). Indirect injection: malicious instructions are hidden in external content the model consumes — web pages, emails, documents retrieved by RAG. Research shows just 5 poisoned documents in a RAG database can manipulate responses 90% of the time. No production LLM application should go live without injection defenses.

### Defense-in-Depth for Prompts

No single defense stops prompt injection. Production systems need layered protection: 1) Input filtering and anomaly detection on user inputs, 2) Hierarchical system prompts with privileged instruction boundaries, 3) Output validation and response verification before returning results, 4) Sandboxed tool execution so injected instructions cannot cause real damage, 5) Content-based filtering on RAG retrieved documents. The PALADIN framework implements five protective layers that reduce successful attacks from 73% to under 9%.

### Hallucination Mitigation

LLMs generate plausible-sounding but fabricated information with high confidence. Mitigation strategies: Ground with RAG (provide real source documents), request citations (force the model to reference specific sources), use self-consistency (multiple generations expose inconsistencies), set lower temperature for factual tasks, and implement fact-checking pipelines that cross-reference claims against trusted databases. The fundamental rule: never deploy an LLM for factual queries without a grounding mechanism.

## Optimization & Evaluation

### Temperature & Sampling Controls

Temperature controls randomness: 0 produces deterministic, focused outputs ideal for factual tasks and code; 0.7-1.0 produces creative, varied outputs for brainstorming and writing. Top-p (nucleus sampling) limits the token pool to the most probable options. Top-k caps the number of tokens considered. For production systems, use temperature 0 with structured outputs. For creative work, use 0.8-1.0. Understanding these knobs is the difference between inconsistent demos and reliable products.

### Evaluation & Benchmarking

How do you know if your prompt is good? Systematic evaluation: create a test set of inputs with expected outputs, run your prompt against them, and measure accuracy, consistency, and latency. LLM-as-judge uses a second model to evaluate the first model's outputs against rubrics. Tools like Promptfoo, LangSmith, and Braintrust automate this workflow. Without evaluation, prompt engineering is guesswork. With it, it becomes iterative optimization.

### Prompt Optimization & DSPy

Instead of manually tweaking prompts, use automated optimization. DSPy (Declarative Self-improving Python) treats prompts as programs: you define inputs, outputs, and a metric, and the framework automatically optimizes the prompt through compilation. It can discover few-shot examples, generate chain-of-thought templates, and fine-tune instructions — all programmatically. This approach has outperformed hand-crafted prompts on multiple benchmarks and represents the future of prompt engineering at scale.

## Other Concepts

### The Prompt Engineer's Toolkit

The complete 2026 toolkit: 1) Start with CRAFT (Context, Role, Action, Format, Tone) for every prompt. 2) Add CoT for reasoning tasks. 3) Use few-shot for formatting and classification. 4) Ground with RAG for factual accuracy. 5) Structure outputs as JSON for code integration. 6) Enable function calling for real-world actions. 7) Layer security defenses against injection. 8) Evaluate systematically, not by vibes. 9) Automate optimization with DSPy. The field has moved from art to engineering.

## How These Concepts Connect
- **LLM Prompt Engineering Patterns** → **Role & Persona Prompting**
- **LLM Prompt Engineering Patterns** → **Chain-of-Thought (CoT)**
- **LLM Prompt Engineering Patterns** → **Retrieval-Augmented Generation (RAG)**
- **LLM Prompt Engineering Patterns** → **ReAct: Reasoning + Acting**
- **LLM Prompt Engineering Patterns** → **Prompt Injection Attacks**
- **LLM Prompt Engineering Patterns** → **Temperature & Sampling Controls**
- **Role & Persona Prompting** → **Few-Shot Prompting**
- **Role & Persona Prompting** → **Instruction Tuning & System Prompts**
- **Role & Persona Prompting** → **Constraint & Negative Prompting**
- **Chain-of-Thought (CoT)** → **Tree-of-Thought (ToT)**
- **Chain-of-Thought (CoT)** → **Self-Consistency**
- **Chain-of-Thought (CoT)** → **Self-Reflection & Self-Critique**
- **Retrieval-Augmented Generation (RAG)** → **Structured Outputs**
- **Retrieval-Augmented Generation (RAG)** → **Function Calling & Tool Use**
- **Retrieval-Augmented Generation (RAG)** → **Prompt Chaining & Pipelines**
- **ReAct: Reasoning + Acting** → **Multi-Agent Orchestration**
- **ReAct: Reasoning + Acting** → **Model Context Protocol (MCP)**
- **Prompt Injection Attacks** → **Defense-in-Depth for Prompts**
- **Prompt Injection Attacks** → **Hallucination Mitigation**
- **Temperature & Sampling Controls** → **Evaluation & Benchmarking**
- **Temperature & Sampling Controls** → **Prompt Optimization & DSPy**

---
_Source: https://mindlify.co/m/llm-prompt-engineering-patterns. Published by [Mindlify](https://mindlify.co), AI-powered thought networks._