How AI agents work: the ReAct loop, LangGraph vs CrewAI, MCP protocol, and memory systems — explained with real production stats
AI agents are no longer a research concept — 79% of enterprises have adopted them, the market is projected to hit $236B by 2034, and Claude Code alone accounts for 4% of all GitHub commits. But how do they actually work? An agent is an LLM in a loop: it perceives its environment, reasons about what to do, takes an action using a tool, observes the result, and repeats until the task is done. Everything else — memory, protocols, multi-agent orchestration — is infrastructure around this core loop.
At its core, every AI agent follows the same iterative loop: receive a task, reason about it, take an action (usually a tool call), observe the result, and repeat until done. This is not a one-shot prompt-response — it is a continuous cycle where each observation informs the next reasoning step. Most production systems combine 4-6 design patterns simultaneously to make this loop robust.
The ReAct agent pattern explained: at each step the model generates a reasoning trace ('I need to check the database before answering') then takes a task-specific action (calling the database API). It observes the result and feeds it back into the next reasoning step. Research shows ReAct outperformed both chain-of-thought-only and act-only approaches on interactive benchmarks. Reasoning and acting reinforce each other: reason to act, act to reason.
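The reason → act → observe cycle described above can be sketched in plain Python. Everything here is a stand-in: `reason` plays the role of an LLM call, and `query_db` fakes a real database API.

```python
# Minimal ReAct-style loop: reason -> act -> observe, repeated until done.
# `reason` is a hypothetical stand-in for an LLM call; a real agent would
# prompt a model with the task plus the observation history at each step.

def reason(task, history):
    if not history:
        return {"thought": "I need to check the database before answering",
                "action": "query_db", "args": {"table": "orders"}}
    return {"thought": "I have what I need", "action": "finish",
            "args": {"answer": f"Found {history[-1]} rows"}}

def query_db(table):
    return 42  # stand-in for a real database call

TOOLS = {"query_db": query_db}

def react_agent(task, max_steps=5):
    history = []
    for _ in range(max_steps):
        step = reason(task, history)        # reasoning trace
        if step["action"] == "finish":
            return step["args"]["answer"]
        observation = TOOLS[step["action"]](**step["args"])  # act
        history.append(observation)         # observe, feed back into next step
    return "step budget exhausted"
```

The `max_steps` bound matters in practice: without it, a confused model can loop indefinitely, burning tokens on each cycle.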
Plan-and-Execute pattern: a high-reasoning model analyzes the user's request and breaks it into a directed acyclic graph (DAG) of subtasks. Smaller, faster models execute each subtask independently. A re-planner evaluates results and adjusts the plan if necessary. This architecture achieves up to a 92% task completion rate with a 3.6x speedup over sequential ReAct execution. Best for complex, multi-step tasks where upfront planning prevents wasted cycles.
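A minimal planner/executor sketch of this architecture, using the standard library's topological sorter. The DAG and the worker functions are illustrative; in a real system the plan would be generated by a high-reasoning model and the workers would be smaller, faster models.

```python
from graphlib import TopologicalSorter

# Illustrative DAG of subtasks: each key maps to the set of tasks it depends on.
plan = {
    "fetch_sales": set(),
    "fetch_costs": set(),
    "compute_margin": {"fetch_sales", "fetch_costs"},
    "write_report": {"compute_margin"},
}

# Stand-in workers; in production these would be model calls.
workers = {
    "fetch_sales": lambda r: 1000,
    "fetch_costs": lambda r: 600,
    "compute_margin": lambda r: r["fetch_sales"] - r["fetch_costs"],
    "write_report": lambda r: f"margin: {r['compute_margin']}",
}

def execute(plan):
    results = {}
    ts = TopologicalSorter(plan)
    ts.prepare()
    while ts.is_active():
        for task in ts.get_ready():      # independent subtasks: parallelizable
            results[task] = workers[task](results)
            ts.done(task)
            # A re-planner would inspect results[task] here and could
            # rewrite the remaining DAG before the next batch runs.
    return results
```

The speedup over sequential ReAct comes from `get_ready()`: subtasks with no mutual dependencies (here, the two fetches) can run concurrently instead of one loop iteration at a time.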
The Reflection Pattern has agents critique their own output before returning answers, reducing hallucinations through iterative self-review. Structured critique loops generate, evaluate, and optionally regenerate — bounded by a quality threshold and maximum attempt count. An important caveat: naive self-correction loops can amplify failures, as tool errors become fuel for increasingly confident but misguided retries. Error classification distinguishes repairable, transient, and fatal errors.
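A bounded critique loop with the error classification described above might look like the following sketch. `generate` and `critique` are hypothetical stand-ins for two LLM passes; the threshold and attempt count are the bounds that keep self-correction from amplifying failures.

```python
# Generate -> critique -> regenerate, bounded by a quality threshold
# and a maximum attempt count. Both functions are stand-ins for LLM calls.

QUALITY_THRESHOLD = 0.8
MAX_ATTEMPTS = 3

def generate(task, feedback=None):
    # Stand-in: pretend incorporating feedback improves the draft.
    return f"draft({task}, revised={feedback is not None})"

def critique(draft):
    # Stand-in scorer; a real critic would be a second model pass.
    return (0.9, None) if "revised=True" in draft else (0.5, "add sources")

def classify(error):
    # Repairable errors feed the next attempt; fatal errors stop the loop
    # instead of fueling increasingly confident retries.
    return "repairable" if error else "none"

def reflect(task):
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        draft = generate(task, feedback)
        score, error = critique(draft)
        if score >= QUALITY_THRESHOLD:
            return draft                 # passed self-review
        if classify(error) != "repairable":
            break                        # do not retry fatal errors
        feedback = error                 # repairable: try again with critique
    return draft                         # best effort after the bound
```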
LangGraph vs CrewAI vs OpenAI Agents SDK: the agent framework market has settled into clear lanes. Orchestration engines for complex workflows, rapid-prototyping tools for multi-agent systems, and vendor-native SDKs optimized for specific models. Choosing the right framework is less about 'best' and more about matching your use case — stateful workflows, role-based crews, or model-native safety.
LangGraph tutorial: 27,100 monthly searches — the highest of any AI agent framework. LangGraph models agent workflows as directed graphs with typed state: nodes represent agents or functions, edges define transitions. Built-in checkpointing enables time-travel debugging. Sub-graph composition allows nesting complex workflows. LangSmith provides observability. Best for complex branching workflows, conditional routing, and human-in-the-loop approvals.
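The core idea — typed state flowing through named nodes, with conditional edges choosing the next hop — fits in a short standard-library sketch. This is an illustration of the concept, not the LangGraph API itself; the node names and routing logic are invented for the example.

```python
from typing import Callable, TypedDict

# LangGraph-style workflow in miniature: typed state flows through nodes;
# static edges or a conditional router pick the next node.

class State(TypedDict):
    question: str
    draft: str
    approved: bool

def research(state: State) -> State:
    return {**state, "draft": f"notes on {state['question']}"}

def review(state: State) -> State:
    return {**state, "approved": "notes" in state["draft"]}

def route(state: State) -> str:          # conditional edge
    return "END" if state["approved"] else "research"

nodes: dict[str, Callable[[State], State]] = {"research": research,
                                              "review": review}
edges = {"research": "review"}           # static edge

def run(state: State, entry: str = "research") -> State:
    node = entry
    while node != "END":
        state = nodes[node](state)
        node = edges.get(node) or route(state)  # static edge, else conditional
        # A checkpointer would persist `state` here, which is what makes
        # time-travel debugging and human-in-the-loop pauses possible.
    return state
```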
CrewAI tutorial: 14,800 monthly searches. CrewAI maps human team structures onto AI agents — researcher, writer, QA reviewer — each with a role, goal, and backstory. Sequential, hierarchical, and consensual process types. Model-agnostic and the fastest path to a working multi-agent prototype: 20 lines of Python to start. Weakness: no built-in checkpointing and coarse-grained error handling limit production reliability.
Claude Agent SDK: safety-first, tool-use-first approach with Constitutional AI constraints and MCP-native development. OpenAI Agents SDK: handoff-based model with built-in guardrails that run in parallel with agent execution — replaced Swarm in March 2025. Smolagents (HuggingFace): code-first, ~1,000 lines of core logic, model-agnostic. Google ADK: hierarchical agent trees with A2A protocol support. Microsoft merged AutoGen and Semantic Kernel into one unified framework (RC Feb 2026).
AI agent memory systems: an agent without memory is just an expensive autocomplete. To execute long-running tasks, maintain context across sessions, and learn from past actions, agents need structured memory systems. The field has converged on four memory types that map to human cognition — and the shift from RAG-only to persistent memory is the biggest architectural change of 2026.
Short-term (working memory): recent conversation turns and session context within the current interaction, limited by context window. Episodic memory: timestamped summaries of past interactions stored in vector databases. Semantic memory: structured factual knowledge — user preferences, domain facts, entity relationships. Procedural memory: stored workflows and skills that agents execute without re-reasoning each time.
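The four-type taxonomy above can be captured as one small data structure. The class and method names are illustrative; the storage backends named in the comments are typical choices, not requirements.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AgentMemory:
    working: list = field(default_factory=list)     # short-term: recent turns
    episodic: list = field(default_factory=list)    # timestamped summaries (vector DB)
    semantic: dict = field(default_factory=dict)    # facts/preferences (SQL or graph)
    procedural: dict = field(default_factory=dict)  # stored skills/workflows

    def remember_turn(self, turn, window=10):
        self.working.append(turn)
        self.working = self.working[-window:]       # bounded like a context window

    def end_session(self, summary):
        self.episodic.append((datetime.now().isoformat(), summary))

mem = AgentMemory()
mem.remember_turn("user: I prefer metric units")
mem.semantic["units"] = "metric"                    # extracted fact
mem.procedural["miles_to_km"] = lambda mi: round(mi * 1.609, 1)  # reusable skill
```

The practical distinction: working memory is overwritten, episodic memory is searched, semantic memory is looked up, and procedural memory is executed without re-reasoning.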
VentureBeat predicts contextual memory will surpass RAG for agentic AI in 2026. RAG retrieves external documents at query time — essential but stateless. Persistent memory retains knowledge across sessions, tracks task history, and continuously learns from the environment. The spectrum is shifting from traditional RAG → agentic RAG → full memory systems. Production systems now combine PostgreSQL (structured facts), vector databases (semantic embeddings), and Neo4j (graph-based entity relationships).
Mem0: Dual-store (vector DB + knowledge graph), ~48K GitHub stars, most mature solution. Hindsight: 4 parallel retrieval strategies, 91.4% on LongMemEval (top score). Zep/Graphiti: Temporal knowledge graph where facts carry 'valid from' and 'valid until' timestamps. Letta (MemGPT): OS-inspired 3-tier architecture where agents self-edit their own memory blocks. Benchmarks show up to 26% accuracy gains from hybrid architectures over pure vector approaches.
How AI agents connect to the real world: LLMs inherently cannot do things — they reason about text. Everything else — searching the web, querying databases, executing code, calling APIs — requires tool use via function calling. Two protocols have emerged as the standards for how agents plug into external systems and communicate with each other: MCP for tool access and A2A for agent-to-agent communication.
Tool calling lets an LLM request execution of external functions by outputting a structured JSON object specifying which function to call and with what arguments. The LLM never executes functions directly — the application code runs the function and returns the result. The industry has converged on JSON Schema format for definitions: name, description, parameters, and required fields. The description field is the most critical — it determines when the model chooses to use the tool.
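A concrete sketch of that division of labor: a tool definition in the converged JSON Schema format, the structured JSON a model might emit, and the dispatch step the application performs. The `get_weather` tool and the model output are illustrative.

```python
import json

TOOLS = [{
    "name": "get_weather",
    "description": "Get the current temperature in Celsius for a city. "
                   "Use whenever the user asks about weather.",  # steers tool choice
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city):
    return {"city": city, "temp_c": 18}   # stand-in for a real API call

REGISTRY = {"get_weather": get_weather}

# The model never executes anything: it only emits structured JSON like this.
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'

call = json.loads(model_output)
result = REGISTRY[call["name"]](**call["arguments"])  # app code runs the tool
# `result` is serialized back into the conversation as the observation.
```

This is also why the `description` field matters most: it is the only signal the model has for deciding whether this tool fits the user's request.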
MCP protocol explained: the Model Context Protocol is the universal standard for connecting AI agents to external tools. 97 million monthly SDK downloads. 10,000+ active MCP servers. 89% of new enterprise projects planning integration. Donated to the Linux Foundation's Agentic AI Foundation (AAIF) in December 2025, co-founded by Anthropic, Block, and OpenAI. Platinum members: AWS, Google, Microsoft, Bloomberg, Cloudflare. Uses JSON-RPC 2.0 with stdio (local) and HTTP+SSE (remote) transports.
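On the wire, an MCP tool invocation is a plain JSON-RPC 2.0 request. The sketch below shows the shape of a `tools/call` message; the tool name and arguments are illustrative, and the full schema lives in the MCP specification.

```python
import json

# An MCP tool invocation framed as JSON-RPC 2.0.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",                  # illustrative tool name
        "arguments": {"query": "checkpointing"},
    },
}

# Over the stdio transport the client writes one JSON message per line to the
# server process; over HTTP+SSE the same payload goes in the request body.
wire = json.dumps(request)
```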
MCP connects agents to tools. A2A connects agents to other agents. Announced by Google on April 9, 2025, with 50+ launch partners including Atlassian, Salesforce, SAP, and PayPal. Contributed to the Linux Foundation in June 2025. Four capabilities: capability discovery (Agent Cards in JSON), task management (lifecycle states), agent collaboration (context sharing), and UX negotiation (adapts to different UI capabilities). HTTP, SSE, JSON-RPC, and gRPC support.
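Capability discovery works through the Agent Card: a JSON document an agent publishes so peers can decide whether to delegate to it. The fields below sketch the idea with invented values; consult the A2A specification for the authoritative schema.

```python
# Illustrative A2A Agent Card and a discovery check a client might perform.
agent_card = {
    "name": "invoice-agent",
    "description": "Extracts and validates invoice data",
    "url": "https://agents.example.com/invoice",
    "capabilities": {"streaming": True},
    "skills": [
        {"id": "extract", "description": "Parse invoice PDFs into JSON"},
    ],
}

def can_handle(card, skill_id):
    # Capability discovery: inspect the card before handing off a task.
    return any(skill["id"] == skill_id for skill in card["skills"])
```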
Multi-agent orchestration patterns explained: complex problems are broken down across multiple specialized agents (researcher, writer, critic, etc.). The key architectural decision is how those agents coordinate: who decides what, how tasks flow between them, and what happens when one agent fails. Four core patterns have emerged, each with distinct trade-offs in scalability, latency, fault tolerance, and observability.
Supervisor: A central 'manager' agent decomposes intent, routes sub-tasks to specialized workers, and synthesizes results. Clear accountability, highly scalable. The most common production pattern. Pipeline: Data flows through a fixed sequence of stages (research → outline → draft → edit → publish). Easy to monitor and optimize, but a single stage failure blocks everything. Typical latency: minimum 10 seconds for 5 stages.
Swarm: Agents operate as autonomous peers without centralized control, coordinating through shared state. Coordination is emergent — agents follow local rules, global behavior arises. High scalability and fault tolerance, but poor observability and difficult convergence. Mesh (emerging): Agents maintain persistent connections to specific peers. Traceable topology and graceful degradation, but N-squared connection growth limits it to 3-8 agents. Hierarchical: Tree structure with strategy → tactics → execution levels.
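The supervisor pattern — the most common in production — can be sketched as a manager that inspects shared state and routes to the next specialist. The workers here are stand-in lambdas; a real supervisor would make the routing decision with an LLM.

```python
# Supervisor pattern in miniature: a central manager routes subtasks to
# specialized workers and synthesizes the result. Workers are stand-ins.

WORKERS = {
    "research": lambda topic: f"facts about {topic}",
    "write": lambda facts: f"article using {facts}",
    "review": lambda draft: draft + " [approved]",
}

def supervisor(request):
    state = {"topic": request}
    while True:
        # Routing decision: in production this is an LLM call that inspects
        # the state and picks the next worker (or decides to finish).
        if "facts" not in state:
            state["facts"] = WORKERS["research"](state["topic"])
        elif "draft" not in state:
            state["draft"] = WORKERS["write"](state["facts"])
        elif "[approved]" not in state["draft"]:
            state["draft"] = WORKERS["review"](state["draft"])
        else:
            return state["draft"]     # synthesize and return
```

The centralized loop is what gives the pattern its clear accountability: every handoff passes through one place, so failures are easy to attribute — the trade-off being that the supervisor itself is a single point of failure.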
88% of deploying enterprises report at least one security incident. 34% of agents affected by prompt injection. $4.7M average cost of an agent-related data breach. Production systems deploy layered defense-in-depth: infrastructure-level redaction and sandboxing, application-level input/output guardrails, tool-level access mediation, and human-in-the-loop for high-stakes actions. The EU AI Act's high-risk provisions become fully enforceable in August 2026.
AI agents are no longer pilots and proofs-of-concept. 57% of organizations deploy multi-step agent workflows. 81% plan to expand into more complex use cases in 2026. The median payback period is 8.3 months, with $340K annual cost savings per deployed agent and 171% average ROI. But 88% of agents never reach production — the gap between prototype and deployment remains the industry's biggest challenge.
Claude Code: 4% of all GitHub public commits today, projected 20%+ by end 2026. Surpassed 20M commits across 1M+ repositories. Hit $1B run-rate revenue in 6 months (faster than ChatGPT). 71% of AI agent users employ Claude Code. Devin: ARR grew from $1M to $150M+ after acquiring Windsurf. Goldman Sachs is piloting Devin alongside 12,000 human developers. ~85% of developers now regularly use AI coding tools. 91% of enterprises use them in production.
Klarna's AI assistant handled 2.3 million conversations in its first month — two-thirds of all customer service. Equivalent work of 700 full-time agents. 25% drop in repeat inquiries. Resolution time: 11 minutes → under 2 minutes. Estimated $40M profit improvement. Sierra reached $100M ARR faster than almost any enterprise AI company. Customer service is the #1 AI agent deployment category (26.5% of respondents). Leading agents achieve 60-80% automation.
Global AI agents market: $7.8B (2025) → $236B by 2034 (40%+ CAGR, 31x growth). Worldwide AI spending: $2.52 trillion in 2026 (44% YoY). Agentic AI spending alone: $201.9B (141% growth). 40% of enterprise apps will embed AI agents by end 2026, up from <5% in 2025 (Gartner). But average monthly LLM API cost is $8,400 per production agent, with actual total cost 3.4x higher than API-only estimates. Average first-year infrastructure investment: $280K.