A Comprehensive Survey on Architectures, Strategies, Applications and Future Directions
"Foundation models have revolutionized agentic systems, yet the transition from static prompt-following to autonomous self-improvement remains a critical frontier. This survey systematically categorizes the mechanisms of self-evolution into Foundation Model Improvement and Scaffolding Improvement. We analyze how agents leverage self-generated data, environment-interactive experiences, and recursive prompt optimization to transcend their initial constraints..."
A component-level view of an FM-based agent: model, prompt, tools, memory, and environment loop.
Comparison of traditional agentic workflows vs. self-improving paradigms analyzed in this survey.
Category 1: Foundation Model Improvement
Self-Generated Data
Synthesize tasks and distill skills from rollouts.
Self-Generated Supervision
Derive supervision from automated feedback and checks.
Self-Generated Experience
Learn from interaction trajectories and delayed outcomes.
Category 2: Scaffolding Improvement
Prompt Optimization
Refine prompts via feedback to boost reliability.
Memory Evolution
Store, retrieve, and forget to reuse experience.
Tool Governance
Select and audit tools for safe execution.
Full Scaffolding
Rewrite scaffold policies under controlled constraints.
Foundation Model Improvement directly targets the agent's core parameters θ_FM, updating neural weights to internalize new knowledge, skills, and reasoning patterns. This paradigm enables lasting capability enhancement through gradient-based optimization methods.
The scaffolding remains fixed while model parameters evolve based on the learning signal S_t.
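As a minimal sketch of this "slow loop", the toy below updates a single scalar parameter by gradient descent on self-generated (input, target) pairs; the linear model, loss, and target behavior are illustrative assumptions, not from the survey:

```python
# Toy sketch of foundation-model improvement: a gradient update on
# self-generated data S_t = D_t. Model, loss, and data are stand-ins.

def self_generated_data(theta, n=8):
    """The agent rolls out with its current parameter and keeps
    (input, corrected-target) pairs as training instances D_t."""
    return [(x, 2.0 * x) for x in range(1, n + 1)]  # target behavior y = 2x

def improve_fm(theta, data, lr=0.01, epochs=200):
    """One learning round: theta_{t+1} from (theta_t, D_t) via mean
    squared error gradient descent."""
    for _ in range(epochs):
        grad = sum(2 * (theta * x - y) * x for x, y in data) / len(data)
        theta -= lr * grad
    return theta

theta = 0.0                       # initial weight theta_FM
D_t = self_generated_data(theta)  # agent produces its own training set
theta = improve_fm(theta, D_t)    # persistent, global parameter change
print(round(theta, 3))            # converges to 2.0
```

Unlike scaffolding edits, this change is baked into the parameters and persists across all future tasks.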
S_t = D_t: Agent produces training instances
S_t = r_t: Agent generates feedback signals
Outcome Verification
Executable tests and constraints as feedback
Self-Consistency
Consensus voting over diverse rollouts
Self-Refinement
Iterative critique-and-rewrite using self-feedback
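The self-consistency card above can be sketched as a majority vote over stochastic rollouts: the consensus answer serves as self-generated supervision. The `sample_answer` callback and the rollout values below are illustrative assumptions, not from the survey:

```python
from collections import Counter

def self_consistency(sample_answer, n=5):
    """Draw n diverse rollouts and take the consensus answer, with the
    vote share as a rough confidence. `sample_answer` is a hypothetical
    stand-in for one stochastic rollout of the model."""
    answers = [sample_answer() for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    confidence = votes / n
    return answer, confidence

# Illustrative rollouts: four agree on 42, one diverges.
rollouts = iter([42, 42, 7, 42, 42])
answer, conf = self_consistency(lambda: next(rollouts))
print(answer, conf)  # 42 0.8
```

The consensus label can then be fed back as supervision r_t without any external annotator.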
S_t = τ_t: Learning from environmental interaction
Environment-Interaction
Direct engagement with real or simulated environments
Web navigation, Embodied control, Simulated environments
World Models
Learned simulators enable fast, safe exploration
Learned simulators, Imagined rollouts
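Experience signals S_t = τ_t can be sketched as recording an interaction trajectory, including the delayed outcome at episode end; the environment and policy below are toy assumptions for illustration:

```python
# Minimal sketch of experience collection: the agent acts in a toy
# environment and records the trajectory tau_t of (state, action, reward).

def toy_env_step(state, action):
    """Hypothetical environment: reach state 3 for a delayed reward."""
    next_state = state + action
    done = next_state >= 3
    reward = 1.0 if done else 0.0   # outcome arrives only at episode end
    return next_state, reward, done

def collect_trajectory(policy, state=0, max_steps=10):
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = toy_env_step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory

tau_t = collect_trajectory(policy=lambda s: 1)  # always step forward
print(tau_t)  # [(0, 1, 0.0), (1, 1, 0.0), (2, 1, 1.0)]
```

A world model would replace `toy_env_step` with a learned simulator so such rollouts can be imagined cheaply and safely.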
Key Trade-off: FM improvement operates on longer time scales and incurs substantial computational cost, but leads to persistent, global changes in the agent's capabilities and generalization behavior.
Scaffolding Improvement modifies the operational framework Σt while keeping foundation model parameters fixed. This "fast loop" enables rapid, reversible capability expansion through optimizing prompts, memory, and tools.
Model parameters remain frozen while scaffolding components evolve: Σ_t := (p_t, m_t, T_t)
p_{t+1} = IMPROVE_p(p_t; S_t)
🔍 Black-Box
Scalar score optimization
🔄 Refinement
Qualitative feedback loops
🧬 Evolutionary
Population-based search
📐 TextGrad
Directional optimization
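The black-box variant above can be sketched as greedy hill climbing: propose edited prompt variants, score each with a scalar signal, keep the best. The scorer and edit operator below are toy assumptions, not any system surveyed:

```python
import random

# Sketch of black-box prompt optimization p_{t+1} = IMPROVE_p(p_t; S_t).

random.seed(0)

def score(prompt):
    """Hypothetical scalar feedback S_t, e.g. task accuracy under this
    prompt. Here: reward presence of helpful instruction fragments."""
    return sum(frag in prompt for frag in ("step by step", "cite sources"))

def propose_variants(prompt, k=4):
    edits = [" Think step by step.", " Always cite sources.", " Be brief."]
    return [prompt + random.choice(edits) for _ in range(k)]

def improve_prompt(prompt, rounds=5):
    for _ in range(rounds):
        candidates = [prompt] + propose_variants(prompt)
        prompt = max(candidates, key=score)   # greedy hill climb
    return prompt

p_t = "Answer the question."
p_next = improve_prompt(p_t)
print(score(p_next) >= score(p_t))  # True: keeping p_t in the candidate
                                    # pool makes the score monotone
```

Refinement, evolutionary, and TextGrad-style methods differ mainly in how `propose_variants` is driven: by qualitative critiques, population crossover, or textual "gradients" respectively.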
m_{t+1} = IMPROVE_m(m_t; S_t)
Examples: AWM, MemoryBank, Mem0, G-Memory, ACE
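A toy store/retrieve/forget loop, sketched below, illustrates the memory-evolution pattern; the capacity bound, keyword-overlap retrieval, and oldest-first eviction are simplifying assumptions, and real systems such as MemoryBank or Mem0 are far richer:

```python
import time

# Toy sketch of memory evolution m_{t+1} = IMPROVE_m(m_t; S_t).

class Memory:
    def __init__(self, capacity=3):
        self.entries = []            # (timestamp, text) pairs
        self.capacity = capacity

    def store(self, text):
        self.entries.append((time.time(), text))
        self.forget()

    def retrieve(self, query, k=2):
        """Rank stored notes by keyword overlap with the query."""
        words = set(query.lower().split())
        ranked = sorted(self.entries, reverse=True,
                        key=lambda e: len(words & set(e[1].lower().split())))
        return [text for _, text in ranked[:k]]

    def forget(self):
        if len(self.entries) > self.capacity:   # evict oldest first
            self.entries = self.entries[-self.capacity:]

m = Memory()
for note in ["login needs 2FA token", "report is due friday",
             "use retries on flaky api", "api rate limit is 60/min"]:
    m.store(note)
print(m.retrieve("api error retries"))
```

Forgetting is as important as storing: without eviction, retrieval degrades as stale or low-value entries crowd out reusable experience.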
T_{t+1} = IMPROVE_T(T_t; S_t)
Dynamic Routing
Efficient allocation & orchestration
Iterative Refinement
Debug & harden execution
Autonomous Creation
On-demand tool synthesis
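Routing and auditing can be sketched together as a registry that tracks a per-tool reliability score and quarantines tools that keep failing; the registry API, scores, and thresholds below are illustrative assumptions:

```python
# Sketch of tool governance T_{t+1} = IMPROVE_T(T_t; S_t).

class ToolRegistry:
    def __init__(self):
        self.tools = {}              # name -> [callable, reliability score]

    def register(self, name, fn):
        self.tools[name] = [fn, 1.0]

    def route(self, name, *args):
        fn, score = self.tools[name]
        if score < 0.5:              # governance: block unreliable tools
            raise PermissionError(f"tool {name!r} is quarantined")
        try:
            result = fn(*args)
            self.tools[name][1] = min(1.0, score + 0.1)   # reward success
            return result
        except Exception:
            self.tools[name][1] = score - 0.3             # audit failure
            raise

reg = ToolRegistry()
reg.register("div", lambda a, b: a / b)
result = reg.route("div", 6, 3)      # 2.0
for _ in range(2):
    try:
        reg.route("div", 1, 0)       # failing calls erode trust
    except ZeroDivisionError:
        pass
print(result, round(reg.tools["div"][1], 1))  # trust now below threshold
```

Autonomous creation would extend this by letting the agent synthesize and `register` new tools on demand, subject to the same audit loop.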
Σ_{t+1} = IMPROVE_Σ(Σ_t; S_t)
The most ambitious paradigm: agents modify their own defining implementation and operational code, enabling recursive self-improvement.
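"Controlled constraints" can be sketched as a gate on self-rewrites: the agent proposes a change to its whole scaffold, a constraint check accepts or rejects it, and rejected rewrites roll back. The scaffold fields and the checks below are illustrative assumptions:

```python
import copy

# Sketch of full-scaffolding improvement Sigma_{t+1} = IMPROVE_Sigma(Sigma_t; S_t).

def improve_scaffold(scaffold, proposal, constraints):
    candidate = copy.deepcopy(scaffold)
    candidate.update(proposal)               # self-proposed rewrite
    if all(check(candidate) for check in constraints):
        return candidate                     # accept Sigma_{t+1}
    return scaffold                          # reversible: keep Sigma_t

sigma_t = {"prompt": "Answer briefly.", "memory": [], "tools": ["search"]}
constraints = [
    lambda s: "search" in s["tools"],        # never drop the audited tool
    lambda s: len(s["prompt"]) < 200,        # keep the prompt bounded
]

ok = improve_scaffold(sigma_t, {"prompt": "Answer briefly and cite."},
                      constraints)
bad = improve_scaffold(sigma_t, {"tools": []}, constraints)
print(ok["prompt"], bad is sigma_t)  # rewrite accepted; unsafe one rolled back
```

Because the rewrite operates on a copy and must pass the gate, every step is reversible, which is precisely why this remains part of the "fast loop" despite its scope.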
Key Advantage: Scaffolding improvement is computationally efficient, highly adaptive, and reversible—representing the "fast loop" that complements parameter-based "slow loop" learning.
@article{si_agents_survey2026,
title={Self-Improvement in Foundation Model Based Agentic Systems: A Survey},
author={Name, Author and Others},
journal={arXiv preprint arXiv:2602.XXXXX},
year={2026}
}
* Corresponding authors