2026 Comprehensive Review

Self-Improvement in Foundation Model Based
Agentic Systems

A Comprehensive Survey on Architectures, Strategies, Applications and Future Directions

ABSTRACT

"Foundation models have revolutionized agentic systems, yet the transition from static prompt-following to autonomous self-improvement remains a critical frontier. This survey systematically categorizes the mechanisms of self-evolution into Foundation Model Improvement and Scaffolding Improvement. We analyze how agents leverage self-generated data, environment-interactive experiences, and recursive prompt optimization to transcend their initial constraints..."

Definition & Overview

Foundation Model Based Agentic Systems

Traditional Agents Diagram

A component-level view of an FM-based agent: model, prompt, tools, memory, and environment loop.

Self-Improvement Loop (Our Focus)

Self-Improvement Loop Diagram (panels: Foundation Model Improvement, Scaffolding Improvement)

Comparison of traditional agentic workflows vs. self-improving paradigms analyzed in this survey.


Taxonomy & Framework

Foundation Model Improvement

Category §1

  • Self-Generated Data

    Synthesize tasks and distill skills from rollouts.

  • Self-Generated Supervision

    Derive supervision from automated feedback and checks.

  • Self-Generated Experience

    Learn from interaction trajectories and delayed outcomes.

Scaffolding Improvement

Category §2

  • Prompt Optimization

    Refine prompts via feedback to boost reliability.

  • Memory Evolution

    Store, retrieve, and forget to reuse experience.

  • Tool Governance

    Select and audit tools for safe execution.

  • Full Scaffolding

    Rewrite scaffold policies under controlled constraints.

Detailed Methodologies


Foundation Model Improvement

Foundation Model Improvement directly targets the agent's core parameters θ_FM, updating neural weights to internalize new knowledge, skills, and reasoning patterns. This paradigm enables lasting capability enhancement through gradient-based optimization methods.

Update Formalism

θ_{t+1} = IMPROVE_θ(θ_t; S_t),   Σ_{t+1} = Σ_t

The scaffolding Σ_t remains fixed while the model parameters θ_t evolve based on the learning signal S_t.

Foundation Model Improvement Illustration


Three Learning Signal Categories

1. Self-Generated Data

S_t = D_t: The agent produces its own training instances

  • Methods: Automated task synthesis, self-training, self-distillation and filtering
  • Data Formats: Instruction-response pairs, reasoning traces, tool-use sequences, multimodal data
  • Applications: Instruction-following, mathematics, code generation, creative writing
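A minimal sketch of the synthesize, roll out, and filter pattern behind self-generated data. Everything here is a toy assumption: `synthesize_task`, `noisy_model`, and `passes_filter` are hypothetical stand-ins for a task generator, the agent's own rollout, and a quality filter, and the arithmetic domain is chosen only because correctness is trivial to check.

```python
import random


def synthesize_task(rng):
    """Hypothetical task generator: small addition problems."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    return {"prompt": f"{a} + {b} = ?", "a": a, "b": b}


def noisy_model(task, rng, error_rate=0.3):
    """Stand-in for the agent's rollout; off by one with probability error_rate."""
    answer = task["a"] + task["b"]
    if rng.random() < error_rate:
        answer += rng.choice([-1, 1])
    return answer


def passes_filter(task, response):
    """Quality filter; here an exact arithmetic check (an assumption)."""
    return response == task["a"] + task["b"]


def build_dataset(n_tasks, seed=0):
    """Keep only the rollouts that survive the filter as instruction-response pairs."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_tasks):
        task = synthesize_task(rng)
        response = noisy_model(task, rng)
        if passes_filter(task, response):
            dataset.append({"instruction": task["prompt"], "response": str(response)})
    return dataset


D_t = build_dataset(100)  # the learning signal S_t = D_t in this toy setting
```

In a real pipeline the filter is the hard part: it is rarely an exact check, and a weak filter lets the model train on its own mistakes.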

2. Self-Generated Supervision

S_t = r_t: The agent generates its own feedback signals

Outcome Verification

Executable tests and constraints as feedback
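As a hedged illustration of outcome verification, the sketch below turns a set of executable tests into a scalar feedback signal. The function name `outcome_reward` and the pass-rate scoring rule are assumptions made for this example, not a method taken from the survey.

```python
from typing import Callable, List, Tuple


def outcome_reward(candidate: Callable[[int], int], tests: List[Tuple[int, int]]) -> float:
    """Return the fraction of (input, expected) tests that the candidate passes."""
    passed = 0
    for x, expected in tests:
        try:
            if candidate(x) == expected:
                passed += 1
        except Exception:
            pass  # a crash counts as a failed test
    return passed / len(tests)


# Toy task: "square the input".
tests = [(0, 0), (2, 4), (-3, 9)]
r_good = outcome_reward(lambda x: x * x, tests)  # correct candidate
r_bad = outcome_reward(lambda x: x + x, tests)   # passes two of the three tests
```

The second candidate shows why test coverage matters: a wrong program can still earn partial credit when the tests are too easy.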

Self-Consistency

Consensus voting over diverse rollouts
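Consensus voting can be sketched in a few lines. The example assumes each rollout has already been reduced to a final answer string; `majority_vote` is a hypothetical helper name, and the agreement rate is one possible confidence proxy rather than a prescribed one.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer and the share of rollouts that agree with it."""
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


rollouts = ["42", "42", "41", "42", "40"]  # final answers from five sampled rollouts
answer, agreement = majority_vote(rollouts)
```

The winning answer can serve as a pseudo-label, and the agreement rate can gate which pseudo-labels are trusted.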

Self-Refinement

Iterative critique-and-rewrite using self-feedback
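The critique-and-rewrite loop can be sketched as follows. Both `critique` and `rewrite` are hypothetical placeholders: in an actual agent each would be a call to the foundation model, whereas here they are simple string rules so that the control flow can be run directly.

```python
def critique(draft, required=("input", "output", "example")):
    """Hypothetical critic: list required terms that the draft does not mention."""
    return [term for term in required if term not in draft.lower()]


def rewrite(draft, issues):
    """Hypothetical reviser: address only the first reported issue."""
    return f"{draft} Describe the {issues[0]}."


def self_refine(draft, max_rounds=5):
    """Alternate critique and rewrite until the critic is satisfied or rounds run out."""
    history = [draft]
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:
            break
        draft = rewrite(draft, issues)
        history.append(draft)
    return draft, history


final, history = self_refine("Write a function that sorts a list.")
```

The `max_rounds` cap is a deliberate design choice: without an external check, a self-critic may never declare the draft finished.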

3. Self-Generated Experience

S_t = τ_t: Learning from environmental interaction

Environment-Interaction

Direct engagement with real or simulated environments

Examples: web navigation, embodied control, simulated environments

World Models

Learned simulators enable fast, safe exploration

Techniques: learned simulators, imagined rollouts
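To make the distinction between real interaction and imagined rollouts concrete, here is a toy sketch. The 1-D environment, the tabular model, and every function name are assumptions for illustration; learned simulators in practice are neural models, not lookup tables.

```python
import random


def env_step(state, action, size=5):
    """Toy 1-D environment (an assumption): move left or right, clipped to [0, size - 1]."""
    return max(0, min(size - 1, state + action))


def collect_transitions(n_steps, seed=0):
    """Real interaction: a random policy gathers (state, action, next_state) triples."""
    rng = random.Random(seed)
    state, transitions = 0, []
    for _ in range(n_steps):
        action = rng.choice([-1, 1])
        next_state = env_step(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


def fit_world_model(transitions):
    """Tabular world model: remember the observed next state for each (state, action)."""
    return {(s, a): s_next for s, a, s_next in transitions}


def imagine(model, start, actions):
    """Roll out inside the learned model; stop where the model has no prediction."""
    trajectory = [start]
    for action in actions:
        key = (trajectory[-1], action)
        if key not in model:
            break
        trajectory.append(model[key])
    return trajectory


model = fit_world_model(collect_transitions(500))
plan = [1, 1, -1, 1]
imagined = imagine(model, 0, plan)  # no calls to env_step are made here
```

Because the toy environment is deterministic, every imagined step matches the real one; with a learned neural simulator, model error compounds over long rollouts.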

Key Trade-off: FM improvement operates on longer time scales and incurs substantial computational cost, but leads to persistent, global changes in the agent's capabilities and generalization behavior.

Scaffolding Improvement

Scaffolding Improvement modifies the operational framework Σ_t while keeping the foundation model parameters fixed. This "fast loop" enables rapid, reversible capability expansion by optimizing prompts, memory, and tools.

Update Formalism

Σ_{t+1} = IMPROVE_Σ(Σ_t; S_t),   θ_{t+1} = θ_t

Model parameters remain frozen while the scaffolding components evolve: Σ_t := (p_t, m_t, T_t)
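The two update rules (parameters change with the scaffold fixed, or the scaffold changes with parameters frozen) can be contrasted in a short sketch. The class and function names are hypothetical, the parameter vector is a two-element tuple standing in for θ, and the "improvement" steps are deliberately trivial.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Scaffold:
    prompt: str              # p_t
    memory: Tuple[str, ...]  # m_t
    tools: Tuple[str, ...]   # T_t


@dataclass(frozen=True)
class AgentState:
    theta: Tuple[float, ...]  # stand-in for the foundation model parameters
    scaffold: Scaffold        # Sigma_t = (p_t, m_t, T_t)


def improve_theta(state, gradient, lr=0.1):
    """Slow loop: update theta with a gradient step; the scaffold is carried over unchanged."""
    new_theta = tuple(w - lr * g for w, g in zip(state.theta, gradient))
    return replace(state, theta=new_theta)


def improve_scaffold(state, lesson):
    """Fast loop: append a lesson to memory; theta is carried over unchanged."""
    new_scaffold = replace(state.scaffold, memory=state.scaffold.memory + (lesson,))
    return replace(state, scaffold=new_scaffold)


s0 = AgentState(theta=(1.0, -2.0),
                scaffold=Scaffold("You are a helpful agent.", (), ("search",)))
s1 = improve_theta(s0, gradient=(0.5, -0.5))
s2 = improve_scaffold(s1, "Prefer cached results for repeated queries.")
```

Using frozen dataclasses makes the reversibility of scaffold edits visible: earlier states such as `s1` remain available for rollback.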

Four Improvement Levels

Prompt Optimization

p_{t+1} = IMPROVE_p(p_t; S_t)

Prompt Optimization Illustration


🔍 Black-Box

Scalar score optimization

🔄 Refinement

Qualitative feedback loops

🧬 Evolutionary

Population-based search

📐 TextGrad

Directional optimization
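As one concrete instance of the families above, the sketch below combines black-box scalar scoring with a small population-based search over prompts. The `score` function is a hypothetical stand-in for running an evaluation set, and the mutation operator (appending a phrase) is an assumption chosen for brevity.

```python
import random

PHRASES = ["Think step by step.", "Cite your sources.",
           "Answer concisely.", "Check your work."]
TARGET = ("Think step by step.", "Check your work.")  # phrases the toy score rewards


def score(prompt):
    """Hypothetical black-box score: reward target phrases, lightly penalize length."""
    return sum(phrase in prompt for phrase in TARGET) - 0.001 * len(prompt)


def mutate(prompt, rng):
    """Append a random phrase unless the prompt already contains it."""
    phrase = rng.choice(PHRASES)
    return prompt if phrase in prompt else f"{prompt} {phrase}"


def evolve(seed_prompt, generations=10, population_size=6, seed=0):
    """Population-based search with elitist selection on the scalar score."""
    rng = random.Random(seed)
    population = [seed_prompt]
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Deduplicate while preserving order so that runs are reproducible.
        pool = list(dict.fromkeys(population + children))
        # Parents compete with children, so the best score never decreases.
        population = sorted(pool, key=score, reverse=True)[:population_size]
    return population[0]


best = evolve("Solve the task.")
```

Refinement-style and TextGrad-style methods differ mainly in the feedback they use: natural-language critiques rather than the single scalar used here.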

Memory Evolution

m_{t+1} = IMPROVE_m(m_t; S_t)

Agent Memory Illustration


Memory Objects: Explicit (summaries, facts, entities) & Implicit (latent embeddings)
Memory Structures: Flat, Hierarchical, Graph-based, Vector retrieval (RAG)
CRUD Operations: Signal-driven Create, Read, Update, Delete with adaptive policies

Examples: AWM, MemoryBank, Mem0, G-Memory, ACE
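A minimal sketch of signal-driven CRUD over a flat, explicit memory. It is not the design of any system listed above: `MemoryStore`, the keyword-match read, and the "forget the least-used entry" policy are all assumptions made to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemoryStore:
    capacity: int = 3
    items: Dict[str, str] = field(default_factory=dict)
    hits: Dict[str, int] = field(default_factory=dict)  # usage signal per entry

    def create(self, key: str, value: str) -> None:
        """Add an entry; when full, forget the least-used entry first."""
        if key not in self.items and len(self.items) >= self.capacity:
            self.delete(min(self.hits, key=self.hits.get))
        self.items[key] = value
        self.hits.setdefault(key, 0)

    def read(self, query: str) -> List[str]:
        """Keyword match on keys (a stand-in for vector retrieval); counts each hit."""
        matches = [k for k in self.items if query.lower() in k.lower()]
        for k in matches:
            self.hits[k] += 1
        return [self.items[k] for k in matches]

    def update(self, key: str, value: str) -> None:
        """Overwrite an existing entry; unknown keys are ignored."""
        if key in self.items:
            self.items[key] = value

    def delete(self, key: str) -> None:
        self.items.pop(key, None)
        self.hits.pop(key, None)


memory = MemoryStore(capacity=2)
memory.create("login flow", "Use the SSO button first.")
memory.create("search tips", "Quote exact phrases.")
recalled = memory.read("login")
memory.create("export data", "CSV export is under Settings.")  # evicts "search tips"
```

The eviction policy is where the "adaptive" part lives: swapping the hit counter for recency or task reward changes what the agent forgets.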

Tool Governance

T_{t+1} = IMPROVE_T(T_t; S_t)

Tool Governance Illustration


Dynamic Routing

Efficient allocation & orchestration


Iterative Refinement

Debug & harden execution


Autonomous Creation

On-demand tool synthesis
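Dynamic routing with an audit trail can be sketched as below. The tool names, the numeric `cost` and `risk` fields, and the "cheapest tool within a risk budget" rule are hypothetical; they illustrate the select-and-audit idea rather than any specific system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tool:
    name: str
    capability: str
    cost: float  # hypothetical relative cost per call
    risk: float  # hypothetical risk score in [0, 1]


def route(tools: List[Tool], capability: str, max_risk: float) -> Optional[Tool]:
    """Pick the cheapest tool that offers the capability within the risk budget."""
    allowed = [t for t in tools if t.capability == capability and t.risk <= max_risk]
    return min(allowed, key=lambda t: t.cost) if allowed else None


registry = [
    Tool("cached_search", "search", cost=0.1, risk=0.1),
    Tool("live_web_search", "search", cost=1.0, risk=0.4),
    Tool("shell_exec", "execute", cost=0.5, risk=0.9),
]
audit_log: List[Tuple[str, str]] = []


def governed_call(capability: str, max_risk: float = 0.5) -> Optional[Tool]:
    """Route a request and record the decision, including denials, for later audit."""
    tool = route(registry, capability, max_risk)
    audit_log.append((capability, tool.name if tool else "DENIED"))
    return tool


search_tool = governed_call("search")
exec_tool = governed_call("execute")  # denied: shell_exec exceeds the risk budget
```

Logging denials as well as successes gives a later refinement step the signal it needs to decide whether a tool should be hardened or a budget relaxed.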

Full Scaffolding

Σ_{t+1} = IMPROVE_Σ(Σ_t; S_t)

Full Scaffolding Illustration


The most ambitious paradigm: agents modify the implementation and operational code that define them, enabling recursive self-improvement.
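A heavily simplified sketch of a guarded self-modification step. Real systems in this category rewrite agent code; here the scaffold is reduced to a configuration dictionary, and `evaluate`, `satisfies_constraints`, and the accept rule are all hypothetical choices meant only to show the propose, check, and accept-or-keep pattern.

```python
def evaluate(scaffold):
    """Hypothetical benchmark score; higher is better."""
    return scaffold["max_steps"] * 0.1 - scaffold["temperature"]


def satisfies_constraints(scaffold):
    """Hypothetical limits on what a self-edit is allowed to change."""
    return 1 <= scaffold["max_steps"] <= 20 and 0.0 <= scaffold["temperature"] <= 1.0


def guarded_update(current, proposal):
    """Accept a self-proposed scaffold only if it is valid and scores strictly higher."""
    if satisfies_constraints(proposal) and evaluate(proposal) > evaluate(current):
        return proposal, True
    return current, False  # keep the current scaffold; the edit is discarded


scaffold = {"max_steps": 5, "temperature": 0.7}
proposals = [
    {"max_steps": 50, "temperature": 0.7},  # violates the step limit
    {"max_steps": 8, "temperature": 0.2},   # valid and better
    {"max_steps": 6, "temperature": 0.9},   # valid but worse
]
accepted = []
for proposal in proposals:
    scaffold, ok = guarded_update(scaffold, proposal)
    accepted.append(ok)
```

Checking constraints before evaluation matters once proposals are executable code, since evaluating an unsafe candidate is itself a risk.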

Examples: GPTSwarm, Darwin Gödel Machine, AlphaEvolve, Huxley-Gödel Machine, ADAS, Live-SWE-Agent

Key Advantage: Scaffolding improvement is computationally efficient, highly adaptive, and reversible. It is the "fast loop" that complements the parameter-based "slow loop" of foundation model learning.

Cite our Work

@article{si_agents_survey2026,
  title={Self-Improvement in Foundation Model Based Agentic Systems: A Survey},
  author={Name, Author and Others},
  journal={arXiv preprint arXiv:2602.XXXXX},
  year={2026}
}
Zhe Ren1 Jilin University
Yimeng Chen2* KAUST
Dandan Guo1,2* Jilin University · KAUST
Guowei Rong1 Jilin University
Tonghui Li1 Jilin University
R.B. Xiong3 Independent Researcher
Qingfeng Lan4 University of Alberta
Nanbo Li2 KAUST
Yibo Yang2 KAUST
Mingchen Zhuge2 KAUST
Jürgen Schmidhuber2,5 KAUST · IDSIA
Affiliations:
1 Jilin University · 2 King Abdullah University of Science and Technology (KAUST) · 3 Independent Researcher · 4 University of Alberta · 5 The Swiss AI Lab IDSIA/USI/SUPSI

* Corresponding authors

guodandan@jlu.edu.cn, {renzhe25, ronggw25, lith}@mails.jlu.edu.cn
{yimeng.chen, nanbo.li, yibo.yang, mingchen.zhuge, juergen.schmidhuber}@kaust.edu.sa
qlan3@ualberta.ca, rbxiong1@outlook.com