28
References · the papers behind the manual
Selected references and further reading.
Every citation in this manual links to its entry below. This list is curated, not exhaustive: it covers the work that most directly shaped the recommendations in each chapter. Where a paper appeared at a conference, the conference and year are given; arXiv preprints are noted as such with their identifier. Open-access links are provided where available.
Inline citation format used in chapters: Author et al., Venue Year. Click any citation to jump to the matching entry on this page.
Foundational agent papers
- (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023. arXiv:2210.03629
- (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023. arXiv:2303.11366
- (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. NeurIPS 2023. arXiv:2302.04761
- (2023). Self-Refine: Iterative Refinement with Self-Feedback. NeurIPS 2023. arXiv:2303.17651
- (2023). ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. arXiv preprint. arXiv:2307.16789
Multi-agent frameworks & orchestration
- (2024). MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. ICLR 2024. arXiv:2308.00352
- (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv preprint. arXiv:2308.08155
- (2024). Improving Factuality and Reasoning in Language Models through Multiagent Debate. ICML 2024. arXiv:2305.14325
- (2025). Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate. ICLR 2025. ICLR 2025 slides
- (2025). MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems. arXiv preprint. arXiv:2503.03686
- (2025). Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation. arXiv preprint. arXiv:2506.09046
- (2025). Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI. arXiv preprint. arXiv:2509.20175
- (2025). An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems. arXiv preprint. arXiv:2505.18397
- (2025). MAR: Multi-Agent Reflexion Improves Reasoning Abilities in LLMs. arXiv preprint. arXiv:2512.20845
- (2026). Benefits and Limitations of Communication in Multi-Agent Reasoning. ICLR 2026. Mila announcement
- (2026). Agents Arena: 103,000 battles across 31 scenarios in finance, healthcare, legal, and cybersecurity. ICLR 2026 Agents in the Wild Workshop. Lambda blog
- (2024). Multi-Agent Large Language Models for Conversational Task-Solving. arXiv preprint. arXiv:2410.22932
Protocols & interoperability
- (2025). A Survey of Agent Interoperability Protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP). arXiv preprint. arXiv:2505.02279
- (2025). Survey of LLM Agent Communication with MCP: A Software Design Pattern Centric Review. arXiv preprint. arXiv:2506.05364
- (2025). Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions. arXiv preprint. arXiv:2503.23278
- (2026). Model Context Protocol (MCP) Tool Descriptions Are Smelly! Towards Improving AI Agent Efficiency with Augmented MCP Tool Descriptions. arXiv preprint. arXiv:2602.14878
- (2026). Enhancing Model Context Protocol (MCP) with Context-Aware Server Collaboration. arXiv preprint. arXiv:2601.11595
- (December 2025). Donation of Model Context Protocol to AAIF. News announcement. Wikipedia summary
Memory & long-context reasoning
- (2025). Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory. arXiv preprint. arXiv:2504.19413
- (2025). Zep: A Temporal Knowledge Graph Architecture for Agent Memory. arXiv preprint. arXiv:2501.13956
- (2025). MIRIX: Multi-Agent Memory System for LLM-Based Agents. arXiv preprint. arXiv:2507.07957
- (2025). LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning. arXiv preprint. arXiv:2511.01448
- (2026). MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents. arXiv preprint. arXiv:2604.04853
- (2026). Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning. arXiv preprint. arXiv:2602.18493
- (2025). Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents. arXiv preprint. arXiv:2509.23040
- (2025). COSMIR: Chain Orchestrated Structured Memory for Iterative Reasoning over Long Context. arXiv preprint. arXiv:2510.04568
- (2026). MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning. arXiv preprint. arXiv:2601.21468
- (2025). QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management. arXiv preprint. arXiv:2512.12967
- (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Chroma research. Chroma research blog
- (2026). Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models. ICLR 2026. OpenReview · arXiv:2511.04108. Disclosure: co-authored by the manual's author (Singh) and reviewers (Srivastava, Dey).
- (2025). LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory. ICLR 2025. arXiv:2410.10813
Reflection & self-improvement
- (2026). Learn Like Humans: Use Meta-cognitive Reflection for Efficient Self-Improvement. arXiv preprint. arXiv:2601.11974
- (2025). SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection. arXiv preprint. arXiv:2509.20562
- (2025). WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback. arXiv preprint. arXiv:2505.20013
- (2026). Agentic Critical Training. arXiv preprint. arXiv:2603.08706
- (2026). ParamMem: Augmenting Language Agents with Parametric Reflective Memory. arXiv preprint. arXiv:2602.23320
- (2026). Teaching Large Reasoning Models Effective Reflection. arXiv preprint. arXiv:2601.12720
- (2025). Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent. arXiv preprint. arXiv:2509.03990
- (2025). ILR: Interactive Learning for LLM Reasoning. arXiv preprint. arXiv:2509.26306
Evaluation & benchmarks
- (2024). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? ICLR 2024. arXiv:2310.06770
- (2024). OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. NeurIPS 2024. arXiv:2404.07972
- (2024). GAIA: A Benchmark for General AI Assistants. ICLR 2024. arXiv:2311.12983
- (2024). TAU-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. arXiv preprint. arXiv:2406.12045
- (April 2026). How We Broke Top AI Agent Benchmarks. RDI Blog. Berkeley RDI blog
- (2026). OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models. arXiv preprint. arXiv:2604.10866
- (2024). Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale. arXiv preprint. arXiv:2409.08264
- (2026). AI Agent Benchmarking Infrastructure on GPU Cloud: Run SWE-bench, GAIA, Terminal-Bench, and OSWorld at Scale. Spheron Blog. Spheron 2026 guide
- (2026). SWE-Context Bench: A Benchmark for Context Learning in Coding. arXiv preprint. arXiv:2602.08316
Security & prompt injection
- (2023). Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. AISec Workshop, CCS 2023. arXiv:2302.12173
- (2024). Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems. arXiv preprint. arXiv:2410.07283
- (2025). Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks? arXiv preprint. arXiv:2510.05244
- (2026). Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs. arXiv preprint. arXiv:2604.03870
- (2026). Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review. MDPI Information. MDPI 17(1):54
- (2025). OWASP Top 10 for LLM Applications 2025. OWASP. owasp.org
- (2025). Indirect Prompt Injection: The Hidden Threat Breaking Modern AI Systems. Lakera blog. lakera.ai
- (2025). Indirect Prompt Injection Attacks: Hidden AI Risks. CrowdStrike blog. crowdstrike.com
- (April 2026). Indirect Prompt Injection is Taking Hold in the Wild. Help Net Security. helpnetsecurity.com
Agent security frontier (2025–26 attacks and standards)
- (September 2025). Cross-Agent Privilege Escalation: When Agents Free Each Other. Embrace The Red (security research blog); also disclosed as CVE-2025-53773 with co-discoverer Markus Vervier. embracethered.com
- (October 2025). When AI Agents Go Rogue: Agent Session Smuggling Attack in A2A Systems. Unit 42 threat research. unit42.paloaltonetworks.com
- (December 2025). OWASP Top 10 for Agentic Applications 2026. OWASP Foundation; ASI01–ASI10 risk categories with the introduction of "least agency" as a guiding principle. genai.owasp.org
- (March 2026). Control the Chain, Secure the System: Fixing AI Agent Delegation. CSA blog; introduces the four foundations: scope attenuation, token-level lineage verification, persistent context alignment, out-of-band human approval. cloudsecurityalliance.org
- (2026). Human Delegation Provenance Protocol (HDP). IETF Internet-Draft draft-helixar-hdp-agentic-delegation-00; companion paper at arXiv:2604.04522. Ed25519-signed append-only delegation chains with offline verification. datatracker.ietf.org · arXiv:2604.04522
- (March 2026). Agent Identity Protocol (AIP): Verifiable Delegation for AI Agent Systems. IETF Internet-Draft draft-prakash-aip-00; companion paper at arXiv:2603.24775. Introduces Invocation-Bound Capability Tokens (IBCTs) in JWT and Biscuit/Datalog flavors. ietf.org · arXiv:2603.24775
Computer-use agents (NeurIPS 2025)
- (November 2025). NeurIPS 2025: 45 Computer-Use Agent Papers You Should Know About. Cua Blog. cua.ai blog
- (2025). OSWorld-G and Jedi: Scaling GUI Grounding with 4 Million Synthetic Examples. NeurIPS 2025. NeurIPS 2025 poster
Surveys & landscape papers
- (2026). Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents. arXiv preprint. arXiv:2601.12560
- (2026). From Prompt-Response to Goal-Directed Systems: The Evolution of Agentic AI Software Architecture. arXiv preprint. arXiv:2602.10479
- (2025). Agentic AI Frameworks: Architectures, Protocols, and Design Challenges. IEEE arXiv preprint. arXiv:2508.10146
- (2026). From the logic of coordination to goal-directed reasoning: the agentic turn in artificial intelligence. Frontiers in AI. Frontiers DOI
Guidance, rewards & preference learning
- (2025–2026). Reward Shaping to Mitigate Reward Hacking in RLHF. arXiv preprint, Feb 2025, latest revision Jan 2026. arXiv:2502.18770
- (2025–2026). Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning (CRM). arXiv preprint, Nov 2025, last revised Jan 2026. arXiv:2511.16202
Future directions: self-improvement, world models, embodied agents
- (2026). Self-Improving AI Agents: The 2026 Guide. Industry overview, March 2026. o-mega.ai
- (2025). SIMA 2: A Generalist Embodied Agent for Virtual Worlds. arXiv preprint, December 2025. arXiv:2512.04797
- (2025). EvoAgent: Self-evolving Agent with Continual World Model for Long-Horizon Tasks. arXiv preprint, February 2025. arXiv:2502.05907
- (2025). Embodied AI Agents: Modeling the World. arXiv preprint, June 2025. arXiv:2506.22355
Trust, privileges & reputation in multi-agent systems
- (2025). Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design: A2A, AP2, ERC-8004, and Beyond. arXiv preprint, November 2025. arXiv:2511.03434
- (2025). DRF: LLM-AGENT Dynamic Reputation Filtering Framework. arXiv preprint, September 2025. arXiv:2509.05764
- (2026). The Provenance Paradox in Multi-Agent LLM Routing: Delegation Contracts and Attested Identity in LDP. arXiv preprint, March 2026. arXiv:2603.18043
- (2025). AgentBound Tokens (ABTs): AI-Governed Agent Architecture for Web-Trustworthy Tokenization of Alternative Assets. arXiv preprint, June 2025. arXiv:2507.00096
- (2026). CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents. arXiv preprint, January 2026. arXiv:2601.09923
Beyond software: medical and scientific frontiers
- (2025). AI-Powered early warning systems for clinical deterioration significantly improve patient outcomes: a meta-analysis. BMC Medical Informatics and Decision Making, 25:203. doi:10.1186/s12911-025-03048-x
- (2025). Early Warning Model for Patient Deterioration: A Machine Learning Approach for Nurse-Led Monitoring. medRxiv preprint, June 2025. medRxiv:2025.06.20.25329978
- (2025). AIME-ICU: Exploring the Use of AI-Assisted Video Monitoring to Predict Accidental Events in ICU Patients. ClinicalTrials.gov NCT07307521 (observational, 300 patients). NCT07307521
- (2025). Artificial intelligence applications in intensive care unit nursing: A narrative review (2020-2025). Open access journal, 2025. PMC12701216
- (2025). Machine Learning and Artificial Intelligence in Intensive Care Medicine: Critical Recalibrations from Rule-Based Systems to Frontier Models. Journal of Clinical Medicine, June 2025. MDPI 14:4026
- (2025). Brain Harmony: A unified 1D token representation integrating structural and functional MRI for foundation-model neuroimaging. Preprint, September 2025 (described in the Foundation Models for Neuroimaging survey). survey overview
- (2025). Prima: Health-system-scale neuroimaging diagnosis with explainable, fair, generalizable reasoning. Preprint, September 2025 (mean diagnostic AUROC 0.92). survey overview
- (2025). Unsupervised anomaly detection in brain MRI via VAE-, GAN-, and diffusion-model reconstruction. Preprint, October 2025. related work in the same family
- (2024). Brain decoding: toward real-time reconstruction of visual perception. Meta FAIR / ENS PSL University, ICLR 2024 spotlight. arXiv:2310.19812
- (2025). fMRI Brain Decoding and Its Applications in Brain–Computer Interface: A Survey. Open access (PubMed Central). PMC8869956
- (2025). Autonomous 'self-driving' laboratories: a review of technology and policy implications. Royal Society Open Science 12(7):250646, July 2025. doi:10.1098/rsos.250646
- (2024). Self-driving laboratories to autonomously navigate the protein fitness landscape (SAMPLE platform). Nature Chemical Engineering / bioRxiv, 2023–2024. bioRxiv preprint
- (2025). AI, agentic models and lab automation for scientific discovery: the beginning of scAInce. Frontiers in Artificial Intelligence, August 2025. doi:10.3389/frai.2025.1649155
- (2026). Inside the 'self-driving' lab revolution (covers Periodic Labs, founded 2025 by Liam Fedus and Ekin Dogus Cubuk). Nature, March 2026. Nature feature
Citation discipline grows on you. The first time you find a paper you cited two months ago and can't quite remember why, you understand. Hyperlinked references from the start are worth the small upfront effort.