28 References · the papers behind the manual

Selected references and further reading.

Every citation in this manual links to its entry below. This list is curated, not exhaustive: it covers the work that most directly shaped the recommendations in each chapter. Where a paper appeared at a conference, the conference and year are given; arXiv preprints are noted as such with their identifier. Open-access links are provided where available.

Inline citation format used in chapters: Author et al., Venue Year. Click any citation to jump to the matching entry on this page.

  1. Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023. arXiv:2210.03629
  2. Shinn, N., Cassano, F., Berman, E., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. NeurIPS 2023. arXiv:2303.11366
  3. Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. NeurIPS 2023. arXiv:2302.04761
  4. Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback. NeurIPS 2023. arXiv:2303.17651
  5. Qin, Y., Liang, S., Ye, Y., Zhu, K., Yan, L., Lu, Y., Lin, Y., Cong, X., Tang, X., Qian, B., et al. (2023). ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. arXiv preprint. arXiv:2307.16789
  1. Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Zhang, C., et al. (2024). MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. ICLR 2024. arXiv:2308.00352
  2. Wu, Q., Bansal, G., Zhang, J., Wu, Y., Zhang, S., Zhu, E., Li, B., Jiang, L., Zhang, X., & Wang, C. (2023). AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv preprint. arXiv:2308.08155
  3. Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., & Mordatch, I. (2024). Improving Factuality and Reasoning in Language Models through Multiagent Debate. ICML 2024. arXiv:2305.14325
  4. Liu et al. (2025). Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate. ICLR 2025. ICLR 2025 slides
  5. Ye, R., Pang, S., Chai, Y., Chen, J., Yin, X., Zhang, Z., Lu, H., Liang, Y., Yan, Q., Wang, Y., Chen, S., & Shao, J. (2025). MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems. arXiv preprint. arXiv:2503.03686
  6. Lin, X., Wang, J., Tian, Q., Yu, Y., Yang, Y., Cao, S., et al. (2025). Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation. arXiv preprint. arXiv:2506.09046
  7. Federation of Agents authors (2025). Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI. arXiv preprint. arXiv:2509.20175
  8. Multi-agent outlook authors (2025). An Outlook on the Opportunities and Challenges of Multi-Agent AI Systems. arXiv preprint. arXiv:2505.18397
  9. Ozer, O., et al. (2025). MAR: Multi-Agent Reflexion Improves Reasoning Abilities in LLMs. arXiv preprint. arXiv:2512.20845
  10. Rizvi-Martel, M., Bhattamishra, S., Rathi, N., Rabusseau, G., & Hahn, M. (2026). Benefits and Limitations of Communication in Multi-Agent Reasoning. ICLR 2026. Mila announcement
  11. Lambda team (2026). Agents Arena: 103,000 battles across 31 scenarios in finance, healthcare, legal, and cybersecurity. ICLR 2026 Agents in the Wild Workshop. Lambda blog
  12. Becker, J. (2024). Multi-Agent Large Language Models for Conversational Task-Solving. arXiv preprint. arXiv:2410.22932
  1. Yang, Y., Chai, H., Song, Y., Qi, S., Wen, M., Li, N., Liao, J., Hu, H., Lin, J., Liu, G., et al. (2025). A Survey of Agent Interoperability Protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP). arXiv preprint. arXiv:2505.02279
  2. Singh, A., Ehtesham, A., Kumar, S., & Khoei, T. T. (2025). Survey of LLM Agent Communication with MCP: A Software Design Pattern Centric Review. arXiv preprint. arXiv:2506.05364
  3. Hou, X., Zhao, Y., Wang, S., & Wang, H. (2025). Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions. arXiv preprint. arXiv:2503.23278
  4. MCP tool smells authors (2026). Model Context Protocol (MCP) Tool Descriptions Are Smelly! Towards Improving AI Agent Efficiency with Augmented MCP Tool Descriptions. arXiv preprint. arXiv:2602.14878
  5. Context-Aware MCP authors (2026). Enhancing Model Context Protocol (MCP) with Context-Aware Server Collaboration. arXiv preprint. arXiv:2601.11595
  6. Anthropic / Linux Foundation Agentic AI Foundation (December 2025). Donation of Model Context Protocol to AAIF. News announcement. Wikipedia summary
  1. Chhikara, P., et al. (2025). Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory. arXiv preprint. arXiv:2504.19413
  2. Rasmussen, P., et al. (2025). Zep: A Temporal Knowledge Graph Architecture for Agent Memory. arXiv preprint. arXiv:2501.13956
  3. Wang, Y., & Chen, P. (2025). MIRIX: Multi-Agent Memory System for LLM-Based Agents. arXiv preprint. arXiv:2507.07957
  4. LiCoMemory authors (2025). LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning. arXiv preprint. arXiv:2511.01448
  5. MemMachine authors (2026). MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents. arXiv preprint. arXiv:2604.04853
  6. Memory Agents authors (2026). Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning. arXiv preprint. arXiv:2602.18493
  7. Yao, R., Wang, X., Zhang, A., et al. (2025). Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents. arXiv preprint. arXiv:2509.23040
  8. COSMIR authors (2025). COSMIR: Chain Orchestrated Structured Memory for Iterative Reasoning over Long Context. arXiv preprint. arXiv:2510.04568
  9. MemOCR authors (2026). MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning. arXiv preprint. arXiv:2601.21468
  10. QwenLong-L1.5 team (2026). QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management. arXiv preprint. arXiv:2512.12967
  11. Hong, K., et al. (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Chroma research. Chroma research blog
  12. Srivastava, S., Bidhan, J., Yan, H., Dey, A., Kansal, T., Kath, P., Mansouri, S., Marvania, M., Simhadri, V. S., & Singh, G. (2026). Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models. ICLR 2026. OpenReview · arXiv:2511.04108. Disclosure: co-authored by the manual's author (Singh) and reviewers (Srivastava, Dey).
  13. Wu, D., et al. (2025). LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory. ICLR 2025. arXiv:2410.10813
  1. MARS authors (2026). Learn Like Humans: Use Meta-cognitive Reflection for Efficient Self-Improvement. arXiv preprint. arXiv:2601.11974
  2. SAMULE authors (2025). SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection. arXiv preprint. arXiv:2509.20562
  3. WebCoT authors (2025). WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback. arXiv preprint. arXiv:2505.20013
  4. Agentic Critical Training authors (2026). Agentic Critical Training. arXiv preprint. arXiv:2603.08706
  5. ParamMem authors (2026). ParamMem: Augmenting Language Agents with Parametric Reflective Memory. arXiv preprint. arXiv:2602.23320
  6. Teaching Reasoning Models authors (2026). Teaching Large Reasoning Models Effective Reflection. arXiv preprint. arXiv:2601.12720
  7. Wu, C., et al. (2025). Meta-Policy Reflexion: Reusable Reflective Memory and Rule Admissibility for Resource-Efficient LLM Agent. arXiv preprint. arXiv:2509.03990
  8. Qin, C., et al. (2025). ILR: Interactive Learning for LLM Reasoning. arXiv preprint. arXiv:2509.26306
  1. Jimenez, C. E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., & Narasimhan, K. (2024). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? ICLR 2024. arXiv:2310.06770
  2. Xie, T., Zhang, D., Chen, J., Li, X., Zhao, S., Cao, R., et al. (2024). OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. NeurIPS 2024. arXiv:2404.07972
  3. Mialon, G., Fourrier, C., Swift, C., Wolf, T., LeCun, Y., & Scialom, T. (2024). GAIA: A Benchmark for General AI Assistants. ICLR 2024. arXiv:2311.12983
  4. Yao, S., Shinn, N., Razavi, P., & Narasimhan, K. (2024). TAU-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. arXiv preprint. arXiv:2406.12045
  5. Berkeley RDI (April 2026). How We Broke Top AI Agent Benchmarks. RDI Blog. Berkeley RDI blog
  6. OccuBench authors (2026). OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models. arXiv preprint. arXiv:2604.10866
  7. Windows Agent Arena team (2024). Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale. arXiv preprint. arXiv:2409.08264
  8. Spheron team (2026). AI Agent Benchmarking Infrastructure on GPU Cloud: Run SWE-bench, GAIA, Terminal-Bench, and OSWorld at Scale. Spheron Blog. Spheron 2026 guide
  9. SWE-Context Bench authors (2026). SWE-Context Bench: A Benchmark for Context Learning in Coding. arXiv preprint. arXiv:2602.08316
  1. Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. AISec Workshop, CCS 2023. arXiv:2302.12173
  2. Lee, D. R., & Tiwari, M. (2024). Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems. arXiv preprint. arXiv:2410.07283
  3. Firewall benchmark authors (2025). Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks? arXiv preprint. arXiv:2510.05244
  4. Brittle agents authors (2026). Your Agent is More Brittle Than You Think: Uncovering Indirect Injection Vulnerabilities in Agentic LLMs. arXiv preprint. arXiv:2604.03870
  5. Prompt Injection Review authors (2026). Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review. MDPI Information. MDPI 17(1):54
  6. OWASP (2025). OWASP Top 10 for LLM Applications 2025. OWASP. owasp.org
  7. Lakera (2025). Indirect Prompt Injection: The Hidden Threat Breaking Modern AI Systems. Lakera blog. lakera.ai
  8. CrowdStrike (2025). Indirect Prompt Injection Attacks: Hidden AI Risks. CrowdStrike blog. crowdstrike.com
  9. Help Net Security / Google (April 2026). Indirect Prompt Injection is Taking Hold in the Wild. Help Net Security. helpnetsecurity.com
  1. Rehberger, J. (September 2025). Cross-Agent Privilege Escalation: When Agents Free Each Other. Embrace The Red (security research blog); also disclosed as CVE-2025-53773 with co-discoverer Markus Vervier. embracethered.com
  2. Palo Alto Networks Unit 42 (October 2025). When AI Agents Go Rogue: Agent Session Smuggling Attack in A2A Systems. Unit 42 threat research. unit42.paloaltonetworks.com
  3. OWASP GenAI Security Project (Sotiropoulos, J., Katz, K., Del Rosario, R. F., et al.) (December 2025). OWASP Top 10 for Agentic Applications 2026. OWASP Foundation; defines the ASI01-ASI10 risk categories and introduces "least agency" as a guiding principle. genai.owasp.org
  4. Cloud Security Alliance (March 2026). Control the Chain, Secure the System: Fixing AI Agent Delegation. CSA blog; introduces the four foundations: scope attenuation, token-level lineage verification, persistent context alignment, out-of-band human approval. cloudsecurityalliance.org
  5. Helixar Labs (2026). Human Delegation Provenance Protocol (HDP). IETF Internet-Draft draft-helixar-hdp-agentic-delegation-00; companion paper at arXiv:2604.04522. Ed25519-signed append-only delegation chains with offline verification. datatracker.ietf.org · arXiv:2604.04522
  6. Prakash, et al. (March 2026). Agent Identity Protocol (AIP): Verifiable Delegation for AI Agent Systems. IETF Internet-Draft draft-prakash-aip-00; companion paper at arXiv:2603.24775. Introduces Invocation-Bound Capability Tokens (IBCTs) in JWT and Biscuit/Datalog flavors. ietf.org · arXiv:2603.24775
  1. Cua AI (November 2025). NeurIPS 2025: 45 Computer-Use Agent Papers You Should Know About. Cua Blog. cua.ai blog
  2. Jedi / OSWorld-G authors (2025). OSWorld-G and Jedi: Scaling GUI Grounding with 4 Million Synthetic Examples. NeurIPS 2025. NeurIPS 2025 poster
  1. Architectures / Taxonomies authors (2026). Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents. arXiv preprint. arXiv:2601.12560
  2. Prompt-Response to Goal-Directed authors (2026). From Prompt-Response to Goal-Directed Systems: The Evolution of Agentic AI Software Architecture. arXiv preprint. arXiv:2602.10479
  3. Agentic Frameworks authors (2025). Agentic AI Frameworks: Architectures, Protocols, and Design Challenges. IEEE arXiv preprint. arXiv:2508.10146
  4. Haidemariam, T. (2026). From the logic of coordination to goal-directed reasoning: the agentic turn in artificial intelligence. Frontiers in Artificial Intelligence. Frontiers DOI
  1. Fu, J., Zhao, X., Yao, C., Wang, H., Han, Q., & Xiao, Y. (2025–2026). Reward Shaping to Mitigate Reward Hacking in RLHF. arXiv preprint, February 2025, last revised January 2026. arXiv:2502.18770
  2. Yang, P., Zhang, K., Wang, J., Chen, X., Tang, Y., Yang, E., Ai, L., & Shi, B. (2025–2026). Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning (CRM). arXiv preprint, November 2025, last revised January 2026. arXiv:2511.16202
  1. o-mega Research (2026). Self-Improving AI Agents: The 2026 Guide. Industry overview, March 2026. o-mega.ai
  2. Google DeepMind SIMA team (2025). SIMA 2: A Generalist Embodied Agent for Virtual Worlds. arXiv preprint, December 2025. arXiv:2512.04797
  3. Feng, T., Wang, X., Zhou, Z., Wang, R., Zhan, Y., Li, G., Li, Q., & Zhu, W. (2025). EvoAgent: Self-evolving Agent with Continual World Model for Long-Horizon Tasks. arXiv preprint, February 2025. arXiv:2502.05907
  4. Embodied AI Survey authors (2025). Embodied AI Agents: Modeling the World. arXiv preprint, June 2025. arXiv:2506.22355
  1. Inter-Agent Trust authors (2025). Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design: A2A, AP2, ERC-8004, and Beyond. arXiv preprint, November 2025. arXiv:2511.03434
  2. Bouchiha, M. A., et al. (2025). DRF: LLM-AGENT Dynamic Reputation Filtering Framework. arXiv preprint, September 2025. arXiv:2509.05764
  3. Prakash, S. (2026). The Provenance Paradox in Multi-Agent LLM Routing: Delegation Contracts and Attested Identity in LDP. arXiv preprint, March 2026. arXiv:2603.18043
  4. Chaffer, T. J. (2025). AgentBound Tokens (ABTs): AI-Governed Agent Architecture for Web-Trustworthy Tokenization of Alternative Assets. arXiv preprint, June 2025. arXiv:2507.00096
  5. Foerster, M., Blanchard, N., et al. (2026). CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents. arXiv preprint, January 2026. arXiv:2601.09923
  1. Wang, J., et al. (2025). AI-Powered early warning systems for clinical deterioration significantly improve patient outcomes: a meta-analysis. BMC Medical Informatics and Decision Making, 25:203. doi:10.1186/s12911-025-03048-x
  2. Authors of medRxiv 2025.06.20.25329978 (2025). Early Warning Model for Patient Deterioration: A Machine Learning Approach for Nurse-Led Monitoring. medRxiv preprint, June 2025. medRxiv:2025.06.20.25329978
  3. Shanghai Zhongshan Hospital, Fudan University (2025). AIME-ICU: Exploring the Use of AI-Assisted Video Monitoring to Predict Accidental Events in ICU Patients. ClinicalTrials.gov NCT07307521 (observational, 300 patients). NCT07307521
  4. Authors of PMC12701216 (2025). Artificial intelligence applications in intensive care unit nursing: A narrative review (2020-2025). Open access journal, 2025. PMC12701216
  5. Authors of MDPI Clinical Medicine 14:4026 (2025). Machine Learning and Artificial Intelligence in Intensive Care Medicine: Critical Recalibrations from Rule-Based Systems to Frontier Models. Journal of Clinical Medicine, June 2025. MDPI 14:4026
  6. Dong, et al. (2025). Brain Harmony: A unified 1D token representation integrating structural and functional MRI for foundation-model neuroimaging. Preprint, September 2025 (described in the Foundation Models for Neuroimaging survey). survey overview
  7. Lyu, et al. (2025). Prima: Health-system-scale neuroimaging diagnosis with explainable, fair, generalizable reasoning. Preprint, September 2025 (mean diagnostic AUROC 0.92). survey overview
  8. Mahé, et al. (2025). Unsupervised anomaly detection in brain MRI via VAE-, GAN-, and diffusion-model reconstruction. Preprint, October 2025. related work in the same family
  9. Benchetrit, Y., Banville, H., & King, J.-R. (2024). Brain decoding: toward real-time reconstruction of visual perception. Meta FAIR / ENS PSL University, ICLR 2024 spotlight. arXiv:2310.19812
  10. Authors of PMC8869956 (2022). fMRI Brain Decoding and Its Applications in Brain–Computer Interface: A Survey. PMC8869956. PMC8869956
  11. Tobias, A. V., & Wahab, A. (2025). Autonomous 'self-driving' laboratories: a review of technology and policy implications. Royal Society Open Science 12(7):250646, July 2025. doi:10.1098/rsos.250646
  12. Rapp, J. T., Bremer, B. J., & Romero, P. A. (2024). Self-driving laboratories to autonomously navigate the protein fitness landscape (SAMPLE platform). Nature Chemical Engineering / bioRxiv, 2023-2024. bioRxiv preprint
  13. Hartung, T. (2025). AI, agentic models and lab automation for scientific discovery: the beginning of scAInce. Frontiers in Artificial Intelligence, August 2025. doi:10.3389/frai.2025.1649155
  14. Nature News (2026). Inside the 'self-driving' lab revolution (covers Periodic Labs, founded 2025 by Liam Fedus and Ekin Dogus Cubuk). Nature, March 2026. Nature feature
Citation discipline grows on you. The first time you return to a paper you cited two months ago and cannot remember why you cited it, the value becomes obvious. Linking references from the start is worth the small upfront effort.