Why AI Agents Need Their Own Risk Management Framework
Riccardo Fina
Riccardo Fina holds a Master’s degree in Artificial Intelligence for Science and Technology from the University of Milan, University of Milan-Bicocca, and University of Pavia. He has research experience at the Technical University of Munich (TUM) and the Institute for Ethics in Artificial Intelligence (IEAI), focusing on AI ethics and safety.
Artificial intelligence is no longer just a tool that responds to prompts. Over the past two years, a fundamental shift has taken place in how AI systems are designed and deployed. The emergence of AI agents, systems that can act autonomously, interact with complex environments, and pursue objectives over extended time horizons, represents a departure from the static, input-output paradigm that dominated the early wave of large language model adoption. By early 2025, OpenAI’s Chief Product Officer had already declared it “the year of AI agents,” and industry data confirms this is not mere rhetoric: a growing number of organizations are already scaling agentic AI systems within their operations, while many others are actively experimenting with them (McKinsey & Company, The State of AI in 2025). The technology is moving fast. The question is whether our governance and risk management tools are keeping pace. The short answer, at present, is that they are not.
To understand why existing frameworks fall short, it helps to consider what makes an AI agent fundamentally different from a conventional AI system. A large language model, on its own, generates text in response to a query. While it may retain limited contextual memory within a session, it lacks the capacity to act on the world independently or to pursue sustained objectives across time. An AI agent, by contrast, is built around that same language model but augmented with planning capabilities, persistent memory, and the ability to use external tools such as web search, code execution, or API calls. These additional layers transform a passive text generator into a system that can decompose complex tasks, execute multi-step plans, and adapt its behavior over time.
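To make that difference concrete, here is a minimal sketch of such an agent loop in Python. Everything in it, the plan-then-act protocol, the tool dispatch format, and the step budget, is an illustrative assumption rather than any particular framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Memory:
    """Persistent record of past steps, carried across the whole task."""
    events: list[str] = field(default_factory=list)

    def recall(self) -> str:
        # Only the most recent events are fed back as context.
        return "\n".join(self.events[-10:])

def run_agent(goal: str,
              llm: Callable[[str], str],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 20) -> str:
    """The loop that separates an agent from a one-shot LLM call:
    plan -> act through a tool -> observe -> remember -> repeat."""
    memory = Memory()
    for _ in range(max_steps):
        # The model plans the next step given the goal and its memory.
        plan = llm(f"Goal: {goal}\nHistory:\n{memory.recall()}\nNext action?")
        if plan.startswith("DONE:"):            # assumed completion signal
            return plan.removeprefix("DONE:").strip()
        tool_name, _, tool_input = plan.partition(" ")
        handler = tools.get(tool_name, lambda _: "unknown tool")
        observation = handler(tool_input)       # act on the world
        memory.events.append(f"{plan} -> {observation}")
    return "step budget exhausted"
```

A plain language model corresponds to a single llm(...) call; agency enters through the loop, the tool dispatch, and the memory that persists between steps.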
This transformation can be understood through four core dimensions of agency. The first is autonomy: the extent to which the agent operates without human involvement, ranging from systems that merely assist a user to those that act independently with minimal oversight. The second is efficacy: how much real-world impact the agent can have, depending both on its own capabilities and on the environment in which it operates. The third is goal-directedness: the complexity of the objectives the agent can formulate, pursue, and revise. And the fourth is generality: how broadly the agent can apply its skills across different tasks and contexts. Together, these four dimensions define what it means for an AI system to be agentic, and they explain why the risks associated with such systems are qualitatively different from those of traditional models.
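One way to keep these dimensions comparable across systems is to record them as an explicit profile. The sketch below assumes a simple three-level ordinal scale; the class and field names are illustrative, not part of any published specification.

```python
from dataclasses import dataclass
from enum import IntEnum

class Level(IntEnum):
    """Ordinal scale for scoring each dimension of agency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass(frozen=True)
class AgencyProfile:
    """The four core dimensions that make a system agentic."""
    autonomy: Level           # how independently the agent acts
    efficacy: Level           # how much real-world impact it can have
    goal_directedness: Level  # how complex its objectives can be
    generality: Level         # how broadly it transfers across tasks

# A narrow coding assistant vs. a broadly capable autonomous agent.
assistant = AgencyProfile(Level.LOW, Level.MEDIUM, Level.LOW, Level.LOW)
open_agent = AgencyProfile(Level.HIGH, Level.HIGH, Level.HIGH, Level.HIGH)
```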
Current regulatory instruments were not designed with these four properties in mind. The EU AI Act, the most comprehensive AI regulation to date, does not explicitly mention AI agents. It addresses them only indirectly, through broader categories such as “AI systems” and “general-purpose AI.” While these categories are broad enough to encompass agentic systems in principle, they do not provide the analytical tools needed to assess the specific risks that autonomy, sustained environmental interaction, and open-ended goal pursuit introduce. The Act’s risk management provisions, particularly those under Article 9, rest on three assumptions: a defined intended purpose, a bounded set of foreseeable misuses, and a tractable distribution of potential harms. For a general-purpose agent whose value lies precisely in doing whatever the user needs, all three assumptions break down. Independent risk management frameworks, such as those developed by SaferAI and Concordia AI, offer more operational guidance but remain anchored to model-level properties rather than to the behavioral characteristics that define agency. They assess what a model can do, not how an agent behaves over time. The harm from an agentic system rarely comes from a single output; it emerges from the trajectory of behavior across multiple steps, interactions, and decisions.
This gap motivated the development of the AI Agent Risk Management Framework, or AARMF, a structured approach designed specifically for developers of LLM-based AI agents. The framework is organized around the four dimensions of agency described above, which serve as analytical pillars for risk assessment. Rather than evaluating risk based solely on the underlying model or the declared use case, the AARMF requires developers to map identified risks onto these pillars and to assess how each risk affects the agent’s autonomy, efficacy, goal-directedness, and generality. Risks are not assessed in isolation or at a single point in time; they are traced across the entire arc of the agent’s development and operation, from early design choices that may structurally enable certain vulnerabilities, through development and testing, to post-deployment monitoring, where emergent behaviors and feedback loops may alter the risk profile in ways that were not initially anticipated. Each risk is evaluated against the four pillars, and mitigation strategies are applied proportionally, so that the overall risk profile remains within acceptable bounds throughout the agent’s lifecycle.

The framework operates through four sequential phases: Establishment, where the agent’s architecture and intended use are documented; Operationalization, where risks are identified, analyzed, and mitigated; Revision, where assessments are updated in response to changes in capabilities or deployment conditions; and Governance, where decision authority, oversight mechanisms, and transparency obligations are defined. The goal is to bridge the gap between high-level regulatory principles and the concrete decisions that developers face when building and deploying agentic systems.
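As a rough illustration of how the framework's mechanics might be encoded, the sketch below scores a single risk against the four pillars and aggregates by a simple mean. The article does not specify an aggregation rule, so the mean, the rounding, and the rendering of the phases as Python identifiers are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from statistics import mean

class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

class Phase(Enum):
    """The four sequential phases of the AARMF."""
    ESTABLISHMENT = "document architecture and intended use"
    OPERATIONALIZATION = "identify, analyze, and mitigate risks"
    REVISION = "update assessments as capabilities or deployment change"
    GOVERNANCE = "define decision authority, oversight, and transparency"

@dataclass
class RiskAssessment:
    """One identified risk, scored against each pillar of agency."""
    name: str
    autonomy: Level
    efficacy: Level
    goal_directedness: Level
    generality: Level

    def overall(self) -> Level:
        # Illustrative aggregation: round the mean of the pillar scores.
        pillars = [self.autonomy, self.efficacy,
                   self.goal_directedness, self.generality]
        return Level(round(mean(pillars)))
```

The point of such an encoding is not the arithmetic itself but that it forces per-pillar judgments to be explicit and repeatable as the assessment is revised across phases.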
Figure 1: Graphical representation of the AARMF
To illustrate how this works in practice, consider the case of generative agents, systems designed to simulate believable human behavior by autonomously planning actions, forming relationships, and recalling past experiences.
Figure 2: Generative agents interacting in a simulated environment (Park et al., 2023)
One concrete risk associated with these agents is parasocial attachment: users may develop one-sided emotional relationships with the agent, gradually substituting real-world social interactions with simulated ones. When assessed through the AARMF, this risk scores high on goal-directedness, because the agent operates with open-ended social objectives and dynamically generates sub-goals to maintain engagement. It also scores high on autonomy, since the agent independently manages memory and initiates interactions without requiring user approval. The pre-mitigation average across all four pillars places this risk in the high-risk category. After targeted mitigation strategies are applied, including explicit disclosure of the agent’s computational nature, caps on relationship depth, and value-aligned boundary setting, the residual risk decreases to medium. This kind of structured, pillar-by-pillar assessment is precisely what existing frameworks do not provide.
Table 1: Parasocial Attachment – Risk assessment summary
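Continuing the illustrative scoring scheme from the sketch above, the numbers below reproduce the shape of this assessment. The high goal-directedness and autonomy scores follow the text; the efficacy, generality, and post-mitigation values are placeholders chosen so that the aggregate moves from high to medium as described.

```python
from statistics import mean

LOW, MEDIUM, HIGH = 1, 2, 3
LABEL = {1: "low", 2: "medium", 3: "high"}

def overall(pillars: dict[str, int]) -> str:
    # Same illustrative rule as above: round the mean of the pillar scores.
    return LABEL[round(mean(pillars.values()))]

# Pre-mitigation: goal-directedness and autonomy are high, per the text;
# the efficacy and generality scores are assumptions for illustration.
pre = {"autonomy": HIGH, "efficacy": HIGH,
       "goal_directedness": HIGH, "generality": MEDIUM}

# Post-mitigation: disclosure of the agent's computational nature,
# relationship depth caps, and value-aligned boundaries lower each
# pillar score (values again assumed).
post = {"autonomy": MEDIUM, "efficacy": MEDIUM,
        "goal_directedness": MEDIUM, "generality": LOW}

print(overall(pre))   # high
print(overall(post))  # medium
```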
The governance of AI agents must evolve at the same pace as the technology itself. Developers, regulators, and researchers need to move toward more structured, collaborative approaches to agentic risk, grounded in the properties that make these systems distinctive and in the concrete challenges they pose across their entire lifecycle.