SmolAgents: A Simple Yet Powerful AI Agent Framework

SmolAgents is an open-source Python library developed by Hugging Face for building and running powerful AI agents with minimal code. The library is designed to be lightweight, with its core logic comprising around 1,000 lines of code, ensuring clarity and minimal abstraction. It prioritizes simplicity and efficiency, enabling developers to create robust agents with ease. SmolAgents is the successor to transformers.agents, which is deprecated and will eventually be replaced by it.

Key Features and Design Goals of SmolAgents

Key Features:

  • Code Agents
  • Lightweight Framework
  • Secure Execution
  • Open Model Compatibility
  • Multi-Agent Systems Support
  • Multi-Step Agent Structure

Design Goals:

  • Simplicity: SmolAgents aims to make building agents as straightforward as possible, requiring minimal code and configuration to get started. This is evident in its lightweight codebase and intuitive API design; a minimal example is sketched right after this list.
  • Efficiency: By enabling code agents, SmolAgents reduces the number of LLM calls required to complete tasks, leading to faster and more cost-effective workflows.
  • Security: Safeguarding code execution is paramount. SmolAgents achieves this through sandboxed environments and controlled execution mechanisms, ensuring that agent actions don’t compromise the system’s integrity.
  • Flexibility: SmolAgents is designed to accommodate diverse workflows and use cases. Its support for various LLMs, tools, and execution environments allows developers to tailor agents to specific needs.
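
To make the simplicity goal concrete, here is a minimal sketch of an agent: a bundled web-search tool, a Hub-hosted model via the free inference API, and a one-line task. It assumes the DuckDuckGoSearchTool and HfApiModel classes shipped with the SmolAgents releases this post describes.

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# A bundled web-search tool plus a Hub-hosted model is enough for a working agent
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")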

What exactly are SmolAgents? What are the advantages of using them?

AI Agents

AI agents are autonomous programs that can perform tasks on behalf of a user or system by leveraging external tools such as web search engines, coding utilities, or APIs. They are driven by LLMs that process user instructions and decide how the workflow proceeds. The level of agency – the degree of control the LLM has over the workflow – varies across systems.

The table below (credit: Hugging Face blog) illustrates how agency varies across systems:

| Agency Level | Description | How that’s called | Example Pattern |
| --- | --- | --- | --- |
| ☆☆☆ | LLM output has no impact on program flow | Simple processor | process_llm_output(llm_response) |
| ★☆☆ | LLM output determines basic control flow | Router | if llm_decision(): path_a() else: path_b() |
| ★★☆ | LLM output determines function execution | Tool call | run_function(llm_chosen_tool, llm_chosen_args) |
| ★★★ | LLM output controls iteration and program continuation | Multi-step Agent | while llm_should_continue(): execute_next_step() |
| ★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | if llm_trigger(): execute_agent() |

SmolAgents focuses on the ★★☆ and ★★★ levels of agency, where the LLM output determines function execution and controls iteration and program continuation. This level of agency allows for more complex and dynamic workflows, making SmolAgents ideal for building sophisticated AI agents.

JSON-based Actions

Before we dive into SmolAgents, let’s understand the traditional JSON-based actions for agents.
JSON-based actions refer to a method of structuring an AI agent’s actions in JSON format. In this approach, the agent (powered by an LLM) produces a JSON object that specifies the tool to be used and the arguments to be passed to that tool.

For example, if an agent needs to search the web for information, the JSON-based action might look like this:

{
  "tool": "search_web",
  "query": "What is the capital of France?"
}

This JSON object instructs the agent to use the search_web tool and execute a search with the query “What is the capital of France?”. The system then parses this JSON object to execute the desired action.

However, this approach has limitations compared to using code directly for agent actions (a code-style counterpart is sketched after this list):

  • Limited Expressiveness: JSON is primarily designed for data exchange and lacks the expressiveness of programming languages to represent complex actions and workflows.
  • Reduced Composability: It is difficult to create reusable components or nest actions within each other using JSON, hindering the development of sophisticated and modular agents.
  • Cumbersome Object Management: Handling and manipulating data objects, such as storing the output of an image generation tool, is less straightforward in JSON compared to using code.
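
To see the composability gap in practice, compare the JSON action above with a code-style action for a slightly richer task. The search_web and get_weather functions below are hypothetical stand-in tools, defined here only so the sketch runs on its own; they are not part of the SmolAgents API.

# Hypothetical stand-in tools, defined here only so the sketch is self-contained
def search_web(query: str) -> str:
  return "Paris" if "capital of France" in query else "unknown"

def get_weather(city: str) -> str:
  return f"It is sunny in {city}."

def capital_weather(country: str) -> str:
  """A reusable helper: one tool's output feeds straight into another tool."""
  capital = search_web(f"What is the capital of {country}?")
  return get_weather(capital)  # nesting tool calls like this has no clean JSON equivalent

print(capital_weather("France"))  # results are ordinary Python objects, easy to store and reuse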

SmolAgents: Code Agents

SmolAgents specializes in Code Agents – agents that generate and execute Python code snippets to complete tasks. This approach aligns better with LLMs’ training data and allows for more efficient, flexible, and powerful agent designs, and it offers several advantages over traditional agents that rely on parsing JSON-based instructions (an example action is sketched after the list):

  • Improved efficiency and accuracy: Code agents require fewer steps to complete tasks, reducing the number of LLM calls and leading to faster and more cost-effective workflows.
  • Enhanced performance: Code agents consistently outperform traditional methods on challenging benchmarks.
  • Better composability: Code allows for nesting actions, defining reusable functions, and managing complex workflows more effectively than JSON.
  • Streamlined object management: Code provides straightforward ways to store and manipulate outputs from various tools.
  • Greater generality: Code can express virtually any computer action, unlike JSON, which is limited to predefined structures.
  • Leveraging LLM training data: LLMs are already trained on vast amounts of code, making them inherently adept at understanding and generating code-based actions.
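
As an illustration of the efficiency point, the snippet below is the kind of single action a code agent might emit: one LLM call produces a loop that gathers several observations, where a JSON-based agent would typically need one round-trip per tool call. The search_web stub is hypothetical and stands in for a real search tool.

# Stand-in for a real search tool, so the sketch is self-contained
def search_web(query: str) -> str:
  return f"(search result for: {query})"

# One generated action, several tool calls; intermediate results live in plain variables
populations = {}
for city in ["Paris", "Lyon", "Marseille"]:
  populations[city] = search_web(f"Current population of {city}")

print(populations)  # the printed output becomes the observation in the agent's memory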

Building a SmolAgent

To create a SmolAgent, you need two key elements:

  • Tools: A set of functions that the agent can use to interact with the external world.
  • Model: An LLM that serves as the agent’s brain, interpreting instructions, making decisions, and generating code.

Defining Tools

SmolAgents allows you to define custom tools by using the @tool decorator on Python functions. These functions should have type hints for inputs and outputs and include docstrings that clearly describe their functionality. For example, the following code defines a tool to get travel durations from Google Maps:

from typing import Optional
from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_travel_duration(start_location: str, destination_location: str, departure_time: Optional[int] = None) -> str:
  """Gets the travel time in car between two places.

  Args:
    start_location: the place from which you start your ride
    destination_location: the place of arrival
    departure_time: the departure time, as a Unix timestamp or a `datetime.datetime`; defaults to a fixed date if not provided
  """
  import googlemaps  # All imports are placed within the function, to allow for sharing to Hub.
  import os
  gmaps = googlemaps.Client(os.getenv("GMAPS_API_KEY"))
  if departure_time is None:
    from datetime import datetime
    departure_time = datetime(2025, 1, 6, 11, 0)
  # Request transit directions and return the estimated duration of the first route's first leg
  directions_result = gmaps.directions(start_location, destination_location, mode="transit", departure_time=departure_time)
  return directions_result[0]["legs"][0]["duration"]["text"]

Once defined, you can share your custom tools on the Hugging Face Hub for others to use.
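
Sharing and reusing a tool is then a couple of calls. The push_to_hub method and load_tool helper below follow the Tool API documented for SmolAgents; the repository id is a placeholder, and the exact names may vary between versions.

from smolagents import load_tool

# Publish the tool to a Hub repository (placeholder repo id)
get_travel_duration.push_to_hub("your-username/get-travel-duration")

# Anyone can then pull it back into their own agent
travel_tool = load_tool("your-username/get-travel-duration", trust_remote_code=True)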

Choosing a Model

SmolAgents supports a wide range of LLMs, including both open-source and proprietary models. You can use models hosted on the Hugging Face Hub via the HfApiModel class, which leverages Hugging Face’s free inference API. Alternatively, you can use the LiteLLMModel class to access a variety of cloud-based LLMs, including those from OpenAI, Anthropic, and others.
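
For illustration, both classes are instantiated the same way; the model identifiers below are just examples, and LiteLLMModel needs the matching provider API key (e.g. ANTHROPIC_API_KEY) in the environment.

from smolagents import CodeAgent, HfApiModel, LiteLLMModel

# A Hub-hosted open model through Hugging Face's inference API
open_model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

# A proprietary model routed through LiteLLM (requires the provider's API key)
claude_model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest")

agent = CodeAgent(tools=[], model=open_model)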

Running a SmolAgent

After defining your tools and selecting a model, you can instantiate a CodeAgent and run it with a task description. The agent will then generate and execute Python code to complete the task, using the provided tools. For instance, you could create a travel planner agent using the previously defined get_travel_duration tool:

agent = CodeAgent(tools=[get_travel_duration], model=HfApiModel(), additional_authorized_imports=["datetime"])
agent.run("Can you give me a nice one-day trip around Paris with a few locations and the times? Could be in the city or outside, but should fit in one day. I'm travelling only via public transportation.")

Security Considerations

SmolAgents prioritizes security when running code generated by LLMs. It offers two primary mechanisms for secure execution, both illustrated after this list:

  • Local Python Interpreter: The CodeAgent executes code within a custom Python interpreter that enforces strict security measures. This interpreter controls imports, limits the number of operations, and restricts execution to predefined actions, preventing potentially harmful code from running.
  • E2B Code Executor: SmolAgents can integrate with E2B, a remote execution service that runs code in a sandboxed environment. This provides robust protection by isolating the code execution within a container, preventing any impact on the local environment.
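
Both mechanisms are chosen when the agent is constructed. The additional_authorized_imports argument also appears in the travel-planner example above; the use_e2b_executor flag matches the early SmolAgents releases and is an assumption here, so check the option name in your installed version.

from smolagents import CodeAgent, HfApiModel

# Local interpreter: generated code may only import what you explicitly authorize
local_agent = CodeAgent(tools=[], model=HfApiModel(), additional_authorized_imports=["datetime"])

# Remote sandbox: run generated code inside an E2B container instead of the local machine
# (flag name assumed from early releases; requires an E2B API key)
sandboxed_agent = CodeAgent(tools=[], model=HfApiModel(), use_e2b_executor=True)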

Multi-Agent Systems (MAS)

For more complex tasks, you can use SmolAgents to build multi-agent systems (MAS), where multiple AI agents collaborate and communicate to achieve a common goal. Each agent in the system can have its own set of tools and expertise, allowing for specialization and efficient task decomposition.

Key Features of MAS

  • Autonomy: Each agent in a MAS operates independently without the need for constant human intervention.
  • Decentralization: There isn’t a single control point in a MAS. Instead, agents make decisions locally while still collaborating towards a global goal.
  • Collaboration: Agents interact and communicate, sharing information and coordinating actions to achieve the collective objective. This often involves dividing tasks into smaller subtasks that agents with specific skills can handle.
  • Adaptability: Agents within a MAS can modify their strategies in response to changes in the environment or actions taken by other agents. This dynamic adaptation makes MAS robust and flexible in handling complex, real-world situations.

SmolAgents facilitates MAS development through the ManagedAgent class. This class allows you to encapsulate individual agents and embed them within a manager agent’s system prompt, enabling the manager agent to call upon the specialized agents as needed. For example, you could have one agent for web search, another for image generation, and a manager agent that coordinates their actions to complete a complex task.
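
Here is a minimal sketch of that pattern, assuming the ManagedAgent class described above and the bundled DuckDuckGoSearchTool; the agent names and descriptions are illustrative.

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, ManagedAgent

model = HfApiModel()

# A specialized agent that only does web search
web_agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
managed_web_agent = ManagedAgent(
  agent=web_agent,
  name="web_search",
  description="Runs web searches for you. Give it your query as an argument.",
)

# The manager coordinates: it can call web_search as if it were another tool
manager_agent = CodeAgent(tools=[], model=model, managed_agents=[managed_web_agent])
manager_agent.run("Who is the current CEO of Hugging Face?")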

Multi-Step Agent Structure

Multi-step agents in SmolAgents are designed to handle tasks that require multiple actions and iterations. They operate on a loop, executing actions and updating their knowledge based on the results until a satisfactory solution is reached. Here’s a breakdown of their structure:

  • Initialization: The agent is initialized with a task description, a set of tools, and an LLM model.
  • Memory: The agent maintains a memory that stores the user’s initial request, the agent’s thoughts, the code it has executed, and the observations from each step. This memory provides context for the agent’s decisions in subsequent steps.
  • Loop: The agent operates within a loop that continues until the LLM determines that the task is completed.
  • Thought Generation: At each step, the LLM analyzes the current state of the memory and formulates a thought about how to proceed. This thought typically includes the reasoning behind the next action and the tools the agent plans to use.
  • Code Generation: Based on the thought, the LLM generates a code snippet in Python to execute the desired action. This code snippet may involve calling one or more tools available to the agent.
  • Action Execution: The generated code snippet is executed, and the results (observations) are stored in the agent’s memory.
  • Memory Update: The agent updates its memory with the thought, code, and observations from the current step. This updated memory serves as the input for the next iteration of the loop.

This cyclical process of thought, code generation, action execution, and memory update continues until the LLM determines, based on the information in its memory, that the task has been successfully completed.

Let’s illustrate this with a simple example. Suppose you task a multi-step agent with finding the temperature difference between New York and San Francisco. The agent might follow these steps:

  1. Initialization: The agent is initialized with the task description (“Find the temperature difference between New York and San Francisco.”), tools for retrieving weather information (WeatherTool), and an LLM model.
  2. Thought: “I need to find the temperatures for New York and San Francisco. I can use the WeatherTool to get this information.”
  3. Code:
    print(WeatherTool().run("New York"))
    print(WeatherTool().run("San Francisco"))
  4. Observation: The WeatherTool returns “New York: 25°C” and “San Francisco: 15°C.”
  5. Memory Update: The memory now contains the initial request, the thought, the code executed, and the temperatures for both cities.
  6. Thought: “I have the temperatures. Now I need to calculate the difference.”
  7. Code:
    new_york_temp = 25
    san_francisco_temp = 15
    temperature_difference = new_york_temp - san_francisco_temp
    print(f"The temperature difference is {temperature_difference}°C.")
  8. Observation: The calculation yields a temperature difference of 10°C.
  9. Memory Update: The final observation is added to the memory.

The agent can now provide the answer: “The temperature difference is 10°C.”

This example demonstrates how a multi-step agent uses its memory, iterates through a loop, and executes actions to solve a problem that requires more than a single step.
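
To make the loop mechanics concrete, the same walk-through can be mimicked in a few lines of plain Python, with the LLM replaced by scripted (thought, code) steps. This is only a toy sketch of the cycle, not how SmolAgents is actually implemented.

import io
from contextlib import redirect_stdout

# The "LLM" is replaced by a scripted list of (thought, code) steps
scripted_steps = [
  ("Get both temperatures.", "ny_temp, sf_temp = 25, 15"),
  ("Compute the difference.", "print(f'The temperature difference is {ny_temp - sf_temp}°C.')"),
]

memory = ["Find the temperature difference between New York and San Francisco."]
state = {}  # execution state persists across steps, like the agent's interpreter
for thought, code in scripted_steps:
  buffer = io.StringIO()
  with redirect_stdout(buffer):  # capture printed output as the observation
    exec(code, state)
  observation = buffer.getvalue().strip()
  memory += [f"Thought: {thought}", f"Code: {code}", f"Observation: {observation}"]

print(memory[-1])  # Observation: The temperature difference is 10°C.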

Closing Thoughts

SmolAgents is a powerful and versatile library for building AI agents, offering a unique blend of simplicity, flexibility, and security. Its focus on code generation, secure execution, and support for multi-agent systems makes it a valuable tool for a wide range of applications. Whether you’re developing a single-agent system or a complex multi-agent setup, SmolAgents provides the tools and framework to bring your AI agent ideas to life.
