The Browser Toolkit provides a powerful set of tools to automate and interact with web browsers. It allows CAMEL agents to perform complex web-based tasks, from simple page navigation to intricate form submissions and data extraction.
Two-Agent System
Uses a sophisticated two-agent system: a planning_agent to create and refine high-level plans, and a web_agent to observe the screen and execute low-level actions.
Visual Reasoning
The web_agent can analyze "Set-of-Marks" (SoM) screenshots, which are visual representations of the page with interactive elements highlighted, enabling it to perform complex visual reasoning.
Persistent Sessions
Supports persistent browser sessions by saving and loading cookies and user data, allowing the agent to stay logged into websites across multiple sessions.
Video Analysis
Can analyze videos on the current page (e.g., YouTube) to answer questions about their content, leveraging the VideoAnalysisToolkit.
Initialization
To get started, initialize the BrowserToolkit. You can configure the underlying models for the planning and web agents.
- Default
- Custom Models
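A minimal sketch of both options, assuming BrowserToolkit can be imported from camel.toolkits and accepts web_agent_model and planning_agent_model keyword arguments. The import path and keyword names are assumptions to check against the installed CAMEL version, and the helper names toolkit_kwargs and make_browser_toolkit are hypothetical.

```python
from typing import Any, Dict, Optional


def toolkit_kwargs(
    web_agent_model: Optional[Any] = None,
    planning_agent_model: Optional[Any] = None,
) -> Dict[str, Any]:
    """Collect only the model overrides that were actually provided."""
    kwargs: Dict[str, Any] = {}
    if web_agent_model is not None:
        # Assumed keyword name for the web agent's model.
        kwargs["web_agent_model"] = web_agent_model
    if planning_agent_model is not None:
        # Assumed keyword name for the planning agent's model.
        kwargs["planning_agent_model"] = planning_agent_model
    return kwargs


def make_browser_toolkit(**models: Any):
    """Build a BrowserToolkit; with no arguments the defaults are used."""
    # Imported lazily so this sketch can be loaded without CAMEL installed.
    from camel.toolkits import BrowserToolkit  # assumed import path

    return BrowserToolkit(**toolkit_kwargs(**models))
```

Calling make_browser_toolkit() corresponds to the default setup. Passing web_agent_model and planning_agent_model corresponds to the custom setup, where both values are model backends created beforehand.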
Core Functionality: browse_url
The main entry point for the toolkit is the browse_url function. It takes a high-level task and a starting URL, and then autonomously navigates the web to complete the task.
Example: Researching a Topic
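A minimal sketch of such a call. It assumes browse_url accepts the task and the starting URL as keyword arguments named task_prompt and start_url and returns its answer as text; both keyword names and the return type are assumptions, and research_topic is a hypothetical helper.

```python
from typing import Any


def research_topic(toolkit: Any, topic: str, start_url: str) -> str:
    """Ask the toolkit to research `topic`, starting from `start_url`."""
    task = f"Find a short summary of {topic} and report the key points."
    # Assumed keyword names; adjust to the installed BrowserToolkit version.
    return toolkit.browse_url(task_prompt=task, start_url=start_url)
```

With a real toolkit instance this would be called as research_topic(browser_toolkit, "some topic", "https://www.wikipedia.org"). Because the helper only relies on a browse_url method, it can also be exercised with a stand-in object.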
How It Works: The Two-Agent System
The browse_url function orchestrates a loop between the planning_agent and the web_agent.
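The control flow of this loop, whose five steps are listed in this section, can be pictured with the following sketch. It is not the toolkit's implementation: the callables stand in for the two agents and the browser, and the "stop" and "stuck" signals, the max_rounds limit, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    done: bool
    plans: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)


def run_loop(
    task: str,
    make_plan: Callable[[str, List[str]], str],   # stands in for planning_agent
    observe: Callable[[], str],                   # stands in for the SoM screenshot
    choose_action: Callable[[str, str], str],     # stands in for web_agent
    execute: Callable[[str], None],               # stands in for the browser
    max_rounds: int = 12,                         # hypothetical safety limit
) -> LoopResult:
    result = LoopResult(done=False)
    plan = make_plan(task, result.actions)          # 1. Planning
    result.plans.append(plan)
    for _ in range(max_rounds):
        observation = observe()                     # 2. Observation
        action = choose_action(observation, plan)   # 3. Action
        if action == "stop":                        # task judged complete
            result.done = True
            break
        if action == "stuck":                       # 5. Replanning
            plan = make_plan(task, result.actions)
            result.plans.append(plan)
            continue
        execute(action)                             # 4. Execution
        result.actions.append(action)
    return result
```

Passing the agents in as plain callables keeps the sketch independent of any model backend, so the control flow can be followed and tested on its own.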
1. Planning: The planning_agent creates a high-level plan to accomplish the task.
2. Observation: The web_agent observes the current page by taking a "Set-of-Marks" (SoM) screenshot.
3. Action: Based on the observation and the plan, the web_agent decides on the next action to take (e.g., click, type, scroll).
4. Execution: The toolkit executes the action and the loop repeats.
5. Replanning: If the web_agent gets stuck, the planning_agent can re-evaluate the situation and create a new plan.
Advanced Usage
Persistent Sessions
You can maintain login sessions across runs by providing a path to a cookies.json file or a user_data_dir.
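A minimal sketch of preparing these two settings. The user_data_dir name comes from the text above; cookie_json_path is an assumed keyword name for the cookies.json path, and session_kwargs is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, Optional


def session_kwargs(
    cookie_file: Optional[str] = None,
    profile_dir: Optional[str] = None,
) -> Dict[str, str]:
    """Build the keyword arguments that describe a persistent session."""
    kwargs: Dict[str, str] = {}
    if cookie_file is not None:
        # Assumed keyword name for the cookies.json path.
        kwargs["cookie_json_path"] = str(Path(cookie_file).expanduser())
    if profile_dir is not None:
        profile = Path(profile_dir).expanduser()
        # Create the profile directory up front so the first run can write to it.
        profile.mkdir(parents=True, exist_ok=True)
        kwargs["user_data_dir"] = str(profile)
    return kwargs
```

The resulting dictionary would then be unpacked into the constructor, for example BrowserToolkit(**session_kwargs(profile_dir="~/.browser_profile")), where the directory path is only an illustration.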