The Browser Toolkit lets CAMEL agents automate and interact with web browsers. Agents can use it for web-based tasks ranging from simple page navigation to multi-step form submissions and data extraction.

Two-Agent System

Uses two cooperating agents: a planning_agent that creates and refines high-level plans, and a web_agent that observes the screen and executes low-level actions.

Visual Reasoning

The web_agent can analyze “Set-of-Marks” (SoM) screenshots, which are visual representations of the page with interactive elements highlighted, enabling it to perform complex visual reasoning.

Persistent Sessions

Supports persistent browser sessions by saving and loading cookies and user data, allowing the agent to stay logged into websites across multiple sessions.

Video Analysis

Can analyze videos on the current page (e.g., YouTube) to answer questions about their content, leveraging the VideoAnalysisToolkit.

Initialization

To get started, initialize the BrowserToolkit. You can configure the underlying models for the planning and web agents.
from camel.toolkits import BrowserToolkit

# Initialize with default models
browser_toolkit = BrowserToolkit()
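
To choose the models explicitly rather than rely on the defaults, something like the following may work. This is a minimal sketch, not a confirmed recipe: the keyword names planning_agent_model, web_agent_model and headless, and the choice of an OpenAI-hosted model, are assumptions that should be checked against the installed CAMEL version. build_browser_toolkit is a hypothetical helper name. The web agent reads screenshots, so its model presumably needs to accept images.

```python
def build_browser_toolkit():
    """Create a BrowserToolkit with explicitly chosen models (sketch)."""
    # Imports live inside the function so that defining it needs neither
    # the camel package nor an API key; calling it needs both.
    from camel.models import ModelFactory
    from camel.toolkits import BrowserToolkit
    from camel.types import ModelPlatformType, ModelType

    # Assumption: an OpenAI-hosted multimodal model for both agents.
    planning_model = ModelFactory.create(
        model_platform=ModelPlatformType.OPENAI,
        model_type=ModelType.GPT_4O,
    )
    web_model = ModelFactory.create(
        model_platform=ModelPlatformType.OPENAI,
        model_type=ModelType.GPT_4O,
    )

    # Assumption: these keyword names match the installed CAMEL version.
    return BrowserToolkit(
        headless=True,
        planning_agent_model=planning_model,
        web_agent_model=web_model,
    )


# Usage (requires camel installed and a valid API key in the environment):
# browser_toolkit = build_browser_toolkit()
```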

Core Functionality: browse_url

The main entry point for the toolkit is the browse_url function. It takes a high-level task and a starting URL, and then autonomously navigates the web to complete the task.

Example: Researching a Topic

task_prompt = "Find the main contributions of the paper 'Sparks of AGI' by Microsoft Research."
start_url = "https://www.google.com"

# The agent will navigate from Google, find the paper, and extract the information.
result = browser_toolkit.browse_url(
    task_prompt=task_prompt,
    start_url=start_url,
)

print(result)

How It Works: The Two-Agent System

The browse_url function orchestrates a loop between the planning_agent and the web_agent.

1. Planning

The planning_agent creates a high-level plan to accomplish the task.

2. Observation

The web_agent observes the current page by taking a “Set-of-Marks” (SoM) screenshot.

3. Action

Based on the observation and the plan, the web_agent decides on the next action to take (e.g., click, type, scroll).

4. Execution

The toolkit executes the action and the loop repeats.

5. Replanning

If the web_agent gets stuck, the planning_agent can re-evaluate the situation and create a new plan.
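
The loop described above can be illustrated with a small, self-contained sketch. It is not the toolkit's actual implementation: run_browse_loop, LoopResult, round_limit, max_failures and the stub agents are hypothetical names chosen for illustration, and "getting stuck" is modelled simply as a run of consecutive failed actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    finished: bool
    rounds_used: int
    replans: int
    history: List[str] = field(default_factory=list)


def run_browse_loop(
    make_plan: Callable[[str, List[str]], str],
    observe: Callable[[], str],
    decide: Callable[[str, str, List[str]], str],
    execute: Callable[[str], bool],
    task: str,
    round_limit: int = 12,
    max_failures: int = 2,
) -> LoopResult:
    """Plan, then repeat observe -> decide -> execute, replanning when stuck."""
    history: List[str] = []
    plan = make_plan(task, history)                  # Planning
    failures = 0
    replans = 0
    for round_no in range(1, round_limit + 1):
        observation = observe()                      # Observation
        action = decide(observation, plan, history)  # Action
        if action == "stop":
            return LoopResult(True, round_no, replans, history)
        ok = execute(action)                         # Execution
        history.append(f"{action}: {'ok' if ok else 'failed'}")
        failures = 0 if ok else failures + 1
        if failures >= max_failures:                 # Replanning
            plan = make_plan(task, history)
            replans += 1
            failures = 0
    return LoopResult(False, round_limit, replans, history)


def demo() -> LoopResult:
    """Drive the loop with stub agents: the first plan fails, the second works."""
    page = {"state": "search"}
    plans = iter(["click the first result", "type the title into the search box"])

    def make_plan(task: str, history: List[str]) -> str:
        return next(plans)

    def observe() -> str:
        return page["state"]

    def decide(observation: str, plan: str, history: List[str]) -> str:
        if observation == "paper":
            return "stop"
        return "click" if plan.startswith("click") else "type"

    def execute(action: str) -> bool:
        if action == "type":
            page["state"] = "paper"
            return True
        return False  # clicking never succeeds in this stub

    return run_browse_loop(make_plan, observe, decide, execute, task="find paper")


result = demo()
print(result)
```

In this sketch the planner is consulted only at the start and after repeated failures, while the web agent acts every round. That mirrors the split between high-level planning and low-level action described above.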

Advanced Usage

Persistent Sessions

You can maintain login sessions across runs by passing either cookie_json_path (the path to a JSON file where cookies are stored) or user_data_dir (a browser profile directory) when you create the toolkit.

Using a Persistent Session

# The toolkit will save cookies and local storage to this file
cookie_path = "./my_browser_session.json"

# First run: Log in to a website
# browser_toolkit = BrowserToolkit(cookie_json_path=cookie_path)
# browser_toolkit.browse_url(task_prompt="Log in to my account...", start_url="...")

# Subsequent runs: The agent will be logged in automatically
browser_toolkit_loggedin = BrowserToolkit(cookie_json_path=cookie_path)
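
Before skipping the login run, it can help to check that the session file exists and actually contains cookies. The helper below is a minimal sketch: has_saved_session is a hypothetical name, and the file layout is an assumption (either a JSON object with a "cookies" list, as in Playwright's storage-state format, or a bare JSON list of cookies). Verify what the toolkit actually writes before relying on it.

```python
import json
from pathlib import Path


def has_saved_session(cookie_path: str) -> bool:
    """Return True if cookie_path holds parseable JSON with at least one cookie."""
    path = Path(cookie_path)
    if not path.is_file():
        return False
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except (ValueError, OSError):
        # Unreadable, badly encoded, or invalid JSON: treat as "no session".
        return False
    # Assumption: either {"cookies": [...], ...} or a bare list of cookies.
    cookies = state.get("cookies") if isinstance(state, dict) else state
    return isinstance(cookies, list) and len(cookies) > 0


if has_saved_session("./my_browser_session.json"):
    print("Saved session found; the login run can probably be skipped.")
else:
    print("No usable session file; run the login task first.")
```

The check only confirms that cookies are present. It cannot tell whether they have expired, so the agent may still land on a login page.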

Video Analysis

The toolkit can answer questions about videos on a webpage.

Asking a Question About a YouTube Video

# First, navigate to the video
browser_toolkit.browse_url(task_prompt="Navigate to a specific YouTube video", start_url="...")

# Then, ask a question about it
# Note: This is an example of how you might use the underlying BaseBrowser.
# The browse_url function would orchestrate this automatically.
question = "What is the main topic of this video?"
answer = browser_toolkit.browser.ask_question_about_video(question=question)

print(answer)