The HybridBrowserToolkit provides a powerful set of browser automation tools for CAMEL agents. It enables web navigation, form interaction, screenshot capture, and data extraction through a unified interface with TypeScript (WebSocket-based) and Python implementations.
Dual Implementation
Choose between TypeScript (WebSocket-based, recommended) or pure Python (Playwright) implementations based on your needs.
Set-of-Marks (SoM)
Capture annotated screenshots with interactive elements highlighted and numbered, enabling visual reasoning for AI agents.
Persistent Sessions
Maintain browser sessions with user_data_dir, keeping login states and cookies across multiple runs.
CDP Connection
Connect to existing Chrome instances via Chrome DevTools Protocol (CDP) for debugging or reusing browser sessions.
- Toolkit: camel/toolkits/hybrid_browser_toolkit/
- Example: examples/toolkits/hybrid_browser_toolkit_example.py
Installation
The HybridBrowserToolkit requires Node.js for the TypeScript implementation (recommended) or Playwright for Python mode.
- TypeScript Mode (Recommended)
- Python Mode
Quick Start
Basic Usage
Initialization
The HybridBrowserToolkit supports extensive configuration options.
- Basic
- With Persistence
- Python Mode
- CDP Connection
Configuration Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | "typescript" \| "python" | "typescript" | Implementation mode |
| headless | bool | True | Run browser without visible window |
| user_data_dir | str | None | Directory for persistent browser data |
| stealth | bool | False | Enable stealth mode to avoid bot detection |
| cache_dir | str | None | Directory for caching |
| enabled_tools | List[str] | DEFAULT_TOOLS | List of enabled tool methods |
| browser_log_to_file | bool | False | Log browser actions to file |
| log_dir | str | "browser_log" | Directory for log files |
| session_id | str | None | Session identifier for logging |
| viewport_limit | bool | False | Filter snapshot to visible viewport only |
| full_visual_mode | bool | False | Return minimal snapshots, rely on screenshots |
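As a rough sketch of how these parameters fit together, the snippet below collects a few of them into keyword arguments. The parameter names and defaults come from the table; the helper names, the directory path, and the `from camel.toolkits import HybridBrowserToolkit` import path are illustrative assumptions to check against your installed version.

```python
from typing import Any, Dict, Optional


def build_toolkit_kwargs(
    user_data_dir: Optional[str] = None,
    headless: bool = True,
    log_actions: bool = False,
) -> Dict[str, Any]:
    """Collect constructor keyword arguments named in the parameter table."""
    kwargs: Dict[str, Any] = {
        "mode": "typescript",  # documented default
        "headless": headless,
    }
    if user_data_dir is not None:
        # Keeps cookies and login state across runs.
        kwargs["user_data_dir"] = user_data_dir
    if log_actions:
        kwargs["browser_log_to_file"] = True
        kwargs["log_dir"] = "browser_log"  # documented default
    return kwargs


def make_toolkit(**kwargs: Any):
    """Create the toolkit (hypothetical helper; import path is an assumption)."""
    from camel.toolkits import HybridBrowserToolkit  # assumed import path

    return HybridBrowserToolkit(**kwargs)


# Placeholder path; any writable directory would do.
kwargs = build_toolkit_kwargs(user_data_dir="/tmp/browser_data", log_actions=True)
print(kwargs)
```

Keeping the argument assembly separate from the constructor call makes the configuration easy to inspect or log before a browser is launched.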
Timeout Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| default_timeout | int | None | Default timeout in milliseconds |
| navigation_timeout | int | None | Page navigation timeout |
| network_idle_timeout | int | None | Wait for network idle |
| screenshot_timeout | int | None | Screenshot capture timeout |
| page_stability_timeout | int | None | Wait for page stability |
Available Tools
Default Tools
The default tool set provides essential browser functionality:
All Available Tools
Use enabled_tools=HybridBrowserToolkit.ALL_TOOLS for full functionality:
Custom Tool Selection
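A minimal sketch of building a custom enabled_tools list, assuming the list is made of tool method names like the ones documented on this page. The `KNOWN_TOOLS` set and the `select_tools` helper are hypothetical; the toolkit's real tool list may be longer.

```python
from typing import List

# Tool names mentioned on this page; not necessarily the toolkit's complete list.
KNOWN_TOOLS = {
    "browser_visit_page",
    "browser_back",
    "browser_forward",
    "browser_click",
    "browser_type",
    "browser_scroll",
    "browser_select",
    "browser_mouse_drag",
    "browser_get_screenshot",
    "browser_get_som_screenshot",
    "browser_get_page_snapshot",
}


def select_tools(requested: List[str]) -> List[str]:
    """Validate requested tool names and drop duplicates, keeping order."""
    unknown = [name for name in requested if name not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"Unknown tool names: {unknown}")
    return list(dict.fromkeys(requested))


enabled_tools = select_tools(
    ["browser_visit_page", "browser_click", "browser_type", "browser_click"]
)
print(enabled_tools)
```

The resulting list would then be passed as the enabled_tools parameter; validating names up front catches typos before the browser starts.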
Core Tool Methods
Navigation
browser_visit_page
Navigate to a URL and get the page snapshot.
browser_back / browser_forward
Navigate through browser history.
Interaction
browser_click
Click on an element by ref ID (from SoM screenshot) or pixel coordinates.
browser_type
Type text into an input field.
browser_scroll
Scroll the page in any direction.
Page Observation
browser_get_som_screenshot
Capture a screenshot with Set-of-Marks annotations. Each interactive element is labeled with a ref ID (e.g., e1, e2).
browser_get_page_snapshot
Get the page structure as text, showing all interactive elements with their ref IDs.
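To illustrate how ref IDs in a text snapshot can be mapped back to elements, here is a small sketch. The snapshot line format (`- role "name" [ref=eN]`) is an assumption made for this example, not a documented guarantee, and `index_refs` is a hypothetical helper.

```python
import re
from typing import Dict, Tuple

# Assumed line shape: - role "accessible name" [ref=e12]
REF_LINE = re.compile(
    r'-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>e\d+)\]'
)


def index_refs(snapshot: str) -> Dict[str, Tuple[str, str]]:
    """Map each ref ID to its (role, name) pair; lines without a ref are skipped."""
    refs: Dict[str, Tuple[str, str]] = {}
    for line in snapshot.splitlines():
        match = REF_LINE.search(line)
        if match:
            refs[match["ref"]] = (match["role"], match["name"])
    return refs


sample_snapshot = """- link "Home" [ref=e1]
- textbox "Search" [ref=e2]
  - button "Go" [ref=e3]
- heading "Welcome"
"""

refs = index_refs(sample_snapshot)
print(refs)
```

An index like this lets calling code look up which element a ref such as e2 points to before passing it to browser_click or browser_type.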
Tab Management
Tab Operations
Console Operations
JavaScript Execution
Advanced Usage
Full Visual Mode
Full Visual Mode is designed for vision-capable models that can reason directly from screenshots using pixel coordinates. When enabled, several key behaviors change:
Automatic Tool Signature Switching
Tools that normally use ref IDs automatically switch to pixel-based parameters. The docstrings are updated accordingly: you will only see the pixel-based signatures, not both versions simultaneously.
| Tool | Standard Mode | Full Visual Mode |
|---|---|---|
| browser_click | click(ref="e15") | click(x=350, y=200) |
| browser_type | type(ref="e8", text="...") | type(x=350, y=200, text="...") |
| browser_mouse_drag | drag(from_ref="e1", to_ref="e2") | drag(from_x=100, from_y=100, to_x=300, to_y=200) |
Tools that require ref with no pixel alternative (browser_select, browser_get_page_snapshot, browser_get_som_screenshot) are automatically excluded from the tool list.
Screenshot with Pixel Rulers
browser_get_screenshot returns screenshots with pixel rulers added to the top and left edges. This helps vision models accurately identify pixel coordinates for click and type operations.
The rulers show:
- Major tick marks every 100 pixels with numeric labels
- Medium tick marks every 50 and 10 pixels
- Minor tick marks every 5 pixels
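The tick spacing above can be sketched as a small classification function. This only illustrates the spacing rules listed here; `tick_kind` is a hypothetical helper, not the toolkit's drawing code.

```python
from typing import Optional


def tick_kind(px: int) -> Optional[str]:
    """Classify a ruler position by the tick spacing described above."""
    if px % 100 == 0:
        return "major"  # labeled with its pixel value
    if px % 10 == 0:
        return "medium"  # covers both the 50 px and 10 px marks
    if px % 5 == 0:
        return "minor"
    return None  # no tick at this position


# Classify the tick positions along the first 30 pixels of a ruler.
ticks = [(px, tick_kind(px)) for px in range(0, 31, 5)]
print(ticks)
```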
Ineffective Click Detection
When a click does not change the page content (the snapshot remains the same), the toolkit treats it as a potentially ineffective click and returns feedback that includes the 5 nearest interactive elements with their clickable coordinates. This helps the model correct its click position without needing another screenshot.
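The idea can be sketched as follows, assuming each interactive element is available as a dict with a ref and center coordinates. That data shape and both helper names are assumptions for illustration; the toolkit's actual response format is not shown here.

```python
import math
from typing import Dict, List, Optional

Element = Dict[str, object]


def nearest_elements(
    x: float, y: float, elements: List[Element], limit: int = 5
) -> List[Element]:
    """Return the `limit` elements closest to the click point (Euclidean distance)."""
    return sorted(
        elements,
        key=lambda el: math.hypot(float(el["x"]) - x, float(el["y"]) - y),
    )[:limit]


def click_feedback(
    before: str, after: str, x: float, y: float, elements: List[Element]
) -> Optional[List[Element]]:
    """Return nearby elements only when the snapshot did not change."""
    if before != after:
        return None  # the click had an effect, so no hint is needed
    return nearest_elements(x, y, elements)


elements = [
    {"ref": "e1", "x": 100, "y": 100},
    {"ref": "e2", "x": 110, "y": 100},
    {"ref": "e3", "x": 300, "y": 300},
    {"ref": "e4", "x": 105, "y": 120},
    {"ref": "e5", "x": 500, "y": 500},
    {"ref": "e6", "x": 90, "y": 95},
    {"ref": "e7", "x": 1000, "y": 1000},
]

hints = click_feedback("same snapshot", "same snapshot", 100, 105, elements)
hint_refs = [el["ref"] for el in hints]
print(hint_refs)
```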
Diff Snapshot for Dropdowns and Autocomplete
When interacting with combobox (dropdown) or textbox (input/textarea) elements, the toolkit intelligently returns a diff snapshot instead of the full page snapshot. This optimization is particularly useful for:
- Dropdown menus that expand with options
- Autocomplete/typeahead suggestions
- Search result suggestions
How Diff Snapshot Works
Trigger elements:
- combobox: dropdown select elements
- textbox, input, textarea: text input fields
What the diff snapshot contains:
- Only new option and menuitem elements that appeared after the interaction
- For combobox: the combobox’s updated state (since its ref may change after expansion)
This significantly reduces context size compared to returning the entire page snapshot, helping the model focus on the relevant options.
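A minimal sketch of the diff idea: compare the snapshot lines before and after the interaction and keep only new option, menuitem, and combobox lines. The snapshot line format and the `diff_snapshot` helper are assumptions for illustration, not the toolkit's implementation.

```python
from typing import List

# Roles kept in the diff, per the description above.
DIFF_ROLES = ("option", "menuitem", "combobox")


def diff_snapshot(before: str, after: str) -> List[str]:
    """Return lines that are new in `after` and belong to a diff-relevant role."""
    seen = {line.strip() for line in before.splitlines()}
    new_lines: List[str] = []
    for line in after.splitlines():
        text = line.strip()
        if not text or text in seen:
            continue  # unchanged content is omitted from the diff
        role = text.lstrip("- ").split(" ", 1)[0]
        if role in DIFF_ROLES:
            new_lines.append(text)
    return new_lines


before = """- combobox "Color" [ref=e5]
- button "Submit" [ref=e6]"""

after = """- combobox "Color" [expanded] [ref=e9]
  - option "Red" [ref=e10]
  - option "Blue" [ref=e11]
- button "Submit" [ref=e6]"""

diff = diff_snapshot(before, after)
print("\n".join(diff))
```

In this example the unchanged Submit button is dropped, while the expanded combobox (with its new ref) and the two new options are kept.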
Viewport Limiting
Reduce context size by only including elements visible in the current viewport:
Action Logging
Enable detailed logging for debugging or replay:
Spreadsheet Operations
The toolkit includes specialized tools for interacting with web-based spreadsheets (Google Sheets, Excel Online):
Integration with ChatAgent
Complete Example
Mode Comparison
| Feature | TypeScript Mode | Python Mode |
|---|---|---|
| Performance | Faster (WebSocket) | Standard |
| CDP Connection | Supported | Not supported |
| Viewport Limit | Supported | Not supported |
| Full Visual Mode | Supported | Supported |
| Dependencies | Node.js (auto-installed) | Playwright |
| Recommended For | Production use | Simple tasks |
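The comparison above can be encoded as a small lookup for checking a planned configuration. This is a sketch built only from the table; the feature keys and the `unsupported_features` helper are hypothetical names.

```python
from typing import Dict, List

# Feature support per mode, transcribed from the comparison table.
MODE_SUPPORT: Dict[str, Dict[str, bool]] = {
    "typescript": {
        "cdp_connection": True,
        "viewport_limit": True,
        "full_visual_mode": True,
    },
    "python": {
        "cdp_connection": False,
        "viewport_limit": False,
        "full_visual_mode": True,
    },
}


def unsupported_features(mode: str, requested: List[str]) -> List[str]:
    """Return the requested features that the given mode does not support."""
    support = MODE_SUPPORT[mode]
    return [feature for feature in requested if not support.get(feature, False)]


wanted = ["cdp_connection", "viewport_limit", "full_visual_mode"]
python_gaps = unsupported_features("python", wanted)
typescript_gaps = unsupported_features("typescript", wanted)
print(python_gaps, typescript_gaps)
```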
Troubleshooting
Browser fails to start
Ensure Node.js is installed for TypeScript mode, or run playwright install chromium for Python mode.
Elements not found
Use browser_get_page_snapshot to see the current page snapshot. Elements may change refs after actions.
CDP connection fails
Ensure Chrome is started with remote debugging enabled:
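One way to do this, sketched in Python: build a launch command with Chrome's `--remote-debugging-port` flag, then query the DevTools HTTP endpoint `/json/version` to confirm the port is reachable. The binary name `google-chrome`, port 9222, and the profile path are assumptions that vary by platform and setup.

```python
import json
import urllib.request
from typing import List


def chrome_debug_command(
    binary: str = "google-chrome",  # assumed binary name; differs per OS
    port: int = 9222,
    user_data_dir: str = "/tmp/chrome-cdp-profile",  # placeholder path
) -> List[str]:
    """Build a Chrome launch command with remote debugging enabled."""
    return [
        binary,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
    ]


def cdp_version(port: int = 9222, timeout: float = 2.0) -> dict:
    """Fetch browser metadata from the DevTools endpoint; raises if unreachable."""
    url = f"http://localhost:{port}/json/version"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


command = chrome_debug_command()
print(" ".join(command))
# After starting Chrome with this command, cdp_version() should return a dict
# that includes a "webSocketDebuggerUrl" entry.
```

If `cdp_version()` raises a connection error, Chrome is either not running or was started without the debugging flag.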
Stealth mode not working
Some websites have advanced bot detection. Try using a persistent user_data_dir with realistic browsing history.