camel.toolkits.hybrid_browser_toolkit.hybrid_browser_toolkit
HybridBrowserToolkit
A hybrid browser toolkit that combines non-visual, DOM-based browser automation with visual, screenshot-based capabilities.
This toolkit exposes a set of actions as CAMEL FunctionTools for agents to interact with web pages. It can operate in headless mode and supports both programmatic control of browser actions (like clicking and typing) and visual analysis of the page layout through screenshots with marked interactive elements.
__init__
Initialize the HybridBrowserToolkit.
Parameters:
- headless (bool): Whether to run the browser in headless mode. Defaults to True.
- user_data_dir (Optional[str]): Path to a directory for storing browser data such as cookies and local storage. Useful for maintaining sessions across runs. Defaults to None (a temporary directory is used).
- web_agent_model (Optional[BaseModelBackend]): The language model backend to use for the high-level solve_task agent. Required only if you plan to use solve_task. Defaults to None.
- cache_dir (str): The directory to store cached files, such as screenshots. Defaults to "tmp/".
- enabled_tools (Optional[List[str]]): List of tool names to enable. If None, uses DEFAULT_TOOLS. Available tools: open_browser, close_browser, visit_page, get_page_snapshot, get_som_screenshot, get_page_links, click, type, select, scroll, enter, wait_user, solve_task. Defaults to None.
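The parameters above can be assembled as in the following minimal sketch. The helper names build_toolkit_kwargs and make_toolkit are hypothetical and not part of CAMEL; the import path inside make_toolkit is assumed from the module path this page documents. web_agent_model is left at its default, so solve_task is not meant to be used with this configuration.

```python
from typing import Any, Dict, List, Optional

# Tool names copied from the enabled_tools parameter description.
AVAILABLE_TOOLS = [
    "open_browser", "close_browser", "visit_page", "get_page_snapshot",
    "get_som_screenshot", "get_page_links", "click", "type", "select",
    "scroll", "enter", "wait_user", "solve_task",
]


def build_toolkit_kwargs(
    headless: bool = True,
    user_data_dir: Optional[str] = None,
    cache_dir: str = "tmp/",
    enabled_tools: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect constructor arguments and
    reject unknown tool names before the toolkit is created."""
    if enabled_tools is not None:
        unknown = [n for n in enabled_tools if n not in AVAILABLE_TOOLS]
        if unknown:
            raise ValueError(f"Unknown tool names: {unknown}")
    return {
        "headless": headless,
        "user_data_dir": user_data_dir,
        "cache_dir": cache_dir,
        "enabled_tools": enabled_tools,
    }


def make_toolkit(**kwargs: Any):
    """Hypothetical helper: create the toolkit. camel is imported lazily,
    so build_toolkit_kwargs can be used without it installed."""
    # Assumed import path, taken from the module path documented here.
    from camel.toolkits.hybrid_browser_toolkit.hybrid_browser_toolkit import (
        HybridBrowserToolkit,
    )
    return HybridBrowserToolkit(**build_toolkit_kwargs(**kwargs))


# Enable only a small navigation subset; other parameters keep their defaults.
kwargs = build_toolkit_kwargs(
    enabled_tools=["open_browser", "visit_page", "click"],
)
```

Calling make_toolkit(enabled_tools=[...]) would then pass the same arguments to HybridBrowserToolkit, provided CAMEL and its browser dependencies are installed.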
__del__
Clean up browser resources on garbage collection.
_load_unified_analyzer
Load the unified analyzer JavaScript script.
_validate_ref
Validate the ref parameter.
_convert_analysis_to_rects
Convert analysis data to rect format for visual marking.
_add_set_of_mark
Add visual marks to the image.
_format_snapshot_from_analysis
Format analysis data into a snapshot string.
_ensure_agent
Create PlaywrightLLMAgent on first use.
get_tools
Get available function tools based on enabled_tools configuration.
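The selection that get_tools describes can be pictured with the simplified stand-in below. It is not CAMEL's implementation: select_tools, tool_map, and the default_tools argument are hypothetical, plain functions stand in for FunctionTool objects, and skipping unregistered names is an assumption, since this page does not say how the real method treats them.

```python
from typing import Callable, Dict, List, Optional


def select_tools(
    tool_map: Dict[str, Callable],
    enabled_tools: Optional[List[str]],
    default_tools: List[str],
) -> List[Callable]:
    """Return the callables for the enabled names, using the
    defaults when enabled_tools is None."""
    names = default_tools if enabled_tools is None else enabled_tools
    # Keep the caller's order; skip names with no registered callable
    # (an assumption made for this sketch only).
    return [tool_map[name] for name in names if name in tool_map]


# Stand-in actions; the real toolkit exposes its actions as FunctionTools.
def click(ref: str) -> str:
    return f"clicked {ref}"


def scroll(direction: str) -> str:
    return f"scrolled {direction}"


tool_map = {"click": click, "scroll": scroll}

# Explicit list: only the named tool is returned.
selected = select_tools(tool_map, enabled_tools=["scroll"], default_tools=["click"])

# None: the default list is used instead.
fallback = select_tools(tool_map, enabled_tools=None, default_tools=["click"])
```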