camel.toolkits.browser_toolkit
_get_str
Safely retrieve a string value from a dictionary.
_get_number
Safely retrieve a number (int or float) from a dictionary.
_get_bool
Safely retrieve a boolean value from a dictionary.
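The three helpers above share one pattern: look up a key, check the type of the value, and return it only if the type matches. The sketch below illustrates that pattern under stated assumptions. The names `safe_get_str`, `safe_get_number`, and `safe_get_bool`, their `(d, key)` signatures, and the choice to raise on a missing key or wrong type are all hypothetical; the source gives only the helper names and one-line descriptions.

```python
from typing import Any, Dict, Union


def safe_get_str(d: Dict[str, Any], key: str) -> str:
    """Return d[key] if it is a string; raise otherwise (assumed behavior)."""
    if key not in d:
        raise KeyError(f"Missing required key: {key!r}")
    value = d[key]
    if isinstance(value, str):
        return value
    raise TypeError(f"Expected a string for {key!r}, got {type(value).__name__}")


def safe_get_number(d: Dict[str, Any], key: str) -> Union[int, float]:
    """Return d[key] if it is an int or float; raise otherwise."""
    if key not in d:
        raise KeyError(f"Missing required key: {key!r}")
    value = d[key]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    raise TypeError(f"Expected a number for {key!r}, got {type(value).__name__}")


def safe_get_bool(d: Dict[str, Any], key: str) -> bool:
    """Return d[key] if it is a bool; raise otherwise."""
    if key not in d:
        raise KeyError(f"Missing required key: {key!r}")
    value = d[key]
    if isinstance(value, bool):
        return value
    raise TypeError(f"Expected a boolean for {key!r}, got {type(value).__name__}")
```

Excluding `bool` from the numeric check is a design choice of this sketch: it keeps a stray `True` from being read as the number 1.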
BaseBrowser
__init__
Initialize the BaseBrowser instance.
Parameters:
- headless (bool): Whether to run the browser in headless mode.
- cache_dir (Union[str, None]): The directory to store cache files.
- channel (Literal["chrome", "msedge", "chromium"]): The browser channel to use. Must be one of "chrome", "msedge", or "chromium".
- cookie_json_path (Optional[str]): Path to a JSON file containing authentication cookies and browser storage state. If provided and the file exists, the browser will load this state to maintain authenticated sessions without requiring manual login.
Returns:
None
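The cookie_json_path rule ("if provided and the file exists") can be sketched as a small guard. This is a minimal illustration, not the toolkit's code: the function name `load_storage_state` and the decision to return the parsed JSON as a dictionary are assumptions.

```python
import json
import os
from typing import Any, Dict, Optional


def load_storage_state(cookie_json_path: Optional[str]) -> Optional[Dict[str, Any]]:
    """Return the parsed storage state, or None when no usable file is given."""
    # No path supplied, or the file is missing: start with a fresh session.
    if not cookie_json_path or not os.path.exists(cookie_json_path):
        return None
    with open(cookie_json_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Returning None for both "no path" and "file missing" mirrors the description above, where a missing file is not treated as an error.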
init
Initialize the browser.
clean_cache
Delete the cache directory and its contents.
_wait_for_load
Wait for a certain amount of time for the page to load.
click_blank_area
Click a blank area of the page to unfocus the current element.
visit_page
Visit a page with the given URL.
ask_question_about_video
Ask a question about the video on the current page, such as a YouTube video.
Parameters:
- question (str): The question to ask.
Returns:
str: The answer to the question.
get_screenshot
Get a screenshot of the current page.
Parameters:
- save_image (bool): Whether to save the image to the cache directory.
Returns:
Tuple[Image.Image, str]: A tuple containing the screenshot image and the path to the image file if saved, otherwise None.
capture_full_page_screenshots
Capture full page screenshots by scrolling the page with a buffer zone.
Parameters:
- scroll_ratio (float): The ratio of viewport height to scroll each step. (default: 0.8)
Returns:
List[str]: A list of paths to the screenshot files.
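One way to read scroll_ratio is as the fraction of the viewport advanced per step, so that consecutive screenshots overlap by the remainder (the buffer zone). The sketch below computes the scroll positions under that interpretation. The function `scroll_offsets` and its `page_height` and `viewport_height` parameters are hypothetical and do not appear in the source.

```python
from typing import List


def scroll_offsets(
    page_height: int, viewport_height: int, scroll_ratio: float = 0.8
) -> List[int]:
    """Return the vertical offsets (in pixels) at which to take each screenshot."""
    if not 0 < scroll_ratio <= 1:
        raise ValueError("scroll_ratio must be in (0, 1]")
    # Each step advances by a fraction of the viewport, leaving an overlap.
    step = max(1, int(viewport_height * scroll_ratio))
    # The last useful offset shows the bottom edge of the page.
    last = max(0, page_height - viewport_height)
    offsets = list(range(0, last, step))
    offsets.append(last)  # always include the bottom of the page
    return offsets
```

For a 2000-pixel page and a 1000-pixel viewport, `scroll_offsets(2000, 1000, 0.8)` yields `[0, 800, 1000]`. Adjacent captures share at least 200 pixels, which helps avoid losing content that straddles a screenshot boundary.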
get_visual_viewport
Returns:
VisualViewport: The visual viewport of the current page.
get_interactive_elements
Returns:
Dict[str, InteractiveRegion]: A dictionary of interactive elements.
get_som_screenshot
Get a screenshot of the current viewport with interactive elements marked.
Parameters:
- save_image (bool): Whether to save the image to the cache directory.
Returns:
Tuple[Image.Image, Union[str, None]]: A tuple containing the screenshot image and an optional path to the image file if saved, otherwise None.
scroll_up
Scroll up the page.
scroll_down
Scroll down the page.
get_url
Get the URL of the current page.
click_id
Click an element with the given identifier.
extract_url_content
Extract the content of the current page.
download_file_id
Download a file with the given identifier.
Parameters:
- identifier (str): The identifier of the file to download.
Returns:
str: The result of the action.
fill_input_id
Fill an input field with the given text, and then press Enter.
Parameters:
- identifier (str): The identifier of the input field.
- text (str): The text to fill.
Returns:
str: The result of the action.
scroll_to_bottom
scroll_to_top
hover_id
Hover over an element with the given identifier.
Parameters:
- identifier (str): The identifier of the element to hover over.
Returns:
str: The result of the action.
find_text_on_page
Find the next occurrence of the given text on the page and scroll to it. This is equivalent to pressing Ctrl + F and searching for the text.
back
Navigate back to the previous page.
close
show_interactive_elements
Show simple interactive elements on the current page.
get_webpage_content
_ensure_browser_installed
Ensure the browser is installed.
BrowserToolkit
A class that provides methods for browsing the web and interacting with web pages.
__init__
Initialize the BrowserToolkit instance.
Parameters:
- headless (bool): Whether to run the browser in headless mode.
- cache_dir (Union[str, None]): The directory to store cache files.
- channel (Literal["chrome", "msedge", "chromium"]): The browser channel to use. Must be one of "chrome", "msedge", or "chromium".
- history_window (int): The window size for storing the history of actions.
- web_agent_model (Optional[BaseModelBackend]): The model backend for the web agent.
- planning_agent_model (Optional[BaseModelBackend]): The model backend for the planning agent.
- output_language (str): The language to use for output. (default: "en")
- cookie_json_path (Optional[str]): Path to a JSON file containing authentication cookies and browser storage state. If provided and the file exists, the browser will load this state to maintain authenticated sessions without requiring manual login. (default: None)
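The history_window parameter suggests that only the most recent actions are kept in view. A bounded deque is one simple way to model such a window. This is a sketch under assumptions: the helper `make_history` and the record fields `round` and `action` are hypothetical, and the toolkit may instead keep the full history and slice it when needed.

```python
from collections import deque
from typing import Deque, Dict


def make_history(history_window: int) -> Deque[Dict[str, str]]:
    """Create a history buffer that keeps only the newest history_window records."""
    return deque(maxlen=history_window)


history = make_history(3)
for i in range(5):
    # Appending to a full deque silently drops the oldest record.
    history.append({"round": str(i), "action": f"action_{i}"})

recent_rounds = [record["round"] for record in history]  # rounds 2, 3, and 4 remain
```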
_reset
_initialize_agent
Initialize the agent.
_observe
Let the agent observe the current environment and get the next action.
_act
Let the agent act based on the given action code.
Parameters:
- action_code (str): The action code to execute.
Returns:
Tuple[bool, str]: A tuple containing a boolean indicating whether the action was successful, and the information to be returned.
_get_final_answer
Get the final answer based on the task prompt and current browser state. It is used when the agent thinks that the task can be completed without any further action and the answer can be found directly in the current viewport.
_task_planning
Plan the task based on the given task prompt.
_task_replanning
Replan the task based on the given task prompt.
Parameters:
- task_prompt (str): The original task prompt.
- detailed_plan (str): The detailed plan to replan.
Returns:
Tuple[bool, str]: A tuple containing a boolean indicating whether the task needs to be replanned, and the replanned schema.
browse_url
Simulate browser interaction to solve a task that requires multi-step actions.
Parameters:
- task_prompt (str): The task prompt to solve.
- start_url (str): The start URL to visit.
- round_limit (int): The maximum number of rounds allowed to solve the task. (default: 12)
Returns:
str: The simulation result for the task.
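The round_limit parameter implies a bounded observe-then-act loop. The sketch below shows the general shape of such a loop with stand-in callables. It is an illustration, not the toolkit's implementation: `run_rounds`, the `"stop"` sentinel, and the returned messages are hypothetical, and only the `Tuple[bool, str]` return shape of the act step is taken from the `_act` description.

```python
from typing import Callable, Tuple


def run_rounds(
    observe: Callable[[], str],
    act: Callable[[str], Tuple[bool, str]],
    round_limit: int = 12,
) -> str:
    """Alternate observe and act steps until a stop signal or the round limit."""
    for round_index in range(round_limit):
        action_code = observe()
        if action_code == "stop":  # hypothetical sentinel meaning "task done"
            return f"finished after {round_index} rounds"
        success, info = act(action_code)
        if not success:
            # A real agent would feed `info` back into its next observation.
            continue
    return f"round limit of {round_limit} reached"
```

Capping the number of rounds guarantees termination even when the agent never decides that the task is complete.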