camel.benchmarks.gaia
RetrieverProtocol
Protocol for the retriever class. Any retriever class implementing this protocol can be used in the benchmark class.
retrieve
Retrieve the relevant content for the query.
Parameters:
- query (str): The query to retrieve the content for.
- contents (List[str]): The list of contents to search in.
- **kwargs (Dict[str, Any]): Additional keyword arguments.
Returns:
Dict[str, Any]: The relevant content for the query.
reset
Reset the retriever. Some benchmarks may require resetting the retriever after each query.
Returns:
bool: True if the reset was successful, False otherwise.
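As an illustration of the protocol above, here is a minimal sketch of a class that satisfies it structurally. The `RetrieverProtocol` definition below is a local stand-in that mirrors the documented signatures rather than an import from camel, and `KeywordRetriever`, its `top_k` keyword argument, and the keys of the returned dictionary are all hypothetical.

```python
from typing import Any, Dict, List, Protocol, runtime_checkable


@runtime_checkable
class RetrieverProtocol(Protocol):
    # Local stand-in mirroring the documented interface (an assumption,
    # not the class shipped with camel).
    def retrieve(
        self, query: str, contents: List[str], **kwargs: Any
    ) -> Dict[str, Any]: ...

    def reset(self) -> bool: ...


class KeywordRetriever:
    """Hypothetical retriever that ranks contents by shared lowercase words."""

    def retrieve(
        self, query: str, contents: List[str], **kwargs: Any
    ) -> Dict[str, Any]:
        top_k = kwargs.get("top_k", 1)  # hypothetical keyword argument
        query_words = set(query.lower().split())
        # Rank each content string by how many query words it contains.
        ranked = sorted(
            contents,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        # The result keys are illustrative; the source does not specify them.
        return {"query": query, "results": ranked[:top_k]}

    def reset(self) -> bool:
        # This retriever keeps no state, so a reset always succeeds.
        return True
```

Because the protocol is structural, `KeywordRetriever` does not need to inherit from it; providing `retrieve` and `reset` with compatible signatures is enough.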
DefaultGAIARetriever
Default retriever for the GAIA benchmark. This retriever uses AutoRetriever in camel to retrieve the content based on the query.
retrieve
Retrieve the content based on the query.
Parameters:
- query (str): The query to search for.
- contents (List[str]): The list of contents to search from.
- **kwargs (Any): The keyword arguments to pass to the retriever.
Returns:
Dict[str, Any]: The retrieved content.
reset
Reset the retriever.
Returns:
bool: Whether the reset was successful.
GAIABenchmark
GAIA Benchmark adapted from “GAIA: a benchmark for General AI Assistants”.
Parameters:
- data_dir (str): The directory to save the data.
- save_to (str): The file to save the results.
- retriever (Optional[RetrieverProtocol]): The retriever to use. (default: None)
- processes (int, optional): The number of processes to use. (default: 1)
__init__
Initialize the GAIA benchmark.
Parameters:
- data_dir (str): The directory to save the data.
- save_to (str): The file to save the results.
- retriever (Optional[RetrieverProtocol], optional): The retriever to use. (default: None)
- processes (int, optional): The number of processes to use for parallel processing. (default: 1)
download
Download the GAIA dataset.
load
Load the GAIA dataset.
Parameters:
- force_download (bool, optional): Whether to force download the data.
train
Get the training set.
run
Run the benchmark.
Parameters:
- agent (ChatAgent): The agent to run the benchmark with.
- on (Literal["valid", "test"]): The dataset split to run the benchmark on.
- level (Union[int, List[int], Literal["all"]]): The level or levels to run the benchmark on.
- randomize (bool, optional): Whether to randomize the data. (default: False)
- subset (Optional[int], optional): The subset of data to run. (default: None)
Returns:
Dict[str, Any]: The results of the benchmark.
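The way `level`, `randomize`, and `subset` could narrow down the task list can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `select_tasks` and its `seed` parameter are hypothetical, each task is assumed to be a dictionary with an integer `"Level"` key, `subset` is assumed to mean "keep the first N tasks", and `"all"` is taken to cover GAIA's three levels.

```python
import random
from typing import Any, Dict, List, Literal, Optional, Union


def select_tasks(
    tasks: List[Dict[str, Any]],
    level: Union[int, List[int], Literal["all"]],
    randomize: bool = False,
    subset: Optional[int] = None,
    seed: Optional[int] = None,  # hypothetical, for reproducible shuffling
) -> List[Dict[str, Any]]:
    # Normalize the three accepted forms of `level` into a set of ints.
    if level == "all":
        levels = {1, 2, 3}
    elif isinstance(level, int):
        levels = {level}
    else:
        levels = set(level)

    # Keep only tasks at the requested levels (assumed "Level" key).
    selected = [task for task in tasks if task["Level"] in levels]

    # Optionally shuffle before truncating, so the subset is a random sample.
    if randomize:
        random.Random(seed).shuffle(selected)

    # Assumed meaning of `subset`: keep the first N selected tasks.
    if subset is not None:
        selected = selected[:subset]
    return selected
```

Shuffling before truncation is a design choice in this sketch: with `randomize=True` and a `subset`, the run covers a random sample instead of always the same leading tasks.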
_prepare_task
Prepare the task by validating and enriching its data.
_create_user_message
Create a user message from a task.
_process_result
Process and store the result of a task.
_handle_error
Handle errors encountered during task processing.
_generate_summary
Generate and return a summary of the benchmark results.
question_scorer
Scorer for the GAIA benchmark. https://huggingface.co/spaces/gaia-benchmark/leaderboard/blob/main/scorer.py
Parameters:
- model_answer (str): The model answer.
- ground_truth (str): The ground truth answer.
Returns:
bool: The score of the model answer: True if it matches the ground truth, False otherwise.
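A simplified sketch of this kind of scoring, loosely modeled on the public GAIA leaderboard scorer, is shown below. The function and helper names (`score_answer`, `_to_float`, `_is_number`, `_normalize`) are hypothetical, and the details may differ from the camel implementation. The idea is to pick a comparison mode from the shape of the ground truth: numeric, list, or plain string.

```python
import re
import string


def _to_float(text: str) -> float:
    # Strip common unit and separator characters before parsing.
    for char in ("$", "%", ","):
        text = text.replace(char, "")
    try:
        return float(text)
    except ValueError:
        return float("inf")  # unparsable answers never equal a finite number


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def _normalize(text: str, remove_punct: bool = True) -> str:
    # Drop all whitespace, lowercase, and optionally drop punctuation.
    text = re.sub(r"\s", "", text).lower()
    if remove_punct:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text


def score_answer(model_answer: str, ground_truth: str) -> bool:
    # Numeric ground truth: compare as floats.
    if _is_number(ground_truth):
        return _to_float(model_answer) == float(ground_truth)
    # List ground truth: compare element by element.
    if "," in ground_truth or ";" in ground_truth:
        truth_items = re.split(r"[,;]", ground_truth)
        answer_items = re.split(r"[,;]", model_answer)
        if len(truth_items) != len(answer_items):
            return False
        return all(
            _to_float(a) == float(t)
            if _is_number(t)
            else _normalize(a, remove_punct=False)
            == _normalize(t, remove_punct=False)
            for a, t in zip(answer_items, truth_items)
        )
    # Plain string ground truth: compare normalized text.
    return _normalize(model_answer) == _normalize(ground_truth)
```

For example, `score_answer("$1,234", "1234")` takes the numeric path, while `score_answer("red; Blue", "Red, blue")` takes the list path and compares the items pairwise.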
normalize_number_str
split_string
Split a string based on a list of characters.
Parameters:
- s (str): The string to split.
- char_list (Optional[List[str]], optional): The list of characters to split on. (default: None)
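One way to split on a list of characters is to build a regular-expression character class from them. The sketch below is an illustration, not the library's code: `split_on_chars` is a hypothetical name, and falling back to `","` and `";"` when `char_list` is None is an assumption based on the separators used by the GAIA scorer.

```python
import re
from typing import List, Optional


def split_on_chars(s: str, char_list: Optional[List[str]] = None) -> List[str]:
    if char_list is None:
        char_list = [",", ";"]  # assumed default separators
    # Escape each character so regex metacharacters such as "|" or "."
    # are treated literally inside the character class.
    pattern = "[" + "".join(re.escape(char) for char in char_list) + "]"
    return re.split(pattern, s)
```

Using None as the default and filling in the list inside the function avoids sharing one mutable default list across calls.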
normalize_str
Normalize a string.
Parameters:
- input_str (str): The input string to normalize.
- remove_punct (bool): Whether to remove punctuation.
Returns:
str: The normalized string.
get_final_answer
Get the final answer from the content.
Parameters:
- content (str): The content to extract the final answer from.
Returns:
str: The final answer.
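A minimal sketch of this kind of extraction follows, assuming the agent is prompted to end its reply with a `FINAL ANSWER:` marker, as in the GAIA prompt convention. The function name `extract_final_answer`, the `marker` parameter, and the empty-string fallback are hypothetical; the library's behavior when the marker is missing may differ.

```python
def extract_final_answer(content: str, marker: str = "FINAL ANSWER:") -> str:
    # Use the last occurrence so an earlier mention of the marker in the
    # reasoning text does not shadow the real answer.
    index = content.rfind(marker)
    if index == -1:
        return ""  # hypothetical fallback when no marker is present
    return content[index + len(marker):].strip()
```

Searching from the end is a design choice in this sketch; a first-occurrence search with `str.find` would work equally well when the marker appears only once.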