Camel.benchmarks.base - CAMEL-AI Documentation

BaseBenchmark

class BaseBenchmark(ABC):

Base class for benchmarks.

Attributes:

name (str): Name of the benchmark.
data_dir (str): Path to the data directory.
save_to (str): Path to save the results.
processes (int): Number of processes to use for parallel processing. (default: 1)

__init__

def __init__(
    self,
    name: str,
    data_dir: str,
    save_to: str,
    processes: int = 1
):

Initialize the benchmark.

Parameters:

name (str): Name of the benchmark.
data_dir (str): Path to the data directory.
save_to (str): Path to save the results.
processes (int): Number of processes to use for parallel processing. (default: 1)
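
To illustrate the constructor signature above, here is a minimal sketch. `SketchBenchmark` is a hypothetical stand-in written for this example, not the CAMEL class; it only assumes that the four documented arguments end up as attributes of the same name, as the Attributes list suggests.

```python
class SketchBenchmark:
    """Hypothetical stand-in mirroring the documented constructor."""

    def __init__(
        self,
        name: str,
        data_dir: str,
        save_to: str,
        processes: int = 1,
    ):
        # Assumption: each argument is stored under the same name.
        self.name = name
        self.data_dir = data_dir
        self.save_to = save_to
        self.processes = processes


# Example values are placeholders chosen for illustration.
bench = SketchBenchmark(name="demo", data_dir="./data", save_to="./results.json")
print(bench.processes)  # 1, the documented default
```

Because `BaseBenchmark` is declared as an ABC, a real benchmark would normally be a subclass that passes these arguments up to the base constructor.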

download

def download(self):

Download the benchmark data.

Returns:

BaseBenchmark: The benchmark instance.

load

def load(self, force_download: bool = False):

Load the benchmark data.

Parameters:

force_download (bool): Whether to force a download of the data. (default: False)

Returns:

BaseBenchmark: The benchmark instance.
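
Since `download` and `load` both return the benchmark instance, calls can be chained. The sketch below shows one way that could look. The class is a hypothetical stand-in, and the caching policy (download only when the data directory is missing or when `force_download` is set) is an assumption made for illustration, not behavior confirmed by this page.

```python
import tempfile
from pathlib import Path


class SketchBenchmark:
    """Hypothetical stand-in; not the CAMEL implementation."""

    def __init__(self, data_dir: str):
        self.data_dir = Path(data_dir)
        self.download_calls = 0  # counter used only to observe the sketch

    def download(self) -> "SketchBenchmark":
        # Pretend to fetch data by creating the data directory.
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.download_calls += 1
        return self

    def load(self, force_download: bool = False) -> "SketchBenchmark":
        # Assumed policy: download when forced or when nothing is cached.
        if force_download or not self.data_dir.exists():
            self.download()
        return self


with tempfile.TemporaryDirectory() as tmp:
    bench = SketchBenchmark(str(Path(tmp) / "data"))
    same = bench.load().load()  # chained; second call finds the directory
    calls_after_cached_load = bench.download_calls
    bench.load(force_download=True)  # triggers another download
    calls_after_forced_load = bench.download_calls
```

Returning `self` from both methods is what makes a one-line `bench.load().run(...)` style possible.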

train

def train(self):

Returns:

List[Dict[str, Any]]: The training data.

valid

def valid(self):

Returns:

List[Dict[str, Any]]: The validation data.

test

def test(self):

Returns:

List[Dict[str, Any]]: The test data.
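
The three split accessors share one return type, `List[Dict[str, Any]]`. A minimal sketch of that shape follows. The class, the `_data` storage, and the `question`/`answer` keys are all hypothetical. The accessors are written as plain methods to match the signatures shown here; the library may expose them differently (for example as properties), so check the source before relying on the call syntax.

```python
from typing import Any, Dict, List


class SketchBenchmark:
    """Hypothetical stand-in holding one list of records per split."""

    def __init__(self) -> None:
        # Assumed in-memory layout: split name -> list of sample dicts.
        self._data: Dict[str, List[Dict[str, Any]]] = {
            "train": [{"question": "1+1", "answer": "2"}],
            "valid": [],
            "test": [{"question": "2+2", "answer": "4"}],
        }

    def train(self) -> List[Dict[str, Any]]:
        return self._data["train"]

    def valid(self) -> List[Dict[str, Any]]:
        return self._data["valid"]

    def test(self) -> List[Dict[str, Any]]:
        return self._data["test"]


bench = SketchBenchmark()
for sample in bench.train():
    print(sample["question"], "->", sample["answer"])
```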

run

def run(
    self,
    agent: ChatAgent,
    on: Literal['train', 'valid', 'test'],
    randomize: bool = False,
    subset: Optional[int] = None,
    *args,
    **kwargs
):

Run the benchmark.

Parameters:

agent (ChatAgent): The chat agent.
on (Literal['train', 'valid', 'test']): The data split to run the benchmark on.
randomize (bool): Whether to randomize the data. (default: False)
subset (Optional[int]): The subset of the data to run the benchmark on. (default: None)

Returns:

BaseBenchmark: The benchmark instance.
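
The `randomize` and `subset` parameters can be pictured as a sample-selection step that happens before the agent sees any data. The helper below is a hypothetical sketch, not part of CAMEL. It assumes that `subset` means "keep the first N samples" and that shuffling, when requested, happens before truncation. The `seed` argument is an addition for reproducibility and does not appear in the documented signature.

```python
import random
from typing import Any, Dict, List, Optional


def select_samples(
    data: List[Dict[str, Any]],
    randomize: bool = False,
    subset: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Return the samples a run would iterate over (illustrative only)."""
    samples = list(data)  # copy so the caller's list is left untouched
    if randomize:
        random.Random(seed).shuffle(samples)
    if subset is not None:
        samples = samples[:subset]
    return samples


data = [{"id": i} for i in range(10)]
first_three = select_samples(data, subset=3)
shuffled_three = select_samples(data, randomize=True, subset=3, seed=0)
```

Shuffling before truncating gives a random sample of the split; truncating first would only reorder the same leading items.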

results

def results(self):

Returns:

List[Dict[str, Any]]: The results.
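
Results come back as a list of dictionaries, and the constructor takes a `save_to` path for persisting them. The sketch below shows one plausible way to write such a list to disk and summarize it. The JSON format, the `save_results` helper, and the `id`/`correct` keys are assumptions made for this example; the page does not specify how results are stored or what keys they contain.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_results(results: List[Dict[str, Any]], save_to: str) -> None:
    """Write results as a JSON array (assumed format, for illustration)."""
    path = Path(save_to)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(results, indent=2), encoding="utf-8")


# Hypothetical result records; real benchmarks define their own keys.
results = [{"id": 0, "correct": True}, {"id": 1, "correct": False}]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "results.json"
    save_results(results, str(out))
    reloaded = json.loads(out.read_text(encoding="utf-8"))

# Fraction of records marked correct.
accuracy = sum(r["correct"] for r in reloaded) / len(reloaded)
```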
