BaseBenchmark

class BaseBenchmark(ABC):

Base class for benchmarks.

Attributes:

  • name (str): Name of the benchmark.
  • data_dir (str): Path to the data directory.
  • save_to (str): Path to save the results.
  • processes (int): Number of processes to use for parallel processing. (default: 1)
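Because BaseBenchmark derives from ABC, a concrete benchmark is written as a subclass. The sketch below does not import the real class. It uses a local stand-in, BenchmarkSketch, that copies the documented attributes. Treating download and load as the abstract hooks is an assumption made for illustration, and EchoBenchmark is a hypothetical subclass.

```python
from abc import ABC, abstractmethod


class BenchmarkSketch(ABC):
    """Hypothetical stand-in that mirrors the documented BaseBenchmark attributes."""

    def __init__(
        self, name: str, data_dir: str, save_to: str, processes: int = 1
    ) -> None:
        self.name = name
        self.data_dir = data_dir
        self.save_to = save_to
        self.processes = processes

    # Assumption: subclasses must provide at least these two hooks.
    @abstractmethod
    def download(self) -> "BenchmarkSketch": ...

    @abstractmethod
    def load(self, force_download: bool = False) -> "BenchmarkSketch": ...


class EchoBenchmark(BenchmarkSketch):
    """Hypothetical benchmark whose data already lives in memory."""

    def download(self) -> "EchoBenchmark":
        return self  # nothing to fetch

    def load(self, force_download: bool = False) -> "EchoBenchmark":
        return self


try:
    BenchmarkSketch("abstract", "data", "out.json")
except TypeError as err:
    # An ABC with unimplemented abstract methods cannot be instantiated.
    print("TypeError:", err)

bench = EchoBenchmark("echo", data_dir="data", save_to="results.json")
print(bench.name, bench.processes)  # echo 1
```

The base class only fixes the interface and the shared attributes. Each subclass decides where its data comes from.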

__init__

def __init__(
    self,
    name: str,
    data_dir: str,
    save_to: str,
    processes: int = 1
):

Initialize the benchmark.

Parameters:

  • name (str): Name of the benchmark.
  • data_dir (str): Path to the data directory.
  • save_to (str): Path to save the results.
  • processes (int): Number of processes to use for parallel processing. (default: 1)
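At minimum, the constructor has to record these four values. A concrete benchmark may also want to validate them. The following is a minimal sketch that assumes processes must be positive and that a missing data_dir should be created on the spot. Neither behavior is stated above, and InitSketch is a hypothetical name.

```python
import tempfile
from pathlib import Path


class InitSketch:
    """Hypothetical constructor body for the documented parameters."""

    def __init__(
        self, name: str, data_dir: str, save_to: str, processes: int = 1
    ) -> None:
        if processes < 1:
            raise ValueError("processes must be at least 1")
        self.name = name
        self.data_dir = Path(data_dir)
        # Reject a path that exists but is not a directory.
        if self.data_dir.exists() and not self.data_dir.is_dir():
            raise NotADirectoryError(f"{data_dir} is not a directory")
        # Assumption: a missing data directory is created rather than rejected.
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.save_to = save_to
        self.processes = processes


with tempfile.TemporaryDirectory() as tmp:
    bench = InitSketch(
        name="demo",
        data_dir=f"{tmp}/demo-data",
        save_to=f"{tmp}/results.json",
        processes=4,
    )
    created = bench.data_dir.is_dir()
    print(created)  # True
```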

download

def download(self):

Download the benchmark data.

Returns:

BaseBenchmark: The benchmark instance.
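The download hook fetches the raw data into the data directory and returns the instance so that calls can be chained. In the sketch below, a hypothetical in-memory FAKE_REMOTE mapping stands in for a network fetch. Writing one JSON file per split is an assumed layout, and DownloadSketch is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical remote data; a real benchmark would fetch from its source here.
FAKE_REMOTE: Dict[str, List[Dict[str, Any]]] = {
    "train": [{"question": "1 + 1", "answer": "2"}],
    "valid": [{"question": "2 + 2", "answer": "4"}],
    "test": [{"question": "3 + 3", "answer": "6"}],
}


class DownloadSketch:
    def __init__(self, data_dir: str) -> None:
        self.data_dir = Path(data_dir)

    def download(self) -> "DownloadSketch":
        self.data_dir.mkdir(parents=True, exist_ok=True)
        for split, rows in FAKE_REMOTE.items():
            # One JSON file per split, e.g. data_dir/train.json (assumed layout).
            (self.data_dir / f"{split}.json").write_text(json.dumps(rows))
        # Returning self lets callers chain, e.g. bench.download().load()
        # on a benchmark that also implements load.
        return self


with tempfile.TemporaryDirectory() as tmp:
    bench = DownloadSketch(tmp).download()
    files = sorted(p.name for p in Path(tmp).iterdir())
    print(files)  # ['test.json', 'train.json', 'valid.json']
```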

load

def load(self, force_download: bool = False):

Load the benchmark data.

Parameters:

  • force_download (bool): Whether to download the data again even if it is already present. (default: False)

Returns:

BaseBenchmark: The benchmark instance.
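A common way to honor force_download is to call download only when the expected files are missing or when a fresh copy is explicitly requested, and then to parse the files into memory. The following is a minimal sketch that assumes one JSON file per split. The _data dictionary and the download_calls counter are hypothetical details added for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

SPLITS = ("train", "valid", "test")


class LoadSketch:
    def __init__(self, data_dir: str) -> None:
        self.data_dir = Path(data_dir)
        self._data: Dict[str, List[Dict[str, Any]]] = {}
        self.download_calls = 0  # counts fetches, for illustration only

    def download(self) -> "LoadSketch":
        self.download_calls += 1
        self.data_dir.mkdir(parents=True, exist_ok=True)
        for split in SPLITS:
            rows = [{"id": f"{split}-0", "answer": "42"}]
            (self.data_dir / f"{split}.json").write_text(json.dumps(rows))
        return self

    def load(self, force_download: bool = False) -> "LoadSketch":
        missing = any(not (self.data_dir / f"{s}.json").exists() for s in SPLITS)
        # Fetch only when files are absent, unless a fresh download is forced.
        if force_download or missing:
            self.download()
        self._data = {
            s: json.loads((self.data_dir / f"{s}.json").read_text()) for s in SPLITS
        }
        return self


with tempfile.TemporaryDirectory() as tmp:
    bench = LoadSketch(tmp).load()        # files missing: downloads
    bench.load()                          # files present: no download
    bench.load(force_download=True)       # forced: downloads again
    print(bench.download_calls)           # 2
    print(bench._data["train"][0]["id"])  # train-0
```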

train

def train(self):

Get the training data.

Returns:

List[Dict[str, Any]]: The training data.

valid

def valid(self):

Get the validation data.

Returns:

List[Dict[str, Any]]: The validation data.

test

def test(self):

Get the test data.

Returns:

List[Dict[str, Any]]: The test data.
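The train, valid, and test accessors each return the rows of one split as a list of dicts. One way to write them is to load the data lazily on first access. In this sketch, the accessors are plain methods mirroring the signatures listed above. The in-memory load body, the shared _split helper, and the load_calls counter are hypothetical.

```python
from typing import Any, Dict, List


class SplitSketch:
    def __init__(self) -> None:
        self._data: Dict[str, List[Dict[str, Any]]] = {}
        self.load_calls = 0

    def load(self, force_download: bool = False) -> "SplitSketch":
        # Hypothetical in-memory data in place of files on disk.
        self.load_calls += 1
        self._data = {
            "train": [{"q": "a"}, {"q": "b"}],
            "valid": [{"q": "c"}],
            "test": [{"q": "d"}],
        }
        return self

    def _split(self, name: str) -> List[Dict[str, Any]]:
        if not self._data:  # load lazily on first access
            self.load()
        return self._data[name]

    def train(self) -> List[Dict[str, Any]]:
        return self._split("train")

    def valid(self) -> List[Dict[str, Any]]:
        return self._split("valid")

    def test(self) -> List[Dict[str, Any]]:
        return self._split("test")


bench = SplitSketch()
print(len(bench.train()), len(bench.valid()), len(bench.test()))  # 2 1 1
print(bench.load_calls)  # 1
```

Loading is triggered once, by the first accessor call, and later calls reuse the cached splits.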

run

def run(
    self,
    agent: ChatAgent,
    on: Literal['train', 'valid', 'test'],
    randomize: bool = False,
    subset: Optional[int] = None,
    *args,
    **kwargs
):

Run the benchmark.

Parameters:

  • agent (ChatAgent): The chat agent.
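  • on (str): The data split to run the benchmark on: 'train', 'valid', or 'test'.
  • randomize (bool): Whether to shuffle the data before running. (default: False)
  • subset (int, optional): Number of examples to run the benchmark on. If None, the whole split is used. (default: None)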

Returns:

BaseBenchmark: The benchmark instance.
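A concrete run can follow a simple sequence:

1. Pick the split named by on.
2. Optionally shuffle it.
3. Truncate it to subset examples.
4. Query the agent on each row and store the outcomes.
5. Return self.

The sketch below assumes that reading of randomize and subset. So that it runs without the library, it replaces ChatAgent with a plain callable and omits *args and **kwargs. The row fields question and answer, the result keys, and the RunSketch and toy_agent names are all hypothetical.

```python
import random
from typing import Any, Callable, Dict, List, Literal, Optional


class RunSketch:
    def __init__(self, data: Dict[str, List[Dict[str, Any]]]) -> None:
        self._data = data
        self._results: List[Dict[str, Any]] = []

    def run(
        self,
        agent: Callable[[str], str],  # stand-in for ChatAgent
        on: Literal["train", "valid", "test"],
        randomize: bool = False,
        subset: Optional[int] = None,
    ) -> "RunSketch":
        if on not in ("train", "valid", "test"):
            raise ValueError(f"unknown split: {on!r}")
        rows = list(self._data[on])  # copy so shuffling leaves the stored split intact
        if randomize:
            random.shuffle(rows)
        if subset is not None:
            rows = rows[:subset]
        self._results = []
        for row in rows:
            reply = agent(row["question"])
            self._results.append(
                {
                    "question": row["question"],
                    "reply": reply,
                    "correct": reply.strip() == row["answer"],
                }
            )
        return self


data = {
    "train": [],
    "valid": [],
    "test": [
        {"question": "2 + 2", "answer": "4"},
        {"question": "3 + 3", "answer": "6"},
        {"question": "5 + 5", "answer": "10"},
    ],
}


def toy_agent(prompt: str) -> str:
    # A toy "agent" that evaluates the addition it is given.
    a, b = prompt.split(" + ")
    return str(int(a) + int(b))


bench = RunSketch(data).run(toy_agent, on="test", subset=2)
print([r["correct"] for r in bench._results])  # [True, True]
```

Shuffling a copy rather than the stored split keeps repeated runs on the same split independent of each other.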

results

def results(self):

Get the benchmark results.

Returns:

List[Dict[str, Any]]: The results.
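Because results is a list of plain dicts, the standard library is enough to summarize it or write it to the save_to path. This sketch assumes that each record carries a boolean correct key and that a single JSON output file is wanted. Both are hypothetical choices, not a format defined here, and summarize and save_results are illustrative helper names.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def summarize(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Accuracy over records that carry a boolean 'correct' key (assumed)."""
    total = len(results)
    correct = sum(1 for r in results if r.get("correct"))
    return {
        "total": total,
        "correct": correct,
        "accuracy": correct / total if total else 0.0,
    }


def save_results(results: List[Dict[str, Any]], save_to: str) -> None:
    # Write the records plus a summary to the path given as save_to.
    payload = {"summary": summarize(results), "records": results}
    Path(save_to).write_text(json.dumps(payload, indent=2))


results = [
    {"id": "q1", "correct": True},
    {"id": "q2", "correct": False},
    {"id": "q3", "correct": True},
    {"id": "q4", "correct": True},
]
print(summarize(results)["accuracy"])  # 0.75

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "results.json"
    save_results(results, str(out))
    reloaded = json.loads(out.read_text())
```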