QueryResponse
GradingResponse
SingleEvalResult
EvalResult
JinjaEnv
__init__
__new__
get_instance
env
from_string
- template_str (str): The template string.
message_to_html
Generates an HTML snippet (inside a <div>) for a message.
Parameters:
- message (Message): The message to convert to HTML.
derive_key
decrypt
_compute_stat
aggregate_results
- single_eval_results (List[SingleEvalResult]): A list of SingleEvalResult objects.
- default_stats (Tuple[str, str]): A tuple of default statistics to compute. (default: :obj:`("mean", "std")`)
- name2stats (Optional[Dict[str, Tuple[str]]]): A dictionary mapping metric names to the statistics to compute for each. (default: :obj:`None`)
EvalResult object containing aggregated results.
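The aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the names `compute_stat` and `aggregate_metrics`, the use of plain dictionaries in place of SingleEvalResult and EvalResult, the `name:stat` key format, and the choice of population standard deviation are all assumptions made for the example.

```python
import math
from typing import Dict, List, Optional, Tuple


def compute_stat(values: List[float], stat: str) -> float:
    """Compute one named statistic over a non-empty list of numbers."""
    if stat == "mean":
        return sum(values) / len(values)
    if stat == "std":
        # Population standard deviation (an assumption of this sketch).
        mean = sum(values) / len(values)
        return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if stat == "min":
        return min(values)
    if stat == "max":
        return max(values)
    raise ValueError(f"Unknown statistic: {stat}")


def aggregate_metrics(
    per_example_metrics: List[Dict[str, float]],
    default_stats: Tuple[str, ...] = ("mean", "std"),
    name2stats: Optional[Dict[str, Tuple[str, ...]]] = None,
) -> Dict[str, float]:
    """Aggregate per-example metrics into summary statistics.

    Metrics listed in name2stats use their own statistics; all other
    metrics fall back to default_stats.
    """
    name2stats = name2stats or {}

    # Group the recorded values by metric name.
    name2values: Dict[str, List[float]] = {}
    for metrics in per_example_metrics:
        for name, value in metrics.items():
            name2values.setdefault(name, []).append(value)

    # Compute each requested statistic per metric.
    aggregated: Dict[str, float] = {}
    for name, values in name2values.items():
        for stat in name2stats.get(name, default_stats):
            aggregated[f"{name}:{stat}"] = compute_stat(values, stat)
    return aggregated
```

Calling `aggregate_metrics([{"is_correct": 1.0}, {"is_correct": 0.0}])` produces one entry per statistic in `default_stats`, while passing `name2stats={"is_correct": ("max",)}` restricts that metric to the listed statistics.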
BrowseCompBenchmark
__init__
- save_to (str): The file to save the results to.
- processes (int, optional): The number of processes to use for parallel processing. (default: :obj:`1`)
- num_examples (Optional[int]): Number of examples to evaluate. If None, all examples are used. Controls the sample size for testing. (default: :obj:`None`)
- n_repeats (int, optional): Number of times to repeat each example. Useful for evaluating consistency across multiple runs. (default: :obj:`1`)
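To make the interaction between num_examples and n_repeats concrete, here is a small sketch. The helper `select_examples`, its `seed` argument, and the use of random sampling to pick the subset are assumptions for illustration; the benchmark's actual selection logic may differ.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def select_examples(
    examples: List[T],
    num_examples: Optional[int] = None,
    n_repeats: int = 1,
    seed: int = 0,
) -> List[T]:
    """Pick a subset of examples, then repeat the subset n_repeats times."""
    selected = list(examples)
    if num_examples is not None:
        # Seeded sampling keeps the subset reproducible (assumption).
        # random.sample raises ValueError if num_examples > len(examples).
        selected = random.Random(seed).sample(selected, num_examples)
    # Each selected example appears n_repeats times in the final list.
    return selected * n_repeats
```

With 10 examples, `num_examples=3`, and `n_repeats=2`, the resulting list holds 6 items; with `num_examples=None`, every example is kept and repeated.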
download
load
train
run
- pipeline_template (Union[ChatAgent, RolePlaying, Workforce]): The template agent or framework used to process examples. Can be a ChatAgent, RolePlaying, or Workforce instance; it is cloned for each example.
- chat_turn_limit (int): Maximum number of conversation turns allowed when using a RolePlaying pipeline. (default: :obj:`10`)
- roleplaying_summarizer (Optional[ChatAgent]): Optional ChatAgent used to summarize RolePlaying conversations. If None and RolePlaying is used, a default summarizer is created. (default: :obj:`None`)
- task_json_formatter (Optional[ChatAgent]): Optional ChatAgent used to format task JSON. If None and Workforce is used, a default formatter is created. (default: :obj:`None`)
make_report
validate
- grader: The ChatAgent used for validation. If None, a default agent is created in each thread. If provided, the agent is used as a template and cloned into a new agent in each thread. (default: :obj:`None`)
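The per-thread cloning behavior described for grader can be pictured with the sketch below. Everything here is hypothetical: `StubGrader` stands in for a ChatAgent, its `clone` and `grade` methods are invented for the example, and `validate_all` is not part of the benchmark's API. The only point illustrated is that each worker thread gets its own grader, either cloned from a template or created as a default.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple


class StubGrader:
    """Hypothetical stand-in for a grading agent."""

    def __init__(self, name: str = "default") -> None:
        self.name = name

    def clone(self) -> "StubGrader":
        # Return an independent copy so threads never share state.
        return copy.deepcopy(self)

    def grade(self, answer: str, expected: str) -> bool:
        # Toy grading rule: case-insensitive exact match.
        return answer.strip().lower() == expected.strip().lower()


def validate_all(
    pairs: List[Tuple[str, str]],
    template: Optional[StubGrader] = None,
    max_workers: int = 4,
) -> List[bool]:
    """Grade (answer, expected) pairs with one grader per worker thread."""
    local = threading.local()

    def get_grader() -> StubGrader:
        # Lazily create this thread's grader on first use.
        if not hasattr(local, "grader"):
            local.grader = (
                template.clone() if template is not None else StubGrader()
            )
        return local.grader

    def grade_pair(pair: Tuple[str, str]) -> bool:
        answer, expected = pair
        return get_grader().grade(answer, expected)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(grade_pair, pairs))
```

Keeping the grader in thread-local storage means a template passed by the caller is never mutated by concurrent workers, which is the usual motivation for cloning per thread.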