# Camel.benchmarks.ragbench

## RagasFields

```python theme={"system"}
class RagasFields:
```

Constants for RAGAS evaluation field names.

## annotate\_dataset

```python theme={"system"}
def annotate_dataset(
    dataset: Dataset,
    context_call: Optional[Callable[[Dict[str, Any]], List[str]]],
    answer_call: Optional[Callable[[Dict[str, Any]], str]]
):
```

Annotate the dataset by adding context and answers using the provided functions.

**Parameters:**

* **dataset** (Dataset): The input dataset to annotate.
* **context\_call** (Optional\[Callable\[\[Dict\[str, Any]], List\[str]]]): Function to generate context for each example.
* **answer\_call** (Optional\[Callable\[\[Dict\[str, Any]], str]]): Function to generate answer for each example.

**Returns:**

Dataset: The annotated dataset with added contexts and/or answers.

## rmse

```python theme={"system"}
def rmse(input_trues: Sequence[float], input_preds: Sequence[float]):
```

Calculate Root Mean Squared Error (RMSE).

**Parameters:**

* **input\_trues** (Sequence\[float]): Ground truth values.
* **input\_preds** (Sequence\[float]): Predicted values.

**Returns:**

Optional\[float]: RMSE value, or None if inputs have different lengths.

## auroc

```python theme={"system"}
def auroc(trues: Sequence[bool], preds: Sequence[float]):
```

Calculate Area Under Receiver Operating Characteristic Curve (AUROC).

**Parameters:**

* **trues** (Sequence\[bool]): Ground truth binary values.
* **preds** (Sequence\[float]): Predicted probability values.

**Returns:**

float: AUROC score.
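To make the two metric helpers above concrete, the following is a minimal standard-library sketch of what RMSE and AUROC compute. It is not the library's implementation: the names `rmse_sketch` and `auroc_sketch` are hypothetical, the handling of empty input is an assumption, and the AUROC is computed with the pairwise (Mann-Whitney) formulation rather than whatever routine `camel.benchmarks.ragbench` uses internally.

```python
import math
from typing import Optional, Sequence


def rmse_sketch(
    trues: Sequence[float], preds: Sequence[float]
) -> Optional[float]:
    """Root mean squared error; None when the lengths differ."""
    if len(trues) != len(preds):
        return None
    if len(trues) == 0:
        # Assumption of this sketch: empty input is also reported as None.
        return None
    squared_errors = [(t - p) ** 2 for t, p in zip(trues, preds)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def auroc_sketch(trues: Sequence[bool], preds: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [p for t, p in zip(trues, preds) if t]
    negatives = [p for t, p in zip(trues, preds) if not t]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of examples, which is fine for illustration but would be replaced by a rank-based computation for large datasets.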
## ragas\_calculate\_metrics

```python theme={"system"}
def ragas_calculate_metrics(
    dataset: Dataset,
    pred_context_relevance_field: Optional[str],
    pred_faithfulness_field: Optional[str],
    metrics_to_evaluate: Optional[List[str]] = None,
    ground_truth_context_relevance_field: str = 'relevance_score',
    ground_truth_faithfulness_field: str = 'adherence_score'
):
```

Calculate RAGAS evaluation metrics.

**Parameters:**

* **dataset** (Dataset): The dataset containing predictions and ground truth.
* **pred\_context\_relevance\_field** (Optional\[str]): Field name for predicted context relevance.
* **pred\_faithfulness\_field** (Optional\[str]): Field name for predicted faithfulness.
* **metrics\_to\_evaluate** (Optional\[List\[str]]): List of metrics to evaluate.
* **ground\_truth\_context\_relevance\_field** (str): Field name for ground truth relevance.
* **ground\_truth\_faithfulness\_field** (str): Field name for ground truth adherence.

**Returns:**

Dict\[str, Optional\[float]]: Dictionary of calculated metrics.

## ragas\_evaluate\_dataset

```python theme={"system"}
def ragas_evaluate_dataset(
    dataset: Dataset,
    contexts_field_name: Optional[str],
    answer_field_name: Optional[str],
    metrics_to_evaluate: Optional[List[str]] = None
):
```

Evaluate the dataset using RAGAS metrics.

**Parameters:**

* **dataset** (Dataset): Input dataset to evaluate.
* **contexts\_field\_name** (Optional\[str]): Field name containing contexts.
* **answer\_field\_name** (Optional\[str]): Field name containing answers.
* **metrics\_to\_evaluate** (Optional\[List\[str]]): List of metrics to evaluate.

**Returns:**

Dataset: Dataset with added evaluation metrics.

## RAGBenchBenchmark

```python theme={"system"}
class RAGBenchBenchmark(BaseBenchmark):
```

RAGBench Benchmark for evaluating RAG performance.

This benchmark uses the rungalileo/ragbench dataset to evaluate retrieval-augmented generation (RAG) systems.
It measures context relevancy and faithfulness metrics as described in [https://arxiv.org/abs/2407.11005](https://arxiv.org/abs/2407.11005).

**Parameters:**

* **processes** (int, optional): Number of processes for parallel processing.
* **subset** (str, optional): Dataset subset to use (e.g., "hotpotqa").
* **split** (str, optional): Dataset split to use (e.g., "test").

### \_\_init\_\_

```python theme={"system"}
def __init__(
    self,
    processes: int = 1,
    subset: Literal['covidqa', 'cuad', 'delucionqa', 'emanual', 'expertqa', 'finqa', 'hagrid', 'hotpotqa', 'msmarco', 'pubmedqa', 'tatqa', 'techqa'] = 'hotpotqa',
    split: Literal['train', 'test', 'validation'] = 'test'
):
```

### download

```python theme={"system"}
def download(self):
```

Download the RAGBench dataset.

### load

```python theme={"system"}
def load(self, force_download: bool = False):
```

Load the RAGBench dataset.

**Parameters:**

* **force\_download** (bool, optional): Whether to force download the data.

### run

```python theme={"system"}
def run(self, agent: ChatAgent, auto_retriever: AutoRetriever):
```

Run the benchmark evaluation.

**Parameters:**

* **agent** (ChatAgent): Chat agent for generating answers.
* **auto\_retriever** (AutoRetriever): Retriever for finding relevant contexts.

**Returns:**

Dict\[str, Optional\[float]]: Dictionary of evaluation metrics.
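The class methods above might be combined roughly as follows. This is a hedged sketch, not an official example: the wrapper `run_ragbench` and the constants `VALID_SUBSETS` and `VALID_SPLITS` are hypothetical, the import paths (`camel.agents`, `camel.benchmarks`, `camel.retrievers`) and the default-argument construction of `ChatAgent` and `AutoRetriever` are assumptions, and calling `load()` before `run()` is assumed to be the intended order. Running it also presumes that CAMEL, a configured model backend, and network access to the dataset are available.

```python
from typing import Dict, Optional

# Allowed values copied from the __init__ signature documented above.
VALID_SUBSETS = (
    "covidqa", "cuad", "delucionqa", "emanual", "expertqa", "finqa",
    "hagrid", "hotpotqa", "msmarco", "pubmedqa", "tatqa", "techqa",
)
VALID_SPLITS = ("train", "test", "validation")


def run_ragbench(
    subset: str = "hotpotqa",
    split: str = "test",
    processes: int = 1,
) -> Dict[str, Optional[float]]:
    """Hypothetical wrapper: validate arguments, then load and run."""
    if subset not in VALID_SUBSETS:
        raise ValueError(f"Unknown subset: {subset!r}")
    if split not in VALID_SPLITS:
        raise ValueError(f"Unknown split: {split!r}")

    # Deferred imports so argument validation works without CAMEL installed.
    # These import paths are assumptions, not confirmed by this page.
    from camel.agents import ChatAgent
    from camel.benchmarks import RAGBenchBenchmark
    from camel.retrievers import AutoRetriever

    # Assumption: both objects can be built with minimal arguments.
    agent = ChatAgent("You are a helpful assistant that answers questions "
                      "using only the retrieved context.")
    auto_retriever = AutoRetriever()

    benchmark = RAGBenchBenchmark(
        processes=processes, subset=subset, split=split
    )
    benchmark.load()  # assumed to be required before run()
    return benchmark.run(agent, auto_retriever)
```

Calling `run_ragbench()` with its defaults would evaluate the `hotpotqa` test split and return the metrics dictionary described under `run`; validating `subset` and `split` up front simply surfaces typos before any download starts.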