# camel.benchmarks.ragbench

<a id="camel.benchmarks.ragbench" />

<a id="camel.benchmarks.ragbench.RagasFields" />

## RagasFields

```python theme={"system"}
class RagasFields:
```

Constants for RAGAS evaluation field names.

<a id="camel.benchmarks.ragbench.annotate_dataset" />

## annotate\_dataset

```python theme={"system"}
def annotate_dataset(
    dataset: Dataset,
    context_call: Optional[Callable[[Dict[str, Any]], List[str]]],
    answer_call: Optional[Callable[[Dict[str, Any]], str]]
):
```

Annotate each example in the dataset with contexts, an answer, or both,
using the provided functions.

**Parameters:**

* **dataset** (Dataset): The input dataset to annotate.
* **context\_call** (Optional\[Callable\[\[Dict\[str, Any]], List\[str]]]): Function that returns the list of contexts for each example.
* **answer\_call** (Optional\[Callable\[\[Dict\[str, Any]], str]]): Function that returns the answer for each example.

**Returns:**

Dataset: The annotated dataset with added contexts and/or answers.
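
The shape of the two callables can be illustrated on plain dictionaries. The sketch below is not the library's implementation: `annotate_rows` is a hypothetical stand-in that operates on a list of dicts instead of a `Dataset`, and the field names `question`, `documents`, `contexts`, and `answer` are assumptions chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def annotate_rows(
    rows: List[Row],
    context_call: Optional[Callable[[Row], List[str]]],
    answer_call: Optional[Callable[[Row], str]],
) -> List[Row]:
    """Hypothetical stand-in for annotate_dataset that works on plain dicts."""
    annotated = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if context_call is not None:
            # "contexts" is an assumed output field name.
            new_row["contexts"] = context_call(new_row)
        if answer_call is not None:
            # "answer" is an assumed output field name. In this sketch the
            # answer callable runs after the context callable, so it can
            # read the contexts that were just added.
            new_row["answer"] = answer_call(new_row)
        annotated.append(new_row)
    return annotated


def first_two_documents(row: Row) -> List[str]:
    # Toy retriever: keep the first two candidate documents.
    return row["documents"][:2]


def echo_answer(row: Row) -> str:
    # Toy generator: answer with the first retrieved context.
    return row["contexts"][0]


rows = [{"question": "What is RAG?", "documents": ["doc a", "doc b", "doc c"]}]
annotated = annotate_rows(rows, first_two_documents, echo_answer)
contexts_only = annotate_rows(rows, first_two_documents, None)
```

In the sketch, passing `None` for a callable leaves the corresponding field out, which mirrors the "contexts and/or answers" wording of the return description.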

<a id="camel.benchmarks.ragbench.rmse" />

## rmse

```python theme={"system"}
def rmse(input_trues: Sequence[float], input_preds: Sequence[float]):
```

Calculate Root Mean Squared Error (RMSE).

**Parameters:**

* **input\_trues** (Sequence\[float]): Ground truth values.
* **input\_preds** (Sequence\[float]): Predicted values.

**Returns:**

Optional\[float]: RMSE value, or None if inputs have different lengths.
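
For reference, the quantity itself can be written in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the library's code: `rmse_sketch` is a hypothetical name, and returning `None` for empty inputs is a choice made here to avoid a division by zero rather than documented behavior.

```python
import math
from typing import Optional, Sequence


def rmse_sketch(
    trues: Sequence[float], preds: Sequence[float]
) -> Optional[float]:
    """Root mean squared error, or None when the inputs cannot be compared."""
    if len(trues) != len(preds):
        return None  # documented behavior: mismatched lengths yield None
    if len(trues) == 0:
        return None  # sketch-only choice: nothing to average over
    squared_errors = [(t - p) ** 2 for t, p in zip(trues, preds)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


perfect = rmse_sketch([0.2, 0.8], [0.2, 0.8])
off_by_some = rmse_sketch([0.0, 0.0], [3.0, 4.0])  # sqrt((9 + 16) / 2)
mismatched = rmse_sketch([0.1, 0.2, 0.3], [0.1, 0.2])
```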

<a id="camel.benchmarks.ragbench.auroc" />

## auroc

```python theme={"system"}
def auroc(trues: Sequence[bool], preds: Sequence[float]):
```

Calculate Area Under Receiver Operating Characteristic Curve (AUROC).

**Parameters:**

* **trues** (Sequence\[bool]): Ground truth binary values.
* **preds** (Sequence\[float]): Predicted probability values.

**Returns:**

float: AUROC score.
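
AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; it is an illustration only (`auroc_sketch` is a hypothetical name), and the library function may compute the score differently, for example through a third-party metrics package.

```python
from typing import Sequence


def auroc_sketch(trues: Sequence[bool], preds: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores."""
    positives = [p for t, p in zip(trues, preds) if t]
    negatives = [p for t, p in zip(trues, preds) if not t]
    if not positives or not negatives:
        # The score is undefined when only one class is present.
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0  # positive ranked above negative
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Positives score 0.9 and 0.4; negatives score 0.1 and 0.6.
# Three of the four positive/negative pairs are ordered correctly.
score = auroc_sketch([True, False, True, False], [0.9, 0.1, 0.4, 0.6])
```

The pairwise form is quadratic in the number of examples, so it is suited to explanation and small checks rather than large datasets.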

<a id="camel.benchmarks.ragbench.ragas_calculate_metrics" />

## ragas\_calculate\_metrics

```python theme={"system"}
def ragas_calculate_metrics(
    dataset: Dataset,
    pred_context_relevance_field: Optional[str],
    pred_faithfulness_field: Optional[str],
    metrics_to_evaluate: Optional[List[str]] = None,
    ground_truth_context_relevance_field: str = 'relevance_score',
    ground_truth_faithfulness_field: str = 'adherence_score'
):
```

Calculate RAGAS evaluation metrics.

**Parameters:**

* **dataset** (Dataset): The dataset containing predictions and ground truth.
* **pred\_context\_relevance\_field** (Optional\[str]): Field name for predicted context relevance.
* **pred\_faithfulness\_field** (Optional\[str]): Field name for predicted faithfulness.
* **metrics\_to\_evaluate** (Optional\[List\[str]]): List of metrics to evaluate.
* **ground\_truth\_context\_relevance\_field** (str): Field name for ground truth relevance.
* **ground\_truth\_faithfulness\_field** (str): Field name for ground truth adherence.

**Returns:**

Dict\[str, Optional\[float]]: Dictionary of calculated metrics.

<a id="camel.benchmarks.ragbench.ragas_evaluate_dataset" />

## ragas\_evaluate\_dataset

```python theme={"system"}
def ragas_evaluate_dataset(
    dataset: Dataset,
    contexts_field_name: Optional[str],
    answer_field_name: Optional[str],
    metrics_to_evaluate: Optional[List[str]] = None
):
```

Evaluate the dataset using RAGAS metrics.

**Parameters:**

* **dataset** (Dataset): Input dataset to evaluate.
* **contexts\_field\_name** (Optional\[str]): Field name containing contexts.
* **answer\_field\_name** (Optional\[str]): Field name containing answers.
* **metrics\_to\_evaluate** (Optional\[List\[str]]): List of metrics to evaluate.

**Returns:**

Dataset: Dataset with added evaluation metrics.

<a id="camel.benchmarks.ragbench.RAGBenchBenchmark" />

## RAGBenchBenchmark

```python theme={"system"}
class RAGBenchBenchmark(BaseBenchmark):
```

RAGBench Benchmark for evaluating RAG performance.

This benchmark uses the rungalileo/ragbench dataset to evaluate
retrieval-augmented generation (RAG) systems. It measures context
relevancy and faithfulness metrics as described in
[https://arxiv.org/abs/2407.11005](https://arxiv.org/abs/2407.11005).

**Parameters:**

* **processes** (int, optional): Number of processes for parallel processing. (default: 1)
* **subset** (str, optional): Dataset subset to use (e.g., "hotpotqa"). (default: "hotpotqa")
* **split** (str, optional): Dataset split to use (e.g., "test"). (default: "test")

<a id="camel.benchmarks.ragbench.RAGBenchBenchmark.__init__" />

### \_\_init\_\_

```python theme={"system"}
def __init__(
    self,
    processes: int = 1,
    subset: Literal['covidqa', 'cuad', 'delucionqa', 'emanual', 'expertqa', 'finqa', 'hagrid', 'hotpotqa', 'msmarco', 'pubmedqa', 'tatqa', 'techqa'] = 'hotpotqa',
    split: Literal['train', 'test', 'validation'] = 'test'
):
```
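
`Literal` annotations such as the ones in this signature are checked by static type checkers but are not enforced when the code runs. A caller who wants to catch a misspelled subset or split before constructing the benchmark could validate the values first. The sketch below is one way to do that; `Subset`, `Split`, and `check_config` are hypothetical names that are not part of the module, and the allowed values are copied from the signature above.

```python
from typing import Literal, Tuple, get_args

# Allowed values copied from the __init__ signature.
Subset = Literal[
    'covidqa', 'cuad', 'delucionqa', 'emanual', 'expertqa', 'finqa',
    'hagrid', 'hotpotqa', 'msmarco', 'pubmedqa', 'tatqa', 'techqa',
]
Split = Literal['train', 'test', 'validation']


def check_config(subset: str = 'hotpotqa', split: str = 'test') -> Tuple[str, str]:
    """Hypothetical helper: reject values outside the documented choices."""
    if subset not in get_args(Subset):
        raise ValueError(f"Unknown subset: {subset!r}")
    if split not in get_args(Split):
        raise ValueError(f"Unknown split: {split!r}")
    return subset, split


defaults = check_config()
custom = check_config('pubmedqa', 'validation')

try:
    check_config('hotpot_qa')  # misspelled subset name
    rejected = False
except ValueError:
    rejected = True
```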

<a id="camel.benchmarks.ragbench.RAGBenchBenchmark.download" />

### download

```python theme={"system"}
def download(self):
```

Download the RAGBench dataset.

<a id="camel.benchmarks.ragbench.RAGBenchBenchmark.load" />

### load

```python theme={"system"}
def load(self, force_download: bool = False):
```

Load the RAGBench dataset.

**Parameters:**

* **force\_download** (bool, optional): Whether to force a fresh download of the data. (default: False)

<a id="camel.benchmarks.ragbench.RAGBenchBenchmark.run" />

### run

```python theme={"system"}
def run(self, agent: ChatAgent, auto_retriever: AutoRetriever):
```

Run the benchmark evaluation.

**Parameters:**

* **agent** (ChatAgent): Chat agent for generating answers.
* **auto\_retriever** (AutoRetriever): Retriever for finding relevant contexts.

**Returns:**

Dict\[str, Optional\[float]]: Dictionary of evaluation metrics.
