# camel.datasets.base\_generator

<a id="camel.datasets.base_generator" />

<a id="camel.datasets.base_generator.BaseGenerator" />

## BaseGenerator

```python theme={"system"}
class BaseGenerator(ABC, IterableDataset):
```

Abstract base class for data generators.

This class defines the interface for generating synthetic datapoints.
Concrete implementations should provide specific generation strategies.

<a id="camel.datasets.base_generator.BaseGenerator.__init__" />

### \_\_init\_\_

```python theme={"system"}
def __init__(
    self,
    seed: int = 42,
    buffer: int = 20,
    cache: Union[str, Path, None] = None,
    data_path: Union[str, Path, None] = None,
    **kwargs
):
```

Initialize the base generator.

**Parameters:**

* **seed** (int): Random seed for reproducibility. (default: `42`)
* **buffer** (int): Number of DataPoints to generate when the iterator runs out of DataPoints in `self._data`. (default: `20`)
* **cache** (Union\[str, Path, None]): Optional path to save generated datapoints during iteration. If None, each batch of 100 yielded datapoints is discarded instead of saved. (default: `None`)
* **data\_path** (Union\[str, Path, None]): Optional path to a JSONL file to initialize the dataset from. (default: `None`)
* **\*\*kwargs**: Additional generator parameters.

<a id="camel.datasets.base_generator.BaseGenerator.__aiter__" />

### \_\_aiter\_\_

```python theme={"system"}
def __aiter__(self):
```

Async iterator that yields datapoints dynamically.

If a `data_path` was provided during initialization, those datapoints
are yielded first. When `self._data` is empty, `buffer` new datapoints
(20 by default) are generated. After every 100 yields, the batch is
appended to the `cache` JSONL file, or discarded if `cache` is None.

**Yields:**

DataPoint: A single datapoint.
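
The refill-and-cache cycle described above can be sketched with a small stand-in class. This is a minimal illustration and not the library's implementation: `ToyAsyncGenerator`, `_generate_new`, `take`, and `run_cache_demo` are hypothetical names, and plain dictionaries stand in for DataPoint objects.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Any, AsyncIterator, Dict, List, Optional


class ToyAsyncGenerator:
    """Hypothetical stand-in that mimics the documented refill/flush cycle."""

    FLUSH_EVERY = 100  # batch size taken from the description above

    def __init__(self, buffer: int = 20, cache: Optional[Path] = None) -> None:
        self._buffer = buffer
        self._cache = cache
        self._data: List[Dict[str, Any]] = []
        self._batch: List[Dict[str, Any]] = []
        self._counter = 0

    async def _generate_new(self, n: int) -> None:
        # Placeholder for a real (possibly slow) generation strategy.
        await asyncio.sleep(0)
        for _ in range(n):
            self._data.append({"id": self._counter})
            self._counter += 1

    async def __aiter__(self) -> AsyncIterator[Dict[str, Any]]:
        while True:
            if not self._data:
                # Refill with `buffer` new points once the queue is empty.
                await self._generate_new(self._buffer)
            point = self._data.pop(0)
            yield point
            self._batch.append(point)
            if len(self._batch) == self.FLUSH_EVERY:
                if self._cache is not None:
                    # Append the finished batch as JSON Lines.
                    with self._cache.open("a", encoding="utf-8") as f:
                        for item in self._batch:
                            f.write(json.dumps(item) + "\n")
                # With or without a cache, the batch is dropped from memory.
                self._batch = []


async def take(gen: ToyAsyncGenerator, n: int) -> List[Dict[str, Any]]:
    """Collect the first n points from an endless async iterator."""
    out: List[Dict[str, Any]] = []
    async for point in gen:
        out.append(point)
        if len(out) == n:
            break
    return out


def run_cache_demo(n: int = 250) -> int:
    """Consume n points and return how many lines reached the cache file."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache.jsonl"
        asyncio.run(take(ToyAsyncGenerator(cache=cache), n))
        return len(cache.read_text(encoding="utf-8").splitlines())
```

In this sketch only complete batches of 100 are written, so a consumer that stops partway through a batch leaves the trailing points unsaved.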

<a id="camel.datasets.base_generator.BaseGenerator.__iter__" />

### \_\_iter\_\_

```python theme={"system"}
def __iter__(self):
```

Synchronous iterator for PyTorch IterableDataset compatibility.

If a `data_path` was provided during initialization, those datapoints
are yielded first. When `self._data` is empty, `buffer` new datapoints
(20 by default) are generated. After every 100 yields, the batch is
appended to the `cache` JSONL file, or discarded if `cache` is None.

**Yields:**

DataPoint: A single datapoint.
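
As a rough illustration of the ordering described above (preloaded datapoints first, generated ones afterwards), the sketch below uses a hypothetical `ToySyncGenerator` with dictionaries in place of DataPoint objects. It assumes the iterator never terminates on its own, so the consumer bounds it with `itertools.islice`.

```python
import random
from itertools import islice
from typing import Any, Dict, Iterator, List, Optional


class ToySyncGenerator:
    """Hypothetical stand-in: preloaded points first, then generated ones."""

    def __init__(
        self,
        preloaded: Optional[List[Dict[str, Any]]] = None,
        seed: int = 42,
        buffer: int = 20,
    ) -> None:
        self._rng = random.Random(seed)  # seeded for reproducible output
        self._buffer = buffer
        # Copy so the caller's list is not consumed by iteration.
        self._data: List[Dict[str, Any]] = list(preloaded or [])

    def _generate_new(self, n: int) -> None:
        # Placeholder generation strategy: random integers.
        for _ in range(n):
            self._data.append(
                {"source": "generated", "value": self._rng.randint(0, 999)}
            )

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        while True:
            if not self._data:
                self._generate_new(self._buffer)
            yield self._data.pop(0)


def first_n(gen: ToySyncGenerator, n: int) -> List[Dict[str, Any]]:
    """Take a bounded slice of an otherwise endless iterator."""
    return list(islice(gen, n))
```

Calling `first_n(ToySyncGenerator(preloaded=[...]), 5)` with two preloaded records returns those two followed by three generated ones. Two generators built with the same seed produce the same generated values.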

<a id="camel.datasets.base_generator.BaseGenerator.sample" />

### sample

```python theme={"system"}
def sample(self):
```

Return the next DataPoint synchronously.

**Returns:**

DataPoint: The next DataPoint.

**Note:**

This method is intended for synchronous contexts.
Use `async_sample` in asynchronous contexts to
avoid blocking or runtime errors.
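
One plausible reason for the runtime errors mentioned in the note is that a blocking wrapper has to start its own event loop, which Python does not allow while another loop is already running. The sketch below assumes that pattern. The free functions `sample`, `async_sample`, and `misuse` are hypothetical stand-ins and not the library's methods.

```python
import asyncio
from typing import Any, Dict


async def async_sample() -> Dict[str, Any]:
    """Hypothetical async sampler; a real one would await generation."""
    await asyncio.sleep(0)
    return {"id": 0}


def sample() -> Dict[str, Any]:
    """Hypothetical blocking wrapper around async_sample()."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running, so starting one here is safe.
        return asyncio.run(async_sample())
    # A loop is already running: asyncio.run() would fail, so fail early
    # with a clearer message.
    raise RuntimeError(
        "sample() called from a running event loop; "
        "await async_sample() instead"
    )


async def misuse() -> str:
    """Show what happens when the blocking wrapper is used in async code."""
    try:
        sample()
    except RuntimeError as exc:
        return str(exc)
    return "no error"
```

In plain synchronous code `sample()` returns a point directly. Inside a coroutine the same call raises, and `await async_sample()` is the appropriate form.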

<a id="camel.datasets.base_generator.BaseGenerator.save_to_jsonl" />

### save\_to\_jsonl

```python theme={"system"}
def save_to_jsonl(self, file_path: Union[str, Path]):
```

Saves the generated datapoints to a JSONL (JSON Lines) file.

Each datapoint is stored as a separate JSON object on a new line.

**Parameters:**

* **file\_path** (Union\[str, Path]): Path to save the JSONL file.

**Note:**

* Uses `self._data`, which contains the generated datapoints.
* Appends to the file if it already exists.
* Ensures compatibility with large datasets by using JSONL format.
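
The append behavior noted above can be sketched as follows. `append_jsonl` and `append_demo` are hypothetical helpers written for illustration, and dictionaries stand in for serialized datapoints.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List, Union


def append_jsonl(
    records: Iterable[Dict[str, Any]], file_path: Union[str, Path]
) -> int:
    """Append each record as one JSON object per line; return lines written."""
    written = 0
    # Mode "a" creates the file if needed and never truncates existing lines.
    with Path(file_path).open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written


def append_demo() -> List[Dict[str, Any]]:
    """Write twice to the same file and read everything back."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "points.jsonl"
        append_jsonl([{"id": 0}], target)
        append_jsonl([{"id": 1}, {"id": 2}], target)  # second call appends
        text = target.read_text(encoding="utf-8")
        return [json.loads(line) for line in text.splitlines()]
```

Because each line is an independent JSON object, a reader can process the file one line at a time without loading it whole.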

<a id="camel.datasets.base_generator.BaseGenerator.flush" />

### flush

```python theme={"system"}
def flush(self, file_path: Union[str, Path]):
```

Flush the current data to a JSONL file and clear the data.

**Parameters:**

* **file\_path** (Union\[str, Path]): Path to save the JSONL file.

**Note:**

* Uses `save_to_jsonl` to save `self._data`.
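
A minimal sketch of the save-then-clear sequence, using a hypothetical `ToyBuffer` class whose `_data` list holds plain dictionaries:

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Tuple, Union


class ToyBuffer:
    """Hypothetical holder for generated points, kept in `_data`."""

    def __init__(self) -> None:
        self._data: List[Dict[str, Any]] = []

    def save_to_jsonl(self, file_path: Union[str, Path]) -> None:
        # Append every buffered point as one JSON line.
        with Path(file_path).open("a", encoding="utf-8") as f:
            for point in self._data:
                f.write(json.dumps(point) + "\n")

    def flush(self, file_path: Union[str, Path]) -> None:
        self.save_to_jsonl(file_path)  # persist first
        self._data.clear()  # then drop the in-memory copy


def flush_demo() -> Tuple[int, int]:
    """Return (lines on disk, points left in memory) after one flush."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "flushed.jsonl"
        buf = ToyBuffer()
        buf._data.extend({"id": i} for i in range(3))
        buf.flush(target)
        lines = target.read_text(encoding="utf-8").splitlines()
        return len(lines), len(buf._data)
```

Saving before clearing means a failed write leaves the points in memory rather than losing them.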

<a id="camel.datasets.base_generator.BaseGenerator._init_from_jsonl" />

### \_init\_from\_jsonl

```python theme={"system"}
def _init_from_jsonl(self, file_path: Path):
```

Load and parse a dataset from a JSONL file.

**Parameters:**

* **file\_path** (Path): Path to the JSONL file.

**Returns:**

List\[Dict\[str, Any]]: A list of datapoint dictionaries.
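
Loading a JSONL file into a list of dictionaries might look like the sketch below. `load_jsonl` and `load_demo` are hypothetical helpers. Skipping blank lines and raising `ValueError` on malformed input are assumptions made for illustration, not documented behavior.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_jsonl(file_path: Path) -> List[Dict[str, Any]]:
    """Parse one JSON object per line into a list of dictionaries."""
    records: List[Dict[str, Any]] = []
    with file_path.open("r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are tolerated
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"Invalid JSON on line {line_number}: {exc}"
                ) from exc
            if not isinstance(record, dict):
                raise ValueError(
                    f"Line {line_number} is not a JSON object"
                )
            records.append(record)
    return records


def load_demo() -> List[Dict[str, Any]]:
    """Write a small JSONL file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "seed.jsonl"
        source.write_text('{"id": 0}\n\n{"id": 1}\n', encoding="utf-8")
        return load_jsonl(source)
```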
