BaseGenerator
__init__
- seed (int): Random seed for reproducibility. (default: 42)
- buffer (int): Number of DataPoints to generate when the iterator runs out of DataPoints in data. (default: 20)
- cache (Union[str, Path, None]): Optional path to save generated datapoints during iteration. If None is provided, datapoints will be discarded every 100 generations.
- data_path (Union[str, Path, None]): Optional path to a JSONL file to initialize the dataset from.
- **kwargs: Additional generator parameters.
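The parameter list above can be pictured with a small stand-in class. This is only a sketch: `SketchGenerator` and its private attribute names are hypothetical and are not taken from the library; only the argument names and defaults come from the list above.

```python
import random
from pathlib import Path
from typing import Any, Dict, List, Union


class SketchGenerator:
    """Hypothetical stand-in mirroring the documented constructor arguments."""

    def __init__(
        self,
        seed: int = 42,
        buffer: int = 20,
        cache: Union[str, Path, None] = None,
        data_path: Union[str, Path, None] = None,
        **kwargs: Any,
    ) -> None:
        # Attribute names below are assumptions made for illustration.
        self._rng = random.Random(seed)  # seeded RNG for reproducibility
        self._buffer = buffer            # batch size used when data runs out
        self._cache = Path(cache) if cache else None
        self._data_path = Path(data_path) if data_path else None
        self._extra = kwargs             # additional generator parameters
        self._data: List[Dict[str, Any]] = []


# Two instances built with the same seed draw the same random numbers.
gen_a = SketchGenerator(seed=7)
gen_b = SketchGenerator(seed=7)
same_draw = gen_a._rng.random() == gen_b._rng.random()
```

Passing the same `seed` to two instances yields identical random draws, which is the sense of "reproducibility" assumed here.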
__aiter__
If data_path was provided during initialization, those datapoints are yielded first. When self._data is empty, 20 new datapoints are generated. Every 100 yields, the batch is appended to the JSONL file at cache, or discarded if cache is None.
Yields:
DataPoint: A single datapoint.
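The refill behaviour described above might look roughly like the following asynchronous sketch. `AsyncSketch`, `_generate`, and `take` are hypothetical names, datapoints are plain dicts rather than DataPoint objects, and caching is left out to keep the example short.

```python
import asyncio
from typing import Any, Dict, List


class AsyncSketch:
    """Hypothetical stand-in; not the real BaseGenerator."""

    def __init__(self, initial: List[Dict[str, Any]], buffer: int = 20) -> None:
        self._data = list(initial)  # datapoints loaded up front, if any
        self._buffer = buffer
        self._counter = 0

    async def _generate(self, n: int) -> List[Dict[str, Any]]:
        # Placeholder for a subclass-specific generation step.
        await asyncio.sleep(0)
        out = [{"id": self._counter + i} for i in range(n)]
        self._counter += n
        return out

    def __aiter__(self) -> "AsyncSketch":
        return self

    async def __anext__(self) -> Dict[str, Any]:
        # Refill with `buffer` new datapoints once the store is empty.
        if not self._data:
            self._data.extend(await self._generate(self._buffer))
        return self._data.pop(0)


async def take(gen: AsyncSketch, n: int) -> List[Dict[str, Any]]:
    """Collect the first n datapoints from the endless async iterator."""
    out: List[Dict[str, Any]] = []
    async for dp in gen:
        out.append(dp)
        if len(out) == n:
            break
    return out


items = asyncio.run(take(AsyncSketch([{"id": "seed"}], buffer=3), 5))
```

The preloaded datapoint comes out first; generated ones follow in batches of `buffer`.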
__iter__
If data_path was provided during initialization, those datapoints are yielded first. When self._data is empty, 20 new datapoints are generated. Every 100 yields, the batch is appended to the JSONL file at cache, or discarded if cache is None.
Yields:
DataPoint: A single datapoint.
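The periodic caching step can be sketched synchronously as follows. This is an illustration under assumptions, not the library's code: `iterate`, `pending`, `batch`, and `flush_every` are hypothetical names, and the generation step is a placeholder that emits numbered dicts.

```python
import json
import tempfile
from itertools import islice
from pathlib import Path
from typing import Any, Dict, Iterator, List, Optional


def iterate(
    initial: List[Dict[str, Any]],
    cache: Optional[Path],
    buffer: int = 20,
    flush_every: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical sketch of the documented loop."""
    pending = list(initial)            # datapoints waiting to be yielded
    batch: List[Dict[str, Any]] = []   # datapoints handed out since last flush
    next_id = 0
    while True:
        if not pending:
            # Placeholder generation step: produce `buffer` new datapoints.
            pending = [{"id": next_id + i} for i in range(buffer)]
            next_id += buffer
        dp = pending.pop(0)
        batch.append(dp)
        if len(batch) == flush_every:
            if cache is not None:
                # Append the batch to the JSONL cache, one object per line.
                with cache.open("a", encoding="utf-8") as f:
                    for item in batch:
                        f.write(json.dumps(item) + "\n")
            batch.clear()              # simply discarded when cache is None
        yield dp


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache.jsonl"
    yielded = list(islice(iterate([], cache_file), 250))
    cached_lines = cache_file.read_text(encoding="utf-8").splitlines()
```

In this sketch only complete batches of `flush_every` datapoints reach the cache file, so the last partial batch stays in memory until the next threshold is hit.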
sample
save_to_jsonl
- file_path (Union[str, Path]): Path to save the JSONL file.
- Uses self._data, which contains the generated datapoints.
- Appends to the file if it already exists.
- Ensures compatibility with large datasets by using JSONL format.
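A minimal sketch of the append behaviour noted above, assuming datapoints can be serialized as plain dicts. The function name `save_to_jsonl_sketch` is hypothetical; the real method operates on self._data rather than taking the data as an argument.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Union


def save_to_jsonl_sketch(
    data: List[Dict[str, Any]], file_path: Union[str, Path]
) -> None:
    """Append each datapoint as one JSON object per line."""
    # Mode "a" creates the file if missing and appends otherwise.
    with Path(file_path).open("a", encoding="utf-8") as f:
        for item in data:
            f.write(json.dumps(item) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out.jsonl"
    save_to_jsonl_sketch([{"q": "a"}], target)
    save_to_jsonl_sketch([{"q": "b"}], target)  # second call appends
    lines = target.read_text(encoding="utf-8").splitlines()
```

Because each record sits on its own line, a reader can stream the file line by line instead of loading it whole, which is why JSONL suits large datasets.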
flush
- file_path (Union[str, Path]): Path to save the JSONL file.
- Uses save_to_jsonl to save self._data.
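One way to picture the delegation is the sketch below. `FlushSketch` is a hypothetical class, and clearing the in-memory list after saving is an assumption suggested by the method name; the description above only states that the data is saved.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Union


class FlushSketch:
    """Hypothetical stand-in showing flush delegating to save_to_jsonl."""

    def __init__(self) -> None:
        self._data: List[Dict[str, Any]] = []

    def save_to_jsonl(self, file_path: Union[str, Path]) -> None:
        # Append every stored datapoint as one JSON line.
        with Path(file_path).open("a", encoding="utf-8") as f:
            for item in self._data:
                f.write(json.dumps(item) + "\n")

    def flush(self, file_path: Union[str, Path]) -> None:
        self.save_to_jsonl(file_path)
        # ASSUMPTION: the store is emptied after saving; not stated in the docs.
        self._data = []


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "flushed.jsonl"
    sketch = FlushSketch()
    sketch._data = [{"id": 1}, {"id": 2}]
    sketch.flush(out)
    flushed = [
        json.loads(line)
        for line in out.read_text(encoding="utf-8").splitlines()
    ]
    remaining = len(sketch._data)
```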
_init_from_jsonl
- file_path (Path): Path to the JSONL file.
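Loading a dataset from JSONL could be sketched as below. `load_jsonl_sketch` is a hypothetical helper; skipping blank lines and raising ValueError on malformed lines are assumptions, since the error handling of _init_from_jsonl is not described here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_jsonl_sketch(file_path: Path) -> List[Dict[str, Any]]:
    """Read one JSON object per line into a list of dicts."""
    if not file_path.exists():
        raise FileNotFoundError(f"JSONL file not found: {file_path}")
    records: List[Dict[str, Any]] = []
    with file_path.open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are ignored
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"Invalid JSON on line {line_no}") from exc
    return records


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "seed.jsonl"
    src.write_text('{"q": "a"}\n\n{"q": "b"}\n', encoding="utf-8")
    loaded = load_jsonl_sketch(src)
```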