StaticDataset
Extends :obj:`Dataset` from PyTorch and should be used when its size is fixed at runtime.
This class can initialize from Hugging Face Datasets,
PyTorch Datasets, JSON file paths, or lists of dictionaries,
converting them into a consistent internal format.
__init__
- data (Union[HFDataset, Dataset, Path, List[Dict[str, Any]]]): Input data, which can be one of the following:
  - A Hugging Face Dataset (:obj:`HFDataset`).
  - A PyTorch Dataset (:obj:`torch.utils.data.Dataset`).
  - A :obj:`Path` object representing a JSON or JSONL file.
  - A list of dictionaries with :obj:`DataPoint`-compatible fields.
- seed (int): Random seed for reproducibility. (default: :obj:`42`)
- min_samples (int): Minimum required number of samples. (default: :obj:`1`)
- strict (bool): Whether to raise an error on invalid datapoints (:obj:`True`) or skip/filter them (:obj:`False`). (default: :obj:`False`)
- **kwargs: Additional dataset parameters.
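The interaction between strict and min_samples can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the `DataPoint` stand-in, its `question` and `final_answer` fields, and the `validate_records` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class DataPoint:
    # Hypothetical stand-in schema; the real DataPoint fields may differ.
    question: str
    final_answer: str


def validate_records(
    records: List[Dict[str, Any]],
    min_samples: int = 1,
    strict: bool = False,
) -> List[DataPoint]:
    """Validate raw dictionaries, raising or skipping on invalid items."""
    valid: List[DataPoint] = []
    for i, record in enumerate(records):
        try:
            valid.append(DataPoint(**record))
        except TypeError as exc:
            # Missing or unexpected keys surface as TypeError here.
            if strict:
                raise ValueError(f"Invalid datapoint at index {i}: {exc}") from exc
            continue  # non-strict mode: drop the item and move on
    if len(valid) < min_samples:
        raise ValueError(
            f"Need at least {min_samples} valid samples, got {len(valid)}"
        )
    return valid
```

With strict left at its default, a malformed record is silently filtered out, so min_samples acts as the safety net against a dataset that ends up too small.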
_init_data
Converts the input data into :obj:`DataPoint` instances.
Parameters:
- data (Union[HFDataset, Dataset, Path, List[Dict[str, Any]]]): Input dataset in one of the supported formats.
Returns: A list of :obj:`DataPoint` instances.
__len__
__getitem__
- idx (Union[int, slice]): Index or slice of the datapoint(s).
Returns: A single :obj:`DataPoint` for an integer index, or a list of :obj:`DataPoint` objects for a slice.
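The length and indexing behavior described above can be illustrated with a small list-backed container. This is only a sketch under assumptions: `ListBackedDataset` is a hypothetical stand-in, and the real class may handle out-of-range or negative indices differently.

```python
from typing import Generic, Iterable, List, TypeVar, Union

T = TypeVar("T")


class ListBackedDataset(Generic[T]):
    """Hypothetical fixed-size container mirroring the int/slice contract."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items: List[T] = list(items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, idx: Union[int, slice]) -> Union[T, List[T]]:
        if isinstance(idx, slice):
            # A slice always yields a list, even when it selects one item.
            return self._items[idx]
        if isinstance(idx, int):
            # An integer yields a single item; list indexing raises
            # IndexError when idx is out of range.
            return self._items[idx]
        raise TypeError(f"Index must be int or slice, got {type(idx).__name__}")
```

Returning a list for every slice keeps the return type predictable for callers that batch over ranges.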
sample
Returns: A sampled :obj:`DataPoint`.
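One way a seed can make sampling reproducible is to keep a private random number generator per dataset. The sketch below assumes that approach; `SeededSampler` is a hypothetical name, and the source does not confirm how the real class draws its samples.

```python
import random
from typing import Generic, Iterable, List, TypeVar

T = TypeVar("T")


class SeededSampler(Generic[T]):
    """Hypothetical sampler that draws items with a private, seeded RNG."""

    def __init__(self, items: Iterable[T], seed: int = 42) -> None:
        self._items: List[T] = list(items)
        # A dedicated Random instance leaves the global random state untouched.
        self._rng = random.Random(seed)

    def sample(self) -> T:
        if not self._items:
            raise RuntimeError("Cannot sample from an empty dataset")
        return self._rng.choice(self._items)
```

Two samplers built with the same items and the same seed produce the same sequence of draws, which is what makes experiments repeatable.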
metadata
_init_from_hf_dataset
- data (HFDataset): A Hugging Face dataset.
_init_from_pytorch_dataset
- data (Dataset): A PyTorch dataset.
_init_from_json_path
- data (Path): Path to the JSON file.
_init_from_jsonl_path
- data (Path): Path to the JSONL file.
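The difference between the JSON and JSONL loaders comes down to parsing one array versus one object per line. The following sketch uses only the standard library; `load_json_records` and `load_jsonl_records` are hypothetical helpers, and skipping blank lines is an assumption rather than documented behavior.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_json_records(path: Path) -> List[Dict[str, Any]]:
    """Read a JSON file whose top-level value is an array of objects."""
    with path.open("r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of objects")
    return data


def load_jsonl_records(path: Path) -> List[Dict[str, Any]]:
    """Read a JSONL file: one JSON object per non-empty line."""
    records: List[Dict[str, Any]] = []
    with path.open("r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # assumption: blank lines are ignored
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"Invalid JSON on line {line_number}") from exc
    return records
```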
_init_from_list
- data (List[Dict[str, Any]]): A list of dictionaries where each dictionary must be a valid :obj:`DataPoint`.
save_to_json
- file_path (Union[str, Path]): Path to the output JSON file. If a string is provided, it will be converted to a Path object.
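The string-to-Path conversion and the write itself can be sketched as below. The `DataPoint` stand-in, its fields, and the `save_records` helper are hypothetical, and creating missing parent directories is an assumption that the source does not state.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Union


@dataclass
class DataPoint:
    # Hypothetical stand-in schema; the real DataPoint fields may differ.
    question: str
    final_answer: str


def save_records(points: List[DataPoint], file_path: Union[str, Path]) -> Path:
    """Serialize datapoints to a JSON array on disk."""
    path = Path(file_path)  # accepts either a str or a Path
    # Assumption: create parent directories rather than fail on a missing one.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump([asdict(p) for p in points], f, ensure_ascii=False, indent=2)
    return path
```

Writing a single top-level array keeps the output loadable by any JSON reader that expects a list of records.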
save_to_huggingface
- dataset_name (str): The name of the dataset on Hugging Face Hub. Should be in the format "username/dataset_name".
- token (Optional[str]): The Hugging Face API token. If not provided, the token will be read from the environment variable HF_TOKEN. (default: :obj:`None`)
- filepath (str): The path in the repository where the dataset will be saved. (default: :obj:`"records/records.json"`)
- private (bool): Whether the dataset should be private. (default: :obj:`False`)
- description (Optional[str]): A description of the dataset. (default: :obj:`None`)
- license (Optional[str]): The license of the dataset. (default: :obj:`None`)
- version (Optional[str]): The version of the dataset. (default: :obj:`None`)
- tags (Optional[List[str]]): A list of tags for the dataset. (default: :obj:`None`)
- language (Optional[List[str]]): A list of languages the dataset is in. (default: :obj:`None`)
- task_categories (Optional[List[str]]): A list of task categories. (default: :obj:`None`)
- authors (Optional[List[str]]): A list of authors of the dataset. (default: :obj:`None`)
- **kwargs (Any): Additional keyword arguments to pass to the Hugging Face API.
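The token fallback described for the token parameter can be sketched with the standard library alone. `resolve_hf_token` is a hypothetical helper, and raising an error when no token is available is an assumption; the source only says the token is read from HF_TOKEN when not provided.

```python
import os
from typing import Optional


def resolve_hf_token(token: Optional[str] = None) -> str:
    """Prefer an explicit token, then fall back to the HF_TOKEN variable."""
    resolved = token or os.environ.get("HF_TOKEN")
    if not resolved:
        # Assumption: fail early instead of attempting an unauthenticated upload.
        raise ValueError("No token given and HF_TOKEN is not set")
    return resolved
```

Reading the variable at call time, rather than at import time, lets the token be set or rotated after the module has been loaded.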