UserDataProcessor
- config (ProcessorConfig): Configuration for data processing parameters.
- rng (random.Random): Random number generator for reproducibility.
- multi_hop_agent (Optional[MultiHopGeneratorAgent]): Agent for generating QA pairs.
__init__
- config (Optional[ProcessorConfig], optional): Configuration for data processing. (default: :obj:`None`)
process_text
- text (str): The input text to process.
- source (str, optional): Source identifier for the text. (default: :obj:`"user_input"`)
process_batch
- texts (List[str]): List of input texts to process.
- sources (Optional[List[str]], optional): List of source identifiers. (default: :obj:`None`)
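A minimal sketch of how a batch call could pair each text with a source identifier. This is not the library's implementation: `pair_texts_with_sources` and `DEFAULT_SOURCE` are hypothetical names, and the fallback to `"user_input"` when `sources` is `None` is an assumption borrowed from the `process_text` default.

```python
from typing import List, Optional, Tuple

# Assumption: mirrors the documented default of process_text's `source`.
DEFAULT_SOURCE = "user_input"


def pair_texts_with_sources(
    texts: List[str], sources: Optional[List[str]] = None
) -> List[Tuple[str, str]]:
    """Hypothetical helper: align each text with a source identifier."""
    if sources is None:
        # No sources given: reuse the default identifier for every text.
        sources = [DEFAULT_SOURCE] * len(texts)
    if len(sources) != len(texts):
        raise ValueError("texts and sources must have the same length")
    return list(zip(texts, sources))


pairs = pair_texts_with_sources(["first passage", "second passage"])
```

Checking the lengths up front avoids `zip` silently dropping texts when a shorter `sources` list is passed.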
ExampleConstructor
- config (ProcessorConfig): Configuration for example construction.
- multi_hop_agent (Optional[MultiHopGeneratorAgent]): Agent for QA generation.
__init__
- config (ProcessorConfig): Configuration for example construction.
- multi_hop_agent (Optional[MultiHopGeneratorAgent], optional): Agent for generating multi-hop QA pairs. (default: :obj:`None`)
construct_examples
- raw_data (List[Dict[str, Any]]): List of raw data dictionaries containing text and metadata.
_preprocess_text
- text (str): Input text to preprocess.
_check_text_quality
- text (str): Text to check for quality.
_extract_info_pairs
- text (str): Input text to extract information from.
_generate_qa_pairs
- info_pairs (List[Dict[str, Sequence[str]]]): List of information pairs extracted from text.
_calculate_complexity
- qa_pairs (List[Dict[str, Any]]): List of QA pairs to calculate complexity for.
DataCurator
- config (ProcessorConfig): Configuration for data curation parameters.
- rng (random.Random): Random number generator for reproducible sampling.
__init__
- config (ProcessorConfig): Configuration for data curation.
- rng (random.Random): Random number generator for reproducibility.
curate_dataset
- examples (List[Dict[str, Any]]): List of examples to curate.
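One plausible reading of the private methods listed for this class is that curation runs as a chain of stages, each consuming the previous stage's output. The sketch below illustrates that pattern only. The `curate` driver, the toy stage functions, the `"qa_pairs"` and `"complexity"` keys, and the `0.5` threshold are all assumptions for illustration, not the library's code.

```python
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]
Stage = Callable[[List[Example]], List[Example]]


def curate(examples: List[Example], stages: List[Stage]) -> List[Example]:
    """Hypothetical driver: feed each stage the output of the previous one."""
    for stage in stages:
        examples = stage(examples)
    return examples


def quality_filter(examples: List[Example]) -> List[Example]:
    # Toy rule (assumption): keep examples that have at least one QA pair.
    return [e for e in examples if e.get("qa_pairs")]


def complexity_filter(examples: List[Example]) -> List[Example]:
    # Toy rule (assumption): keep examples at or above an illustrative threshold.
    return [e for e in examples if e.get("complexity", 0.0) >= 0.5]


raw = [
    {"qa_pairs": [{"question": "q1", "answer": "a1"}], "complexity": 0.8},
    {"qa_pairs": [], "complexity": 0.9},
    {"qa_pairs": [{"question": "q2", "answer": "a2"}], "complexity": 0.1},
]
curated = curate(raw, [quality_filter, complexity_filter])
```

Running the cheaper filters before deduplication and sampling keeps the later stages working on a smaller list.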
_quality_filter
- examples (List[Dict[str, Any]]): List of examples to filter.
_check_qa_quality
- qa_pairs (List[Dict[str, str]]): List of QA pairs to check.
_complexity_filter
- examples (List[Dict[str, Any]]): List of examples to filter.
_remove_duplicates
- examples (List[Dict[str, Any]]): List of examples to deduplicate.
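A minimal deduplication sketch, assuming each example stores its passage under a `"text"` key and that two examples count as duplicates when their text matches after lowercasing and whitespace normalization. Both the key name and the matching rule are assumptions, and `remove_duplicates` is a hypothetical stand-in, not the method's actual body.

```python
from typing import Any, Dict, List


def remove_duplicates(examples: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: keep the first example for each normalized text."""
    seen = set()
    unique = []
    for example in examples:
        # Assumption: the passage lives under the "text" key.
        # Normalize by lowercasing and collapsing runs of whitespace.
        key = " ".join(str(example.get("text", "")).lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(example)
    return unique


deduped = remove_duplicates(
    [
        {"text": "Paris is in France."},
        {"text": "paris  is in france."},
        {"text": "Rome is in Italy."},
    ]
)
```

Keeping the first occurrence preserves the input order, which makes the result deterministic for a given input list.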
_sample_dataset
- examples (List[Dict[str, Any]]): List of examples to sample from.
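The `rng (random.Random)` attribute suggests that sampling is driven by a seeded generator, so the same seed yields the same subset. The sketch below shows that idea with the standard library only. `sample_dataset` and its `target_size` parameter are hypothetical, since the source does not say where the target size comes from.

```python
import random
from typing import Any, Dict, List


def sample_dataset(
    examples: List[Dict[str, Any]], target_size: int, rng: random.Random
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in: draw at most target_size examples using rng."""
    if len(examples) <= target_size:
        # Nothing to trim: return a copy of everything.
        return list(examples)
    # random.Random.sample draws without replacement.
    return rng.sample(examples, target_size)


examples = [{"id": i} for i in range(10)]
first = sample_dataset(examples, 3, random.Random(42))
second = sample_dataset(examples, 3, random.Random(42))
```

Passing the generator in explicitly, rather than calling the module-level `random` functions, keeps the sampling independent of any other code that touches the global random state.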