A processor for generating multi-hop question-answer pairs from user data.

This class handles the processing of text data to generate multi-hop
question-answer pairs using either an AI model or rule-based approaches.
It manages the entire pipeline from text preprocessing to dataset curation.

Parameters:
config (ProcessorConfig): Configuration for data processing parameters.
rng (random.Random): Random number generator for reproducibility.
multi_hop_agent (Optional[MultiHopGeneratorAgent]): Agent for generating QA pairs.
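
The choice between an AI model and rule-based generation, together with the seeded `rng`, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `generate_pairs` and `rule_based_pairs` are hypothetical, and a plain callable stands in for `MultiHopGeneratorAgent`, whose real interface is not shown in this documentation.

```python
import random
from typing import Callable, List, Optional, Tuple

QAPair = Tuple[str, str]
# Stand-in for MultiHopGeneratorAgent (assumption: the real interface may differ).
Agent = Callable[[List[str]], List[QAPair]]


def rule_based_pairs(sentences: List[str], rng: random.Random) -> List[QAPair]:
    """Hypothetical rule-based fallback: link two sampled sentences in one question."""
    if len(sentences) < 2:
        return []
    first, second = rng.sample(sentences, 2)
    question = f"Given that {first!r}, what else does the text state?"
    return [(question, second)]


def generate_pairs(
    text: str, rng: random.Random, agent: Optional[Agent] = None
) -> List[QAPair]:
    """Use the agent when one is supplied, otherwise fall back to rules."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if agent is not None:
        return agent(sentences)
    return rule_based_pairs(sentences, rng)
```

Passing a `random.Random` seeded with the same value makes the rule-based path repeatable, which is the reproducibility the `rng` parameter is described as providing.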

Constructs training examples from raw text data.

This class handles the construction of training examples by preprocessing
text, extracting information pairs, and generating question-answer pairs.

Parameters:
config (ProcessorConfig): Configuration for example construction.
multi_hop_agent (Optional[MultiHopGeneratorAgent]): Agent for QA generation.
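
The three stages named above (preprocessing, information-pair extraction, question-answer generation) might be sketched as follows. This is an illustration under stated assumptions, not the class's actual logic: the helper names, the adjacent-sentence pairing rule, and the example dictionary layout are all hypothetical.

```python
import re
from typing import Dict, List, Tuple


def preprocess(text: str) -> List[str]:
    """Collapse whitespace, then split into sentences after '.', '!' or '?'."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]


def extract_pairs(sentences: List[str]) -> List[Tuple[str, str]]:
    """Assumed pairing rule: adjacent sentences form a candidate two-hop context."""
    return list(zip(sentences, sentences[1:]))


def build_examples(text: str) -> List[Dict[str, object]]:
    """Turn raw text into example records; the answer is left for a generator to fill."""
    examples: List[Dict[str, object]] = []
    for first, second in extract_pairs(preprocess(text)):
        examples.append(
            {
                "context": f"{first} {second}",
                "hops": [first, second],
                "answer": None,  # placeholder: produced later by an agent or a rule
            }
        )
    return examples
```

For a text of three sentences this yields two candidate examples, one per adjacent pair.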

Manages and curates datasets of multi-hop question-answer pairs.

This class handles dataset management tasks including quality filtering,
complexity filtering, deduplication, and dataset sampling.

Parameters:
config (ProcessorConfig): Configuration for data curation parameters.
rng (random.Random): Random number generator for reproducible sampling.
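
The curation tasks listed above can be sketched as one pass over the dataset followed by seeded sampling. The function name `curate`, the record keys (`question`, `answer`, `hops`), and the thresholds are assumptions for illustration. The real filters are driven by `ProcessorConfig`, whose fields are not shown here.

```python
import random
from typing import Any, Dict, List


def curate(
    examples: List[Dict[str, Any]],
    rng: random.Random,
    min_hops: int = 2,    # assumed complexity threshold
    max_size: int = 100,  # assumed cap on the curated dataset
) -> List[Dict[str, Any]]:
    """Filter, deduplicate, and downsample question-answer records."""
    seen = set()
    kept: List[Dict[str, Any]] = []
    for example in examples:
        question = str(example.get("question", "")).strip()
        answer = str(example.get("answer", "")).strip()
        # Quality filter: drop records with an empty question or answer.
        if not question or not answer:
            continue
        # Complexity filter: drop records with too few reasoning hops.
        if example.get("hops", 0) < min_hops:
            continue
        # Deduplication: compare questions ignoring case and extra whitespace.
        key = " ".join(question.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(example)
    # Sampling: the supplied rng keeps the selection reproducible.
    if len(kept) > max_size:
        kept = rng.sample(kept, max_size)
    return kept
```

Filtering before sampling means the size cap applies only to records that already passed every check, so the sample never contains entries that would later be discarded.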