SelfInstructPipeline
- agent (ChatAgent): The agent the pipeline queries to generate instructions.
- seed (str): Path to the file of human-written seed instructions.
- num_machine_instructions (int): Number of machine-generated instructions to produce. (default: :obj:`5`)
- data_output_path (Optional[str]): Path to save the generated data. (default: :obj:`./data_output.json`)
- human_to_machine_ratio (tuple): Ratio of human-written to machine-generated tasks sampled as examples for instruction generation. (default: :obj:`(6, 2)`)
- instruction_filter (InstructionFilter): A filter to validate generated instructions. (default: :obj:`None`)
- filter_config (Optional[Dict[str, Dict[str, Any]]]): Configuration for the filter functions registered in FILE_REGISTRY. (default: :obj:`None`)
- stop_on_first_failure (bool): If True, stops checking filters after the first failure.
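The interaction between instruction_filter and stop_on_first_failure can be pictured as a short-circuiting loop over filters. The sketch below is illustrative only: it models each filter as a plain callable that returns True on pass, and the filter names and rules (`length`, `no_image_words`, `not_question_mark_only`) are hypothetical stand-ins, not the InstructionFilter API or the filters registered in FILE_REGISTRY.

```python
from typing import Callable, List, Tuple

# Hypothetical filter type: takes an instruction, returns True if it passes.
Filter = Callable[[str], bool]


def run_filters(
    instruction: str,
    filters: List[Tuple[str, Filter]],
    stop_on_first_failure: bool = False,
) -> Tuple[bool, List[str]]:
    """Return (passed, names of the filters that rejected the instruction)."""
    failed: List[str] = []
    for name, check in filters:
        if not check(instruction):
            failed.append(name)
            if stop_on_first_failure:
                # Skip the remaining filters once one has rejected the instruction.
                break
    return (not failed, failed)


# Hypothetical example filters, for illustration only.
filters = [
    ("length", lambda s: 3 <= len(s.split()) <= 150),
    ("no_image_words", lambda s: "image" not in s.lower()),
    ("not_question_mark_only", lambda s: s.strip() != "?"),
]

short_circuit = run_filters("image", filters, stop_on_first_failure=True)
full_check = run_filters("image", filters, stop_on_first_failure=False)
```

With stop_on_first_failure=True only the first failing filter is reported; with False every failing filter is collected, which costs more checks but gives a fuller rejection reason.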
__init__
load_seed
- path (str): Path to the seed file.
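A minimal sketch of loading seed tasks, assuming the seed file is JSON Lines (one JSON object per line, each with an `instruction` field, modelled on the Self-Instruct seed data). The file format and field name are assumptions; check the actual seed file before relying on them.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def load_seed(path: str) -> List[Dict[str, Any]]:
    """Read one JSON object per non-blank line (JSON Lines format assumed)."""
    tasks: List[Dict[str, Any]] = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                tasks.append(json.loads(line))
    return tasks


# Usage: write a tiny temporary seed file, load it, then clean up.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"instruction": "Sort a list."}\n\n{"instruction": "Name a color."}\n')
    seed_path = tmp.name
seed_tasks = load_seed(seed_path)
os.remove(seed_path)
```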
sample_human_tasks
- count (int): Number of human tasks to sample.
sample_machine_tasks
- count (int): Number of machine tasks to sample.
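With the default human_to_machine_ratio of (6, 2), each instruction-generation prompt can be read as drawing up to six human-written and two machine-generated example tasks. The sketch below shows one way to sample without replacement while capping the request at the pool size, which matters early on when few machine tasks exist; the helper `sample_tasks` is hypothetical and stands in for both sampling methods.

```python
import random
from typing import List, Sequence


def sample_tasks(pool: Sequence[str], count: int, rng: random.Random) -> List[str]:
    # Never request more tasks than the pool holds (the machine pool starts small).
    return rng.sample(list(pool), min(count, len(pool)))


human_pool = [f"human task {i}" for i in range(10)]
machine_pool = ["machine task 0"]  # only one machine task generated so far
rng = random.Random(0)  # seeded for reproducibility

n_human, n_machine = (6, 2)  # the documented default human_to_machine_ratio
human = sample_tasks(human_pool, n_human, rng)
machine = sample_tasks(machine_pool, n_machine, rng)
```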
generate_machine_instruction
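generate_machine_instruction takes no arguments, so it presumably combines the sampled tasks into a few-shot prompt for the agent. A hypothetical prompt builder is sketched below; the wording and the `Task N:` numbering are assumptions, not the pipeline's actual template.

```python
from typing import List


def build_instruction_prompt(tasks: List[str]) -> str:
    """Number the example tasks and leave the next slot open (hypothetical template)."""
    lines = ["Come up with a new task. Here are some examples:"]
    for i, task in enumerate(tasks, start=1):
        lines.append(f"Task {i}: {task}")
    # The open slot invites the agent to complete it with a new instruction.
    lines.append(f"Task {len(tasks) + 1}:")
    return "\n".join(lines)


prompt = build_instruction_prompt(
    ["Sort a list of numbers.", "Translate 'hello' to French."]
)
```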
identify_instruction
- instruction (str): The instruction to classify.
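One way to implement identify_instruction is to ask the agent a yes/no question and interpret the reply. In this sketch the agent is replaced by a plain callable `ask` (hypothetical) so it runs without a model, and the prompt text is illustrative.

```python
from typing import Callable


def identify_instruction(instruction: str, ask: Callable[[str], str]) -> bool:
    prompt = (
        "Can the following task be regarded as a classification task "
        "with finite output labels? Answer Yes or No.\n"
        f"Task: {instruction}\nAnswer:"
    )
    reply = ask(prompt)
    # Treat any reply starting with "yes" (case-insensitive) as a classification task.
    return reply.strip().lower().startswith("yes")


def fake_agent(prompt: str) -> str:
    """Stand-in for the agent: says yes only for sentiment tasks."""
    return " Yes." if "sentiment" in prompt else "No"


is_cls = identify_instruction("Classify the sentiment of a review.", fake_agent)
is_not_cls = identify_instruction("Write a poem about rain.", fake_agent)
```

Normalizing with strip() and lower() keeps the check robust to leading whitespace and capitalization in the model's reply.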
generate_machine_instances
generate_machine_instance
- instruction (str): The instruction to create instances for.
- classification (bool): Whether the instruction is a classification task.
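The classification flag matters because the Self-Instruct method generates instances differently for the two task types: output-first (label, then input) for classification tasks, to avoid inputs skewed toward one label, and input-first otherwise. The hypothetical `build_instance_prompt` below shows that branch; the prompt wording is an assumption.

```python
def build_instance_prompt(instruction: str, classification: bool) -> str:
    if classification:
        # Output-first: ask for the class label before the input, so inputs
        # are generated to fit each label.
        return (
            "Given the classification task, list possible class labels, "
            "then write an input for each label.\n"
            f"Task: {instruction}\nClass label:"
        )
    # Input-first: ask for an input, then the corresponding output.
    return (
        "Given the task, come up with an input and then the output.\n"
        f"Task: {instruction}\nInput:"
    )


cls_prompt = build_instance_prompt("Classify the sentiment of a review.", True)
gen_prompt = build_instance_prompt("Add two numbers.", False)
```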
parse_classification_output
- generated_text (str): The raw text generated by the agent for classification tasks.
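A sketch of parsing classification output, assuming the agent answers in repeated `Class label: ...` / `Input: ...` blocks. That format is an assumption mirroring the output-first approach; adapt the markers to whatever prompt the pipeline actually uses.

```python
import re
from typing import Dict, List


def parse_classification_output(generated_text: str) -> List[Dict[str, str]]:
    instances: List[Dict[str, str]] = []
    # Split on each line-initial "Class label:" marker and drop any preamble.
    blocks = re.split(r"^\s*Class label:", generated_text, flags=re.MULTILINE)[1:]
    for block in blocks:
        label, sep, rest = block.partition("Input:")
        if not sep:
            continue  # skip a label that has no input section
        instances.append({"input": rest.strip(), "output": label.strip()})
    return instances


text = "Class label: Positive\nInput: I loved it.\nClass label: Negative\nInput: Terrible."
pairs = parse_classification_output(text)
```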
parse_non_classification_output
- generated_text (str): The raw text generated by the agent for non-classification tasks.
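For non-classification tasks, a matching sketch assumes alternating `Input:` and `Output:` sections and extracts each pair with a regular expression. The format is again an assumption, and output text that itself contains `Input:` would be split incorrectly.

```python
import re
from typing import Dict, List

# Lazily capture text after "Input:" up to "Output:", then up to the next
# "Input:" or the end of the string.
_PAIR = re.compile(r"Input:(.*?)Output:(.*?)(?=Input:|\Z)", re.DOTALL)


def parse_non_classification_output(generated_text: str) -> List[Dict[str, str]]:
    return [
        {"input": inp.strip(), "output": out.strip()}
        for inp, out in _PAIR.findall(generated_text)
    ]


text = "Input: 2 + 2\nOutput: 4\nInput: 3 * 3\nOutput: 9"
pairs = parse_non_classification_output(text)
```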
construct_data
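construct_data presumably assembles the generated tasks into records and writes them to data_output_path. The record layout below (`instruction`, `is_classification`, `instances`) is a hypothetical choice for illustration, not the pipeline's documented schema.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def construct_data(machine_tasks: List[Dict[str, Any]], data_output_path: str) -> None:
    # Hypothetical record layout; missing fields fall back to defaults.
    records = [
        {
            "instruction": task["instruction"],
            "is_classification": task.get("is_classification", False),
            "instances": task.get("instances", []),
        }
        for task in machine_tasks
    ]
    with open(data_output_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=4, ensure_ascii=False)


# Usage: write to a temporary directory and read the file back.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "data_output.json")
    construct_data(
        [{"instruction": "Add two numbers.", "instances": [{"input": "2, 3", "output": "5"}]}],
        out_path,
    )
    with open(out_path, encoding="utf-8") as f:
        saved = json.load(f)
```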
generate
- timeout_minutes (int): Maximum time in minutes to run the generation process before timing out. (default: :obj:`600`)
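The default timeout_minutes of 600 allows up to 10 hours of generation. A minimal sketch of a deadline-bounded loop follows; `make_instruction` is a hypothetical stand-in for one round of generation, and the injectable `clock` exists only so the timeout can be tested without waiting.

```python
import time
from typing import Callable, Iterator, List


def generate(
    make_instruction: Callable[[], str],
    num_machine_instructions: int,
    timeout_minutes: float = 600,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    deadline = clock() + timeout_minutes * 60
    produced: List[str] = []
    while len(produced) < num_machine_instructions:
        if clock() >= deadline:
            # Stop with whatever was collected instead of running indefinitely.
            break
        produced.append(make_instruction())
    return produced


def new_task() -> str:
    return "task"


# Normal run: finishes well before the deadline.
full = generate(new_task, 5, timeout_minutes=1)

# Simulated slow run: a fake clock that advances 30 seconds per call.
ticks: Iterator[int] = iter(range(0, 10_000, 30))


def fake_clock() -> float:
    return float(next(ticks))


partial = generate(new_task, 100, timeout_minutes=1, clock=fake_clock)
```

Using time.monotonic rather than time.time keeps the deadline unaffected by system clock adjustments.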