camel.datagen.self_instruct.self_instruct
SelfInstructPipeline
A pipeline that generates and manages machine-generated task instructions by combining human-written and machine-generated task samples.
Parameters:
- agent (ChatAgent): The agent used to interact and generate instructions.
- seed (str): The path to the human-written instructions.
- num_machine_instructions (int): Number of machine-generated instructions to generate. (default: 5)
- data_output_path (Optional[str]): Path to save the generated data. (default: ./data_output.json)
- human_to_machine_ratio (tuple): Ratio of human to machine tasks used for instruction generation. (default: (6, 2))
- instruction_filter (InstructionFilter): A filter to validate generated instructions. (default: None)
- filter_config (Optional[Dict[str, Dict[str, Any]]]): Configuration for the filter functions registered in FILE_REGISTRY. (default: None)
- stop_on_first_failure (bool): If True, stops checking filters after the first failure.
__init__
load_seed
Load seed tasks from a file. Defaults to a predefined seed file if no path is provided.
Parameters:
- path (str): Path to the seed file.
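A minimal sketch of reading a seed file, assuming it is in JSON Lines format (one JSON object per line, the format of the original Self-Instruct seed_tasks.jsonl). The function name load_seed_tasks is hypothetical; this is not the CAMEL implementation, and it omits the fallback to a predefined seed file.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_seed_tasks(path: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for load_seed: parse one JSON object per line."""
    tasks: List[Dict[str, Any]] = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                tasks.append(json.loads(line))
    return tasks
```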
sample_human_tasks
Sample a specified number of human tasks from the loaded seed.
Parameters:
- count (int): Number of human tasks to sample.
Returns:
List[dict]: A list of sampled human tasks.
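Sampling human tasks amounts to drawing without replacement from the loaded seed list. In this hypothetical stand-in, seed tasks are plain dicts and the count is capped at the number of available tasks rather than raising an error; the actual method may handle that case differently.

```python
import random
from typing import Any, Dict, List, Optional


def sample_human_tasks(
    seed_tasks: List[Dict[str, Any]],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: sample `count` distinct seed tasks."""
    rng = rng or random.Random()
    # Sample without replacement; never request more tasks than exist.
    return rng.sample(seed_tasks, min(count, len(seed_tasks)))
```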
sample_machine_tasks
Sample a specified number of machine tasks.
Parameters:
- count (int): Number of machine tasks to sample.
Returns:
List[dict]: A list of sampled machine tasks, with placeholders if insufficient tasks are available.
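The placeholder behaviour can be sketched as follows. The placeholder shape {"instruction": ""} and the function signature are assumptions for illustration. In this sketch, when too few tasks have been generated yet (for example at the start of a run), the result is padded so the caller always receives `count` entries.

```python
import random
from typing import Dict, List, Optional

# Hypothetical placeholder shape; the real pipeline may use something else.
PLACEHOLDER: Dict[str, str] = {"instruction": ""}


def sample_machine_tasks(
    machine_tasks: List[Dict[str, str]],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[Dict[str, str]]:
    """Hypothetical sketch: sample machine tasks, padding with placeholders."""
    rng = rng or random.Random()
    if len(machine_tasks) >= count:
        return rng.sample(machine_tasks, count)
    # Not enough generated tasks yet: keep all of them and pad the rest.
    padding = [dict(PLACEHOLDER) for _ in range(count - len(machine_tasks))]
    return list(machine_tasks) + padding
```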
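generate_machine_instruction
Generate a new instruction with the agent, prompting it with a sample of human-written and machine-generated tasks.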
Returns:
List: The prompt and a machine-generated instruction.
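One way to picture this step is as a few-shot prompt: sampled instructions are listed as numbered tasks and the next slot is left open for the model to fill. With the default human_to_machine_ratio of (6, 2), a plausible reading is 6 human and 2 machine tasks per prompt, but that is an assumption. The prompt wording, the "Task N:" numbering, and the `ask` callable (standing in for the ChatAgent) are all hypothetical.

```python
from typing import Callable, Dict, List


def build_instruction_prompt(
    human_tasks: List[Dict[str, str]], machine_tasks: List[Dict[str, str]]
) -> str:
    """Hypothetical prompt format: numbered tasks with an open final slot."""
    # Human tasks first, then machine tasks; empty placeholders are skipped.
    instructions = [
        t["instruction"] for t in human_tasks + machine_tasks if t.get("instruction")
    ]
    lines = ["Come up with a series of tasks:"]
    for i, inst in enumerate(instructions, start=1):
        lines.append(f"Task {i}: {inst}")
    lines.append(f"Task {len(instructions) + 1}:")
    return "\n".join(lines)


def generate_machine_instruction(
    human_tasks: List[Dict[str, str]],
    machine_tasks: List[Dict[str, str]],
    ask: Callable[[str], str],
) -> List[str]:
    """Return [prompt, instruction], mirroring the documented return value."""
    prompt = build_instruction_prompt(human_tasks, machine_tasks)
    return [prompt, ask(prompt).strip()]
```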
identify_instruction
Determine if the given instruction is a classification task.
Parameters:
- instruction (str): The instruction to classify.
Returns:
bool: True if the instruction is a classification task, otherwise False.
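A simple way to make this decision with an LLM is to ask a yes/no question and parse the reply. The sketch below assumes exactly that; the prompt text is hypothetical and `ask` stands in for the agent, so it should not be read as the pipeline's actual prompt.

```python
from typing import Callable


def identify_instruction(instruction: str, ask: Callable[[str], str]) -> bool:
    """Hypothetical sketch: classify an instruction via a yes/no question."""
    prompt = (
        "Can the following task be regarded as a classification task "
        "with finite output labels? Answer yes or no.\n"
        f"Task: {instruction}"
    )
    # Treat any reply that begins with "yes" (case-insensitive) as True.
    return ask(prompt).strip().lower().startswith("yes")
```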
generate_machine_instances
Generate instances for each machine task based on its classification status.
generate_machine_instance
Generate instances for a given instruction.
Parameters:
- instruction (str): The instruction to create instances for.
- classification (bool): Whether the instruction is a classification task.
Returns:
List[dict]: A list of generated instances in input-output format.
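The Self-Instruct paper generates classification instances output-first (class labels before inputs) to reduce label bias, and other tasks input-first. The sketch below follows that idea, but the prompt wording is hypothetical, `ask` stands in for the agent, and the two parsers are passed in as callables so the dispatch logic is visible on its own.

```python
from typing import Callable, Dict, List

Parser = Callable[[str], List[Dict[str, str]]]


def generate_machine_instance(
    instruction: str,
    classification: bool,
    ask: Callable[[str], str],
    parse_classification: Parser,
    parse_other: Parser,
) -> List[Dict[str, str]]:
    """Hypothetical sketch: pick a prompt style, query the model, parse the reply."""
    if classification:
        # Output-first: ask for the class labels before the inputs.
        prompt = (
            "Given the classification task, first list the possible class labels, "
            "then write an input for each label.\n"
            f"Task: {instruction}"
        )
        return parse_classification(ask(prompt))
    # Input-first: ask for an input, then the matching output.
    prompt = (
        "Come up with examples for the following task. "
        "Write each example as an input followed by its output.\n"
        f"Task: {instruction}"
    )
    return parse_other(ask(prompt))
```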
parse_classification_output
Parse the generated text for classification tasks into input-output pairs.
Parameters:
- generated_text (str): The raw text generated by the agent for classification tasks.
Returns:
List[Dict[str, str]]: A list of dictionaries with ‘input’ and ‘output’ keys.
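A parser for this step depends entirely on the text format the agent is asked to produce. The sketch assumes a hypothetical output-first format in which each example is a "Class label: ..." line followed by an "Input: ..." line; the label becomes the 'output' and the text after "Input:" becomes the 'input'.

```python
import re
from typing import Dict, List


def parse_classification_output(generated_text: str) -> List[Dict[str, str]]:
    """Hypothetical sketch: parse 'Class label:' / 'Input:' blocks."""
    pairs: List[Dict[str, str]] = []
    # Each example starts on a line beginning with "Class label:".
    for block in re.split(r"(?im)^\s*class label:\s*", generated_text):
        block = block.strip()
        if not block:
            continue
        label, _, rest = block.partition("\n")
        match = re.match(r"(?is)\s*input:\s*(.*)", rest)
        if match:  # skip blocks that have a label but no input
            pairs.append({"input": match.group(1).strip(), "output": label.strip()})
    return pairs
```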
parse_non_classification_output
Parse the generated text for non-classification tasks into input-output pairs.
Parameters:
- generated_text (str): The raw text generated by the agent for non-classification tasks.
Returns:
List[Dict[str, str]]: A list of dictionaries with ‘input’ and ‘output’ keys.
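For non-classification tasks, the sketch below assumes a hypothetical input-first format: optional "Example N" header lines, each followed by "Input: ..." and "Output: ..." sections. Text without headers is treated as a single example.

```python
import re
from typing import Dict, List


def parse_non_classification_output(generated_text: str) -> List[Dict[str, str]]:
    """Hypothetical sketch: parse 'Input:' / 'Output:' pairs per example."""
    pairs: List[Dict[str, str]] = []
    # Split on header lines such as "Example 1" or "Example 2:".
    for block in re.split(r"(?im)^\s*example\s*\d+\s*:?\s*$", generated_text):
        match = re.search(r"(?is)input:\s*(.*?)\s*output:\s*(.*)", block)
        if match:
            pairs.append(
                {"input": match.group(1).strip(), "output": match.group(2).strip()}
            )
    return pairs
```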
construct_data
Save the machine-generated tasks to the specified output path in JSON format.
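Saving the results is a plain JSON dump. In this sketch the default path matches the documented data_output_path default, while the indentation, the creation of missing parent directories, and ensure_ascii=False are assumptions rather than documented behaviour.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def construct_data(
    machine_tasks: List[Dict[str, Any]], output_path: str = "./data_output.json"
) -> None:
    """Hypothetical sketch: write the generated tasks as a JSON array."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII instructions human-readable.
        json.dump(machine_tasks, f, indent=4, ensure_ascii=False)
```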
generate
Execute the entire pipeline to generate machine instructions and instances.
Parameters:
- timeout_minutes (int): Maximum time in minutes to run the generation process before timing out. (default: 600)
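A timeout of this kind is usually enforced by checking a deadline between generation steps. The sketch below assumes that approach; the names run_with_timeout, step, and target are hypothetical, and returning partial results instead of raising is a design choice of the sketch, not documented behaviour. Because the deadline is checked only between steps, a single slow step can overrun it.

```python
import time
from typing import Callable, List


def run_with_timeout(
    step: Callable[[], str], target: int, timeout_minutes: float = 600
) -> List[str]:
    """Hypothetical sketch: call `step` until `target` results or the deadline."""
    deadline = time.monotonic() + timeout_minutes * 60
    results: List[str] = []
    while len(results) < target:
        if time.monotonic() >= deadline:
            break  # stop early and keep whatever was generated so far
        results.append(step())
    return results
```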