Replace seed_path with the path to your seed file, and replace data_output_path with your desired output location.
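As a rough illustration of what these two paths are used for, the sketch below reads seed instructions and writes generated ones. It assumes a JSONL seed file with one object per line carrying an "instruction" key, and a JSON list as output. Both formats and the helper names load_seed_instructions and save_machine_instructions are hypothetical, not taken from the source.

```python
import json
from pathlib import Path


def load_seed_instructions(seed_path: str) -> list[str]:
    """Read human-written seed instructions from a JSONL file.

    Hypothetical format: one JSON object per line with an "instruction"
    key. Blank lines are skipped.
    """
    instructions = []
    with Path(seed_path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                instructions.append(json.loads(line)["instruction"])
    return instructions


def save_machine_instructions(data_output_path: str, instructions: list[str]) -> None:
    """Write generated instructions as a JSON list (format is an assumption)."""
    records = [{"instruction": text} for text in instructions]
    Path(data_output_path).write_text(json.dumps(records, indent=2), encoding="utf-8")
```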
In each round, the pipeline samples human-written instructions (num_human_sample) from the seed_path and machine-generated instructions (num_machine_sample) from previous rounds. The human_to_machine_ratio helps control the balance between human guidance and the model's creativity throughout this process; by adjusting this ratio, you can influence the quality and diversity of the generated instructions.
Feel free to alter num_human_sample and num_machine_sample; both are passed into human_to_machine_ratio later.
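To make the human/machine mix concrete, here is a minimal sketch of how one round's prompt examples could be drawn. The helper sample_prompt_instructions and its signature are hypothetical; it simply samples without replacement from each pool and falls back to human examples when the machine pool is still small (for instance, in the first round).

```python
import random


def sample_prompt_instructions(
    human_pool: list[str],
    machine_pool: list[str],
    num_human_sample: int,
    num_machine_sample: int,
    rng: random.Random,
) -> list[str]:
    """Mix human seed instructions with machine-generated ones (hypothetical helper).

    Draws up to num_human_sample items from the human pool and up to
    num_machine_sample items from the machine pool, without replacement.
    """
    humans = rng.sample(human_pool, min(num_human_sample, len(human_pool)))
    machines = rng.sample(machine_pool, min(num_machine_sample, len(machine_pool)))
    return humans + machines
```

Raising num_human_sample relative to num_machine_sample keeps prompts closer to the curated seeds; raising num_machine_sample lets earlier generations steer later ones.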
Replace target_num_instructions with the number of machine-generated instructions you want.
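One way to picture the role of target_num_instructions is as the stopping condition of a generate-then-filter loop. The sketch below is an assumption about the overall shape, not the pipeline's code: generate_one stands in for a model call, is_valid for the filters described below, and max_attempts is a hypothetical safety cap so the loop cannot run forever when most candidates are rejected.

```python
from typing import Callable


def generate_until_target(
    generate_one: Callable[[], str],
    is_valid: Callable[[str], bool],
    target_num_instructions: int,
    max_attempts: int = 1000,
) -> list[str]:
    """Keep generating until target_num_instructions valid, unique items exist."""
    accepted: list[str] = []
    attempts = 0
    while len(accepted) < target_num_instructions and attempts < max_attempts:
        attempts += 1
        candidate = generate_one()
        # Keep only candidates that pass validation and are not duplicates.
        if is_valid(candidate) and candidate not in accepted:
            accepted.append(candidate)
    return accepted
```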
Each filter function returns True if the instruction is valid and False otherwise.
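The True/False contract can be expressed as a small abstract interface. In this sketch the names FilterFunction and apply are assumptions for illustration, and NonEmptyFilter is a made-up example rule that is not among the filters listed in this section.

```python
from abc import ABC, abstractmethod


class FilterFunction(ABC):
    """Hypothetical base class: one rule that accepts or rejects an instruction."""

    @abstractmethod
    def apply(self, instruction: str) -> bool:
        """Return True if the instruction is valid, False otherwise."""


class NonEmptyFilter(FilterFunction):
    """Illustrative rule (not from the source): reject blank instructions."""

    def apply(self, instruction: str) -> bool:
        return bool(instruction.strip())
```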
LengthFilter filters out all instructions whose length is less than min_len or greater than max_len.
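A minimal LengthFilter sketch follows. It assumes length is counted in whitespace-separated words and that both bounds are inclusive; the default values 5 and 200 are hypothetical.

```python
class LengthFilter:
    """Sketch: keep instructions whose word count lies in [min_len, max_len]."""

    def __init__(self, min_len: int = 5, max_len: int = 200) -> None:
        # Hypothetical defaults; tune them to your data.
        self.min_len = min_len
        self.max_len = max_len

    def apply(self, instruction: str) -> bool:
        # Assumption: "length" means the number of whitespace-separated words.
        word_count = len(instruction.split())
        return self.min_len <= word_count <= self.max_len
```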
KeywordFilter filters instructions that contain specific undesirable keywords.
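A possible KeywordFilter sketch uses a single case-insensitive regular expression. Whole-word matching and the example keywords are assumptions; the word-boundary approach also assumes each keyword starts and ends with a letter or digit.

```python
import re


class KeywordFilter:
    """Sketch: reject instructions containing any undesirable keyword (whole words, any case)."""

    def __init__(self, keywords: list[str]) -> None:
        # \b...\b keeps "graph" from matching inside "paragraph".
        # An empty keyword list would otherwise compile to a pattern that
        # matches everything, so it is treated as "no filtering".
        self.pattern = (
            re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE)
            if keywords
            else None
        )

    def apply(self, instruction: str) -> bool:
        return self.pattern is None or self.pattern.search(instruction) is None
```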
PunctuationFilter filters instructions that begin with a non-alphanumeric character.
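The PunctuationFilter rule reduces to a check on the first character. In this sketch, an empty string or a leading space also counts as "non-alphanumeric" and is rejected. That is a design choice of the sketch, not something the source specifies.

```python
class PunctuationFilter:
    """Sketch: reject instructions whose first character is not alphanumeric."""

    def apply(self, instruction: str) -> bool:
        # instruction[:1] is "" for an empty string, and "".isalnum() is False,
        # so empty instructions are rejected as well.
        return instruction[:1].isalnum()
```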
NonEnglishFilter filters instructions that do not begin with English letters.
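Because the rule only looks at how an instruction begins, a regular expression anchored at the start is enough. This NonEnglishFilter sketch treats "English letters" as ASCII A to Z in either case. It is a cheap heuristic, not language detection: accented first letters such as "É" are rejected, and anything after an ASCII first letter passes.

```python
import re


class NonEnglishFilter:
    """Sketch: keep only instructions that start with an ASCII English letter."""

    # re.match anchors at the start of the string.
    _starts_with_letter = re.compile(r"[A-Za-z]")

    def apply(self, instruction: str) -> bool:
        return self._starts_with_letter.match(instruction) is not None
```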
RougeSimilarityFilter filters instructions that are too similar to existing instructions based on ROUGE scores.
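The sketch below shows one self-contained way to score similarity with ROUGE-L. It computes the F1 of the longest common subsequence (LCS) over lowercase whitespace tokens and rejects a candidate if its score against any existing instruction exceeds a threshold. The tokenization, the choice of ROUGE-L over other ROUGE variants, and the 0.7 default threshold are all assumptions.

```python
class RougeSimilarityFilter:
    """Sketch: reject an instruction whose ROUGE-L F1 against any existing
    instruction exceeds `threshold`."""

    def __init__(self, existing: list[str], threshold: float = 0.7) -> None:
        self.existing = [s.lower().split() for s in existing]
        self.threshold = threshold

    @staticmethod
    def _lcs_len(a: list[str], b: list[str]) -> int:
        # Classic O(len(a) * len(b)) dynamic program, keeping one row at a time.
        prev = [0] * (len(b) + 1)
        for tok_a in a:
            curr = [0]
            for j, tok_b in enumerate(b, start=1):
                if tok_a == tok_b:
                    curr.append(prev[j - 1] + 1)
                else:
                    curr.append(max(prev[j], curr[j - 1]))
            prev = curr
        return prev[-1]

    @classmethod
    def rouge_l_f1(cls, cand: list[str], ref: list[str]) -> float:
        lcs = cls._lcs_len(cand, ref)
        if lcs == 0:
            return 0.0
        precision = lcs / len(cand)
        recall = lcs / len(ref)
        return 2 * precision * recall / (precision + recall)

    def apply(self, instruction: str) -> bool:
        tokens = instruction.lower().split()
        return all(self.rouge_l_f1(tokens, ref) <= self.threshold for ref in self.existing)
```

In a generation loop, accepted instructions would also need to be appended to the comparison pool so that later candidates are checked against them.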
InstructionFilter manages all the filter functions, and a custom InstructionFilter can be used to initialize the pipeline.
Start by adding the filter functions you want and configuring them.
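The manager role can be sketched as a class that runs every configured check and accepts an instruction only if all of them pass. In this illustration, filters are plain callables from str to bool. That interface, the add_filter method, and the specific bounds and keyword are assumptions, not the library's actual API.

```python
from typing import Callable


class InstructionFilter:
    """Sketch of a manager that runs every configured filter in order."""

    def __init__(self, filters: list[Callable[[str], bool]]) -> None:
        self.filters = list(filters)

    def add_filter(self, check: Callable[[str], bool]) -> None:
        self.filters.append(check)

    def filter(self, instruction: str) -> bool:
        # all() short-circuits, so later (possibly expensive) checks such as
        # a ROUGE comparison only run if the cheaper ones pass.
        return all(check(instruction) for check in self.filters)


# Configure the filters you want, then hand them to the manager.
instruction_filter = InstructionFilter([
    lambda s: 5 <= len(s.split()) <= 200,  # length bounds (hypothetical values)
    lambda s: "image" not in s.lower(),    # keyword block (hypothetical keyword)
    lambda s: s[:1].isalnum(),             # no leading punctuation
])
instruction_filter.add_filter(lambda s: s[:1].isascii() and s[:1].isalpha())  # English first letter
```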
InstructionFilter