SelfImprovingCoTPipeline
- Initial reasoning trace generation
- Self-evaluation
- Feedback-based improvement
- Iterative refinement
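The four stages listed above can be sketched as a single loop. This is a minimal illustration, not the pipeline's actual implementation: `refine` and the three `fake_*` callables are hypothetical stand-ins for the agent calls, and the stopping rule (average score compared with `>=`) is an assumption.

```python
from typing import Callable, Dict, List


def refine(
    problem: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Dict],
    improve: Callable[[str, str, str], str],
    max_iterations: int = 3,
    score_threshold: float = 0.7,
) -> Dict:
    """Hypothetical generate -> evaluate -> improve loop."""
    trace = generate(problem)  # initial reasoning trace
    history: List[Dict] = []
    for iteration in range(max_iterations):
        result = evaluate(problem, trace)  # self-evaluation
        scores = result["scores"]
        history.append(
            {"iteration": iteration, "trace": trace, "scores": scores}
        )
        average = sum(scores.values()) / len(scores)
        if average >= score_threshold:  # assumed stopping rule
            break
        # feedback-based improvement feeds the next iteration
        trace = improve(problem, trace, result["feedback"])
    return {"final_trace": trace, "history": history}


# Stub callables standing in for real agent calls (illustrative only).
def fake_generate(problem: str) -> str:
    return "draft"


def fake_evaluate(problem: str, trace: str) -> Dict:
    score = 0.9 if trace.endswith("!") else 0.5
    return {"scores": {"correctness": score}, "feedback": "add detail"}


def fake_improve(problem: str, trace: str, feedback: str) -> str:
    return trace + "!"


result = refine("2 + 2 = ?", fake_generate, fake_evaluate, fake_improve)
no_refinement = refine(
    "2 + 2 = ?", fake_generate, fake_evaluate, fake_improve, max_iterations=0
)
```

With `max_iterations=0` the loop body never runs, so only the initial trace is returned.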
__init__
- reason_agent (ChatAgent): The chat agent used for generating and improving reasoning traces.
- problems (List[Dict]): List of problem dictionaries to process.
- max_iterations (int, optional): Maximum number of improvement iterations. If set to 0, the pipeline will generate an initial trace without any improvement iterations. (default: :obj:3)
- score_threshold (Union[float, Dict[str, float]], optional): Quality threshold. Can be either a single float value applied to the average score, or a dictionary mapping score dimensions to their thresholds, for example {"correctness": 0.8, "coherence": 0.7}. If a reward model is used and the threshold for a dimension is not specified, the default value 0.7 is used. (default: :obj:0.7)
- rejection_sampling_n (int, optional): Number of samples to draw using rejection sampling, where samples are accepted or rejected based on a predefined condition. (default: :obj:None)
- evaluate_agent (Optional[ChatAgent]): The chat agent used for evaluating reasoning traces. (default: :obj:None)
- reward_model (BaseRewardModel, optional): Model used to evaluate reasoning traces. If None, agent self-evaluation is used. (default: :obj:None)
- output_path (str, optional): Output path for saving traces. If None, results are only returned without being saved to a file. (default: :obj:None)
- few_shot_examples (str, optional): Examples to use for few-shot generation. (default: :obj:None)
- batch_size (int, optional): Batch size for parallel processing. (default: :obj:None)
- max_workers (int, optional): Maximum number of worker threads. (default: :obj:None)
- solution_pattern (str, optional): Regular expression pattern with one capture group to extract answers from solution text. (default: :obj:r'\\boxed{(.*?)}')
- trace_pattern (str, optional): Regular expression pattern with one capture group to extract answers from trace text. If None, uses the same pattern as solution_pattern. (default: :obj:None)
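As a rough illustration of how solution_pattern and trace_pattern might be applied, the sketch below runs the documented default pattern through Python's `re.findall`. The helper names `extract_answers` and `resolve_trace_pattern` are hypothetical, and whether the pipeline uses `re.findall` internally is an assumption.

```python
import re
from typing import List, Optional

# Default taken from the solution_pattern parameter documented above.
SOLUTION_PATTERN = r'\\boxed{(.*?)}'


def extract_answers(text: str, pattern: str) -> List[str]:
    """Return every capture-group match of `pattern` in `text`."""
    return re.findall(pattern, text)


def resolve_trace_pattern(
    solution_pattern: str, trace_pattern: Optional[str] = None
) -> str:
    """Fall back to solution_pattern when trace_pattern is None."""
    return solution_pattern if trace_pattern is None else trace_pattern


solution = r"Adding the terms gives \boxed{42}."
trace = r"Step 1: 40 + 2. Final answer: \boxed{42}"

solution_answers = extract_answers(solution, SOLUTION_PATTERN)
trace_answers = extract_answers(
    trace, resolve_trace_pattern(SOLUTION_PATTERN)
)

# The lazy group stops at the first closing brace, so nested braces
# are cut short with the default pattern.
nested = extract_answers(r"\boxed{\frac{1}{2}}", SOLUTION_PATTERN)
```

Because the default capture group is lazy, answers containing nested braces are truncated at the first `}`; a custom pattern may be needed for such answers.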
safe_write_json
clean_json
_check_score_threshold
- scores (Dict[str, float]): Dictionary of scores for different dimensions.
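The two forms of score_threshold described for the constructor suggest a check along the following lines. This is a sketch under stated assumptions: the function name `meets_threshold`, the `>=` comparison, and the handling of an empty score dictionary are hypothetical; only the average-versus-per-dimension split and the 0.7 fallback come from the parameter description.

```python
from typing import Dict, Union

DEFAULT_THRESHOLD = 0.7  # documented fallback for unspecified dimensions


def meets_threshold(
    scores: Dict[str, float],
    threshold: Union[float, Dict[str, float]],
) -> bool:
    """Hypothetical check mirroring the two score_threshold forms."""
    if not scores:
        return False  # assumption: no scores means the check fails
    if isinstance(threshold, dict):
        # Per-dimension thresholds; unspecified dimensions use the default.
        return all(
            score >= threshold.get(dimension, DEFAULT_THRESHOLD)
            for dimension, score in scores.items()
        )
    # A single float is compared against the average score.
    return sum(scores.values()) / len(scores) >= threshold


scores = {"correctness": 0.9, "coherence": 0.6}
passes_average = meets_threshold(scores, 0.7)
passes_per_dimension = meets_threshold(
    scores, {"correctness": 0.8, "coherence": 0.7}
)
```

The same scores pass the averaged check but fail the per-dimension check, because coherence falls below its own threshold.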
_generate_feedback
- scores (Dict[str, float]): Dictionary of scores for different dimensions.
generate_reasoning_trace
- problem (str): The problem text to generate reasoning for.
evaluate_trace
- problem (str): The original problem text to evaluate against.
- trace (str): The reasoning trace to evaluate.
- solution (Optional[str]): The solution to the problem, if provided. (default: :obj:None)
- scores: Dict of evaluation dimensions and their scores
- feedback: Detailed feedback for improvement
- correctness: Score for logical correctness
- clarity: Score for clarity of explanation
- completeness: Score for completeness of reasoning
generate_reasoning_trace_rejection
- problem (str): The problem text for generating a reasoning trace.
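Rejection sampling, as described for rejection_sampling_n, can be sketched generically as drawing up to n candidates and keeping the first one that satisfies an acceptance condition. The function `sample_until_accepted`, its `draw` and `accept` callables, and the choice to return None when nothing is accepted are all hypothetical; the acceptance condition the pipeline actually uses is not stated here.

```python
from typing import Callable, Optional


def sample_until_accepted(
    draw: Callable[[], str],
    accept: Callable[[str], bool],
    n: int,
) -> Optional[str]:
    """Draw up to n candidates and return the first accepted one."""
    for _ in range(n):
        candidate = draw()
        if accept(candidate):
            return candidate
    return None  # assumption: signal that every sample was rejected


# Deterministic stand-in for repeated agent calls (illustrative only).
candidates = iter([r"\boxed{3}", r"\boxed{4}", r"\boxed{5}"])
accepted = sample_until_accepted(
    draw=lambda: next(candidates),
    accept=lambda trace: r"\boxed{4}" in trace,
    n=3,
)

# With too small a budget, no candidate is accepted.
rejected = sample_until_accepted(
    draw=lambda: r"\boxed{0}",
    accept=lambda trace: r"\boxed{4}" in trace,
    n=2,
)
```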
improve_trace
- problem (str): The original problem text.
- trace (str): The current reasoning trace.
- feedback (str): Feedback for improving the trace.
- solution (Optional[str]): The solution to the problem, if provided. (default: :obj:None)
validate_problem_format
- problem (Dict): Problem dictionary to validate.
_check_boxed_answers
- solution (str): The problem solution string.
- trace (str): The reasoning trace string.
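One plausible way to compare boxed answers between a solution and a trace is shown below. The comparison rule (at least one shared answer after stripping whitespace, and False when either side has no boxed answer) is an assumption for illustration, and `boxed_answers_match` is a hypothetical name rather than the method itself.

```python
import re

BOXED = r'\\boxed{(.*?)}'  # documented default for solution_pattern


def boxed_answers_match(
    solution: str, trace: str, pattern: str = BOXED
) -> bool:
    """Return True if solution and trace share a boxed answer."""
    solution_answers = {a.strip() for a in re.findall(pattern, solution)}
    trace_answers = {a.strip() for a in re.findall(pattern, trace)}
    if not solution_answers or not trace_answers:
        return False  # assumption: a missing boxed answer is a mismatch
    return bool(solution_answers & trace_answers)


same = boxed_answers_match(
    r"So the result is \boxed{12}.", r"Therefore \boxed{ 12 }."
)
different = boxed_answers_match(r"\boxed{12}", r"\boxed{13}")
missing = boxed_answers_match(r"\boxed{12}", "The answer is 12.")
```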
process_problem
- problem (Dict): Problem dictionary containing the problem text.
- rationalization (bool, optional): Whether to use rationalization. (default: :obj:False)
generate
- rationalization (bool, optional): Whether to use rationalization. (default: :obj:False)
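The batch_size and max_workers constructor parameters point to batched, multi-threaded processing of the problem list. A minimal sketch of that idea with the standard library follows; `process_in_batches` and `process_one` are hypothetical names, and the actual scheduling used by generate may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def process_in_batches(
    problems: List[Dict],
    process_one: Callable[[Dict], Dict],
    batch_size: int = 2,
    max_workers: int = 2,
) -> List[Dict]:
    """Process problems batch by batch on a thread pool, keeping order."""
    results: List[Dict] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(problems), batch_size):
            batch = problems[start:start + batch_size]
            # Executor.map yields results in the order of its inputs.
            results.extend(pool.map(process_one, batch))
    return results


# Stand-in for per-problem processing (illustrative only).
def fake_process(problem: Dict) -> Dict:
    return {
        "problem": problem["problem"],
        "final_trace": problem["problem"].upper(),
    }


problems = [{"problem": "a"}, {"problem": "b"}, {"problem": "c"}]
results = process_in_batches(problems, fake_process)
```

Results come back in the same order as the input problems, since each batch is collected through `Executor.map`.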