camel.datagen.self_improving_cot
SelfImprovingCoTPipeline
Pipeline for generating self-taught reasoning traces using the self-improving methodology.
This implements the STaR paper’s approach of:
- Initial reasoning trace generation
- Self-evaluation
- Feedback-based improvement
- Iterative refinement
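The four steps above can be sketched as a small loop. This is a minimal illustration, not the pipeline's actual implementation: the function `self_improve` and the toy callables are hypothetical names, and the stopping rule (average score compared with a single float threshold) is an assumption based on the parameter descriptions in this reference.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch of the generate -> evaluate -> improve loop.
# None of these names come from the library itself.
def self_improve(
    problem: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Tuple[Dict[str, float], str]],
    improve: Callable[[str, str, str], str],
    max_iterations: int = 3,
    score_threshold: float = 0.7,
) -> Tuple[str, List[Dict[str, float]]]:
    """Generate a trace, then refine it until the average score passes."""
    trace = generate(problem)  # initial reasoning trace
    history: List[Dict[str, float]] = []
    # With max_iterations=0 the loop body never runs, so the initial
    # trace is returned without any evaluation or improvement.
    for _ in range(max_iterations):
        scores, feedback = evaluate(problem, trace)  # self-evaluation
        history.append(scores)
        # Assumes evaluate() returns at least one score dimension.
        if sum(scores.values()) / len(scores) >= score_threshold:
            break
        trace = improve(problem, trace, feedback)  # feedback-based improvement
    return trace, history


# Toy stand-ins for the agents, used only to exercise the loop.
def toy_generate(problem: str) -> str:
    return "draft"

def toy_evaluate(problem: str, trace: str) -> Tuple[Dict[str, float], str]:
    score = 0.9 if trace.endswith("v2") else 0.5
    return {"correctness": score}, "add more detail"

def toy_improve(problem: str, trace: str, feedback: str) -> str:
    return trace + " v2"

final_trace, score_history = self_improve(
    "What is 1 + 1?", toy_generate, toy_evaluate, toy_improve
)
```

In this toy run the first draft scores below the threshold, is improved once, and the second evaluation passes, so the loop stops before reaching `max_iterations`.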
__init__
Initialize the self-improving CoT pipeline.
Parameters:
- reason_agent (ChatAgent): The chat agent used for generating and improving reasoning traces.
- problems (List[Dict]): List of problem dictionaries to process.
- max_iterations (int, optional): Maximum number of improvement iterations. If set to 0, the pipeline will generate an initial trace without any improvement iterations. (default: :obj:3)
- score_threshold (Union[float, Dict[str, float]], optional): Quality threshold. Can be either a single float value applied to average score, or a dictionary mapping score dimensions to their thresholds. For example: {"correctness": 0.8, "coherence": 0.7}. If using reward model and threshold for a dimension is not specified, will use the default value 0.7. (default: :obj:0.7)
- rejection_sampling_n (int, optional): Specifies the number of samples to be drawn using the rejection sampling method, where samples are accepted or rejected based on a predefined condition to achieve a desired distribution. (default: :obj:None)
- evaluate_agent (Optional[ChatAgent]): The chat agent used for evaluating reasoning traces. (default: :obj:None)
- reward_model (BaseRewardModel, optional): Model used to evaluate reasoning traces. If None, uses Agent self-evaluation. (default: :obj:None)
- output_path (str, optional): Output path for saving traces. If None, results will only be returned without saving to file. (default: :obj:None)
- few_shot_examples (str, optional): Examples to use for few-shot generation. (default: :obj:None)
- batch_size (int, optional): Batch size for parallel processing. (default: :obj:None)
- max_workers (int, optional): Maximum number of worker threads. (default: :obj:None)
- solution_pattern (str, optional): Regular expression pattern with one capture group to extract answers from solution text. (default: :obj:r'\\boxed{(.*?)}')
- trace_pattern (str, optional): Regular expression pattern with one capture group to extract answers from trace text. If None, uses the same pattern as solution_pattern. (default: :obj:None)
safe_write_json
clean_json
_check_score_threshold
Check if scores meet the threshold requirements.
Parameters:
- scores (Dict[str, float]): Dictionary of scores for different dimensions.
Returns:
bool: True if scores meet threshold requirements, False otherwise.
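A threshold check of this kind could look roughly like the sketch below. It is an assumption-laden illustration: `meets_threshold` and `DEFAULT_THRESHOLD` are hypothetical names, and the sketch applies the 0.7 fallback to any unspecified dimension, whereas the parameter description only states that fallback for the reward-model case.

```python
from typing import Dict, Union

# Fallback taken from the score_threshold parameter description.
DEFAULT_THRESHOLD = 0.7

def meets_threshold(
    scores: Dict[str, float],
    threshold: Union[float, Dict[str, float]],
) -> bool:
    """Hypothetical check: float -> compare the average, dict -> compare per dimension."""
    if not scores:
        return False  # nothing to judge, so treat as not passing (assumption)
    if isinstance(threshold, dict):
        # Every scored dimension must reach its own threshold.
        return all(
            score >= threshold.get(dim, DEFAULT_THRESHOLD)
            for dim, score in scores.items()
        )
    # A single float is applied to the average score.
    return sum(scores.values()) / len(scores) >= threshold


scores = {"correctness": 0.9, "coherence": 0.6}
passes_on_average = meets_threshold(scores, 0.7)
passes_per_dimension = meets_threshold(
    scores, {"correctness": 0.8, "coherence": 0.7}
)
```

The same scores can pass a single average threshold while failing a per-dimension one, which is the practical difference between the two forms of `score_threshold`.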
_generate_feedback
Generate feedback based on which dimensions need improvement.
Parameters:
- scores (Dict[str, float]): Dictionary of scores for different dimensions.
Returns:
str: Feedback message indicating which dimensions need improvement.
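As a rough sketch of how such a message might be assembled, the hypothetical helper `build_feedback` below lists every dimension that falls short. The message wording is invented for illustration, and comparing each dimension against a single float threshold is an assumption (the parameter description applies a float threshold to the average score).

```python
from typing import Dict, Union

def build_feedback(
    scores: Dict[str, float],
    threshold: Union[float, Dict[str, float]],
) -> str:
    """Hypothetical helper: name each dimension scoring below its threshold."""
    failing = []
    for dim, score in scores.items():
        # Per-dimension threshold if a dict is given, else the single float.
        required = threshold.get(dim, 0.7) if isinstance(threshold, dict) else threshold
        if score < required:
            failing.append(f"{dim} (score {score:.2f}, needs {required:.2f})")
    if not failing:
        return "All dimensions meet their thresholds."
    return "Needs improvement in: " + ", ".join(failing)


message = build_feedback({"correctness": 0.9, "clarity": 0.5}, 0.7)
```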
generate_reasoning_trace
Generate initial reasoning trace for a given problem.
Parameters:
- problem (str): The problem text to generate reasoning for.
Returns:
str: Generated reasoning trace.
evaluate_trace
Evaluate the quality of a reasoning trace.
Parameters:
- problem (str): The original problem text to evaluate against.
- trace (str): The reasoning trace to evaluate.
- solution (Optional[str]): The solution to the problem, if provided. (default: :obj:None)
Returns:
Dict[str, Any]: Evaluation results containing:
- scores: Dict of evaluation dimensions and their scores
- feedback: Detailed feedback for improvement
For Agent self-evaluation, the scores will include:
- correctness: Score for logical correctness
- clarity: Score for clarity of explanation
- completeness: Score for completeness of reasoning
For reward model evaluation, the scores will depend on the model’s evaluation dimensions.
generate_reasoning_trace_rejection
Generate multiple candidate reasoning traces for a problem and select the best one based on evaluation.
Parameters:
- problem (str): The problem text for generating a reasoning trace.
Returns:
str: The best candidate trace that meets quality criteria, or the first candidate if none qualify.
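The selection rule described here (best qualifying candidate, otherwise the first candidate) can be sketched as follows. `pick_by_rejection` and `score_fn` are hypothetical names, and ranking candidates by average score is an assumption; the reference does not say how "best" is measured.

```python
from typing import Callable, Dict, List, Optional

def pick_by_rejection(
    candidates: List[str],
    score_fn: Callable[[str], Dict[str, float]],
    threshold: float = 0.7,
) -> str:
    """Hypothetical selection: highest-average passing trace, else the first one."""
    if not candidates:
        raise ValueError("at least one candidate trace is required")
    best_trace: Optional[str] = None
    best_avg = 0.0
    for trace in candidates:
        scores = score_fn(trace)  # assumed to return at least one dimension
        avg = sum(scores.values()) / len(scores)
        # Reject candidates below the threshold; keep the best of the rest.
        if avg >= threshold and (best_trace is None or avg > best_avg):
            best_trace, best_avg = trace, avg
    # Fall back to the first candidate if none qualified.
    return best_trace if best_trace is not None else candidates[0]


# Toy scorer for illustration only: longer traces score higher.
def toy_score(trace: str) -> Dict[str, float]:
    return {"completeness": min(len(trace) / 10, 1.0)}

chosen = pick_by_rejection(["short", "a bit longer", "longest one"], toy_score)
fallback = pick_by_rejection(["a", "bb"], toy_score)
```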
improve_trace
Generate improved reasoning trace based on feedback.
Parameters:
- problem (str): The original problem text.
- trace (str): The current reasoning trace.
- feedback (str): Feedback for improving the trace.
- solution (Optional[str]): The solution to the problem, if provided. (default: :obj:None)
Returns:
str: Improved reasoning trace.
validate_problem_format
Validate that a problem dictionary has the required format.
Parameters:
- problem (Dict): Problem dictionary to validate.
_check_boxed_answers
Check if the answer in the trace matches the solution using the configured patterns.
Parameters:
- solution (str): The problem solution string.
- trace (str): The reasoning trace string.
Returns:
bool: True if answers match, False otherwise
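The comparison can be illustrated with the default pattern from the constructor. This is a sketch under assumptions: `answers_match` is a hypothetical name, it compares the first match in each string, and it strips surrounding whitespace before comparing; the actual method may differ on all three points.

```python
import re
from typing import Optional

# Default solution_pattern as documented for the constructor.
SOLUTION_PATTERN = r'\\boxed{(.*?)}'

def answers_match(
    solution: str,
    trace: str,
    solution_pattern: str = SOLUTION_PATTERN,
    trace_pattern: Optional[str] = None,
) -> bool:
    """Hypothetical check: extract one answer from each text and compare them."""
    # A missing trace_pattern falls back to solution_pattern, as documented.
    trace_pattern = trace_pattern or solution_pattern
    solution_match = re.search(solution_pattern, solution)
    trace_match = re.search(trace_pattern, trace)
    if solution_match is None or trace_match is None:
        return False  # no extractable answer on one side
    return solution_match.group(1).strip() == trace_match.group(1).strip()


same = answers_match(r"The answer is \boxed{42}", r"so we get \boxed{ 42 }")
missing = answers_match(r"\boxed{42}", "no boxed answer here")
```

Because the capture group is non-greedy, the default pattern stops at the first closing brace, so a nested answer such as `\boxed{\frac{1}{2}}` is only partially captured; a custom `solution_pattern` would be needed for such cases.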
process_problem
Process a single problem through the self-improving CoT pipeline.
Parameters:
- problem (Dict): Problem dictionary containing the problem text.
- rationalization (bool, optional): Whether to use rationalization. (default: :obj:False)
Returns:
ProblemResult: Results with final trace and history.
generate
Execute the self-improving CoT pipeline on all problems.
Process problems and return results. If output_path is specified, also save results to file.
Parameters:
- rationalization (bool, optional): Whether to use rationalization. (default: :obj:False)
Returns:
List[Dict[str, Any]]: List of processed results
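The overall shape of this step (process every problem with worker threads, return the results, and optionally write them to a file) might look like the sketch below. `run_all` and `process_one` are hypothetical names, the JSON layout is an assumption, and `batch_size` is left out for brevity.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

def run_all(
    problems: List[Dict[str, Any]],
    process_one: Callable[[Dict[str, Any]], Dict[str, Any]],
    max_workers: Optional[int] = None,
    output_path: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical driver: process problems in threads, optionally save as JSON."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        results = list(pool.map(process_one, problems))
    if output_path is not None:
        # File layout is an assumption: a plain JSON list of result dicts.
        with open(output_path, "w", encoding="utf-8") as f:
            json.dump(results, f, indent=2, ensure_ascii=False)
    return results


# Toy per-problem step standing in for the real processing of one problem.
def toy_process(problem: Dict[str, Any]) -> Dict[str, Any]:
    return {"problem": problem["problem"], "final_trace": problem["problem"].upper()}

results = run_all(
    [{"problem": "first"}, {"problem": "second"}],
    toy_process,
    max_workers=2,
)
```

With `output_path` left as `None`, nothing is written and the results are only returned, which mirrors the behavior described for the `output_path` parameter.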