SelfImprovingCoTPipeline

class SelfImprovingCoTPipeline:

Pipeline for generating self-taught reasoning traces using the self-improving methodology.

This implements the STaR paper’s approach of:

  1. Initial reasoning trace generation
  2. Self-evaluation
  3. Feedback-based improvement
  4. Iterative refinement
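
The four steps above can be sketched as a small loop. This is a minimal illustration, not the pipeline's actual implementation: `self_improve` and the three callables passed to it are hypothetical stand-ins for the agent calls, and a single float score is assumed for simplicity.

```python
from typing import Callable, Dict, List, Tuple


def self_improve(
    problem: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Tuple[float, str]],
    improve: Callable[[str, str, str], str],
    max_iterations: int = 3,
    score_threshold: float = 0.7,
) -> Tuple[str, List[Dict]]:
    """Hypothetical driver: generate, evaluate, improve, repeat."""
    trace = generate(problem)  # 1. initial reasoning trace
    history: List[Dict] = []
    for iteration in range(max_iterations):
        score, feedback = evaluate(problem, trace)  # 2. self-evaluation
        history.append({"iteration": iteration, "score": score})
        if score >= score_threshold:
            break  # good enough, stop refining
        trace = improve(problem, trace, feedback)  # 3. feedback-based improvement
    return trace, history  # 4. the loop itself is the iterative refinement


# Toy stand-ins for the agent calls (assumptions for illustration only).
def demo_generate(problem: str) -> str:
    return "draft"


def demo_evaluate(problem: str, trace: str) -> Tuple[float, str]:
    score = 0.9 if trace.endswith("+fix") else 0.4
    return score, "show more steps"


def demo_improve(problem: str, trace: str, feedback: str) -> str:
    return trace + "+fix"


final_trace, history = self_improve(
    "What is 2 + 2?", demo_generate, demo_evaluate, demo_improve
)
```

With `max_iterations=0` the loop body never runs, so the initial trace is returned unchanged, which mirrors the behaviour documented for that parameter.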

__init__

def __init__(
    self,
    reason_agent: ChatAgent,
    problems: List[Dict],
    max_iterations: int = 3,
    score_threshold: Union[float, Dict[str, float]] = 0.7,
    rejection_sampling_n: Optional[int] = None,
    evaluate_agent: Optional[ChatAgent] = None,
    reward_model: Optional[BaseRewardModel] = None,
    output_path: Optional[str] = None,
    few_shot_examples: Optional[str] = None,
    batch_size: Optional[int] = None,
    max_workers: Optional[int] = None,
    solution_pattern: str = r'\\boxed{(.*?)}',
    trace_pattern: Optional[str] = None
):

Initialize the self-improving CoT pipeline.

Parameters:

  • reason_agent (ChatAgent): The chat agent used for generating and improving reasoning traces.
  • problems (List[Dict]): List of problem dictionaries to process.
  • max_iterations (int, optional): Maximum number of improvement iterations. If set to 0, the pipeline generates an initial trace and performs no improvement iterations. (default: 3)
  • score_threshold (Union[float, Dict[str, float]], optional): Quality threshold. Can be either a single float applied to the average score, or a dictionary mapping score dimensions to their thresholds, for example {"correctness": 0.8, "coherence": 0.7}. When a reward model is used and no threshold is specified for a dimension, the default value 0.7 is used. (default: 0.7)
  • rejection_sampling_n (int, optional): Number of samples to draw for rejection sampling, in which each sample is accepted or rejected based on a predefined condition. (default: None)
  • evaluate_agent (ChatAgent, optional): The chat agent used for evaluating reasoning traces. (default: None)
  • reward_model (BaseRewardModel, optional): Model used to evaluate reasoning traces. If None, agent self-evaluation is used. (default: None)
  • output_path (str, optional): Output path for saving traces. If None, results are only returned and not saved to a file. (default: None)
  • few_shot_examples (str, optional): Examples to use for few-shot generation. (default: None)
  • batch_size (int, optional): Batch size for parallel processing. (default: None)
  • max_workers (int, optional): Maximum number of worker threads. (default: None)
  • solution_pattern (str, optional): Regular expression pattern with one capture group to extract answers from solution text. (default: r'\\boxed{(.*?)}')
  • trace_pattern (str, optional): Regular expression pattern with one capture group to extract answers from trace text. If None, solution_pattern is used. (default: None)
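
To make the default solution_pattern concrete, the sketch below applies it with Python's standard `re` module. The helper `extract_answers` is hypothetical and only illustrates what the single capture group returns; how the pipeline itself calls the pattern is not shown in this reference.

```python
import re
from typing import List

# Default pattern from the signature: a literal "\boxed{", a lazy capture, then "}".
SOLUTION_PATTERN = r'\\boxed{(.*?)}'


def extract_answers(text: str, pattern: str = SOLUTION_PATTERN) -> List[str]:
    """Hypothetical helper: return every captured answer found in the text."""
    return re.findall(pattern, text)


simple = extract_answers(r"So the total is \boxed{42}.")

# The lazy group stops at the first closing brace, so nested braces are cut short.
nested = extract_answers(r"\boxed{\frac{1}{2}}")
```

Because the capture is non-greedy, answers that contain braces of their own (fractions, sets) are truncated at the first `}`; a custom solution_pattern may be worth supplying for such data.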

safe_write_json

def safe_write_json(self, file_path, data):

clean_json

def clean_json(self, data):

_check_score_threshold

def _check_score_threshold(self, scores: Dict[str, float]):

Check if scores meet the threshold requirements.

Parameters:

  • scores (Dict[str, float]): Dictionary of scores for different dimensions.

Returns:

bool: True if scores meet threshold requirements, False otherwise.
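
The two forms of score_threshold can be illustrated with a short sketch. `meets_threshold` is a hypothetical function, and two details are assumptions: the comparison is inclusive (`>=`), and the 0.7 fallback is applied to any dimension missing from the dictionary, whereas the reference only documents that fallback for reward models.

```python
from typing import Dict, Union

DEFAULT_THRESHOLD = 0.7  # documented fallback for unspecified dimensions


def meets_threshold(
    scores: Dict[str, float],
    threshold: Union[float, Dict[str, float]],
) -> bool:
    """Hypothetical check covering both the float and the dict form."""
    if not scores:
        return False  # nothing to judge
    if isinstance(threshold, dict):
        # Every scored dimension must reach its own threshold.
        return all(
            score >= threshold.get(dimension, DEFAULT_THRESHOLD)
            for dimension, score in scores.items()
        )
    # A single float is compared against the average score.
    return sum(scores.values()) / len(scores) >= threshold


scores = {"correctness": 0.9, "clarity": 0.6, "completeness": 0.9}
passes_average = meets_threshold(scores, 0.7)
passes_per_dimension = meets_threshold(
    scores, {"correctness": 0.8, "clarity": 0.5, "completeness": 0.8}
)
fails_on_default = meets_threshold(scores, {"correctness": 0.8})
```

The last call fails because `clarity` has no explicit threshold and its score of 0.6 is below the 0.7 fallback.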

_generate_feedback

def _generate_feedback(self, scores: Dict[str, float]):

Generate feedback based on which dimensions need improvement.

Parameters:

  • scores (Dict[str, float]): Dictionary of scores for different dimensions.

Returns:

str: Feedback message indicating which dimensions need improvement.
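
A feedback message of this kind might be assembled as follows. This is a sketch under assumptions: `build_feedback` and the message wording are hypothetical, and the float case reports on the average score because that is how a single float threshold is described in the constructor parameters.

```python
from typing import Dict, Union


def build_feedback(
    scores: Dict[str, float],
    threshold: Union[float, Dict[str, float]],
) -> str:
    """Hypothetical helper: name the dimensions that fall below threshold."""
    if isinstance(threshold, dict):
        weak = [
            f"{dimension} ({scores[dimension]:.2f} < {limit:.2f})"
            for dimension, limit in threshold.items()
            if dimension in scores and scores[dimension] < limit
        ]
    else:
        average = sum(scores.values()) / len(scores) if scores else 0.0
        weak = (
            [f"average ({average:.2f} < {threshold:.2f})"]
            if average < threshold
            else []
        )
    if not weak:
        return "All thresholds met."
    return "Needs improvement: " + ", ".join(weak)


per_dimension = build_feedback(
    {"correctness": 0.9, "coherence": 0.5},
    {"correctness": 0.8, "coherence": 0.7},
)
overall = build_feedback({"correctness": 0.9, "coherence": 0.9}, 0.7)
```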

generate_reasoning_trace

def generate_reasoning_trace(self, problem: str):

Generate initial reasoning trace for a given problem.

Parameters:

  • problem (str): The problem text to generate reasoning for.

Returns:

str: Generated reasoning trace.

evaluate_trace

def evaluate_trace(
    self,
    problem: str,
    trace: str,
    solution: Optional[str] = None
):

Evaluate the quality of a reasoning trace.

Parameters:

  • problem (str): The original problem text to evaluate against.
  • trace (str): The reasoning trace to evaluate.
  • solution (str, optional): The solution to the problem, if provided. (default: None)

Returns:

Dict[str, Any]: Evaluation results containing:

  • scores: Dict of evaluation dimensions and their scores
  • feedback: Detailed feedback for improvement

For agent self-evaluation, the scores include:

  • correctness: Score for logical correctness
  • clarity: Score for clarity of explanation
  • completeness: Score for completeness of reasoning

For reward model evaluation, the scores will depend on the model’s evaluation dimensions.
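
As an illustration of the result shape described above, the sketch below builds an example dictionary by hand and reads it. The nested layout (a `scores` dict plus a `feedback` string) follows this description and should be treated as an assumption about the real return value; `weakest_dimension` and the sample values are hypothetical.

```python
from typing import Any, Dict


def weakest_dimension(evaluation: Dict[str, Any]) -> str:
    """Hypothetical helper: name the lowest-scoring dimension."""
    scores = evaluation["scores"]
    return min(scores, key=scores.get)


# Hand-written example in the documented shape for agent self-evaluation.
example_evaluation = {
    "scores": {"correctness": 0.9, "clarity": 0.6, "completeness": 0.8},
    "feedback": "Explain the substitution step in more detail.",
}

weakest = weakest_dimension(example_evaluation)
```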

generate_reasoning_trace_rejection

def generate_reasoning_trace_rejection(self, problem: str):

Generate multiple candidate reasoning traces for a problem and select the best one based on evaluation.

Parameters:

  • problem (str): The problem text for generating a reasoning trace.

Returns:

str: The best candidate trace that meets quality criteria, or the first candidate if none qualify.
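
The selection rule stated in the return value can be sketched as below. `pick_trace` and its callables are hypothetical, a single float score per candidate is assumed, and "best" is taken to mean the highest score among candidates at or above the threshold.

```python
from typing import Callable


def pick_trace(
    problem: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int,
    threshold: float = 0.7,
) -> str:
    """Hypothetical selection: best qualifying candidate, else the first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(problem) for _ in range(n)]
    scored = [(score(problem, candidate), candidate) for candidate in candidates]
    qualifying = [pair for pair in scored if pair[0] >= threshold]
    if not qualifying:
        return candidates[0]  # documented fallback: the first candidate
    return max(qualifying, key=lambda pair: pair[0])[1]


# Toy generator and scorer (assumptions for illustration only).
quality = {"guess": 0.2, "partial proof": 0.75, "full proof": 0.95}
drafts = iter(quality)  # yields the three keys in insertion order
chosen = pick_trace(
    "Prove the claim.",
    lambda _problem: next(drafts),
    lambda _problem, trace: quality[trace],
    n=3,
)
```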

improve_trace

def improve_trace(
    self,
    problem: str,
    trace: str,
    feedback: str,
    solution: Optional[str] = None
):

Generate improved reasoning trace based on feedback.

Parameters:

  • problem (str): The original problem text.
  • trace (str): The current reasoning trace.
  • feedback (str): Feedback for improving the trace.
  • solution (str, optional): The solution to the problem, if provided. (default: None)

Returns:

str: Improved reasoning trace.

validate_problem_format

def validate_problem_format(self, problem: Dict):

Validate that a problem dictionary has the required format.

Parameters:

  • problem (Dict): Problem dictionary to validate.

_check_boxed_answers

def _check_boxed_answers(self, solution: str, trace: str):

Check if the answer in the trace matches the solution using the configured patterns.

Parameters:

  • solution (str): The problem solution string.
  • trace (str): The reasoning trace string.

Returns:

bool: True if answers match, False otherwise.
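
A comparison of this kind could look like the sketch below. `answers_match` is hypothetical; comparing the last match on each side and stripping whitespace are assumptions, while the fallback of trace_pattern to solution_pattern follows the constructor documentation.

```python
import re
from typing import Optional


def answers_match(
    solution: str,
    trace: str,
    solution_pattern: str = r'\\boxed{(.*?)}',
    trace_pattern: Optional[str] = None,
) -> bool:
    """Hypothetical check: compare the final extracted answer on each side."""
    if trace_pattern is None:
        trace_pattern = solution_pattern  # documented fallback
    expected = re.findall(solution_pattern, solution)
    found = re.findall(trace_pattern, trace)
    if not expected or not found:
        return False  # nothing to compare
    return expected[-1].strip() == found[-1].strip()


same = answers_match(r"\boxed{12}", r"3 * 4 = 12, so \boxed{12}")
different = answers_match(r"\boxed{12}", r"3 * 4 = 13, so \boxed{13}")
custom = answers_match(
    r"\boxed{12}", "Answer: 12", trace_pattern=r"Answer:\s*(\S+)"
)
```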

process_problem

def process_problem(self, problem: Dict, rationalization: bool = False):

Process a single problem through the self-improving CoT pipeline.

Parameters:

  • problem (Dict): Problem dictionary containing the problem text.
  • rationalization (bool, optional): Whether to use rationalization. (default: False)

Returns:

ProblemResult: Results with final trace and history.

generate

def generate(self, rationalization: bool = False):

Execute the self-improving CoT pipeline on all problems.

Processes the problems and returns the results. If output_path is specified, the results are also saved to that file.

Parameters:

  • rationalization (bool, optional): Whether to use rationalization. (default: False)

Returns:

List[Dict[str, Any]]: List of processed results.