Camel.datagen.self improving cot - CAMEL-AI Documentation

SelfImprovingCoTPipeline

class SelfImprovingCoTPipeline:

Pipeline for generating self-taught reasoning traces using the self-improving methodology. This implements the STaR paper’s approach of:

Initial reasoning trace generation
Self-evaluation
Feedback-based improvement
Iterative refinement

init

def __init__(
    self,
    reason_agent: ChatAgent,
    problems: List[Dict],
    max_iterations: int = 3,
    score_threshold: Union[float, Dict[str, float]] = 0.7,
    rejection_sampling_n: Optional[int] = None,
    evaluate_agent: Optional[ChatAgent] = None,
    reward_model: Optional[BaseRewardModel] = None,
    output_path: Optional[str] = None,
    few_shot_examples: Optional[str] = None,
    batch_size: Optional[int] = None,
    max_workers: Optional[int] = None,
    solution_pattern: str = '\\\\boxed{(.*?)}',
    trace_pattern: Optional[str] = None
):

Initialize the self-improving cot pipeline. Parameters:

reason_agent (ChatAgent): The chat agent used for generating and improving reasoning traces.
problems (List[Dict]): List of problem dictionaries to process.
max_iterations (int, optional): Maximum number of improvement iterations. If set to 0, the pipeline will generate an initial trace without any improvement iterations. (default: :obj:3)
score_threshold (Union[float, Dict[str, float]], optional): Quality threshold. Can be either a single float value applied to average score, or a dictionary mapping score dimensions to their thresholds. For example: {"correctness": 0.8, "coherence": 0.7}. If using reward model and threshold for a dimension is not specified, will use the default value 0.7. (default: :obj:0.7)
rejection_sampling_n (int, optional): Specifies the number of samples to be drawn using the rejection sampling method, where samples are accepted or rejected based on a predefined condition to achieve a desired distribution. (default: :obj: None)
evaluate_agent (Optional[ChatAgent]): The chat agent used for evaluating reasoning traces. (default: :obj:None)
reward_model (BaseRewardModel, optional): Model used to evaluate reasoning traces. If None, uses Agent self-evaluation. (default: :obj:None)
output_path (str, optional): Output path for saving traces. If None, results will only be returned without saving to file. (default: :obj:None)
few_shot_examples (str, optional): Examples to use for few-shot generation. (default: :obj:None)
batch_size (int, optional): Batch size for parallel processing. (default: :obj:None)
max_workers (int, optional): Maximum number of worker threads. (default: :obj:None)
solution_pattern (str, optional): Regular expression pattern with one capture group to extract answers from solution text. (default: :obj:r'\\boxed{(.*?)}')
trace_pattern (str, optional): Regular expression pattern with one capture group to extract answers from trace text. If None, uses the same pattern as solution_pattern. (default: :obj:None)

safe_write_json

def safe_write_json(self, file_path, data):

clean_json

def clean_json(self, data):

_check_score_threshold

def _check_score_threshold(self, scores: Dict[str, float]):

Check if scores meet the threshold requirements. Parameters:

scores (Dict[str, float]): Dictionary of scores for different dimensions.

Returns: bool: True if scores meet threshold requirements, False otherwise.

_generate_feedback

def _generate_feedback(self, scores: Dict[str, float]):

Generate feedback based on which dimensions need improvement. Parameters:

scores (Dict[str, float]): Dictionary of scores for different dimensions.

Returns: str: Feedback message indicating which dimensions need improvement.

generate_reasoning_trace

def generate_reasoning_trace(self, problem: str):

Generate initial reasoning trace for a given problem. Parameters:

problem (str): The problem text to generate reasoning for.

Returns: str: Generated reasoning trace.

evaluate_trace

def evaluate_trace(
    self,
    problem: str,
    trace: str,
    solution: Optional[str] = None
):

Evaluate the quality of a reasoning trace. Parameters:

problem (str): The original problem text to evaluate against.
trace (str): The reasoning trace to evaluate.
solution (Optional[str]): The solution to the problem, if provided. (default: :obj:None)

Returns: Dict[str, Any]: Evaluation results containing:

scores: Dict of evaluation dimensions and their scores
feedback: Detailed feedback for improvement

For Agent self-evaluation, the scores will include:

correctness: Score for logical correctness
clarity: Score for clarity of explanation
completeness: Score for completeness of reasoning

For reward model evaluation, the scores will depend on the model’s evaluation dimensions.

generate_reasoning_trace_rejection

def generate_reasoning_trace_rejection(self, problem: str):

Generate multiple candidate reasoning traces for a problem and select the best one based on evaluation. Parameters:

problem (str): The problem text for generating a reasoning trace.

Returns: str: The best candidate trace that meets quality criteria, or the first candidate if none qualify.

improve_trace

def improve_trace(
    self,
    problem: str,
    trace: str,
    feedback: str,
    solution: Optional[str] = None
):

Generate improved reasoning trace based on feedback. Parameters:

problem (str): The original problem text.
trace (str): The current reasoning trace.
feedback (str): Feedback for improving the trace.
solution (Optional[str]): The solution to the problem, if provided. (default: :obj:None)

Returns: str: Improved reasoning trace.

validate_problem_format

def validate_problem_format(self, problem: Dict):

Validate that a problem dictionary has the required format. Parameters:

problem (Dict): Problem dictionary to validate.

_check_boxed_answers

def _check_boxed_answers(self, solution: str, trace: str):

Check if the answer in the trace matches the solution using the configured patterns. Parameters:

solution (str): The problem solution string.
trace (str): The reasoning trace string.

Returns: bool: True if answers match, False otherwise

process_problem

def process_problem(self, problem: Dict, rationalization: bool = False):

Process a single problem through the self-improving cot pipeline. Parameters:

problem (Dict): Problem dictionary containing the problem text.
rationalization (bool, optional): Whether to use rationalization. (default: :obj:False)

Returns: ProblemResult: Results with final trace and history.

generate

def generate(self, rationalization: bool = False):

Execute the self-improving cot pipeline on all problems. Process problems and return results. If output_path is specified, also save results to file. Parameters:

rationalization (bool, optional): Whether to use rationalization. (default: :obj:False)

Returns: List[Dict[str, Any]]: List of processed results

Overview

Agents

Configs

Data Generation

Datasets

Embeddings

Models

Interpreters

Memory

Messages

Prompts

Responses

Retrievers

Societies

Storage

Tasks

Terminators

Toolkits

Types

Verifiers

Bots

Utilities

Environments

Extractors

Personas

Benchmarks

Data Collectors

Datahubs

Loaders

Runtimes

Schemas

​SelfImprovingCoTPipeline

​init

​safe_write_json

​clean_json

​_check_score_threshold

​_generate_feedback

​generate_reasoning_trace

​evaluate_trace

​generate_reasoning_trace_rejection

​improve_trace

​validate_problem_format

​_check_boxed_answers

​process_problem

​generate