- `reason_agent`: This agent is responsible for generating or improving reasoning traces.
- `evaluate_agent`: An optional agent that evaluates the quality of the reasoning trace. It can be replaced by a reward model if needed.
- `reward_model`: An optional model that provides numerical feedback on the trace, evaluating dimensions such as correctness, coherence, complexity, and verbosity.

The `reason_agent` plays a central role here, creating a coherent and logical explanation of how to solve a given problem. The agent breaks the problem down into smaller steps, illustrating the thought process at each stage. We also support the use of non-reasoning LLMs to generate traces through prompt engineering. Generation can also be guided by few-shot examples, which provide context and help the agent understand the desired reasoning style. Here's how this is accomplished:
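As a minimal sketch of the few-shot idea (the template, example data, and helper name below are illustrative, not the pipeline's actual API), examples of the desired reasoning style can be prepended to the generation prompt:

```python
# Illustrative sketch: build a reasoning prompt guided by few-shot examples.
# FEW_SHOT_EXAMPLES and build_reasoning_prompt are hypothetical names.

FEW_SHOT_EXAMPLES = [
    {
        "problem": "What is 15% of 80?",
        "trace": "Step 1: Convert 15% to 0.15. Step 2: Multiply 0.15 * 80 = 12.",
    },
]

def build_reasoning_prompt(problem: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Prepend few-shot examples so the model imitates the desired style."""
    parts = ["Solve the problem step by step.\n"]
    for ex in examples:
        parts.append(f"Problem: {ex['problem']}\nReasoning: {ex['trace']}\n")
    parts.append(f"Problem: {problem}\nReasoning:")
    return "\n".join(parts)
```

The same prompt-engineering approach is what lets non-reasoning LLMs produce step-by-step traces: the examples define the format the model should follow.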
When generating a trace with the `reason_agent`, we can optionally provide the ground truth to guide the reasoning process. The resulting trace is then assessed by either an `evaluate_agent` or a `reward_model`. If an `evaluate_agent` is available, it examines the reasoning trace and produces feedback on its quality. The `reason_agent` is then used again to generate an improved version of the reasoning trace. This new trace incorporates the feedback, refining the earlier steps and enhancing the overall reasoning.
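The generate–evaluate–improve cycle can be sketched in plain Python. The function and callable names below are stand-ins for the `reason_agent` and `evaluate_agent`, not the pipeline's actual classes:

```python
def self_improve(problem, reason_fn, evaluate_fn, max_iterations=3):
    """Iteratively refine a reasoning trace until the evaluator approves.

    reason_fn(problem, feedback) -> trace  (stand-in for reason_agent)
    evaluate_fn(trace) -> (ok, feedback)   (stand-in for evaluate_agent)
    """
    feedback = None
    trace = reason_fn(problem, feedback)  # initial trace, no feedback yet
    for _ in range(max_iterations):
        ok, feedback = evaluate_fn(trace)
        if ok:
            break
        trace = reason_fn(problem, feedback)  # regenerate with feedback
    return trace
```

The `max_iterations` cap is an assumption for the sketch; it bounds how many improvement rounds are attempted before the current trace is accepted as-is.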
The `SelfImprovingCoTPipeline` also supports batch processing. It uses a `BatchProcessor` to:

- Dynamically adjust the batch size (`batch_size`) based on success/failure rates and system resource usage (CPU/memory).
- Retry failed operations automatically via the `retry_on_error` decorator.
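The two mechanisms can be sketched as follows. The thresholds, bounds, and exact behavior here are assumptions for illustration, not the `BatchProcessor`'s actual implementation:

```python
import functools
import time

def retry_on_error(max_retries=3, delay=0.0):
    """Retry the wrapped function on any exception (sketch of the idea)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of retries: propagate the error
                    time.sleep(delay)
        return wrapper
    return decorator

def adjust_batch_size(batch_size, success_rate, memory_used_frac,
                      low=1, high=64):
    """Grow the batch when things go well; shrink on failures or memory pressure.

    The 0.9/0.5/0.8 thresholds are illustrative, not the library's values.
    """
    if success_rate > 0.9 and memory_used_frac < 0.8:
        batch_size *= 2
    elif success_rate < 0.5 or memory_used_frac > 0.9:
        batch_size //= 2
    return max(low, min(high, batch_size))
```

A real processor would measure `memory_used_frac` from the system (e.g. via `psutil`) rather than receiving it as an argument.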
Configuring multiple models in a `ChatAgent` is useful for handling errors and ensuring smooth operation. This setup allows the system to switch between models as needed, maintaining reasoning continuity. To add models to a `ChatAgent`, you can create model instances and include them in the agent's model list:
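The fallback behavior itself can be illustrated in plain Python. This `FallbackModels` class is a hypothetical sketch of the idea, not `ChatAgent`'s actual API; the agent manages its model list internally:

```python
class FallbackModels:
    """Try each model in order; fall back to the next one on failure.

    An illustrative sketch: each "model" is a callable prompt -> response.
    """
    def __init__(self, models):
        self.models = list(models)

    def run(self, prompt):
        last_error = None
        for model in self.models:
            try:
                return model(prompt)
            except Exception as err:
                last_error = err  # remember the failure, try the next model
        raise RuntimeError("all models failed") from last_error
```

Because each call falls through to the next model only on error, a transient outage of the primary model does not interrupt the pipeline.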
Results are saved incrementally to the output file (`output_path`) and updated in place, rather than written all at once at the very end. This ensures data integrity if the process is interrupted partway through. Atomic file writing (writing to a `.tmp` file, then replacing the target) prevents partial writes from corrupting the output file.
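The write-then-replace pattern can be sketched like this (the function name is hypothetical; only the `.tmp`-then-replace technique comes from the text above):

```python
import json
import os

def save_results_atomically(results, output_path):
    """Write results to a temporary file, then atomically replace the target.

    If the process is interrupted mid-write, the previous output file
    is left intact instead of being replaced by a half-written one.
    """
    tmp_path = output_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(results, f, indent=2)
        f.flush()
        os.fsync(f.fileno())  # ensure the bytes reach the disk
    os.replace(tmp_path, output_path)  # atomic rename over the target
```

Calling this after every batch gives the incremental, interruption-safe saving described above; `os.replace` is atomic when source and destination are on the same filesystem.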