CAMEL’s data generation modules for high-quality, instruction-tuned, and reasoning-rich datasets.
Key Features
Core Components
Solution Generation Process
Direct Solution Attempt
MCTS-Based Exploration
Error Detection & Correction
Solution Verification
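The four steps above can be sketched as a staged loop. This is a minimal illustration, not CAMEL's implementation: the function `solve_with_fallback` and the `generate` and `verify` callables are hypothetical names, and a plain retry loop stands in for the MCTS-based exploration, which would search a tree of partial solutions instead.

```python
from typing import Callable, Optional


def solve_with_fallback(
    question: str,
    golden_answer: str,
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str, str], bool],
    search_limit: int = 100,
) -> Optional[str]:
    """Try a direct answer, then retry with feedback up to search_limit times."""
    # Step 1: direct solution attempt.
    candidate = generate(question, None)
    if verify(candidate, golden_answer):
        return candidate
    # Steps 2-3: simplified stand-in for exploration and error correction.
    # Each round hands the previous failed attempt back to the generator.
    for _ in range(search_limit):
        candidate = generate(question, candidate)
        # Step 4: verification against the golden answer.
        if verify(candidate, golden_answer):
            return candidate
    return None


def toy_generate(question: str, previous: Optional[str]) -> str:
    # Hypothetical stand-in for a model call: guess 0, then previous + 1.
    return "0" if previous is None else str(int(previous) + 1)


def toy_verify(candidate: str, golden: str) -> bool:
    return candidate.strip() == golden.strip()


result = solve_with_fallback("What is 2 + 2?", "4", toy_generate, toy_verify, search_limit=10)
print(result)
```

Bounding the loop by `search_limit` keeps the cost predictable when no candidate ever passes verification; in that case the sketch returns `None` rather than an unverified answer.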
Configuration Options
search_limit: Maximum number of search iterations (default: 100)
generator_agent: Specialized agent for answer generation
verifier_agent: Specialized agent for answer verification
golden_answers: Pre-defined correct answers for validation
Output Format
Key Features
Core Components
Pipeline Stages
Seed Loading
Instruction Generation
Task Classification
Instance Generation
Data Output
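The seed loading and instruction generation stages can be illustrated with a short sketch. It assumes each JSONL line carries an `instruction` field and that each generation prompt mixes 6 human-written with 2 machine-generated tasks; the helper names `load_seed_tasks` and `sample_prompt_tasks` are hypothetical and are not part of CAMEL's API.

```python
import json
import random
from typing import List, Optional, Tuple


def load_seed_tasks(jsonl_text: str) -> List[str]:
    """Parse JSONL text; the 'instruction' field name is an assumption."""
    return [
        json.loads(line)["instruction"]
        for line in jsonl_text.splitlines()
        if line.strip()
    ]


def sample_prompt_tasks(
    human: List[str],
    machine: List[str],
    ratio: Tuple[int, int] = (6, 2),
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick human and machine tasks in the given ratio for one prompt."""
    rng = rng or random.Random()
    n_human, n_machine = ratio
    # Clamp to the pool size so small pools do not raise ValueError.
    picked = rng.sample(human, min(n_human, len(human)))
    picked += rng.sample(machine, min(n_machine, len(machine)))
    return picked


# Toy seed data standing in for a JSONL seed file.
seed_text = "\n".join(
    json.dumps({"instruction": f"Human task {i}"}) for i in range(8)
)
human_tasks = load_seed_tasks(seed_text)
machine_tasks = ["Machine task A", "Machine task B", "Machine task C"]
prompt_tasks = sample_prompt_tasks(human_tasks, machine_tasks, rng=random.Random(0))
print(len(prompt_tasks))
```

Early in a run the machine pool is empty, so clamping the sample size lets the first prompts be built from human tasks alone.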
Pipeline Parameters
agent: ChatAgent instance for generating instructions
seed: Path to human-written seed tasks in JSONL format
num_machine_instructions: Number of machine-generated instructions (default: 5)
data_output_path: Path for saving generated data (default: ./data_output.json)
human_to_machine_ratio: Ratio of human to machine tasks (default: (6, 2))
instruction_filter: Custom InstructionFilter instance (optional)
filter_config: Configuration dictionary for default filters (optional)
Filter Configuration
Input/Output Format
Core Components
Key Features
ProcessorConfig Parameters
seed: Random seed for reproducibility
min_length: Minimum text length for processing
max_length: Maximum text length for processing
complexity_threshold: Minimum complexity score (0.0–1.0)
dataset_size: Target size for the final dataset
use_ai_model: Toggle between AI model and rule-based generation
hop_generating_agent: Custom MultiHopGeneratorAgent (optional)
Pipeline Stages
Text Preprocessing
Information Extraction
QA Generation
Dataset Curation
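As a rough illustration of how the length, complexity, size, and seed parameters might interact during dataset curation, consider the sketch below. The `CurationSettings` class, the `curate` function, the record keys `text` and `complexity`, and all default values are hypothetical; they are not the library's ProcessorConfig or its defaults.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CurationSettings:
    # Illustrative values only, not library defaults.
    seed: int = 42
    min_length: int = 50
    max_length: int = 512
    complexity_threshold: float = 0.5
    dataset_size: int = 1000


def curate(pairs: List[Dict], settings: CurationSettings) -> List[Dict]:
    """Filter by length and complexity, then sample down to the target size."""
    kept = [
        p for p in pairs
        if settings.min_length <= len(p["text"]) <= settings.max_length
        and p["complexity"] >= settings.complexity_threshold
    ]
    # A seeded shuffle makes the final selection reproducible.
    rng = random.Random(settings.seed)
    rng.shuffle(kept)
    return kept[: settings.dataset_size]


pairs = [
    {"text": "short", "complexity": 0.9},                              # too short
    {"text": "A question of reasonable length", "complexity": 0.8},    # kept
    {"text": "Another adequately long question", "complexity": 0.2},   # too simple
    {"text": "A third adequately long question", "complexity": 0.6},   # kept
    {"text": "x" * 600, "complexity": 0.9},                            # too long
]
settings = CurationSettings(seed=0, min_length=10, max_length=100, dataset_size=1)
curated = curate(pairs, settings)
print(len(curated))
```

Filtering before sampling means `dataset_size` acts as an upper bound: if too few records pass the thresholds, the result is simply smaller than the target.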
Key Components
Architecture Stages
Initial Reasoning Trace Generation
Self-Evaluation
Feedback-Based Improvement
Iterative Refinement
Input/Output Format
Configuration Options
max_iterations: Maximum number of improvement iterations (default: 3)
score_threshold: Minimum quality thresholds for evaluation dimensions (default: 0.7)
few_shot_examples: (Optional) Examples for few-shot learning
output_path: (Optional) Path for saving generated results
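The way max_iterations and score_threshold could bound the generate, evaluate, and improve cycle is sketched below. This is an assumption-laden illustration: `refine_trace` and its callables are hypothetical names, the score dimensions `correctness` and `clarity` are examples, and the toy functions replace real model calls.

```python
from typing import Callable, Dict, Tuple


def refine_trace(
    problem: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Dict[str, float]],
    improve: Callable[[str, str, Dict[str, float]], str],
    max_iterations: int = 3,
    score_threshold: float = 0.7,
) -> Tuple[str, Dict[str, float]]:
    """Improve a reasoning trace until every score meets the threshold."""
    trace = generate(problem)
    scores = evaluate(problem, trace)
    for _ in range(max_iterations):
        # Stop early once every evaluation dimension is good enough.
        if all(s >= score_threshold for s in scores.values()):
            break
        trace = improve(problem, trace, scores)
        scores = evaluate(problem, trace)
    return trace, scores


# Toy stand-ins: each improvement appends "+" and raises the score by 0.25.
def toy_generate(problem: str) -> str:
    return "draft"


def toy_evaluate(problem: str, trace: str) -> Dict[str, float]:
    return {
        "correctness": min(1.0, 0.25 * (1 + trace.count("+"))),
        "clarity": 1.0,
    }


def toy_improve(problem: str, trace: str, scores: Dict[str, float]) -> str:
    return trace + "+"


final_trace, final_scores = refine_trace("2 + 2 = ?", toy_generate, toy_evaluate, toy_improve)
print(final_trace, final_scores)
```

The loop returns the latest trace even when the threshold is never reached, so a caller that needs a guarantee should check the returned scores itself.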