BaseScorer

class BaseScorer(ABC):

score

def score(self, reference_prompt: str, candidate_prompt: str):

Compares a candidate prompt against a reference prompt and returns a tuple of scores, one per evaluation dimension, for example (diversity, difficulty, feasibility). Higher scores are better.
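As a rough illustration of the interface described above, the sketch below defines a stand-in `BaseScorer` (so the snippet runs without the library installed) and a hypothetical `WordOverlapScorer` subclass. The subclass name and both heuristics are assumptions made for this example only; they are not part of the documented API.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class BaseScorer(ABC):
    """Stand-in mirroring the documented interface (not the library class)."""

    @abstractmethod
    def score(self, reference_prompt: str, candidate_prompt: str) -> Tuple[float, ...]:
        """Return a tuple of scores; higher is better."""


class WordOverlapScorer(BaseScorer):
    """Hypothetical toy scorer: rewards new vocabulary and extra length."""

    def score(self, reference_prompt: str, candidate_prompt: str) -> Tuple[float, float]:
        ref_words = set(reference_prompt.lower().split())
        cand_words = set(candidate_prompt.lower().split())
        # Diversity: fraction of distinct candidate words absent from the reference.
        diversity = len(cand_words - ref_words) / len(cand_words) if cand_words else 0.0
        # Length ratio: a crude stand-in for difficulty.
        ref_len = len(reference_prompt.split())
        length_ratio = len(candidate_prompt.split()) / ref_len if ref_len else 0.0
        return (diversity, length_ratio)
```

Because `score` is abstract, instantiating `BaseScorer` directly raises `TypeError`; only concrete subclasses that implement `score` can be created.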

MathScorer

class MathScorer(BaseScorer):

__init__

def __init__(self, agent: Optional[ChatAgent] = None):

score

def score(self, reference_problem: str, new_problem: str):

Evaluates the new math problem relative to the reference math problem.

Parameters:

  • reference_problem (str): The reference math problem.
  • new_problem (str): The new or evolved math problem.

Returns:

Dict[str, int]: A dictionary with scores for diversity, difficulty, validity, and solvability.
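One plausible way to use the returned dictionary is to filter evolved problems against minimum scores. The sketch below assumes the keys are spelled `diversity`, `difficulty`, `validity`, and `solvability`, matching the names listed above; the helper `passes_thresholds`, the thresholds, and the example values are all hypothetical.

```python
from typing import Dict


def passes_thresholds(scores: Dict[str, int], minimums: Dict[str, int]) -> bool:
    """Return True when every required dimension meets its minimum."""
    # A dimension missing from `scores` counts as a failure.
    return all(
        name in scores and scores[name] >= floor
        for name, floor in minimums.items()
    )


# Example values only; a real call would look like
# scorer.score(reference_problem, new_problem).
example_scores = {"diversity": 4, "difficulty": 3, "validity": 1, "solvability": 1}
minimums = {"validity": 1, "solvability": 1, "difficulty": 2}
keep = passes_thresholds(example_scores, minimums)
```

Treating a missing key as a failure keeps the filter conservative if a scorer omits a dimension.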

GeneralScorer

class GeneralScorer(BaseScorer):

__init__

def __init__(self, agent: Optional[ChatAgent] = None):

score

def score(self, reference_problem: str, new_problem: str):

Evaluates the new problem against the reference problem using structured scoring.

Parameters:

  • reference_problem (str): The original problem.
  • new_problem (str): The evolved or new problem.

Returns:

Dict[str, int]: A dictionary with scores for diversity, complexity, and validity.
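The scores can also be used to rank several evolved problems against one reference. The following is a minimal sketch, assuming the dictionary keys are `diversity`, `complexity`, and `validity`; `pick_best` and `fake_score` are hypothetical helpers, and `fake_score` merely stands in for a real `score` call so the snippet runs without an agent.

```python
from typing import Callable, Dict, List, Tuple

ScoreFn = Callable[[str, str], Dict[str, int]]


def pick_best(reference: str, candidates: List[str], score_fn: ScoreFn) -> Tuple[str, Dict[str, int]]:
    """Return the valid candidate with the highest diversity + complexity."""
    scored = [(c, score_fn(reference, c)) for c in candidates]
    # Discard candidates the scorer marks as invalid.
    valid = [(c, s) for c, s in scored if s.get("validity", 0) > 0]
    if not valid:
        raise ValueError("no valid candidate")
    return max(valid, key=lambda item: item[1].get("diversity", 0) + item[1].get("complexity", 0))


def fake_score(reference_problem: str, new_problem: str) -> Dict[str, int]:
    """Hypothetical deterministic stand-in for a scorer's score method."""
    ref_words = set(reference_problem.lower().split())
    new_words = set(new_problem.lower().split())
    return {
        "diversity": len(new_words - ref_words),
        "complexity": len(new_problem.split()),
        "validity": 1 if new_problem.endswith("?") else 0,
    }


candidates = [
    "What is 2 + 2 * 3?",
    "Compute 2 + 2",
    "What is the sum of the first ten primes?",
]
best, best_scores = pick_best("What is 2 + 2?", candidates, fake_score)
```

Any callable with the same two-string signature can be passed as `score_fn`, including the bound `score` method of a scorer instance.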