camel.environments package#

Submodules#

camel.environments.base module#

Module contents#

class camel.environments.Action(*, index: int | None = None, llm_response: str, metadata: Dict[str, Any] = None, timestamp: datetime = None)[source]#

Bases: BaseModel

Represents an action taken in an environment.

This class defines the input context, the LLM-generated output, and metadata required for verification and tracking within an RL framework.

llm_response#

The response generated by the LLM.

Type:

str

metadata#

Additional metadata such as model parameters, prompt details, or response confidence scores.

Type:

Dict[str, Any]

timestamp#

The timestamp when the action was generated (UTC).

Type:

datetime

index: int | None#
llm_response: str#
metadata: Dict[str, Any]#
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}#

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'index': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='...'), 'llm_response': FieldInfo(annotation=str, required=True, description='Generated response from the LLM'), 'metadata': FieldInfo(annotation=Dict[str, Any], required=False, default_factory=dict, description='Additional metadata about the generation'), 'timestamp': FieldInfo(annotation=datetime, required=False, default_factory=<lambda>, description='When the response was generated (UTC)')}#

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

timestamp: datetime#
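Constructing an Action only requires llm_response; the other fields fall back to their documented defaults. The following is an illustrative stand-in that mirrors the fields above, not the library's own definition:

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field

# Illustrative stand-in mirroring the documented fields of
# camel.environments.Action; not the library's own definition.
class Action(BaseModel):
    index: Optional[int] = None
    llm_response: str
    metadata: Dict[str, Any] = Field(default_factory=dict)
    timestamp: datetime = Field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# Only llm_response is required; metadata defaults to an empty dict
# and timestamp to the current UTC time.
action = Action(llm_response="The answer is 42.", index=0)
```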
class camel.environments.Environment(*args, **kwargs)[source]#

Bases: Protocol

async close() None[source]#

Perform a full cleanup of all environment resources.

async reset() Observation[source]#

Reset the environment to an initial state.

Returns:

Initial observation for the episode

async step(action: Action) StepResult[source]#

Take a step in the environment.

Parameters:
action – Action containing everything that is needed to progress in the environment.

Returns:

StepResult containing next observation, reward, done flag, and info
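Because Environment is a Protocol, any class exposing these async methods can be driven by the same episode loop. The sketch below uses a hypothetical toy environment with observations and results simplified to plain Python values instead of the camel model classes:

```python
import asyncio

# Hypothetical toy environment satisfying the reset/step/close protocol;
# observations and step results are simplified to plain Python values.
class CountdownEnv:
    def __init__(self, steps: int) -> None:
        self._remaining = steps

    async def reset(self):
        return {"question": f"{self._remaining} steps to go"}

    async def step(self, action: str):
        self._remaining -= 1
        done = self._remaining <= 0
        obs = {"question": f"{self._remaining} steps to go"}
        return obs, 1.0, done, {}  # (observation, reward, done, info)

    async def close(self) -> None:
        pass

async def run_episode(env) -> float:
    # Generic loop: reset, step until done, then clean up.
    total = 0.0
    await env.reset()
    done = False
    while not done:
        _obs, reward, done, _info = await env.step("noop")
        total += reward
    await env.close()
    return total

total_reward = asyncio.run(run_episode(CountdownEnv(steps=3)))
```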

class camel.environments.MultiStepEnv(extractor: BaseExtractor, max_steps: int | None = None, **kwargs)[source]#

Bases: ABC

A multi-step environment for reinforcement learning with LLMs.

async close() None[source]#

Clean up and close all resources used by the environment. This method shuts down the verifier, calls the internal close function implemented by the concrete MultiStepEnv subclass, and ensures that the environment is properly closed.

Raises:

Exception – If an error occurs while closing the environment.

abstract async compute_reward() Tuple[float, Dict[str, float]][source]#
property current_step: int#

Get the current step number.

Returns:

The number of the step we are currently in.

Return type:

int

is_done() bool[source]#

Check if the episode should terminate.

This function terminates the episode if the maximum number of steps is reached or if any other terminating criterion is met.

Returns:

True if the episode should terminate, otherwise False.

Return type:

bool

property metadata: Dict[str, Any]#

Retrieve the metadata of the environment.

This provides additional parameters and configuration details.

Returns:

A copy of the environment’s metadata.

Return type:

Dict[str, Any]

async reset() Observation[source]#

Reset the environment to an initial state.

Returns:

The initial observation for the episode.

Return type:

Observation

Raises:

RuntimeError – If we fail to get the initial observation.

async setup() None[source]#

Set up the environment by initializing the verifier and extractor.

This method ensures that the environment is ready for interaction. It sets up necessary components, including the verifier and extractor.

Raises:

Exception – If setup fails due to an internal error.

async step(action: Action) StepResult[source]#

Take a step in the environment using the given action.

This method updates the environment state based on the LLM’s response, computes rewards, checks whether the episode is done, and returns either the next observation or the final one accordingly.

Parameters:

action (Action) – The action containing the LLM response.

Returns:

StepResult containing next observation, total reward, a dictionary of rewards, done flag, and info.

Raises:

RuntimeError – If the environment is not set up, the episode has ended, or there is no valid current observation.
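The interplay of current_step, is_done, and compute_reward can be sketched with a hypothetical minimal reimplementation of the step loop; the real class additionally wires in an extractor and a verifier, which are omitted here:

```python
import asyncio
from typing import Dict, Tuple

# Hypothetical minimal reimplementation of the MultiStepEnv step loop,
# showing how current_step, is_done, and compute_reward interact.
class TinyMultiStepEnv:
    def __init__(self, max_steps: int) -> None:
        self._max_steps = max_steps
        self._current_step = 0

    @property
    def current_step(self) -> int:
        return self._current_step

    def is_done(self) -> bool:
        # Terminate once the maximum number of steps is reached.
        return self._current_step >= self._max_steps

    async def compute_reward(self) -> Tuple[float, Dict[str, float]]:
        # Subclasses implement this; here a fixed per-step bonus.
        rewards = {"step_bonus": 1.0}
        return sum(rewards.values()), rewards

    async def step(self, llm_response: str):
        self._current_step += 1
        total, rewards = await self.compute_reward()
        return total, rewards, self.is_done()

async def main():
    env = TinyMultiStepEnv(max_steps=2)
    return [await env.step("response") for _ in range(2)]

results = asyncio.run(main())
```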

class camel.environments.Observation(*, question: str, context: Dict[str, Any] = None, metadata: Dict[str, Any] | None = None)[source]#

Bases: BaseModel

Environment observation.

question#

The question posed to the LLM.

Type:

str

context#

Additional context for the question.

Type:

Dict[str, Any]

metadata#

Optional metadata about the observation.

Type:

Dict[str, Any] | None

context: Dict[str, Any]#
metadata: Dict[str, Any] | None#
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}#

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'context': FieldInfo(annotation=Dict[str, Any], required=False, default_factory=dict, description='Additional context for the question'), 'metadata': FieldInfo(annotation=Union[Dict[str, Any], NoneType], required=False, default=None, description='Optional metadata about the observation'), 'question': FieldInfo(annotation=str, required=True, description='The question posed to the LLM')}#

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

question: str#
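As with Action, only the question field is required when constructing an Observation. The stand-in below mirrors the documented fields and is not the library's own definition:

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field

# Illustrative stand-in mirroring the documented fields of
# camel.environments.Observation; not the library's own definition.
class Observation(BaseModel):
    question: str
    context: Dict[str, Any] = Field(default_factory=dict)
    metadata: Optional[Dict[str, Any]] = None

obs = Observation(question="What is 2 + 2?", context={"subject": "math"})
```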
class camel.environments.SingleStepEnv(dataset: StaticDataset | BaseGenerator, verifier: BaseVerifier, **kwargs)[source]#

Bases: object

A lightweight environment for single-step RL with LLMs as policy.

This environment models a single interaction between an LLM-based agent and a problem drawn from a dataset—such as a question-answering or math problem—where the agent produces one response and receives feedback.

Core Flow:
  • A question is sampled from a (possibly infinitely long) dataset.

  • The LLM generates a single-step response (the action).

  • The response is verified against the ground truth.

  • A reward is computed based on correctness and optional custom logic.

Key Features:
  • Batched evaluation with per-sample state tracking.

  • Async setup and teardown for verifiers and related resources.

  • Supports deterministic sampling via local RNG (optional seed).

  • Extensible reward computation via subclassing.
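The core flow above can be sketched with a hypothetical stand-in: sample a question, take one LLM response, verify it against the ground truth, and hand back a reward. Verification is reduced to an exact string match here, and ACCURACY_REWARD mirrors the class attribute documented below:

```python
import asyncio

ACCURACY_REWARD = 10  # mirrors SingleStepEnv.ACCURACY_REWARD

# Hypothetical stand-in sketching the single-step core flow; the real
# class takes a dataset and a verifier and tracks per-sample state.
class TinySingleStepEnv:
    def __init__(self, dataset):
        self._dataset = dataset  # list of (question, answer) pairs
        self._current = None

    async def setup(self) -> None:
        pass  # the real class initializes its verifier here

    async def reset(self):
        self._current = self._dataset[0]
        return {"question": self._current[0]}

    async def step(self, llm_response: str):
        # Verification reduced to exact string match for the sketch.
        correct = llm_response.strip() == self._current[1]
        reward = float(ACCURACY_REWARD) if correct else 0.0
        return reward, True  # single-step: always done after one action

async def main():
    env = TinySingleStepEnv([("What is 2 + 2?", "4")])
    await env.setup()
    obs = await env.reset()
    reward, done = await env.step("4")
    return obs, reward, done

obs, reward, done = asyncio.run(main())
```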

ACCURACY_REWARD = 10#
PLACEHOLDER_OBS = Observation(question='Episode ended. This is just a placeholder.', context={}, metadata=None)#
async close() None[source]#

Clean up and close all resources used by the environment.

This method shuts down the verifier, resets the internal state, and ensures that the environment is properly closed.

Raises:

Exception – If an error occurs while closing the environment.

property metadata: Dict[str, Any]#

Retrieve the metadata of the environment.

This provides additional parameters and configuration details.

Returns:

A copy of the environment’s metadata.

Return type:

Dict[str, Any]

async reset(batch_size: int = 1, seed: int | None = None) Observation | List[Observation][source]#

Reset the environment and start a new episode.

This method samples a new batch of data points from the dataset and returns the corresponding initial observations.

If a seed is provided, a local random number generator is initialized for deterministic sampling. The global random state is not affected.

Parameters:
  • batch_size (int) – Number of data points to sample. (default: 1)

  • seed (Optional[int]) – Seed for deterministic sampling. If None, sampling is non-deterministic. (default: None)

Returns:

Initial observation(s) for the episode.

Return type:

Observation or List[Observation]

Raises:
  • RuntimeError – If called before all previous states are processed.

  • ValueError – If batch size exceeds dataset size.

  • TypeError – If the dataset is of an unsupported type.

async setup() None[source]#

Set up the environment by initializing the verifier.

This method ensures that the environment is ready for interaction. It sets up necessary components, including the verifier.

Raises:

Exception – If setup fails due to an internal error.

async step(action: Action | List[Action]) Tuple[Observation, float, bool, Dict[str, Any]] | List[Tuple[Observation, float, bool, Dict[str, Any]]][source]#

Process actions for a subset of states and update their finished status.

Parameters:

action – Single action (for batch_size=1 or micro-batch of size 1) or list of actions (for batch_size>=2 with multiple actions). Each action must have an index for batch_size>=2, indicating which state it corresponds to.

Returns:

StepResult or list of StepResults for the processed states.

Return type:

Union[StepResult, List[StepResult]]

Raises:
  • RuntimeError – If environment isn’t set up or episode has ended.

  • ValueError – If indices are invalid, duplicate, or correspond to finished states.

class camel.environments.StepResult(*, observation: Observation, reward: float, rewards_dict: Dict[str, float] = None, done: bool, info: Dict[str, Any] = None)[source]#

Bases: BaseModel

Result of an environment step.

observation#

The next observation.

Type:

camel.environments.models.Observation

reward#

The total reward of the action.

Type:

float

rewards_dict#

Dictionary of reward scores for different aspects.

Type:

Dict[str, float]

done#

Whether the episode is complete.

Type:

bool

info#

Additional information about the step.

Type:

Dict[str, Any]

as_tuple() Tuple[Observation, float, bool, Dict[str, Any]][source]#

Returns all fields of the model as a tuple, in declaration order.

done: bool#
info: Dict[str, Any]#
model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}#

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[Dict[str, FieldInfo]] = {'done': FieldInfo(annotation=bool, required=True, description='Whether the episode is complete'), 'info': FieldInfo(annotation=Dict[str, Any], required=False, default_factory=dict, description='Additional information about the step'), 'observation': FieldInfo(annotation=Observation, required=True, description='The next observation'), 'reward': FieldInfo(annotation=float, required=True, description='Total reward of the action'), 'rewards_dict': FieldInfo(annotation=Dict[str, float], required=False, default_factory=dict, description='Dictionary of reward scores for different aspects')}#

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo] objects.

This replaces Model.__fields__ from Pydantic V1.

observation: Observation#
reward: float#
rewards_dict: Dict[str, float]#
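The as_tuple() signature above suggests the tuple is (observation, reward, done, info), matching the return shape of SingleStepEnv.step. The stand-in below implements that reading, with observation simplified to a string; it is an assumption for illustration, not the library's own implementation:

```python
from typing import Any, Dict, Tuple

from pydantic import BaseModel, Field

# Illustrative stand-in for camel.environments.StepResult. as_tuple() is
# read here as (observation, reward, done, info), per the documented
# signature; observation is simplified to a str for the sketch.
class StepResult(BaseModel):
    observation: str
    reward: float
    rewards_dict: Dict[str, float] = Field(default_factory=dict)
    done: bool
    info: Dict[str, Any] = Field(default_factory=dict)

    def as_tuple(self) -> Tuple[str, float, bool, Dict[str, Any]]:
        return (self.observation, self.reward, self.done, self.info)

result = StepResult(observation="next question", reward=10.0, done=True)
obs, reward, done, info = result.as_tuple()
```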