Camel.environments.tic tac toe - CAMEL-AI Documentation

MoveExtractor

class MoveExtractor(BaseExtractorStrategy):

A strategy for extracting Tic Tac Toe actions from text.
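As a rough illustration of what extracting a move from free text can look like, here is a minimal standalone sketch. The `<Action> N` response format, the pattern `ACTION_PATTERN`, and the function `extract_move` are all assumptions made for this example; this page does not document the format that `MoveExtractor` actually expects.

```python
import re
from typing import Optional

# Hypothetical response format: "<Action> N", where N is the chosen cell.
# This format is an assumption for illustration only.
ACTION_PATTERN = re.compile(r"<Action>\s*(\d+)")


def extract_move(text: str) -> Optional[str]:
    """Return the digits following the first '<Action>' tag, or None."""
    match = ACTION_PATTERN.search(text)
    return match.group(1) if match else None
```

Returning `None` when nothing matches lets the caller decide how to treat a malformed response.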

Opponent

class Opponent:

AI opponent for the Tic Tac Toe game.

This class implements different playing strategies for the AI opponent, including an optimal strategy using the minimax algorithm with alpha-beta pruning, and a random strategy.

__init__

def __init__(self, play_style: Literal['optimal', 'random'] = 'optimal'):

Initialize the opponent with a specific play style.

Parameters:

play_style (Literal["optimal", "random"]): The strategy to use, either "optimal" or "random". (default: "optimal")

select_move

def select_move(self, board: List[str]):

Select a move based on the opponent’s play style.

Parameters:

board (List[str]): The current game board as a list of strings.

Returns:

Optional[int]: The index of the selected move, or None if no move is available.
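The dispatch described above can be sketched as follows. This is a standalone approximation, not the library's code: it assumes empty cells are stored as `" "`, and `select_move_sketch` with its `optimal_fn` callback are hypothetical names introduced for the example.

```python
import random
from typing import Callable, List, Optional


def select_move_sketch(
    board: List[str],
    play_style: str,
    optimal_fn: Callable[[List[str]], Optional[int]],
) -> Optional[int]:
    """Pick a move index according to play_style, or None if the board is full."""
    # Assumption: an empty cell is represented by a single space.
    empty = [i for i, cell in enumerate(board) if cell == " "]
    if not empty:
        return None
    if play_style == "random":
        return random.choice(empty)
    # Any other style falls through to the optimal strategy.
    return optimal_fn(board)
```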

get_optimal_move

def get_optimal_move(self, board: List[str]):

Get the optimal move using the minimax algorithm.

Parameters:

board (List[str]): The current game board as a list of strings.

Returns:

Optional[int]: The index of the optimal move, or None if no move is available.

minimax

def minimax(
    self,
    board: List[str],
    is_maximizing: bool,
    depth: int = 0,
    alpha: float = -math.inf,
    beta: float = math.inf
):

Minimax algorithm with alpha-beta pruning for optimal move selection.

Recursively evaluates all possible moves to find the best one. Uses alpha-beta pruning to reduce the search space.

Parameters:

board (List[str]): The current game board as a list of strings.
is_maximizing (bool): True if maximizing player (O), False if minimizing (X).
depth (int): Current depth in the search tree. (default: 0)
alpha (float): Alpha value for pruning. (default: -math.inf)
beta (float): Beta value for pruning. (default: math.inf)

Returns:

Tuple[float, Optional[int]]: A tuple containing:

float: The score of the best move (1 for O win, -1 for X win, 0 for draw)
Optional[int]: The index of the best move, or None if the board is in a terminal state

TicTacToeEnv

class TicTacToeEnv(MultiStepEnv):

A Tic Tac Toe environment for reinforcement learning with LLMs.

This environment implements a standard Tic Tac Toe game where the LLM agent plays as ‘X’ against an AI opponent that plays as ‘O’. The opponent can use either an optimal strategy (minimax with alpha-beta pruning) or a random strategy.

__init__

def __init__(
    self,
    extractor: Optional[BaseExtractor] = None,
    max_steps: Optional[int] = None,
    play_style: Literal['optimal', 'random'] = 'optimal',
    **kwargs
):

Initialize the Tic Tac Toe environment.

Parameters:

extractor (Optional[BaseExtractor]): Extractor to process LLM responses. If None, a default extractor with MoveExtractor will be used. (default: None)
max_steps (Optional[int]): Maximum steps per episode. (default: None)
play_style (Literal["optimal", "random"]): The strategy for the opponent to use, either "optimal" or "random". (default: "optimal")
**kwargs: Additional environment parameters.

_get_initial_state

def _get_initial_state(self):

Returns:

Dict[str, Any]: A dictionary containing the initial state with an empty board, game status flags, and move history.

_get_next_observation

def _get_next_observation(self):

Returns:

Observation: An Observation object containing the game state description.

_get_terminal_observation

def _get_terminal_observation(self):

Returns:

Observation: An Observation object containing the final game state description.

evaluate_position_for_x

def evaluate_position_for_x(
    board: List[str],
    is_x_turn: bool,
    depth: int = 0,
    max_depth: int = 10
):

Evaluate the current board position from X’s perspective.

Uses minimax to determine the value of the position.

Parameters:

board (List[str]): The current game board as a list of strings.
is_x_turn (bool): True if it’s X’s turn to move, False otherwise.

Returns:

float: A float value representing the position evaluation:

1.0 if X has a winning position
0.0 if O has a winning position
0.5 for a draw
For ongoing positions, returns the expected outcome with perfect play
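The value scale above can be illustrated with a plain minimax sketch written from X's perspective. This is an approximation under stated assumptions, not the library's code: it assumes `" "` marks an empty cell and a row-major 9-cell board, ignores the `depth` and `max_depth` parameters, and uses the hypothetical names `LINES`, `game_result`, and `evaluate_for_x_sketch`.

```python
from typing import List, Optional

# Index triples forming rows, columns, and diagonals (assumed row-major layout).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def game_result(board: List[str]) -> Optional[str]:
    """Return 'X', 'O', 'draw', or None while the game is ongoing."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None if " " in board else "draw"


def evaluate_for_x_sketch(board: List[str], is_x_turn: bool) -> float:
    """Return 1.0 if X wins, 0.0 if O wins, 0.5 for a draw, under perfect play."""
    result = game_result(board)
    if result is not None:
        return {"X": 1.0, "O": 0.0, "draw": 0.5}[result]

    values = []
    for i in range(9):
        if board[i] == " ":
            board[i] = "X" if is_x_turn else "O"  # try the move
            values.append(evaluate_for_x_sketch(board, not is_x_turn))
            board[i] = " "  # undo the move
    # X picks the best value for X; O picks the worst value for X.
    return max(values) if is_x_turn else min(values)
```

Because the scale is expressed from X's side, the same position can evaluate differently depending on whose turn it is.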

_is_done

def _is_done(self):

Returns:

bool: True if the game is over, False otherwise.

available_moves

def available_moves(board: List[str]):

Get all available moves on the board.

Parameters:

board (List[str]): The current game board as a list of strings.

Returns:

List[int]: A list of indices representing empty cells on the board.
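A minimal standalone equivalent, assuming empty cells are stored as `" "` (the page does not state the marker used for an empty cell, and `available_moves_sketch` is a hypothetical name):

```python
from typing import List


def available_moves_sketch(board: List[str]) -> List[int]:
    """Return the indices of all empty cells, in ascending order."""
    # Assumption: an empty cell is represented by a single space.
    return [i for i, cell in enumerate(board) if cell == " "]
```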

check_winner

def check_winner(board: List[str]):

Check if there is a winner or a draw on the board.

Parameters:

board (List[str]): The current game board as a list of strings.

Returns:

Optional[Literal["X", "O", "draw"]]: "X" if X has won, "O" if O has won, "draw" if the game is a draw, or None if the game is still ongoing.
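The four possible return values can be reproduced with a short standalone sketch. It assumes a row-major 9-cell board with `" "` for empty cells; `WINNING_LINES` and `check_winner_sketch` are hypothetical names, and the library's own implementation may differ.

```python
from typing import List, Optional

# Index triples forming rows, columns, and diagonals (assumed row-major layout).
WINNING_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
                 (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def check_winner_sketch(board: List[str]) -> Optional[str]:
    """Return 'X' or 'O' for a win, 'draw' for a full board, else None."""
    for a, b, c in WINNING_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    # No winning line: the game is a draw only when no empty cell remains.
    return "draw" if " " not in board else None
```

Checking for a winner before checking for a full board matters, since the final move that fills the board can also complete a line.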

render_board

def render_board(self, board: List[str]):

Render the board as a string for display.

Parameters:

board (List[str]): The current game board as a list of strings.

Returns:

str: A formatted string representation of the board.
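One possible text layout is sketched below. The exact format produced by `render_board` is not shown on this page, so the separators used here, the `" "` empty-cell marker, and the name `render_board_sketch` are assumptions for illustration.

```python
from typing import List


def render_board_sketch(board: List[str]) -> str:
    """Render a 9-cell row-major board as three rows split by dashed lines."""
    rows = [" | ".join(board[start:start + 3]) for start in (0, 3, 6)]
    return "\n---------\n".join(rows)


print(render_board_sketch(["X", "O", " ", " ", "X", " ", "O", " ", "X"]))
```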
