camel.environments.tic_tac_toe
MoveExtractor
A strategy for extracting Tic Tac Toe actions from text.
Opponent
AI opponent for the Tic Tac Toe game.
This class implements different playing strategies for the AI opponent, including an optimal strategy using the minimax algorithm with alpha-beta pruning, and a random strategy.
__init__
Initialize the opponent with a specific play style.
Parameters:
- play_style (Literal["optimal", "random"]): The strategy to use, either "optimal" or "random". (default: "optimal")
select_move
Select a move based on the opponent’s play style.
Parameters:
- board (List[str]): The current game board as a list of strings.
Returns:
Optional[int]: The index of the selected move, or None if no move is available.
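As an illustration of the "random" play style, here is a minimal sketch. It assumes the board is a flat list of nine strings with " " marking an empty cell; that marker and the helper name `select_random_move` are hypothetical and not taken from the library.

```python
import random
from typing import List, Optional

EMPTY = " "  # assumed marker for an empty cell


def select_random_move(board: List[str]) -> Optional[int]:
    """Pick a uniformly random empty cell, or None if the board is full."""
    empty_cells = [i for i, cell in enumerate(board) if cell == EMPTY]
    if not empty_cells:
        return None
    return random.choice(empty_cells)
```

Returning None on a full board mirrors the documented return type `Optional[int]`.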
get_optimal_move
Get the optimal move using the minimax algorithm.
Parameters:
- board (List[str]): The current game board as a list of strings.
Returns:
Optional[int]: The index of the optimal move, or None if no move is available.
minimax
Minimax algorithm with alpha-beta pruning for optimal move selection.
Recursively evaluates all possible moves to find the best one. Uses alpha-beta pruning to reduce the search space.
Parameters:
- board (List[str]): The current game board as a list of strings.
- is_maximizing (bool): True if maximizing player (O), False if minimizing (X).
- depth (int): Current depth in the search tree. (default: 0)
- alpha (float): Alpha value for pruning. (default: -math.inf)
- beta (float): Beta value for pruning. (default: math.inf)
Returns:
Tuple[float, Optional[int]]: A tuple containing:
- float: The score of the best move (1 for O win, -1 for X win, 0 for draw)
- Optional[int]: The index of the best move, or None if terminal state
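The search described above might look roughly like the following sketch. It is not the library's implementation: it assumes a nine-element board with " " for empty cells, and the helper `winner` and the constant `WIN_LINES` are hypothetical names introduced for illustration. The signature and score convention follow the parameters and return values documented here.

```python
import math
from typing import List, Optional, Tuple

# The eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]
EMPTY = " "  # assumed marker for an empty cell


def winner(board: List[str]) -> Optional[str]:
    """Return "X", "O", "draw", or None if the game is still ongoing."""
    for a, b, c in WIN_LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if EMPTY not in board else None


def minimax(
    board: List[str],
    is_maximizing: bool,
    depth: int = 0,
    alpha: float = -math.inf,
    beta: float = math.inf,
) -> Tuple[float, Optional[int]]:
    """Return (score, move): 1 for an O win, -1 for an X win, 0 for a draw."""
    result = winner(board)
    if result is not None:
        return {"O": 1.0, "X": -1.0, "draw": 0.0}[result], None

    best_score = -math.inf if is_maximizing else math.inf
    best_move = None
    mark = "O" if is_maximizing else "X"
    for i in range(9):
        if board[i] != EMPTY:
            continue
        board[i] = mark  # try the move
        score, _ = minimax(board, not is_maximizing, depth + 1, alpha, beta)
        board[i] = EMPTY  # undo the move
        if is_maximizing and score > best_score:
            best_score, best_move = score, i
            alpha = max(alpha, best_score)
        elif not is_maximizing and score < best_score:
            best_score, best_move = score, i
            beta = min(beta, best_score)
        if beta <= alpha:
            break  # the other player already has a better option elsewhere
    return best_score, best_move
```

Calling `minimax(board, True)` asks for O's best reply. The `depth` argument is carried along but unused in this sketch; an implementation could use it to prefer faster wins.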
TicTacToeEnv
A Tic Tac Toe environment for reinforcement learning with LLMs.
This environment implements a standard Tic Tac Toe game where the LLM agent plays as ‘X’ against an AI opponent that plays as ‘O’. The opponent can use either an optimal strategy (minimax with alpha-beta pruning) or a random strategy.
__init__
Initialize the Tic Tac Toe environment.
Parameters:
- extractor (Optional[BaseExtractor]): Extractor to process LLM responses. If None, a default extractor with MoveExtractor will be used. (default: None)
- max_steps (Optional[int]): Maximum steps per episode. (default: None)
- play_style (Literal["optimal", "random"]): The strategy for the opponent to use, either "optimal" or "random". (default: "optimal")
- **kwargs: Additional environment parameters.
_get_initial_state
Returns:
Dict[str, Any]: A dictionary containing the initial state with an empty board, game status flags, and move history.
_get_next_observation
Returns:
Observation: An Observation object containing the game state description.
_get_terminal_observation
Returns:
Observation: An Observation object containing the final game state description.
evaluate_position_for_x
Evaluate the current board position from X’s perspective.
Uses minimax to determine the value of the position.
Parameters:
- board (List[str]): The current game board as a list of strings.
- is_x_turn (bool): True if it’s X’s turn to move, False otherwise.
Returns:
float: A float value representing the position evaluation:
- 1.0 if X has a winning position
- 0.0 if O has a winning position
- 0.5 for a draw
- For ongoing positions, returns the expected outcome with perfect play
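The value scale listed above (1.0, 0.5, 0.0) can be illustrated with a plain minimax over those values, where X maximizes and O minimizes. This is a sketch under assumptions, not the library's code: it uses " " for empty cells, skips pruning for brevity, and the helper `outcome` is a hypothetical name.

```python
from typing import List, Optional

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]
EMPTY = " "  # assumed marker for an empty cell


def outcome(board: List[str]) -> Optional[str]:
    """Return "X", "O", "draw", or None if the game is still ongoing."""
    for a, b, c in WIN_LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if EMPTY not in board else None


def evaluate_position_for_x(board: List[str], is_x_turn: bool) -> float:
    """Value of the position for X under perfect play by both sides."""
    result = outcome(board)
    if result == "X":
        return 1.0
    if result == "O":
        return 0.0
    if result == "draw":
        return 0.5
    mark = "X" if is_x_turn else "O"
    values = [
        evaluate_position_for_x(board[:i] + [mark] + board[i + 1:], not is_x_turn)
        for i in range(9)
        if board[i] == EMPTY
    ]
    # X picks the child with the highest value, O the lowest.
    return max(values) if is_x_turn else min(values)
```

Because both sides are assumed to play perfectly, an ongoing position always evaluates to one of the three terminal values.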
_is_done
Returns:
bool: True if the game is over, False otherwise.
available_moves
Get all available moves on the board.
Parameters:
- board (List[str]): The current game board as a list of strings.
Returns:
List[int]: A list of indices representing empty cells on the board.
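A minimal sketch of this helper, assuming empty cells are stored as " " (an assumption about the board encoding, not confirmed by this page):

```python
from typing import List

EMPTY = " "  # assumed marker for an empty cell


def available_moves(board: List[str]) -> List[int]:
    """Return the indices of all empty cells, in ascending order."""
    return [i for i, cell in enumerate(board) if cell == EMPTY]
```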
check_winner
Check if there is a winner or a draw on the board.
Parameters:
- board (List[str]): The current game board as a list of strings.
Returns:
Optional[Literal["X", "O", "draw"]]: “X” if X has won, “O” if O has won, “draw” if the game is a draw, or None if the game is still ongoing.
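One way to implement the check is to scan the eight winning lines and fall back to a draw when no empty cell remains. The sketch below assumes a row-major board of nine strings with " " for empty cells; `WIN_LINES` is a hypothetical constant.

```python
from typing import List, Literal, Optional

# Rows, columns, and the two diagonals of a board indexed 0..8.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]
EMPTY = " "  # assumed marker for an empty cell


def check_winner(board: List[str]) -> Optional[Literal["X", "O", "draw"]]:
    """Return the winner, "draw" for a full board, or None if play continues."""
    for a, b, c in WIN_LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]  # type: ignore[return-value]
    if EMPTY not in board:
        return "draw"
    return None
```

Checking for a winner before checking for a full board matters: a final move can complete a line and fill the last cell at once, and that should count as a win.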
render_board
Render the board as a string for display.
Parameters:
- board (List[str]): The current game board as a list of strings.
Returns:
str: A formatted string representation of the board.
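The exact layout produced by the library is not specified here. As a hypothetical illustration, a renderer could join the cells of each row with " | " and separate rows with a dashed line:

```python
from typing import List


def render_board(board: List[str]) -> str:
    """Render a nine-cell board as three rows separated by dashed lines."""
    rows = [" | ".join(board[r * 3:r * 3 + 3]) for r in range(3)]
    return "\n---------\n".join(rows)
```

For example, `print(render_board(["X", "O", " "] + [" "] * 6))` shows X and O in the first two cells of the top row.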