MoveExtractor
Opponent
init
- play_style (Literal["optimal", "random"]): The strategy to use, either "optimal" or "random". (default: "optimal")
select_move
- board (List[str]): The current game board as a list of strings.
get_optimal_move
- board (List[str]): The current game board as a list of strings.
minimax
- board (List[str]): The current game board as a list of strings.
- is_maximizing (bool): True if maximizing player (O), False if minimizing (X).
- depth (int): Current depth in the search tree. (default: 0)
- alpha (float): Alpha value for pruning. (default: -math.inf)
- beta (float): Beta value for pruning. (default: math.inf)
- float: The score of the best move (1 for O win, -1 for X win, 0 for draw)
- Optional[int]: The index of the best move, or None if terminal state
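The minimax signature and return values listed above can be illustrated with a minimal alpha-beta sketch. It is not the library's implementation: the name `minimax_sketch`, the helper `winner`, and the board encoding (a flat list of 9 cells holding "X", "O", or " ") are assumptions made for this example.

```python
import math
from typing import List, Optional, Tuple

# Assumption: the board is a flat list of 9 cells holding "X", "O", or " ".
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: List[str]) -> Optional[str]:
    """Hypothetical helper: return "X" or "O" if a line is complete."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax_sketch(
    board: List[str],
    is_maximizing: bool,
    depth: int = 0,
    alpha: float = -math.inf,
    beta: float = math.inf,
) -> Tuple[float, Optional[int]]:
    """Score a position for O (maximizer) against X (minimizer)."""
    won = winner(board)
    if won == "O":
        return 1.0, None
    if won == "X":
        return -1.0, None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0.0, None  # full board with no winner: draw

    best_move: Optional[int] = None
    best_score = -math.inf if is_maximizing else math.inf
    for move in moves:
        board[move] = "O" if is_maximizing else "X"  # try the move in place
        score, _ = minimax_sketch(
            board, not is_maximizing, depth + 1, alpha, beta
        )
        board[move] = " "  # undo the move before trying the next one
        if is_maximizing and score > best_score:
            best_score, best_move = score, move
            alpha = max(alpha, score)
        elif not is_maximizing and score < best_score:
            best_score, best_move = score, move
            beta = min(beta, score)
        if beta <= alpha:
            break  # the other player already has a better option elsewhere
    return best_score, best_move
```

In this sketch `depth` is only threaded through the recursion; a fuller version might use it to prefer faster wins.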
TicTacToeEnv
init
- extractor (Optional[BaseExtractor]): Extractor to process LLM responses. If None, a default extractor with MoveExtractor will be used. (default: None)
- max_steps (Optional[int]): Maximum steps per episode. (default: None)
- play_style (Literal["optimal", "random"]): The strategy for the opponent to use, either "optimal" or "random". (default: "optimal")
- **kwargs: Additional environment parameters.
_get_initial_state
_get_next_observation
_get_terminal_observation
evaluate_position_for_x
- board (List[str]): The current game board as a list of strings.
- is_x_turn (bool): True if it’s X’s turn to move, False otherwise.
- 1.0 if X has a winning position
- 0.0 if O has a winning position
- 0.5 for a draw
- For ongoing positions, returns the expected outcome with perfect play
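The scoring rule above (1.0, 0.0, 0.5, and perfect-play values for ongoing positions) can be sketched as an exhaustive search with memoization. This is an illustration under assumptions, not the environment's actual code: the function names, the cache, and the 9-cell board with " " for empty squares are all hypothetical.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Assumption: the board is a flat list of 9 cells holding "X", "O", or " ".
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (2, 4, 6),
]


def _line_winner(cells: Tuple[str, ...]) -> Optional[str]:
    """Hypothetical helper: return "X" or "O" if a line is complete."""
    for a, b, c in LINES:
        if cells[a] != " " and cells[a] == cells[b] == cells[c]:
            return cells[a]
    return None


@lru_cache(maxsize=None)
def _value_for_x(cells: Tuple[str, ...], is_x_turn: bool) -> float:
    won = _line_winner(cells)
    if won == "X":
        return 1.0
    if won == "O":
        return 0.0
    empty = [i for i, cell in enumerate(cells) if cell == " "]
    if not empty:
        return 0.5  # draw
    mark = "X" if is_x_turn else "O"
    values = [
        _value_for_x(cells[:i] + (mark,) + cells[i + 1:], not is_x_turn)
        for i in empty
    ]
    # X picks the best outcome for X; O picks the worst outcome for X.
    return max(values) if is_x_turn else min(values)


def evaluate_position_for_x_sketch(board: List[str], is_x_turn: bool) -> float:
    """Expected outcome for X with perfect play from both sides."""
    return _value_for_x(tuple(board), is_x_turn)
```

The same position can score differently depending on whose turn it is, which is why `is_x_turn` is part of the cache key.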
_is_done
available_moves
- board (List[str]): The current game board as a list of strings.
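As a rough sketch of what listing available moves might look like, the snippet below returns the indices of empty cells. The return type and the use of " " for an empty square are assumptions; the source does not state them.

```python
from typing import List


def available_moves_sketch(board: List[str]) -> List[int]:
    """Return the 0-based indices of empty cells (assumed to be " ")."""
    return [i for i, cell in enumerate(board) if cell == " "]


board = ["X", " ", "O", " ", " ", " ", "X", " ", " "]
moves = available_moves_sketch(board)
```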
check_winner
- board (List[str]): The current game board as a list of strings.
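A winner check over the eight winning lines could be sketched as follows. The return convention used here ("X", "O", "draw", or None for an ongoing game) is an assumption for illustration, as is the 9-cell board with " " for empty squares.

```python
from typing import List, Optional

# Assumption: the board is a flat list of 9 cells holding "X", "O", or " ".
WINNING_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def check_winner_sketch(board: List[str]) -> Optional[str]:
    """Return "X" or "O" for a win, "draw" for a full board, else None."""
    for a, b, c in WINNING_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    if " " not in board:
        return "draw"
    return None  # game still in progress
```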
render_board
- board (List[str]): The current game board as a list of strings.