OpenAICompatibleModel
- model_type (Union[ModelType, str]): Model for which a backend is created.
- model_config_dict (Optional[Dict[str, Any]], optional): A dictionary that will be fed into `openai.ChatCompletion.create()`. If `None`, `{}` will be used. (default: `None`)
- api_key (str): The API key for authenticating with the model service.
- url (str): The URL of the model service.
- token_counter (Optional[BaseTokenCounter], optional): Token counter to use for the model. If not provided, `OpenAITokenCounter(ModelType.GPT_4O_MINI)` will be used. (default: `None`)
- timeout (Optional[float], optional): The timeout value in seconds for API calls. If not provided, falls back to the `MODEL_TIMEOUT` environment variable or defaults to 180 seconds. (default: `None`)
- max_retries (int, optional): Maximum number of retries for API calls. (default: `3`)
- client (Optional[Any], optional): A custom synchronous OpenAI-compatible client instance. If provided, this client is used instead of creating a new one. Useful for RL frameworks such as AReaL or rLLM that provide OpenAI-compatible clients (e.g., `ArealOpenAI`). The client should implement the OpenAI client interface with `.chat.completions.create()` and `.beta.chat.completions.parse()` methods. (default: `None`)
- async_client (Optional[Any], optional): A custom asynchronous OpenAI-compatible client instance. If provided, this client is used instead of creating a new one. The client should implement the `AsyncOpenAI` client interface. (default: `None`)
- **kwargs (Any): Additional arguments to pass to the OpenAI client initialization, such as `organization`, `default_headers`, or `http_client`. Ignored if custom clients are provided.
__init__
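A minimal construction sketch, assuming `OpenAICompatibleModel` is importable from `camel.models`; the endpoint URL, model name, and API key below are placeholders for any OpenAI-compatible server (e.g., a local vLLM deployment), not values prescribed by the library:

```python
from camel.models import OpenAICompatibleModel

# Hypothetical endpoint and model name; substitute your own deployment.
model = OpenAICompatibleModel(
    model_type="my-served-model",
    api_key="EMPTY",  # many local servers accept any placeholder key
    url="http://localhost:8000/v1",
    model_config_dict={"temperature": 0.2},  # forwarded to chat.completions.create()
    timeout=60.0,
    max_retries=3,
)
```

To reuse an existing OpenAI-compatible client instead of having one created, pass it via `client=` (and `async_client=` for the async path); per the parameter notes above, constructor-level `**kwargs` are then ignored.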
_run
- messages (List[OpenAIMessage]): Message list with the chat history in OpenAI API format.
- response_format (Optional[Type[BaseModel]]): The format of the response.
- tools (Optional[List[Dict[str, Any]]]): The schema of the tools to use for the request.
Returns `ChatCompletion` in the non-stream mode, `Stream[ChatCompletionChunk]` in the stream mode, or `ChatCompletionStreamManager[BaseModel]` for structured output streaming.
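For illustration, a sketch of the non-stream path, assuming the backend's public `run()` method mirrors `_run`'s signature and reusing the `model` constructed above:

```python
from pydantic import BaseModel

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name one benefit of OpenAI-compatible endpoints."},
]

# Non-stream mode: the call returns a ChatCompletion.
response = model.run(messages)
print(response.choices[0].message.content)

# Passing a Pydantic model as response_format routes the request through the
# structured-output path (.beta.chat.completions.parse()) instead.
class OneLiner(BaseModel):
    text: str

structured = model.run(messages, response_format=OneLiner)
```

Enabling streaming in `model_config_dict` (e.g., `{"stream": True}`) yields a `Stream[ChatCompletionChunk]` to iterate over instead of a single `ChatCompletion`.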