SGLangModel
- model_type (Union[ModelType, str]): Model for which a backend is created.
- model_config_dict (Optional[Dict[str, Any]], optional): A dictionary that will be fed into:obj:
openai.ChatCompletion.create()
. If :obj:None
, :obj:SGLangConfig().as_dict()
will be used. (default: :obj:None
) - api_key (Optional[str], optional): The API key for authenticating with the model service. SGLang doesn’t need API key, it would be ignored if set. (default: :obj:
None
) - url (Optional[str], optional): The url to the model service. If not provided, :obj:
"http://127.0.0.1:30000/v1"
will be used. (default: :obj:None
) - token_counter (Optional[BaseTokenCounter], optional): Token counter to use for the model. If not provided, :obj:
OpenAITokenCounter( ModelType.GPT_4O_MINI)
will be used. (default: :obj:None
) - timeout (Optional[float], optional): The timeout value in seconds for API calls. If not provided, will fall back to the MODEL_TIMEOUT environment variable or default to 180 seconds. (default: :obj:
None
) - max_retries (int, optional): Maximum number of retries for API calls. (default: :obj:
3
) **kwargs (Any): Additional arguments to pass to the client initialization. - Reference: https://sgl-project.github.io/backend/openai_api_completions. html
init
_start_server
_ensure_server_running
_monitor_inactivity
token_counter
_run
- messages (List[OpenAIMessage]): Message list with the chat history in OpenAI API format.
ChatCompletion
in the non-stream mode, or
Stream[ChatCompletionChunk]
in the stream mode.
stream
del
cleanup
_terminate_process
_kill_process_tree
_execute_shell_command
- command: Shell command as a string (can include \ line continuations)
_wait_for_server
- base_url (str): The base URL of the server
- timeout (Optional[float]): Maximum time to wait in seconds. (default: :obj:
30
)