ModelBackendMeta

class ModelBackendMeta(ABCMeta):

Metaclass that automatically preprocesses messages in the run method.

Automatically wraps the run method of any class inheriting from BaseModelBackend to preprocess messages (removing <think> tags) before they are sent to the model.

__new__

def __new__(
    mcs,
    name,
    bases,
    namespace
):

Wraps the run method with preprocessing if it exists in the class.

BaseModelBackend

class BaseModelBackend(ABC):

Base class for different model backends. A backend may be the OpenAI API, a local LLM, a stub for unit tests, etc.

Parameters:

  • model_type (Union[ModelType, str]): Model for which a backend is created.
  • model_config_dict (Optional[Dict[str, Any]], optional): A config dictionary. (default: :obj:{})
  • api_key (Optional[str], optional): The API key for authenticating with the model service. (default: :obj:None)
  • url (Optional[str], optional): The url to the model service. (default: :obj:None)
  • token_counter (Optional[BaseTokenCounter], optional): Token counter to use for the model. If not provided, :obj:OpenAITokenCounter will be used. (default: :obj:None)
  • timeout (Optional[float], optional): The timeout value in seconds for API calls. (default: :obj:None)
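
The shape of a concrete backend can be sketched with a simplified analog of this base class. `SimpleModelBackend` and `StubBackend` below are hypothetical stand-ins for illustration only; the real class has additional members (token counting, config checks, streaming) not shown here.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class SimpleModelBackend(ABC):
    """Simplified analog of the described base class (not the real API)."""

    def __init__(
        self,
        model_type: str,
        model_config_dict: Optional[Dict[str, Any]] = None,
        api_key: Optional[str] = None,
        url: Optional[str] = None,
        timeout: Optional[float] = None,
    ):
        self.model_type = model_type
        # A missing config defaults to an empty dict.
        self.model_config_dict = model_config_dict or {}
        self.api_key = api_key
        self.url = url
        self.timeout = timeout

    @abstractmethod
    def _run(self, messages: List[Dict[str, Any]], **kwargs: Any) -> Any:
        """Subclasses talk to the actual model service here."""

    def run(self, messages: List[Dict[str, Any]], **kwargs: Any) -> Any:
        return self._run(messages, **kwargs)


class StubBackend(SimpleModelBackend):
    """A stub backend, e.g. for unit tests, returning a canned reply."""

    def _run(self, messages, **kwargs):
        return {"choices": [{"message": {"role": "assistant",
                                         "content": "stub reply"}}]}
```

Subclasses only implement the private `_run`; callers always go through the public `run`.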

__init__

def __init__(
    self,
    model_type: Union[ModelType, str],
    model_config_dict: Optional[Dict[str, Any]] = None,
    api_key: Optional[str] = None,
    url: Optional[str] = None,
    token_counter: Optional[BaseTokenCounter] = None,
    timeout: Optional[float] = None
):

token_counter

def token_counter(self):

Returns:

BaseTokenCounter: The token counter following the model’s tokenization style.

preprocess_messages

def preprocess_messages(self, messages: List[OpenAIMessage]):

Preprocess messages before sending to model API. Removes thinking content from assistant and user messages. Automatically formats messages for parallel tool calls if tools are detected.

Parameters:

  • messages (List[OpenAIMessage]): Original messages.

Returns:

List[OpenAIMessage]: Preprocessed messages.
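
A standalone sketch of the role-aware part of this preprocessing: thinking content is stripped only from assistant and user messages, while other roles pass through untouched. The helper name `strip_think_tags` is hypothetical.

```python
import re
from typing import Any, Dict, List


def strip_think_tags(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove <think>...</think> spans from assistant and user messages."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str) and msg.get("role") in ("assistant", "user"):
            content = re.sub(
                r"<think>.*?</think>", "", content, flags=re.DOTALL
            ).strip()
        # Messages with other roles (system, tool, ...) are left as-is.
        cleaned.append({**msg, "content": content})
    return cleaned
```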

_run

def _run(
    self,
    messages: List[OpenAIMessage],
    response_format: Optional[Type[BaseModel]] = None,
    tools: Optional[List[Dict[str, Any]]] = None
):

run

def run(
    self,
    messages: List[OpenAIMessage],
    response_format: Optional[Type[BaseModel]] = None,
    tools: Optional[List[Dict[str, Any]]] = None
):

Runs the query to the backend model.

Parameters:

  • messages (List[OpenAIMessage]): Message list with the chat history in OpenAI API format.
  • response_format (Optional[Type[BaseModel]]): The response format to use for the model. (default: :obj:None)
  • tools (Optional[List[Dict[str, Any]]]): The schema of tools to use for the model for this request. Will override the tools specified in the model configuration (but not change the configuration). (default: :obj:None)

Returns:

Union[ChatCompletion, Stream[ChatCompletionChunk]]: ChatCompletion in the non-stream mode, or Stream[ChatCompletionChunk] in the stream mode.
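
The two return modes can be handled with a branch on the backend's stream flag. `FakeBackend` below is a hypothetical stand-in (real backends return ChatCompletion / ChatCompletionChunk objects, not plain dicts); the sketch shows only the caller-side pattern.

```python
class FakeBackend:
    """Hypothetical backend illustrating stream vs. non-stream returns."""

    stream = False  # real backends expose this as a property

    def run(self, messages, response_format=None, tools=None):
        if self.stream:
            # Stream mode: an iterator of ChatCompletionChunk-like pieces.
            return iter([
                {"choices": [{"delta": {"content": "Hel"}}]},
                {"choices": [{"delta": {"content": "lo"}}]},
            ])
        # Non-stream mode: a single ChatCompletion-like object.
        return {"choices": [{"message": {"role": "assistant",
                                         "content": "Hello"}}]}


backend = FakeBackend()
response = backend.run([{"role": "user", "content": "Hi"}])
if backend.stream:
    # Accumulate the delta fragments into the full reply.
    text = "".join(chunk["choices"][0]["delta"].get("content", "")
                   for chunk in response)
else:
    text = response["choices"][0]["message"]["content"]
```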

check_model_config

def check_model_config(self):

count_tokens_from_messages

def count_tokens_from_messages(self, messages: List[OpenAIMessage]):

Count the number of tokens in the messages using the model-specific tokenizer.

Parameters:

  • messages (List[OpenAIMessage]): Message list with the chat history in OpenAI API format.

Returns:

int: Number of tokens in the messages.
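
For intuition only, a rough stand-in for such a counter: real backends delegate to a BaseTokenCounter (e.g. OpenAITokenCounter), whereas this sketch approximates with whitespace tokenization plus an assumed per-message framing overhead. Both the function name and the overhead constant are hypothetical.

```python
from typing import Any, Dict, List

# Assumed per-message framing overhead; real tokenizers differ.
TOKENS_PER_MESSAGE = 4


def naive_count_tokens(messages: List[Dict[str, Any]]) -> int:
    """Crude token estimate: whitespace words plus message overhead."""
    total = 0
    for msg in messages:
        total += TOKENS_PER_MESSAGE
        total += len(str(msg.get("content", "")).split())
    return total
```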

_to_chat_completion

def _to_chat_completion(self, response: ParsedChatCompletion):

token_limit

def token_limit(self):

Returns:

int: The maximum token limit for the given model.

stream

def stream(self):

Returns:

bool: Whether the model is in stream mode.