OpenAIAudioModels
init
text_to_speech
- input (str): The text to be converted to speech.
- model_type (AudioModelType, optional): The TTS model to use. Defaults to
AudioModelType.TTS_1
. - voice (VoiceType, optional): The voice to be used for generating speech. Defaults to
VoiceType.ALLOY
. - storage_path (str, optional): The local path to store the generated speech file if provided, defaults to
None
. **kwargs (Any): Extra kwargs passed to the TTS API.
_split_audio
- audio_file_path (str): Path to the input audio file.
- chunk_size_mb (int, optional): Size of each chunk in megabytes. Defaults to
24
.
speech_to_text
- audio_file_path (str): The audio file path, supporting one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
- translate_into_english (bool, optional): Whether to translate the speech into English. Defaults to
False
. **kwargs (Any): Extra keyword arguments passed to the Speech-to-Text (STT) API.
audio_question_answering
- audio_file_path (str): The path to the audio file.
- question (str): The question to ask about the audio content.
- model (str, optional): The model to use for audio question answering. (default: :obj:
"gpt-4o-mini-audio-preview"
) **kwargs (Any): Extra keyword arguments passed to the chat completions API.