AutoRetriever

class AutoRetriever:

Facilitates the automatic retrieval of information using a query-based approach with pre-defined elements.

Attributes:

  • url_and_api_key (Optional[Tuple[str, str]]): URL and API key for accessing the vector storage remotely.
  • vector_storage_local_path (Optional[str]): Local path for vector storage, if applicable.
  • storage_type (Optional[StorageType]): The type of vector storage to use. Defaults to StorageType.QDRANT.
  • embedding_model (Optional[BaseEmbedding]): Model used for embedding queries and documents. Defaults to OpenAIEmbedding().

__init__

def __init__(
    self,
    url_and_api_key: Optional[Tuple[str, str]] = None,
    vector_storage_local_path: Optional[str] = None,
    storage_type: Optional[StorageType] = None,
    embedding_model: Optional[BaseEmbedding] = None
):
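Every constructor argument defaults to None in the signature, while the attribute descriptions name concrete defaults. The sketch below shows one common way such a fallback can be resolved. It is an illustration only, not the library's code: RetrieverConfig and the stand-in StorageType enum are hypothetical, the path "local_data/" is a placeholder, and the embedding model default is left out because it needs the real class.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class StorageType(Enum):
    # Stand-in for the library's enum; only the member named in the text.
    QDRANT = "qdrant"


@dataclass
class RetrieverConfig:
    # Hypothetical holder mirroring the four constructor arguments.
    url_and_api_key: Optional[Tuple[str, str]] = None
    vector_storage_local_path: Optional[str] = None
    storage_type: Optional[StorageType] = None
    embedding_model: Optional[object] = None

    def __post_init__(self) -> None:
        # A value of None means "use the documented default".
        if self.storage_type is None:
            self.storage_type = StorageType.QDRANT


# Only a local path is given; the storage type falls back to QDRANT.
config = RetrieverConfig(vector_storage_local_path="local_data/")
print(config.storage_type)
```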

_initialize_vector_storage

def _initialize_vector_storage(self, collection_name: Optional[str] = None):

Sets up and returns a vector storage instance with specified parameters.

Parameters:

  • collection_name (Optional[str]): Name of the collection in the vector storage.

Returns:

BaseVectorStorage: Configured vector storage instance.

_collection_name_generator

def _collection_name_generator(self, content: Union[str, 'Element']):

Generates a valid collection name from the given content (a local file path, remote URL, string content or Element object).

Parameters:

  • content (Union[str, Element]): Local file path, remote URL, string content or Element object.

Returns:

str: A sanitized, valid collection name suitable for use.
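As an illustration of what "sanitized" can mean here, the following minimal sketch turns a path or URL into a name made only of letters, digits and underscores. The function name sanitize_collection_name, the 30-character cap and the "collection" fallback are all assumptions made for this example; the actual method may apply different rules.

```python
import re


def sanitize_collection_name(content: str, max_length: int = 30) -> str:
    """Hypothetical helper: derive a storage-safe name from a path or URL."""
    # Replace every run of characters outside [A-Za-z0-9] with one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", content)
    # Trim underscores at the ends, cap the length, then trim again in case
    # the cut landed on an underscore.
    name = name.strip("_")[:max_length].strip("_")
    # Fall back to a fixed name if nothing usable remains.
    return name or "collection"


print(sanitize_collection_name("https://example.com/docs/a.pdf"))
```

Capping the length and restricting the character set keeps the result acceptable to most storage backends, at the cost of possible collisions between long inputs that share a prefix.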

run_vector_retriever

def run_vector_retriever(
    self,
    query: str,
    contents: Union[str, List[str], 'Element', List['Element']],
    top_k: int = Constants.DEFAULT_TOP_K_RESULTS,
    similarity_threshold: float = Constants.DEFAULT_SIMILARITY_THRESHOLD,
    return_detailed_info: bool = False,
    max_characters: int = 500
):

Runs the automatic vector retrieval process using vector storage.

Parameters:

  • query (str): Query string for information retrieval.
  • contents (Union[str, List[str], Element, List[Element]]): Local file paths, remote URLs, string contents or Element objects.
  • top_k (int, optional): The number of top results to return during retrieval. Must be a positive integer. Defaults to DEFAULT_TOP_K_RESULTS.
  • similarity_threshold (float, optional): The similarity threshold for filtering results. Defaults to DEFAULT_SIMILARITY_THRESHOLD.
  • return_detailed_info (bool, optional): Whether to return detailed information including similarity score, content path and metadata. Defaults to False.
  • max_characters (int, optional): Maximum number of characters in each chunk. Defaults to 500.

Returns:

dict[str, Sequence[Collection[str]]]: By default, returns only the text information. If return_detailed_info is True, returns detailed information including similarity score, content path and metadata.
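To make the roles of max_characters, similarity_threshold and top_k concrete, here is a minimal sketch of how these three parameters could interact. It is not the library's implementation: chunk_text and select_results are hypothetical helpers, the chunking is a naive fixed-width split, the scores are supplied by hand instead of coming from an embedding model, and treating the threshold as inclusive is an assumption.

```python
from typing import List, Tuple


def chunk_text(text: str, max_characters: int = 500) -> List[str]:
    """Split text into consecutive pieces of at most max_characters."""
    if max_characters <= 0:
        raise ValueError("max_characters must be a positive integer")
    return [
        text[start:start + max_characters]
        for start in range(0, len(text), max_characters)
    ]


def select_results(
    scored_chunks: List[Tuple[float, str]],
    top_k: int,
    similarity_threshold: float,
) -> List[str]:
    """Keep chunks scoring at or above the threshold, best first, up to top_k."""
    if top_k <= 0:
        raise ValueError("top_k must be a positive integer")
    # Drop everything below the threshold (inclusive bound is an assumption).
    kept = [(score, text) for score, text in scored_chunks
            if score >= similarity_threshold]
    # Highest similarity first; ties keep their input order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:top_k]]


chunks = chunk_text("abcdefg", max_characters=3)
# Hand-written scores standing in for query-to-chunk similarity.
scored = list(zip([0.9, 0.4, 0.8], chunks))
print(select_results(scored, top_k=2, similarity_threshold=0.5))
```

Filtering by threshold before applying top_k means a call can return fewer than top_k results, or none at all, when few chunks are similar enough to the query.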