camel.retrievers.vector_retriever
VectorRetriever
An implementation of BaseRetriever that uses a vector storage and an embedding model.
This class facilitates the retrieval of relevant information using a query-based approach, backed by vector embeddings.
Attributes:
- embedding_model (BaseEmbedding): Embedding model used to generate vector embeddings.
- storage (BaseVectorStorage): Vector storage to query.
- unstructured_modules (UnstructuredIO): A module for parsing files and URLs and chunking content based on specified parameters.
__init__
Initializes the retriever class with an optional embedding model.
Parameters:
- embedding_model (Optional[BaseEmbedding]): The embedding model instance. Defaults to OpenAIEmbedding if not provided.
- storage (BaseVectorStorage): Vector storage to query.
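The optional-model behavior described above can be sketched with stand-in classes. This is a minimal illustration, not the library's code: `Embedding`, `CharCountEmbedding`, and `ToyRetriever` are hypothetical names, and a plain list stands in for the vector storage.

```python
from typing import List, Optional, Protocol


class Embedding(Protocol):
    """Hypothetical stand-in for an embedding model interface."""

    def embed(self, text: str) -> List[float]: ...


class CharCountEmbedding:
    """Toy default model: [vowels, consonants, digits]. Illustrative only."""

    def embed(self, text: str) -> List[float]:
        lowered = text.lower()
        vowels = sum(ch in "aeiou" for ch in lowered)
        letters = sum(ch.isalpha() for ch in lowered)
        digits = sum(ch.isdigit() for ch in lowered)
        return [float(vowels), float(letters - vowels), float(digits)]


class ToyRetriever:
    """Mirrors the constructor shape described above, not the real class."""

    def __init__(
        self,
        embedding_model: Optional[Embedding] = None,
        *,
        storage: list,
    ) -> None:
        # Fall back to a default model only when the caller passes none.
        if embedding_model is None:
            embedding_model = CharCountEmbedding()
        self.embedding_model = embedding_model
        self.storage = storage


retriever = ToyRetriever(storage=[])
default_name = type(retriever.embedding_model).__name__
```

Checking `is None` rather than truthiness keeps a caller-supplied model in place even if that object happens to evaluate as falsy.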
process
Processes content from a local file path, remote URL, string content, Element object, or binary file object, divides it into chunks using Unstructured IO, and stores the chunk embeddings in the specified vector storage.
Parameters:
- content (Union[str, Element, IO[bytes]]): Local file path, remote URL, string content, Element object, or a binary file object.
- chunk_type (str): Type of chunking to apply. Defaults to "chunk_by_title".
- max_characters (int): Maximum number of characters in each chunk. Defaults to 500.
- embed_batch (int): Batch size for embedding. Defaults to 50.
- should_chunk (bool): If True, divide the content into chunks; otherwise skip chunking. Defaults to True.
- extra_info (Optional[dict]): Extra information to be added to the payload. Defaults to None.
- metadata_filename (Optional[str]): The metadata filename to be used for storing metadata. Defaults to None.
- **kwargs (Any): Additional keyword arguments for content parsing.
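To make the roles of max_characters, embed_batch, should_chunk, and extra_info concrete, here is a self-contained sketch of a chunk, embed, and store loop. It is a simplification under stated assumptions: the greedy word splitter replaces Unstructured IO's title-based chunking, `toy_embed` is a hypothetical embedding function, and a plain list stands in for the vector storage.

```python
from typing import Any, Dict, List, Optional


def split_into_chunks(text: str, max_characters: int = 500) -> List[str]:
    """Greedy word-based splitter; a stand-in for title-based chunking."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_characters and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical embedding: [character count, word count] per text."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


def process(
    content: str,
    store: List[Dict[str, Any]],
    max_characters: int = 500,
    embed_batch: int = 50,
    should_chunk: bool = True,
    extra_info: Optional[Dict[str, Any]] = None,
) -> int:
    """Chunk, embed in batches, append records to `store`; return batch count."""
    chunks = split_into_chunks(content, max_characters) if should_chunk else [content]
    batches = 0
    for start in range(0, len(chunks), embed_batch):
        batch = chunks[start:start + embed_batch]
        vectors = toy_embed(batch)  # one embedding call per batch
        for text, vector in zip(batch, vectors):
            payload = {"text": text, "extra_info": extra_info or {}}
            store.append({"vector": vector, "payload": payload})
        batches += 1
    return batches


store: List[Dict[str, Any]] = []
n_batches = process(
    "alpha beta gamma delta epsilon",
    store,
    max_characters=12,
    embed_batch=2,
    extra_info={"source": "demo"},
)
```

With a 12-character limit the sample text splits into three chunks, so a batch size of 2 results in two embedding calls. Passing should_chunk=False stores the whole input as a single record.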
query
Executes a query in vector storage and compiles the retrieved results into a list of dictionaries.
Parameters:
- query (str): Query string for information retrieval.
- similarity_threshold (float, optional): The similarity threshold for filtering results. Defaults to DEFAULT_SIMILARITY_THRESHOLD.
- top_k (int, optional): The number of top results to return during retrieval. Must be a positive integer. Defaults to DEFAULT_TOP_K_RESULTS.
Returns:
List[Dict[str, Any]]: Concatenated list of the query results.
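The interaction between top_k and similarity_threshold can be illustrated with a small in-memory sketch. It assumes cosine similarity as the scoring function and uses hypothetical names throughout (`toy_query`, the record layout, the result keys); the default values shown are placeholders, not the library's constants.

```python
import math
from typing import Any, Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_query(
    query_vector: List[float],
    records: List[Dict[str, Any]],
    similarity_threshold: float = 0.5,  # placeholder default
    top_k: int = 1,  # placeholder default
) -> List[Dict[str, Any]]:
    """Take the top_k nearest records, then drop those below the threshold."""
    if top_k <= 0:
        raise ValueError("top_k must be a positive integer.")
    scored = [(cosine(query_vector, r["vector"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    results: List[Dict[str, Any]] = []
    for score, record in scored[:top_k]:
        if score >= similarity_threshold:
            results.append({"similarity_score": score, "text": record["text"]})
    return results


records = [
    {"vector": [1.0, 0.0], "text": "a"},
    {"vector": [0.0, 1.0], "text": "b"},
    {"vector": [1.0, 1.0], "text": "c"},
]
strict = toy_query([1.0, 0.0], records, similarity_threshold=0.75, top_k=2)
loose = toy_query([1.0, 0.0], records, similarity_threshold=0.5, top_k=2)
```

In this sketch the threshold is applied after the top_k cut, so a query can return fewer than top_k results. Here `strict` keeps only "a", while `loose` also keeps "c", whose similarity is roughly 0.707.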