camel.retrievers.vector_retriever

VectorRetriever

class VectorRetriever(BaseRetriever):

An implementation of BaseRetriever that uses a vector storage and an embedding model. This class facilitates the retrieval of relevant information using a query-based approach, backed by vector embeddings. Parameters:

embedding_model (BaseEmbedding): Embedding model used to generate vector embeddings.
storage (BaseVectorStorage): Vector storage to query.
unstructured_modules (UnstructuredIO): A module for parsing files and URLs and chunking content based on specified parameters.

__init__

def __init__(
    self,
    embedding_model: Optional[BaseEmbedding] = None,
    storage: Optional[BaseVectorStorage] = None
):

Initializes the retriever with an optional embedding model and an optional vector storage. Parameters:

embedding_model (Optional[BaseEmbedding]): The embedding model instance. Defaults to OpenAIEmbedding if not provided.
storage (Optional[BaseVectorStorage]): Vector storage to query. Defaults to None.
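
The optional-argument pattern described above can be sketched as follows. This is a minimal illustration, not the library's code: ToyEmbedding, ToyStorage, and ToyRetriever are hypothetical stand-ins, and sizing the fallback storage from the embedding dimension is an assumption made for the example.

```python
from typing import List, Optional


class ToyEmbedding:
    """Hypothetical stand-in for an embedding model."""

    def embed(self, text: str) -> List[float]:
        # Crude 5-dimensional "embedding": one vowel count per dimension.
        return [float(text.lower().count(v)) for v in "aeiou"]

    def get_output_dim(self) -> int:
        return 5


class ToyStorage:
    """Hypothetical stand-in for a vector storage."""

    def __init__(self, vector_dim: int) -> None:
        self.vector_dim = vector_dim
        self.records: List[dict] = []


class ToyRetriever:
    def __init__(
        self,
        embedding_model: Optional[ToyEmbedding] = None,
        storage: Optional[ToyStorage] = None,
    ) -> None:
        # Fall back to a default embedding model when none is supplied.
        self.embedding_model = (
            embedding_model if embedding_model is not None else ToyEmbedding()
        )
        # Assumption: a fallback storage is sized to the embedding dimension.
        self.storage = (
            storage
            if storage is not None
            else ToyStorage(vector_dim=self.embedding_model.get_output_dim())
        )


# Both arguments omitted: the retriever builds its own defaults.
retriever = ToyRetriever()
```

Comparing against None rather than relying on truthiness keeps an explicitly passed but empty storage from being silently replaced.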

process

def process(
    self,
    content: Union[str, 'Element', IO[bytes]],
    chunk_type: str = 'chunk_by_title',
    max_characters: int = 500,
    embed_batch: int = 50,
    should_chunk: bool = True,
    extra_info: Optional[dict] = None,
    metadata_filename: Optional[str] = None,
    chunker: Optional[BaseChunker] = None,
    **kwargs: Any
):

Processes content from a local file path, remote URL, string content, Element object, or binary file object, divides it into chunks using Unstructured IO, and stores the chunk embeddings in the specified vector storage. Parameters:

content (Union[str, Element, IO[bytes]]): Local file path, remote URL, string content, Element object, or binary file object.
chunk_type (str): The chunking strategy to apply. Defaults to "chunk_by_title".
max_characters (int): Maximum number of characters in each chunk. Defaults to 500.
embed_batch (int): Batch size for embedding. Defaults to 50.
should_chunk (bool): If True, divide the content into chunks; otherwise, skip chunking. Defaults to True.
extra_info (Optional[dict]): Extra information to add to the payload. Defaults to None.
metadata_filename (Optional[str]): The filename to use when storing metadata. Defaults to None.
chunker (Optional[BaseChunker]): The chunker used to divide the content. Defaults to None.
**kwargs (Any): Additional keyword arguments for content parsing.
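
The chunk, embed-in-batches, and store flow described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's implementation: chunk_text, toy_embed, and process_sketch are hypothetical helpers, the word-packing chunker stands in for Unstructured IO, and the returned list stands in for the vector storage.

```python
from typing import Dict, List, Optional


def chunk_text(text: str, max_characters: int = 500) -> List[str]:
    """Greedily pack whitespace-separated words into chunks."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        # A single word longer than the limit still becomes its own chunk.
        if len(candidate) <= max_characters or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical embedding: (length, number of spaces) per text."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def process_sketch(
    text: str,
    max_characters: int = 500,
    embed_batch: int = 50,
    should_chunk: bool = True,
    extra_info: Optional[dict] = None,
) -> List[Dict]:
    chunks = chunk_text(text, max_characters) if should_chunk else [text]
    records: List[Dict] = []
    # Embed the chunks batch by batch instead of all at once.
    for start in range(0, len(chunks), embed_batch):
        batch = chunks[start:start + embed_batch]
        for chunk, vector in zip(batch, toy_embed(batch)):
            # Extra information is merged into each stored payload.
            payload = {"text": chunk, **(extra_info or {})}
            records.append({"vector": vector, "payload": payload})
    return records


records = process_sketch(
    "one two three four five six",
    max_characters=9,
    embed_batch=2,
    extra_info={"source": "demo"},
)
```

Batching bounds the size of each embedding request, which matters when the embedding model is a remote service with per-request limits.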

query

def query(
    self,
    query: str,
    top_k: int = Constants.DEFAULT_TOP_K_RESULTS,
    similarity_threshold: float = Constants.DEFAULT_SIMILARITY_THRESHOLD
):

Executes a query against the vector storage and compiles the retrieved results into a list of dictionaries. Parameters:

query (str): Query string for information retrieval.
top_k (int, optional): The number of top results to return during retrieval. Must be a positive integer. Defaults to DEFAULT_TOP_K_RESULTS.
similarity_threshold (float, optional): The similarity threshold for filtering results. Defaults to DEFAULT_SIMILARITY_THRESHOLD.

Returns: List[Dict[str, Any]]: Concatenated list of the query results.
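
How top_k and similarity_threshold interact can be sketched with a small in-memory example. The following is an assumption-laden illustration, not the library's query logic: cosine and query_sketch are hypothetical helpers, the result keys "score" and "text" are invented for the example, and the default values are arbitrary rather than the library's constants.

```python
import math
from typing import Any, Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_sketch(
    query_vector: List[float],
    records: List[Dict[str, Any]],
    top_k: int = 1,                     # arbitrary default for this sketch
    similarity_threshold: float = 0.7,  # arbitrary default for this sketch
) -> List[Dict[str, Any]]:
    if top_k <= 0:
        raise ValueError("top_k must be a positive integer.")
    # Rank every record by similarity to the query, best first.
    scored = sorted(
        ((cosine(query_vector, r["vector"]), r) for r in records),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Keep the top_k candidates, then drop any below the threshold.
    return [
        {"score": score, "text": r["text"]}
        for score, r in scored[:top_k]
        if score >= similarity_threshold
    ]


records = [
    {"vector": [1.0, 0.0], "text": "about cats"},
    {"vector": [0.0, 1.0], "text": "about dogs"},
    {"vector": [0.8, 0.6], "text": "about pets"},
]
results = query_sketch([1.0, 0.0], records, top_k=2, similarity_threshold=0.7)
```

In this sketch the threshold is applied after the top_k cut, so the result can contain fewer than top_k entries, or none at all, when the best matches score too low.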
