camel.retrievers.hybrid_retrival

HybridRetriever

class HybridRetriever(BaseRetriever):

__init__

def __init__(
    self,
    embedding_model: Optional[BaseEmbedding] = None,
    vector_storage: Optional[BaseVectorStorage] = None
):

Initializes the HybridRetriever with an optional embedding model and vector storage.

Parameters:

embedding_model (Optional[BaseEmbedding]): An optional embedding model used by the VectorRetriever. Defaults to None.
vector_storage (Optional[BaseVectorStorage]): An optional vector storage used by the VectorRetriever. Defaults to None.

process

def process(self, content_input_path: str):

Processes the content at the given input path for both the vector and BM25 retrievers.

Parameters:

content_input_path (str): File path or URL of the content to be processed.

_sort_rrf_scores

def _sort_rrf_scores(
    self,
    vector_retriever_results: List[Dict[str, Any]],
    bm25_retriever_results: List[Dict[str, Any]],
    top_k: int,
    vector_weight: float,
    bm25_weight: float,
    rank_smoothing_factor: float
):

Combines results from the vector and BM25 retrievers using Reciprocal Rank Fusion (RRF) and sorts them by RRF score.

Parameters:

vector_retriever_results (List[Dict[str, Any]]): Results from the vector retriever. Each dictionary contains a 'text' entry.
bm25_retriever_results (List[Dict[str, Any]]): Results from the BM25 retriever. Each dictionary contains a 'text' entry.
top_k (int): The number of top results to return after sorting by RRF score.
vector_weight (float): The weight assigned to the vector retriever results in the RRF calculation.
bm25_weight (float): The weight assigned to the BM25 retriever results in the RRF calculation.
rank_smoothing_factor (float): A hyperparameter for the RRF calculation that smooths the rank positions.

Returns: List[Dict[str, Union[str, float]]]: A list of dictionaries representing the sorted results. Each dictionary contains the 'text' of a retrieved item and its corresponding 'rrf_score'.
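To make the fusion step concrete, here is a minimal, self-contained sketch of weighted Reciprocal Rank Fusion. It is not the library's implementation. It assumes the common RRF form in which an item at 1-based rank r in a list contributes weight / (rank_smoothing_factor + r), and that items are matched across the two lists by their 'text' value. The function name sort_rrf_scores_sketch is hypothetical.

```python
from typing import Any, Dict, List, Union


def sort_rrf_scores_sketch(
    vector_retriever_results: List[Dict[str, Any]],
    bm25_retriever_results: List[Dict[str, Any]],
    top_k: int,
    vector_weight: float,
    bm25_weight: float,
    rank_smoothing_factor: float,
) -> List[Dict[str, Union[str, float]]]:
    """Illustrative weighted RRF; an assumption, not CAMEL's actual code."""
    scores: Dict[str, float] = {}
    weighted_lists = (
        (vector_weight, vector_retriever_results),
        (bm25_weight, bm25_retriever_results),
    )
    for weight, results in weighted_lists:
        # Ranks are 1-based: the first result in each list has rank 1.
        for rank, item in enumerate(results, start=1):
            text = item["text"]
            contribution = weight / (rank_smoothing_factor + rank)
            scores[text] = scores.get(text, 0.0) + contribution

    # Highest fused score first, then keep only the top_k entries.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [{"text": text, "rrf_score": score} for text, score in ranked[:top_k]]


# Usage: "b" appears in both lists, so its two contributions add up.
vector_hits = [{"text": "a"}, {"text": "b"}, {"text": "c"}]
bm25_hits = [{"text": "b"}, {"text": "d"}]
fused = sort_rrf_scores_sketch(vector_hits, bm25_hits, 2, 0.8, 0.2, 60)
print([entry["text"] for entry in fused])
```

In this sketch a larger rank_smoothing_factor flattens the difference between neighboring ranks, so appearing in both result lists matters more than holding the top position in one of them.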

query

def query(
    self,
    query: str,
    top_k: int = 20,
    vector_weight: float = 0.8,
    bm25_weight: float = 0.2,
    rank_smoothing_factor: int = 60,
    vector_retriever_top_k: int = 50,
    vector_retriever_similarity_threshold: float = 0.5,
    bm25_retriever_top_k: int = 50,
    return_detailed_info: bool = False
):

Executes a hybrid retrieval query using both the vector and BM25 retrievers.

Parameters:

query (str): The search query.
top_k (int): Number of top results to return after fusion. Defaults to 20.
vector_weight (float): Weight for the vector retriever results in the RRF calculation. Defaults to 0.8.
bm25_weight (float): Weight for the BM25 retriever results in the RRF calculation. Defaults to 0.2.
rank_smoothing_factor (int): RRF hyperparameter for rank smoothing. Defaults to 60.
vector_retriever_top_k (int): Number of top results to fetch from the vector retriever. Defaults to 50.
vector_retriever_similarity_threshold (float): Similarity threshold for the vector retriever. Defaults to 0.5.
bm25_retriever_top_k (int): Number of top results to fetch from the BM25 retriever. Defaults to 50.
return_detailed_info (bool): If True, return detailed information. Defaults to False.

Returns: Union[dict[str, Sequence[Collection[str]]], dict[str, Sequence[Union[str, float]]]]: By default, returns only the text information. If return_detailed_info is True, returns detailed information including the RRF scores.
