camel.retrievers.hybrid_retrival
HybridRetriever
__init__
Initializes the HybridRetriever with optional embedding model and vector storage.
Parameters:
- embedding_model (Optional[BaseEmbedding]): An optional embedding model used by the VectorRetriever. Defaults to None.
- vector_storage (Optional[BaseVectorStorage]): An optional vector storage used by the VectorRetriever. Defaults to None.
process
Processes the content input path for both vector and BM25 retrievers.
Parameters:
- content_input_path (str): File path or URL of the content to be processed.
_sort_rrf_scores
Sorts and combines results from vector and BM25 retrievers using Reciprocal Rank Fusion (RRF).
Parameters:
- vector_retriever_results: A list of dictionaries containing the results from the vector retriever, where each dictionary contains a ‘text’ entry.
- bm25_retriever_results: A list of dictionaries containing the results from the BM25 retriever, where each dictionary contains a ‘text’ entry.
- top_k: The number of top results to return after sorting by RRF score.
- vector_weight: The weight to assign to the vector retriever results in the RRF calculation.
- bm25_weight: The weight to assign to the BM25 retriever results in the RRF calculation.
- rank_smoothing_factor: A hyperparameter for the RRF calculation that helps smooth the rank positions.
Returns:
List[Dict[str, Union[str, float]]]: A list of dictionaries representing the sorted results. Each dictionary contains the ‘text’ from the retrieved item and its corresponding ‘rrf_score’.
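To make the fusion step concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion. It is not the library's implementation. It assumes ranks start at 1, that each result contributes weight / (rank_smoothing_factor + rank), and that results with the same ‘text’ are merged by summing their contributions. The function name sort_rrf_scores and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Union


def sort_rrf_scores(
    vector_retriever_results: List[Dict[str, str]],
    bm25_retriever_results: List[Dict[str, str]],
    top_k: int,
    vector_weight: float,
    bm25_weight: float,
    rank_smoothing_factor: float,
) -> List[Dict[str, Union[str, float]]]:
    """Fuse two ranked result lists with weighted RRF (illustrative sketch)."""
    scores: Dict[str, float] = defaultdict(float)
    weighted_lists = (
        (vector_weight, vector_retriever_results),
        (bm25_weight, bm25_retriever_results),
    )
    for weight, results in weighted_lists:
        # Assumption: ranks are 1-based; the real library may count from 0.
        for rank, item in enumerate(results, start=1):
            scores[item["text"]] += weight / (rank_smoothing_factor + rank)

    # Highest fused score first, then keep the top_k entries.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [{"text": text, "rrf_score": score} for text, score in ranked[:top_k]]


# Hypothetical sample data: "b" appears in both lists, so it accumulates
# a contribution from each retriever.
vector_hits = [{"text": "a"}, {"text": "b"}, {"text": "c"}]
bm25_hits = [{"text": "b"}, {"text": "d"}]

fused = sort_rrf_scores(
    vector_hits,
    bm25_hits,
    top_k=3,
    vector_weight=1.0,
    bm25_weight=1.0,
    rank_smoothing_factor=60,
)
for entry in fused:
    print(entry["text"], round(entry["rrf_score"], 6))
```

A larger rank_smoothing_factor flattens the gap between adjacent ranks, so agreement between the two retrievers matters more than a top position in either one alone.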
query
Executes a hybrid retrieval query using both vector and BM25 retrievers.
Parameters:
- query (str): The search query.
- top_k (int): Number of top results to return (default 20).
- vector_weight (float): Weight for vector retriever results in RRF.
- bm25_weight (float): Weight for BM25 retriever results in RRF.
- rank_smoothing_factor (int): RRF hyperparameter for rank smoothing.
- vector_retriever_top_k (int): Number of top results to fetch from the vector retriever.
- vector_retriever_similarity_threshold (float): Similarity threshold applied to the vector retriever's results.
- bm25_retriever_top_k (int): Number of top results to fetch from the BM25 retriever.
- return_detailed_info (bool): If True, returns detailed information, including RRF scores.
Returns:
Union[dict[str, Sequence[Collection[str]]], dict[str, Sequence[Union[str, float]]]]: By default, returns only the text information. If return_detailed_info is True, returns detailed information including RRF scores.