Sorts and combines results from vector and BM25 retrievers using
Reciprocal Rank Fusion (RRF).Parameters:
vector_retriever_results: A list of dictionaries containing the results from the vector retriever, where each dictionary contains a ‘text’ entry.
bm25_retriever_results: A list of dictionaries containing the results from the BM25 retriever, where each dictionary contains a ‘text’ entry.
top_k: The number of top results to return after sorting by RRF score.
vector_weight: The weight to assign to the vector retriever results in the RRF calculation.
bm25_weight: The weight to assign to the BM25 retriever results in the RRF calculation.
rank_smoothing_factor: A hyperparameter for the RRF calculation that helps smooth the rank positions.
Returns:List[Dict[str, Union[str, float]]]: A list of dictionaries
representing the sorted results. Each dictionary contains the
‘text’from the retrieved items and their corresponding ‘rrf_score’.
def query( self, query: str, top_k: int = 20, vector_weight: float = 0.8, bm25_weight: float = 0.2, rank_smoothing_factor: int = 60, vector_retriever_top_k: int = 50, vector_retriever_similarity_threshold: float = 0.5, bm25_retriever_top_k: int = 50, return_detailed_info: bool = False):
Executes a hybrid retrieval query using both vector and BM25
retrievers.Parameters:
query (str): The search query.
top_k (int): Number of top results to return (default 20).
vector_weight (float): Weight for vector retriever results in RRF.
bm25_weight (float): Weight for BM25 retriever results in RRF.
rank_smoothing_factor (int): RRF hyperparameter for rank smoothing.
vector_retriever_top_k (int): Top results from vector retriever.
vector_retriever_similarity_threshold (float): Similarity threshold for vector retriever.
bm25_retriever_top_k (int): Top results from BM25 retriever.
return_detailed_info (bool): Return detailed info if True.
Returns:Union[
dict[str, Sequence[Collection[str]]],
dict[str, Sequence[Union[str, float]]]
]: By default, returns only the text information. If
return_detailed_info is True, return detailed information
including rrf scores.