HybridRetriever
__init__
- embedding_model (Optional[BaseEmbedding]): An optional embedding model used by the VectorRetriever. Defaults to None.
- vector_storage (Optional[BaseVectorStorage]): An optional vector storage used by the VectorRetriever. Defaults to None.
process
- content_input_path (str): File path or URL of the content to be processed.
_sort_rrf_scores
- vector_retriever_results: A list of dictionaries containing the results from the vector retriever, where each dictionary contains a 'text' entry.
- bm25_retriever_results: A list of dictionaries containing the results from the BM25 retriever, where each dictionary contains a 'text' entry.
- top_k: The number of top results to return after sorting by RRF score.
- vector_weight: The weight to assign to the vector retriever results in the RRF calculation.
- bm25_weight: The weight to assign to the BM25 retriever results in the RRF calculation.
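- rank_smoothing_factor: A hyperparameter for the RRF calculation that smooths the rank positions. In standard RRF it is the constant added to each rank in the denominator, so larger values narrow the score gap between top-ranked and lower-ranked results.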
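The parameters above map onto a weighted form of Reciprocal Rank Fusion. The following is a minimal sketch under stated assumptions, not the library's actual implementation: the function name `sort_rrf_scores_sketch`, the `rrf_score` output key, and the exact scoring rule (`weight / (rank_smoothing_factor + rank)`, with ranks starting at 1) are all illustrative choices. Results are matched across the two lists by their 'text' entry.

```python
from collections import defaultdict
from typing import Any, Dict, List


def sort_rrf_scores_sketch(
    vector_retriever_results: List[Dict[str, Any]],
    bm25_retriever_results: List[Dict[str, Any]],
    top_k: int,
    vector_weight: float,
    bm25_weight: float,
    rank_smoothing_factor: int,
) -> List[Dict[str, Any]]:
    """Hypothetical weighted RRF; assumes each input list is already
    ordered from best to worst match."""
    scores: Dict[str, float] = defaultdict(float)
    for weight, results in (
        (vector_weight, vector_retriever_results),
        (bm25_weight, bm25_retriever_results),
    ):
        # Ranks start at 1, so the best hit contributes
        # weight / (rank_smoothing_factor + 1).
        for rank, result in enumerate(results, start=1):
            scores[result["text"]] += weight / (rank_smoothing_factor + rank)

    # Highest fused score first, then keep only the top_k entries.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [{"text": text, "rrf_score": score} for text, score in ranked[:top_k]]


fused = sort_rrf_scores_sketch(
    vector_retriever_results=[{"text": "a"}, {"text": "b"}],
    bm25_retriever_results=[{"text": "b"}, {"text": "c"}],
    top_k=2,
    vector_weight=0.5,
    bm25_weight=0.5,
    rank_smoothing_factor=60,
)
```

In this sketch, a passage returned by both retrievers accumulates a contribution from each list, so it tends to outrank passages found by only one of them.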
query
- query (str): The search query.
- top_k (int): Number of top results to return (default 20).
- vector_weight (float): Weight for vector retriever results in RRF.
- bm25_weight (float): Weight for BM25 retriever results in RRF.
- rank_smoothing_factor (int): RRF hyperparameter for rank smoothing.
- vector_retriever_top_k (int): Number of top results to fetch from the vector retriever.
- vector_retriever_similarity_threshold (float): Similarity threshold for the vector retriever.
- bm25_retriever_top_k (int): Number of top results to fetch from the BM25 retriever.
- return_detailed_info (bool): Return detailed info if True.
If return_detailed_info is True, query returns detailed information, including RRF scores.