BM25Retriever
Bases: BaseRetriever
An implementation of BaseRetriever using the BM25 model.
This class retrieves relevant information using a query-based approach: it ranks documents based on the occurrence and frequency of the query terms.
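For reference, the standard Okapi BM25 score, on which the BM25Okapi class from rank_bm25 is based, is shown below. Here f(q, D) is the frequency of term q in document D, |D| is the length of D in tokens, avgdl is the average document length over the corpus, N is the number of documents, and n(q) is the number of documents containing q. The constants k_1 and b control term-frequency saturation and length normalization. As far as we know, rank_bm25 defaults to k_1 = 1.5 and b = 0.75 and additionally floors negative IDF values, but treat those details as assumptions and check the library source before relying on them.

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q)\,
  \frac{f(q, D)\,(k_1 + 1)}
       {f(q, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q) = \ln\frac{N - n(q) + 0.5}{n(q) + 0.5}
```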
Attributes:
- bm25 (BM25Okapi): An instance of the BM25Okapi class used for calculating document scores.
- content_input_path (str): The path to the content that has been processed and stored.
- unstructured_modules (UnstructuredIO): A module for parsing files and URLs and chunking content based on specified parameters.
References:
- https://github.com/dorianbrown/rank_bm25
__init__()
Initializes the BM25Retriever.
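process(content_input_path, chunk_type="chunk_by_title", **kwargs)
Processes content from a file or URL, divides it into chunks using Unstructured IO, and then stores the chunks internally. This method must be called before executing queries with the retriever.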
Parameters:
- content_input_path (str): File path or URL of the content to be processed.
- chunk_type (str): Type of chunking to apply. Defaults to "chunk_by_title".
- **kwargs (Any): Additional keyword arguments for content parsing.
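query(query, top_k=DEFAULT_TOP_K_RESULTS)
Executes a query against the processed content and returns the top-ranked results.
Parameters:
- query (str): Query string for information retrieval.
- top_k (int, optional): The number of top results to return. Must be a positive integer. Defaults to DEFAULT_TOP_K_RESULTS.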
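The process-then-query contract described above can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical: the class name SimpleBM25Retriever, the regex tokenizer, and the result keys "score" and "text" are illustrative assumptions, not the library's API. Its process() takes a list of already-chunked strings rather than a file path or URL, so no Unstructured IO parsing is involved. Scoring follows Okapi BM25 with k1 = 1.5 and b = 0.75, but uses the non-negative IDF variant ln((N - n + 0.5) / (n + 0.5) + 1) for simplicity, so its scores will not match BM25Okapi exactly.

```python
import math
import re
from collections import Counter
from typing import Any, Dict, List


class SimpleBM25Retriever:
    """Hypothetical, dependency-free BM25 retriever sketch (not the real class)."""

    def __init__(self, k1: float = 1.5, b: float = 0.75) -> None:
        self.k1 = k1  # term-frequency saturation
        self.b = b  # strength of document-length normalization
        self._processed = False
        self.chunks: List[str] = []
        self.term_freqs: List[Counter] = []
        self.doc_lens: List[int] = []
        self.idf: Dict[str, float] = {}
        self.avgdl = 0.0

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        # Assumption: lowercase word tokens; real pipelines may stem or drop stopwords.
        return re.findall(r"\w+", text.lower())

    def process(self, chunks: List[str]) -> None:
        """Index pre-chunked text (stands in for parsing and chunking a file or URL)."""
        self.chunks = list(chunks)
        tokenized = [self._tokenize(c) for c in self.chunks]
        self.term_freqs = [Counter(toks) for toks in tokenized]
        self.doc_lens = [len(toks) for toks in tokenized]
        n_docs = len(tokenized)
        self.avgdl = sum(self.doc_lens) / n_docs if n_docs else 0.0
        # Document frequency: number of chunks containing each term.
        df = Counter(term for tf in self.term_freqs for term in tf)
        # Non-negative IDF variant ("+ 1" inside the log), chosen for simplicity.
        self.idf = {
            term: math.log((n_docs - n + 0.5) / (n + 0.5) + 1)
            for term, n in df.items()
        }
        self._processed = True

    def query(self, query: str, top_k: int = 1) -> List[Dict[str, Any]]:
        """Return the top_k chunks ranked by BM25 score against the query."""
        if not self._processed:
            raise RuntimeError("process() must be called before query()")
        if not isinstance(top_k, int) or top_k <= 0:
            raise ValueError("top_k must be a positive integer")
        terms = self._tokenize(query)
        results = []
        for chunk, tf, dl in zip(self.chunks, self.term_freqs, self.doc_lens):
            score = 0.0
            for term in terms:
                f = tf.get(term, 0)
                if f == 0:
                    continue  # term absent from this chunk: contributes nothing
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                score += self.idf[term] * f * (self.k1 + 1) / norm
            results.append({"score": score, "text": chunk})
        # Python's sort is stable, so equal scores keep their original order.
        results.sort(key=lambda r: r["score"], reverse=True)
        return results[:top_k]


retriever = SimpleBM25Retriever()
retriever.process([
    "BM25 ranks documents by query term frequency.",
    "Unstructured IO parses files and URLs.",
    "Cats sleep most of the day.",
])
print(retriever.query("query term frequency", top_k=1)[0]["text"])
# prints "BM25 ranks documents by query term frequency."
```

The top_k check mirrors the documented requirement that top_k be a positive integer, and refusing to answer before process() has run mirrors the requirement that content be processed before any query.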