JinaURLReader

class JinaURLReader:

URL Reader provided by Jina AI. Its output is cleaner and more LLM-friendly than that of the UnstructuredIO URL Reader, and it can be configured to replace the UnstructuredIO URL Reader in the pipeline.

Parameters:

  • api_key (Optional[str], optional): The API key for Jina AI. If not provided, the reader will have a lower rate limit. Defaults to None.
  • return_format (JinaReturnFormat, optional): The level of detail of the returned content, which is optimized for LLMs. Screenshots are not currently supported. Defaults to JinaReturnFormat.DEFAULT.
  • json_response (bool, optional): Whether to return the response in JSON format. Defaults to False.
  • timeout (int, optional): The maximum time in seconds to wait for the page to be rendered. Defaults to 30.
  • **kwargs (Any): Additional keyword arguments, including proxies, cookies, etc. They should match the HTTP header field and value pairs listed in the reference.

References:

  • https://jina.ai/reader

__init__

def __init__(
    self,
    api_key: Optional[str] = None,
    return_format: JinaReturnFormat = JinaReturnFormat.DEFAULT,
    json_response: bool = False,
    timeout: int = 30,
    **kwargs: Any
):
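
The parameter descriptions say the extra keyword arguments should line up with HTTP header fields. The sketch below illustrates one way the constructor arguments could be translated into request headers. It is not the library's implementation: `build_reader_headers` is a hypothetical helper, and the header names (`Authorization`, `X-Return-Format`, `Accept`, `X-Timeout`, `X-Proxy-Url`) are assumptions based on the public Jina Reader documentation that should be checked against the reference.

```python
from typing import Any, Dict, Optional


def build_reader_headers(
    api_key: Optional[str] = None,
    return_format: Optional[str] = None,
    json_response: bool = False,
    timeout: int = 30,
    **kwargs: Any,
) -> Dict[str, str]:
    """Hypothetical helper: map reader options to HTTP headers."""
    headers: Dict[str, str] = {}
    if api_key:
        # Assumed: the API key is sent as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    if return_format:
        # Assumed header name for the requested output format.
        headers["X-Return-Format"] = return_format
    if json_response:
        # Ask for a JSON body instead of plain text.
        headers["Accept"] = "application/json"
    # Assumed header name for the page-rendering timeout, in seconds.
    headers["X-Timeout"] = str(timeout)
    # Extra keyword arguments are passed through as header/value pairs.
    headers.update({name: str(value) for name, value in kwargs.items()})
    return headers


# Header names containing hyphens can be supplied by unpacking a dict.
example = build_reader_headers(
    api_key="my-key",
    return_format="markdown",
    **{"X-Proxy-Url": "http://proxy.example:8080"},
)
```

Passing the extras through unchanged keeps the helper open to any header the service accepts, at the cost of not validating the names.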

read_content

def read_content(self, url: str):

Reads the content of a URL and returns it as a string in the configured return format.

Parameters:

  • url (str): The URL to read.

Returns:

str: The content of the URL.
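
As a usage illustration, the sketch below calls `read_content` through a small wrapper. `fetch_preview` and the `URLReader` protocol are hypothetical names introduced here; the only thing taken from this page is that a reader exposes `read_content(url)` and returns a string. The import path in the trailing comment is an assumption and may differ in your installation.

```python
from typing import Protocol


class URLReader(Protocol):
    """Any object exposing the documented read_content method."""

    def read_content(self, url: str) -> str: ...


def fetch_preview(reader: URLReader, url: str, max_chars: int = 200) -> str:
    """Hypothetical helper: read a page and keep only its first characters."""
    content = reader.read_content(url)
    return content[:max_chars]


# With the real class (import path assumed; requires network access):
# from camel.loaders import JinaURLReader
# reader = JinaURLReader(timeout=30)
# print(fetch_preview(reader, "https://example.com"))
```

Typing the helper against a protocol rather than the concrete class makes it easy to substitute a stub reader in tests.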