Firecrawl

class Firecrawl:

Firecrawl allows you to turn entire websites into LLM-ready markdown.

Parameters:

  • api_key (Optional[str]): API key for authenticating with the Firecrawl API.
  • api_url (Optional[str]): Base URL for the Firecrawl API.

Reference: https://docs.firecrawl.dev/introduction

__init__

def __init__(
    self,
    api_key: Optional[str] = None,
    api_url: Optional[str] = None
):

crawl

def crawl(
    self,
    url: str,
    params: Optional[Dict[str, Any]] = None,
    **kwargs: Any
):

Crawl a URL and all accessible subpages. Customize the crawl by setting different parameters, and receive the full response or a job ID based on the specified options.

Parameters:

  • url (str): The URL to crawl.
  • params (Optional[Dict[str, Any]]): Additional parameters for the crawl request. Defaults to None.
  • **kwargs (Any): Additional keyword arguments, such as poll_interval and idempotency_key.

Returns:

Any: The crawl results if the call waits until completion; otherwise the crawl job ID.
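
As a rough illustration of how params and the extra keyword arguments travel through crawl, here is a minimal sketch. RecordingClient and start_crawl are hypothetical names introduced only for this example, the "limit" option is an assumed parameter name, and the returned job ID is a placeholder rather than a real API response.

```python
from typing import Any, Dict, List, Optional, Tuple


class RecordingClient:
    """Hypothetical stand-in exposing the crawl() signature documented above."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Optional[Dict[str, Any]], Dict[str, Any]]] = []

    def crawl(
        self,
        url: str,
        params: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Any:
        # Record the arguments instead of contacting the API.
        self.calls.append((url, params, kwargs))
        return "job-123"  # placeholder job ID, not a real response


def start_crawl(
    client: Any,
    url: str,
    page_limit: int = 10,
    poll_interval: int = 2,
) -> Any:
    """Start a crawl with a small params dict and a polling interval."""
    # Assumption: "limit" is an illustrative option name, not confirmed here.
    params = {"limit": page_limit}
    # poll_interval is one of the keyword arguments named in the parameter list.
    return client.crawl(url, params=params, poll_interval=poll_interval)
```

With a real Firecrawl instance in place of the stand-in, the same call shape would apply; which options are accepted in params depends on the Firecrawl API version in use.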

check_crawl_job

def check_crawl_job(self, job_id: str):

Check the status of a crawl job.

Parameters:

  • job_id (str): The ID of the crawl job.

Returns:

Dict: The response including status of the crawl job.
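
When crawl returns a job ID, the status check above can be wrapped in a simple polling loop. The sketch below is one way to do that under stated assumptions: wait_for_crawl is a hypothetical helper, and the "status" key and the "completed" and "failed" values are assumed names for the response fields, which this reference does not spell out.

```python
import time
from typing import Any, Callable, Dict, Tuple


def wait_for_crawl(
    check: Callable[[str], Dict[str, Any]],
    job_id: str,
    poll_interval: float = 2.0,
    timeout: float = 60.0,
    done_states: Tuple[str, ...] = ("completed", "failed"),
) -> Dict[str, Any]:
    """Poll `check(job_id)` until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        response = check(job_id)
        # Assumption: the response dict carries its state under "status".
        if response.get("status") in done_states:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"crawl job {job_id} still running after {timeout}s"
            )
        time.sleep(poll_interval)
```

A call such as wait_for_crawl(firecrawl.check_crawl_job, job_id) would pass the bound method as the check callable. Passing a callable rather than the client keeps the helper easy to exercise without network access.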

scrape

def scrape(self, url: str, params: Optional[Dict[str, Any]] = None):

Scrape a single URL. Advanced scraping options can be set through params, and the full scraped data is returned as a dictionary.

Reference: https://docs.firecrawl.dev/advanced-scraping-guide

Parameters:

  • url (str): The URL to read.
  • params (Optional[Dict[str, Any]]): Additional parameters for the scrape request.

Returns:

Dict: The scraped data.
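
The layout of the returned dictionary is not described here, so the following sketch only shows a defensive way to read it. summarize_scrape is a hypothetical helper, and the "markdown" and "metadata" keys (with a "title" inside metadata) are assumptions about the response shape.

```python
from typing import Any, Dict


def summarize_scrape(scraped: Dict[str, Any], preview_chars: int = 80) -> Dict[str, Any]:
    """Build a short summary from a scrape result without assuming keys exist."""
    # Assumption: page text lives under "markdown"; fall back to an empty string.
    markdown = scraped.get("markdown") or ""
    # Assumption: page metadata lives under "metadata" and may include "title".
    metadata = scraped.get("metadata") or {}
    return {
        "title": metadata.get("title", ""),
        "chars": len(markdown),
        "preview": markdown[:preview_chars],
    }
```

Using .get with fallbacks means a missing key yields an empty summary field instead of a KeyError, which is convenient while the actual response shape is still being confirmed.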

structured_scrape

def structured_scrape(self, url: str, response_format: BaseModel):

Use an LLM to extract structured data from the given URL.

Parameters:

  • url (str): The URL to read.
  • response_format (BaseModel): A Pydantic model whose field types and descriptions define the expected output format. The LLM uses this schema to generate the structured response.

Returns:

Dict: The structured content extracted from the URL.

map_site

def map_site(self, url: str, params: Optional[Dict[str, Any]] = None):

Map a website to retrieve all accessible URLs.

Parameters:

  • url (str): The URL of the site to map.
  • params (Optional[Dict[str, Any]]): Additional parameters for the map request. Defaults to None.

Returns:

list: A list containing the URLs found on the site.
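
A mapped URL list is often narrowed before scraping. The sketch below shows one possible post-processing step using only the standard library; filter_urls is a hypothetical helper, not part of the class, and it assumes the list holds absolute URLs as plain strings.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def filter_urls(urls: Iterable[str], host: str, path_prefix: str = "/") -> List[str]:
    """Keep unique URLs on `host` whose path starts with `path_prefix`."""
    kept: List[str] = []
    seen = set()
    for url in urls:
        parsed = urlparse(url)
        # Skip links that point to a different host.
        if parsed.netloc != host:
            continue
        # Treat an empty path as the site root.
        if not (parsed.path or "/").startswith(path_prefix):
            continue
        # Preserve first-seen order while dropping duplicates.
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

For example, filter_urls(firecrawl.map_site("https://example.com"), "example.com", "/docs/") would keep only the documentation pages, assuming map_site returns strings as described above.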