Document extraction service supporting OCR, formula recognition, and tables.
Parameters:
api_key (str, optional): Authentication key for the MinerU API service. If not provided, the MINERU_API_KEY environment variable is used. (default: None)
api_url (str, optional): Base URL endpoint for the MinerU API service. (default: "https://mineru.net/api/v4")
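The fallback described for api_key can be sketched as follows. This is only an illustration of the documented behavior, not the service's actual code: the helper name resolve_credentials and the choice to raise ValueError when no key is found are assumptions.

```python
import os

# Default taken from the parameter description above.
DEFAULT_API_URL = "https://mineru.net/api/v4"


def resolve_credentials(api_key=None, api_url=DEFAULT_API_URL):
    """Hypothetical helper: pick the explicit key, else MINERU_API_KEY."""
    key = api_key or os.environ.get("MINERU_API_KEY")
    if not key:
        # Assumption: a missing key is treated as an error up front.
        raise ValueError("Pass api_key or set the MINERU_API_KEY environment variable")
    # Strip a trailing slash so endpoint paths can be appended consistently.
    return key, api_url.rstrip("/")
```

An explicitly passed key takes precedence over the environment variable, which matches the "if not provided" wording above.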
Note:
Single file size limit: 200MB
Page limit per file: 600 pages
Daily high-priority parsing quota: 2000 pages
Some URLs (for example, GitHub or AWS) may time out due to network restrictions
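A client could check the per-file limits above before submitting a document. The sketch below is a hypothetical pre-flight check, not part of the service; it assumes 1 MB means 1024 * 1024 bytes, which the note does not specify.

```python
# Limits copied from the note above; the byte conversion is an assumption.
MAX_FILE_BYTES = 200 * 1024 * 1024
MAX_PAGES = 600


def within_limits(size_bytes: int, page_count: int) -> bool:
    """Hypothetical check: True if a file fits the documented per-file limits."""
    return size_bytes <= MAX_FILE_BYTES and page_count <= MAX_PAGES
```

The daily high-priority quota is tracked per account rather than per file, so it is deliberately left out of this check.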
Extract content from multiple document URLs in batch.
Parameters:
files (List[Dict[str, Union[str, bool]]]): List of document configurations. Each document requires 'url' and optionally 'is_ocr' and 'data_id' parameters.
Returns:
str: Batch identifier for tracking extraction progress.
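As an illustration of the expected shape of files, the sketch below builds a small list and validates it before submission. The validate_files helper and the example.com URLs are hypothetical, and rejecting keys other than 'url', 'is_ocr', and 'data_id' is an assumption rather than documented behavior.

```python
from typing import Dict, List, Union

FileConfig = Dict[str, Union[str, bool]]

# Keys named in the parameter description above.
ALLOWED_KEYS = {"url", "is_ocr", "data_id"}


def validate_files(files: List[FileConfig]) -> List[FileConfig]:
    """Hypothetical helper: check each entry before a batch request."""
    for i, cfg in enumerate(files):
        url = cfg.get("url")
        if not isinstance(url, str) or not url:
            raise ValueError(f"files[{i}] is missing the required 'url'")
        unknown = set(cfg) - ALLOWED_KEYS
        if unknown:
            # Assumption: unexpected keys are treated as mistakes.
            raise ValueError(f"files[{i}] has unexpected keys: {sorted(unknown)}")
    return files


# Placeholder URLs; 'is_ocr' and 'data_id' are optional per entry.
files = [
    {"url": "https://example.com/report.pdf", "is_ocr": True, "data_id": "report-1"},
    {"url": "https://example.com/paper.pdf"},
]
validate_files(files)
```

The validated list is what would be passed as the files argument; the returned batch identifier is then used to track extraction progress.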