Toolkit for extracting and processing document content using the MinerU API.

Provides comprehensive document processing capabilities, including content
extraction from URLs and files, with support for OCR, formula recognition,
and table detection through the MinerU API service.

Note:
Maximum file size: 200MB per file
Maximum pages: 600 pages per file
Daily quota: 2000 pages for high-priority parsing
Network restrictions may affect certain URLs (e.g., GitHub, AWS)
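
The limits above can be checked locally before a file is submitted. The following is a minimal sketch, not part of the toolkit: `check_limits` and its parameters are hypothetical names, 200MB is assumed to mean 200 * 1024 * 1024 bytes, and the caller is assumed to already know the page count and how many pages have been used today.

```python
from typing import List

# Limits taken from the note above. Treating "MB" as binary megabytes
# is an assumption; the service may count bytes differently.
MAX_FILE_BYTES = 200 * 1024 * 1024
MAX_PAGES = 600
DAILY_PRIORITY_PAGES = 2000


def check_limits(
    file_size_bytes: int,
    page_count: int,
    pages_used_today: int = 0,
) -> List[str]:
    """Return a list of human-readable limit violations (empty if none).

    Hypothetical helper for illustration only.
    """
    problems = []
    if file_size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 200MB size limit")
    if page_count > MAX_PAGES:
        problems.append("file exceeds the 600-page limit")
    if pages_used_today + page_count > DAILY_PRIORITY_PAGES:
        problems.append("request would exceed the daily high-priority quota")
    return problems


# A 10MB, 100-page file with no prior usage passes every check.
print(check_limits(10 * 1024 * 1024, 100))
```

Returning a list of problems rather than raising on the first one lets a caller report all violations at once.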

Retrieve current status of an individual extraction task.

Parameters:
task_id (str): Unique identifier for the extraction task to check.

Returns:
Dict: Status information and results (if task is completed) for
the specified task.

Note:
This is a low-level status checking method. For most use cases,
prefer using extract_from_url with wait=True for automatic
completion handling.
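
When the low-level status check is used directly, the caller has to poll until the task finishes. The sketch below illustrates one way to do that under stated assumptions: `poll_until_finished` is a hypothetical helper, `get_status` stands for whatever callable performs the status check for a task ID, and the returned dict is assumed to carry a `state` key whose terminal values are `"done"` and `"failed"`. Verify the real field names against the service response before relying on them.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal state names; not confirmed by this documentation.
TERMINAL_STATES = {"done", "failed"}


def poll_until_finished(
    get_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status(task_id) repeatedly until a terminal state appears.

    Raises TimeoutError if no terminal state is seen before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)
        if status.get("state") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Task {task_id} did not finish within {timeout} seconds"
            )
        sleep(interval)
```

Passing `sleep` as a parameter keeps the helper testable without real delays. In most cases `extract_from_url` with `wait=True` makes a loop like this unnecessary.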

Retrieve current status of a batch extraction task.

Parameters:
batch_id (str): Unique identifier for the batch extraction task to check.

Returns:
Dict: Comprehensive status information and results for all files
in the batch task.

Note:
This is a low-level status checking method. For most use cases,
prefer using batch_extract_from_urls with wait=True for automatic
completion handling.
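
Because a batch status covers several files, it is often useful to reduce it to a count per state. The sketch below assumes a response shape that this documentation does not confirm: a list of per-file dicts stored under an `extract_result` key, each with a `state` field. Both names, and the `summarize_batch` helper itself, are hypothetical and should be adjusted to the actual response.

```python
from collections import Counter
from typing import Any, Dict


def summarize_batch(
    status: Dict[str, Any],
    results_key: str = "extract_result",  # assumed key name
) -> Dict[str, int]:
    """Count files per state in a batch status dict.

    Files without a "state" field are counted as "unknown".
    """
    files = status.get(results_key, [])
    return dict(Counter(item.get("state", "unknown") for item in files))


# Illustrative input only; not real service output.
sample = {
    "extract_result": [
        {"file_name": "a.pdf", "state": "done"},
        {"file_name": "b.pdf", "state": "failed"},
        {"file_name": "c.pdf", "state": "done"},
    ]
}
print(summarize_batch(sample))
```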