> ## Documentation Index
> Fetch the complete documentation index at: https://docs.camel-ai.org/llms.txt
> Use this file to discover all available pages before exploring further.

# Camel.loaders.mineru extractor

<a id="camel.loaders.mineru_extractor" />

<a id="camel.loaders.mineru_extractor.MinerU" />

## MinerU

```python theme={"system"}
class MinerU:
```

Document extraction service supporting OCR, formula recognition
and tables.

**Parameters:**

* **api\_key** (str, optional): Authentication key for MinerU API service. If not provided, will use MINERU\_API\_KEY environment variable. (default: :obj:`None`)
* **api\_url** (str, optional): Base URL endpoint for the MinerU API service. (default: :obj:`"https://mineru.net/api/v4"`)

**Note:**

* Single file size limit: 200MB
* Page limit per file: 600 pages
* Daily high-priority parsing quota: 2000 pages
* Some URLs (GitHub, AWS) may timeout due to network restrictions

<a id="camel.loaders.mineru_extractor.MinerU.__init__" />

### **init**

```python theme={"system"}
def __init__(
    self,
    api_key: Optional[str] = None,
    api_url: Optional[str] = 'https://mineru.net/api/v4',
    is_ocr: bool = False,
    enable_formula: bool = False,
    enable_table: bool = True,
    layout_model: str = 'doclayout_yolo',
    language: str = 'en'
):
```

Initialize MinerU extractor.

**Parameters:**

* **api\_key** (str, optional): Authentication key for MinerU API service. If not provided, will use MINERU\_API\_KEY environment variable.
* **api\_url** (str, optional): Base URL endpoint for MinerU API service. (default: "[https://mineru.net/api/v4](https://mineru.net/api/v4)")
* **is\_ocr** (bool, optional): Enable optical character recognition. (default: :obj:`False`)
* **enable\_formula** (bool, optional): Enable formula recognition. (default: :obj:`False`)
* **enable\_table** (bool, optional): Enable table detection, extraction. (default: :obj:`True`)
* **layout\_model** (str, optional): Model for document layout detection. Options are 'doclayout\_yolo' or 'layoutlmv3'. (default: :obj:`"doclayout_yolo"`)
* **language** (str, optional): Primary language of the document. (default: :obj:`"en"`)

<a id="camel.loaders.mineru_extractor.MinerU.extract_url" />

### extract\_url

```python theme={"system"}
def extract_url(self, url: str):
```

Extract content from a URL document.

**Parameters:**

* **url** (str): Document URL to extract content from.

**Returns:**

Dict: Task identifier for tracking extraction progress.

<a id="camel.loaders.mineru_extractor.MinerU.batch_extract_urls" />

### batch\_extract\_urls

```python theme={"system"}
def batch_extract_urls(self, files: List[Dict[str, Union[str, bool]]]):
```

Extract content from multiple document URLs in batch.

**Parameters:**

* **files** (List\[Dict\[str, Union\[str, bool]]]): List of document configurations. Each document requires 'url' and optionally 'is\_ocr' and 'data\_id' parameters.

**Returns:**

str: Batch identifier for tracking extraction progress.

<a id="camel.loaders.mineru_extractor.MinerU.get_task_status" />

### get\_task\_status

```python theme={"system"}
def get_task_status(self, task_id: str):
```

Retrieve status of a single extraction task.

**Parameters:**

* **task\_id** (str): Unique identifier of the extraction task.

**Returns:**

Dict: Current task status and results if completed.

<a id="camel.loaders.mineru_extractor.MinerU.get_batch_status" />

### get\_batch\_status

```python theme={"system"}
def get_batch_status(self, batch_id: str):
```

Retrieve status of a batch extraction task.

**Parameters:**

* **batch\_id** (str): Unique identifier of the batch extraction task.

**Returns:**

Dict: Current status and results for all documents in the batch.

<a id="camel.loaders.mineru_extractor.MinerU.wait_for_completion" />

### wait\_for\_completion

```python theme={"system"}
def wait_for_completion(
    self,
    task_id: str,
    is_batch: bool = False,
    timeout: float = 100,
    check_interval: float = 5
):
```

Monitor task until completion or timeout.

**Parameters:**

* **task\_id** (str): Unique identifier of the task or batch.
* **is\_batch** (bool, optional): Indicates if task is a batch operation. (default: :obj:`False`)
* **timeout** (float, optional): Maximum wait time in seconds. (default: :obj:`100`)
* **check\_interval** (float, optional): Time between status checks in seconds. (default: :obj:`5`)

**Returns:**

Dict: Final task status and extraction results.
