Loaders
What are Loaders?
CAMEL’s Loaders provide flexible ways to ingest and process all kinds of data:
structured files, unstructured text, web content, and even OCR from images.
They power your agent’s ability to interact with the outside world.
Additionally, several data readers were added, including Apify Reader,
Chunkr Reader, Firecrawl Reader, JinaURL Reader, and Mistral Reader, which
enable retrieval of external data for improved data integration and analysis.
Types
Base IO
Handles core file input/output for formats like PDF, DOCX, HTML, and more.
Lets you represent, read, and process structured files.
Unstructured IO
Powerful ETL for parsing, cleaning, extracting, chunking, and staging
unstructured data.
Perfect for RAG pipelines and pre-processing.
Apify Reader
Integrates with Apify to automate web workflows and scraping.
Supports authentication, actor management, and dataset operations via API.
Chunkr Reader
Connects to the Chunkr API for document chunking, segmentation, and OCR.
Handles everything from simple docs to scanned PDFs.
Firecrawl Reader
Converts entire websites into LLM-ready markdown using the Firecrawl API.
Useful for quickly ingesting web content as clean text.
JinaURL Reader
Uses Jina AI’s URL reading service to cleanly extract web content.
Designed for LLM-friendly extraction from any URL.
MarkitDown Reader
Lightweight tool to convert files (HTML, DOCX, PDF, etc.) into Markdown.
Ideal for prepping documents for LLM ingestion or analysis.
Mistral Reader
Integrates Mistral AI’s OCR service for extracting text from images and PDFs.
Supports both local and remote file processing for various formats.
Get Started
Using Base IO
This module is designed to read files of various formats, extract their contents, and represent them as File objects, each tailored to handle a specific file type.
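The idea of one File object per format can be sketched with the standard library alone. The SketchFile classes and the file_from_raw_bytes helper below are hypothetical stand-ins written for illustration, not CAMEL’s Base IO API; they only show how a file extension can select a type-specific parser.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchFile:
    """Hypothetical base class: a named file plus its parsed documents."""
    name: str
    docs: list = field(default_factory=list)

    @classmethod
    def from_bytes(cls, name: str, raw: bytes) -> "SketchFile":
        raise NotImplementedError


class TxtSketchFile(SketchFile):
    @classmethod
    def from_bytes(cls, name, raw):
        # Plain text: one document holding the decoded content.
        return cls(name=name, docs=[{"page_content": raw.decode("utf-8")}])


class JsonSketchFile(SketchFile):
    @classmethod
    def from_bytes(cls, name, raw):
        # JSON: parse first so malformed input fails early, then store it.
        return cls(name=name, docs=[{"page_content": json.loads(raw)}])


# Map each supported extension to the class that knows how to parse it.
PARSERS = {".txt": TxtSketchFile, ".json": JsonSketchFile}


def file_from_raw_bytes(raw: bytes, filename: str) -> SketchFile:
    """Pick a parser from the file extension and build the matching object."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return PARSERS[suffix].from_bytes(filename, raw)


note = file_from_raw_bytes(b"hello loaders", "note.txt")
config = file_from_raw_bytes(b'{"depth": 1}', "config.json")
print(type(note).__name__, note.docs[0]["page_content"])
print(type(config).__name__, config.docs[0]["page_content"])
```

Keeping the extension-to-class table in one place means supporting a new format only requires a new subclass and one extra entry.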
Using Unstructured IO
To get started with the Unstructured IO module, just import and initialize it. You can parse, clean, extract, chunk, and stage data from files or URLs. Here’s how you use it step by step:
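The clean, extract, chunk, and stage stages can be sketched in plain Python to show what each one contributes. Every helper below (clean_text, extract_emails, chunk_by_title, stage_as_records) is a hypothetical stand-in, not a method of the Unstructured IO module, and the all-caps title rule is a simplifying assumption.

```python
import re


def clean_text(text: str) -> str:
    """Replace curly quotes with ASCII ones and collapse extra whitespace."""
    quotes = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
    for curly, plain in quotes.items():
        text = text.replace(curly, plain)
    return re.sub(r"[ \t]+", " ", text).strip()


def extract_emails(text: str) -> list[str]:
    """Pull out email-like tokens with a deliberately simple pattern."""
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)


def chunk_by_title(lines: list[str]) -> list[list[str]]:
    """Start a new chunk at every line that looks like a title (all caps)."""
    chunks: list[list[str]] = []
    for line in lines:
        if line.isupper() or not chunks:
            chunks.append([])
        chunks[-1].append(line)
    return chunks


def stage_as_records(chunks: list[list[str]]) -> list[dict]:
    """Turn chunks into plain dicts, ready to hand to a downstream store."""
    return [{"title": c[0], "text": " ".join(c[1:])} for c in chunks]


raw = (
    "INTRO\n"
    "\u201cCAMEL\u201d   loads data.\n"
    "CONTACT\n"
    "Write to  team@example.com today."
)
lines = [clean_text(line) for line in raw.splitlines()]
emails = extract_emails(" ".join(lines))
records = stage_as_records(chunk_by_title(lines))
print(emails)
print(records)
```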
This guide gets you started with Unstructured IO. For more, see the Unstructured IO Documentation.
Using Apify Reader
Initialize the Apify client, set up the required actors and parameters, and run the actor.
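A minimal sketch of those three steps follows. It assumes the Apify class is importable from camel.loaders, that it exposes run_actor and get_dataset_items with the keyword arguments shown, and that the key is read from an APIFY_API_KEY environment variable; check these against your installed version. The input keys match the apify/website-content-crawler actor, and other actors expect different keys.

```python
import os


def build_run_input(start_url: str, max_pages: int = 1, max_depth: int = 0) -> dict:
    """Build the input for the website-content-crawler actor."""
    return {
        "startUrls": [{"url": start_url}],
        "maxCrawlDepth": max_depth,
        "maxCrawlPages": max_pages,
    }


def run_crawler(start_url: str) -> list:
    """Run the actor and return its dataset items (needs camel-ai and a key)."""
    # Imported lazily so the input builder works without camel-ai installed.
    from camel.loaders import Apify  # assumed import path

    if not os.environ.get("APIFY_API_KEY"):  # assumed variable name
        raise RuntimeError("set APIFY_API_KEY before running an actor")
    apify = Apify()
    result = apify.run_actor(
        actor_id="apify/website-content-crawler",
        run_input=build_run_input(start_url),
    )
    # Assumed: the run result carries the id of the dataset it wrote to.
    return apify.get_dataset_items(dataset_id=result["defaultDatasetId"])


run_input = build_run_input("https://example.com/")
print(run_input)
```

Calling run_crawler("https://example.com/") would then perform the actual run; it is left uncalled here because it needs network access and credentials.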
Using Firecrawl Reader
Firecrawl Reader provides a simple way to turn any website into LLM-ready markdown format. Here’s how you can use it step by step:
Initialize the Firecrawl client and start a crawl
First, create a Firecrawl client and crawl a specific URL.
When the status is “completed”, the content extraction is done and you can retrieve the results.
Retrieve the extracted markdown content
Once finished, access the LLM-ready markdown directly from the response:
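Both steps can be sketched as follows. The sketch assumes a Firecrawl class in camel.loaders with a crawl(url=...) method, and a dict-shaped response with a status field and a data list whose items carry a markdown key; the exact shape depends on the Firecrawl SDK version, so treat the field names as assumptions.

```python
def crawl_site(url: str) -> dict:
    """Start a crawl and return the raw response (needs camel-ai and a key)."""
    # Lazy import so the helper below can be tried without camel-ai installed.
    from camel.loaders import Firecrawl  # assumed import path

    return Firecrawl().crawl(url=url)


def extract_markdown(response: dict) -> list[str]:
    """Return the markdown of every crawled page once the crawl completed."""
    status = response.get("status")
    if status != "completed":
        raise RuntimeError(f"crawl not finished: status={status!r}")
    return [
        page["markdown"]
        for page in response.get("data", [])
        if "markdown" in page
    ]


# Hand-written stand-in for a finished crawl, used only to exercise the helper.
sample_response = {
    "status": "completed",
    "data": [
        {"markdown": "# Home\nWelcome."},
        {"markdown": "# Docs\nStart here."},
    ],
}
pages = extract_markdown(sample_response)
print(pages[0])
```

Checking the status before reading data avoids silently returning an empty list for a crawl that is still running.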
That’s it. With just a couple of lines, you can turn any website into clean markdown, ready for LLM pipelines or further processing.
Using Chunkr Reader
Chunkr Reader allows you to process PDFs (and other docs) in chunks, with built-in OCR and format control.
Below is a basic usage pattern:
Initialize the ChunkrReader and ChunkrReaderConfig, set the file path and chunking options, then submit your task and fetch results:
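The submit-then-fetch flow can be illustrated without the Chunkr service. Everything in this sketch is hypothetical: SketchChunkConfig and FakeChunkClient are in-memory stand-ins, the option names and status strings are invented for illustration, and report.pdf is a placeholder path. The real ChunkrReader and ChunkrReaderConfig may use different fields and method names.

```python
import time
from dataclasses import dataclass


@dataclass
class SketchChunkConfig:
    """Hypothetical options; the real ChunkrReaderConfig fields may differ."""
    chunk_size: int = 512
    ocr: bool = True


class FakeChunkClient:
    """In-memory stand-in for a chunking service, for illustration only."""

    def __init__(self):
        self._polls = {}

    def submit(self, file_path: str, config: SketchChunkConfig) -> str:
        # A real client would upload the file together with these options.
        task_id = f"task-{len(self._polls) + 1}"
        self._polls[task_id] = 0
        return task_id

    def status(self, task_id: str) -> dict:
        # Report "processing" on the first poll and "succeeded" afterwards.
        self._polls[task_id] += 1
        done = self._polls[task_id] > 1
        return {"status": "succeeded" if done else "processing"}


def wait_for_task(client, task_id: str, interval: float = 0.0, max_polls: int = 10) -> dict:
    """Poll until the task leaves the processing state or the budget runs out."""
    for _ in range(max_polls):
        result = client.status(task_id)
        if result["status"] != "processing":
            return result
        time.sleep(interval)
    raise TimeoutError(f"{task_id} still processing after {max_polls} polls")


client = FakeChunkClient()
task_id = client.submit("report.pdf", SketchChunkConfig(chunk_size=256))
final = wait_for_task(client, task_id)
print(task_id, final["status"])
```

Bounding the number of polls keeps a stuck task from blocking the caller indefinitely.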
A successful task returns a chunked structure like this:
Using Jina Reader
Jina Reader provides a convenient interface to extract clean, LLM-friendly content from any URL in a chosen format (like markdown):
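Under the hood, Jina’s public reader service is reached by prefixing the target URL with https://r.jina.ai/ and selecting the output format with an X-Return-Format header. The sketch below builds such a request with the standard library only; build_reader_request is a hypothetical helper, and that CAMEL’s reader wraps the same endpoint is an assumption.

```python
from urllib.request import Request

JINA_READER_ENDPOINT = "https://r.jina.ai/"


def build_reader_request(url: str, return_format: str = "markdown") -> Request:
    """Prepare (but do not send) a request to Jina's reader endpoint."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return Request(
        JINA_READER_ENDPOINT + url,
        headers={"X-Return-Format": return_format},
    )


request = build_reader_request("https://example.com/docs")
print(request.full_url)
# To fetch the content, send it with:
# urllib.request.urlopen(request).read().decode("utf-8")
```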
Using MarkitDown Reader
MarkitDown Reader lets you convert files (like HTML or docs) into LLM-ready markdown with a single line.
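To show the kind of transformation involved, here is a deliberately tiny HTML-to-Markdown converter built on the standard library. It is a hypothetical stand-in, not the MarkitDown Reader itself, and it only handles headings, paragraphs, and list items.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Collect block-level text and prefix it with Markdown markers."""

    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._prefix = None  # marker of the block being read, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIX:
            self._prefix = self.PREFIX[tag]
            self._buf = []

    def handle_data(self, data):
        # Keep text only while inside a supported block element.
        if self._prefix is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in self.PREFIX and self._prefix is not None:
            # Normalize whitespace, then emit one Markdown line per block.
            text = " ".join("".join(self._buf).split())
            if text:
                self.lines.append(self._prefix + text)
            self._prefix = None


def html_to_markdown(html: str) -> str:
    converter = TinyMarkdownConverter()
    converter.feed(html)
    converter.close()
    return "\n".join(converter.lines)


sample = "<h1>Title</h1><p>Hello  <b>world</b></p><ul><li>one</li><li>two</li></ul>"
markdown = html_to_markdown(sample)
print(markdown)
```

A real converter also has to deal with links, tables, nested lists, and non-HTML formats such as DOCX or PDF, which is why a dedicated reader is worth using.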
Example output:
Using Mistral Reader
Mistral Reader offers OCR and text extraction from both PDFs and images, whether local or remote. Just specify the file path or URL:
You can also extract from images or local files:
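The difference between remote and local inputs, and between PDFs and images, shows up in the document payload sent to the OCR service. The sketch below mirrors the document shapes used by Mistral’s OCR API (document_url and image_url, with local files inlined as base64 data URIs). build_ocr_document is a hypothetical helper, and how CAMEL’s reader builds its payload internally is an assumption.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def build_ocr_document(source: str) -> dict:
    """Describe a PDF or image, remote or local, as an OCR document payload."""
    mime, _ = mimetypes.guess_type(source)
    # Images use "image_url"; everything else is treated as a document.
    kind = "image_url" if (mime or "").startswith("image/") else "document_url"
    if source.startswith(("http://", "https://")):
        return {"type": kind, kind: source}
    # Local file: inline the bytes as a base64 data URI.
    encoded = base64.b64encode(Path(source).read_bytes()).decode("ascii")
    return {"type": kind, kind: f"data:{mime or 'application/pdf'};base64,{encoded}"}


remote_payload = build_ocr_document("https://example.com/paper.pdf")

with tempfile.TemporaryDirectory() as tmp:
    local_image = Path(tmp) / "page.png"
    local_image.write_bytes(b"fake")  # placeholder bytes, not a real PNG
    local_payload = build_ocr_document(str(local_image))

print(remote_payload)
print(local_payload["type"])
```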
The response includes structured page data, markdown content, and usage details.