camel.toolkits.markitdown_toolkit

MarkItDownToolkit

class MarkItDownToolkit(BaseToolkit):

A class representing a toolkit for MarkItDown.

__init__

def __init__(self, timeout: Optional[float] = None):

read_files

def read_files(self, file_paths: List[str]):

Reads a list of local files and converts each one to Markdown. Every path in the list is converted independently, and the conversions run in parallel for efficiency. Supported file formats include:

PDF (.pdf)
Microsoft Office: Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx)
EPUB (.epub)
HTML (.html, .htm)
Images (.jpg, .jpeg, .png) for OCR
Audio (.mp3, .wav) for transcription
Text-based formats (.csv, .json, .xml, .txt)
ZIP archives (.zip)

Parameters:

file_paths (List[str]): A list of local file paths to be converted.

Returns: Dict[str, str]: A dictionary where keys are the input file paths and values are the corresponding content in Markdown format. If conversion of a file fails, the value will contain an error message.
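The contract described above (parallel conversion, one dictionary entry per input path, and an error message in place of content when a file fails) can be illustrated with the standard library alone. This is a minimal sketch, not the toolkit's implementation: `read_files_sketch` and `convert_one` are hypothetical names, `convert_one` only wraps plain text rather than performing a real MarkItDown conversion, and the wording of the error message is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def convert_one(path: str) -> str:
    """Hypothetical stand-in for a real converter: wraps plain text as Markdown."""
    text = Path(path).read_text(encoding="utf-8")
    return f"# {Path(path).name}\n\n{text}"


def read_files_sketch(
    file_paths: List[str],
    converter: Callable[[str], str] = convert_one,
    max_workers: int = 4,
) -> Dict[str, str]:
    """Convert files in parallel and map each input path to its result."""

    def safe_convert(path: str) -> str:
        # A failure for one file becomes an error string for that file only,
        # so the other conversions are unaffected.
        try:
            return converter(path)
        except Exception as exc:  # assumed error format, for illustration
            return f"Error converting {path}: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        contents = list(pool.map(safe_convert, file_paths))
    return dict(zip(file_paths, contents))
```

Because failures are reported as values rather than raised, a caller working with a result of this shape has to inspect each value to tell converted content from an error message.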

get_tools

def get_tools(self):

Returns: List[FunctionTool]: A list of FunctionTool objects representing the functions in the toolkit.
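As a small illustration of consuming the list returned by `get_tools`, the helper below collects the name of each tool. It is a sketch under assumptions: `list_tool_names` is a hypothetical helper, and it relies on each returned `FunctionTool` exposing a `get_function_name()` method, which should be checked against the installed CAMEL version.

```python
from typing import Any, List


def list_tool_names(toolkit: Any) -> List[str]:
    """Return the function name of every tool a toolkit exposes.

    Assumes each tool object provides get_function_name().
    """
    return [tool.get_function_name() for tool in toolkit.get_tools()]


# Intended usage, assuming CAMEL and its MarkItDown dependency are installed:
#   from camel.toolkits import MarkItDownToolkit
#   print(list_tool_names(MarkItDownToolkit()))
```

The helper is duck-typed on purpose, so it works with any toolkit object that follows the same `get_tools` convention.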
