MarkItDownToolkit

class MarkItDownToolkit(BaseToolkit):

A toolkit that uses MarkItDown to convert local files to Markdown.

__init__

def __init__(self, timeout: Optional[float] = None):

load_files

def load_files(self, file_paths: List[str]):

Reads a list of local files and converts their content to Markdown.

This method takes a list of local file paths, attempts to convert each file to Markdown, and returns the converted content. Files are converted in parallel for efficiency.

Supported file formats include:

  • PDF (.pdf)
  • Microsoft Office: Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx)
  • EPUB (.epub)
  • HTML (.html, .htm)
  • Images (.jpg, .jpeg, .png) for OCR
  • Audio (.mp3, .wav) for transcription
  • Text-based formats (.csv, .json, .xml, .txt)
  • ZIP archives (.zip)
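The extension list above can be used as a pre-check before calling load_files. The following is a minimal sketch, not part of the toolkit: `SUPPORTED_EXTENSIONS` and `split_by_support` are hypothetical names introduced here, and the toolkit itself may or may not perform such a check.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical constant: the extensions listed in this section.
SUPPORTED_EXTENSIONS = frozenset({
    ".pdf",
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
    ".epub",
    ".html", ".htm",
    ".jpg", ".jpeg", ".png",
    ".mp3", ".wav",
    ".csv", ".json", ".xml", ".txt",
    ".zip",
})


def split_by_support(file_paths: List[str]) -> Tuple[List[str], List[str]]:
    """Split paths into (supported, unsupported) by file extension.

    Hypothetical helper; only the final suffix is compared,
    case-insensitively, so "archive.tar.gz" is checked as ".gz".
    """
    supported: List[str] = []
    unsupported: List[str] = []
    for path in file_paths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

Filtering up front keeps unsupported paths out of the result dictionary, where they would otherwise likely show up as error messages.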

Parameters:

  • file_paths (List[str]): A list of local file paths to be converted.

Returns:

Dict[str, str]: A dictionary where keys are the input file paths and values are the corresponding content in Markdown format. If conversion of a file fails, the value will contain an error message.
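The behavior described above (parallel conversion, one dictionary entry per input path, and an error message in place of content on failure) can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's actual implementation: `convert_all`, `convert_one`, and `max_workers` are hypothetical names, a thread pool is only one possible way to run conversions in parallel, and the error message format is made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def convert_all(
    file_paths: List[str],
    convert_one: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Convert each path in parallel and map path -> Markdown or error text.

    `convert_one` is a hypothetical single-file converter supplied by
    the caller; any exception it raises is captured as a message.
    """

    def safe_convert(path: str) -> str:
        try:
            return convert_one(path)
        except Exception as exc:
            # A failure becomes the value for this path instead of
            # aborting the whole batch.
            return f"Error converting {path}: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as file_paths.
        results = list(pool.map(safe_convert, file_paths))
    return dict(zip(file_paths, results))
```

Because failures are reported as values rather than raised, callers of a function shaped like this need to inspect each value to tell converted content from an error message.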

get_tools

def get_tools(self):

Returns:

List[FunctionTool]: A list of FunctionTool objects representing the functions in the toolkit.