FileToolkit
- Reading various file formats (text, JSON, YAML, PDF, DOCX)
- Writing to multiple formats (Markdown, DOCX, PDF, plaintext, JSON, YAML, CSV, HTML)
- Editing and modifying existing files with content replacement
- Automatic backup creation before modifications
- Custom encoding and enhanced formatting options
__init__
- working_directory (str, optional): The default directory for output files. If not provided, it is taken from the CAMEL_WORKDIR environment variable (if set). If the environment variable is not set, it defaults to camel_working_dir.
- timeout (Optional[float]): The timeout for the toolkit. (default: None)
- default_encoding (str): Default character encoding for text operations. (default: utf-8)
- backup_enabled (bool): Whether to create backups of existing files before overwriting. (default: True)
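The fallback order described for working_directory can be sketched as below. This is a minimal illustration rather than the toolkit's own code: the helper name resolve_working_directory is hypothetical, and only the CAMEL_WORKDIR variable and the camel_working_dir default come from the description above.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_working_directory(working_directory: Optional[str] = None) -> Path:
    """Hypothetical helper: pick the output directory in priority order."""
    if working_directory:
        # 1. An explicit argument wins.
        return Path(working_directory)
    env_dir = os.environ.get("CAMEL_WORKDIR")
    if env_dir:
        # 2. Otherwise use the environment variable, if it is set.
        return Path(env_dir)
    # 3. Otherwise fall back to the documented default directory name.
    return Path("camel_working_dir")
```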
_resolve_filepath
- file_path (str): The file path to resolve.
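As a rough sketch of path resolution, assuming relative paths are anchored at the working directory and absolute paths are left untouched (the function name and the explicit working_directory argument are illustrative, not the toolkit's signature):

```python
from pathlib import Path


def resolve_filepath(file_path: str, working_directory: Path) -> Path:
    """Hypothetical helper: anchor relative paths at the working directory."""
    path = Path(file_path)
    if not path.is_absolute():
        # Relative paths are interpreted relative to the working directory.
        path = working_directory / path
    return path
```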
_sanitize_filename
- filename (str): The original filename which may contain spaces or special characters.
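One plausible way to sanitize a filename is a single regular-expression substitution. The exact set of characters the toolkit keeps is an assumption here: this sketch keeps word characters, hyphens, and dots, and replaces everything else with an underscore.

```python
import re


def sanitize_filename(filename: str) -> str:
    """Hypothetical helper: replace unsafe characters with underscores."""
    # Keep word characters, hyphens, and dots; replace everything else.
    return re.sub(r"[^\w\-.]", "_", filename)
```

For example, sanitize_filename("my report (v2).md") yields "my_report__v2_.md" under this rule.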
_write_text_file
- file_path (Path): The target file path.
- content (str): The text content to write.
- encoding (str): Character encoding to use. (default: utf-8)
_create_backup
- file_path (Path): The file path to backup.
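A backup step of this kind can be sketched with shutil. The naming scheme (original name plus a timestamp and a .bak suffix) and the None return for a missing file are assumptions made for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def create_backup(file_path: Path) -> Optional[Path]:
    """Hypothetical helper: copy an existing file to a timestamped sibling."""
    if not file_path.exists():
        return None  # Nothing to back up.
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # Assumed naming scheme: "<name>.<timestamp>.bak" next to the original.
    backup_path = file_path.with_name(f"{file_path.name}.{timestamp}.bak")
    shutil.copy2(file_path, backup_path)  # copy2 also preserves metadata
    return backup_path
```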
_write_docx_file
- file_path (Path): The target file path.
- content (str): The text content to write.
_write_pdf_file
- file_path (Path): The target file path.
- title (str): The document title.
- content (Union[str, List[List[str]]]): The content to write. Can be a string (supports Markdown-style tables and LaTeX math expressions) or a List[List[str]] (table data as a list of rows for direct table rendering).
- use_latex (bool): Whether to use LaTeX for math rendering. (default: False)
_process_text_content
- story: The reportlab story list to append to
- content (str): The text content to process
- heading_style: Style for headings
- body_style: Style for body text
_find_table_line_ranges
- lines (List[str]): List of lines to analyze.
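The return shape is not stated above. Assuming the method reports inclusive (start, end) index pairs for each run of consecutive table-like lines, a sketch could look like this (the function name and the row test are illustrative):

```python
from typing import List, Optional, Tuple


def find_table_line_ranges(lines: List[str]) -> List[Tuple[int, int]]:
    """Hypothetical helper: inclusive (start, end) indices of table runs."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Assumed row test: the line is wrapped in pipes.
        looks_like_row = (
            stripped.startswith("|")
            and stripped.endswith("|")
            and stripped.count("|") >= 2
        )
        if looks_like_row and start is None:
            start = i  # A new table run begins here.
        elif not looks_like_row and start is not None:
            ranges.append((start, i - 1))  # The run ended on the previous line.
            start = None
    if start is not None:
        ranges.append((start, len(lines) - 1))  # Run reaches the last line.
    return ranges
```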
_register_chinese_font
_parse_markdown_table
- lines (List[str]): List of text lines that may contain tables.
_is_table_row
- line (str): The line to check.
_is_table_separator
- line (str): The line to check.
_parse_table_row
- line (str): The table row line.
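The three row-level helpers above fit together naturally. The sketch below shows one way they might work for pipe-delimited Markdown tables; the exact matching rules are assumptions, and the function names simply drop the leading underscore.

```python
import re
from typing import List


def is_table_row(line: str) -> bool:
    """True if the line is wrapped in pipes, e.g. '| a | b |'."""
    stripped = line.strip()
    return (
        stripped.startswith("|")
        and stripped.endswith("|")
        and stripped.count("|") >= 2
    )


def is_table_separator(line: str) -> bool:
    """True for header separators such as '|---|:---:|'."""
    if not is_table_row(line):
        return False
    cells = line.strip()[1:-1].split("|")
    # Every cell must be dashes with optional alignment colons.
    return all(re.fullmatch(r"\s*:?-+:?\s*", cell) for cell in cells)


def parse_table_row(line: str) -> List[str]:
    """Split a table row into trimmed cell strings."""
    return [cell.strip() for cell in line.strip()[1:-1].split("|")]
```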
_create_pdf_table
- table_data (List[List[str]]): Table data as list of rows.
_convert_markdown_to_html
- text (str): Text with markdown formatting.
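Which Markdown constructs the method handles is not listed above. As an illustration only, the sketch below converts bold and italic markers to the inline tags that PDF paragraph renderers commonly accept; the function name and the choice of constructs are assumptions.

```python
import re


def convert_markdown_to_html(text: str) -> str:
    """Hypothetical helper: map **bold** and *italic* to <b> and <i>."""
    # Handle bold first so '**' is not consumed by the italic rule.
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)
    text = re.sub(r"\*(.+?)\*", r"<i>\1</i>", text)
    return text
```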
_ensure_html_utf8_meta
- content (str): The HTML content.
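A minimal sketch of ensuring a UTF-8 declaration, assuming the tag is inserted right after the opening head tag when one exists and prepended otherwise. The insertion point and the function name are assumptions, not confirmed toolkit behavior.

```python
import re


def ensure_html_utf8_meta(content: str) -> str:
    """Hypothetical helper: add <meta charset="utf-8"> if none is declared."""
    if re.search(r"<meta[^>]*charset", content, flags=re.IGNORECASE):
        return content  # A charset is already declared; leave it alone.
    meta = '<meta charset="utf-8">'
    # Match <head> or <head attr=...>, but not <header>.
    head = re.search(r"<head(\s[^>]*)?>", content, flags=re.IGNORECASE)
    if head:
        return content[: head.end()] + meta + content[head.end():]
    # No <head> element: prepend the tag so it still appears early.
    return meta + content
```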
_write_csv_file
- file_path (Path): The target file path.
- content (Union[str, List[List]]): The CSV content as a string or list of lists.
- encoding (str): Character encoding to use. (default: utf-8)
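Accepting either a string or a list of rows can be sketched with the standard csv module. This mirrors the parameters above but is not the toolkit's implementation; the function name drops the leading underscore to mark it as illustrative.

```python
import csv
from pathlib import Path
from typing import List, Union


def write_csv_file(
    file_path: Path,
    content: Union[str, List[List]],
    encoding: str = "utf-8",
) -> None:
    """Hypothetical helper: write CSV from raw text or from rows."""
    # newline="" lets the csv module control line endings itself.
    with file_path.open("w", encoding=encoding, newline="") as f:
        if isinstance(content, str):
            f.write(content)  # Already CSV text: write it unchanged.
        else:
            csv.writer(f).writerows(content)  # Rows: quote and join cells.
```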
_write_json_file
- file_path (Path): The target file path.
- content (str): The JSON content as a string.
- encoding (str): Character encoding to use. (default: utf-8)
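How the method treats the JSON string is not described above. One reasonable approach, shown here purely as an assumption, is to parse the string so it can be re-serialized consistently, and to write the raw text unchanged if it is not valid JSON.

```python
import json
from pathlib import Path


def write_json_file(file_path: Path, content: str, encoding: str = "utf-8") -> None:
    """Hypothetical helper: normalize valid JSON, pass other text through."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        # Assumed fallback: not valid JSON, so write the text as given.
        file_path.write_text(content, encoding=encoding)
        return
    with file_path.open("w", encoding=encoding) as f:
        # ensure_ascii=False keeps non-ASCII characters readable on disk.
        json.dump(data, f, ensure_ascii=False, indent=2)
```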
_write_simple_text_file
- file_path (Path): The target file path.
- content (str): The content to write.
- encoding (str): Character encoding to use. (default: utf-8)
write_to_file
- title (str): The title of the document.
- content (Union[str, List[List[str]]]): The content to write to the file. The expected format varies by file type: a string for text formats (txt, md, html, yaml); a string or list of lists for CSV; a string or serializable object for JSON.
- filename (str): The name or path of the file. If a relative path is supplied, it is resolved relative to self.working_directory.
- encoding (Optional[str]): The character encoding to use. (default: None)
- use_latex (bool): Whether to use LaTeX for math rendering. (default: False)
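The sketch below illustrates two points from the parameters above: an encoding of None falling back to a default encoding, and JSON targets accepting a serializable object as well as a string. It uses only the standard library, ignores title and use_latex, and returns the written path; the function name, the default_encoding argument, and the return value are all assumptions.

```python
import json
from pathlib import Path
from typing import Any, Optional


def write_to_file_sketch(
    content: Any,
    filename: str,
    working_directory: Path,
    encoding: Optional[str] = None,
    default_encoding: str = "utf-8",
) -> Path:
    """Hypothetical sketch of format-aware writing for text and JSON."""
    path = Path(filename)
    if not path.is_absolute():
        path = working_directory / path  # Resolve relative paths.
    path.parent.mkdir(parents=True, exist_ok=True)
    enc = encoding or default_encoding  # None means "use the default".
    if path.suffix.lower() == ".json" and not isinstance(content, str):
        # Serializable objects are converted to JSON text first.
        content = json.dumps(content, ensure_ascii=False, indent=2)
    path.write_text(str(content), encoding=enc)
    return path
```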
read_file
- PDF (.pdf)
- Microsoft Office: Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx)
- EPUB (.epub)
- HTML (.html, .htm)
- Images (.jpg, .jpeg, .png) for OCR
- Audio (.mp3, .wav) for transcription
- Text-based formats (.csv, .json, .xml, .txt, .md)
- ZIP archives (.zip)
- file_paths (Union[str, List[str]]): A single file path or a list of file paths to read. Paths can be relative or absolute. If relative, they will be resolved relative to the working directory.
- If a single file path is provided: Returns the content as a string.
- If multiple file paths are provided: Returns a dictionary where keys are file paths and values are the corresponding content in Markdown format. If conversion fails, returns an error message.
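The single-path versus multi-path return shapes can be sketched as follows. Plain text reading stands in for the real format-aware conversion to Markdown, and the error message wording is an assumption.

```python
from pathlib import Path
from typing import Dict, List, Union


def read_files_sketch(
    file_paths: Union[str, List[str]],
    working_directory: Path,
) -> Union[str, Dict[str, str]]:
    """Hypothetical sketch: a string for one path, a dict for several."""

    def read_one(file_path: str) -> str:
        path = Path(file_path)
        if not path.is_absolute():
            path = working_directory / path
        try:
            # Stand-in for the real conversion of each format to Markdown.
            return path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError) as exc:
            return f"Failed to read {file_path}: {exc}"  # assumed wording

    if isinstance(file_paths, str):
        return read_one(file_paths)
    # Keys are the paths exactly as the caller supplied them.
    return {p: read_one(p) for p in file_paths}
```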
edit_file
- file_path (str): The path to the file to edit. Can be relative or absolute. If relative, it will be resolved relative to the working directory.
- old_content (str): The exact text to find and replace.
- new_content (str): The text to replace old_content with.
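Exact-match replacement can be sketched in a few lines. Whether the toolkit replaces every occurrence or only the first is not stated above; this sketch assumes every occurrence, omits the backup step, and returns a boolean where the real method may return a message.

```python
from pathlib import Path


def edit_file_sketch(
    path: Path,
    old_content: str,
    new_content: str,
    encoding: str = "utf-8",
) -> bool:
    """Hypothetical sketch: replace exact text, reporting whether it matched."""
    text = path.read_text(encoding=encoding)
    if old_content not in text:
        return False  # Exact match required; the file is left unchanged.
    # Assumption: every occurrence is replaced, not just the first.
    path.write_text(text.replace(old_content, new_content), encoding=encoding)
    return True
```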