camel.loaders package
Submodules
camel.loaders.base_io module
- class camel.loaders.base_io.DocxFile(name: str, file_id: str, metadata: Dict[str, Any] | None = None, docs: List[Dict[str, Any]] | None = None, raw_bytes: bytes = b'')[source]#
Bases:
File
- class camel.loaders.base_io.File(name: str, file_id: str, metadata: Dict[str, Any] | None = None, docs: List[Dict[str, Any]] | None = None, raw_bytes: bytes = b'')[source]#
Bases:
ABC
Represents an uploaded file comprised of Documents.
- Parameters:
name (str) – The name of the file.
file_id (str) – The unique identifier of the file.
metadata (Dict[str, Any], optional) – Additional metadata associated with the file. Defaults to None.
docs (List[Dict[str, Any]], optional) – A list of documents contained within the file. Defaults to None.
raw_bytes (bytes, optional) – The raw bytes content of the file. Defaults to b''.
- static create_file(file: BytesIO, filename: str) File [source]#
Reads an uploaded file and returns a File object.
- Parameters:
file (BytesIO) – A BytesIO object representing the contents of the file.
filename (str) – The name of the file.
- Returns:
A File object.
- Return type:
File
- static create_file_from_raw_bytes(raw_bytes: bytes, filename: str) File [source]#
Reads raw bytes and returns a File object.
- Parameters:
raw_bytes (bytes) – The raw bytes content of the file.
filename (str) – The name of the file.
- Returns:
A File object.
- Return type:
File
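As a usage sketch, the factory method `create_file` could be wrapped in a small helper that reads a file from disk. `file_from_path` is a hypothetical name that is not part of CAMEL, and `file_cls` stands in for `camel.loaders.File`; it is passed as a parameter so the sketch assumes nothing beyond the documented `create_file(file, filename)` signature.

```python
import os
from io import BytesIO
from typing import Any


def file_from_path(file_cls: Any, path: str) -> Any:
    """Read `path` from disk and pass its contents to `file_cls.create_file`.

    ASSUMPTION: `file_cls` exposes the documented static method
    `create_file(file: BytesIO, filename: str)`, e.g. `camel.loaders.File`.
    """
    with open(path, "rb") as handle:
        buffer = BytesIO(handle.read())
    # Only the base name is forwarded; the directory is not part of the name.
    return file_cls.create_file(buffer, os.path.basename(path))
```

With CAMEL installed, the intended call would look like `file_from_path(File, "report.pdf")`. Which file extensions are accepted is not stated in this reference.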
- class camel.loaders.base_io.HtmlFile(name: str, file_id: str, metadata: Dict[str, Any] | None = None, docs: List[Dict[str, Any]] | None = None, raw_bytes: bytes = b'')[source]#
Bases:
File
- class camel.loaders.base_io.JsonFile(name: str, file_id: str, metadata: Dict[str, Any] | None = None, docs: List[Dict[str, Any]] | None = None, raw_bytes: bytes = b'')[source]#
Bases:
File
- class camel.loaders.base_io.PdfFile(name: str, file_id: str, metadata: Dict[str, Any] | None = None, docs: List[Dict[str, Any]] | None = None, raw_bytes: bytes = b'')[source]#
Bases:
File
camel.loaders.firecrawl_reader module
- class camel.loaders.firecrawl_reader.Firecrawl(api_key: str | None = None, api_url: str | None = None)[source]#
Bases:
object
Firecrawl allows you to turn entire websites into LLM-ready markdown.
- Parameters:
api_key (Optional[str]) – API key for authenticating with the Firecrawl API.
api_url (Optional[str]) – Base URL for the Firecrawl API.
References
https://docs.firecrawl.dev/introduction
- check_crawl_job(job_id: str) Dict [source]#
Check the status of a crawl job.
- Parameters:
job_id (str) – The ID of the crawl job.
- Returns:
The response including status of the crawl job.
- Return type:
Dict
- Raises:
RuntimeError – If the check process fails.
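When a crawl returns a job ID rather than results, `check_crawl_job` can be polled until the job finishes. The helper below is a minimal sketch, not part of CAMEL: `wait_for_crawl` is a hypothetical name, `client` is any object with the documented `check_crawl_job(job_id)` method, and the `"status"` key and `"completed"` value are assumptions about the response shape that should be checked against the Firecrawl documentation.

```python
import time
from typing import Any, Callable, Dict


def wait_for_crawl(
    client: Any,
    job_id: str,
    poll_interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll `client.check_crawl_job` until the job reports completion.

    ASSUMPTION: the response dict carries a "status" key whose value is
    "completed" once the job is done.
    """
    for _ in range(max_polls):
        response = client.check_crawl_job(job_id)
        if response.get("status") == "completed":
            return response
        # Not finished yet: wait before asking again.
        sleep(poll_interval)
    raise TimeoutError(
        f"crawl job {job_id!r} not finished after {max_polls} polls"
    )
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.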
- crawl(url: str, params: Dict[str, Any] | None = None, **kwargs: Any) Any [source]#
Crawl a URL and all accessible subpages. Customize the crawl by setting different parameters, and receive the full response or a job ID based on the specified options.
- Parameters:
url (str) – The URL to crawl.
params (Optional[Dict[str, Any]]) – Additional parameters for the crawl request. Defaults to None.
**kwargs (Any) – Additional keyword arguments, such as poll_interval, idempotency_key.
- Returns:
The crawl job ID or the crawl results if waiting until completion.
- Return type:
Any
- Raises:
RuntimeError – If the crawling process fails.
- map_site(url: str, params: Dict[str, Any] | None = None) list [source]#
Map a website to retrieve all accessible URLs.
- Parameters:
url (str) – The URL of the site to map.
params (Optional[Dict[str, Any]]) – Additional parameters for the map request. Defaults to None.
- Returns:
A list containing the URLs found on the site.
- Return type:
list
- Raises:
RuntimeError – If the mapping process fails.
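A mapped site may include links to other hosts or repeated entries. The sketch below post-processes the list returned by `map_site`; `map_same_host` is a hypothetical helper, and it assumes every entry in the returned list is a plain URL string, which this reference does not state explicitly.

```python
from typing import Any, List
from urllib.parse import urlparse


def map_same_host(client: Any, url: str) -> List[str]:
    """Return mapped URLs that share the host of `url`, without duplicates.

    ASSUMPTION: each entry returned by `client.map_site` is a URL string.
    """
    host = urlparse(url).netloc
    seen = set()
    result: List[str] = []
    for found in client.map_site(url):
        # Keep first occurrences only, preserving the original order.
        if urlparse(found).netloc == host and found not in seen:
            seen.add(found)
            result.append(found)
    return result
```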
- markdown_crawl(url: str) str [source]#
Crawl a URL and all accessible subpages and return the content in Markdown format.
- Parameters:
url (str) – The URL to crawl.
- Returns:
The content of the URL in Markdown format.
- Return type:
str
- Raises:
RuntimeError – If the crawling process fails.
- scrape(url: str, params: Dict[str, Any] | None = None) Dict [source]#
Scrapes a single URL. This function supports advanced scraping through additional parameters and returns the full scraped data as a dictionary.
Reference: https://docs.firecrawl.dev/advanced-scraping-guide
- Parameters:
url (str) – The URL to read.
params (Optional[Dict[str, Any]]) – Additional parameters for the scrape request.
- Returns:
The scraped data.
- Return type:
Dict
- Raises:
RuntimeError – If the scrape process fails.
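Because `scrape` signals failure with `RuntimeError`, transient errors can be handled by retrying. The following is a sketch under stated assumptions rather than CAMEL behavior: `scrape_with_retry` is a hypothetical wrapper, and the exponential backoff schedule is an illustrative choice.

```python
import time
from typing import Any, Callable, Dict, Optional


def scrape_with_retry(
    client: Any,
    url: str,
    params: Optional[Dict[str, Any]] = None,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call `client.scrape`, retrying on RuntimeError with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return client.scrape(url, params=params)
        except RuntimeError:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Waits base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Retrying is only sensible if the failure is transient; a `RuntimeError` caused by an invalid API key, for example, would fail on every attempt.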
- structured_scrape(url: str, output_schema: BaseModel) Dict [source]#
Uses an LLM to extract structured data from the given URL.
- Parameters:
url (str) – The URL to read.
output_schema (BaseModel) – A pydantic model that includes value types and field descriptions used to generate a structured response by LLM. This schema helps in defining the expected output format.
- Returns:
The content of the URL.
- Return type:
Dict
- Raises:
RuntimeError – If the scrape process fails.
camel.loaders.jina_url_reader module
- class camel.loaders.jina_url_reader.JinaURLReader(api_key: str | None = None, return_format: JinaReturnFormat = JinaReturnFormat.DEFAULT, json_response: bool = False, timeout: int = 30, **kwargs: Any)[source]#
Bases:
object
URL Reader provided by Jina AI. The output is cleaner and more LLM-friendly than that of the UnstructuredIO URL Reader. It can be configured to replace the UnstructuredIO URL Reader in the pipeline.
- Parameters:
api_key (Optional[str], optional) – The API key for Jina AI. If not provided, the reader will have a lower rate limit. Defaults to None.
return_format (JinaReturnFormat, optional) – The level of detail of the returned content, which is optimized for LLMs. Screenshots are not currently supported. Defaults to JinaReturnFormat.DEFAULT.
json_response (bool, optional) – Whether to return the response in JSON format. Defaults to False.
timeout (int, optional) – The maximum time in seconds to wait for the page to be rendered. Defaults to 30.
**kwargs (Any) – Additional keyword arguments, such as proxies and cookies. These should align with the HTTP header field and value pairs listed in the reference.
References
camel.loaders.unstructured_io module
- class camel.loaders.unstructured_io.UnstructuredIO[source]#
Bases:
object
A class to handle various functionalities provided by the Unstructured library, including version checking, parsing, cleaning, extracting, staging, chunking data, and integrating with cloud services like S3 and Azure for data connection.
- static chunk_elements(elements: List[Any], chunk_type: str, **kwargs) List[Element] [source]#
Chunks elements by titles.
- Parameters:
elements (List[Element]) – List of Element objects to be chunked.
chunk_type (str) – Type of chunking to apply. Supported types: ‘chunk_by_title’.
**kwargs – Additional keyword arguments for chunking.
- Returns:
List of chunked sections.
- Return type:
List[Dict]
- static clean_text_data(text: str, clean_options: List[Tuple[str, Dict[str, Any]]] | None = None) str [source]#
Cleans text data using a variety of cleaning functions provided by the unstructured library.
This function applies multiple text cleaning utilities by calling the unstructured library’s cleaning bricks for operations like replacing unicode quotes, removing extra whitespace, dashes, non-ascii characters, and more.
If no cleaning options are provided, a default set of cleaning operations is applied. These defaults include the operations “replace_unicode_quotes”, “clean_non_ascii_chars”, “group_broken_paragraphs”, and “clean_extra_whitespace”.
- Parameters:
text (str) – The text to be cleaned.
clean_options (List[Tuple[str, Dict[str, Any]]], optional) – A list of (function name, parameters) tuples specifying which cleaning operations to apply. Each function name should match the name of a cleaning function, and each parameters entry should be a dictionary of arguments for that function. Defaults to None. Supported types: ‘clean_extra_whitespace’, ‘clean_bullets’, ‘clean_ordered_bullets’, ‘clean_postfix’, ‘clean_prefix’, ‘clean_dashes’, ‘clean_trailing_punctuation’, ‘clean_non_ascii_chars’, ‘group_broken_paragraphs’, ‘remove_punctuation’, ‘replace_unicode_quotes’, ‘bytes_string_to_string’, ‘translate_text’.
- Returns:
The cleaned text.
- Return type:
str
- Raises:
AttributeError – If a cleaning option does not correspond to a valid cleaning function in unstructured.
Notes
Each function name in clean_options must correspond to a valid cleaning brick name from the unstructured library. Each brick’s parameters must be provided in the dictionary paired with that name.
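The signature of `clean_text_data` types `clean_options` as a list of `(name, parameters)` tuples. The sketch below builds such a list and rejects unknown names before any cleaning call is made. `build_clean_options` is a hypothetical helper; the supported names and the four defaults are copied from the description above, while pairing each default with an empty parameter dictionary is an assumption.

```python
from typing import Any, Dict, List, Tuple

CleanOptions = List[Tuple[str, Dict[str, Any]]]

# Names listed as supported in the parameter description.
SUPPORTED_CLEANERS = frozenset({
    "clean_extra_whitespace", "clean_bullets", "clean_ordered_bullets",
    "clean_postfix", "clean_prefix", "clean_dashes",
    "clean_trailing_punctuation", "clean_non_ascii_chars",
    "group_broken_paragraphs", "remove_punctuation",
    "replace_unicode_quotes", "bytes_string_to_string", "translate_text",
})

# ASSUMPTION: the documented defaults take no extra parameters.
DEFAULT_CLEAN_OPTIONS: CleanOptions = [
    ("replace_unicode_quotes", {}),
    ("clean_non_ascii_chars", {}),
    ("group_broken_paragraphs", {}),
    ("clean_extra_whitespace", {}),
]


def build_clean_options(extra: CleanOptions) -> CleanOptions:
    """Append `extra` steps to the defaults, rejecting unknown names."""
    for name, params in extra:
        if name not in SUPPORTED_CLEANERS:
            raise ValueError(f"unsupported cleaning function: {name!r}")
        if not isinstance(params, dict):
            raise TypeError(f"parameters for {name!r} must be a dict")
    return [*DEFAULT_CLEAN_OPTIONS, *extra]
```

The resulting list would then be passed as `clean_options` to `clean_text_data`, for example `build_clean_options([("clean_dashes", {})])`.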
- static create_element_from_text(text: str, element_id: str | UUID | None = None, embeddings: List[float] | None = None, filename: str | None = None, file_directory: str | None = None, last_modified: str | None = None, filetype: str | None = None, parent_id: str | UUID | None = None) Element [source]#
Creates a Text element from a given text input, with optional metadata and embeddings.
- Parameters:
text (str) – The text content for the element.
element_id (Optional[Union[str, uuid.UUID]], optional) – Unique identifier for the element. Defaults to None.
embeddings (Optional[List[float]], optional) – A list of float numbers representing the text embeddings. Defaults to None.
filename (Optional[str], optional) – The name of the file the element is associated with. Defaults to None.
file_directory (Optional[str], optional) – The directory path where the file is located. Defaults to None.
last_modified (Optional[str], optional) – The last modified date of the file. Defaults to None.
filetype (Optional[str], optional) – The type of the file. Defaults to None.
parent_id (Optional[Union[str, uuid.UUID]], optional) – The identifier of the parent element. Defaults to None.
- Returns:
An instance of Text with the provided content and metadata.
- Return type:
Element
- static extract_data_from_text(text: str, extract_type: Literal['extract_datetimetz', 'extract_email_address', 'extract_ip_address', 'extract_ip_address_name', 'extract_mapi_id', 'extract_ordered_bullets', 'extract_text_after', 'extract_text_before', 'extract_us_phone_number'], **kwargs) Any [source]#
Extracts various types of data from text using functions from unstructured.cleaners.extract.
- Parameters:
text (str) – Text to extract data from.
extract_type (Literal['extract_datetimetz', 'extract_email_address', 'extract_ip_address', 'extract_ip_address_name', 'extract_mapi_id', 'extract_ordered_bullets', 'extract_text_after', 'extract_text_before', 'extract_us_phone_number']) – Type of data to extract.
**kwargs – Additional keyword arguments for specific extraction functions.
- Returns:
The extracted data; its type depends on extract_type.
- Return type:
Any
- static parse_file_or_url(input_path: str, **kwargs: Any) List[Element] | None [source]#
Loads a file or a URL and parses its contents into elements.
- Parameters:
input_path (str) – Path to the file or URL to be parsed.
**kwargs – Extra kwargs passed to the partition function.
- Returns:
List of elements after parsing the file or URL, if successful.
- Return type:
Union[List[Element],None]
- Raises:
FileNotFoundError – If the file does not exist at the path specified.
Notes
- Available document types:
“csv”, “doc”, “docx”, “epub”, “image”, “md”, “msg”, “odt”, “org”, “pdf”, “ppt”, “pptx”, “rtf”, “rst”, “tsv”, “xlsx”.
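Since `parse_file_or_url` may return `None` and may raise `FileNotFoundError`, callers often want a single, predictable result type. The wrapper below is one possible sketch: `parse_or_empty` is a hypothetical name, `loader` is any object exposing the documented method (for example `UnstructuredIO`), and treating a missing file as an empty result is a design choice made for illustration.

```python
from typing import Any, List


def parse_or_empty(loader: Any, input_path: str, **kwargs: Any) -> List[Any]:
    """Parse `input_path` and always return a list of elements.

    A missing file or a `None` result is mapped to an empty list.
    """
    try:
        elements = loader.parse_file_or_url(input_path, **kwargs)
    except FileNotFoundError:
        # Design choice for this sketch: a missing file yields no elements.
        return []
    return elements if elements is not None else []
```

Callers that need to distinguish a missing file from an unparseable one should catch `FileNotFoundError` themselves instead of using a wrapper like this.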
- static stage_elements(elements: List[Any], stage_type: Literal['convert_to_csv', 'convert_to_dataframe', 'convert_to_dict', 'dict_to_elements', 'stage_csv_for_prodigy', 'stage_for_prodigy', 'stage_for_baseplate', 'stage_for_datasaur', 'stage_for_label_box', 'stage_for_label_studio', 'stage_for_weaviate'], **kwargs) str | List[Dict] | Any [source]#
Stages elements for various platforms based on the specified staging type.
This function applies staging utilities to format data for different NLP annotation and machine learning tools. It uses functions from the ‘unstructured.staging’ module for operations such as converting to CSV, DataFrame, or dictionary form, or formatting for specific platforms such as Prodigy.
- Parameters:
elements (List[Any]) – List of Element objects to be staged.
stage_type (Literal['convert_to_csv', 'convert_to_dataframe', 'convert_to_dict', 'dict_to_elements', 'stage_csv_for_prodigy', 'stage_for_prodigy', 'stage_for_baseplate', 'stage_for_datasaur', 'stage_for_label_box', 'stage_for_label_studio', 'stage_for_weaviate']) – Type of staging to perform.
**kwargs – Additional keyword arguments specific to the staging type.
- Returns:
Staged data in the format appropriate for the specified staging type.
- Return type:
Union[str, List[Dict], Any]
- Raises:
ValueError – If the staging type is not supported or a required argument is missing.
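Chunking and staging are typically chained. The sketch below combines the documented `chunk_elements` and `stage_elements` calls; `chunk_and_stage` is a hypothetical helper, `loader` stands in for `UnstructuredIO`, and it is assumed that the 'convert_to_dict' staging type yields a list of dictionaries.

```python
from typing import Any, Dict, List


def chunk_and_stage(loader: Any, elements: List[Any]) -> List[Dict]:
    """Chunk `elements` by title, then stage the chunks as dictionaries.

    ASSUMPTION: stage_type="convert_to_dict" returns a list of dicts.
    """
    if not elements:
        # Nothing to chunk or stage.
        return []
    chunks = loader.chunk_elements(elements, chunk_type="chunk_by_title")
    return loader.stage_elements(chunks, stage_type="convert_to_dict")
```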
Module contents
- class camel.loaders.File(name: str, file_id: str, metadata: Dict[str, Any] | None = None, docs: List[Dict[str, Any]] | None = None, raw_bytes: bytes = b'')[source]#
Bases:
ABC
Represents an uploaded file comprised of Documents.
- Parameters:
name (str) – The name of the file.
file_id (str) – The unique identifier of the file.
metadata (Dict[str, Any], optional) – Additional metadata associated with the file. Defaults to None.
docs (List[Dict[str, Any]], optional) – A list of documents contained within the file. Defaults to None.
raw_bytes (bytes, optional) – The raw bytes content of the file. Defaults to b''.
- static create_file(file: BytesIO, filename: str) File [source]#
Reads an uploaded file and returns a File object.
- Parameters:
file (BytesIO) – A BytesIO object representing the contents of the file.
filename (str) – The name of the file.
- Returns:
A File object.
- Return type:
File
- static create_file_from_raw_bytes(raw_bytes: bytes, filename: str) File [source]#
Reads raw bytes and returns a File object.
- Parameters:
raw_bytes (bytes) – The raw bytes content of the file.
filename (str) – The name of the file.
- Returns:
A File object.
- Return type:
File
- class camel.loaders.Firecrawl(api_key: str | None = None, api_url: str | None = None)[source]#
Bases:
object
Firecrawl allows you to turn entire websites into LLM-ready markdown.
- Parameters:
api_key (Optional[str]) – API key for authenticating with the Firecrawl API.
api_url (Optional[str]) – Base URL for the Firecrawl API.
References
https://docs.firecrawl.dev/introduction
- check_crawl_job(job_id: str) Dict [source]#
Check the status of a crawl job.
- Parameters:
job_id (str) – The ID of the crawl job.
- Returns:
The response including status of the crawl job.
- Return type:
Dict
- Raises:
RuntimeError – If the check process fails.
- crawl(url: str, params: Dict[str, Any] | None = None, **kwargs: Any) Any [source]#
Crawl a URL and all accessible subpages. Customize the crawl by setting different parameters, and receive the full response or a job ID based on the specified options.
- Parameters:
url (str) – The URL to crawl.
params (Optional[Dict[str, Any]]) – Additional parameters for the crawl request. Defaults to None.
**kwargs (Any) – Additional keyword arguments, such as poll_interval, idempotency_key.
- Returns:
The crawl job ID or the crawl results if waiting until completion.
- Return type:
Any
- Raises:
RuntimeError – If the crawling process fails.
- map_site(url: str, params: Dict[str, Any] | None = None) list [source]#
Map a website to retrieve all accessible URLs.
- Parameters:
url (str) – The URL of the site to map.
params (Optional[Dict[str, Any]]) – Additional parameters for the map request. Defaults to None.
- Returns:
A list containing the URLs found on the site.
- Return type:
list
- Raises:
RuntimeError – If the mapping process fails.
- markdown_crawl(url: str) str [source]#
Crawl a URL and all accessible subpages and return the content in Markdown format.
- Parameters:
url (str) – The URL to crawl.
- Returns:
The content of the URL in Markdown format.
- Return type:
str
- Raises:
RuntimeError – If the crawling process fails.
- scrape(url: str, params: Dict[str, Any] | None = None) Dict [source]#
Scrapes a single URL. This function supports advanced scraping through additional parameters and returns the full scraped data as a dictionary.
Reference: https://docs.firecrawl.dev/advanced-scraping-guide
- Parameters:
url (str) – The URL to read.
params (Optional[Dict[str, Any]]) – Additional parameters for the scrape request.
- Returns:
The scraped data.
- Return type:
Dict
- Raises:
RuntimeError – If the scrape process fails.
- structured_scrape(url: str, output_schema: BaseModel) Dict [source]#
Uses an LLM to extract structured data from the given URL.
- Parameters:
url (str) – The URL to read.
output_schema (BaseModel) – A pydantic model that includes value types and field descriptions used to generate a structured response by LLM. This schema helps in defining the expected output format.
- Returns:
The content of the URL.
- Return type:
Dict
- Raises:
RuntimeError – If the scrape process fails.
- class camel.loaders.JinaURLReader(api_key: str | None = None, return_format: JinaReturnFormat = JinaReturnFormat.DEFAULT, json_response: bool = False, timeout: int = 30, **kwargs: Any)[source]#
Bases:
object
URL Reader provided by Jina AI. The output is cleaner and more LLM-friendly than that of the UnstructuredIO URL Reader. It can be configured to replace the UnstructuredIO URL Reader in the pipeline.
- Parameters:
api_key (Optional[str], optional) – The API key for Jina AI. If not provided, the reader will have a lower rate limit. Defaults to None.
return_format (JinaReturnFormat, optional) – The level of detail of the returned content, which is optimized for LLMs. Screenshots are not currently supported. Defaults to JinaReturnFormat.DEFAULT.
json_response (bool, optional) – Whether to return the response in JSON format. Defaults to False.
timeout (int, optional) – The maximum time in seconds to wait for the page to be rendered. Defaults to 30.
**kwargs (Any) – Additional keyword arguments, such as proxies and cookies. These should align with the HTTP header field and value pairs listed in the reference.
References
- class camel.loaders.UnstructuredIO[source]#
Bases:
object
A class to handle various functionalities provided by the Unstructured library, including version checking, parsing, cleaning, extracting, staging, chunking data, and integrating with cloud services like S3 and Azure for data connection.
- static chunk_elements(elements: List[Any], chunk_type: str, **kwargs) List[Element] [source]#
Chunks elements by titles.
- Parameters:
elements (List[Element]) – List of Element objects to be chunked.
chunk_type (str) – Type of chunking to apply. Supported types: ‘chunk_by_title’.
**kwargs – Additional keyword arguments for chunking.
- Returns:
List of chunked sections.
- Return type:
List[Dict]
- static clean_text_data(text: str, clean_options: List[Tuple[str, Dict[str, Any]]] | None = None) str [source]#
Cleans text data using a variety of cleaning functions provided by the unstructured library.
This function applies multiple text cleaning utilities by calling the unstructured library’s cleaning bricks for operations like replacing unicode quotes, removing extra whitespace, dashes, non-ascii characters, and more.
If no cleaning options are provided, a default set of cleaning operations is applied. These defaults include the operations “replace_unicode_quotes”, “clean_non_ascii_chars”, “group_broken_paragraphs”, and “clean_extra_whitespace”.
- Parameters:
text (str) – The text to be cleaned.
clean_options (List[Tuple[str, Dict[str, Any]]], optional) – A list of (function name, parameters) tuples specifying which cleaning operations to apply. Each function name should match the name of a cleaning function, and each parameters entry should be a dictionary of arguments for that function. Defaults to None. Supported types: ‘clean_extra_whitespace’, ‘clean_bullets’, ‘clean_ordered_bullets’, ‘clean_postfix’, ‘clean_prefix’, ‘clean_dashes’, ‘clean_trailing_punctuation’, ‘clean_non_ascii_chars’, ‘group_broken_paragraphs’, ‘remove_punctuation’, ‘replace_unicode_quotes’, ‘bytes_string_to_string’, ‘translate_text’.
- Returns:
The cleaned text.
- Return type:
str
- Raises:
AttributeError – If a cleaning option does not correspond to a valid cleaning function in unstructured.
Notes
Each function name in clean_options must correspond to a valid cleaning brick name from the unstructured library. Each brick’s parameters must be provided in the dictionary paired with that name.
- static create_element_from_text(text: str, element_id: str | UUID | None = None, embeddings: List[float] | None = None, filename: str | None = None, file_directory: str | None = None, last_modified: str | None = None, filetype: str | None = None, parent_id: str | UUID | None = None) Element [source]#
Creates a Text element from a given text input, with optional metadata and embeddings.
- Parameters:
text (str) – The text content for the element.
element_id (Optional[Union[str, uuid.UUID]], optional) – Unique identifier for the element. Defaults to None.
embeddings (Optional[List[float]], optional) – A list of float numbers representing the text embeddings. Defaults to None.
filename (Optional[str], optional) – The name of the file the element is associated with. Defaults to None.
file_directory (Optional[str], optional) – The directory path where the file is located. Defaults to None.
last_modified (Optional[str], optional) – The last modified date of the file. Defaults to None.
filetype (Optional[str], optional) – The type of the file. Defaults to None.
parent_id (Optional[Union[str, uuid.UUID]], optional) – The identifier of the parent element. Defaults to None.
- Returns:
An instance of Text with the provided content and metadata.
- Return type:
Element
- static extract_data_from_text(text: str, extract_type: Literal['extract_datetimetz', 'extract_email_address', 'extract_ip_address', 'extract_ip_address_name', 'extract_mapi_id', 'extract_ordered_bullets', 'extract_text_after', 'extract_text_before', 'extract_us_phone_number'], **kwargs) Any [source]#
Extracts various types of data from text using functions from unstructured.cleaners.extract.
- Parameters:
text (str) – Text to extract data from.
extract_type (Literal['extract_datetimetz', 'extract_email_address', 'extract_ip_address', 'extract_ip_address_name', 'extract_mapi_id', 'extract_ordered_bullets', 'extract_text_after', 'extract_text_before', 'extract_us_phone_number']) – Type of data to extract.
**kwargs – Additional keyword arguments for specific extraction functions.
- Returns:
The extracted data; its type depends on extract_type.
- Return type:
Any
- static parse_file_or_url(input_path: str, **kwargs: Any) List[Element] | None [source]#
Loads a file or a URL and parses its contents into elements.
- Parameters:
input_path (str) – Path to the file or URL to be parsed.
**kwargs – Extra kwargs passed to the partition function.
- Returns:
List of elements after parsing the file or URL, if successful.
- Return type:
Union[List[Element],None]
- Raises:
FileNotFoundError – If the file does not exist at the path specified.
Notes
- Available document types:
“csv”, “doc”, “docx”, “epub”, “image”, “md”, “msg”, “odt”, “org”, “pdf”, “ppt”, “pptx”, “rtf”, “rst”, “tsv”, “xlsx”.
- static stage_elements(elements: List[Any], stage_type: Literal['convert_to_csv', 'convert_to_dataframe', 'convert_to_dict', 'dict_to_elements', 'stage_csv_for_prodigy', 'stage_for_prodigy', 'stage_for_baseplate', 'stage_for_datasaur', 'stage_for_label_box', 'stage_for_label_studio', 'stage_for_weaviate'], **kwargs) str | List[Dict] | Any [source]#
Stages elements for various platforms based on the specified staging type.
This function applies staging utilities to format data for different NLP annotation and machine learning tools. It uses functions from the ‘unstructured.staging’ module for operations such as converting to CSV, DataFrame, or dictionary form, or formatting for specific platforms such as Prodigy.
- Parameters:
elements (List[Any]) – List of Element objects to be staged.
stage_type (Literal['convert_to_csv', 'convert_to_dataframe', 'convert_to_dict', 'dict_to_elements', 'stage_csv_for_prodigy', 'stage_for_prodigy', 'stage_for_baseplate', 'stage_for_datasaur', 'stage_for_label_box', 'stage_for_label_studio', 'stage_for_weaviate']) – Type of staging to perform.
**kwargs – Additional keyword arguments specific to the staging type.
- Returns:
Staged data in the format appropriate for the specified staging type.
- Return type:
Union[str, List[Dict], Any]
- Raises:
ValueError – If the staging type is not supported or a required argument is missing.