Note:
Supported file types:
“csv”, “doc”, “docx”, “epub”, “image”, “md”, “msg”, “odt”,
“org”, “pdf”, “ppt”, “pptx”, “rtf”, “rst”, “tsv”, “xlsx”.
References:
https://docs.unstructured.io/open-source/core-functionality/partitioning
This function applies multiple text cleaning utilities by calling the
unstructured library’s cleaning bricks, for operations such as replacing
Unicode quotes and removing extra whitespace, dashes, and non-ASCII
characters.
If no cleaning options are provided, a default set of cleaning
operations is applied. The defaults are
“replace_unicode_quotes”, “clean_non_ascii_chars”,
“group_broken_paragraphs”, and “clean_extra_whitespace”.
Parameters:
Cleaning options are keyed by the name of a cleaning brick from the
unstructured library.
Each brick’s parameters must be provided in a nested dictionary
as the value for that key.
References:
https://unstructured-io.github.io/unstructured/
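Example:
The option layout described above (one key per cleaning operation, with that operation's parameters in a nested dictionary) can be sketched as follows. This is a minimal illustration, not the library's code: the cleaners are standard-library stand-ins that only borrow the operation names, and `clean_text`, `clean_options`, `CLEANERS`, and `DEFAULT_OPTIONS` are hypothetical names chosen for this sketch.

```python
import re
from typing import Any, Callable, Dict, Optional


# Stand-in cleaners built on the standard library only. They mimic the
# named operations but are NOT the unstructured library's implementations.
def replace_unicode_quotes(text: str) -> str:
    table = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
    return text.translate(str.maketrans(table))


def clean_non_ascii_chars(text: str) -> str:
    return text.encode("ascii", "ignore").decode("ascii")


def group_broken_paragraphs(text: str) -> str:
    # Re-join lines inside each paragraph; blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


def clean_extra_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


def clean_prefix(text: str, pattern: str) -> str:
    # Example of a cleaner that takes a parameter: drop a leading pattern.
    return re.sub(rf"^{pattern}\s*", "", text)


# Hypothetical registry mapping option keys to cleaner functions.
CLEANERS: Dict[str, Callable[..., str]] = {
    "replace_unicode_quotes": replace_unicode_quotes,
    "clean_non_ascii_chars": clean_non_ascii_chars,
    "group_broken_paragraphs": group_broken_paragraphs,
    "clean_extra_whitespace": clean_extra_whitespace,
    "clean_prefix": clean_prefix,
}

# The default operations named in the text, none of which needs parameters.
DEFAULT_OPTIONS: Dict[str, Dict[str, Any]] = {
    "replace_unicode_quotes": {},
    "clean_non_ascii_chars": {},
    "group_broken_paragraphs": {},
    "clean_extra_whitespace": {},
}


def clean_text(
    text: str, clean_options: Optional[Dict[str, Dict[str, Any]]] = None
) -> str:
    """Apply each requested cleaner in order, passing its nested parameters."""
    options = clean_options if clean_options else DEFAULT_OPTIONS
    for name, params in options.items():
        if name not in CLEANERS:
            raise ValueError(f"Unknown cleaning option: {name!r}")
        text = CLEANERS[name](text, **params)
    return text
```

Calling `clean_text(raw)` with no options applies the four defaults in order. Passing `{"clean_prefix": {"pattern": "NOTE:"}, "clean_extra_whitespace": {}}` runs only those two operations, with `pattern` forwarded to the first as a keyword argument.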