CodeChunker
- chunk_size (int, optional): The maximum token size per chunk. (default: :obj:`8192`)
- remove_image (bool, optional): Whether the chunker should skip images.
- model_name (str, optional): The tokenizer model name used for token counting. (default: :obj:`"cl100k_base"`)
__init__
count_tokens
- text (str): The input text to be tokenized.
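
As a rough illustration of token counting against the default `"cl100k_base"` name, the sketch below uses the `tiktoken` package when it is available. The helper `make_token_counter` and the whitespace fallback are assumptions made for this example and are not taken from the class itself.

```python
from typing import Callable


def make_token_counter(model_name: str = "cl100k_base") -> Callable[[str], int]:
    """Return a function that maps text to a token count.

    Hypothetical helper for illustration; not part of CodeChunker.
    """
    try:
        # tiktoken is a third-party package; loading an encoding may also
        # fail if its vocabulary file cannot be fetched.
        import tiktoken

        encoding = tiktoken.get_encoding(model_name)
        # disallowed_special=() lets text containing special-token strings
        # be encoded as ordinary text instead of raising an error.
        return lambda text: len(encoding.encode(text, disallowed_special=()))
    except Exception:
        # Fallback (assumption): approximate tokens by whitespace-separated words.
        return lambda text: len(text.split())


count_tokens = make_token_counter()
print(count_tokens("def add(a, b): return a + b"))
```

The exact count depends on which branch is taken, so the printed number should be read as indicative only.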
_split_oversized
- line (str): The oversized line to be split.
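
One plausible way to split an oversized line is to tokenize it, slice the tokens into windows no larger than the budget, and turn each window back into text. The sketch below assumes a whitespace tokenizer as a stand-in; the function name `split_oversized` and its `tokenize`/`detokenize` parameters are hypothetical, and the real method may behave differently.

```python
from typing import Callable, List


def split_oversized(
    line: str,
    max_tokens: int,
    tokenize: Callable[[str], List[str]] = str.split,
    detokenize: Callable[[List[str]], str] = " ".join,
) -> List[str]:
    """Split one line into pieces of at most max_tokens tokens each.

    Stand-in tokenizer (assumption): whitespace-separated words.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = tokenize(line)
    # Walk the token list in fixed-size windows and rebuild text per window.
    return [
        detokenize(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


print(split_oversized("a b c d e", max_tokens=2))  # ['a b', 'c d', 'e']
```

With a subword tokenizer, the same windowing would apply to token ids, with decoding in place of the join.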
chunk
- content (List[str]): The content to be chunked.
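
The chunking step can be pictured as greedy packing: lines are appended to the current chunk until the next line would exceed `chunk_size`, at which point a new chunk starts. The sketch below is a minimal illustration under stated assumptions, not the class's actual logic: `chunk_lines` is a hypothetical name, tokens are approximated by whitespace-separated words, and lines are joined with newlines.

```python
from typing import Callable, List


def chunk_lines(
    content: List[str],
    chunk_size: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack lines into chunks of at most chunk_size tokens.

    A single line larger than chunk_size is kept as its own chunk here;
    splitting such lines is left out of this sketch.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for line in content:
        n = count_tokens(line)
        # Flush the running chunk if adding this line would exceed the budget.
        if current and current_tokens + n > chunk_size:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n
    if current:
        chunks.append("\n".join(current))
    return chunks


print(chunk_lines(["a b", "c d", "e f g", "h"], chunk_size=4))
# ['a b\nc d', 'e f g\nh']
```

Packing whole lines keeps statements intact, which tends to matter more for code than filling each chunk to the exact limit.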