CodeChunker
class CodeChunker(BaseChunker):
A class for chunking code or text while respecting structure
and token limits.
This class ensures that structured elements such as functions,
classes, and regions are not arbitrarily split across chunks.
It also handles oversized lines and Base64-encoded images.
Parameters:
- chunk_size (int, optional): The maximum number of tokens per chunk. (default: 8192)
- remove_image (bool, optional): Whether the chunker should skip Base64-encoded images. (default: True)
- model_name (str, optional): The tokenizer model name used for token counting. (default: "cl100k_base")
__init__
def __init__(
self,
chunk_size: int = 8192,
model_name: str = 'cl100k_base',
remove_image: Optional[bool] = True
):
count_tokens
def count_tokens(self, text: str):
Counts the number of tokens in the given text.
Parameters:
- text (str): The input text to be tokenized.
Returns:
int: The number of tokens in the input text.
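To make the contract of count_tokens concrete, here is a minimal sketch and not the library's implementation. It assumes a simple regex tokenizer (the hypothetical `regex_encode`) as a stand-in for the real tokenizer; with the default model name "cl100k_base", the real counter would presumably use a BPE encoding such as the one returned by `tiktoken.get_encoding("cl100k_base")`, whose counts will differ from this stand-in.

```python
import re
from typing import Callable, List


def regex_encode(text: str) -> List[str]:
    # Hypothetical stand-in tokenizer: each word and each punctuation
    # mark counts as one token. A real BPE tokenizer splits differently.
    return re.findall(r"\w+|[^\w\s]", text)


def count_tokens(text: str, encode: Callable[[str], list] = regex_encode) -> int:
    # The token count is simply the length of the encoded sequence.
    return len(encode(text))


sample = "def add(a, b): return a + b"
n_tokens = count_tokens(sample)
```

Passing the encoder as a parameter keeps the sketch independent of any particular tokenizer library; any callable that maps a string to a sequence of tokens can be swapped in.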
_split_oversized
def _split_oversized(self, line: str):
Splits an oversized line into multiple chunks based on token limits.
Parameters:
- line (str): The oversized line to be split.
Returns:
List[str]: A list of smaller chunks after splitting the
oversized line.
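The idea of splitting an oversized line can be sketched as follows. This is an illustration under stated assumptions, not the actual body of _split_oversized: the function name `split_oversized`, the explicit `chunk_size` argument, and the whitespace tokenizer are all hypothetical stand-ins.

```python
from typing import List


def split_oversized(line: str, chunk_size: int) -> List[str]:
    """Split one long line into pieces of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Assumption: whitespace-separated words stand in for real tokens.
    tokens = line.split()
    # Slice the token list into consecutive windows of chunk_size tokens
    # and join each window back into a string.
    return [
        " ".join(tokens[i:i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


pieces = split_oversized("a b c d e f g", chunk_size=3)
```

Every piece except possibly the last holds exactly `chunk_size` tokens, so no piece exceeds the limit.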
chunk
def chunk(self, content: List[str]):
Splits the content into smaller chunks while preserving
structure and adhering to token constraints.
Parameters:
- content (List[str]): The content to be chunked.
Returns:
List[str]: A list of chunked text segments.
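A simplified picture of chunking under a token budget is greedy packing: keep appending lines to the current chunk until the next line would exceed the limit, then start a new chunk. The sketch below shows only that packing step and is not the library's algorithm. The name `chunk_lines` and the whitespace token counter are hypothetical, and the sketch neither detects functions, classes, or regions nor splits a single line that is larger than the budget.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def chunk_lines(content: List[str], chunk_size: int) -> List[str]:
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for line in content:
        n = count_tokens(line)
        # Close the current chunk if adding this line would exceed the budget.
        if current and current_tokens + n > chunk_size:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n
    # Flush whatever remains as the final chunk.
    if current:
        chunks.append("\n".join(current))
    return chunks


lines = ["def f(x):", "    return x + 1", "", "def g(y):", "    return y * 2"]
chunks = chunk_lines(lines, chunk_size=6)
```

With a budget of 6 stand-in tokens, the two small functions land in separate chunks because each one fills the budget on its own.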