You can also check this cookbook in colab here
⭐ Star us on GitHub, join our Discord, or follow us on X
This notebook provides a comprehensive guide to generating user queries and structured tool call data using CAMEL’s ChatAgent framework. By utilizing real tools and the Hermes JSON format for function calls, the tutorial demonstrates a structured approach to scalable and flexible data generation.
In this notebook, you’ll explore:
-
CAMEL’s ChatAgent Framework: A multi-agent system for generating human-like queries and structured tool call data, leveraging its modular and adaptable design.
-
Hermes Function Calling Format: A standardized JSON-based format for encoding function calls, ensuring consistency and interoperability in structured data outputs.
-
OpenAI API Integration: Enabling advanced natural language understanding and generation for crafting user queries and processing tool responses.
-
Toolkits Integration: Leveraging tools such as MathToolkit, SearchToolkit, and others to provide diverse functionalities for realistic scenarios.
-
Automated Data Generation: End-to-end pipeline for generating tool-specific user queries, structuring their outputs, and saving them as JSON files.
Installation
Ensure you have CAMEL AI and desired dependencies installed in your Python environment:
!pip install 'camel-ai[rag,web_tools]==0.2.18'
Step 1: Import Required Libraries and Modules
Start by importing necessary libraries and modules.
import json
from typing import Callable, List, Union, Any
# Import necessary classes and functions from camel library
from camel.agents import ChatAgent
from camel.messages import FunctionCallingMessage
from camel.messages import HermesFunctionFormatter
from camel.messages import ShareGPTConversation
from camel.messages import ShareGPTMessage
from camel.models import ModelFactory
from camel.toolkits import FunctionTool, MathToolkit, SearchToolkit, \
RetrievalToolkit
from camel.types import ModelPlatformType, ModelType
import os
from getpass import getpass
# Prompt for the OpenAI API key securely
openai_api_key = getpass('Enter your OpenAI API key: ')
os.environ["OPENAI_API_KEY"] = openai_api_key
Alternatively, if running on Colab, you could save your API keys and tokens as Colab Secrets, and use them across notebooks.
To do so, comment out the above manual API key prompt code block(s), and uncomment the following codeblock.
⚠️ Don’t forget granting access to the API key you would be using to the current notebook.
# import os
# from google.colab import userdata
# os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
Step 2: Define Function to Generate User Queries
This function leverages specific tools to generate human-like queries. We’ll set up a ChatAgent to create relevant, tool-specific user queries.
def generate_user_query(selected_tool: Union[Callable, FunctionTool],
n: int = 1) -> List[str]:
r"""Generates user queries by leveraging specific tools, helping the
ChatAgent craft
human-like queries that take advantage of each tool's functionality.
Args:
selected_tool (Union[Callable, FunctionTool]): The tool to leverage for
query generation.
n (int, optional): The number of queries to generate. Defaults to 1.
"""
tool_call_sys_msg = (
"You are a smart query generator designed to utilize specific tools "
"based on user needs. "
"Formulate queries as a human user would."
"Instructions:\n"
"1. Envision a real-world scenario, but don't say it out loud."
"2. Craft a realistically phrased actionable query fitting that scenario"
" that could be satisfied with the provided tool(s)\n"
"3. With the tool(s) in mind, phrase the query naturally and "
"informatively.\n"
"4. Only rely on information from tools they appear likely to "
"provide\n"
"5. Pose the query as if the user doesn't know what tools are "
"available."
)
# Convert to FunctionTool if necessary
if not isinstance(selected_tool, FunctionTool):
selected_tool = FunctionTool(selected_tool)
# Create a model instance for generating queries
query_model = ModelFactory.create(
model_platform=ModelPlatformType.OPENAI,
model_type=ModelType.GPT_4O_MINI,
model_config_dict={"temperature": 1}
)
# Initialize ChatAgent with the system message
query_agent = ChatAgent(
system_message=tool_call_sys_msg,
model=query_model,
)
queries = []
# Prepare tools info message for guiding query generation
tools_info_message = (
"Generate a relevant query based on the following tool "
"details:\n\n" +
"\n".join(f"Tool Schema: {selected_tool.get_openai_tool_schema()}")
)
# Generate queries
for _ in range(n):
response = query_agent.step(tools_info_message)
queries.append(response.msgs[0].content) # Extract the generated query
return queries
This function will structure tool call data based on user queries by leveraging each selected tool.
def generate_tool_call_data(user_messages: List[str],
selected_tool: Union[Callable, FunctionTool]) -> \
list[Any]:
r"""Generates structured tool call data for a list of user messages by
using each specified tool in selected_tools.
"""
# Convert to FunctionTool if necessary
if not isinstance(selected_tool, FunctionTool):
selected_tool = FunctionTool(selected_tool)
# Define system message guiding ChatAgent on function calls
base_system_msg = "You are a function calling AI model. "
hermes_tool_call_sys_msg = (
f"You are a function calling AI model. You are provided with "
f"function signatures within <tools> </tools> XML tags. You may call "
f"one or more functions to assist with the user query. If available "
f"tools are not relevant in assisting with user query, just respond "
f"in natural conversational language. Don't make assumptions about "
f"what values to plug into functions. After calling & executing the "
f"functions, you will be provided with function results within "
f"<tool_response> </tool_response> XML tags."
f"\n <tools> \n"
f"{[selected_tool.get_openai_tool_schema()]}"
f"\n </tools>\n"
"For each function call return a JSON object, with the following "
"pydantic model json schema:{'title': 'FunctionCall', 'type': "
"'object', 'properties': {'name': {'title': 'Name', 'type': "
"'string'}, 'arguments': {'title': 'Arguments', 'type': 'object'}}, "
"'required': ['arguments', 'name']}\n"
f"Each function call should be enclosed within <tool_call> "
f"</tool_call> XML tags.\n"
f"Example:\n"
f"<tool_call>\n"
"{'name': <function-name>, 'arguments': <args-dict>}\n"
"</tool_call>"
f"")
sys_hermes = ShareGPTMessage(from_='system',
value=hermes_tool_call_sys_msg)
# Initialize model for tool call data generation
tool_model = ModelFactory.create(
model_platform=ModelPlatformType.OPENAI,
model_type=ModelType.GPT_4O_MINI,
)
all_tool_data = {}
tool_output_data = []
for user_message in user_messages:
# Set up ChatAgent with the system message, model, and the single tool
tool_agent = ChatAgent(
system_message=base_system_msg,
model=tool_model,
tools=[selected_tool],
)
# Generate response using ChatAgent and structured output
try:
tool_agent.step(user_message)
except Exception as e:
print(f"Error: {e}")
continue
messages = [record.memory_record.message for record in
tool_agent.memory.retrieve()]
sharegpt_hermes_msgs = \
[sys_hermes] + [msg.to_sharegpt(HermesFunctionFormatter()) for msg
in messages[1:]]
# Only include conversations with function calls
if any(type(message) == FunctionCallingMessage for message in
messages):
tool_output_data.append(
json.loads(
ShareGPTConversation(sharegpt_hermes_msgs)
.model_dump_json(by_alias=True))
)
# Add generated data to the dictionary with tool name as the key
all_tool_data[selected_tool.func.__name__] = tool_output_data
return tool_output_data
We’ll set up toolkits we want to be used
# API keys can be setup in the notebook like this
# google_api_key = getpass('Enter your API key: ')
# os.environ["GOOGLE_API_KEY"] = google_api_key
# weather_api_key = getpass('Enter your API key: ')
# os.environ["OPENWEATHERMAP_API_KEY"] = weather_api_key
selected_tools = [
*MathToolkit().get_tools(),
# Add more tools as needed, though they require API keys
*[ # Search tools with no API keys required
FunctionTool(SearchToolkit().search_duckduckgo),
FunctionTool(SearchToolkit().search_wiki),
],
# FunctionTool(SearchToolkit().search_google),
# *ArxivToolkit().get_tools(),
# *GoogleMapsToolkit().get_tools(),
# *WeatherToolkit().get_tools(),
]
Step 5: Generate Data and Save to JSON
We now loop through each tool, generate queries, create tool call data, and save it all in JSON format.
results = {
"generated_queries": [],
"tool_call_data": []
}
for selected_tool in selected_tools:
user_queries = generate_user_query(selected_tool=selected_tool, n=5)
tool_call_data = generate_tool_call_data(user_queries, selected_tool)
# Append results to the lists instead of overwriting
results["generated_queries"].extend(user_queries)
results["tool_call_data"].extend(tool_call_data)
# Specify output file path
output_file = "generated_tool_call_data.json"
# Save data to JSON file
with open(output_file, "w") as f:
json.dump(results, f, indent=4)
print(f"Data saved to {output_file}")
Step 6: Verify the JSON Output
Open the generated JSON file and verify that the tool call data has been saved correctly.
# Load and display data to ensure correctness
with open(output_file, "r") as f:
data = json.load(f)
print("Sample data:", json.dumps(data["generated_queries"][:100], indent=4)) # Display sample queries
print("\nSample tool call data:", json.dumps(data["tool_call_data"][:100], indent=4)) # Display sample tool call data
Summary
In this tutorial, you learned how to generate user queries and structure tool call data by leveraging multiple tools with the camel library. This setup enables scalable and flexible data generation for various toolkits and tasks.
⭐ Star us on GitHub, join our Discord, or follow us on X