You can also check out this cookbook in Colab here

Star us on GitHub, join our Discord, or follow us on X


This notebook provides a comprehensive guide to generating user queries and structured tool call data using CAMEL’s ChatAgent framework. By utilizing real tools and the Hermes JSON format for function calls, the tutorial demonstrates a structured approach to scalable and flexible data generation.

In this notebook, you’ll explore:

  • CAMEL’s ChatAgent Framework: A multi-agent system for generating human-like queries and structured tool call data, leveraging its modular and adaptable design.

  • Hermes Function Calling Format: A standardized JSON-based format for encoding function calls, ensuring consistency and interoperability in structured data outputs.

  • OpenAI API Integration: Enabling advanced natural language understanding and generation for crafting user queries and processing tool responses.

  • Toolkits Integration: Leveraging tools such as MathToolkit, SearchToolkit, and others to provide diverse functionalities for realistic scenarios.

  • Automated Data Generation: End-to-end pipeline for generating tool-specific user queries, structuring their outputs, and saving them as JSON files.
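As a preview of the Hermes function calling format mentioned above, each call is a small JSON object with a `name` and an `arguments` field, wrapped in `<tool_call>` XML tags. A minimal sketch (the `add` function and its arguments here are purely illustrative):

```python
import json

# Illustrative Hermes-format tool call: a JSON object with "name" and
# "arguments" keys, enclosed in <tool_call> XML tags.
call = {"name": "add", "arguments": {"a": 2, "b": 3}}
hermes_call = f"<tool_call>\n{json.dumps(call)}\n</tool_call>"
print(hermes_call)
```

This is the structure the data-generation pipeline below asks the model to emit for every function call.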

Installation

Ensure you have CAMEL AI and desired dependencies installed in your Python environment:

!pip install 'camel-ai[rag,web_tools]==0.2.18'

Step 1: Import Required Libraries and Modules

Start by importing necessary libraries and modules.

import json
from typing import Callable, List, Union, Any

# Import necessary classes and functions from camel library
from camel.agents import ChatAgent
from camel.messages import FunctionCallingMessage
from camel.messages import HermesFunctionFormatter
from camel.messages import ShareGPTConversation
from camel.messages import ShareGPTMessage
from camel.models import ModelFactory
from camel.toolkits import FunctionTool, MathToolkit, SearchToolkit, \
    RetrievalToolkit
from camel.types import ModelPlatformType, ModelType
import os
from getpass import getpass
# Prompt for the OpenAI API key securely
openai_api_key = getpass('Enter your OpenAI API key: ')
os.environ["OPENAI_API_KEY"] = openai_api_key

Alternatively, if running on Colab, you can save your API keys and tokens as Colab Secrets and use them across notebooks.

To do so, comment out the manual API key prompt code block above and uncomment the following code block.

⚠️ Don’t forget to grant the current notebook access to the API key you will be using.

# import os
# from google.colab import userdata

# os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")

Step 2: Define Function to Generate User Queries

This function leverages specific tools to generate human-like queries. We’ll set up a ChatAgent to create relevant, tool-specific user queries.

def generate_user_query(selected_tool: Union[Callable, FunctionTool],
                        n: int = 1) -> List[str]:
    r"""Generates user queries by leveraging specific tools, helping the
    ChatAgent craft
    human-like queries that take advantage of each tool's functionality.

    Args:
        selected_tool (Union[Callable, FunctionTool]): The tool to leverage for
        query generation.
        n (int, optional): The number of queries to generate. Defaults to 1.
    """

    tool_call_sys_msg = (
        "You are a smart query generator designed to utilize specific tools "
        "based on user needs. "
        "Formulate queries as a human user would.\n\n"
        "Instructions:\n"
        "1. Envision a real-world scenario, but don't state it explicitly.\n"
        "2. Craft a realistically phrased, actionable query fitting that "
        "scenario that could be satisfied with the provided tool(s).\n"
        "3. With the tool(s) in mind, phrase the query naturally and "
        "informatively.\n"
        "4. Only rely on information the tools appear likely to provide.\n"
        "5. Pose the query as if the user doesn't know what tools are "
        "available."
    )

    # Convert to FunctionTool if necessary
    if not isinstance(selected_tool, FunctionTool):
        selected_tool = FunctionTool(selected_tool)

    # Create a model instance for generating queries
    query_model = ModelFactory.create(
        model_platform=ModelPlatformType.OPENAI,
        model_type=ModelType.GPT_4O_MINI,
        model_config_dict={"temperature": 1}
    )

    # Initialize ChatAgent with the system message
    query_agent = ChatAgent(
        system_message=tool_call_sys_msg,
        model=query_model,
    )
    queries = []

    # Prepare tools info message for guiding query generation
    tools_info_message = (
            "Generate a relevant query based on the following tool "
            "details:\n\n" +
            "\n".join(f"Tool Schema: {selected_tool.get_openai_tool_schema()}")
    )

    # Generate queries
    for _ in range(n):
        response = query_agent.step(tools_info_message)
        queries.append(response.msgs[0].content)  # Extract the generated query

    return queries
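For reference, `get_openai_tool_schema()` returns a schema in OpenAI's function tool format. The sketch below hand-builds an equivalent schema for a hypothetical `add` function, just to show the shape the query generator is prompted with (this is not the camel implementation, only an illustration of the format):

```python
import json

# Hand-written OpenAI-style tool schema for a hypothetical `add` function,
# mirroring the shape that FunctionTool.get_openai_tool_schema() returns.
add_schema = {
    "type": "function",
    "function": {
        "name": "add",
        "description": "Adds two numbers and returns the sum.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer", "description": "First addend."},
                "b": {"type": "integer", "description": "Second addend."},
            },
            "required": ["a", "b"],
        },
    },
}

# The same kind of prompt generate_user_query builds from the real schema.
tools_info_message = (
    "Generate a relevant query based on the following tool details:\n\n"
    f"Tool Schema: {json.dumps(add_schema)}"
)
print(tools_info_message)
```

Seeing the schema spelled out makes it clear why the query agent can phrase tool-appropriate questions without ever naming the tool.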

Step 3: Define Function to Generate Structured Tool Call Data

This function will structure tool call data based on user queries by leveraging each selected tool.

def generate_tool_call_data(user_messages: List[str],
                            selected_tool: Union[Callable, FunctionTool]
                            ) -> List[Any]:
    r"""Generates structured tool call data for a list of user messages by
    using the specified tool.
    """

    # Convert to FunctionTool if necessary
    if not isinstance(selected_tool, FunctionTool):
        selected_tool = FunctionTool(selected_tool)

    # Define system message guiding ChatAgent on function calls
    base_system_msg = "You are a function calling AI model. "
    hermes_tool_call_sys_msg = (
        "You are a function calling AI model. You are provided with "
        "function signatures within <tools> </tools> XML tags. You may call "
        "one or more functions to assist with the user query. If the "
        "available tools are not relevant to the user query, just respond "
        "in natural conversational language. Don't make assumptions about "
        "what values to plug into functions. After calling & executing the "
        "functions, you will be provided with function results within "
        "<tool_response> </tool_response> XML tags.\n"
        "<tools>\n"
        f"{[selected_tool.get_openai_tool_schema()]}\n"
        "</tools>\n"
        "For each function call return a JSON object, with the following "
        "pydantic model json schema: {'title': 'FunctionCall', 'type': "
        "'object', 'properties': {'name': {'title': 'Name', 'type': "
        "'string'}, 'arguments': {'title': 'Arguments', 'type': 'object'}}, "
        "'required': ['arguments', 'name']}\n"
        "Each function call should be enclosed within <tool_call> "
        "</tool_call> XML tags.\n"
        "Example:\n"
        "<tool_call>\n"
        "{'name': <function-name>, 'arguments': <args-dict>}\n"
        "</tool_call>"
    )
    sys_hermes = ShareGPTMessage(from_='system',
                                 value=hermes_tool_call_sys_msg)
    # Initialize model for tool call data generation
    tool_model = ModelFactory.create(
        model_platform=ModelPlatformType.OPENAI,
        model_type=ModelType.GPT_4O_MINI,
    )

    tool_output_data = []
    for user_message in user_messages:
        # Set up ChatAgent with the system message, model, and the single tool
        tool_agent = ChatAgent(
            system_message=base_system_msg,
            model=tool_model,
            tools=[selected_tool],
        )

        # Generate response using ChatAgent and structured output
        try:
            tool_agent.step(user_message)
        except Exception as e:
            print(f"Error: {e}")
            continue
        messages = [record.memory_record.message for record in
                    tool_agent.memory.retrieve()]

        sharegpt_hermes_msgs = \
            [sys_hermes] + [msg.to_sharegpt(HermesFunctionFormatter()) for msg
                            in messages[1:]]

        # Only include conversations with function calls
        if any(isinstance(message, FunctionCallingMessage)
               for message in messages):
            tool_output_data.append(
                json.loads(
                    ShareGPTConversation(sharegpt_hermes_msgs)
                    .model_dump_json(by_alias=True))
            )

    return tool_output_data
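The serialized output for each conversation is a list of ShareGPT-style messages, each carrying a `from` role and a `value` string. A hand-built sketch of what one saved conversation might look like (the roles follow the ShareGPT convention; the message contents are illustrative, not real pipeline output):

```python
import json

# Illustrative ShareGPT-style conversation, mirroring the structure produced
# by ShareGPTConversation.model_dump_json(by_alias=True) in the function above.
conversation = [
    {"from": "system", "value": "You are a function calling AI model. ..."},
    {"from": "human", "value": "What is 2 plus 3?"},
    {"from": "gpt",
     "value": "<tool_call>\n"
              "{'name': 'add', 'arguments': {'a': 2, 'b': 3}}\n"
              "</tool_call>"},
    {"from": "tool", "value": "<tool_response>\n5\n</tool_response>"},
    {"from": "gpt", "value": "2 plus 3 is 5."},
]
print(json.dumps(conversation, indent=2))
```

This flat role/value layout is what makes the output directly usable as fine-tuning data for function-calling models.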

Step 4: Initialize Toolkits and Define Tools List

We’ll set up the toolkits we want to use:

# API keys can be set up in the notebook like this:
# google_api_key = getpass('Enter your API key: ')
# os.environ["GOOGLE_API_KEY"] = google_api_key
# weather_api_key = getpass('Enter your API key: ')
# os.environ["OPENWEATHERMAP_API_KEY"] = weather_api_key

selected_tools = [
    *MathToolkit().get_tools(),
    # Add more tools as needed, though they require API keys
    *[  # Search tools with no API keys required
        FunctionTool(SearchToolkit().search_duckduckgo),
        FunctionTool(SearchToolkit().search_wiki),
    ],
    # FunctionTool(SearchToolkit().search_google),
    # *ArxivToolkit().get_tools(),
    # *GoogleMapsToolkit().get_tools(),
    # *WeatherToolkit().get_tools(),
]

Step 5: Generate Data and Save to JSON

We now loop through each tool, generate queries, create tool call data, and save it all in JSON format.

results = {
    "generated_queries": [],
    "tool_call_data": []
}

for selected_tool in selected_tools:
    user_queries = generate_user_query(selected_tool=selected_tool, n=5)
    tool_call_data = generate_tool_call_data(user_queries, selected_tool)

    # Append results to the lists instead of overwriting
    results["generated_queries"].extend(user_queries)
    results["tool_call_data"].extend(tool_call_data)

# Specify output file path
output_file = "generated_tool_call_data.json"

# Save data to JSON file
with open(output_file, "w") as f:
    json.dump(results, f, indent=4)

print(f"Data saved to {output_file}")

Step 6: Verify the JSON Output

Open the generated JSON file and verify that the tool call data has been saved correctly.

# Load and display data to ensure correctness
with open(output_file, "r") as f:
    data = json.load(f)

print("Sample data:", json.dumps(data["generated_queries"][:100], indent=4))  # Display sample queries
print("\nSample tool call data:", json.dumps(data["tool_call_data"][:100], indent=4))  # Display sample tool call data

Summary

In this tutorial, you learned how to generate user queries and structure tool call data by leveraging multiple tools with the camel library. This setup enables scalable and flexible data generation for various toolkits and tasks.

Star us on GitHub, join our Discord, or follow us on X