RAG Cookbook#

You can also check this cookbook in colab here

Overview#

In this notebook, we show the usage of the CAMEL Retrieve Module in both customized and auto ways. We will also show how to combine AutoRetriever with ChatAgent, and further combine AutoRetriever with RolePlaying by using Function Calling.

Four main parts are included:

- Customized RAG
- Auto RAG
- Single Agent with Auto RAG
- Role-playing with Auto RAG

Installation#

Ensure you have CAMEL AI installed in your Python environment:

[ ]:
!pip install "camel-ai[all]==0.2.1"

Load Data#

Let’s first load the CAMEL paper from https://arxiv.org/pdf/2303.17760.pdf. This will be our local example data.

[2]:
import os
import requests

os.makedirs('local_data', exist_ok=True)

url = "https://arxiv.org/pdf/2303.17760.pdf"
response = requests.get(url)
with open('local_data/camel_paper.pdf', 'wb') as file:
    file.write(response.content)

1. Customized RAG#

In this section we will set up our customized RAG pipeline, using VectorRetriever as an example. First, set the embedding model: we will use OpenAIEmbedding, so we need to set the OPENAI_API_KEY below.

[3]:
from getpass import getpass
# Prompt for the OpenAI API key securely
openai_api_key = getpass('Enter your API key: ')
os.environ["OPENAI_API_KEY"] = openai_api_key
Enter your API key: ··········

Import and set the embedding instance:

[4]:
from camel.embeddings import OpenAIEmbedding
from camel.types import EmbeddingModelType

embedding_instance = OpenAIEmbedding(model_type=EmbeddingModelType.TEXT_EMBEDDING_3_LARGE)

Import and set the vector storage instance:

[5]:
from camel.storages import QdrantStorage

storage_instance = QdrantStorage(
    vector_dim=embedding_instance.get_output_dim(),
    path="local_data",
    collection_name="camel_paper",
)

Import and set the retriever instance:

[6]:
from camel.retrievers import VectorRetriever

vector_retriever = VectorRetriever(embedding_model=embedding_instance,
                                   storage=storage_instance)

We use the integrated Unstructured Module to split the content into small chunks. The content is split automatically with its chunk_by_title function; the maximum number of characters per chunk is 500, which is a suitable length for OpenAIEmbedding. All the text in the chunks will be embedded and stored in the vector storage instance. This will take some time, please wait.

[ ]:
vector_retriever.process(
    content="local_data/camel_paper.pdf",
)
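
If you want to control how the document is chunked, process() also takes chunking options. A minimal sketch, assuming the chunk_type and max_characters keyword arguments exist in this camel-ai version (check the VectorRetriever.process signature if they differ):

[ ]:
# Sketch only: the keyword names below are assumptions, verify against your version
vector_retriever.process(
    content="local_data/camel_paper.pdf",
    chunk_type="chunk_by_title",  # assumed chunking strategy name
    max_characters=500,           # assumed per-chunk character limit
)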

Now we can retrieve information from the vector storage by giving a query. By default it returns the text content of the top 1 chunk with the highest cosine similarity score, and the similarity score must exceed the similarity threshold (0.7 in this version, as the output below shows) to ensure the retrieved content is relevant to the query. You can also change the top_k and similarity_threshold values to fit your needs.

The returned dictionary list includes:

- similarity score
- content path
- metadata
- text
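
For example, here is a sketch of a relaxed query that returns more chunks and accepts weaker matches, assuming query() accepts a similarity_threshold keyword argument like the AutoRetriever call later in this cookbook:

[ ]:
# Sketch: widen the retrieval (similarity_threshold as a keyword is an assumption)
relaxed_info = vector_retriever.query(
    query="What is the role-playing framework?",
    top_k=3,
    similarity_threshold=0.5,
)
for item in relaxed_info:
    print(item)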

[8]:
retrieved_info = vector_retriever.query(
    query="To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing .",
    top_k=1
)
print(retrieved_info)
[{'similarity score': '0.8733044970215764', 'content path': 'local_data/camel_paper.pdf', 'metadata': {'last_modified': '2024-09-28T03:37:11', 'filetype': 'application/pdf', 'page_number': 10}, 'text': '6 Conclusion In this paper, we explore the potential of autonomous cooperation among communicative agents and propose a novel cooperative agent framework named role-playing . Our approach enables communicative agents to collaborate autonomously toward completing tasks while requiring minimal human intervention, leading to better solutions are per our thorough evaluations. Through our analysis, we show that achieving autonomous cooperation is challenging due to issues like conversation deviation,'}]

Let’s try an irrelevant query:

[9]:
retrieved_info_irrelevant = vector_retriever.query(
    query="Compared with dumpling and rice, which should I take for dinner?",
    top_k=1,
)

print(retrieved_info_irrelevant)
[{'text': 'No suitable information retrieved from local_data/camel_paper.pdf with similarity_threshold = 0.7.'}]

2. Auto RAG#

In this section we will run the AutoRetriever with default settings. It uses OpenAIEmbedding as the default embedding model and Milvus as the default vector storage; in the example below we pass our embedding instance and use Qdrant with a local path instead.

What you need to do is:

- Set content input paths, which can be local paths or remote URLs
- Set the remote URL and API key for Milvus (not needed here, since this example uses local Qdrant storage)
- Give a query

The Auto RAG pipeline creates a collection for each content input path; the collection name is set automatically based on the content input path name. If the collection already exists, it retrieves from it directly.

[10]:
from camel.retrievers import AutoRetriever
from camel.types import StorageType

auto_retriever = AutoRetriever(
        vector_storage_local_path="local_data2/",
        storage_type=StorageType.QDRANT,
        embedding_model=embedding_instance)

retrieved_info = auto_retriever.run_vector_retriever(
    query="If I'm interest in contributing to the CAMEL projec, what should I do?",
    contents=[
        "local_data/camel_paper.pdf",  # example local path
        "https://github.com/camel-ai/camel/wiki/Contributing-Guidlines",  # example remote url
    ],
    top_k=1,
    return_detailed_info=True,
    similarity_threshold=0.5
)

print(retrieved_info)
{'Original Query': "If I'm interest in contributing to the CAMEL projec, what should I do?", 'Retrieved Context': [{'similarity score': '0.7054343316726102', 'content path': 'https://github.com/camel-ai/camel/wiki/Contributing-Guidlines', 'metadata': {'filetype': 'text/html', 'languages': ['eng'], 'page_number': 1, 'url': 'https://github.com/camel-ai/camel/wiki/Contributing-Guidlines', 'link_urls': ['https://join.slack.com/t/camel-kwr1314/shared_invite/zt-1vy8u9lbo-ZQmhIAyWSEfSwLCl2r2eKA'], 'link_texts': ['slack']}, 'text': "Thank you for your interest in contributing to the CAMEL project! 🎉 We're excited to have your support. As an open-source initiative in a rapidly evolving and open-ended field, we wholeheartedly welcome contributions of all kinds. Whether you want to introduce new features, enhance the infrastructure, improve documentation, asking issues, add more examples, implement state-of-the-art research ideas, or fix bugs, we appreciate your enthusiasm and efforts. 🙌  You are welcome to join our slack for "}]}
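
Since return_detailed_info=True returns a dictionary like the one above, you can pull out just the retrieved text if that is all you need, for example:

[ ]:
# Relies on the 'Retrieved Context' and 'text' keys shown in the output above
for chunk in retrieved_info['Retrieved Context']:
    print(chunk['text'])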

3. Single Agent with Auto RAG#

In this section we will show how to combine the AutoRetriever with a ChatAgent.

Let’s define an agent function: by providing a query to this function, we get the agent’s response.

[11]:
from camel.agents import ChatAgent
from camel.messages import BaseMessage
from camel.types import RoleType
from camel.retrievers import AutoRetriever
from camel.types import StorageType

def single_agent(query: str) -> str:
    # Set agent role
    assistant_sys_msg = BaseMessage(
        role_name="Assistant",
        role_type=RoleType.ASSISTANT,
        meta_dict=None,
        content="""You are a helpful assistant to answer question,
         I will give you the Original Query and Retrieved Context,
        answer the Original Query based on the Retrieved Context,
        if you can't answer the question just say I don't know.""",
    )

    # Add auto retriever
    auto_retriever = AutoRetriever(
            vector_storage_local_path="local_data2/",
            storage_type=StorageType.QDRANT,
            embedding_model=embedding_instance)

    retrieved_info = auto_retriever.run_vector_retriever(
        query=query,
        contents=[
            "local_data/camel_paper.pdf",  # example local path
            "https://github.com/camel-ai/camel/wiki/Contributing-Guidlines",  # example remote url
        ],
        top_k=1,
        return_detailed_info=False,
        similarity_threshold=0.5
    )

    # Pass the retrieved information to the agent
    user_msg = BaseMessage.make_user_message(role_name="User", content=str(retrieved_info))
    agent = ChatAgent(assistant_sys_msg)

    # Get response
    assistant_response = agent.step(user_msg)
    return assistant_response.msg.content

print(single_agent("If I'm interest in contributing to the CAMEL projec, what should I do?"))
If you're interested in contributing to the CAMEL project, you can introduce new features, enhance the infrastructure, improve documentation, address issues, add more examples, implement state-of-the-art research ideas, or fix bugs. Additionally, you are welcome to join their Slack for further engagement and collaboration.

4. Role-playing with Auto RAG#

In this section we will show how to combine the retrieval tool from RetrievalToolkit with RolePlaying by applying Function Calling.
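
Before wiring it into RolePlaying, it can help to see the retrieval tool on its own. A minimal sketch, assuming RetrievalToolkit exposes an information_retrieval method matching the tool calls recorded in the output later in this section:

[ ]:
from camel.toolkits import RetrievalToolkit

# Sketch: call the retrieval tool directly, outside any agent.
# The method name and arguments mirror the information_retrieval tool calls
# shown below; treat the exact signature as an assumption.
toolkit = RetrievalToolkit()
result = toolkit.information_retrieval(
    query="contributing guidelines for CAMEL project",
    contents="https://github.com/camel-ai/camel/wiki/Contributing-Guidlines",
    similarity_threshold=0.5,
)
print(result)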

[19]:
from typing import List
from colorama import Fore

from camel.agents.chat_agent import FunctionCallingRecord
from camel.configs import ChatGPTConfig
from camel.toolkits import (
    MathToolkit,
    RetrievalToolkit,
)
from camel.societies import RolePlaying
from camel.types import ModelType, ModelPlatformType
from camel.utils import print_text_animated
from camel.models import ModelFactory

def role_playing_with_rag(
    task_prompt,
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O,
    chat_turn_limit=5,
) -> None:

    user_model_config = ChatGPTConfig(temperature=0.0)

    tools_list = [
        *MathToolkit().get_tools(),
        *RetrievalToolkit().get_tools(),
    ]
    assistant_model_config = ChatGPTConfig(
        tools=tools_list,
        temperature=0.0,
    )

    role_play_session = RolePlaying(
        assistant_role_name="Searcher",
        user_role_name="Professor",
        assistant_agent_kwargs=dict(
            model=ModelFactory.create(
                model_platform=model_platform,
                model_type=model_type,
                model_config_dict=assistant_model_config.as_dict(),
            ),
            tools=tools_list,
        ),
        user_agent_kwargs=dict(
            model=ModelFactory.create(
                model_platform=model_platform,
                model_type=model_type,
                model_config_dict=user_model_config.as_dict(),
            ),
        ),
        task_prompt=task_prompt,
        with_task_specify=False,
    )

    print(
        Fore.GREEN
        + f"AI Assistant sys message:\n{role_play_session.assistant_sys_msg}\n"
    )
    print(
        Fore.BLUE + f"AI User sys message:\n{role_play_session.user_sys_msg}\n"
    )

    print(Fore.YELLOW + f"Original task prompt:\n{task_prompt}\n")
    print(
        Fore.CYAN
        + "Specified task prompt:"
        + f"\n{role_play_session.specified_task_prompt}\n"
    )
    print(Fore.RED + f"Final task prompt:\n{role_play_session.task_prompt}\n")

    n = 0
    input_msg = role_play_session.init_chat()
    while n < chat_turn_limit:
        n += 1
        assistant_response, user_response = role_play_session.step(input_msg)

        if assistant_response.terminated:
            print(
                Fore.GREEN
                + (
                    "AI Assistant terminated. Reason: "
                    f"{assistant_response.info['termination_reasons']}."
                )
            )
            break
        if user_response.terminated:
            print(
                Fore.GREEN
                + (
                    "AI User terminated. "
                    f"Reason: {user_response.info['termination_reasons']}."
                )
            )
            break

        # Print output from the user
        print_text_animated(
            Fore.BLUE + f"AI User:\n\n{user_response.msg.content}\n"
        )

        # Print output from the assistant, including any function
        # execution information
        print_text_animated(Fore.GREEN + "AI Assistant:")
        tool_calls: List[FunctionCallingRecord] = [
            FunctionCallingRecord(**call.as_dict())
            for call in assistant_response.info['tool_calls']
        ]
        for func_record in tool_calls:
            print_text_animated(f"{func_record}")
        print_text_animated(f"{assistant_response.msg.content}\n")

        if "CAMEL_TASK_DONE" in user_response.msg.content:
            break

        input_msg = assistant_response.msg

Run the role-playing with defined retriever function:

[22]:
role_playing_with_rag(task_prompt = """If I'm interest in contributing to the CAMEL projec and I encounter some challenges during the setup process, what should I do? You should refer to the content in url https://github.com/camel-ai/camel/wiki/Contributing-Guidlines to answer my question, don't generate the answer by yourself, adjust the similarity threshold to lower value is necessary""")
AI Assistant sys message:
BaseMessage(role_name='Searcher', role_type=<RoleType.ASSISTANT: 'assistant'>, meta_dict={'task': "If I'm interest in contributing to the CAMEL projec and I encounter some challenges during the setup process, what should I do? You should refer to the content in url https://github.com/camel-ai/camel/wiki/Contributing-Guidlines to answer my question, don't generate the answer by yourself, adjust the similarity threshold to lower value is necessary", 'assistant_role': 'Searcher', 'user_role': 'Professor'}, content="===== RULES OF ASSISTANT =====\nNever forget you are a Searcher and I am a Professor. Never flip roles! Never instruct me!\nWe share a common interest in collaborating to successfully complete a task.\nYou must help me to complete the task.\nHere is the task: If I'm interest in contributing to the CAMEL projec and I encounter some challenges during the setup process, what should I do? You should refer to the content in url https://github.com/camel-ai/camel/wiki/Contributing-Guidlines to answer my question, don't generate the answer by yourself, adjust the similarity threshold to lower value is necessary. Never forget our task!\nI must instruct you based on your expertise and my needs to complete the task.\n\nI must give you one instruction at a time.\nYou must write a specific solution that appropriately solves the requested instruction and explain your solutions.\nYou must decline my instruction honestly if you cannot perform the instruction due to physical, moral, legal reasons or your capability and explain the reasons.\nUnless I say the task is completed, you should always start with:\n\nSolution: <YOUR_SOLUTION>\n\n<YOUR_SOLUTION> should be very specific, include detailed explanations and provide preferable detailed implementations and examples and lists for task-solving.\nAlways end <YOUR_SOLUTION> with: Next request.", video_bytes=None, image_list=None, image_detail='auto', video_detail='low')

AI User sys message:
BaseMessage(role_name='Professor', role_type=<RoleType.USER: 'user'>, meta_dict={'task': "If I'm interest in contributing to the CAMEL projec and I encounter some challenges during the setup process, what should I do? You should refer to the content in url https://github.com/camel-ai/camel/wiki/Contributing-Guidlines to answer my question, don't generate the answer by yourself, adjust the similarity threshold to lower value is necessary", 'assistant_role': 'Searcher', 'user_role': 'Professor'}, content='===== RULES OF USER =====\nNever forget you are a Professor and I am a Searcher. Never flip roles! You will always instruct me.\nWe share a common interest in collaborating to successfully complete a task.\nI must help you to complete the task.\nHere is the task: If I\'m interest in contributing to the CAMEL projec and I encounter some challenges during the setup process, what should I do? You should refer to the content in url https://github.com/camel-ai/camel/wiki/Contributing-Guidlines to answer my question, don\'t generate the answer by yourself, adjust the similarity threshold to lower value is necessary. Never forget our task!\nYou must instruct me based on my expertise and your needs to solve the task ONLY in the following two ways:\n\n1. Instruct with a necessary input:\nInstruction: <YOUR_INSTRUCTION>\nInput: <YOUR_INPUT>\n\n2. Instruct without any input:\nInstruction: <YOUR_INSTRUCTION>\nInput: None\n\nThe "Instruction" describes a task or question. The paired "Input" provides further context or information for the requested "Instruction".\n\nYou must give me one instruction at a time.\nI must write a response that appropriately solves the requested instruction.\nI must decline your instruction honestly if I cannot perform the instruction due to physical, moral, legal reasons or my capability and explain the reasons.\nYou should instruct me not ask me questions.\nNow you must start to instruct me using the two ways described above.\nDo not add anything else other than your instruction and the optional corresponding input!\nKeep giving me instructions and necessary inputs until you think the task is completed.\nWhen the task is completed, you must only reply with a single word <CAMEL_TASK_DONE>.\nNever say <CAMEL_TASK_DONE> unless my responses have solved your task.', video_bytes=None, image_list=None, image_detail='auto', video_detail='low')

Original task prompt:
If I'm interest in contributing to the CAMEL projec and I encounter some challenges during the setup process, what should I do? You should refer to the content in url https://github.com/camel-ai/camel/wiki/Contributing-Guidlines to answer my question, don't generate the answer by yourself, adjust the similarity threshold to lower value is necessary

Specified task prompt:
None

Final task prompt:
If I'm interest in contributing to the CAMEL projec and I encounter some challenges during the setup process, what should I do? You should refer to the content in url https://github.com/camel-ai/camel/wiki/Contributing-Guidlines to answer my question, don't generate the answer by yourself, adjust the similarity threshold to lower value is necessary

AI User:

Instruction: Visit the URL provided to access the CAMEL project's contributing guidelines.
Input: https://github.com/camel-ai/camel/wiki/Contributing-Guidlines


AI Assistant:

Function Execution: information_retrieval
        Args: {'query': 'contributing guidelines for CAMEL project', 'contents': 'https://github.com/camel-ai/camel/wiki/Contributing-Guidlines'}
        Result: {'Original Query': 'contributing guidelines for CAMEL project', 'Retrieved Context': ['No suitable information retrieved from https://github.com/camel-ai/camel/wiki/Contributing-Guidlines with similarity_threshold = 0.7.']}

Solution: It appears that no suitable information was retrieved from the provided URL with the current similarity threshold. I will adjust the similarity threshold to a lower value to attempt to retrieve the relevant information.

Next request.


AI User:

Instruction: Adjust the similarity threshold to a lower value and attempt to retrieve the relevant information from the provided URL again.
Input: https://github.com/camel-ai/camel/wiki/Contributing-Guidlines


AI Assistant:

Function Execution: information_retrieval
        Args: {'query': 'contributing guidelines for CAMEL project', 'contents': 'https://github.com/camel-ai/camel/wiki/Contributing-Guidlines', 'similarity_threshold': 0.5}
        Result: {'Original Query': 'contributing guidelines for CAMEL project', 'Retrieved Context': ["Thank you for your interest in contributing to the CAMEL project! 🎉 We're excited to have your support. As an open-source initiative in a rapidly evolving and open-ended field, we wholeheartedly welcome contributions of all kinds. Whether you want to introduce new features, enhance the infrastructure, improve documentation, asking issues, add more examples, implement state-of-the-art research ideas, or fix bugs, we appreciate your enthusiasm and efforts. 🙌  You are welcome to join our slack for "]}

Solution: Here is the relevant information retrieved from the provided URL:

"Thank you for your interest in contributing to the CAMEL project! 🎉 We're excited to have your support. As an open-source initiative in a rapidly evolving and open-ended field, we wholeheartedly welcome contributions of all kinds. Whether you want to introduce new features, enhance the infrastructure, improve documentation, ask issues, add more examples, implement state-of-the-art research ideas, or fix bugs, we appreciate your enthusiasm and efforts. 🙌 You are welcome to join our Slack for..."

This indicates that the CAMEL project is open to various types of contributions, including new features, infrastructure enhancements, documentation improvements, issue reporting, adding examples, implementing research ideas, and bug fixes. Additionally, it suggests joining their Slack for further engagement.

Next request.


AI User:

Instruction: Look for specific instructions or steps provided in the contributing guidelines for setting up the development environment or resolving setup challenges.
Input: https://github.com/camel-ai/camel/wiki/Contributing-Guidlines


AI Assistant:

Function Execution: information_retrieval
        Args: {'query': 'setup development environment or resolving setup challenges for CAMEL project', 'contents': 'https://github.com/camel-ai/camel/wiki/Contributing-Guidlines', 'similarity_threshold': 0.5}
        Result: {'Original Query': 'setup development environment or resolving setup challenges for CAMEL project', 'Retrieved Context': ['Quick Start 🚀\n\nTo get started with CAMEL, follow these steps:\n\n# Clone github repo\ngit clone https://github.com/camel-ai/camel.git\n\n# Change directory into project directory\n\ncd camel\n\n# Activate camel virtual environment\npoetry shell\n\n# Install camel from source\n\n# It takes about 75s to resolve dependencies and 10s to install them, depending on your hardware and network,\npoetry install --with dev,docs\n\n# Or if you want to use all other extra packages\npoetry install --with dev,docs -E all']}

Solution: Here are the specific instructions for setting up the development environment for the CAMEL project:

### Quick Start 🚀

To get started with CAMEL, follow these steps:

1. **Clone the GitHub repository:**
   ```sh
   git clone https://github.com/camel-ai/camel.git
   ```

2. **Change directory into the project directory:**
   ```sh
   cd camel
   ```

3. **Activate the CAMEL virtual environment:**
   ```sh
   poetry shell
   ```

4. **Install CAMEL from source:**
   ```sh
   poetry install --with dev,docs
   ```
   - It takes about 75 seconds to resolve dependencies and 10 seconds to install them, depending on your hardware and network.

5. **Alternatively, if you want to use all other extra packages:**
   ```sh
   poetry install --with dev,docs -E all
   ```

These steps should help you set up the development environment for the CAMEL project. If you encounter any challenges during the setup process, these instructions should guide you through the initial setup.

Next request.


AI User:

Instruction: Follow the provided steps to set up the development environment for the CAMEL project. If you encounter any specific challenges during the setup process, document them.
Input: None


AI Assistant:

Solution: As an AI, I am unable to physically perform tasks such as setting up a development environment. However, I can guide you through the process and help you troubleshoot any issues you might encounter.

Here are the steps you should follow to set up the development environment for the CAMEL project:

1. **Clone the GitHub repository:**
   ```sh
   git clone https://github.com/camel-ai/camel.git
   ```

2. **Change directory into the project directory:**
   ```sh
   cd camel
   ```

3. **Activate the CAMEL virtual environment:**
   ```sh
   poetry shell
   ```

4. **Install CAMEL from source:**
   ```sh
   poetry install --with dev,docs
   ```
   - It takes about 75 seconds to resolve dependencies and 10 seconds to install them, depending on your hardware and network.

5. **Alternatively, if you want to use all other extra packages:**
   ```sh
   poetry install --with dev,docs -E all
   ```

### Troubleshooting Common Issues

- **Issue: `poetry: command not found`**
  - **Solution:** Ensure that Poetry is installed on your system. You can install it by following the instructions on the [Poetry website](https://python-poetry.org/docs/#installation).

- **Issue: Network-related errors during `poetry install`**
  - **Solution:** Check your internet connection and try again. If the issue persists, you might want to use a different network or check if there are any firewall restrictions.

- **Issue: Dependency resolution taking too long**
  - **Solution:** Ensure that your system has sufficient resources (CPU, memory). You can also try running the command during off-peak hours when network traffic is lower.

- **Issue: Virtual environment activation issues**
  - **Solution:** Ensure that you have the necessary permissions to create and activate virtual environments. You might need to run the command with elevated privileges (e.g., using `sudo` on Unix-based systems).

If you encounter any specific challenges during the setup process, please document them, and I can help you troubleshoot further.

Next request.


AI User:

Instruction: Attempt to set up the development environment for the CAMEL project using the provided steps. Document any specific challenges you encounter during the process.
Input: None


AI Assistant:

Solution: As an AI, I am unable to physically perform tasks such as setting up a development environment. However, I can provide a detailed guide and troubleshooting tips to help you through the process. Here’s a step-by-step guide along with potential challenges and their solutions:

### Step-by-Step Guide to Set Up the Development Environment

1. **Clone the GitHub repository:**
   ```sh
   git clone https://github.com/camel-ai/camel.git
   ```
   - **Potential Challenge:** If you encounter an error like `git: command not found`, ensure that Git is installed on your system. You can download and install Git from [here](https://git-scm.com/downloads).

2. **Change directory into the project directory:**
   ```sh
   cd camel
   ```
   - **Potential Challenge:** If you get an error like `No such file or directory`, ensure that the repository was cloned successfully and that you are in the correct directory.

3. **Activate the CAMEL virtual environment:**
   ```sh
   poetry shell
   ```
   - **Potential Challenge:** If you encounter an error like `poetry: command not found`, ensure that Poetry is installed on your system. You can install it by following the instructions on the [Poetry website](https://python-poetry.org/docs/#installation).

4. **Install CAMEL from source:**
   ```sh
   poetry install --with dev,docs
   ```
   - **Potential Challenge:** If you encounter network-related errors during `poetry install`, check your internet connection and try again. If the issue persists, you might want to use a different network or check if there are any firewall restrictions.

5. **Alternatively, if you want to use all other extra packages:**
   ```sh
   poetry install --with dev,docs -E all
   ```
   - **Potential Challenge:** If dependency resolution takes too long, ensure that your system has sufficient resources (CPU, memory). You can also try running the command during off-peak hours when network traffic is lower.

### Documenting Specific Challenges

If you encounter any specific challenges during the setup process, document them as follows:

1. **Describe the Issue:**
   - Example: "Encountered an error `poetry: command not found` when trying to activate the virtual environment."

2. **Steps Taken to Resolve:**
   - Example: "Checked if Poetry was installed by running `poetry --version`. Found that Poetry was not installed. Installed Poetry using the command `curl -sSL https://install.python-poetry.org | python3 -`."

3. **Outcome:**
   - Example: "After installing Poetry, I was able to activate the virtual environment successfully."

By following these steps and documenting any challenges, you should be able to set up the development environment for the CAMEL project. If you encounter any issues that you cannot resolve, feel free to ask for further assistance.

Next request.