Role-Playing Scraper for Report & Knowledge Graph Generation
This notebook demonstrates how to set up and leverage CAMEL’s Retrieval-Augmented Generation (RAG) combined with Firecrawl for efficient web scraping, multi-agent role-playing tasks, and knowledge graph construction. We will walk through an example of conducting a comprehensive study of the Turkish shooter in the 2024 Paris Olympics by using Mistral’s models.
In this notebook, you’ll explore:
- CAMEL: A powerful multi-agent framework that enables Retrieval-Augmented Generation and multi-agent role-playing scenarios, allowing for sophisticated AI-driven tasks.
- Mistral: Utilized for its state-of-the-art language models, which enable tool-calling capabilities to execute external functions, while its powerful embeddings are employed for semantic search and content retrieval.
- Firecrawl: A robust web scraping tool that simplifies extracting and cleaning content from various web pages.
- AgentOps: Tracks and analyzes the execution of CAMEL agents.
- Qdrant: An efficient vector storage system used with CAMEL’s AutoRetriever to store and retrieve relevant information based on vector similarities.
- Neo4j: A leading graph database management system used for constructing and storing knowledge graphs, enabling complex relationships between entities to be mapped and queried efficiently.
- DuckDuckGo Search: Utilized within the SearchToolkit to gather relevant URLs and information from the web, serving as the primary search engine for retrieving initial content.
- Unstructured IO: Used for content chunking, facilitating the management of unstructured data for more efficient processing.
This setup not only demonstrates a practical application but also serves as a flexible framework that can be adapted for various scenarios requiring advanced web information retrieval, AI collaboration, and multi-source data aggregation.
⭐ Star the Repo
If you find CAMEL useful or interesting, please consider giving it a star on our CAMEL GitHub Repo! Your stars help others find this project and motivate us to continue improving it.
📦 Installation
First, install the CAMEL package with all its dependencies:
🔑 Setting Up API Keys
You’ll need to set up your API keys for Mistral AI, Firecrawl and AgentOps. This ensures that the tools can interact with external services securely.
You can get a free API key from AgentOps here.
You can get an API key with free credits from Mistral AI here.
Set up the Mistral Large 2 model using the CAMEL ModelFactory. You can also configure other models as needed.
You can get an API key with free credits from Firecrawl here.
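A minimal sketch of a manual key prompt, assuming the services read their keys from the environment variables `AGENTOPS_API_KEY`, `MISTRAL_API_KEY`, and `FIRECRAWL_API_KEY` (these names are assumptions, as is the helper `ensure_api_key`). It prompts only when a key is not already set, so re-running the cell does not ask again.

```python
import os
from getpass import getpass
from typing import Callable

# Assumed environment variable names; adjust them if your setup differs.
REQUIRED_KEYS = ("AGENTOPS_API_KEY", "MISTRAL_API_KEY", "FIRECRAWL_API_KEY")


def ensure_api_key(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return the key stored under `name`, prompting only if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        # getpass hides the typed key so it does not end up in the cell output.
        value = prompt(f"Enter your {name}: ").strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = value
    return value


# Usage (prompts interactively, so it is left commented out here):
# for key_name in REQUIRED_KEYS:
#     ensure_api_key(key_name)
```

Passing a custom `prompt` callable makes the helper easy to exercise without typing anything, which is handy in automated runs.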
Alternatively, if running on Colab, you could save your API keys and tokens as Colab Secrets, and use them across notebooks.
To do so, comment out the manual API key prompt code block(s) above and uncomment the following code block.
⚠️ Don’t forget to grant the current notebook access to the API key you will be using.
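As a hedged sketch of the Colab Secrets route: the helper below (`load_secret` is a hypothetical name) reads a secret through `google.colab.userdata` when running inside Colab and falls back to an existing environment variable elsewhere. The key names are the same assumed environment variable names as in the manual setup.

```python
import os


def load_secret(name: str) -> str:
    """Read `name` from Colab Secrets, or from the environment outside Colab."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None
    if userdata is not None:
        # Raises an error if the secret is missing or notebook access is not granted.
        return userdata.get(name)
    return os.environ[name]


def export_secrets(names: tuple[str, ...]) -> None:
    """Copy each secret into os.environ so downstream libraries can find it."""
    for name in names:
        os.environ[name] = load_secret(name)


# Usage (assumed key names):
# export_secrets(("AGENTOPS_API_KEY", "MISTRAL_API_KEY", "FIRECRAWL_API_KEY"))
```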
🌐 Web Scraping with Firecrawl
Firecrawl is a powerful tool that simplifies web scraping and cleaning content from web pages. In this section, we will scrape content from a specific post on the CAMEL AI website as an example.
🎉 Firecrawl makes obtaining clean, LLM-friendly content from a URL effortless!
🛠️ Web Information Retrieval using CAMEL’s RAG and Firecrawl
In this section, we’ll demonstrate how to retrieve relevant information from a list of URLs using CAMEL’s RAG model. This is particularly useful for aggregating and analyzing data from multiple sources.
Setting Up Firecrawl with CAMEL’s RAG
The following function retrieves relevant information from a list of URLs based on a given query. It combines web scraping with Firecrawl and CAMEL’s AutoRetriever for a seamless information retrieval process.
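To illustrate the shape of such a retrieval function without any external service, here is a self-contained toy version. It is not CAMEL’s AutoRetriever or Firecrawl: scraped pages are replaced by an in-memory dict, and the embedding model is replaced by a bag-of-words vector. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding, standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, pages: dict[str, str], top_k: int = 1) -> list[tuple[float, str, str]]:
    """Return the `top_k` (score, url, chunk) triples most similar to the query."""
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, embed(chunk)), url, chunk)
        for url, text in pages.items()  # `pages` stands in for scraped content
        for chunk in chunk_text(text)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

The real pipeline follows the same steps (scrape, chunk, embed, rank by vector similarity) but delegates each one to Firecrawl, Unstructured IO, Mistral embeddings, and Qdrant respectively.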
Let’s put the retrieval function to the test by gathering some information about the 2024 Olympics. The first run may take about 50 seconds as it needs to build a local vector database.
🎉 Thanks to CAMEL’s RAG pipeline and Firecrawl’s tidy scraping capabilities, this function effectively retrieves relevant information from the specified URLs! You can now integrate this function into CAMEL’s Agents to automate the retrieval process further.
📹 Monitoring AI Agents with AgentOps
🧠 Knowledge Graph Construction
A powerful feature of CAMEL is its ability to build and store knowledge graphs from text data. This allows for advanced analysis and visualization of relationships within the data.
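As a rough mental model, a knowledge graph is a set of typed nodes connected by typed relationships. The sketch below is a plain in-memory stand-in, not CAMEL’s or Neo4j’s data model; the class names and the `COMPETED_IN` relationship type are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    subj: Node
    type: str
    obj: Node


@dataclass
class TinyGraph:
    """Minimal in-memory graph: sets deduplicate repeated nodes and edges."""
    nodes: set[Node] = field(default_factory=set)
    relationships: set[Relationship] = field(default_factory=set)

    def add(self, subj: Node, rel_type: str, obj: Node) -> None:
        self.nodes.update((subj, obj))
        self.relationships.add(Relationship(subj, rel_type, obj))

    def neighbors(self, node_id: str) -> list[tuple[str, str]]:
        """Return sorted (relationship type, target id) pairs leaving `node_id`."""
        return sorted(
            (rel.type, rel.obj.id)
            for rel in self.relationships
            if rel.subj.id == node_id
        )


graph = TinyGraph()
graph.add(Node("Turkish shooter", "Person"), "COMPETED_IN", Node("2024 Paris Olympics", "Event"))
print(graph.neighbors("Turkish shooter"))
```

In the notebook, an agent extracts such nodes and relationships from text, and Neo4j stores them so they can be queried and visualized.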
Set up your Neo4j instance by providing the URL, username, and password. Here is the guidance; your credentials are in the downloaded .txt file. Note that you may need to wait up to 60 seconds if the instance has just been set up.
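Because a freshly created instance may not accept connections right away, one option is to wrap the connection attempt in a small retry loop. The sketch below is generic: `connect` is a hypothetical zero-argument callable that you would replace with whatever creates your Neo4j connection, and the 60-second budget mirrors the wait mentioned above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    timeout_s: float = 60.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or another wait would exceed `timeout_s`."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return connect()
        except Exception:  # in practice, narrow this to the driver's connection errors
            if time.monotonic() + interval_s > deadline:
                raise  # out of time: surface the last error to the caller
            sleep(interval_s)
```

The `sleep` parameter exists only so the helper can be exercised without actually waiting.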
🤖🤖 Multi-Agent Role-Playing with CAMEL
This section sets up a role-playing session where AI agents interact to accomplish a task using various tools. We will guide the assistant agent to perform a comprehensive study of the Turkish shooter in the 2024 Paris Olympics.
Defining the Task Prompt
We will configure the assistant agent with tools for mathematical calculations, web information retrieval, and knowledge graph building.
Setting Up the Role-Playing Session
Print the system message and task prompt
Set the termination rule and start the interaction between agents
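The control flow of such a session can be sketched without any model calls. In the toy loop below, `step` is a hypothetical callable that takes the previous assistant message and returns a `(user_msg, assistant_msg)` pair. The loop stops when the user agent emits a completion marker or when a round limit is reached. The marker string `CAMEL_TASK_DONE` and the default limit are assumptions modeled on common CAMEL examples.

```python
from typing import Callable

DONE_MARKER = "CAMEL_TASK_DONE"  # assumed completion marker


def run_session(
    step: Callable[[str], tuple[str, str]],
    init_msg: str,
    max_rounds: int = 20,
) -> list[tuple[str, str]]:
    """Run user/assistant rounds until the marker appears or the limit is hit."""
    transcript: list[tuple[str, str]] = []
    msg = init_msg
    for _ in range(max_rounds):
        user_msg, assistant_msg = step(msg)
        transcript.append((user_msg, assistant_msg))
        if DONE_MARKER in user_msg:
            break  # the user agent declared the task complete
        msg = assistant_msg  # feed the assistant's answer into the next round
    return transcript
```

A round limit is a useful safety net: it bounds token consumption even if the agents never declare the task done.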
NOTE: This session takes approximately 8 minutes and consumes around 60k tokens with the Mistral Large 2 model.
🎉 Open the AgentOps link shown above to see a detailed record of this run.
NOTE: The AgentOps link is private and tied to the AgentOps account. To access the link, you’ll need to run the session using your own AgentOps API Key, which will then allow you to open the link with the session’s running information.
Currently AgentOps can’t get the running cost for Mistral AI directly.
🎉 You can also go to Neo4j Aura to check the knowledge graph generated by CAMEL’s agent.
🌟 Highlights
This notebook has guided you through setting up and running a CAMEL RAG workflow with Firecrawl for a complex, multi-agent role-playing task. You can adapt and expand this example for various other scenarios requiring advanced web information retrieval and AI collaboration.
Key tools utilized in this notebook include:
- CAMEL: A powerful multi-agent framework that enables Retrieval-Augmented Generation and multi-agent role-playing scenarios, allowing for sophisticated AI-driven tasks.
- Mistral: Utilized for its state-of-the-art language models, which enable tool-calling capabilities to execute external functions, while its powerful embeddings are employed for semantic search and content retrieval.
- Firecrawl: A robust web scraping tool that simplifies extracting and cleaning content from various web pages.
- AgentOps: Tracks and analyzes the execution of CAMEL agents.
- Qdrant: An efficient vector storage system used with CAMEL’s AutoRetriever to store and retrieve relevant information based on vector similarities.
- Neo4j: A leading graph database management system used for constructing and storing knowledge graphs, enabling complex relationships between entities to be mapped and queried efficiently.
- DuckDuckGo Search: Utilized within the SearchToolkit to gather relevant URLs and information from the web, serving as the primary search engine for retrieving initial content.
- Unstructured IO: Used for content chunking, facilitating the management of unstructured data for more efficient processing.
This comprehensive setup allows you to adapt and expand the example for various scenarios requiring advanced web information retrieval, AI collaboration, and multi-source data aggregation.
CAMEL also supports advanced GraphRAG. For more information, please check here.