{ "cells": [ { "cell_type": "markdown", "source": [ "To run this, open the \"*Runtime*\" menu and select \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n", "
" ], "metadata": { "id": "Uh_7i19qP7bg" } }, { "cell_type": "markdown", "source": [ "For more detailed usage information, please refer to our [cookbook](https://colab.research.google.com/drive/1lYgArBw7ARVPSpdwgKLYnp_NEXiNDOd-?usp=sharing)." ], "metadata": { "id": "ouSigsSoSJR7" } }, { "cell_type": "markdown", "source": [ "### Agentic SFT Data generation with CAMEL and finetuning Mistral models with Unsloth\n", "\n", "CAMEL and Unsloth make an excellent pair. In this notebook we will combine the two to train a model that is proficient at the content of a web page.\n", "\n", "You will learn how to generate data with CAMEL, how to train the model, and how to run it." ], "metadata": { "id": "gHNlIaljQI54" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2eSvM9zX_2d3" }, "outputs": [], "source": [ "%%capture\n", "!pip install unsloth\n", "# Install CAMEL-AI with no optional dependencies\n", "!pip install camel-ai==0.2.14\n", "# Get Unsloth\n", "!pip install --upgrade --no-deps \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git@0de54572525788d09a6a9ef1efc7611e65dd7547\"\n", "!pip install firecrawl" ] }, { "cell_type": "markdown", "metadata": { "id": "r2v_X2fA0Df5" }, "source": [ "First we will set the OPENAI_API_KEY that will be used to generate the data.\n", "\n", "CAMEL supports many other models. See [here](https://docs.camel-ai.org/key_modules/models.html) for a list."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "gq4Q0u0ZTvGs", "outputId": "f9f35cb9-d54a-4f64-8f7e-0c090a7c75e9" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Enter your OpenAI API key: ··········\n", "Enter your Firecrawl API key: ··········\n" ] } ], "source": [ "from getpass import getpass\n", "import os\n", "\n", "openai_api_key = getpass('Enter your OpenAI API key: ')\n", "os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n", "\n", "# Generate an API key at https://www.firecrawl.dev/app/api-keys\n", "firecrawl_api_key = getpass('Enter your Firecrawl API key: ')\n", "os.environ[\"FIRECRAWL_API_KEY\"] = firecrawl_api_key" ] }, { "cell_type": "markdown", "metadata": { "id": "iP5-hPz-0T6x" }, "source": [ "Next we will setup our model for training using Unsloth." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 497, "referenced_widgets": [ "afdac2c6f67049a9b00c7d8499198f4d", "196c370e59814c4d94b91fecce193e1f", "386534e2ee6245d4931abf5a9a366dec", "52e1ab786d8d4514907d3c9b519eea86", "2f4a07efa98f4243b6f780793820bd70", "04c4fc581add4f23b99ada998599a342", "4f71f21a05124dfdb2d29f64de18bc79", "7ae08cc6b65548189cff799d47b929a2", "c62a747103824874aade983876e872b4", "e7e2b3e9223c42798ba0a3a34eb1d71e", "d542ac82ac354efc8231698f59e91da8", "e47421fec52c494598cc4c6c49641030", "c9a6a1c8c96c4cdba555697372bafb74", "b89a1c85fa0d44d9ade2f2d149e84019", "63e2e4d767044da8b2c2caacecfa8b53", "e9d91ef8c01d42e6bf7cd4cf263026f4", "53b4571358fc49e881f7583475315de2", "7d8fb825409241fe8c854bf6635ed0c6", "77823f5661cd413c922db91a1883f3af", "7f9d7ebc992e4ffdaeb040d3a3b37b41", "fbd55bebf3644df692e661e02c5d9b7c", "2f7eba09d59342b2909b77f0339cb61e", "945b1d65759a46f896dac5dc510435f8", "4526ec1d4ec745b1bc49be760743bf45", "025aad4d520a4c12be40b1ec60965f78", "f97151ccc3124f7598acb926a0b70711", 
"0179fd870ce04ab78d916c4111cd79a8", "53f5d541916943569bc8b1b11f8af8e3", "ac212a92b6214af49730678dff56dd80", "7ebfbbf5b43a4ef5a951363d6701baaa", "2b3aa959d2d74d569b6e060dd8a65687", "4d9ad5bbe7ea487b849b589dd8405a86", "0d9bb083a6b7410a98be4058306c1b7e", "31faaf9cf2d2410a8189bc048d1f4759", "7b3946f6cd584e1fbd584dd350c1cc31", "ed44806f6ae34cbaa96da955530f25af", "040ab5df78b041ccbd8bd8d1eab1accc", "2592bd11c93e4efd9caaec3c041e4d40", "8daa9ffcf5874c68b82ba7d6bad2afd9", "8da629cd46b34cb5a7a533b31756caab", "9feb96a46c524916a4d77410a42573ff", "9ffe1c53f79e462a972456c9263011d5", "d1d69e6a21fd4a468c2103977d69a7fd", "49d898b037f340aa9f05bb1fd7511628", "a3ef9ae9c03e4562b0f1133305795cdf", "a64abdcfa37447749b8df3ee3ac3343c", "1a443295c0be4725869a94d9413d63fa", "bc6ee36df75b4c2bb40ebc1c7e030341", "0a8761d8502540439cda9ccb39e46d15", "8a72479983604802acb0450e2f075d72", "8af7e959a0894c4eaebcc756186969e4", "850910a24d794c60b518d6b3ce30c0f1", "a87e42ebb8a9415197c4cd2d24ab156d", "0ef0b079a60a481bb471db824c63614c", "32447f1a28fa435bbf4a943274047f06", "ed6e77f6ef45430fabd16bf3a5d6d4dd", "f8a429b61bea4a9b8c78f80e6b00d6da", "fd83b0e954bd4ce680cb14c0fbe2a9c7", "a9e83f8d7f3e4f8681ac6a23fabce60e", "acc7efcd0fe243c19633e91a754d65de", "c20970bea6f14a6eb13f9bb38732fb05", "b41dd2e291b64862aca912a7641cbffb", "af88ce43bc6c4537a6107733bfc6d8f5", "67eacf7bcab7461fad095a715ce1fb76", "d3ad2412060f48bdbf069711d20fa44d", "d9a964938105455ba2b4ff8e0f9bdb21" ] }, "id": "QmUBVEnvCDJv", "outputId": "31c8b6a7-a713-46b5-d238-cef8fe7e471e" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n", "🦥 Unsloth Zoo will now patch everything to make training faster!\n", "==((====))== Unsloth 2024.11.6: Fast Mistral patching. Transformers = 4.47.0.\n", " \\\\ /| GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.\n", "O^O/ \\_/ \\ Pytorch: 2.5.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.\n", "\\ / Bfloat16 = FALSE. 
FA [Xformers = 0.0.28.post3. FA2 = False]\n", " \"-____-\" Free Apache license: http://github.com/unslothai/unsloth\n", "Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!\n" ] }, { "output_type": "display_data", "data": { "text/plain": [ "model.safetensors: 0%| | 0.00/4.13G [00:00, ?B/s]" ], "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "afdac2c6f67049a9b00c7d8499198f4d" } }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "generation_config.json: 0%| | 0.00/155 [00:00, ?B/s]" ], "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "e47421fec52c494598cc4c6c49641030" } }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "tokenizer_config.json: 0%| | 0.00/2.13k [00:00, ?B/s]" ], "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "945b1d65759a46f896dac5dc510435f8" } }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "tokenizer.model: 0%| | 0.00/493k [00:00, ?B/s]" ], "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "31faaf9cf2d2410a8189bc048d1f4759" } }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "special_tokens_map.json: 0%| | 0.00/438 [00:00, ?B/s]" ], "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "a3ef9ae9c03e4562b0f1133305795cdf" } }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "tokenizer.json: 0%| | 0.00/1.80M [00:00, ?B/s]" ], "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "ed6e77f6ef45430fabd16bf3a5d6d4dd" } }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Unsloth: Offloading input_embeddings to disk to save VRAM\n" ] }, { "output_type": "stream", "name": 
"stderr", "text": [ "/usr/local/lib/python3.10/dist-packages/unsloth/models/_utils.py:697: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.\n", " offloaded_W = torch.load(filename, map_location = \"cpu\", mmap = True)\n" ] }, { "output_type": "stream", "name": "stdout", "text": [ "Unsloth: Offloading output_embeddings to disk to save VRAM\n" ] }, { "output_type": "stream", "name": "stderr", "text": [ "Unsloth 2024.11.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.\n" ] }, { "output_type": "stream", "name": "stdout", "text": [ "Unsloth: Training embed_tokens in mixed precision to save VRAM\n", "Unsloth: Training lm_head in mixed precision to save VRAM\n" ] } ], "source": [ "from unsloth import FastLanguageModel\n", "import torch\n", "max_seq_length = 4096\n", "dtype = None\n", "load_in_4bit = True\n", "\n", "model, tokenizer = FastLanguageModel.from_pretrained(\n", " model_name = \"unsloth/mistral-7b-instruct-v0.2-bnb-4bit\",\n", " max_seq_length = max_seq_length,\n", " dtype = dtype,\n", " load_in_4bit = load_in_4bit,\n", ")\n", "model = FastLanguageModel.get_peft_model(\n", " model,\n", " r = 16,\n", " target_modules = [\"q_proj\", 
\"k_proj\", \"v_proj\", \"o_proj\",\n", "                      \"gate_proj\", \"up_proj\", \"down_proj\",\n", "                      \"embed_tokens\", \"lm_head\"],\n", "    lora_alpha = 16,\n", "    use_gradient_checkpointing = \"unsloth\",\n", "    random_state = 3407,\n", "    use_rslora = False,  # We support rank stabilized LoRA\n", "    loftq_config = None, # And LoftQ\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "RCb9quu7bcGE" }, "source": [ "Now let's train the model on the generated data. The training loss should trend downward as the model learns the page content." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "YqQz8vdrbbrF", "outputId": "5f19d578-56ef-4c01-a1ed-d0bd402f21c5" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Step | Training Loss\n", "1 | 7.342400\n", "2 | 7.534600\n", "3 | 9.473700\n", "4 | 4.720600\n", "5 | 5.680200\n", "6 | 3.815700\n", "7 | 5.580000\n", "8 | 4.242500\n", "9 | 5.318600\n", "10 | 4.605900\n", "11 | 5.133800\n", "12 | 3.160600\n", "13 | 2.873900\n", "14 | 2.490700\n", "15 | 2.824400\n", "16 | 8.742200\n", "17 | 11.397600\n", "18 | 3.157300\n", "19 | 3.475900\n", "20 | 2.405500\n", "21 | 1.764400\n", "22 | 4.982500\n", "23 | 3.095300\n", "24 | 1.628900\n", "25 | 1.476900\n", "26 | 0.821400\n", "27 | 0.791900\n", "28 | 0.864900\n", "29 | 0.781200\n", "30 | 0.491400\n", "31 | 0.648500\n", "32 | 0.412500\n", "33 | 0.450900\n", "34 | 0.313500\n", "35 | 0.394900\n", "36 | 0.227000\n", "37 | 0.213300\n", "38 | 0.202700\n", "39 | 0.190500\n", "40 | 0.086400\n"
]
},
"metadata": {}
}
],
"source": [
"trainer_stats = trainer.train()"
]
},
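{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, you can check peak GPU memory usage after training. This is a small sketch using standard PyTorch CUDA utilities (not part of the original run):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"# Peak GPU memory reserved by PyTorch during this session, in GB.\n",
"peak_gb = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n",
"print(f\"Peak reserved memory = {peak_gb} GB.\")"
]
},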
{
"cell_type": "markdown",
"metadata": {
"id": "ekOmTR1hSNcr"
},
"source": [
"\n",
"### Inference\n",
"Let's run the model! You can change the instruction and input - leave the output blank!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "kR3gIAX-SM2q",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "4a94b91e-43ef-4cee-a412-f5d9ea6a10bb"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[' ### Instruction:\\nExplain how I can stay up to date with the CAMEL community.\\n\\n### Input:\\n\\n\\n### Response:\\nTo ensure that contributors can keep informed about upcoming meetings and community engagement, the CAMEL community emphasizes the importance of joining communication channels such as Discord, WeChat, and Slack.']"
]
},
"metadata": {},
"execution_count": 11
}
],
"source": [
"FastLanguageModel.for_inference(model) # Enable native 2x faster inference\n",
"inputs = tokenizer(\n",
"[\n",
"\n",
" AlpacaItem(\n",
"        instruction=\"Explain how I can stay up to date with the CAMEL community.\",\n",
" input=\"\",\n",
" output=\"\", # leave this blank for generation!\n",
" ).to_string()\n",
"\n",
"], return_tensors = \"pt\").to(\"cuda\")\n",
"\n",
"outputs = model.generate(**inputs, max_new_tokens = 512, use_cache = True)\n",
"tokenizer.batch_decode(outputs)"
]
},
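{
"cell_type": "markdown",
"metadata": {},
"source": [
"To keep the fine-tuned adapters for later use, you can save them locally. As a sketch (the directory name `lora_model` is just an example), the standard `save_pretrained` calls are:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save only the LoRA adapter weights (small) and the tokenizer.\n",
"# \"lora_model\" is an example output directory.\n",
"model.save_pretrained(\"lora_model\")\n",
"tokenizer.save_pretrained(\"lora_model\")"
]
},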
{
"cell_type": "markdown",
"metadata": {
"id": "xSepmhPrgOct"
},
"source": [
"**Summary**\n",
"\n",
"\n",
"We have generated realistic user queries and responses from a real page and trained on them to produce a model that understands the underlying content."
]
},
{
"cell_type": "markdown",
"source": [
"That's everything! Got questions about 🐫 CAMEL-AI? Join us on [Discord](https://discord.camel-ai.org)! Whether you want to share feedback, explore the latest in multi-agent systems, get support, or connect with others on exciting projects, we’d love to have you in the community! 🤝\n",
"\n",
"Check out some of our other work:\n",
"\n",
"1. 🐫 Creating Your First CAMEL Agent [free Colab](https://docs.camel-ai.org/cookbooks/create_your_first_agent.html)\n",
"\n",
"2. Graph RAG Cookbook [free Colab](https://colab.research.google.com/drive/1uZKQSuu0qW6ukkuSv9TukLB9bVaS1H0U?usp=sharing)\n",
"\n",
"3. 🧑‍⚖️ Create A Hackathon Judge Committee with Workforce [free Colab](https://colab.research.google.com/drive/18ajYUMfwDx3WyrjHow3EvUMpKQDcrLtr?usp=sharing)\n",
"\n",
"4. 🔥 3 ways to ingest data from websites with Firecrawl & CAMEL [free Colab](https://colab.research.google.com/drive/1lOmM3VmgR1hLwDKdeLGFve_75RFW0R9I?usp=sharing)\n",
"\n",
"5. 🦥 Agentic SFT Data Generation with CAMEL and Meta Models, Fine-Tuned with Unsloth [free Colab](https://colab.research.google.com/drive/1fdBns2QA1XNwF_tsvG3Hc27QGdViHH3b?usp=sharing)\n",
"\n",
"6. 🦥 Agentic SFT Data Generation with CAMEL and Qwen Models, Fine-Tuned with Unsloth [free Colab](https://colab.research.google.com/drive/1sMnWOvdmASEMhsRIOUSAeYuEywby6FRV?usp=sharing)\n",
"\n",
"Thanks from everyone at 🐫 CAMEL-AI\n",
"\n",
"\n",
"