CoT Data Generation and SFT Qwen With Unsloth#

To run this, press β€œRuntime” and press β€œRun all” on a free Tesla T4 Google Colab instance!

70b2b19d411c4c2cbd25d9df139a128f 9c0f27150bbf4fdd854d27dfb5db5bac

⭐ Star us on Github, join our Discord or follow our X

You can also check this cookbook in colab here (Use the colab share link)

This notebook demonstrates how to set up and leverage CAMEL’s CoTDataGenerator for generating high-quality question-answer pairs like o1 thinking data, fine-tuning a language model using Unsloth, and uploading the results to Hugging Face.

In this notebook, you’ll explore:

  • CAMEL: A powerful multi-agent framework that enables SFT data generation and multi-agent role-playing scenarios, allowing for sophisticated AI-driven tasks.

  • CoTDataGenerator: A tool for generating like o1 thinking data.

  • Unsloth: An efficient library for fine-tuning large language models with LoRA (Low-Rank Adaptation) and other optimization techniques.

  • Hugging Face Integration: Uploading datasets and fine-tuned models to the Hugging Face platform for sharing.

image.png

πŸ“¦ Installation#

[ ]:
%%capture
!pip install camel-ai==0.2.16

Unsloth require GPU environment, To install Unsloth on your own computer, follow the installation instructions here.#

[ ]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
[ ]:
import os
from datetime import datetime
import json
from camel.datagen.cotdatagen import CoTDataGenerator

πŸ”‘ Setting Up API Keys#

First we will set the OPENAI_API_KEY that will be used to generate the data.

[ ]:
from getpass import getpass
[ ]:
openai_api_key = getpass('Enter your OpenAI API key: ')
os.environ["OPENAI_API_KEY"] = openai_api_key
Enter your OpenAI API key: Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·

Set ChatAgent#

Create a system message to define agent’s default role and behaviors.

[ ]:
sys_msg = 'You are a genius at slow-thinking data and code'

Use ModelFactory to set up the backend model for agent

CAMEL supports many other models. See here for a list.

[ ]:
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType
from camel.configs import ChatGPTConfig
[ ]:
# Define the model, here in this case we use gpt-4o-mini
model = ModelFactory.create(
    model_platform=ModelPlatformType.OPENAI,
    model_type=ModelType.GPT_4O_MINI,
    model_config_dict=ChatGPTConfig().as_dict(), # [Optional] the config for model
)
[ ]:
from camel.agents import ChatAgent
chat_agent = ChatAgent(
    system_message=sys_msg,
    model=model,
    message_window_size=10,
)

Load Q&A data from a JSON file#

please prepare the qa data like below in json file:#

β€˜β€™β€™ { β€œquestion1”: β€œanswer1”, β€œquestion2”: β€œanswer2”, … } β€˜β€™β€™

[ ]:
!pwd
/content

The script fetches a example JSON file containing question-answer pairs from a GitHub repository and saves it locally. The JSON file is then loaded into the qa_data variable.

[ ]:
#get example json data

import requests
import json

# URL of the JSON file
url = 'https://raw.githubusercontent.com/zjrwtx/alldata/refs/heads/main/qa_data.json'

# Send a GET request to fetch the JSON file
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the response content as JSON
    json_data = response.json()

    # Specify the file path to save the JSON data
    file_path = 'qa_data.json'

    # Write the JSON data to the file
    with open(file_path, 'w', encoding='utf-8') as json_file:
        json.dump(json_data, json_file, ensure_ascii=False, indent=4)

    print(f"JSON data successfully saved to {file_path}")
else:
    print(f"Failed to retrieve JSON file. Status code: {response.status_code}")
JSON data successfully saved to qa_data.json
[ ]:
with open(file_path, 'r', encoding='utf-8') as f:
    qa_data = json.load(f)

Create an instance of CoTDataGenerator#

[ ]:
# Create an instance of CoTDataGenerator
testo1 = CoTDataGenerator(chat_agent, golden_answers=qa_data)
[ ]:
# Record generated answers
generated_answers = {}

Test Q&A#

The script iterates through the questions, generates answers, and verifies their correctness. The generated answers are stored in a dictionary

[ ]:
# Test Q&A
for question in qa_data.keys():
    print(f"Question: {question}")

    # Get AI's thought process and answer
    answer = testo1.get_answer(question)
    generated_answers[question] = answer
    print(f"AI's thought process and answer:\n{answer}")

    # Verify the answer
    is_correct = testo1.verify_answer(question, answer)
    print(f"Answer verification result: {'Correct' if is_correct else 'Incorrect'}")
    print("-" * 50)
    print()  # Add a new line at the end of each iteration
Question: What is the coefficient of $x^2y^6$ in the expansion of $\left(\frac{3}{5}x-\frac{y}{2}\right)^8$?  Express your answer as a common fraction
AI's thought process and answer:
To find the coefficient of \(x^2y^6\) in the expansion of \(\left(\frac{3}{5}x - \frac{y}{2}\right)^8\), we will follow a systematic approach.

### Step 1: Analyze the Problem Requirements
We need to expand the expression \(\left(\frac{3}{5}x - \frac{y}{2}\right)^8\) and identify the coefficient of the term \(x^2y^6\). This requires us to use the binomial theorem, which states that:

\[
(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k
\]

In our case, \(a = \frac{3}{5}x\), \(b = -\frac{y}{2}\), and \(n = 8\).

### Step 2: List the Steps to Solve the Problem
1. Identify the values of \(a\), \(b\), and \(n\).
2. Use the binomial theorem to express the expansion.
3. Determine the specific term that corresponds to \(x^2y^6\).
4. Calculate the coefficient of that term.

### Step 3: Execute the Solution Process
1. **Identify \(a\), \(b\), and \(n\)**:
   - \(a = \frac{3}{5}x\)
   - \(b = -\frac{y}{2}\)
   - \(n = 8\)

2. **Use the Binomial Theorem**:
   The general term in the expansion is given by:

\[
\binom{n}{k} a^{n-k} b^k
\]

Substituting our values, we have:

\[
\binom{8}{k} \left(\frac{3}{5}x\right)^{8-k} \left(-\frac{y}{2}\right)^k
\]

3. **Determine the specific term for \(x^2y^6\)**:
   We need \(x^2\) and \(y^6\). This means:
   - The power of \(x\) is \(2\), so \(8 - k = 2\) which gives \(k = 6\).
   - The power of \(y\) is \(6\), which matches our \(k\).

4. **Calculate the coefficient**:
   Now we substitute \(k = 6\) into the general term:

\[
\binom{8}{6} \left(\frac{3}{5}x\right)^{2} \left(-\frac{y}{2}\right)^{6}
\]

Calculating each part:
- \(\binom{8}{6} = \binom{8}{2} = \frac{8 \times 7}{2 \times 1} = 28\)
- \(\left(\frac{3}{5}\right)^{2} = \frac{9}{25}\)
- \(\left(-\frac{1}{2}\right)^{6} = \frac{1}{64}\)

Now, putting it all together:

\[
\text{Coefficient} = 28 \cdot \frac{9}{25} \cdot \frac{1}{64}
\]

Calculating this step-by-step:

1. Multiply \(28\) and \(9\):

\[
28 \cdot 9 = 252
\]

2. Now multiply by \(\frac{1}{25}\):

\[
\frac{252}{25}
\]

3. Finally, multiply by \(\frac{1}{64}\):

\[
\frac{252}{25 \cdot 64} = \frac{252}{1600}
\]

### Step 4: Simplify the Fraction
To simplify \(\frac{252}{1600}\), we find the greatest common divisor (GCD) of \(252\) and \(1600\). The prime factorization gives:

- \(252 = 2^2 \cdot 3^2 \cdot 7\)
- \(1600 = 2^6 \cdot 5^2\)

The GCD is \(4\). Dividing both the numerator and denominator by \(4\):

\[
\frac{252 \div 4}{1600 \div 4} = \frac{63}{400}
\]

### Final Answer
Thus, the coefficient of \(x^2y^6\) in the expansion of \(\left(\frac{3}{5}x - \frac{y}{2}\right)^8\) is:

\[
\boxed{\frac{63}{400}}
\]
Answer verification result: Correct
--------------------------------------------------

Question: how many a in banana?
AI's thought process and answer:
Sure! Let's break down the problem of counting how many times the letter "a" appears in the word "banana" step by step.

### Step 1: Analyze the Problem Requirements
The problem requires us to determine the frequency of the letter "a" in the word "banana." We need to:
- Identify the target letter, which is "a."
- Count how many times this letter appears in the given word.

### Step 2: List the Steps to Solve the Problem
To solve the problem, we can follow these steps:
1. Write down the word "banana."
2. Identify each letter in the word.
3. Count the occurrences of the letter "a."
4. Summarize the count.

### Step 3: Execute the Solution Process
Now, let's execute the steps we outlined:

1. The word we are analyzing is **"banana."**
2. The letters in "banana" are:
   - b
   - a
   - n
   - a
   - n
   - a
3. Now, we will count the occurrences of the letter "a":
   - The first letter is **b** (not "a").
   - The second letter is **a** (count = 1).
   - The third letter is **n** (not "a").
   - The fourth letter is **a** (count = 2).
   - The fifth letter is **n** (not "a").
   - The sixth letter is **a** (count = 3).

### Step 4: Provide the Final Answer
After counting, we find that the letter "a" appears **3 times** in the word "banana."

### Summary of Thought Process
- We started by understanding the requirement: counting a specific letter in a word.
- We broke down the problem into manageable steps, ensuring clarity in our approach.
- We executed the steps methodically, ensuring we counted each occurrence accurately.
- Finally, we summarized our findings to provide a clear answer.

Thus, the final answer is that there are **3 occurrences of the letter "a" in the word "banana."**
Answer verification result: Correct
--------------------------------------------------

Export the generated answers to a JSON file and transform these to Alpaca traing data format#

[ ]:
simplified_output = {
    'timestamp': datetime.now().isoformat(),
    'qa_pairs': generated_answers
}
simplified_file = f'generated_answers_{datetime.now().strftime("%Y%m%d_%H%M%S")}.json'
with open(simplified_file, 'w', encoding='utf-8') as f:
    json.dump(simplified_output, f, ensure_ascii=False, indent=2)
print(f"The generated answers have been exported to: {simplified_file}")
The generated answers have been exported to: generated_answers_20250111_114951.json

The script transforms the Q&A data into the Alpaca training data format, which is suitable for supervised fine-tuning (SFT). The transformed data is saved to a new JSON file.

[ ]:
import json
from datetime import datetime

def transform_qa_format(input_file):
    # Read the input JSON file
    with open(input_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Transform the data
    transformed_data = []
    for question, answer in data['qa_pairs'].items():
        transformed_pair = {
            "instruction": question,
            "input": "",
            "output": answer
        }
        transformed_data.append(transformed_pair)

    # Generate output filename with timestamp
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    output_file = f'transformed_qa_{timestamp}.json'

    # Write the transformed data
    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(transformed_data, f, ensure_ascii=False, indent=2)

    return output_file, transformed_data
[ ]:
output_file, transformed_data = transform_qa_format(simplified_file)
print(f"Transformation complete. Output saved to: {output_file}")
Transformation complete. Output saved to: transformed_qa_20250111_115000.json

Upload the Data to Huggingface#

This defines a function upload_to_huggingface that uploads a dataset to Hugging Face. The script is modular, with helper functions handling specific tasks such as dataset name generation, dataset creation, metadata card creation, and record addition

[ ]:
# Import necessary modules and classes
from camel.datahubs.huggingface import HuggingFaceDatasetManager  # Manages interactions with Hugging Face datasets
from camel.datahubs.models import Record  # Represents a single record in the dataset
from datetime import datetime  # Handles date and time operations

# Main function: Upload dataset to Hugging Face
def upload_to_huggingface(transformed_data, username, dataset_name=None):
    r"""Uploads transformed data to the Hugging Face dataset platform.

    Args:
        transformed_data (list): Transformed data, typically a list of dictionaries.
        username (str): Hugging Face username.
        dataset_name (str, optional): Custom dataset name.

    Returns:
        str: URL of the uploaded dataset.
    """
    # Initialize HuggingFaceDatasetManager to interact with Hugging Face datasets
    manager = HuggingFaceDatasetManager()

    # Generate or validate the dataset name
    dataset_name = generate_or_validate_dataset_name(username, dataset_name)

    # Create the dataset on Hugging Face and get the dataset URL
    dataset_url = create_dataset(manager, dataset_name)

    # Create a dataset card to add metadata
    create_dataset_card(manager, dataset_name, username)

    # Convert the transformed data into a list of Record objects
    records = create_records(transformed_data)

    # Add the Record objects to the dataset
    add_records_to_dataset(manager, dataset_name, records)

    # Return the dataset URL
    return dataset_url

# Generate or validate the dataset name
def generate_or_validate_dataset_name(username, dataset_name):
    r"""Generates a default dataset name or validates and formats a user-provided name.

    Args:
        username (str): Hugging Face username.
        dataset_name (str, optional): User-provided custom dataset name.

    Returns:
        str: Formatted dataset name.
    """
    if dataset_name is None:
        # If no dataset name is provided, generate a default name with the username and current date
        dataset_name = f"{username}/qa-dataset-{datetime.now().strftime('%Y%m%d')}"
    else:
        # If a dataset name is provided, format it to include the username
        dataset_name = f"{username}/{dataset_name}"
    return dataset_name

# Create a dataset on Hugging Face
def create_dataset(manager, dataset_name):
    r"""Creates a new dataset on Hugging Face and returns the dataset URL.

    Args:
        manager (HuggingFaceDatasetManager): Instance of HuggingFaceDatasetManager.
        dataset_name (str): Name of the dataset.

    Returns:
        str: URL of the created dataset.
    """
    print(f"Creating dataset: {dataset_name}")
    # Use HuggingFaceDatasetManager to create the dataset
    dataset_url = manager.create_dataset(name=dataset_name)
    print(f"Dataset created: {dataset_url}")
    return dataset_url

# Create a dataset card with metadata
def create_dataset_card(manager, dataset_name, username):
    r"""Creates a dataset card to add metadata

    Args:
        manager (HuggingFaceDatasetManager): Instance of HuggingFaceDatasetManager.
        dataset_name (str): Name of the dataset.
        username (str): Hugging Face username.
    """
    print("Creating dataset card...")
    # Use HuggingFaceDatasetManager to create the dataset card
    manager.create_dataset_card(
        dataset_name=dataset_name,
        description="Question-Answer dataset generated by CAMEL CoTDataGenerator",  # Dataset description
        license="mit",  # Dataset license
        language=["en"],  # Dataset language
        size_category="<1MB",  # Dataset size category
        version="0.1.0",  # Dataset version
        tags=["camel", "question-answering"],  # Dataset tags
        task_categories=["question-answering"],  # Dataset task categories
        authors=[username]  # Dataset authors
    )
    print("Dataset card created successfully.")

# Convert transformed data into Record objects
def create_records(transformed_data):
    r"""Converts transformed data into a list of Record objects.

    Args:
        transformed_data (list): Transformed data, typically a list of dictionaries.

    Returns:
        list: List of Record objects.
    """
    records = []
    # Iterate through the transformed data and convert each dictionary into a Record object
    for item in transformed_data:
        record = Record(**item)  # Use the dictionary key-value pairs to create a Record object
        records.append(record)
    return records

# Add Record objects to the dataset
def add_records_to_dataset(manager, dataset_name, records):
    r"""Adds a list of Record objects to the dataset.

    Args:
        manager (HuggingFaceDatasetManager): Instance of HuggingFaceDatasetManager.
        dataset_name (str): Name of the dataset.
        records (list): List of Record objects.
    """
    print("Adding records to the dataset...")
    # Use HuggingFaceDatasetManager to add the records to the dataset
    manager.add_records(dataset_name=dataset_name, records=records)
    print("Records added successfully.")

Config Access Token of Huggingface#

You can go to here to get API Key from Huggingface

image.png
[ ]:
HUGGING_FACE_TOKEN = getpass('Enter your HUGGING_FACE_TOKEN: ')
os.environ["HUGGING_FACE_TOKEN"] = HUGGING_FACE_TOKEN
Enter your HUGGING_FACE_TOKEN: Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·
[ ]:
# Set your personal huggingface config, then upload to HuggingFace
username = input("Enter your HuggingFace username: ")
dataset_name = input("Enter dataset name (press Enter to use default): ").strip()
if not dataset_name:
    dataset_name = None

try:
    dataset_url = upload_to_huggingface(transformed_data, username, dataset_name)
    print(f"\nData successfully uploaded to HuggingFace!")
    print(f"Dataset URL: {dataset_url}")
except Exception as e:
    print(f"Error uploading to HuggingFace: {str(e)}")

Enter your HuggingFace username: zjrwtxtechstudio
Enter dataset name (press Enter to use default): cotdata01
Creating dataset: zjrwtxtechstudio/cotdata01
Dataset created: https://huggingface.co/datasets/zjrwtxtechstudio/cotdata01
Creating dataset card...
Dataset card created successfully.
Adding records to the dataset...
Records added successfully.

Data successfully uploaded to HuggingFace!
Dataset URL: https://huggingface.co/datasets/zjrwtxtechstudio/cotdata01
image.png

Configure the Unsloth environment#

choose the base model#

[ ]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    # Can select any from the below:
    # "unsloth/Qwen2.5-0.5B", "unsloth/Qwen2.5-1.5B", "unsloth/Qwen2.5-3B"
    # "unsloth/Qwen2.5-14B",  "unsloth/Qwen2.5-32B",  "unsloth/Qwen2.5-72B",
    # And also all Instruct versions and Math. Coding verisons!
    model_name = "unsloth/Qwen2.5-1.5B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
πŸ¦₯ Unsloth: Will patch your computer to enable 2x faster free finetuning.
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
πŸ¦₯ Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.5: Fast Qwen2 patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!#

[ ]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
Unsloth 2025.1.5 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.

Convert CoT data into an SFT-compliant training data format#

[ ]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("zjrwtxtechstudio/o1data06", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

### Train the model Now let’s use Huggingface TRL’s SFTTrainer! More docs here: TRL SFT docs. We do 60 steps to speed things up, but you can set num_train_epochs=1 for a full run, and turn off max_steps=None. We also support TRL’s DPOTrainer!

[ ]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
[ ]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
GPU = Tesla T4. Max memory = 14.748 GB.
1.535 GB of memory reserved.

Start model training#

[ ]:
trainer_stats = trainer.train()
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 12 | Num Epochs = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 18,464,768
[60/60 03:07, Epoch 30/60]
Step Training Loss
1 0.747600
2 0.803200
3 0.729800
4 0.752100
5 0.690500
6 0.532300
7 0.565500
8 0.421100
9 0.398400
10 0.378300
11 0.322400
12 0.267700
13 0.225400
14 0.221800
15 0.165200
16 0.167600
17 0.135000
18 0.131100
19 0.105400
20 0.116300
21 0.081000
22 0.095600
23 0.082300
24 0.041800
25 0.044300
26 0.069300
27 0.035900
28 0.056600
29 0.040600
30 0.029200
31 0.036600
32 0.019900
33 0.027400
34 0.020000
35 0.023700
36 0.017500
37 0.013100
38 0.026700
39 0.017100
40 0.012900
41 0.011200
42 0.015800
43 0.011500
44 0.010600
45 0.009600
46 0.008800
47 0.009400
48 0.007300
49 0.008300
50 0.007600
51 0.008300
52 0.005800
53 0.007400
54 0.006100
55 0.007500
56 0.005300
57 0.005800
58 0.008200
59 0.007300
60 0.005300

[ ]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")
204.267 seconds used for training.
3.4 minutes used for training.
Peak reserved memory = 2.207 GB.
Peak reserved memory for training = 0.672 GB.
Peak reserved memory % of max memory = 14.965 %.
Peak reserved memory for training % of max memory = 4.557 %.

Let’s run the model! You can change the instruction and input - leave the output blank!

[ ]:
# alpaca_prompt is copied from above
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

# Prepare the input for inference
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "how many r in strawberry?",  # Instruction
            "",  # Input (empty for this example)
            "",  # Output (leave blank for generation)
        )
    ],
    return_tensors="pt"
).to("cuda")

# Generate the output
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,  # Maximum number of tokens to generate
    use_cache=True        # Use cache for faster inference
)

# Decode the generated output and clean it
decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Print the cleaned output
print(decoded_outputs[0])  # Print the first (and only) output
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
how many r in strawberry?

### Input:


### Response:
To determine how many times the letter "r" appears in the word "strawberry," we can follow a systematic approach.

### Step 1: Analyze the problem requirements
The problem requires us to count the occurrences of the letter "r" in the word "strawberry." This is a straightforward counting problem, and we need to ensure that we consider both uppercase and lowercase letters if applicable. However, since "strawberry" is entirely lowercase, we will focus on that.

### Step 2: List the steps to solve the problem
1. Write down the word "strawberry."
2. Identify the letter we need to count, which is "r."
3. Go through each letter in the word and count how many times "r" appears.
4. Provide the final count.

### Step 3: Execute the solution process
1. The word is "strawberry."
2. We are looking for the letter "r."
3. Now, let's break down the word letter by letter:
   - s
   - t
   - r (1st occurrence)
   - a
   - w
   - b
   - e
   - r (2nd occurrence)
   - r (3rd occurrence)
   - y

4. As we go through the letters, we find:
   - The first "r" is the 3rd letter.
   - The second "r" is the 8th letter.
   - The third "r" is the 9th letter.

### Step 4: Provide the final answer
After counting, we find that the letter "r" appears **3 times** in the word "strawberry."

Thus, the final answer is:

\[
\boxed{3}
\]

Here are the results of the official Qwen2.5-1.5b-instruct demo answering the same question:Qwen2.5-1.5b-instruct-demo

image.png

### Saving, loading finetuned models To save the final model as LoRA adapters, either use Huggingface’s push_to_hub for an online save or save_pretrained for a local save.

[NOTE] This ONLY saves the LoRA adapters, and not the full model.

[ ]:
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
model.push_to_hub("zjrwtxtechstudio/qwen2.5-1.5b-cot", token = "  ") # Online saving
tokenizer.push_to_hub("zjrwtxtechstudio/qwen2.5-1.5b-cot", token = "  ") # Online saving
Saved model to https://huggingface.co/zjrwtxtechstudio/qwen2.5-1.5b-cot

Now if you want to load the LoRA adapters we just saved for inference, set False to True:

[ ]:
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "zjrwtxtechstudio/qwen2.5-1.5b-cot", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# alpaca_prompt = You MUST copy from above!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "which one is bigger bewteen 9.11 and 9.9?", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 4098)
==((====))==  Unsloth 2025.1.5: Fast Qwen2 patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
which one is bigger bewteen 9.11 and 9.9?

### Input:


### Response:
To determine which of the two numbers, 9.11 and 9.9, is bigger, we can follow a systematic approach.

### Step 1: Analyze the problem requirements
The problem requires us to compare two decimal numbers and identify which one is larger. This is a straightforward comparison problem that involves understanding the place value of decimal digits.

### Step 2: List the steps to solve the problem
1. Write down the two numbers: 9.11 and 9.9.
2. Compare the digits from the leftmost to the rightmost, paying attention to the place value of each digit.
3. Since both numbers have the same integer part (9), we only need to compare the decimal parts.
4. The digit in the tenths place for 9.11 is 1, and for 9.9 is 9. Since 1 is less than 9, 9.11 is less than 9.9.
5. Therefore, 9.9 is larger than 9.11.

### Step 3: Execute the comparison process
- The integer part of both numbers is the same, which is 9.
- Now, we focus on the decimal parts:
  - 9.11 has a 1 in the tenths place.
  - 9.9 has a 9 in the tenths place.
- As mentioned, 1 is less than 9.

### Step 4: Provide the final answer
Based on the comparison, we conclude that 9.9 is bigger than 9.11.

Thus, the final answer is:

\[
\boxed{9.9}
\]<|endoftext|>

🌟 Highlights#

Through this notebook demonstration, we showcased how to use the CoTDataGenerator from the CAMEL framework to generate high-quality question-answer data and efficiently fine-tune language models using the Unsloth library. The entire process covers the end-to-end workflow from data generation, model fine-tuning, to model deployment, demonstrating how to leverage modern AI tools and platforms to build and optimize question-answering systems.

Key Takeaways:#

Data Generation: Using CoTDataGenerator from CAMEL, we were able to generate high-quality question-answer data similar to o1 thinking. This data can be used for training and evaluating question-answering systems.

Model Fine-Tuning: With the Unsloth library, we were able to fine-tune large language models with minimal computational resources. By leveraging LoRA (Low-Rank Adaptation) technology, we only needed to update a small portion of the model parameters, significantly reducing the resources required for training.

Data and Model Upload: We uploaded the generated data and fine-tuned models to the Hugging Face platform for easy sharing and deployment. Hugging Face provides powerful dataset management and model hosting capabilities, making the entire process more efficient and convenient.

Inference and Deployment: After fine-tuning the model, we used it for inference to generate high-quality answers. By saving and loading LoRA adapters, we can easily deploy and use the fine-tuned model in different environments.

That’s everything: Got questions about 🐫 CAMEL-AI? Join us on Discord! Whether you want to share feedback, explore the latest in multi-agent systems, get support, or connect with others on exciting projects, we’d love to have you in the community! 🀝

Check out some of our other work:

  1. 🐫 Creating Your First CAMEL Agent free Colab

  2. Graph RAG Cookbook free Colab

  3. πŸ§‘β€βš–οΈ Create A Hackathon Judge Committee with Workforce free Colab

  4. πŸ”₯ 3 ways to ingest data from websites with Firecrawl & CAMEL free Colab

  5. πŸ¦₯ Agentic SFT Data Generation with CAMEL and Mistral Models, Fine-Tuned with Unsloth free Colab

Thanks from everyone at 🐫 CAMEL-AI

4d0374bfba2843da826da048a2edefd4 5a601812dddc45ed895617eaba040863

⭐ Star us on Github , join our Discord or follow our X ⭐